The best library to plot data

Requires the ggalluvial package, on top of ggplot2 (via Mathieu Perona).

Alluvial diagrams are like this (my own production):

You can also use networkD3 with the function sankeyNetwork (via Antoine Belgodere): more here. An interactive example can be found here (via Pauline R.).
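As a rough illustration of the ggalluvial approach mentioned above, here is a minimal sketch. The data frame `flows` and its columns (`origin`, `destination`, `n`) are made up for the example and do not come from the original diagram.

```r
library(ggplot2)
library(ggalluvial)

# Hypothetical data: how many units move from one state to another
flows <- data.frame(
  origin      = c("A", "A", "B", "B"),
  destination = c("A", "B", "A", "B"),
  n           = c(30, 10, 5, 25)
)

alluvial_plot <- ggplot(flows, aes(axis1 = origin, axis2 = destination, y = n)) +
  geom_alluvium(aes(fill = origin)) +                              # the flows
  geom_stratum() +                                                 # the boxes on each axis
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +  # box labels
  scale_x_discrete(limits = c("origin", "destination"))
```

Printing `alluvial_plot` draws the diagram. Each `axisN` aesthetic adds one vertical axis, so a third state would be mapped to `axis3`.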

Simple bar plot

Suppose you have the following data in df:

year n
2017 381
2018 315

You want to plot the value of n for each year in a simple bar plot.

plot <- ggplot(df, aes(x = year, y = n)) +
  geom_bar(stat = "identity")

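Do NOT forget to add stat = "identity" in geom_bar(). By default, geom_bar() counts the rows in each group; stat = "identity" tells ggplot2 to use the values of n as the bar heights instead.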

Fig. 1: Here's the result of the aforementioned code. Of course you can customize its look if you are not happy with the colours, and so on.
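For reference, here is a self-contained version of the example that builds df by hand from the table above. Converting year to a factor is an addition of mine, so that the x axis shows only 2017 and 2018; the second plot uses geom_col(), which is shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# The data from the table above
df <- data.frame(year = c(2017, 2018), n = c(381, 315))

# stat = "identity": bar heights are taken directly from n
bar_plot <- ggplot(df, aes(x = factor(year), y = n)) +
  geom_bar(stat = "identity") +
  labs(x = "year")

# Same plot with geom_col()
col_plot <- ggplot(df, aes(x = factor(year), y = n)) +
  geom_col() +
  labs(x = "year")
```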

Percentage bar plot

plot <- ggplot(df, aes(x = var1)) +
  geom_bar(aes(y = (..count..)/sum(..count..)))

Source and more details.
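To make the percentage plot concrete, here is a small sketch on made-up data (the values of var1 are invented). It uses after_stat(), which recent ggplot2 versions recommend in place of the ..count.. notation, and scales::percent to label the axis.

```r
library(ggplot2)

# Hypothetical data: one row per observation
df <- data.frame(var1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"))

pct_plot <- ggplot(df, aes(x = var1)) +
  # count is computed by geom_bar(); dividing by its sum gives each bar's share
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Percent")
```

With these data the bars for a, b and c have heights 0.3, 0.2 and 0.5, displayed as percentages.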

More sophisticated code, not tested yet:

ggplot(df, aes(x = var1, group = var_group)) +
  geom_bar(aes(y = ..prop.., fill = factor(..x..)), stat = "count") +
  geom_text(aes(label = scales::percent(..prop..), y = ..prop..),
            stat = "count", vjust = -.5) +
  labs(y = "Percent", fill = "var1") +
  facet_grid(~var_group) +
  scale_y_continuous(labels = scales::percent)

Requires the scales package.

(Same source)
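To try this pattern on real data, here is a sketch using the built-in mtcars dataset, with cyl standing in for var1 and am for var_group (both choices are mine, purely for illustration). It is written with after_stat() rather than the older dot-dot notation.

```r
library(ggplot2)

# mtcars ships with R: cyl plays the role of var1, am the role of var_group
facet_plot <- ggplot(mtcars, aes(x = factor(cyl), group = am)) +
  # prop is the share of each bar within its group
  geom_bar(aes(y = after_stat(prop), fill = factor(after_stat(x))), stat = "count") +
  # print the percentage just above each bar
  geom_text(aes(label = scales::percent(after_stat(prop)), y = after_stat(prop)),
            stat = "count", vjust = -0.5) +
  labs(x = "cyl", y = "Percent", fill = "cyl") +
  facet_grid(~am) +
  scale_y_continuous(labels = scales::percent)
```

Because group = am, the proportions are computed within each facet, so the bars of each panel sum to 100%.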

Plot several series in one single bar plot

Source (in French)

ggplot(df, aes(x = var1, y = (..count..), fill = var2)) +
  geom_bar(stat = "count", position = "dodge")

If the variable used with fill is categorical but stored as numbers (years, for instance), make sure it is a factor (as.factor(var2)). Otherwise it may be treated as a continuous variable and displayed incorrectly.

(..count..) means that ggplot2 will display the count of var1 for each value of var2.
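Here is a small sketch of a grouped bar plot on made-up data, where var2 holds years stored as numbers and is converted with as.factor(). It relies on the fact that geom_bar() counts rows by default, so no y aesthetic is needed.

```r
library(ggplot2)

# Hypothetical data: one row per observation, var2 is a numeric year
df <- data.frame(
  var1 = c("a", "a", "a", "b", "b", "b", "b"),
  var2 = c(2017, 2018, 2018, 2017, 2017, 2017, 2018)
)

dodge_plot <- ggplot(df, aes(x = var1, fill = as.factor(var2))) +
  # one bar per (var1, var2) pair, placed side by side
  geom_bar(position = "dodge") +
  labs(fill = "var2")
```

Without as.factor(), fill would be mapped to a continuous colour scale and the bars would not be split by year.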

  • Last modified: 8 weeks ago
  • by Olivier Simard-Casanova