contact@statdoe.com

Tutorials

Barplots for the Results of a Three-Factor Experiment


This tutorial presents some suggestions on how to organize and show results of a three factor experiment using a barplot.

If you prefer a video-tutorial, you can watch the tutorial Barplots for the Results of a Three-Factor Experiment at my YouTube channel.

We are going to use the data of an experiment that measured the the cold tolerance of the grass species Echinochloa crus-galli.1 The factors studied were the Type, with the levels ‘Quebec’ and ‘Mississippi’ giving the origin of the plant, the Treatment, ‘nonchilled’ and ‘chilled’, and the ambient CO2 concentration (mL L-1). The response variable is the CO2 uptake (µmol m-2 s-1).

We are going to start by loading the appropriate libraries: the ggplot2 and the ggthemes for the plots, the dplyr for data manipulation, the multcompView for the compact letter display, and the egg to label the plots.

# loading the appropriate libraries
library(ggplot2)
library(ggthemes)
library(dplyr)
library(multcompView)
library(egg)


Before starting the analysis and the plots, let’s take a look at the data set.

# data file
str(CO2)
## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':   84 obs. of  5 variables:
##  $ Plant    : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
##  $ Type     : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
##  $ conc     : num  95 175 250 350 500 675 1000 95 175 250 ...
##  $ uptake   : num  16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
##  - attr(*, "formula")=Class 'formula'  language uptake ~ conc | Plant
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##  - attr(*, "outer")=Class 'formula'  language ~Treatment * Type
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##  - attr(*, "labels")=List of 2
##   ..$ x: chr "Ambient carbon dioxide concentration"
##   ..$ y: chr "CO2 uptake rate"
##  - attr(*, "units")=List of 2
##   ..$ x: chr "(uL/L)"
##   ..$ y: chr "(umol/m^2 s)"
head(CO2)
##   Plant   Type  Treatment conc uptake
## 1   Qn1 Quebec nonchilled   95   16.0
## 2   Qn1 Quebec nonchilled  175   30.4
## 3   Qn1 Quebec nonchilled  250   34.8
## 4   Qn1 Quebec nonchilled  350   37.2
## 5   Qn1 Quebec nonchilled  500   35.3
## 6   Qn1 Quebec nonchilled  675   39.2


The output of the str() and head() functions shows that the data set has 5 columns, the first, ‘Plant’ is the plant identification, then we have the columns for ‘Type’, ‘Treatment’’ and conc (concentration), and finally the response variable uptake. Type and Treatment are defined as Factor and conc and uptake are numeric variables.

Type and Treatment were tested at two levels each, and the CO2 concentration was tested at seven levels: 95, 175, 250, 350, 500, 675, and 1000 mL/L.


Analysis and organisation of the data

Before diving into the plots, we need to organize the data for it.

The next code chunk runs the analysis of variance (aov()) and Tukey’s test (TukeyHSD()), and uses the multcompLetters4() function to get the letters indicating significant differences.

Finally, we are going to build a table with the mean, the standard deviation and the letters indications significant differences for each treatment. The information on this table is the one that will be used to build the plots.

If you need a detailed explanation on this code, you can check the tutorial on Two-Way ANOVA.

# analysis of variance
anova <- aov(uptake ~ factor(conc)*Type*Treatment, data = CO2)
summary(anova)
##                             Df Sum Sq Mean Sq F value   Pr(>F)    
## factor(conc)                 6   4069     678  80.548  < 2e-16 ***
## Type                         1   3366    3366 399.758  < 2e-16 ***
## Treatment                    1    988     988 117.368 2.32e-15 ***
## factor(conc):Type            6    374      62   7.412 7.24e-06 ***
## factor(conc):Treatment       6    101      17   1.999   0.0811 .  
## Type:Treatment               1    226     226  26.812 3.15e-06 ***
## factor(conc):Type:Treatment  6    112      19   2.216   0.0547 .  
## Residuals                   56    471       8                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Tukey's test and compact letter display
Tukey <- TukeyHSD(anova)
cld <- multcompLetters4(anova, Tukey)

# Table with the mean, the standard deviation and the letters indications significant differences for each treatment
dt <- group_by(CO2, conc, Type, Treatment) %>%
  summarise(uptake_mean=mean(uptake), sd=sd(uptake)) %>%
  arrange(desc(uptake_mean))
cld <- as.data.frame.list(cld$`factor(conc):Type:Treatment`)
dt$Tukey <- cld$Letters

print(dt)
## # A tibble: 28 x 6
## # Groups:   conc, Type [14]
##     conc Type   Treatment  uptake_mean    sd Tukey
##    <dbl> <fct>  <fct>            <dbl> <dbl> <chr>
##  1  1000 Quebec nonchilled        43.2  3.06 a    
##  2   675 Quebec nonchilled        41.5  2.35 a    
##  3  1000 Quebec chilled           40.8  1.91 ab   
##  4   350 Quebec nonchilled        40.4  2.75 ab   
##  5   500 Quebec nonchilled        39.6  3.90 abc  
##  6   675 Quebec chilled           37.5  2.10 abcd 
##  7   250 Quebec nonchilled        37.4  2.76 abcd 
##  8   500 Quebec chilled           36.7  3.61 abcde
##  9   350 Quebec chilled           35.8  2.62 abcde
## 10   250 Quebec chilled           34.5  3.93 abcde
## # … with 18 more rows


The new table (dt) presents the columns for the three factors, the mean of the uptake, the standard deviation and the letters indicating significant differences among means.

This data will be used to build the bar plots.


Basic Barplot

As the experiment has three explanatory variables, the first question is: “Which variable to use in the horizontal axis of the plot?”

As a rule of thumb, I always choose the variable with the highest number of levels since it will lower the number of combinations presented as colours and shapes in the plot legend.

In this case, I will choose the CO2 concentration for the horizontal axis. This means the plot will show four different treatments: the combination of two different plant Types with two Treatments.

Hypothetically, if I have chosen the Type as horizontal axis, I would have 14 treatments in the legend, the combination of seven CO2 concentrations and two Treatments.

Thus, I will start by defining the CO2 concentration as the x-axis and the uptake_mean as the y-axis, and the fill (of the bars) is the combination between Type and Treatment. The bars will be positioned side-by-side with the argument position = "dodge" in the geom_bar, and I am also adding the error bars as the standard deviation of the experimental data.

For the colours, I have chosen the “Greens” palette.

I also have expressions for the x and y-axis and I have chosen the theme_few from the ggthemes package.


# barplot
ggplot(dt, aes(x = factor(conc), y = uptake_mean, fill = Type:Treatment)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymax = uptake_mean + sd, ymin = uptake_mean - sd),
                position = position_dodge(0.9), width = 0.25, color = "Gray25") +
  xlab(expression(CO[2]~Concentration~'('~mL~L^-1~')')) +
  ylab(expression(CO[2]~Uptake~'('~µmol~m^2~s^-1~')')) +
  scale_fill_brewer(palette = "Greens") +
  theme_few()


In the resulting plot, each combination of plant type and treatment is identified by its colour. And the information is conveyed even if it is printed in grey-scale.

I believe that a lot of people would use a plot like this one to present the results. However, this plot, although it shows all the information, the effect of each explanatory variable on the response is not readily understandablet.

In a nutshell, the plot is good, but we can improve it.


Spliting the visualisation into two plots

To improve the readability of the plot above, I am going to use one of my favorite functions from ggplot2: the facet_grid() function, and split the plot into two, using the plant Type and use only the Treatment for the fill.

# barplot
ggplot(dt, aes(x = factor(conc), y = uptake_mean, fill = Treatment)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymax = uptake_mean + sd, ymin = uptake_mean - sd),
                position = position_dodge(0.9), width = 0.25, color = "Gray25") +
  xlab(expression(CO[2]~Concentration~'('~mL~L^-1~')')) +
  ylab(expression(CO[2]~Uptake~'('~µmol~m^2~s^-1~')')) +
  scale_fill_brewer(palette = "Greens") +
  theme_few() +
  facet_grid(.~Type, labeller = label_both)


In this new plot, we can also readily see that the Quebec type presents higher CO2 uptake compared to the Mississipi type. Regarding the Treatment, the non-chilled plants present higher CO2 uptake compared with the chilled ones for the same plant type; and, moreover, we can easily see that the effect of the Treatment is stronger for the Mississipi type.

The interpretation in the paragraph above can also be made with the simple plot, but when we split the results by Type, the data behaviour becomes more clear.

The next step is to add the letters indicating significant differences in the means comparison (Tukey’s test) using the geom_text() function. The labels are in the column Tukey of the dt table.

I also have used the function ylim() to increase the limits of the vertical axis and avoid cutting the letters.

Additionally, I will transfer the legend to inside the right plot by using theme(legend.position = c(0.58, 0.75)).


# barplot
ggplot(dt, aes(x = factor(conc), y = uptake_mean, fill = Treatment)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymax = uptake_mean + sd, ymin = uptake_mean - sd),
                position = position_dodge(0.9), width = 0.25, color = "Gray25") +
  xlab(expression(CO[2]~Concentration~'('~mL~L^-1~')')) +
  ylab(expression(CO[2]~Uptake~'('~µmol~m^2~s^-1~')')) +
  theme_few() +
  theme(legend.position = c(0.58, 0.8)) +
  scale_fill_brewer(palette = "Greens") +
  facet_grid(.~Type, labeller = label_both) +
  geom_text(aes(label=Tukey, y = uptake_mean + sd + 2), size = 3, color = "Gray25",
            show.legend = FALSE,
            position = position_dodge(0.9)) +
  ylim(0,50)


The resulting plot can be used as it is.

The information is clear and readable even when printed in gray-scale.

However, it is usual to use letters to identify multiple plots, instead of the titles at the top of them. We can easily tag the ggplot sub-plots using the tag_facet() function from the egg package.

To do it, we are going to store the ggplot in an object p and this object is the argument of the tag_facet() function.

We need also relocate the legend a bit.


# barplot
p <- ggplot(dt, aes(x = factor(conc), y = uptake_mean, fill = Treatment)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymax = uptake_mean + sd, ymin = uptake_mean - sd),
                position = position_dodge(0.9), width = 0.25, color = "Gray25") +
  xlab(expression(CO[2]~Concentration~'('~mL~L^-1~')')) +
  ylab(expression(CO[2]~Uptake~'('~µmol~m^2~s^-1~')')) +
  theme_few() +
  theme(legend.position = c(0.58, 0.76)) +
  scale_fill_brewer(palette = "Greens") +
  facet_grid(.~Type, labeller = label_both) +
  geom_text(aes(label=Tukey, y = uptake_mean + sd + 2), size = 3, color = "Gray25",
            show.legend = FALSE,
            position = position_dodge(0.9)) +
  ylim(0,50)

tag_facet(p)


This plot now has one issue that bothers me: just looking at it we cannot identify the plant type; the information will be in the written in the figure caption. To make it readily understandable without the need of looking into the caption, we can add the information on the type to the tag_facet() function. And we can also remove the legend title to avoid an excess of information o a plot that is already full of it.

  • theme(legend.title = element_blank()) will remove the legend title;
  • theme(legend.position = c(0.58, 0.80)) readjusts the legend position;
  • fontface = 1 will change the font from bold to plain;
  • tag_pool = c("(a) Quebec", "(b) Mississipi") customises the tag labels;
  • scale_fill_manual() to customise the final colours since the current ones seem a little too pale.


# barplot
p <- ggplot(dt, aes(x = factor(conc), y = uptake_mean, fill = Treatment)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymax = uptake_mean + sd, ymin = uptake_mean - sd),
                position = position_dodge(0.9), width = 0.25, color = "Gray25") +
  xlab(expression(CO[2]~Concentration~'('~mL~L^-1~')')) +
  ylab(expression(CO[2]~Uptake~'('~µmol~m^2~s^-1~')')) +
  theme_few() +
  theme(legend.position = c(0.58, 0.80)) +
  theme(legend.title = element_blank()) +
  scale_fill_manual(values = c("#C1D5A5", "#84A17C")) +
  facet_grid(.~Type, labeller = label_both) +
  geom_text(aes(label=Tukey, y = uptake_mean + sd + 2), size = 3, color = "Gray25",
            show.legend = FALSE,
            position = position_dodge(0.9)) +
  ylim(0,50)

tag_facet(p, fontface = 1, tag_pool = c("(a) Quebec",
                                        "(b) Mississipi"),
          open = NULL, close = NULL, hjust = -0.05)

Now we have it ready to be used!!!

Saving the plot

The figures with two rows of plots are to be saved as 9 x 6 inches and the ones with only one row of plots are to be saved as 9 x 4 inches.

# save as a png file
ggsave("CO2_barplot.png", width = 9, height = 3.5, dpi = 1000)

  1. Potvin, C., Lechowicz, M. J. and Tardif, S. (1990) “The statistical analysis of ecophysiological response curves obtained from experiments involving repeated measures”, Ecology, 71, 1389–1400.↩︎

9 Responses

  1. Dear Rosane Rech,

    Thank you so much for your nice cooperation and sharing your nice R-script and console output. It has tremendous impact of my understanding. I am highly grateful to you.

    I am a faculty member from Bangabandhu Sheikh Mujibur Rahman Agricultural University, Bangladesh. I have keen interest to learn R-programming for data analysis in the Biological Sciences. I do request your consistent support for my better understanding in future. Best Regards-Jahidul Hassan

  2. Omar Rosas

    Dear Rosane Rech,

    Thank you for this great tutorial and the elegant plot that you made. As someone that is learning R and biostatistics, this is useful.

    I have a question regarding the analysis that sometimes for me it is confusing. When we made the ANOVA with two or three factors and the interaction is not significant, we can still make the Tukey’s test and report the comparison between the three factors (following this example will be “factor(conc):Type: Treatment”)

    Best regards,
    Omar B.

  3. Anik

    Dear Rosane Rech Madam,
    Thank you for your exclusive R tutorials.
    I have an issue regarding the two way anova and post hoc test eg. Tukes HSD. When I put the code
    multcompLetters4 on my object(anova) and a pairwise comparison result(Tukeys HSD), I got an error like this:
    Error in comp[[i]] : subscript out of bounds

    If you kindly help me to get out of this, it will be so helpful. Thanks in advance.

  4. Em

    Hi Rosane,

    Your work is so clearly laid out and easy to follow! Wonderful! I’m working to see if I can easily swap out sd for se. Do you know that off the top of your head?

  5. Fozia

    Hi Dear Rosane Rech
    Thank you for your comprehensive R tutorials for two or more than two variables
    When I added this “cld <- multcompLetters2(anova, Tukey)"
    Can't understand what this "cld" refere to????
    How would I change according to my data set????
    Got stuck here

Leave a Reply