Barplots for the Results of a Three-Factor Experiment
Bar Plots for presenting three-factors experiments in R
ggplot, error bars, compact letter display, Tukey’s test
This tutorial presents some suggestions on how to organize and show results of a three factor experiment using a barplot.
If you prefer a video-tutorial, you can watch the tutorial Barplots for the Results of a Three-Factor Experiment at my YouTube channel.
We are going to use the data of an experiment that measured the the cold tolerance of the grass species Echinochloa crus-galli.1 The factors studied were the Type, with the levels ‘Quebec’ and ‘Mississippi’ giving the origin of the plant, the Treatment, ‘nonchilled’ and ‘chilled’, and the ambient CO2 concentration (mL L-1). The response variable is the CO2 uptake (µmol m-2 s-1).
We are going to start by loading the appropriate libraries: the ggplot2
and the ggthemes
for the plots, the dplyr
for data manipulation, the multcompView
for the compact letter display, and the egg
to label the plots.
# loading the appropriate libraries
library(ggplot2)
library(ggthemes)
library(dplyr)
library(multcompView)
library(egg)
Before starting the analysis and the plots, let’s take a look at the data set.
# data file
str(CO2)
## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 84 obs. of 5 variables:
## $ Plant : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
## $ Type : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
## $ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
## $ conc : num 95 175 250 350 500 675 1000 95 175 250 ...
## $ uptake : num 16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
## - attr(*, "formula")=Class 'formula' language uptake ~ conc | Plant
## .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## - attr(*, "outer")=Class 'formula' language ~Treatment * Type
## .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## - attr(*, "labels")=List of 2
## ..$ x: chr "Ambient carbon dioxide concentration"
## ..$ y: chr "CO2 uptake rate"
## - attr(*, "units")=List of 2
## ..$ x: chr "(uL/L)"
## ..$ y: chr "(umol/m^2 s)"
head(CO2)
## Plant Type Treatment conc uptake
## 1 Qn1 Quebec nonchilled 95 16.0
## 2 Qn1 Quebec nonchilled 175 30.4
## 3 Qn1 Quebec nonchilled 250 34.8
## 4 Qn1 Quebec nonchilled 350 37.2
## 5 Qn1 Quebec nonchilled 500 35.3
## 6 Qn1 Quebec nonchilled 675 39.2
The output of the str()
and head()
functions shows that the data set has 5 columns, the first, ‘Plant’ is the plant identification, then we have the columns for ‘Type’, ‘Treatment’’ and conc (concentration), and finally the response variable uptake. Type and Treatment are defined as Factor and conc and uptake are numeric variables.
Type and Treatment were tested at two levels each, and the CO2 concentration was tested at seven levels: 95, 175, 250, 350, 500, 675, and 1000 mL/L.
Analysis and organisation of the data
Before diving into the plots, we need to organize the data for it.
The next code chunk runs the analysis of variance (aov()
) and Tukey’s test (TukeyHSD()
), and uses the multcompLetters4()
function to get the letters indicating significant differences.
Finally, we are going to build a table with the mean, the standard deviation and the letters indications significant differences for each treatment. The information on this table is the one that will be used to build the plots.
If you need a detailed explanation on this code, you can check the tutorial on Two-Way ANOVA.
# analysis of variance
<- aov(uptake ~ factor(conc)*Type*Treatment, data = CO2)
anova summary(anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(conc) 6 4069 678 80.548 < 2e-16 ***
## Type 1 3366 3366 399.758 < 2e-16 ***
## Treatment 1 988 988 117.368 2.32e-15 ***
## factor(conc):Type 6 374 62 7.412 7.24e-06 ***
## factor(conc):Treatment 6 101 17 1.999 0.0811 .
## Type:Treatment 1 226 226 26.812 3.15e-06 ***
## factor(conc):Type:Treatment 6 112 19 2.216 0.0547 .
## Residuals 56 471 8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Tukey's test and compact letter display
<- TukeyHSD(anova)
Tukey <- multcompLetters4(anova, Tukey)
cld
# Table with the mean, the standard deviation and the letters indications significant differences for each treatment
<- group_by(CO2, conc, Type, Treatment) %>%
dt summarise(uptake_mean=mean(uptake), sd=sd(uptake)) %>%
arrange(desc(uptake_mean))
<- as.data.frame.list(cld$`factor(conc):Type:Treatment`)
cld $Tukey <- cld$Letters
dt
print(dt)
## # A tibble: 28 x 6
## # Groups: conc, Type [14]
## conc Type Treatment uptake_mean sd Tukey
## <dbl> <fct> <fct> <dbl> <dbl> <chr>
## 1 1000 Quebec nonchilled 43.2 3.06 a
## 2 675 Quebec nonchilled 41.5 2.35 a
## 3 1000 Quebec chilled 40.8 1.91 ab
## 4 350 Quebec nonchilled 40.4 2.75 ab
## 5 500 Quebec nonchilled 39.6 3.90 abc
## 6 675 Quebec chilled 37.5 2.10 abcd
## 7 250 Quebec nonchilled 37.4 2.76 abcd
## 8 500 Quebec chilled 36.7 3.61 abcde
## 9 350 Quebec chilled 35.8 2.62 abcde
## 10 250 Quebec chilled 34.5 3.93 abcde
## # … with 18 more rows
The new table (dt) presents the columns for the three factors, the mean of the uptake, the standard deviation and the letters indicating significant differences among means.
This data will be used to build the bar plots.
Basic Barplot
As the experiment has three explanatory variables, the first question is: “Which variable to use in the horizontal axis of the plot?”
As a rule of thumb, I always choose the variable with the highest number of levels since it will lower the number of combinations presented as colours and shapes in the plot legend.
In this case, I will choose the CO2 concentration for the horizontal axis. This means the plot will show four different treatments: the combination of two different plant Types with two Treatments.
Hypothetically, if I have chosen the Type as horizontal axis, I would have 14 treatments in the legend, the combination of seven CO2 concentrations and two Treatments.
Thus, I will start by defining the CO2 concentration as the x-axis and the uptake_mean as the y-axis, and the fill (of the bars) is the combination between Type and Treatment. The bars will be positioned side-by-side with the argument position = "dodge"
in the geom_bar
, and I am also adding the error bars as the standard deviation of the experimental data.
For the colours, I have chosen the “Greens” palette.
I also have expressions for the x and y-axis and I have chosen the theme_few
from the ggthemes
package.
# barplot
ggplot(dt, aes(x = factor(conc), y = uptake_mean, fill = Type:Treatment)) +
geom_bar(stat = "identity", position = "dodge") +
geom_errorbar(aes(ymax = uptake_mean + sd, ymin = uptake_mean - sd),
position = position_dodge(0.9), width = 0.25, color = "Gray25") +
xlab(expression(CO[2]~Concentration~'('~mL~L^-1~')')) +
ylab(expression(CO[2]~Uptake~'('~µmol~m^2~s^-1~')')) +
scale_fill_brewer(palette = "Greens") +
theme_few()
In the resulting plot, each combination of plant type and treatment is identified by its colour. And the information is conveyed even if it is printed in grey-scale.
I believe that a lot of people would use a plot like this one to present the results. However, this plot, although it shows all the information, the effect of each explanatory variable on the response is not readily understandablet.
In a nutshell, the plot is good, but we can improve it.
Spliting the visualisation into two plots
To improve the readability of the plot above, I am going to use one of my favorite functions from ggplot2
: the facet_grid()
function, and split the plot into two, using the plant Type and use only the Treatment for the fill.
# barplot
ggplot(dt, aes(x = factor(conc), y = uptake_mean, fill = Treatment)) +
geom_bar(stat = "identity", position = "dodge") +
geom_errorbar(aes(ymax = uptake_mean + sd, ymin = uptake_mean - sd),
position = position_dodge(0.9), width = 0.25, color = "Gray25") +
xlab(expression(CO[2]~Concentration~'('~mL~L^-1~')')) +
ylab(expression(CO[2]~Uptake~'('~µmol~m^2~s^-1~')')) +
scale_fill_brewer(palette = "Greens") +
theme_few() +
facet_grid(.~Type, labeller = label_both)
In this new plot, we can also readily see that the Quebec type presents higher CO2 uptake compared to the Mississipi type. Regarding the Treatment, the non-chilled plants present higher CO2 uptake compared with the chilled ones for the same plant type; and, moreover, we can easily see that the effect of the Treatment is stronger for the Mississipi type.
The interpretation in the paragraph above can also be made with the simple plot, but when we split the results by Type, the data behaviour becomes more clear.
The next step is to add the letters indicating significant differences in the means comparison (Tukey’s test) using the geom_text()
function. The labels are in the column Tukey of the dt table.
I also have used the function ylim()
to increase the limits of the vertical axis and avoid cutting the letters.
Additionally, I will transfer the legend to inside the right plot by using theme(legend.position = c(0.58, 0.75))
.
# barplot
ggplot(dt, aes(x = factor(conc), y = uptake_mean, fill = Treatment)) +
geom_bar(stat = "identity", position = "dodge") +
geom_errorbar(aes(ymax = uptake_mean + sd, ymin = uptake_mean - sd),
position = position_dodge(0.9), width = 0.25, color = "Gray25") +
xlab(expression(CO[2]~Concentration~'('~mL~L^-1~')')) +
ylab(expression(CO[2]~Uptake~'('~µmol~m^2~s^-1~')')) +
theme_few() +
theme(legend.position = c(0.58, 0.8)) +
scale_fill_brewer(palette = "Greens") +
facet_grid(.~Type, labeller = label_both) +
geom_text(aes(label=Tukey, y = uptake_mean + sd + 2), size = 3, color = "Gray25",
show.legend = FALSE,
position = position_dodge(0.9)) +
ylim(0,50)
The resulting plot can be used as it is.
The information is clear and readable even when printed in gray-scale.
However, it is usual to use letters to identify multiple plots, instead of the titles at the top of them. We can easily tag the ggplot sub-plots using the tag_facet()
function from the egg
package.
To do it, we are going to store the ggplot in an object p
and this object is the argument of the tag_facet()
function.
We need also relocate the legend a bit.
# barplot
<- ggplot(dt, aes(x = factor(conc), y = uptake_mean, fill = Treatment)) +
p geom_bar(stat = "identity", position = "dodge") +
geom_errorbar(aes(ymax = uptake_mean + sd, ymin = uptake_mean - sd),
position = position_dodge(0.9), width = 0.25, color = "Gray25") +
xlab(expression(CO[2]~Concentration~'('~mL~L^-1~')')) +
ylab(expression(CO[2]~Uptake~'('~µmol~m^2~s^-1~')')) +
theme_few() +
theme(legend.position = c(0.58, 0.76)) +
scale_fill_brewer(palette = "Greens") +
facet_grid(.~Type, labeller = label_both) +
geom_text(aes(label=Tukey, y = uptake_mean + sd + 2), size = 3, color = "Gray25",
show.legend = FALSE,
position = position_dodge(0.9)) +
ylim(0,50)
tag_facet(p)
This plot now has one issue that bothers me: just looking at it we cannot identify the plant type; the information will be in the written in the figure caption. To make it readily understandable without the need of looking into the caption, we can add the information on the type to the tag_facet()
function. And we can also remove the legend title to avoid an excess of information o a plot that is already full of it.
theme(legend.title = element_blank())
will remove the legend title;theme(legend.position = c(0.58, 0.80))
readjusts the legend position;fontface = 1
will change the font from bold to plain;tag_pool = c("(a) Quebec", "(b) Mississipi")
customises the tag labels;scale_fill_manual()
to customise the final colours since the current ones seem a little too pale.
# barplot
<- ggplot(dt, aes(x = factor(conc), y = uptake_mean, fill = Treatment)) +
p geom_bar(stat = "identity", position = "dodge") +
geom_errorbar(aes(ymax = uptake_mean + sd, ymin = uptake_mean - sd),
position = position_dodge(0.9), width = 0.25, color = "Gray25") +
xlab(expression(CO[2]~Concentration~'('~mL~L^-1~')')) +
ylab(expression(CO[2]~Uptake~'('~µmol~m^2~s^-1~')')) +
theme_few() +
theme(legend.position = c(0.58, 0.80)) +
theme(legend.title = element_blank()) +
scale_fill_manual(values = c("#C1D5A5", "#84A17C")) +
facet_grid(.~Type, labeller = label_both) +
geom_text(aes(label=Tukey, y = uptake_mean + sd + 2), size = 3, color = "Gray25",
show.legend = FALSE,
position = position_dodge(0.9)) +
ylim(0,50)
tag_facet(p, fontface = 1, tag_pool = c("(a) Quebec",
"(b) Mississipi"),
open = NULL, close = NULL, hjust = -0.05)
Now we have it ready to be used!!!
Saving the plot
The figures with two rows of plots are to be saved as 9 x 6 inches and the ones with only one row of plots are to be saved as 9 x 4 inches.
# save as a png file
ggsave("CO2_barplot.png", width = 9, height = 3.5, dpi = 1000)
Potvin, C., Lechowicz, M. J. and Tardif, S. (1990) “The statistical analysis of ecophysiological response curves obtained from experiments involving repeated measures”, Ecology, 71, 1389–1400.↩︎
9 Responses
Leave a Reply
You must be logged in to post a comment.
Dear Rosane Rech,
Thank you so much for your nice cooperation and sharing your nice R-script and console output. It has tremendous impact of my understanding. I am highly grateful to you.
I am a faculty member from Bangabandhu Sheikh Mujibur Rahman Agricultural University, Bangladesh. I have keen interest to learn R-programming for data analysis in the Biological Sciences. I do request your consistent support for my better understanding in future. Best Regards-Jahidul Hassan
Amazing!!! Thank you for such a nice website!
Dear Rosane Rech,
Thank you for this great tutorial and the elegant plot that you made. As someone that is learning R and biostatistics, this is useful.
I have a question regarding the analysis that sometimes for me it is confusing. When we made the ANOVA with two or three factors and the interaction is not significant, we can still make the Tukey’s test and report the comparison between the three factors (following this example will be “factor(conc):Type: Treatment”)
Best regards,
Omar B.
For comparing all the individual treatments among them by Tukey’s test, it is necessary to run the complete model, with all factors and interactions.
It’s a complete tutorial package for factorial experiment. Indeed, It’s a great work. Really appreciate your effort from my heart.
Thanks! 🙂
Dear Rosane Rech Madam,
Thank you for your exclusive R tutorials.
I have an issue regarding the two way anova and post hoc test eg. Tukes HSD. When I put the code
multcompLetters4 on my object(anova) and a pairwise comparison result(Tukeys HSD), I got an error like this:
Error in comp[[i]] : subscript out of bounds
If you kindly help me to get out of this, it will be so helpful. Thanks in advance.
Hi Rosane,
Your work is so clearly laid out and easy to follow! Wonderful! I’m working to see if I can easily swap out sd for se. Do you know that off the top of your head?
Hi Dear Rosane Rech
Thank you for your comprehensive R tutorials for two or more than two variables
When I added this “cld <- multcompLetters2(anova, Tukey)"
Can't understand what this "cld" refere to????
How would I change according to my data set????
Got stuck here