Two-Way ANOVA in R – Step-by-Step Tutorial
Two-Way ANOVA in R – Step-by-Step Tutorial
ANOVA, Tukey’s test, summarised table, compact letter display, barplot, scatterplot
Rosane Rech
- Loading the appropriate libraries
- Loading and checking the data
- Creating a simple plot for data visualisation
- Analysis of variance for two factors – Two-Way ANOVA
- Creating a table with factors, means and standard deviation
- Comparing means by Tukey’s test
- Compact letter display to indicate significant differences
- Plots for the final presentation of the results.
- Related video tutorials
In this tutorial, we are going to analyse the results of a two-factor factorial design using analysis of variance (ANOVA).
You are going to learn how to:
- create a simple plot for the data visualisation
- perform analysis of variance and Tukey’s test
- obtain the compact letter display (cld) to indicate significant differences
- build a table with the summarised data: mean, standard deviation and cld
At the end of the tutorial you will also find examples of barplots and scatterplots for the final presentation of the results.
The data1 presents the results of an experiment conducted to study the influence of the operating temperature (100˚C, 125˚C and 150˚C) and three faceplate glass types (A, B and C) in the light output of an oscilloscope tube.
To reproduce the results, you can download the file GTL.csv.
Loading the appropriate libraries
We are going to start by loading the appropriate libraries, the readr
to load the data from a csv file, the ggplot2
for the plots, the dplyr
for building a table with the summarised data (mean and standard deviation) and the multcompView
to use the compact letter display to indicate significant differences in Tukey’s test.
# loading the appropriate libraries
library(readr)
library(ggplot2)
library(multcompView)
library(dplyr)
Loading and checking the data
The first step of the analysis is to load the data file. We will use the head
function to see the first rows and the str
function for the file structure.
# loading and checking the data
<- read_csv("GTL.csv")
GTL head(GTL)
## # A tibble: 6 x 3
## Glass Temp Light
## <chr> <dbl> <dbl>
## 1 A 100 580
## 2 A 100 568
## 3 A 100 570
## 4 B 100 550
## 5 B 100 530
## 6 B 100 579
str(GTL)
## spec_tbl_df [27 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Glass: chr [1:27] "A" "A" "A" "B" ...
## $ Temp : num [1:27] 100 100 100 100 100 100 100 100 100 125 ...
## $ Light: num [1:27] 580 568 570 550 530 579 546 575 599 1090 ...
## - attr(*, "spec")=
## .. cols(
## .. Glass = col_character(),
## .. Temp = col_double(),
## .. Light = col_double()
## .. )
We can see that we have 27 observations of three variables (columns): Glass, Temp (factors) and Light (response variable). Glass is defined as a character, and Temp and Light are numeric variables.
Creating a simple plot for data visualisation
It is always a good idea to visualise the behaviour of the experimental data before diving into the analysis. We will use the function qplot
from the ggplot2
library.
# building a simple plot for data visualisation
qplot(x = Temp, y = Light, geom = "point", data = GTL) +
facet_grid(.~Glass, labeller = label_both)
We can see that the light output increases with the temperature for faceplate glass types A and B; however, faceplate glass type C has a maximum at 125 ºC. The plots indicate that we will probably have at least the temperature and the interaction between temperature and glass as significant.
Analysis of variance for two factors – Two-Way ANOVA
For the ANOVA table, we need to have the independent variables in the data file defined as factors. For Glass, which is originally defined as a character, we will convert it into a factor. For temperature, which is a numeric variable, we will duplicate it, creating a new variable, Temp_Factor, defined as a factor. We are not going to convert it because it is possible we are going to need it as numeric for further analysis or plots.
# creating a variable as factor for the ANOVA
$Glass <- as.factor(GTL$Glass)
GTL$Temp_Factor <- as.factor(GTL$Temp)
GTLstr(GTL)
## spec_tbl_df [27 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Glass : Factor w/ 3 levels "A","B","C": 1 1 1 2 2 2 3 3 3 1 ...
## $ Temp : num [1:27] 100 100 100 100 100 100 100 100 100 125 ...
## $ Light : num [1:27] 580 568 570 550 530 579 546 575 599 1090 ...
## $ Temp_Factor: Factor w/ 3 levels "100","125","150": 1 1 1 1 1 1 1 1 1 2 ...
## - attr(*, "spec")=
## .. cols(
## .. Glass = col_character(),
## .. Temp = col_double(),
## .. Light = col_double()
## .. )
Now we can see that the data file has a new column, Temp_Factor, defined as a factor, and we kept the original Temp variable as numeric. The next step is to perform the analysis of variance using the aov()
function.
# analysis of variance
<- aov(Light ~ Glass*Temp_Factor, data = GTL)
anova summary(anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## Glass 2 150865 75432 206.4 3.89e-13 ***
## Temp_Factor 2 1970335 985167 2695.3 < 2e-16 ***
## Glass:Temp_Factor 4 290552 72638 198.7 1.25e-14 ***
## Residuals 18 6579 366
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Considering the level of significance 0.05, the ANOVA table shows that are significant the temperature, the faceplate glass type, and the interaction between them. This result is expected considering the previous visualisation of the data performed in the previous section.
Creating a table with factors, means and standard deviation
Now we are going to build a table with the mean and the standard deviation of the light output for each treatment (combination of faceplate glass and temperature). When creating de table, we are going to arrange the means in a decreasing order, which will be necessary for adding the superscript letters from Tukey’s test. This code uses the dplyr
package.
# table with factors, means and standard deviation
<- group_by(GTL, Glass, Temp) %>%
data_summary summarise(mean=mean(Light), sd=sd(Light)) %>%
arrange(desc(mean))
print(data_summary)
## # A tibble: 9 x 4
## # Groups: Glass [3]
## Glass Temp mean sd
## <fct> <dbl> <dbl> <dbl>
## 1 A 150 1386 6
## 2 B 150 1313 14.5
## 3 A 125 1087. 2.52
## 4 C 125 1055. 10.6
## 5 B 125 1035 35
## 6 C 150 887. 18.6
## 7 C 100 573. 26.5
## 8 A 100 573. 6.43
## 9 B 100 553 24.6
Comparing means by Tukey’s test
The means comparison by Tukey’s test can be run on the object resulting from the analysis of variance. The result (below) is an extensive table with all pairwise comparisons and the \(p\)-value for each one of them. However, this data can be tricky to interpret, and it is usual to use letters to indicate significant differences among the means.
# Tukey's test
<- TukeyHSD(anova)
tukey print(tukey)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Light ~ Glass * Temp_Factor, data = GTL)
##
## $Glass
## diff lwr upr p adj
## B-A -48.33333 -71.33487 -25.3318 0.0001206
## C-A -177.11111 -200.11265 -154.1096 0.0000000
## C-B -128.77778 -151.77932 -105.7762 0.0000000
##
## $Temp_Factor
## diff lwr upr p adj
## 125-100 492.6667 469.6651 515.6682 0
## 150-100 628.8889 605.8874 651.8904 0
## 150-125 136.2222 113.2207 159.2238 0
##
## $`Glass:Temp_Factor`
## diff lwr upr p adj
## B:100-A:100 -19.6666667 -74.36273 35.029396 0.9307049
## C:100-A:100 0.6666667 -54.02940 55.362729 1.0000000
## A:125-A:100 514.6666667 459.97060 569.362729 0.0000000
## B:125-A:100 462.3333333 407.63727 517.029396 0.0000000
## C:125-A:100 482.0000000 427.30394 536.696063 0.0000000
## A:150-A:100 813.3333333 758.63727 868.029396 0.0000000
## B:150-A:100 740.3333333 685.63727 795.029396 0.0000000
## C:150-A:100 314.0000000 259.30394 368.696063 0.0000000
## C:100-B:100 20.3333333 -34.36273 75.029396 0.9179607
## A:125-B:100 534.3333333 479.63727 589.029396 0.0000000
## B:125-B:100 482.0000000 427.30394 536.696063 0.0000000
## C:125-B:100 501.6666667 446.97060 556.362729 0.0000000
## A:150-B:100 833.0000000 778.30394 887.696063 0.0000000
## B:150-B:100 760.0000000 705.30394 814.696063 0.0000000
## C:150-B:100 333.6666667 278.97060 388.362729 0.0000000
## A:125-C:100 514.0000000 459.30394 568.696063 0.0000000
## B:125-C:100 461.6666667 406.97060 516.362729 0.0000000
## C:125-C:100 481.3333333 426.63727 536.029396 0.0000000
## A:150-C:100 812.6666667 757.97060 867.362729 0.0000000
## B:150-C:100 739.6666667 684.97060 794.362729 0.0000000
## C:150-C:100 313.3333333 258.63727 368.029396 0.0000000
## B:125-A:125 -52.3333333 -107.02940 2.362729 0.0670029
## C:125-A:125 -32.6666667 -87.36273 22.029396 0.5065610
## A:150-A:125 298.6666667 243.97060 353.362729 0.0000000
## B:150-A:125 225.6666667 170.97060 280.362729 0.0000000
## C:150-A:125 -200.6666667 -255.36273 -145.970604 0.0000000
## C:125-B:125 19.6666667 -35.02940 74.362729 0.9307049
## A:150-B:125 351.0000000 296.30394 405.696063 0.0000000
## B:150-B:125 278.0000000 223.30394 332.696063 0.0000000
## C:150-B:125 -148.3333333 -203.02940 -93.637271 0.0000006
## A:150-C:125 331.3333333 276.63727 386.029396 0.0000000
## B:150-C:125 258.3333333 203.63727 313.029396 0.0000000
## C:150-C:125 -168.0000000 -222.69606 -113.303937 0.0000001
## B:150-A:150 -73.0000000 -127.69606 -18.303937 0.0045830
## C:150-A:150 -499.3333333 -554.02940 -444.637271 0.0000000
## C:150-B:150 -426.3333333 -481.02940 -371.637271 0.0000000
Compact letter display to indicate significant differences
The use of letters to indicate significant differences in pairwise comparisons, also called compact letter display, and can simplify the visualisation and discussion of significant differences among means. We are going to use the multcompLetters4()
function from the multcompView
package.
# creating the compact letter display
<- multcompLetters4(anova, tukey)
tukey.cld print(tukey.cld)
## $Glass
## A B C
## "a" "b" "c"
##
## $Temp_Factor
## 150 125 100
## "a" "b" "c"
##
## $`Glass:Temp_Factor`
## A:150 B:150 A:125 C:125 B:125 C:150 C:100 A:100 B:100
## "a" "b" "c" "c" "c" "d" "e" "e" "e"
The code output shows the compact letter display for the faceplate glass, for the temperature, and for each treatment. This last one is the one we are going to use and the next code extract and adds the results of the compact letter display for each treatment to the table with the data summary.
# adding the compact letter display to the table with means and sd
<- as.data.frame.list(tukey.cld$`Glass:Temp_Factor`)
cld $Tukey <- cld$Letters
data_summaryprint(data_summary)
## # A tibble: 9 x 5
## # Groups: Glass [3]
## Glass Temp mean sd Tukey
## <fct> <dbl> <dbl> <dbl> <chr>
## 1 A 150 1386 6 a
## 2 B 150 1313 14.5 b
## 3 A 125 1087. 2.52 c
## 4 C 125 1055. 10.6 c
## 5 B 125 1035 35 c
## 6 C 150 887. 18.6 d
## 7 C 100 573. 26.5 e
## 8 A 100 573. 6.43 e
## 9 B 100 553 24.6 e
The table can be exported as a csv file using the write.csv()
function.
write.csv("GTL_summary.csv")
If the idea is to present the final data as a table (as shown below), the analysis can end here. However, it is always interesting to present the results in a visual form. So the next sections show codes to build bar and line plots suitable for the presentation of this particular result.
Temperature
|
|||
---|---|---|---|
Glass Type | 100 ºC | 125 ºC | 150 ºC |
A | 573 ± 06 e | 1087 ± 3 c | 1386 ± 06 a |
B | 553 ± 25 e | 1035 ± 35 c | 1313 ± 15 b |
C | 573 ± 27 e | 1055 ± 11 c | 887 ± 19 d |
Plots for the final presentation of the results.
This section shows examples of bar and scatter plots suitable for presentations, posters and written reports. The plots show the mean, the standard deviation and the compact letter display for each treatment.
Barplots
The coloured barplot presents the faceplate glass labels at the base of the columns, making it possible to interpret the results even when printed in grey-scale or by a colourblind individual. In this case of the grey-scale barplot, the glass labels at the base of the columns are not necessary since the shades of grey are clear enough to identify the results.
A step-by-step tutorial on how to build these plots can be found on building barplots for two factors in R.
Scatterplot
Another way of conveying the final results for this particular example is by using scatterplots, as one of the variables, the temperature, is a continuous numeric variable. The different faceplate glass types can be identified by the marker shape and line type in both plots, and also by the colour in the coloured plot. A slight transparency was applied on the markers for the errorbars to be visible.
A step-by-step tutorial on how to build these plots can be found on building scatterplots for two factors in R.
Data source: Design and analysis of experiments / Douglas C. Montgomery. — Eighth edition↩︎
13 Responses
Leave a Reply
You must be logged in to post a comment.
Hi Rosane, your explanation was very helpful.
I have a question. I want to visualize my data but I couldn’t get through. I want to compare a character and a variable in a one factor experiment.
I want to compare anther extrusion (numeric) in wheat genotypes (characters). How do I solve this issue.
Thank you
Agetha
Hi Agetha, you can always use categorical factors in your horizontal axis.
Hi Rosane, thank you so much for it. It was really helpful. Can you kindly tell me how to make the table you attached here with the mean+sd+letters all in a single column. Does it have a code ?
Thanks in advance..
Unfortunatly I don’t have a code for it.
I have another question My tukey results showing letters like a, a, ab, bc, cd. Is there any way to stop repeating the letters and just to keep it as single letter like, a, b, c, d?
Thank you.
Hi Silmoon, if those are the results of your Tukey’s test, you cannot manually change them.
Hi Rosane, I hope you are very well.
I have a question. Why when I run the tukey with my data: tukey.cld <- multcompLetters4(anova, tukey), There is a erro and apparece one message like this : Error in value[Lvls, Lvls] : subscript out of bounds.
. How do I solve this issue?
Thanks for your answer
Victor Raul
Hello Victor,
I am sorry, but I’ve never had this error and without your code and data I cannot solve it.
Hi Victor,
I also ran into the same problem recently and a quick online search led me to the following site:
https://www.programmingr.com/r-error-messages/subscript-out-of-bounds-r/
I do not know if this could help you or you have already found a solution.
Best,
Dimitra
I do not understand. Ihave tried but still cannot fix that. Do you have another solution?
Dear Rosane,
Thank you for your very helpful tutorials in R!!!!
I am performing ANOVA2 and Tukey test in R using your code as a compass. When I get to the point where asking to create the compact letters display, I am receiving the following error:
Error in x[, “p adj”] : object of type ‘closure’ is not subsettable
I did some extensive search in Internet about this issue, but I really could not fix it.
The code I am using has many similarities to yours. It is only modified in order to perform ANOVA2.
My goal is to achieve the cld after ANOVA2 and Tukey test and then, depict the significance difference letters on the box-plot.
This is the code I have been using:
# Setting the working directory #
setwd(“C:/Users/dp91nyfu/Documents”)
# Loading the data #
my_data <- read.table("SAMT_gene_expression.csv",
header=T, sep=",")
# Setting my variables #
Micr=my_data$Microbe #int#
Herb=my_data$Herbivory #int#
Expr=my_data$SAMT #num#
# Convert character vector to factor vector #
my_data$Treatment <- as.factor(my_data$Treatment)
str(my_data)
# I need to log-transform my data to normally distribute them #
logExpr <- log(Expr, 10)
logExpr
# Load the needed packages and set theme
library(ggplot2); library(dplyr)
# Plotting the basic ggplot2 box-plot #
boxplot(my_data$SAMT ~ my_data$Treatment , col=terrain.colors(6))
# Perform two-way ANOVA #
anova <- aov(logExpr~ Micr*Herb, data=my_data)
summary(anova)
# Creating table with factors, means and standard deviation
data_summary %
summarise(mean=mean(SAMT), sd=sd(SAMT)) %>%
arrange(desc(mean))
print(data_summary)
# Tukey test to study each pair of treatment ERROR!!!#
TukeyHSD(aov(logExpr~Treatment, data=my_data))
# I need to group the treatments that are not
# different each other together #
library(multcompView)
generate_label_df <- function(TUKEY, variable){
# Extract labels and factor levels from Tukey post-hoc
Tukey.levels <- TUKEY[[variable]][,4]
Tukey.labels <- data.frame(multcompLetters(Tukey.levels)['Letters'])
#I need to put the labels in the same order as in the boxplot :
Tukey.labels$Treatment=rownames(Tukey.labels)
Tukey.labels=Tukey.labels[order(Tukey.labels$Treatment) , ]
return(Tukey.labels)
}
# Creating the compact letter display
tukey.cld <- multcompLetters4(anova,TukeyHSD)
print(tukey.cld)
I would be grateful if you could give me a hint!
Best wishes,
Dimitra
You have made some good points there. I looked on the net to learn more about the issue and found most individuals will go along with your views on this site.|
Hi Rosane Rech, thank you so much for your valuable posting.
I tried to run the script for analysis two-ways anova using R but I got the problem like this
“Error in comp[[i]] : subscript out of bounds”
I tried to find the solution from internet but I have not got it yet.
here is my data https://drive.google.com/file/d/1m-TNimLqnxXSsJniMjoB1lwKbU1uHIOE/view
and my script
library(datasets)
library(ggplot2)
library(multcompView)
library(dplyr)
library(datasets)
library(tidyverse)
library(multcomp)
Data = read.csv(“data.csv”, h= TRUE)
qplot(x = species, y = Ratio, geom = “point”, data = Data) +
facet_grid(.~Strain)
# creating a variable as factor for the ANOVA
Data$Strain <- as.factor(Data$Strain)
Data$species <- as.factor(Data$species)
str(Data)
# analysis of variance
anova <- aov(Ratio ~ Strain*factor(species), data = Data)
summary(anova)
# table with factors, means and standard deviation
data_summary %
summarise(mean=mean(Ratio), sd=sd(Ratio)) %>%
arrange(desc(mean))
print(data_summary)
# Tukey’s test
tukey <- TukeyHSD(anova)
print(tukey)
# compact letter display
coba = multcompLetters4(anova, tukey)
print(coba)
# creating the compact letter display
tukey.cld <- multcompLetters4(anova, tukey)
print(tukey.cld)
I want to get compact letter display