contact@statdoe.com

Tutorials

Two-Way ANOVA in R – Step-by-Step Tutorial

In this tutorial, we are going to analyse the results of a two-factor factorial design using analysis of variance (ANOVA).

You are going to learn how to:

  • create a simple plot for the data visualisation
  • perform analysis of variance and Tukey’s test
  • obtain the compact letter display (cld) to indicate significant differences
  • build a table with the summarised data: mean, standard deviation and cld

At the end of the tutorial you will also find examples of barplots and scatterplots for the final presentation of the results.

The data1 presents the results of an experiment conducted to study the influence of the operating temperature (100˚C, 125˚C and 150˚C) and three faceplate glass types (A, B and C) in the light output of an oscilloscope tube.

To reproduce the results, you can download the file GTL.csv.


Loading the appropriate libraries

We are going to start by loading the appropriate libraries, the readr to load the data from a csv file, the ggplot2 for the plots, the dplyr for building a table with the summarised data (mean and standard deviation) and the multcompView to use the compact letter display to indicate significant differences in Tukey’s test.

# loading the appropriate libraries
library(readr)
library(ggplot2)
library(multcompView)
library(dplyr)


Loading and checking the data

The first step of the analysis is to load the data file. We will use the head function to see the first rows and the str function for the file structure.

# loading and checking the data
GTL <- read_csv("GTL.csv")
head(GTL)
## # A tibble: 6 x 3
##   Glass  Temp Light
##   <chr> <dbl> <dbl>
## 1 A       100   580
## 2 A       100   568
## 3 A       100   570
## 4 B       100   550
## 5 B       100   530
## 6 B       100   579
str(GTL)
## spec_tbl_df [27 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Glass: chr [1:27] "A" "A" "A" "B" ...
##  $ Temp : num [1:27] 100 100 100 100 100 100 100 100 100 125 ...
##  $ Light: num [1:27] 580 568 570 550 530 579 546 575 599 1090 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Glass = col_character(),
##   ..   Temp = col_double(),
##   ..   Light = col_double()
##   .. )

We can see that we have 27 observations of three variables (columns): Glass, Temp (factors) and Light (response variable). Glass is defined as a character, and Temp and Light are numeric variables.


Creating a simple plot for data visualisation

It is always a good idea to visualise the behaviour of the experimental data before diving into the analysis. We will use the function qplot from the ggplot2 library.

# building a simple plot for data visualisation
qplot(x = Temp, y = Light, geom = "point", data = GTL) +
  facet_grid(.~Glass, labeller = label_both)

We can see that the light output increases with the temperature for faceplate glass types A and B; however, faceplate glass type C has a maximum at 125 ºC. The plots indicate that we will probably have at least the temperature and the interaction between temperature and glass as significant.


Analysis of variance for two factors – Two-Way ANOVA

For the ANOVA table, we need to have the independent variables in the data file defined as factors. For Glass, which is originally defined as a character, we will convert it into a factor. For temperature, which is a numeric variable, we will duplicate it, creating a new variable, Temp_Factor, defined as a factor. We are not going to convert it because it is possible we are going to need it as numeric for further analysis or plots.

# creating a variable as factor for the ANOVA
GTL$Glass <- as.factor(GTL$Glass)
GTL$Temp_Factor <- as.factor(GTL$Temp)
str(GTL)
## spec_tbl_df [27 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Glass      : Factor w/ 3 levels "A","B","C": 1 1 1 2 2 2 3 3 3 1 ...
##  $ Temp       : num [1:27] 100 100 100 100 100 100 100 100 100 125 ...
##  $ Light      : num [1:27] 580 568 570 550 530 579 546 575 599 1090 ...
##  $ Temp_Factor: Factor w/ 3 levels "100","125","150": 1 1 1 1 1 1 1 1 1 2 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Glass = col_character(),
##   ..   Temp = col_double(),
##   ..   Light = col_double()
##   .. )

Now we can see that the data file has a new column, Temp_Factor, defined as a factor, and we kept the original Temp variable as numeric. The next step is to perform the analysis of variance using the aov() function.

# analysis of variance
anova <- aov(Light ~ Glass*Temp_Factor, data = GTL)
summary(anova)
##                   Df  Sum Sq Mean Sq F value   Pr(>F)    
## Glass              2  150865   75432   206.4 3.89e-13 ***
## Temp_Factor        2 1970335  985167  2695.3  < 2e-16 ***
## Glass:Temp_Factor  4  290552   72638   198.7 1.25e-14 ***
## Residuals         18    6579     366                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Considering the level of significance 0.05, the ANOVA table shows that are significant the temperature, the faceplate glass type, and the interaction between them. This result is expected considering the previous visualisation of the data performed in the previous section.


Creating a table with factors, means and standard deviation

Now we are going to build a table with the mean and the standard deviation of the light output for each treatment (combination of faceplate glass and temperature). When creating de table, we are going to arrange the means in a decreasing order, which will be necessary for adding the superscript letters from Tukey’s test. This code uses the dplyr package.

# table with factors, means and standard deviation
data_summary <- group_by(GTL, Glass, Temp) %>%
  summarise(mean=mean(Light), sd=sd(Light)) %>%
  arrange(desc(mean))
print(data_summary)
## # A tibble: 9 x 4
## # Groups:   Glass [3]
##   Glass  Temp  mean    sd
##   <fct> <dbl> <dbl> <dbl>
## 1 A       150 1386   6   
## 2 B       150 1313  14.5 
## 3 A       125 1087.  2.52
## 4 C       125 1055. 10.6 
## 5 B       125 1035  35   
## 6 C       150  887. 18.6 
## 7 C       100  573. 26.5 
## 8 A       100  573.  6.43
## 9 B       100  553  24.6


Comparing means by Tukey’s test

The means comparison by Tukey’s test can be run on the object resulting from the analysis of variance. The result (below) is an extensive table with all pairwise comparisons and the \(p\)-value for each one of them. However, this data can be tricky to interpret, and it is usual to use letters to indicate significant differences among the means.

# Tukey's test
tukey <- TukeyHSD(anova)
print(tukey)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Light ~ Glass * Temp_Factor, data = GTL)
## 
## $Glass
##           diff        lwr       upr     p adj
## B-A  -48.33333  -71.33487  -25.3318 0.0001206
## C-A -177.11111 -200.11265 -154.1096 0.0000000
## C-B -128.77778 -151.77932 -105.7762 0.0000000
## 
## $Temp_Factor
##             diff      lwr      upr p adj
## 125-100 492.6667 469.6651 515.6682     0
## 150-100 628.8889 605.8874 651.8904     0
## 150-125 136.2222 113.2207 159.2238     0
## 
## $`Glass:Temp_Factor`
##                     diff        lwr         upr     p adj
## B:100-A:100  -19.6666667  -74.36273   35.029396 0.9307049
## C:100-A:100    0.6666667  -54.02940   55.362729 1.0000000
## A:125-A:100  514.6666667  459.97060  569.362729 0.0000000
## B:125-A:100  462.3333333  407.63727  517.029396 0.0000000
## C:125-A:100  482.0000000  427.30394  536.696063 0.0000000
## A:150-A:100  813.3333333  758.63727  868.029396 0.0000000
## B:150-A:100  740.3333333  685.63727  795.029396 0.0000000
## C:150-A:100  314.0000000  259.30394  368.696063 0.0000000
## C:100-B:100   20.3333333  -34.36273   75.029396 0.9179607
## A:125-B:100  534.3333333  479.63727  589.029396 0.0000000
## B:125-B:100  482.0000000  427.30394  536.696063 0.0000000
## C:125-B:100  501.6666667  446.97060  556.362729 0.0000000
## A:150-B:100  833.0000000  778.30394  887.696063 0.0000000
## B:150-B:100  760.0000000  705.30394  814.696063 0.0000000
## C:150-B:100  333.6666667  278.97060  388.362729 0.0000000
## A:125-C:100  514.0000000  459.30394  568.696063 0.0000000
## B:125-C:100  461.6666667  406.97060  516.362729 0.0000000
## C:125-C:100  481.3333333  426.63727  536.029396 0.0000000
## A:150-C:100  812.6666667  757.97060  867.362729 0.0000000
## B:150-C:100  739.6666667  684.97060  794.362729 0.0000000
## C:150-C:100  313.3333333  258.63727  368.029396 0.0000000
## B:125-A:125  -52.3333333 -107.02940    2.362729 0.0670029
## C:125-A:125  -32.6666667  -87.36273   22.029396 0.5065610
## A:150-A:125  298.6666667  243.97060  353.362729 0.0000000
## B:150-A:125  225.6666667  170.97060  280.362729 0.0000000
## C:150-A:125 -200.6666667 -255.36273 -145.970604 0.0000000
## C:125-B:125   19.6666667  -35.02940   74.362729 0.9307049
## A:150-B:125  351.0000000  296.30394  405.696063 0.0000000
## B:150-B:125  278.0000000  223.30394  332.696063 0.0000000
## C:150-B:125 -148.3333333 -203.02940  -93.637271 0.0000006
## A:150-C:125  331.3333333  276.63727  386.029396 0.0000000
## B:150-C:125  258.3333333  203.63727  313.029396 0.0000000
## C:150-C:125 -168.0000000 -222.69606 -113.303937 0.0000001
## B:150-A:150  -73.0000000 -127.69606  -18.303937 0.0045830
## C:150-A:150 -499.3333333 -554.02940 -444.637271 0.0000000
## C:150-B:150 -426.3333333 -481.02940 -371.637271 0.0000000


Compact letter display to indicate significant differences

The use of letters to indicate significant differences in pairwise comparisons, also called compact letter display, and can simplify the visualisation and discussion of significant differences among means. We are going to use the multcompLetters4() function from the multcompView package.

# creating the compact letter display
tukey.cld <- multcompLetters4(anova, tukey)
print(tukey.cld)
## $Glass
##   A   B   C 
## "a" "b" "c" 
## 
## $Temp_Factor
## 150 125 100 
## "a" "b" "c" 
## 
## $`Glass:Temp_Factor`
## A:150 B:150 A:125 C:125 B:125 C:150 C:100 A:100 B:100 
##   "a"   "b"   "c"   "c"   "c"   "d"   "e"   "e"   "e"

The code output shows the compact letter display for the faceplate glass, for the temperature, and for each treatment. This last one is the one we are going to use and the next code extract and adds the results of the compact letter display for each treatment to the table with the data summary.

# adding the compact letter display to the table with means and sd
cld <- as.data.frame.list(tukey.cld$`Glass:Temp_Factor`)
data_summary$Tukey <- cld$Letters
print(data_summary)
## # A tibble: 9 x 5
## # Groups:   Glass [3]
##   Glass  Temp  mean    sd Tukey
##   <fct> <dbl> <dbl> <dbl> <chr>
## 1 A       150 1386   6    a    
## 2 B       150 1313  14.5  b    
## 3 A       125 1087.  2.52 c    
## 4 C       125 1055. 10.6  c    
## 5 B       125 1035  35    c    
## 6 C       150  887. 18.6  d    
## 7 C       100  573. 26.5  e    
## 8 A       100  573.  6.43 e    
## 9 B       100  553  24.6  e

The table can be exported as a csv file using the write.csv() function.

write.csv("GTL_summary.csv")

If the idea is to present the final data as a table (as shown below), the analysis can end here. However, it is always interesting to present the results in a visual form. So the next sections show codes to build bar and line plots suitable for the presentation of this particular result.

Light output of an oscilloscope tube with the temperature and faceplate glass type.
Temperature
Glass Type 100 ºC 125 ºC 150 ºC
A 573 ± 06 e 1087 ± 3 c 1386 ± 06 a
B 553 ± 25 e 1035 ± 35 c 1313 ± 15 b
C 573 ± 27 e 1055 ± 11 c 887 ± 19 d


Plots for the final presentation of the results.

This section shows examples of bar and scatter plots suitable for presentations, posters and written reports. The plots show the mean, the standard deviation and the compact letter display for each treatment.

Barplots

The coloured barplot presents the faceplate glass labels at the base of the columns, making it possible to interpret the results even when printed in grey-scale or by a colourblind individual. In this case of the grey-scale barplot, the glass labels at the base of the columns are not necessary since the shades of grey are clear enough to identify the results.

A step-by-step tutorial on how to build these plots can be found on building barplots for two factors in R.



Scatterplot

Another way of conveying the final results for this particular example is by using scatterplots, as one of the variables, the temperature, is a continuous numeric variable. The different faceplate glass types can be identified by the marker shape and line type in both plots, and also by the colour in the coloured plot. A slight transparency was applied on the markers for the errorbars to be visible.

A step-by-step tutorial on how to build these plots can be found on building scatterplots for two factors in R.




  1. Data source: Design and analysis of experiments / Douglas C. Montgomery. — Eighth edition↩︎

13 Responses

  1. Agetha

    Hi Rosane, your explanation was very helpful.
    I have a question. I want to visualize my data but I couldn’t get through. I want to compare a character and a variable in a one factor experiment.
    I want to compare anther extrusion (numeric) in wheat genotypes (characters). How do I solve this issue.
    Thank you
    Agetha

  2. Silmoon Islam

    Hi Rosane, thank you so much for it. It was really helpful. Can you kindly tell me how to make the table you attached here with the mean+sd+letters all in a single column. Does it have a code ?
    Thanks in advance..

  3. Silmoon Islam

    I have another question My tukey results showing letters like a, a, ab, bc, cd. Is there any way to stop repeating the letters and just to keep it as single letter like, a, b, c, d?

    Thank you.

  4. Victor Raul

    Hi Rosane, I hope you are very well.
    I have a question. Why when I run the tukey with my data: tukey.cld <- multcompLetters4(anova, tukey), There is a erro and apparece one message like this : Error in value[Lvls, Lvls] : subscript out of bounds.
    . How do I solve this issue?
    Thanks for your answer
    Victor Raul

  5. Dimitra Papantoniou

    Dear Rosane,
    Thank you for your very helpful tutorials in R!!!!

    I am performing ANOVA2 and Tukey test in R using your code as a compass. When I get to the point where asking to create the compact letters display, I am receiving the following error:

    Error in x[, “p adj”] : object of type ‘closure’ is not subsettable

    I did some extensive search in Internet about this issue, but I really could not fix it.

    The code I am using has many similarities to yours. It is only modified in order to perform ANOVA2.
    My goal is to achieve the cld after ANOVA2 and Tukey test and then, depict the significance difference letters on the box-plot.

    This is the code I have been using:

    # Setting the working directory #

    setwd(“C:/Users/dp91nyfu/Documents”)

    # Loading the data #

    my_data <- read.table("SAMT_gene_expression.csv",
    header=T, sep=",")

    # Setting my variables #
    Micr=my_data$Microbe #int#
    Herb=my_data$Herbivory #int#
    Expr=my_data$SAMT #num#

    # Convert character vector to factor vector #
    my_data$Treatment <- as.factor(my_data$Treatment)

    str(my_data)

    # I need to log-transform my data to normally distribute them #
    logExpr <- log(Expr, 10)
    logExpr

    # Load the needed packages and set theme

    library(ggplot2); library(dplyr)

    # Plotting the basic ggplot2 box-plot #

    boxplot(my_data$SAMT ~ my_data$Treatment , col=terrain.colors(6))

    # Perform two-way ANOVA #
    anova <- aov(logExpr~ Micr*Herb, data=my_data)
    summary(anova)

    # Creating table with factors, means and standard deviation
    data_summary %
    summarise(mean=mean(SAMT), sd=sd(SAMT)) %>%
    arrange(desc(mean))
    print(data_summary)

    # Tukey test to study each pair of treatment ERROR!!!#
    TukeyHSD(aov(logExpr~Treatment, data=my_data))

    # I need to group the treatments that are not
    # different each other together #

    library(multcompView)

    generate_label_df <- function(TUKEY, variable){

    # Extract labels and factor levels from Tukey post-hoc
    Tukey.levels <- TUKEY[[variable]][,4]
    Tukey.labels <- data.frame(multcompLetters(Tukey.levels)['Letters'])

    #I need to put the labels in the same order as in the boxplot :
    Tukey.labels$Treatment=rownames(Tukey.labels)
    Tukey.labels=Tukey.labels[order(Tukey.labels$Treatment) , ]
    return(Tukey.labels)
    }

    # Creating the compact letter display
    tukey.cld <- multcompLetters4(anova,TukeyHSD)
    print(tukey.cld)

    I would be grateful if you could give me a hint!

    Best wishes,
    Dimitra

  6. woony

    Hi Rosane Rech, thank you so much for your valuable posting.
    I tried to run the script for analysis two-ways anova using R but I got the problem like this

    “Error in comp[[i]] : subscript out of bounds”

    I tried to find the solution from internet but I have not got it yet.
    here is my data https://drive.google.com/file/d/1m-TNimLqnxXSsJniMjoB1lwKbU1uHIOE/view
    and my script

    library(datasets)
    library(ggplot2)
    library(multcompView)
    library(dplyr)
    library(datasets)
    library(tidyverse)
    library(multcomp)

    Data = read.csv(“data.csv”, h= TRUE)
    qplot(x = species, y = Ratio, geom = “point”, data = Data) +
    facet_grid(.~Strain)

    # creating a variable as factor for the ANOVA
    Data$Strain <- as.factor(Data$Strain)
    Data$species <- as.factor(Data$species)
    str(Data)

    # analysis of variance
    anova <- aov(Ratio ~ Strain*factor(species), data = Data)
    summary(anova)

    # table with factors, means and standard deviation
    data_summary %
    summarise(mean=mean(Ratio), sd=sd(Ratio)) %>%
    arrange(desc(mean))
    print(data_summary)

    # Tukey’s test
    tukey <- TukeyHSD(anova)
    print(tukey)

    # compact letter display
    coba = multcompLetters4(anova, tukey)
    print(coba)

    # creating the compact letter display
    tukey.cld <- multcompLetters4(anova, tukey)
    print(tukey.cld)

    I want to get compact letter display

Leave a Reply