Scatterplot for Two Factors in R


In this tutorial we are going to see how to build a high-quality scatterplot for two explanatory variables.

The data [1] come from an experiment conducted to study the influence of the operating temperature (100 ˚C, 125 ˚C and 150 ˚C) and the faceplate glass type (A, B and C) on the light output of an oscilloscope tube.

To build the scatterplots, we are going to use the summarised data, with the mean, the standard deviation and the letters indicating significant differences by Tukey’s test (compact letter display). You can download the csv file with the summarised data or you can follow the Two-Way ANOVA in R – Step-by-Step Tutorial to build it.

If you prefer a video tutorial, you can watch Publication-Quality Scatterplots for Two Factors with ggplot – Two-Way ANOVA with R – tutorial 4 on my YouTube channel.


Loading the appropriate libraries

We are going to start by loading the appropriate libraries: readr to load the data from a csv file and ggplot2 for the plots.

# loading the appropriate libraries
library(readr)
library(ggplot2)


Loading and checking the data

The first step of the analysis is to load and check the data file (GTL_summary.csv).

# loading and checking the data
data_summary <- read_csv("GTL_summary.csv")
print(data_summary)
## # A tibble: 9 x 5
##   Glass  Temp  mean    sd Tukey
##   <chr> <dbl> <dbl> <dbl> <chr>
## 1 A       150 1386   6    a    
## 2 B       150 1313  14.5  b    
## 3 A       125 1087.  2.52 c    
## 4 C       125 1055. 10.6  c    
## 5 B       125 1035  35    c    
## 6 C       150  887. 18.6  d    
## 7 C       100  573. 26.5  e    
## 8 A       100  573.  6.43 e    
## 9 B       100  553  24.6  e

We can see that we have nine observations (rows) and five columns: Glass, Temp (factors), mean (response variable), sd (standard deviation), and the last column shows the compact letter display from Tukey’s test. Glass and Tukey are defined as character (chr), and Temp, mean and sd are numeric variables (dbl).
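If the csv file is not at hand, the same table can be typed in directly. The sketch below rebuilds it from the printed output above with base R's data.frame() instead of readr. Keep in mind that the values are copied as printed, so the means are only as precise as the rounded figures shown there.

```r
# Rebuild the summarised data from the printed tibble above.
# Assumption: values are copied as printed, so the means are rounded.
data_summary <- data.frame(
  Glass = c("A", "B", "A", "C", "B", "C", "C", "A", "B"),
  Temp  = c(150, 150, 125, 125, 125, 150, 100, 100, 100),
  mean  = c(1386, 1313, 1087, 1055, 1035, 887, 573, 573, 553),
  sd    = c(6, 14.5, 2.52, 10.6, 35, 18.6, 26.5, 6.43, 24.6),
  Tukey = c("a", "b", "c", "c", "c", "d", "e", "e", "e")
)

# Structure check: column types and dimensions
str(data_summary)

# Each Glass x Temp combination should appear exactly once
table(data_summary$Glass, data_summary$Temp)
```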

The next sections show the step-by-step codes to build scatterplots suitable for the presentation of this particular result.


Basic scatterplot

We are going to use the function ggplot() to build the scatterplots. The first argument is the data set, data_summary, and the second argument is the aesthetics aes(), where we define the x and y variables, Temp and mean. However, if we run only this code, we will get a blank plot. We also need to define the geom, in this case geom_point() for the scatterplot.

# scatterplot
ggplot(data_summary, aes(Temp, mean)) + 
  geom_point()


Using colours and shapes to split the results

To split the results by the glass type we can define a different colour for each glass type using the argument color = Glass in the aesthetics of the ggplot() function; and we can also define a different marker shape using the argument shape = Glass in the aesthetics of the geom_point() function.

# scatterplot
ggplot(data_summary, aes(Temp, mean, color = Glass)) + 
  geom_point(aes(shape = Glass))

Now the three glass types can be identified by the colour and the marker shape.


Adding lines to connect the points

We can also add lines to connect the markers using the geom_line() function. We will link the line type to the glass type using the argument linetype = Glass in the aesthetics.

# coloured scatterplot
ggplot(data_summary, aes(Temp, mean, color = Glass)) + 
  geom_point(aes(shape = Glass)) +
  geom_line(aes(linetype = Glass))

Now the three glass types can be identified by the colour, the marker shape and the line type.


Avoiding the overlay of the data (position dodge)

The plot above shows some overlap among the results, making it hard to distinguish a particular point. One technique to avoid overlap is to spread the data along the x-axis. To do so, we are going to use the argument position=position_dodge(width=5) in the geom_point() and geom_line() functions. The width = 5 means the points sharing the same temperature are spread over a width of 5 units on the x-axis scale.

# coloured scatterplot
ggplot(data_summary, aes(Temp, mean, color = Glass)) + 
  geom_point(aes(shape = Glass), position=position_dodge(width=5)) +
  geom_line(aes(linetype = Glass), position=position_dodge(width=5))
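To see what the dodge does to the x positions, one option is to ask ggplot2 for the data it actually draws. The sketch below uses layer_data() on the point layer; the data are retyped from the printed output above (rounded means), and the object names p and dodged are my own. Each temperature should show up as three slightly different x values, one per glass type.

```r
library(ggplot2)

# Summary table typed in from the printed output above (rounded means)
data_summary <- data.frame(
  Glass = c("A", "B", "A", "C", "B", "C", "C", "A", "B"),
  Temp  = c(150, 150, 125, 125, 125, 150, 100, 100, 100),
  mean  = c(1386, 1313, 1087, 1055, 1035, 887, 573, 573, 553)
)

p <- ggplot(data_summary, aes(Temp, mean, color = Glass)) +
  geom_point(aes(shape = Glass), position = position_dodge(width = 5))

# layer_data() returns the data of one layer after the position
# adjustment has been applied; layer 1 is the geom_point() layer
dodged <- layer_data(p, 1)

# x holds the shifted positions; group identifies the glass type
dodged[order(dodged$x), c("group", "x", "y")]
```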


Adding error bars

Now that we have avoided the overlap, we can add the error bars using the geom_errorbar() function. The arguments are the upper and lower limits that I have defined as the mean ± sd columns from the data_summary data set. We must also use the argument position=position_dodge(width=5).

# coloured scatterplot
ggplot(data_summary, aes(Temp, mean, color = Glass)) + 
  geom_point(aes(shape = Glass), position=position_dodge(width=5)) +
  geom_line(aes(linetype = Glass), position=position_dodge(width=5)) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), position=position_dodge(width=5))

Now we are going to correct two details:

  • The error bars are too wide, so we are going to define their width using width = 5.

  • The legend on the right does not show the line type anymore (compare with the previous plot). This happens because a straight line for the legend of the error bars was added over it. We can use the argument show.legend = FALSE in the geom_errorbar() function to avoid it.

# coloured scatterplot
ggplot(data_summary, aes(Temp, mean, color = Glass)) + 
  geom_point(aes(shape = Glass), position=position_dodge(width=5)) +
  geom_line(aes(linetype = Glass), position=position_dodge(width=5)) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), position=position_dodge(width=5), 
                width = 5, show.legend = FALSE)

Now the plot has all the elements we need and we can start customising it.
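Since every remaining step only adds to this plot, one option is to store it in an object and append each customisation with +. The sections that follow repeat the full code instead, which keeps each step copy-and-paste ready. A minimal sketch, where the names dodge, base_plot and styled_plot are my own and the data are retyped from the printed output above (rounded means):

```r
library(ggplot2)

# Summary table typed in from the printed output above (rounded means)
data_summary <- data.frame(
  Glass = c("A", "B", "A", "C", "B", "C", "C", "A", "B"),
  Temp  = c(150, 150, 125, 125, 125, 150, 100, 100, 100),
  mean  = c(1386, 1313, 1087, 1055, 1035, 887, 573, 573, 553),
  sd    = c(6, 14.5, 2.52, 10.6, 35, 18.6, 26.5, 6.43, 24.6)
)

# One position object shared by all layers keeps them aligned
dodge <- position_dodge(width = 5)

base_plot <- ggplot(data_summary, aes(Temp, mean, color = Glass)) +
  geom_point(aes(shape = Glass), position = dodge) +
  geom_line(aes(linetype = Glass), position = dodge) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), position = dodge,
                width = 5, show.legend = FALSE)

# Later customisations are added to the stored object
styled_plot <- base_plot + labs(x = "Temperature (˚C)", y = "Light Output")
print(styled_plot)
```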


Customising the x and y titles

Let’s start by customising the x and y titles using the labs() function.

# coloured scatterplot
ggplot(data_summary, aes(Temp, mean, color = Glass)) + 
  geom_point(aes(shape = Glass), position=position_dodge(width=5)) +
  geom_line(aes(linetype = Glass), position=position_dodge(width=5)) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), position=position_dodge(width=5), 
                width = 5, show.legend = FALSE) +
  labs(x="Temperature (˚C)", y="Light Output")


Customising the theme and legend position

The next step is to change the overall theme of the plot. I have chosen the theme_bw.

Additionally, I will remove the major and minor grid lines, as they are usually not used in scientific plots. This is done with the function theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()).

And we are also going to transfer the legend to the upper left corner inside the plot using the function theme(legend.position = c(0.1, 0.7)). The arguments c(0.1, 0.7) mean 10 % of the plot width and 70 % of the plot height.

# coloured scatterplot
ggplot(data_summary, aes(Temp, mean, color = Glass)) + 
  geom_point(aes(shape = Glass), position=position_dodge(width=5)) +
  geom_line(aes(linetype = Glass), position=position_dodge(width=5)) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), position=position_dodge(width=5), 
                width = 5, show.legend = FALSE) +
  labs(x="Temperature (˚C)", y="Light Output") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(legend.position = c(0.1, 0.7))
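A note on newer ggplot2 releases: as far as I know, from version 3.5.0 onwards a numeric legend.position is deprecated in favour of legend.position = "inside" combined with legend.position.inside. The sketch below picks the form according to the installed version. The helper name legend_inside is hypothetical and the 3.5.0 cut-off is my assumption, so it is worth checking against the ggplot2 changelog.

```r
library(ggplot2)

# Hypothetical helper: returns a theme() that places the legend inside the
# panel at the relative coordinates xy (fraction of width, fraction of height).
# Assumption: the newer syntax is available from ggplot2 3.5.0.
legend_inside <- function(xy = c(0.1, 0.7)) {
  if (utils::packageVersion("ggplot2") >= "3.5.0") {
    theme(legend.position = "inside", legend.position.inside = xy)
  } else {
    theme(legend.position = xy)
  }
}

# Summary table typed in from the printed output above (rounded means)
data_summary <- data.frame(
  Glass = c("A", "B", "A", "C", "B", "C", "C", "A", "B"),
  Temp  = c(150, 150, 125, 125, 125, 150, 100, 100, 100),
  mean  = c(1386, 1313, 1087, 1055, 1035, 887, 573, 573, 553)
)

legend_plot <- ggplot(data_summary, aes(Temp, mean, color = Glass)) +
  geom_point(aes(shape = Glass)) +
  theme_bw() +
  legend_inside(c(0.1, 0.7))
print(legend_plot)
```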


Adding the compact letter display

Let’s add the compact letter display to the plot using the geom_text() function. The label is the column Tukey in the data set. We need to use the argument position = position_dodge(width=5), in the same way we did for the points, lines and error bars, so that each letter follows its marker. The argument vjust adjusts the vertical position of the text, and we also set show.legend = FALSE since we do not want a legend for the letters.

# coloured scatterplot
ggplot(data_summary, aes(Temp, mean, color = Glass)) + 
  geom_point(aes(shape = Glass), position=position_dodge(width=5)) +
  geom_line(aes(linetype = Glass), position=position_dodge(width=5)) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), position=position_dodge(width=5), 
                width = 5, show.legend = FALSE) +
  labs(x="Temperature (˚C)", y="Light Output") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(legend.position = c(0.1, 0.7)) +
  geom_text(aes(label=Tukey), size = 3, 
            position = position_dodge(width=5), vjust=1.5, show.legend = FALSE)


Customising the x and y limits and axis breaks

The next adjustment is to customise the range and/or breaks for the x- and y-axis using the functions scale_x_continuous() and scale_y_continuous().

The x-axis’ range is already good, but we are going to define the axis breaks to agree with the data using the argument breaks=c(100,125,150).

For the y-axis, we are going to expand it a bit using limits = c(450, 1450) and define the breaks as a sequence starting at 500 and finishing at 1400 in steps of 300: breaks=seq(500,1400,300).

# coloured scatterplot
ggplot(data_summary, aes(Temp, mean, color = Glass)) + 
  geom_point(aes(shape = Glass), position=position_dodge(width=5)) +
  geom_line(aes(linetype = Glass), position=position_dodge(width=5)) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), position=position_dodge(width=5), 
                width = 5, show.legend = FALSE) +
  labs(x="Temperature (˚C)", y="Light Output") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(legend.position = c(0.1, 0.7)) +
  geom_text(aes(label=Tukey), size = 3, 
            position = position_dodge(width=5), vjust=1.5, show.legend = FALSE) +
  scale_x_continuous(breaks=c(100,125,150)) +
  scale_y_continuous(limits=c(450, 1450), breaks=seq(500,1400,300))
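Two details of these scale arguments can be checked in plain R before plotting. The sketch below lists the breaks that seq() produces and confirms that every error bar end falls inside the chosen limits. This matters because values outside the limits of a continuous scale are, by default, dropped rather than clipped. The data are retyped from the printed output above (rounded means), and the object names are my own.

```r
# Summary values typed in from the printed output above (rounded means)
data_summary <- data.frame(
  mean = c(1386, 1313, 1087, 1055, 1035, 887, 573, 573, 553),
  sd   = c(6, 14.5, 2.52, 10.6, 35, 18.6, 26.5, 6.43, 24.6)
)

y_limits <- c(450, 1450)
y_breaks <- seq(500, 1400, 300)
print(y_breaks)   # 500 800 1100 1400

# Lowest and highest error bar ends (mean - sd and mean + sd)
bar_range <- range(data_summary$mean - data_summary$sd,
                   data_summary$mean + data_summary$sd)
print(bar_range)

# TRUE when all error bars fit inside the y-axis limits
inside <- bar_range[1] >= y_limits[1] && bar_range[2] <= y_limits[2]
print(inside)
```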


Improving the visualisation of the shapes

The plot is good but there is still room for improvement.

It seems that the shape markers are too small for proper visualisation. I will enlarge them using size = 3 and apply some transparency with alpha = 0.4 so that they do not hide the error bars. Both arguments go in the geom_point() function.

# coloured scatterplot
ggplot(data_summary, aes(Temp, mean, color = Glass)) + 
  geom_point(aes(shape = Glass), position=position_dodge(width=5), alpha=0.4, size = 3) +
  geom_line(aes(linetype = Glass), position=position_dodge(width=5)) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), position=position_dodge(width=5), 
                width = 5, show.legend = FALSE) +
  labs(x="Temperature (˚C)", y="Light Output") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(legend.position = c(0.1, 0.7)) +
  geom_text(aes(label=Tukey), size = 3, 
            position = position_dodge(width=5), vjust=1.5, show.legend = FALSE) +
  scale_x_continuous(breaks=c(100,125,150)) +
  scale_y_continuous(limits=c(450, 1450), breaks=seq(500,1400,300))


Customising colours

As the final customisation step, I will show how to change the colour palette using the scale_color_brewer() function to change the colours associated with each glass type. I have chosen the Dark2 palette.

# coloured scatterplot
ggplot(data_summary, aes(Temp, mean, color = Glass)) + 
  geom_point(aes(shape = Glass), position=position_dodge(width=5), alpha=0.4, size = 3) +
  geom_line(aes(linetype = Glass), position=position_dodge(width=5)) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), position=position_dodge(width=5), 
                width = 5, show.legend = FALSE) +
  labs(x="Temperature (˚C)", y="Light Output") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(legend.position = c(0.1, 0.7)) +
  geom_text(aes(label=Tukey), size = 3, 
            position = position_dodge(width=5), vjust=1.5, show.legend = FALSE) +
  scale_x_continuous(breaks=c(100,125,150)) +
  scale_y_continuous(limits=c(450, 1450), breaks=seq(500,1400,300)) +
  scale_color_brewer(palette = "Dark2")

This scatterplot is suitable for any presentation and also for written reports. The plot shows the means, the standard deviation and the compact letter display for each treatment.

The results from each glass type can be identified by the colour, the marker shape and the line type, making the plot colourblind-friendly and also suitable to be printed in grey-scale.


Grey-scale plot

This last plot is a modification of the coloured plot above, for cases where the final presentation is to be in grey-scale.

To do it, we simply replace the scale_color_brewer(palette = "Dark2") function with scale_color_grey(start = 0.1, end = 0.3).

# Scatterplot
ggplot(data_summary, aes(Temp, mean, color = Glass)) + 
  geom_point(aes(shape = Glass), position=position_dodge(width=5), alpha=0.4, size = 3) +
  geom_line(aes(linetype = Glass), position=position_dodge(width=5)) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), position=position_dodge(width=5), 
                width = 5, show.legend = FALSE) +
  labs(x="Temperature (˚C)", y="Light Output") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(legend.position = c(0.1, 0.7)) +
  geom_text(aes(label=Tukey), size = 3, 
            position = position_dodge(width=5), vjust=1.5, show.legend = FALSE) +
  scale_x_continuous(breaks=c(100,125,150)) +
  scale_y_continuous(limits=c(450, 1450), breaks=seq(500,1400,300)) +
  scale_color_grey(start = 0.1, end = 0.3)
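To see which greys this setting yields, the shades can be generated directly. The sketch below uses base R's grey.colors(), which I believe is what scale_color_grey() relies on internally (treat that as an assumption). With start = 0.1 and end = 0.3 the three shades are all dark and fairly close to each other, so the marker shapes and line types carry most of the distinction between glass types.

```r
# Three grey shades between 10 % and 30 % grey level.
# Assumption: this mirrors what scale_color_grey(start = 0.1, end = 0.3)
# assigns to the three glass types, in alphabetical order.
greys <- grDevices::grey.colors(3, start = 0.1, end = 0.3)
names(greys) <- c("A", "B", "C")
print(greys)
```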


Saving the final plot

The final look of a specific ggplot object depends on the size and aspect ratio used. The plots shown in this tutorial were built for a figure size of 4 × 2.5 inches (width × height). I suggest saving the final plot as a png file with 1000 dpi resolution, as shown in the code below.

# saving the final figure
ggsave("scatterplot.png", width = 4, height = 2.5, dpi = 1000)
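ggsave() saves the last plot displayed unless told otherwise. A slightly more explicit sketch stores the plot in an object and passes it through the plot argument. The object name final_plot, the simplified plot and the temporary output path are placeholders of mine. At 4 × 2.5 inches and 1000 dpi, the resulting image is 4000 × 2500 pixels.

```r
library(ggplot2)

# Summary table typed in from the printed output above (rounded means)
data_summary <- data.frame(
  Glass = c("A", "B", "A", "C", "B", "C", "C", "A", "B"),
  Temp  = c(150, 150, 125, 125, 125, 150, 100, 100, 100),
  mean  = c(1386, 1313, 1087, 1055, 1035, 887, 573, 573, 553)
)

# Simplified stand-in for the final plot built in this tutorial
final_plot <- ggplot(data_summary, aes(Temp, mean, color = Glass)) +
  geom_point(aes(shape = Glass), position = position_dodge(width = 5), size = 3) +
  theme_bw()

# Temporary location used only for this sketch; replace with your own path
out_file <- file.path(tempdir(), "scatterplot.png")

# Passing plot = explicitly avoids depending on the last plot displayed
ggsave(out_file, plot = final_plot, width = 4, height = 2.5,
       units = "in", dpi = 1000)
file.exists(out_file)
```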



  1. Data source: Douglas C. Montgomery, Design and Analysis of Experiments, eighth edition.

4 Responses

  1. Paolo M

    Hello,

    I really love these graphs, I think they’re perfect for publications and they’re so elegant! Congrats on putting together these wonderful tutorials!
    I was wondering if you could consider making the Rmd. file available with a link on each tutorial page?
    I know I could paste the individual R chunks but I’d also like to keep the Markdown code since together it looks quite good. Please let me know if that would be possible.

    Thank you so much!

    BW
