Step-by-Step Scatterplot for One Factor in R

In this tutorial we are going to build scatterplots to show the results of a one-factor experiment that measured the release of radon in showers with different aperture diameters. The data were published in the journal Environment International (see footnote 1).

We are going to use a table with summarised data: the shower diameter, the mean and standard deviation of the radon released, and the compact letter display indicating the significant differences according to Tukey’s test.

You can download the summarised table here, or you can go to the tutorial on One-Way ANOVA to see how to create it.

We start by loading the required libraries: readr to read the data from a CSV file and ggplot2 to build the plots.

# loading the appropriate libraries
library(readr)
library(ggplot2)

# loading and checking the data
radon_summary <- read_csv("radon_summary.csv")
print(radon_summary)
## # A tibble: 6 x 4
##       D  mean    sd Tukey
##   <dbl> <dbl> <dbl> <chr>
## 1  0.37  82.8  2.06 a    
## 2  0.51  77    2.31 ab   
## 3  0.71  75    1.83 b    
## 4  1.02  71.8  3.30 b    
## 5  1.4   65    3.56 c    
## 6  1.99  62.8  2.75 c

The table has columns for the shower diameter (D), the mean, the standard deviation (sd), and the compact letter display indicating the significant differences (Tukey).
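If the CSV file is not at hand, the same table can be rebuilt by hand. The sketch below is only a fallback: it copies the values from the printout above, which are rounded for display, so it may differ slightly from the original file.

```r
# Fallback sketch: rebuild the summary table from the values printed above.
# The numbers are rounded as displayed, so they may not match the CSV exactly.
radon_summary <- data.frame(
  D     = c(0.37, 0.51, 0.71, 1.02, 1.40, 1.99),
  mean  = c(82.8, 77.0, 75.0, 71.8, 65.0, 62.8),
  sd    = c(2.06, 2.31, 1.83, 3.30, 3.56, 2.75),
  Tukey = c("a", "ab", "b", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# quick structural check: 6 rows, 4 columns, Tukey stored as text
str(radon_summary)
```

A plain data.frame works with ggplot2 in the same way as the tibble returned by read_csv().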

Basic Scatterplot

We are going to use the function ggplot() to build the scatterplots. The first argument is the data set, radon_summary, and the second argument is the aesthetic mapping, aes(), where we define the x and y variables, D and mean. However, if we run only this code, we get a blank plot with axes but no data. We also need to define the geom, in this case geom_point() for the scatterplot.

# scatterplot
ggplot(radon_summary, aes(D, mean)) + 
  geom_point()

Adding error bars

Now let’s add the error bars using the geom_errorbar function. We must define the upper and lower limits. In this example, I am using the mean ± standard deviation, both from the radon_summary data set.

# scatterplot
ggplot(radon_summary, aes(D, mean)) + 
  geom_point() +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd))

The proportions in the plot do not look good. Let’s decrease the error bar width with the argument width = 0.05, which sets the total width of each error bar to 0.05 units of the x-axis.

Let’s also increase the size of the markers using geom_point(size = 2).

# scatterplot
ggplot(radon_summary, aes(D, mean)) + 
  geom_point(size = 2) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width = 0.05)
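The error-bar limits can also be stored as columns before plotting, which makes them easy to inspect or reuse. This is a minimal sketch; the column names lower and upper and the object names radon_limits and p_limits are illustrative choices, and the data values are copied (rounded) from the printout above.

```r
library(ggplot2)

# Values copied from the printed table (rounded as displayed)
radon_summary <- data.frame(
  D    = c(0.37, 0.51, 0.71, 1.02, 1.40, 1.99),
  mean = c(82.8, 77.0, 75.0, 71.8, 65.0, 62.8),
  sd   = c(2.06, 2.31, 1.83, 3.30, 3.56, 2.75)
)

# 'lower' and 'upper' are illustrative column names, not part of the CSV
radon_limits <- transform(radon_summary, lower = mean - sd, upper = mean + sd)

# same plot as above, but the limits now come from explicit columns
p_limits <- ggplot(radon_limits, aes(D, mean)) +
  geom_point(size = 2) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.05)

print(p_limits)
```

The plot is the same as with aes(ymin=mean-sd, ymax=mean+sd); the only difference is that the limits are visible in the data itself.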

Customizing x and y titles

Let’s now change the x and y axis titles using the function labs().

# scatterplot
ggplot(radon_summary, aes(D, mean)) + 
  geom_point(size = 2) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width = 0.05) +
  labs(x="Diameter (mm)", y="Radon Released (%)")

Formatting the overall visualisation

The plot could be used as it is, but there is still plenty of room for improvement. The next step is to change the overall theme of the plot; I have chosen theme_bw(). Additionally, I will remove the major and minor grid lines, which are not normally used in scientific plots, with the code theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()).

# scatterplot
ggplot(radon_summary, aes(D, mean)) + 
  geom_point(size = 2) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width = 0.05) +
  labs(x="Diameter (mm)", y="Radon Released (%)") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
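Because the same theme settings are repeated in every plot, they can be stored once and reused. The sketch below assumes the hypothetical name theme_clean for the combined theme; the data values are copied (rounded) from the printout above.

```r
library(ggplot2)

# Values copied from the printed table (rounded as displayed)
radon_summary <- data.frame(
  D    = c(0.37, 0.51, 0.71, 1.02, 1.40, 1.99),
  mean = c(82.8, 77.0, 75.0, 71.8, 65.0, 62.8),
  sd   = c(2.06, 2.31, 1.83, 3.30, 3.56, 2.75)
)

# 'theme_clean' is a hypothetical name: theme_bw() without grid lines
theme_clean <- theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

# the stored theme is added like any other ggplot2 component
p_clean <- ggplot(radon_summary, aes(D, mean)) +
  geom_point(size = 2) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.05) +
  labs(x = "Diameter (mm)", y = "Radon Released (%)") +
  theme_clean

print(p_clean)
```

The remaining examples keep the two theme lines spelled out, so each code block can still be run on its own.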

Joining the points with lines

The next two examples show how to use lines to join the points of a scatterplot. We can use the function geom_line() for straight line segments between the points or geom_smooth() for a smoothed line.

# scatterplot + smoothed line
ggplot(radon_summary, aes(D, mean)) + 
  geom_point(size = 2) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width = 0.05) +
  labs(x="Diameter (mm)", y="Radon Released (%)") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  geom_smooth() + 
  labs(subtitle="Smoothed line joining points")

# scatterplot + straight line
ggplot(radon_summary, aes(D, mean)) + 
  geom_point(size = 2) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width = 0.05) +
  labs(x="Diameter (mm)", y="Radon Released (%)") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  geom_line() + 
  labs(subtitle="Straight line joining points")

Adding trendlines to the scatterplot

We can also add trendlines to the scatterplot by passing the arguments method and formula to the geom_smooth() function.

The next examples show how to add linear, quadratic and exponential trendlines.

# scatterplot + trendline: linear
ggplot(radon_summary, aes(D, mean)) + 
  geom_point(size = 2) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width = 0.05) +
  labs(x="Diameter (mm)", y="Radon Released (%)") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  geom_smooth(method = "lm", formula = y ~ x) + 
  labs(subtitle="Trendline: linear")

# scatterplot + trendline: quadratic
ggplot(radon_summary, aes(D, mean)) + 
  geom_point(size = 2) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width = 0.05) +
  labs(x="Diameter (mm)", y="Radon Released (%)") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2)) + 
  labs(subtitle="Trendline: quadratic")

# scatterplot + trendline: exponential
ggplot(radon_summary, aes(D, mean)) + 
  geom_point(size = 2) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width = 0.05) +
  labs(x="Diameter (mm)", y="Radon Released (%)") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  geom_smooth(method = "lm", formula = y ~ exp(-x)) + 
  labs(subtitle="Trendline: exponential")

Comparing the different plots, I will choose the one using the exponential trendline to go on with the examples.
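The visual comparison can be backed up with a quick numerical check. The sketch below refits the three models with lm(), using the same formulas passed to geom_smooth(), and compares their R² values. It is only a rough guide: the fits use the six group means (copied, rounded, from the printout above) rather than the raw observations, and the object names fits and r2 are illustrative.

```r
# Values copied from the printed table (rounded as displayed)
radon_summary <- data.frame(
  D    = c(0.37, 0.51, 0.71, 1.02, 1.40, 1.99),
  mean = c(82.8, 77.0, 75.0, 71.8, 65.0, 62.8)
)

# same model forms as the geom_smooth() formulas, with y = mean and x = D
fits <- list(
  linear      = lm(mean ~ D, data = radon_summary),
  quadratic   = lm(mean ~ D + I(D^2), data = radon_summary),
  exponential = lm(mean ~ exp(-D), data = radon_summary)
)

# coefficient of determination for each fit
r2 <- sapply(fits, function(fit) summary(fit)$r.squared)
print(round(r2, 3))
```

The quadratic model has one more parameter than the other two, so its R² can never be lower than that of the linear model; summary(fit)$adj.r.squared gives a fairer comparison between them.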

Formatting the trendline

The gray band around the trendline represents the confidence interval of the fitted line (95% by default). We can hide it by using the argument se = FALSE.

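The trendline also seems too heavy compared with the overall plot. It is possible to decrease the line width using the argument size (from ggplot2 3.4.0 onward, the preferred argument for line width is linewidth, and size produces a deprecation warning for lines), and we can also change its color with the argument color.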

Finally, looking carefully, we can see that the trendline was drawn in front of the points. ggplot2 draws layers in the order they are added, so to have the points, the most important information, in front of the trendline, we need to change the order of the functions in the code and put the trendline (geom_smooth()) before the points (geom_point()).

# scatterplot + trendline: exponential
ggplot(radon_summary, aes(D, mean)) + 
  geom_smooth(method = "lm", formula = y ~ exp(-x), se = FALSE, color = "Gray50", size = 0.5) +
  geom_point(size = 2) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width = 0.05) +
  labs(x="Diameter (mm)", y="Radon Released (%)") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

Adding compact letter display from Tukey’s test

Finally, we can add the compact letter display to the plot using the geom_text function. The argument label is directed to the column Tukey in the data file.

# scatterplot + trendline: exponential
ggplot(radon_summary, aes(D, mean)) + 
  geom_smooth(method = "lm", formula = y ~ exp(-x), se = FALSE, color = "Gray50", size = 0.5) +
  geom_point(size = 2) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width = 0.05) +
  labs(x="Diameter (mm)", y="Radon Released (%)") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  geom_text(aes(label=Tukey))

As we can see, the labels from Tukey’s test were placed at exactly the same position as the points. To correct this, I am going to set their vertical position just above the error bars. As the top of each error bar corresponds to the mean plus the standard deviation (ymax=mean+sd), we can define the y coordinate as mean+sd+2, where the ‘+2’ adds 2 units of the y-axis to place the letters just above the error bars. I am also going to decrease the size of the letters.

# scatterplot + trendline: exponential
ggplot(radon_summary, aes(D, mean)) + 
  geom_smooth(method = "lm", formula = y ~ exp(-x), se = FALSE, color = "Gray50", size = 0.5) +
  geom_point(size = 2) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width = 0.05) +
  labs(x="Diameter (mm)", y="Radon Released (%)") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  geom_text(aes(label=Tukey, y = mean + sd + 2), size = 3)
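The fixed offset of 2 units depends on the scale of the y-axis. As an alternative sketch (not the approach used in this tutorial), the labels can be anchored at the top of the error bars and lifted with vjust, which is expressed relative to the text height instead of data units. The expansion value of 0.15 is an assumed value chosen to leave room for the topmost label; the data values are copied (rounded) from the printout above.

```r
library(ggplot2)

# Values copied from the printed table (rounded as displayed)
radon_summary <- data.frame(
  D     = c(0.37, 0.51, 0.71, 1.02, 1.40, 1.99),
  mean  = c(82.8, 77.0, 75.0, 71.8, 65.0, 62.8),
  sd    = c(2.06, 2.31, 1.83, 3.30, 3.56, 2.75),
  Tukey = c("a", "ab", "b", "b", "c", "c")
)

p_labels <- ggplot(radon_summary, aes(D, mean)) +
  geom_point(size = 2) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.05) +
  # anchor each label at the top of its error bar; a negative vjust lifts it
  geom_text(aes(label = Tukey, y = mean + sd), vjust = -0.6, size = 3) +
  # extra space at the top of the axis (0.15 is an assumed value)
  scale_y_continuous(expand = expansion(mult = c(0.05, 0.15)))

print(p_labels)
```

With this variant the gap between the error bars and the letters stays the same if the y-axis range changes, while the mean+sd+2 version keeps everything in data units, which is easier to reason about.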

And here we have a gray-scale scatterplot suitable for scientific reports and presentations.

Adding colors to the plot

To create a more attractive plot, we can add some colors. In the next example, starting from the code of the gray-scale scatterplot, I have defined colors for the points, error bars, trendline and Tukey’s test labels using the argument color and the seagreen set of shades.

The marker size is set in the geom_point() function; the code below keeps size = 2, and you can increase it if the colored markers look too small.

# scatterplot + trendline: exponential
ggplot(radon_summary, aes(D, mean)) + 
  geom_smooth(method = "lm", formula = y ~ exp(-x), se = FALSE, color = "seagreen3", size = 0.5) +
  geom_point(size = 2, color = "seagreen") +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width = 0.05, color = "seagreen") +
  labs(x="Diameter (mm)", y="Radon Released (%)") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  geom_text(aes(label=Tukey, y = mean + sd + 2), size = 3, color = "seagreen4")

Saving the final plot

The final look of a ggplot object depends on the size and aspect ratio used. The plots shown in this tutorial were built for a figure size of 4 × 2.5 inches (width × height). I suggest saving the final plot as a PNG file at 1000 dpi, as shown in the code below. By default, ggsave() saves the last plot displayed.

# saving the final figure
ggsave("scatterplot.png", width = 4, height = 2.5, dpi = 1000)
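Since ggsave() saves the last plot displayed by default, it can be safer to store the plot in an object and pass it explicitly. This is a minimal sketch: the names final_plot and out_file are illustrative, the file is written to a temporary directory, and a simplified plot is built from the (rounded) values printed above.

```r
library(ggplot2)

# Values copied from the printed table (rounded as displayed)
radon_summary <- data.frame(
  D    = c(0.37, 0.51, 0.71, 1.02, 1.40, 1.99),
  mean = c(82.8, 77.0, 75.0, 71.8, 65.0, 62.8),
  sd   = c(2.06, 2.31, 1.83, 3.30, 3.56, 2.75)
)

# store the plot in an object instead of relying on the last plot displayed
final_plot <- ggplot(radon_summary, aes(D, mean)) +
  geom_point(size = 2) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.05) +
  labs(x = "Diameter (mm)", y = "Radon Released (%)") +
  theme_bw()

# illustrative output location; replace with your own path
out_file <- file.path(tempdir(), "scatterplot.png")

# explicit plot object and units, same size and resolution as above
ggsave(out_file, plot = final_plot, width = 4, height = 2.5, units = "in", dpi = 1000)
```

At 4 × 2.5 inches and 1000 dpi the image is 4000 × 2500 pixels; a lower dpi gives a smaller file with the same proportions.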

  1. Data source: Environment International, 1992, 18(4): 363-369. https://doi.org/10.1016/0160-4120(92)90067-E

