Chapter 4 – Plotting Two Variables (2)

In the last chapter, we plotted two variables: either two categorical variables, or one categorical and one numeric. In this chapter, we will plot two numeric variables and add a few new elements to our plots, just as we learned to use color palettes in chapter 3.

For those of you in Sociology 1205, this chapter covers the EDA module for unit 4. You will find the tutorial scripts in your RStudio Cloud workspace.

The goal of plotting two variables is to see whether there is some kind of association between them. In the previous chapter, for instance, we examined whether a numeric variable had a different pattern or shape depending on the levels of a categorical variable such as sex or education.

When we have two numeric variables, we have the additional option of a statistical measure of association: the correlation coefficient. We will introduce this concept in this chapter. However, a statistical indicator does not replace plotting: only a plot can show whether the correlation coefficient gives an accurate picture of the association between the variables. This is a lesson we will learn with the classic example of Anscombe's quartet.
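As a minimal sketch of why the plot matters, the code below uses the `anscombe` data frame that ships with R (columns `x1` to `x4` and `y1` to `y4`). It is not taken from the chapter tutorials. The base R `plot()` calls are only a convenience here, and the object name `r_values` is our own.

```r
# Anscombe's quartet is built into R as the data frame `anscombe`.
# Compute the correlation coefficient for each of the four x/y pairs.
r_values <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
names(r_values) <- paste("set", 1:4)

# All four coefficients round to about 0.82.
print(round(r_values, 2))

# Plot the four pairs side by side: the shapes differ a great deal
# even though the coefficients are nearly identical.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Set", i))
}
par(op)
```

The four coefficients are almost the same, yet only the first panel looks like the roughly linear cloud of points that the coefficient is meant to summarize.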

Learning Objectives

In this chapter, we cover the following topics:

  • introduction to the correlation coefficient;
  • understanding the strengths and weaknesses of the correlation coefficient;
  • plotting two numeric variables on a scatterplot;
  • identifying and highlighting specific points on a scatterplot;
  • adding regression lines to scatterplots;
  • changing the aesthetic parameters on a scatterplot.

Introducing Correlation Coefficients

Tutorial 1

Tutorial 2 – Part 1

Tutorial 2 – Part 2

Tutorial 3

Key functions used in this chapter

  • geom_point(): the ggplot2 function that creates a plot with points;
  • cor(): the function that calculates a correlation coefficient;
  • cor.test(): the function that computes a full correlation hypothesis test;
  • geom_text(): the ggplot2 function that adds text to a plot;
  • stat_smooth(): the ggplot2 function that adds a smoothed line, such as a regression line, to a plot;
  • xlim(): the ggplot2 function that sets the limits of the x-axis;
  • geom_jitter(): the ggplot2 function that adds jitter to the points of a plot;
  • scale_color_viridis_c(): the ggplot2 function that applies a viridis color palette to a continuous variable;
  • scale_color_jcolors_contin(): the jcolors function that applies a jcolors palette to a continuous variable.
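Several of these functions can be combined in one plot. The sketch below is an illustration only, not one of the chapter tutorials: it assumes the built-in `mtcars` data set, and the choice of variables, the `mpg > 30` rule for labelling points, and the object names `cars`, `to_label`, and `p` are all our own.

```r
library(ggplot2)

# Illustrative data: mtcars ships with R (32 cars).
# wt = weight in 1000 lbs, mpg = miles per gallon, hp = horsepower.
cars <- mtcars
cars$model <- rownames(mtcars)

# Hypothetical rule for choosing which points to label:
# the most fuel-efficient cars.
to_label <- subset(cars, mpg > 30)

p <- ggplot(cars, aes(x = wt, y = mpg)) +
  # Scatterplot, with point color mapped to a third numeric variable.
  geom_point(aes(color = hp), size = 3) +
  # Straight regression line (method = "lm"), without the confidence band.
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "grey40") +
  # Text labels for the selected points only.
  geom_text(data = to_label, aes(label = model), vjust = -1, size = 3) +
  # Continuous viridis palette for the color scale.
  scale_color_viridis_c() +
  # x-axis limits wide enough to keep every point.
  xlim(1, 6) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Horsepower")

print(p)

# The correlation coefficient that goes with the plot.
wt_mpg_cor <- cor(cars$wt, cars$mpg)
print(wt_mpg_cor)
```

Replacing `geom_point()` with `geom_jitter()` would add a small amount of random noise to the points, which helps when many observations share the same values. `cor.test(cars$wt, cars$mpg)` would report the same coefficient along with a hypothesis test.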

Before moving on to the next chapter (or, for those of you in Sociology 1205, to the exercise), test your understanding with the quiz below.

License

Visualizing Data with R Copyright © 2022 by Christine Monnier is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.