Chapter 2 – Plotting One Variable

In this chapter, we start using R and the ggplot2 package, part of the tidyverse, to create plots of a single variable.

For those of you in Sociology 1205, this chapter covers the EDA module for unit 2.

Why do we use ggplot2 rather than the basic plots that come with the base distribution of R? First, because ggplot2 builds plots in a very intuitive way: line of code by line of code, layer by layer. In ggplot2, every line of code adds a specific element to the plot, so we can build our plots progressively and easily identify what each line of code does and which element it adds to (or removes from) the plot. This is a clear and logical way of building a plot.
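To make the layer-by-layer idea concrete, here is a minimal sketch. It is not taken from the course tutorials: it assumes R's built-in airquality data set in place of the course data and treats its Month column as a categorical variable. Each `p <- p + ...` line adds exactly one element to the plot.

```r
library(ggplot2)

# Assumption: R's built-in airquality data stand in for the course data.
# Month (coded 5 to 9) is turned into a labelled factor so it is categorical.
aq <- airquality
aq$Month <- factor(aq$Month, levels = 5:9,
                   labels = c("May", "June", "July", "August", "September"))

# Layer 1: the data, with Month mapped to the x axis and to the fill colour
p <- ggplot(aq, aes(x = Month, fill = Month))
# Layer 2: the geometry; geom_bar() counts the rows in each category
p <- p + geom_bar()
# Layer 3: a title and axis labels
p <- p + labs(title = "Days observed per month", x = "Month", y = "Number of days")
# Layer 4: the fill legend only repeats the x axis, so hide it
p <- p + guides(fill = "none")
# Layer 5: a non-data tweak that rotates the x-axis text
p <- p + theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

Commenting out any one of the `p <- p + ...` lines and re-running the code shows exactly which element that line contributes to the plot.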

Second, ggplot2 produces far better-looking plots than the default plots of base R.

However, it is mainly for the first reason that we use ggplot2 in this course: when it is your turn to build a plot, you will need to think about what you are trying to do and add the line of code that corresponds to each element you want in your plot.

To keep things simple in this second unit, we will first cover a brief introduction to ggplot2 (also see here) and then get to plotting. In the first tutorial, we will plot a single categorical variable. In the second tutorial, we will plot a single numeric variable.

Learning Objectives

In this chapter, we’ll cover the following topics:

  • getting started with ggplot2;
  • plotting a single categorical variable;
  • plotting a single numeric variable.

As always, I recommend that you not watch the videos passively. Take notes, follow along with the tutorials, and stop to examine the code and outputs.

Part 1 – Getting started with ggplot2

Part 2 – Tutorial 1, Part 1

Part 3 – Tutorial 1, Part 2

Part 4 – Tutorial 1, Part 3

Part 5 – Tutorial 2, Part 1

Part 6 – Tutorial 2, Part 2

Part 7 – Tutorial 2, Part 3

Key functions used in this chapter

  • ggplot(): the function that starts a plot constructed in ggplot2;
  • aes(): the function, used inside ggplot(), that determines which variable(s) are mapped to the plot (for instance, to the x axis);
  • geom_bar(): the function added to ggplot() that determines the geometry of the plot, in this case, a bar plot;
  • remove_missing(): the function that removes the rows with missing data;
  • factor(): the function to either create a factor variable or reorder the levels of a factor variable;
  • theme(): the function that customizes the non-data components of the plot;
  • labs(): the function that adds labels to the plot, for instance on the axes or a title;
  • theme_fivethirtyeight(): the function from the ggthemes package that gives a plot the look of the online publication fivethirtyeight.com;
  • guides(): the function that controls the guides (such as legends) for the scales of a plot, in this case, whether a legend is displayed;
  • geom_dotplot(): the function added to ggplot() that determines the geometry of the plot, in this case, a dot plot;
  • geom_histogram(): the function added to ggplot() that determines the geometry of the plot, in this case, a histogram;
  • geom_density(): the function added to ggplot() that determines the geometry of the plot, in this case, a density plot;
  • geom_vline(): the function added to ggplot() that adds a vertical line to a plot;
  • geom_boxplot(): the function added to ggplot() that determines the geometry of the plot, in this case, a boxplot.
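Several of the functions listed above can be combined to describe a single numeric variable. The sketch below is illustrative only and is not the code from the tutorials: it assumes R's built-in airquality data set, uses its Ozone column (which contains missing values), and picks the bin widths arbitrarily. theme_fivethirtyeight() is left out so that the example needs only ggplot2.

```r
library(ggplot2)

# Assumption: the built-in airquality data stand in for the course data.
# Ozone is numeric and contains missing values (NA).
# remove_missing() drops only the rows where Ozone is NA; na.rm = TRUE
# suppresses the warning that would otherwise report how many rows were removed.
aq <- remove_missing(airquality, na.rm = TRUE, vars = "Ozone")
ozone_mean <- mean(aq$Ozone)

# Histogram: counts of Ozone values in bins 10 ppb wide (arbitrary choice)
hist_plot <- ggplot(aq, aes(x = Ozone)) +
  geom_histogram(binwidth = 10, colour = "white") +
  labs(title = "Distribution of ozone", x = "Ozone (ppb)", y = "Count")

# Dot plot: one dot per observation, stacked within 5 ppb bins
dot_plot <- ggplot(aq, aes(x = Ozone)) +
  geom_dotplot(binwidth = 5) +
  labs(x = "Ozone (ppb)")

# Density plot with a dashed vertical line marking the mean
dens_plot <- ggplot(aq, aes(x = Ozone)) +
  geom_density(fill = "grey80") +
  geom_vline(xintercept = ozone_mean, linetype = "dashed") +
  labs(x = "Ozone (ppb)", y = "Density")

# Boxplot of the single variable: median, quartiles and outliers
box_plot <- ggplot(aq, aes(y = Ozone)) +
  geom_boxplot() +
  labs(y = "Ozone (ppb)")

print(hist_plot)
print(dot_plot)
print(dens_plot)
print(box_plot)
```

Because the missing values are removed once, before any plot is built, all four plots describe the same set of observations.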

Before moving on to the next chapter (or, for those of you in Sociology 1205, to the exercise), test your understanding with the quiz below.

License


Visualizing Data with R Copyright © 2022 by Christine Monnier is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
