Chapter 3 – Working with Rows

In this chapter, we will continue learning useful dplyr functions, this time to manipulate the rows (also called observations or cases) of our dataset, just as we learned to manipulate the columns (or variables) in the previous chapter.

This chapter will also involve a lot of logical statements. If you need a refresher, you might want to go back to unit 1 and R and R Studio For Absolute Beginners.

For those of you in Sociology 1205, this chapter covers the data wrangling module for unit 3. You will find the tutorial and exercise scripts in your R Studio Cloud environment.

The fans of Star Wars among you should enjoy one of the datasets in this unit.

Learning Objectives

In this chapter, we will cover the following topics:

  • Picking rows or observations from a dataset, based on specific parameters;
  • Ordering the rows of the dataset, based on specific parameters;
  • Transforming the dataset with summary statistics.

Part 1 – Introduction

Part 2 – Picking Rows

When exploring our dataset, it is sometimes helpful to examine only part of the data, based on some characteristics. In such cases, we want to look only at the rows that match those characteristics. In dplyr, the filter() function picks the rows we want, based on the logical conditions we specify. In essence, filter() is the row counterpart to select().
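As a minimal sketch of picking rows, assuming the Star Wars dataset mentioned earlier is the starwars table that ships with dplyr (the object names droids and tall_humans are ours, chosen for illustration):

```r
library(dplyr)

# starwars ships with dplyr: one row per character.
# Keep only the rows where species is "Droid".
droids <- filter(starwars, species == "Droid")

# Several conditions separated by commas must ALL be true (logical AND).
# Rows where a condition evaluates to NA are dropped.
tall_humans <- filter(starwars, species == "Human", height > 180)

# select() picks columns, filter() picks rows: here we use both.
select(tall_humans, name, height)
```

Note that filter() returns a new table; starwars itself is left untouched.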

Part 3 – Ordering Rows

The filter() function only picks the rows that match the characteristics we specify. That is helpful, but sometimes we may want to retain our entire dataset and reorder it based on a variable we are interested in. In dplyr, this is done with the arrange() function. We can then examine our data in a more organized fashion. Both filter() and arrange() can be used together with the %>% (pipe) operator.
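Here is a short sketch of reordering rows, again assuming the starwars table from dplyr; the object names are hypothetical:

```r
library(dplyr)

# arrange() sorts from smallest to largest by default.
# Rows with a missing height are placed at the end.
by_height <- arrange(starwars, height)

# Wrapping the variable in desc() sorts from largest to smallest.
tallest_first <- arrange(starwars, desc(height))

# With the pipe, the result of filter() is handed on to arrange().
tall_droids_first <- starwars %>%
  filter(species == "Droid") %>%
  arrange(desc(height))
```

In both by_height and tallest_first, every row of the dataset is still there; only the order has changed.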

Part 4 – Extracting Summary Statistics

In addition to filtering and arranging our rows, we frequently want information that is not directly present in our dataset but can be computed from the data we have. In dplyr, the summarize() (or summarise()) function will collapse our dataset and create a new table of one (or more) rows containing the statistics we specify. It can be the mean, the median, the standard deviation, etc. In other words, summarize() will produce a new dataset based on the summary statistics we want to examine. Not to worry, though: dplyr always leaves the original dataset intact.
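To illustrate, here is a minimal sketch that collapses the starwars table (assumed to be the dplyr version) into a single row of statistics. The column names mean_height, median_height and sd_height are our own choices:

```r
library(dplyr)

# Each argument creates one column in the summary table.
# na.rm = TRUE tells R to ignore missing heights; without it,
# a single NA would make each statistic NA.
height_summary <- summarize(
  starwars,
  mean_height   = mean(height, na.rm = TRUE),
  median_height = median(height, na.rm = TRUE),
  sd_height     = sd(height, na.rm = TRUE)
)

height_summary  # a one-row table with three columns
```

The summary lives in height_summary; starwars still has all of its original rows and columns.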

In addition, all the functions we covered in this chapter and unit can be used together, so we can extract very granular information from our data by filtering, arranging, and summarizing in a single sequence.
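One possible way to chain these functions is sketched below, assuming the starwars table from dplyr. The helpers n() and first() are dplyr functions not covered in this chapter; they are used here only to make the result easier to read:

```r
library(dplyr)

human_mass <- starwars %>%
  # keep humans whose mass is recorded
  filter(species == "Human", !is.na(mass)) %>%
  # heaviest character first
  arrange(desc(mass)) %>%
  # collapse to one row of statistics
  summarize(
    n_humans  = n(),         # number of rows left after filtering
    mean_mass = mean(mass),  # no na.rm needed: NAs were filtered out
    heaviest  = first(name)  # first row after sorting by desc(mass)
  )

human_mass
```

Each step receives the table produced by the step before it, so the order of the steps matters.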

Key functions used in this chapter

  • filter(): the function to pick specific rows in a dataset;
  • arrange(): the function to reorder a dataset based on specific parameters;
  • summarize(): the function that collapses a dataset into a table containing selected descriptive statistics;
  • desc(): the function that orders a set of values from largest to smallest;
  • is.na(): a function that returns a logical vector, TRUE when a value is missing, FALSE when a value is not missing.
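As a quick sketch of how is.na() behaves on its own and inside filter() (the vector x is made up for illustration, and starwars is assumed to be the dplyr dataset):

```r
library(dplyr)

# A small made-up vector with one missing value
x <- c(172, NA, 96)
is.na(x)
# FALSE  TRUE FALSE

# Keep only the rows where mass is missing...
missing_mass <- filter(starwars, is.na(mass))

# ...or, with ! (logical NOT), only the rows where mass is recorded.
known_mass <- filter(starwars, !is.na(mass))
```

Together, missing_mass and known_mass account for every row of starwars.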

Before moving on to the next chapter, or exercise for those of you in Sociology 1205, you can test your understanding with the quiz below.

License


Data Wrangling with R Copyright © 2022 by Christine A Monnier is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
