Chapter 2 – Working with Columns

Christine Monnier

In this chapter, we start with the actual work of wrangling data, using the dplyr package out of the tidyverse.

For those of you taking Sociology 1205, this chapter covers the Unit 2 data wrangling module. You will find the tutorial and exercise scripts in your RStudio Cloud workspace.

This chapter is both a general introduction to dplyr and an introduction to the dplyr functions that operate on the columns of a dataset.

Learning Objectives

In this chapter, we will explore:

the dplyr package;
picking columns;
creating or modifying columns;
creating pipelines of functions.

Part 1 – Introducing dplyr

Watch the video below for a general introduction to the key dplyr functions we will use in this unit.

Part 2 – Picking columns

Sometimes, as we clean our datasets, we realize that some columns (variables) are not needed: they may have too many missing values to be useful, or they may not be relevant to our analysis. In such cases, it helps to keep only the columns we want. This makes the dataset easier to work with, more readable, and more manageable. The select() function, from dplyr, does just that.
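As a minimal sketch of picking columns, the example below uses a small made-up data frame called survey (its name, columns, and values are invented for illustration and are not part of the course materials). It shows two common ways of using select(): naming the columns to keep, and putting a minus sign in front of a column to drop.

```r
library(dplyr)

# Hypothetical survey data, invented for illustration only
survey <- data.frame(
  id     = 1:4,
  age    = c(23, 35, 41, 29),
  income = c(32000, 54000, NA, 47000),
  notes  = c(NA, NA, "late", NA)   # mostly missing, so not very useful
)

# Keep only the columns we want, by name
kept <- select(survey, id, age, income)

# Or drop a column we do not need by putting a minus sign in front of it
dropped <- select(survey, -notes)

names(kept)
names(dropped)
```

Both calls leave the same three columns (id, age, income); which form is more convenient depends on whether it is shorter to list what to keep or what to drop.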

Part 3 – Creating or modifying columns

Frequently, as we examine our datasets, we realize that some columns (variables) are relevant to our research but need to be transformed in some way. We may also need to create new columns based on existing ones. The mutate() function, also from dplyr, lets us do both: it creates new columns and modifies existing ones.
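The following sketch illustrates both uses of mutate() on a made-up data frame called scores (the data and column names are assumptions chosen for illustration). Giving mutate() a new column name creates a column; giving it an existing column name overwrites that column.

```r
library(dplyr)

# Hypothetical exam scores, invented for illustration only
scores <- data.frame(
  student = c("A", "B", "C"),
  midterm = c(70, 85, 90),
  final   = c(80, 75, 95)
)

scores2 <- mutate(
  scores,
  total   = midterm + final,  # new column built from two existing columns
  midterm = midterm / 100     # existing column overwritten (now a proportion)
)

scores2
```

Expressions inside mutate() are evaluated in order, so total is computed from the original midterm values before midterm is rescaled.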

Part 4 – Combining functions

When working with dplyr, we are not limited to one function at a time. With the %>% operator (the pipe), we can chain functions into a pipeline in which the output of one function is passed on to the next, until we have done everything we need with the dataset. This is an economical and readable way to pipe together multiple functions. Throughout these data wrangling modules, we will use the pipe operator a lot.
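To make the idea of a pipeline concrete, here is a small sketch that chains select() and mutate() with %>%. The scores data frame is hypothetical and invented for this example. The nested version is included only to show what the pipeline is equivalent to.

```r
library(dplyr)

# Hypothetical exam scores, invented for illustration only
scores <- data.frame(
  student = c("A", "B", "C"),
  section = c(1, 1, 2),
  midterm = c(70, 85, 90),
  final   = c(80, 75, 95)
)

# Pipeline: read it top to bottom as "take scores, then select, then mutate"
result <- scores %>%
  select(student, midterm, final) %>%
  mutate(average = (midterm + final) / 2)

# The same steps without the pipe, written inside out
nested <- mutate(
  select(scores, student, midterm, final),
  average = (midterm + final) / 2
)

identical(result, nested)
```

The piped version lists the steps in the order they happen, which is usually easier to read and to extend than the nested call.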

Key functions used in this chapter

select(): the function that picks columns based on parameters we set;
mutate(): the function that creates or modifies columns based on parameters we set;
%>%: the pipe operator, which connects functions by passing the output of each one to the next, making for economical and readable code;
set.seed(): the function that sets the seed of the random number generator, so that results involving random numbers are reproducible;
matrix(): the function that creates a matrix;
rownames(): the function that retrieves the row names of a dataset or, when assigned a vector, sets them;
colnames(): the function that retrieves the column names of a dataset or, when assigned a vector, sets them;
data.frame(): the function that creates a dataframe;
rowSums(): the function that computes rowwise sums;
rowMeans(): the function that computes rowwise means;
rowwise(): the function that groups a dataset by row, so that the functions that follow are applied one row at a time;
glimpse(): the function that gives an overview of a dataset.
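The supporting functions in this list can be seen working together in the short sketch below. The seed value, the matrix dimensions, and the names q1, q2, q3 and r1 to r4 are all arbitrary choices made for this illustration, not values taken from the course scripts.

```r
library(dplyr)

set.seed(123)  # fix the seed so the same "random" numbers appear on every run

# A 4 x 3 matrix of random integers between 1 and 10
m <- matrix(sample(1:10, 12, replace = TRUE), nrow = 4, ncol = 3)
rownames(m) <- c("r1", "r2", "r3", "r4")  # set the row names
colnames(m) <- c("q1", "q2", "q3")        # set the column names

# Turn the matrix into a data frame, then add row-wise summaries
df <- data.frame(m) %>%
  mutate(
    total   = rowSums(cbind(q1, q2, q3)),   # sum of each row
    average = rowMeans(cbind(q1, q2, q3))   # mean of each row
  )

# rowwise() makes the next function work one row at a time
df_rowwise <- df %>%
  rowwise() %>%
  mutate(highest = max(q1, q2, q3)) %>%
  ungroup()

glimpse(df_rowwise)  # compact overview: one line per column
```

Without rowwise(), max(q1, q2, q3) would return a single maximum over all three columns at once rather than one value per row.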

Before moving on to the next chapter (or the exercise, for those of you in Sociology 1205), test your understanding with the quiz below.

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
