Chapter 4 – Linear Regression

In this chapter, we examine our first regression algorithm: linear regression. Rather than classifying as we did with k-nearest neighbor or clustering, we generate numeric predictions of an outcome variable based on one or more predictors. As always, we start with a multi-part lecture on linear regression, followed by several tutorials on both simple and multiple linear regression.

For those of you in Sociology 1205, this chapter corresponds to the Doing Data Science module for unit 4. You will find all the necessary scripts in our RStudio Cloud workspace.

Learning Objectives

In this chapter, we cover the following topics:

  • understanding linear regression as a form of supervised learning;
  • understanding the elements of the linear regression equation;
  • understanding when it is appropriate to use linear regression;
  • interpreting the elements of linear regression model output;
  • evaluating linear regression models;
  • avoiding overfitting;
  • producing, interpreting, and evaluating linear regression models in R.
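
As a preview of the last objective, here is a minimal sketch of fitting and reading a simple linear regression in R. It assumes R's built-in cars data set (stopping distance by speed), which is used here only for illustration and is not necessarily the data used in the tutorials; the object names fit, b0, b1, and r_squared are likewise illustrative.

```r
# Simple linear regression on R's built-in `cars` data set
# (assumption: chosen for illustration, not the tutorial data).
data(cars)

# Fit the model dist = b0 + b1 * speed + error.
fit <- lm(dist ~ speed, data = cars)

# Elements of the fitted equation: intercept (b0) and slope (b1).
b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["speed"]]
cat("Intercept:", round(b0, 2), " Slope:", round(b1, 2), "\n")

# Full model output: coefficients, standard errors, p-values, R-squared.
model_summary <- summary(fit)
print(model_summary)

# R-squared: the share of the variance in dist explained by speed.
r_squared <- model_summary$r.squared
```

The slope is read as the expected change in the outcome for a one-unit increase in the predictor, and the intercept as the predicted outcome when the predictor equals zero.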

Lecture – Part 1

Lecture – Part 2

Lecture – Part 3

Tutorial 1 – Part 1

Tutorial 1 – Part 2

Tutorial 1 – Part 3

Tutorial 1 – Part 4

Tutorial 1 – Part 5

Tutorial 1 – Part 6

Tutorial 1 – Part 7

Tutorial 2 – Part 1

Tutorial 2 – Part 2

Tutorial 3 – Part 1

Tutorial 3 – Part 2

Key functions used in this chapter

  • hist(): the function that plots a histogram;
  • identify(): the function that allows the interactive identification of data points on a plot;
  • cor(): the function that computes a correlation coefficient;
  • lm(): the function that creates a linear regression model;
  • abline(): the function that adds a regression line to a plot;
  • names(): the function that lists the named components stored in a model object;
  • cooks.distance(): the function that computes Cook’s distance on a model object;
  • predict(): the function that predicts values of a test set based on a trained model;
  • lines(): the function that adds lines to a plot;
  • rmse(): the function that computes the root mean square error of a regression model;
  • vif(): the function that computes variance inflation factors to check for collinearity among predictors;
  • sqrt(): the function that computes a square root.
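
To show how several of these functions fit together, here is a rough sketch of a train-and-test workflow. It assumes R's built-in mtcars data set, an arbitrary 70/30 split, and an arbitrary seed, none of which come from the tutorials. Note that rmse() and vif() are not part of base R; packages such as Metrics and car provide functions with these names, but this page does not say which packages the tutorials load, so the sketch computes the root mean square error by hand with sqrt() and leaves vif() out.

```r
# When run as a script, send plots to a null device; in an interactive
# session (such as RStudio) plots are displayed as usual.
if (!interactive()) pdf(NULL)

# Assumption: built-in data set used for illustration only.
data(mtcars)

# Explore the outcome and its linear association with one predictor.
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")
r <- cor(mtcars$wt, mtcars$mpg)

# Hold out a test set (illustrative 70/30 split, arbitrary seed).
set.seed(1205)
train_rows <- sample(nrow(mtcars), size = round(0.7 * nrow(mtcars)))
train <- mtcars[train_rows, ]
test  <- mtcars[-train_rows, ]

# Fit the model on the training set and add the regression line to a plot.
fit <- lm(mpg ~ wt, data = train)
plot(train$wt, train$mpg, xlab = "Weight (1000 lbs)", ylab = "mpg")
abline(fit)

# List the components stored in the model object.
components <- names(fit)

# Cook's distance: one value per training observation; large values
# point to observations with a strong influence on the fit.
cd <- cooks.distance(fit)

# Predict on the test set and compute RMSE by hand.
pred <- predict(fit, newdata = test)
rmse_manual <- sqrt(mean((test$mpg - pred)^2))
cat("Test RMSE:", round(rmse_manual, 2), "\n")
```

Evaluating on held-out rows rather than on the training rows is one simple guard against overfitting: a model that merely memorizes the training data will show a much larger error on the test set.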

 

License


Modeling Data with R Copyright © 2022 by Christine A Monnier is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
