Chapter 7 – Random Forests

In this last chapter, we introduce a supervised learning method related to decision trees: random forests. Random forests belong to a family called ensemble methods, which train many models (such as decision trees) and combine their predictions rather than relying on a single model. In addition to random forests, we introduce another classifier, the support vector machine, and take a brief look at association rules.
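
To make the ensemble idea concrete, here is a minimal sketch of bagging by hand: several decision trees are fit on bootstrap samples of the training data, and their predictions are combined by majority vote. It is an illustration under assumptions, not the chapter's own script: the built-in iris data stands in for the course data, the rpart package supplies the individual trees, and the number of trees (25) and the 30% hold-out are arbitrary choices.

```r
library(rpart)  # decision trees; a recommended package shipped with standard R installations

set.seed(1)     # make the random sampling reproducible
n_trees <- 25   # illustrative choice, not a recommendation

# Hold out 30% of the rows (45 of 150) for testing
test_rows <- sample(nrow(iris), size = 45)
train <- iris[-test_rows, ]
test  <- iris[test_rows, ]

# Fit one classification tree per bootstrap sample (rows drawn with replacement)
trees <- lapply(seq_len(n_trees), function(i) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  rpart(Species ~ ., data = boot, method = "class")
})

# Each tree votes on every test case: rows = test cases, columns = trees
votes <- sapply(trees, function(tree) {
  as.character(predict(tree, newdata = test, type = "class"))
})

# The ensemble prediction is the most frequent vote for each test case
ensemble_pred <- apply(votes, 1, function(v) names(which.max(table(v))))

# Share of test cases the ensemble classifies correctly
ensemble_accuracy <- mean(ensemble_pred == test$Species)
ensemble_accuracy
```

A random forest goes one step further than this sketch: at each split, it also considers only a random subset of the predictors, which makes the individual trees less alike.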

For those of you in Sociology 1205, this chapter covers the Doing Data Science module for Unit 7. You will find all the scripts in our RStudio Cloud workspace.

Learning outcomes

In this chapter, we cover the following topics:

  • understanding ensemble methods;
  • understanding random forests as an ensemble form of non-linear classification;
  • understanding the advantages of random forests over other classifiers;
  • interpreting the output of a random forest model;
  • evaluating a random forest model.

Lecture – Part 1

Lecture – Part 2

Tutorial 1 – Random Forests

Tutorial 2 – Part 1 – Random Forests

Tutorial 2 – Part 2 – Random Forests

Tutorial 2 – Part 3 – Random Forests
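
As a companion to the random forest tutorials, the following is a minimal sketch of a workflow built from the random forest functions listed under "Key functions used in this chapter". It assumes the randomForest package is installed and uses the built-in iris data as a stand-in for the course data; the number of trees and the 30% hold-out are illustrative choices, and the tutorial scripts may differ.

```r
library(randomForest)  # assumes the randomForest package is installed

set.seed(42)  # make the bootstrap samples reproducible

# Hold out 30% of the rows (45 of 150) for evaluation
test_rows <- sample(nrow(iris), size = 45)
train <- iris[-test_rows, ]
test  <- iris[test_rows, ]

# Fit a forest of 500 trees; importance = TRUE also stores permutation importance
rf_model <- randomForest(Species ~ ., data = train,
                         ntree = 500, importance = TRUE)

print(rf_model)        # out-of-bag error estimate and confusion matrix
importance(rf_model)   # importance measures for each predictor
varImpPlot(rf_model)   # dot chart of the predictors ranked by importance
varUsed(rf_model)      # how many times each predictor was used in the forest

# Evaluate on the held-out rows
predicted <- predict(rf_model, newdata = test)
table(predicted, actual = test$Species)       # confusion matrix on the test set
rf_accuracy <- mean(predicted == test$Species)
rf_accuracy
```

The out-of-bag estimate printed by the model and the held-out accuracy answer the same question in two ways; comparing them is a quick sanity check on the evaluation.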

Tutorial 3 – Part 1 – Association Rules
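
For orientation, here is a minimal sketch of how the association rule functions listed under "Key functions used in this chapter" fit together. It assumes the arules package is installed and uses Groceries, an example transactions data set bundled with that package, in place of the course data; the support and confidence thresholds are illustrative only.

```r
library(arules)  # assumes the arules package is installed

data("Groceries")  # example transactions data set bundled with arules

# Plot the 10 most frequent items in the sparse transaction matrix
itemFrequencyPlot(Groceries, topN = 10)

# Generate rules; the thresholds below are illustrative, not recommendations
rules <- apriori(Groceries,
                 parameter = list(support = 0.01,
                                  confidence = 0.5,
                                  minlen = 2))

# Display the five rules with the highest lift in a readable form
inspect(head(sort(rules, by = "lift"), 5))
```

Lowering the support or confidence thresholds produces more rules, so it usually helps to sort by a quality measure such as lift before inspecting them.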

Tutorial 3 – Part 2 – Support Vector Machine
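
The sketch below shows, under assumptions, how the two support vector machine functions listed under "Key functions used in this chapter" can be called on the same data. It assumes the kernlab and e1071 packages are installed and uses the built-in iris data as a stand-in for the course data; the kernel and cost settings are illustrative choices.

```r
library(kernlab)  # provides ksvm() and alphaindex()
library(e1071)    # provides svm()

set.seed(7)  # make the train/test split reproducible

# Hold out 30% of the rows (45 of 150) for testing
test_rows <- sample(nrow(iris), size = 45)
train <- iris[-test_rows, ]
test  <- iris[test_rows, ]

# kernlab: support vector machine with a radial basis (Gaussian) kernel
ksvm_model <- ksvm(Species ~ ., data = train, kernel = "rbfdot", C = 1)
ksvm_model                               # summary of the fitted model
sv_index <- alphaindex(ksvm_model)       # row indices of the support vectors

ksvm_pred <- predict(ksvm_model, test)
ksvm_accuracy <- mean(ksvm_pred == test$Species)
ksvm_accuracy

# e1071: the same kind of model through a different package
svm_model <- svm(Species ~ ., data = train, kernel = "radial", cost = 1)
svm_pred <- predict(svm_model, newdata = test)
table(svm_pred, actual = test$Species)   # confusion matrix on the test set
```

Both functions fit the same family of model; which one to use is mostly a matter of which interface and output format you prefer.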

Key functions used in this chapter

  • randomForest(): the function that creates a random forest model;
  • importance(): the function that determines which predictors are the most important to the model;
  • varImpPlot(): the function that plots the predictors by importance to the model;
  • varUsed(): the function that tabulates the number of times each predictor was used in building the model;
  • itemFrequencyPlot(): the function that plots the frequency of items in a sparse matrix;
  • apriori(): the function that generates rules of association between items;
  • inspect(): the function that displays association rules in a readable form;
  • ksvm(): the function that creates a support vector machine model;
  • alphaindex(): the function that extracts the indices of the support vectors from a ksvm model;
  • svm(): another function that creates a support vector machine model.

License

Modeling Data with R Copyright © 2022 by Christine A Monnier is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
