Chapter 2 – K-Nearest Neighbor

In this chapter, we introduce our first machine learning technique, k-nearest neighbor, through a lecture and two tutorials in R.

For those of you in Sociology 1205, this chapter covers the Doing Data Science module for unit 2. You will find the scripts for the tutorials and the exercise in the class’s RStudio Cloud workspace.

Learning Objectives

In this chapter, we cover the following topics:

  • k-nearest neighbor as a form of supervised learning;
  • k-nearest neighbor as a form of classification;
  • k-nearest neighbor as a form of instance-based (as opposed to model-based) learning;
  • the difference between Euclidean and Manhattan distances;
  • scaling variables;
  • how to run and interpret k-nearest neighbor in R.
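
To make the distance and scaling objectives concrete, here is a minimal sketch in base R. The two observations, their variable names (age, income), and their values are hypothetical and chosen only for illustration; they are not taken from the tutorials.

```r
# Two hypothetical observations measured on two variables.
a <- c(age = 25, income = 40000)
b <- c(age = 45, income = 42000)

# Euclidean distance: square root of the summed squared differences.
euclidean <- sqrt(sum((a - b)^2))

# Manhattan distance: sum of the absolute differences on each variable.
manhattan <- sum(abs(a - b))

# Without scaling, income dominates both distances because its units are larger.
print(c(euclidean = euclidean, manhattan = manhattan))

# Add a third hypothetical observation and scale the variables:
# scale() centers each column at 0 and divides it by its standard deviation.
points <- rbind(a, b, c = c(35, 90000))
scaled <- scale(points)

# dist() computes pairwise distances between rows; method selects the metric.
print(dist(scaled, method = "euclidean"))
print(dist(scaled, method = "manhattan"))
```

Before scaling, the Manhattan distance between a and b is 20 + 2000 = 2020, almost all of it due to income. After scaling, both variables are on a comparable footing, so age differences can influence which points count as "nearest."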

K-Nearest Neighbor Lecture

Tutorial 1 – Part 1

Tutorial 1 – Part 2

Tutorial 2

Key functions used in this chapter

  • library(): the function that loads packages;
  • read.csv(): the function that reads comma-separated values files into the environment;
  • na.omit(): the function that removes the rows of the dataset that have missing values;
  • as.numeric(): the function that changes a variable from character or factor to numeric;
  • set.seed(): the function that sets the seed of the random number generator, so that random selections can be reproduced;
  • sample.split(): the function that randomly assigns each observation to either the training set or the test set;
  • subset(): the function that subsets a dataset based on specified criteria;
  • table(): the function that tabulates data;
  • knn(): the function that creates a k-nearest neighbor classification of a test set based on a training set;
  • sum(): the function that adds the elements of a vector;
  • length(): the function that returns the number of elements in a vector;
  • plot(): the function that draws base graphics;
  • legend(): the function that adds a legend to a plot;
  • as.factor(): the function that changes a variable from character or numeric to a factor variable.
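
The sketch below shows how several of these functions might fit together in one k-nearest neighbor workflow. It is an illustration under stated assumptions, not the tutorials' script: the built-in iris data stands in for a file loaded with read.csv(), the 70/30 split and k = 5 are arbitrary choices, and base R's sample() is swapped in for sample.split() so that the only package needed is class, which provides knn().

```r
library(class)  # provides knn()

# Assumption: the built-in iris data stands in for the tutorial's CSV file.
data <- na.omit(iris)
labels <- as.factor(data$Species)

# Fix the random seed so the split is reproducible.
set.seed(123)

# Swap-in for sample.split(): draw 70% of the row indices for training.
train_idx <- sample(nrow(data), size = round(0.7 * nrow(data)))

# Scale the four numeric predictors using the training set's means and
# standard deviations, then apply the same transformation to the test set.
train_x <- scale(data[train_idx, 1:4])
test_x <- scale(data[-train_idx, 1:4],
                center = attr(train_x, "scaled:center"),
                scale = attr(train_x, "scaled:scale"))
train_y <- labels[train_idx]
test_y <- labels[-train_idx]

# Classify each test row by majority vote among its 5 nearest training rows.
predicted <- knn(train = train_x, test = test_x, cl = train_y, k = 5)

# Confusion matrix: rows are actual classes, columns are predicted classes.
confusion <- table(Actual = test_y, Predicted = predicted)
print(confusion)

# Accuracy: share of test rows whose predicted class matches the actual class.
accuracy <- sum(predicted == test_y) / length(test_y)
print(accuracy)
```

The diagonal of the confusion matrix counts correct classifications; everything off the diagonal is a misclassification. Rerunning with a different k is a simple way to see how the choice of neighbors affects accuracy.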

 

License

Modeling Data with R Copyright © 2022 by Christine A Monnier is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
