Selecting Columns and Variables
In this section, we will focus on selecting specific columns in R.
We will work with the CASchools dataset from the AER package.
# Install the AER package if not already installed
# install.packages("AER")
# Load the package and the data
library(AER)
data("CASchools")
We also load the tidyverse package.
# Install the tidyverse package if not already installed
# install.packages("tidyverse")
# Load the tidyverse library
library(tidyverse)
We can use ? to learn about this dataset:
?CASchools
The Description section of the help page explains that the dataset contains data on test performance, school characteristics, and student demographic backgrounds for school districts in California.
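Before choosing columns, it can help to list the variable names that are available. A minimal sketch using base R's names() and dim(), assuming AER is installed; the object name cas_vars is introduced here only for illustration:

```r
# Load the package and the data (assumes AER is installed)
library(AER)
data("CASchools")

# Column names available for selection
cas_vars = names(CASchools)
print(cas_vars)

# Number of rows (school districts) and columns (variables)
dim(CASchools)
```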
Let’s use the head() function to examine the first rows of the dataset. By default, head() returns six rows.
head(CASchools)
  district                          school  county grades students teachers
1    75119              Sunol Glen Unified Alameda  KK-08      195    10.90
2    61499            Manzanita Elementary   Butte  KK-08      240    11.15
3    61549     Thermalito Union Elementary   Butte  KK-08     1550    82.90
4    61457 Golden Feather Union Elementary   Butte  KK-08      243    14.00
5    61523        Palermo Union Elementary   Butte  KK-08     1335    71.50
6    62042         Burrel Union Elementary  Fresno  KK-08      137     6.40
  calworks   lunch computer expenditure    income   english  read  math
1   0.5102  2.0408       67    6384.911 22.690001  0.000000 691.6 690.0
2  15.4167 47.9167      101    5099.381  9.824000  4.583333 660.5 661.9
3  55.0323 76.3226      169    5501.955  8.978000 30.000002 636.3 650.9
4  36.4754 77.0492       85    7101.831  8.978000  0.000000 651.9 643.5
5  33.1086 78.4270      171    5235.988  9.080333 13.857677 641.8 639.9
6  12.3188 86.9565       25    5580.147 10.415000 12.408759 605.7 605.4
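The number of rows head() shows is controlled by its n argument. This small sketch prints exactly five rows and, for comparison, the last three rows with tail(); the object name first_five is illustrative:

```r
library(AER)
data("CASchools")

# head() returns 6 rows by default; n controls how many are shown
first_five = head(CASchools, n = 5)
first_five

# tail() works the same way for the last rows of the dataset
tail(CASchools, n = 3)
```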
Next, we’ll create a new dataset containing only the variables district, students, teachers, and calworks. R provides several ways to do this.
Selecting Variables - Base R Method
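In base R, columns can be selected with square-bracket indexing, written as dataset[rows, columns]. Leaving the row position empty keeps all rows, and the c() function combines the names of the desired variables into a character vector for the column position. We store the result in a new dataset called CASchools_select:
CASchools_select = CASchools[ , c("district", "students", "teachers", "calworks")]
This code extracts the specified columns, for all rows, from the CASchools dataset.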
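Two details of the bracket method are worth knowing. When a single column name is given, `[ , ]` simplifies the result to a plain vector unless `drop = FALSE` is added, and base R's subset() function offers another way to select columns by name through its select argument. A short sketch; the object names are illustrative:

```r
library(AER)
data("CASchools")

# A single column is simplified to a vector by default
students_vec = CASchools[ , "students"]
class(students_vec)

# drop = FALSE keeps the result as a one-column data frame
students_df = CASchools[ , "students", drop = FALSE]
class(students_df)

# subset() selects columns via its select argument (unquoted names allowed)
CASchools_subset = subset(CASchools, select = c(district, students, teachers, calworks))
names(CASchools_subset)
```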
Selecting Variables - Tidyverse Method
An alternative way to select specific columns is the select() function from the dplyr package, which is loaded as part of the tidyverse. Many users find its syntax more readable than bracket indexing.
The select() function is designed to make column selection straightforward: you pass the data frame and then list the names of the columns you want to keep, without quotes. Here’s how you can create a new dataset using select():
CASchools_select = CASchools |> select(district, students, teachers, calworks)
In this code:
- The |> operator (the “forward pipe operator”, available since R 4.1.0) passes the CASchools dataset into the select() function as its first argument.
- The select() function takes the dataset and returns a new one containing only the specified columns: district, students, teachers, and calworks.
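The pipe only rearranges the call: x |> f(y) is evaluated as f(x, y). The sketch below shows the equivalent call without the pipe, uses select()'s `:` range syntax (which works here because students, teachers, and calworks are adjacent columns in CASchools), and checks that the values match the base R method. It loads dplyr directly, which is the tidyverse package that provides select(); the object names are illustrative:

```r
library(AER)
library(dplyr)  # select() comes from dplyr, which library(tidyverse) also attaches
data("CASchools")

# Same as CASchools |> select(...): the data frame becomes the first argument
sel_nopipe = select(CASchools, district, students, teachers, calworks)

# A range of adjacent columns: students through calworks
sel_range = CASchools |> select(district, students:calworks)

# Base R version for comparison
sel_base = CASchools[ , c("district", "students", "teachers", "calworks")]

# Compare column names, then column values (ignoring row names)
identical(names(sel_range), names(sel_base))
identical(unname(as.list(sel_nopipe)), unname(as.list(sel_base)))
```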