# Install the library if not already installed
# install.packages("AER")
# Load the library and data
library(AER)
data("CASchools")
Selecting Columns and Variables
In this section, we will focus on selecting specific columns using R.
We will work with the CASchools dataset from the AER package.
We also load the tidyverse package.
# Install the tidyverse package if not already installed
# install.packages("tidyverse")
# Load the tidyverse library
library(tidyverse)
We can use the ? operator to open the help page for this dataset:
?CASchools
Description
The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California.
Let’s use the head() function to examine the first six rows of the dataset.
head(CASchools)
district school county grades students teachers
1 75119 Sunol Glen Unified Alameda KK-08 195 10.90
2 61499 Manzanita Elementary Butte KK-08 240 11.15
3 61549 Thermalito Union Elementary Butte KK-08 1550 82.90
4 61457 Golden Feather Union Elementary Butte KK-08 243 14.00
5 61523 Palermo Union Elementary Butte KK-08 1335 71.50
6 62042 Burrel Union Elementary Fresno KK-08 137 6.40
calworks lunch computer expenditure income english read math
1 0.5102 2.0408 67 6384.911 22.690001 0.000000 691.6 690.0
2 15.4167 47.9167 101 5099.381 9.824000 4.583333 660.5 661.9
3 55.0323 76.3226 169 5501.955 8.978000 30.000002 636.3 650.9
4 36.4754 77.0492 85 7101.831 8.978000 0.000000 651.9 643.5
5 33.1086 78.4270 171 5235.988 9.080333 13.857677 641.8 639.9
6 12.3188 86.9565 25 5580.147 10.415000 12.408759 605.7 605.4
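By default, head() returns six rows; its n argument changes how many are shown. The following is a minimal sketch of that argument. It uses a small stand-in data frame (schools_demo, a hypothetical name) built from values in the output above, so it runs without the AER package installed.

```r
# Stand-in data frame (hypothetical name); values copied from the head() output above
schools_demo <- data.frame(
  district = c("75119", "61499", "61549", "61457", "61523", "62042"),
  students = c(195, 240, 1550, 243, 1335, 137),
  teachers = c(10.90, 11.15, 82.90, 14.00, 71.50, 6.40)
)

# Keep only the first 3 rows
first_three <- head(schools_demo, n = 3)

# A negative n keeps everything except the last |n| rows
all_but_last_two <- head(schools_demo, n = -2)

nrow(first_three)
nrow(all_but_last_two)
```

With the real data, the same call would be head(CASchools, n = 3).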
Next, we’ll create a new dataset containing only the variables district, students, teachers, and calworks. R provides several ways to do this.
Selecting Variables - Base R Method
We’ll use the c() function to specify the names of the desired variables.
We store the result in a new dataset named CASchools_select:
CASchools_select = CASchools[ , c("district", "students", "teachers", "calworks")]
This code extracts the specified columns from the CASchools dataset. The empty position before the comma means that all rows are kept.
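One detail of base R indexing is worth keeping in mind: selecting several columns returns a data frame, but selecting a single column returns a plain vector unless drop = FALSE is supplied. The sketch below illustrates this on a small stand-in data frame (schools_demo, a hypothetical name, with values copied from the head() output above) rather than on the full CASchools data.

```r
# Stand-in data frame (hypothetical name); values copied from the head() output above
schools_demo <- data.frame(
  district = c("75119", "61499", "61549"),
  students = c(195, 240, 1550),
  teachers = c(10.90, 11.15, 82.90),
  calworks = c(0.5102, 15.4167, 55.0323)
)

# Several columns: the result is still a data frame
two_cols <- schools_demo[, c("district", "students")]
is.data.frame(two_cols)

# One column: base R simplifies the result to a vector by default
one_col_vec <- schools_demo[, "students"]
is.data.frame(one_col_vec)

# drop = FALSE keeps the one-column result as a data frame
one_col_df <- schools_demo[, "students", drop = FALSE]
is.data.frame(one_col_df)
```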
Selecting Variables - Tidyverse Method
An alternative way to select specific columns is the select function from the tidyverse (it is provided by the dplyr package, which tidyverse loads). It can simplify the process with a more intuitive syntax.
The select function is designed to make column selection straightforward. You pass the data frame and then list the names of the columns you wish to keep. Here’s how you can create a new dataset using select:
CASchools_select = CASchools |> select(district, students, teachers, calworks)
In this code:
- The |> operator (the “forward pipe operator”) passes the CASchools dataset into the select function.
- The select function takes the dataset and returns a new one containing only the specified columns: district, students, teachers, and calworks.
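As a quick check, the sketch below compares the base R and select approaches and shows that select can also drop a column with a minus sign. It assumes the dplyr package is installed and R 4.1 or later (needed for |>), and it uses a small stand-in data frame (schools_demo, a hypothetical name, with values copied from the head() output above).

```r
library(dplyr)

# Stand-in data frame (hypothetical name); values copied from the head() output above
schools_demo <- data.frame(
  district = c("75119", "61499", "61549"),
  county   = c("Alameda", "Butte", "Butte"),
  students = c(195, 240, 1550),
  teachers = c(10.90, 11.15, 82.90)
)

# The same two columns, selected with base R and with select()
base_result <- schools_demo[, c("district", "students")]
tidy_result <- schools_demo |> select(district, students)

# Both results contain the same columns in the same order
identical(names(base_result), names(tidy_result))

# A minus sign keeps every column except the named one
without_county <- schools_demo |> select(-county)
names(without_county)
```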