8  Selecting Rows and Columns

Logical operations in R refer to operations that produce or use logical (TRUE or FALSE) values. These include comparisons (e.g. greater than, less than, equal to), negation, and logical and/or/not operations.

Comparison operations

Comparison operations allow to compare values and produce a logical outcome.

For instance,

10 > 5
[1] TRUE

Is 10 greater than 5 equals TRUE

These operators become particularly useful when filtering rows based on conditions.

The list of comparison operator is

Comparison Operator R symbol
Equal to ==
Not equal to !=
Greater than or equal to >=
Less than or equal to <=
Greater than >
Less than <
Filtering %in%

In this tutorial, we are using the swiss dataset

data("swiss")
head(swiss)
             Fertility Agriculture Examination Education Catholic
Courtelary        80.2        17.0          15        12     9.96
Delemont          83.1        45.1           6         9    84.84
Franches-Mnt      92.5        39.7           5         5    93.40
Moutier           85.8        36.5          12         7    33.77
Neuveville        76.9        43.5          17        15     5.16
Porrentruy        76.1        35.3           9         7    90.57
             Infant.Mortality
Courtelary               22.2
Delemont                 22.2
Franches-Mnt             20.2
Moutier                  20.3
Neuveville               20.6
Porrentruy               26.6

More than, Less than Operators (<=, <, >=, >)

Let’s filter the rows (cities) based on Education being less or equal than 2% in the swiss dataset. In essence, we are extracting rows representing populations where no more than 2% have completed primary education.

Let’s run

swiss$Education <= 2
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

For the rows that satisfy or not the condition, we receive a TRUE or FALSE.

To filter out the rows that are TRUE, we can insert the condition within the square brackets (on the left side of the comma ,)

swiss[swiss$Education <= 2, ]
          Fertility Agriculture Examination Education Catholic Infant.Mortality
Echallens      68.3        72.6          18         2    24.20             21.2
Oron           72.5        71.2          12         1     2.40             21.0
Conthey        75.5        85.9           3         2    99.71             15.1
Herens         77.3        89.7           5         2   100.00             18.3

Now we have all the cities (rows) that satisfy the condition.

We can save it into a new data.frame.

swiss_edu2 = swiss[swiss$Education <= 2, ]
swiss_edu2
          Fertility Agriculture Examination Education Catholic Infant.Mortality
Echallens      68.3        72.6          18         2    24.20             21.2
Oron           72.5        71.2          12         1     2.40             21.0
Conthey        75.5        85.9           3         2    99.71             15.1
Herens         77.3        89.7           5         2   100.00             18.3

8.0.0.1 Exercises

  • Create a new dataframe with Fertility more than 50%
  • Create a new dataframe with Agriculture less or equal than 10%

Equality == Operator

The == operator is used to test if elements are exactly equal. It compares individual elements in a vector, data frame, or other data structures, and returns a logical vector of TRUE and FALSE values, indicating whether each comparison was successful.

swiss$Education == 29
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

The == operator performs element-wise comparison between two vectors or data structures of the same length. Returns a logical vector where each element is TRUE if the condition is met and FALSE otherwise.

We can check which positions (e.g. rows) fulfill the condition with which

which(swiss$Education == 29)
[1] 46 47
# rows 46 and 47

We can save the selected rows into a new dataframe.

# save into new dataframe
swiss_edu_29 = swiss[swiss$Education == 29, ]
swiss_edu_29
            Fertility Agriculture Examination Education Catholic
Rive Droite      44.7        46.6          16        29    50.43
Rive Gauche      42.8        27.7          22        29    58.33
            Infant.Mortality
Rive Droite             18.2
Rive Gauche             19.3

Not in !=

The != operator in R is used to determine inequality between elements. It checks if elements in a vector or data structure are not equal to a specified value or another set of elements. The != operator is an important tool when you want to exclude certain values or filter out elements that do not match a specific criterion.

If we want to select values not in Education is equal to 29

# save into new dataframe
swiss_edu_not29 = swiss[swiss$Education != 29, ]
swiss_edu_not29[1:5, ]
             Fertility Agriculture Examination Education Catholic
Courtelary        80.2        17.0          15        12     9.96
Delemont          83.1        45.1           6         9    84.84
Franches-Mnt      92.5        39.7           5         5    93.40
Moutier           85.8        36.5          12         7    33.77
Neuveville        76.9        43.5          17        15     5.16
             Infant.Mortality
Courtelary               22.2
Delemont                 22.2
Franches-Mnt             20.2
Moutier                  20.3
Neuveville               20.6

Filter %in% Operator

The %in% operator checks if elements from one vector are present in another vector. It is commonly used to filter or select elements that belong to a set of specific values.

Consider checking if the elements of Education are part of a set of target values: c(2,5,20)

swiss$Education %in% c(2,5,20)
 [1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Select the rows and create a new data frame

swiss_edu_select = swiss$Education %in% c(2,5,20)
swiss_edu_select
 [1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Comparison == and %in%

The symbols == and %in% are related but distinct.

  • Use == when you need to perform exact comparisons between elements.
  • Use %in% when you want to check if elements are among a list or set of values.

Logical operations

Logical operations are used to combine different conditions.

Logical Operator R symbol
AND & returns TRUE if both the conditions are TRUE
OR | returns TRUE if any of the conditions is TRUE
NOT ! returns the inverse of the condition

AND

For instance, let’s select the rows (cities/provinces) with where the percentage of the population with a primary education is greater than 10% and is over 80% Catholic.

We get only two cities (Sarine and Sion)

# we save the dataset based on the condition in a new object
swiss_cond1 <- swiss[swiss$Education > 10 & swiss$Catholic > 80, ]
swiss_cond1
       Fertility Agriculture Examination Education Catholic Infant.Mortality
Sarine      82.9        45.2          16        13    91.38             24.4
Sion        79.3        63.1          13        13    96.83             18.1

NOT

Using the NOT operator ! we can get the inverse of this condition

swiss[!(swiss$Education > 10 & swiss$Catholic > 80), ]

OR

We can use the OR condition (symbol |) in a similar manner. For example, if we want to filter rows based on two conditions: less than 20% working in Agriculture or above 20% being Catholic, we would do

swiss[swiss$Agriculture < 20 | swiss$Catholic > 20, ]

Practice

Using the swiss dataset

  • Select cities with Agriculture under 30%?
  • Select cities with more than 50% Catholic population and less than 30% of Agriculture?
  • Select cities with less than 20% catholic population OR less than 20% of Agriculture?

Answers 1.


  1. Solutions

    swiss[swiss$Agriculture < 30, ]
    swiss[swiss$Catholic > 50 & swiss$Agriculture < 30, ]
    swiss[swiss$Catholic < 20 | swiss$Agriculture < 20, ]
    ↩︎