OLS regression

Ordinary Least Squares (OLS) regression is a fundamental tool for modeling the relationship between a dependent variable and one or more predictors. Here we illustrate OLS in R using the built-in swiss dataset (socio-economic indicators for 47 French-speaking Swiss provinces around 1888) to explore how fertility relates to agriculture and education.
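Before fitting anything, it helps to look at the three variables involved. The swiss data frame ships with R's datasets package, which is attached by default, so no download is needed. This short sketch only prints the dimensions and a summary of the columns used below.

```r
# swiss is provided by the datasets package (attached in a standard R session)
data(swiss)

# One row per province, one column per indicator
dim(swiss)

# Distribution of the three variables used in this section
summary(swiss[, c("Fertility", "Agriculture", "Education")])
```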

Step-by-Step Breakdown:

Model 1: Simple Linear Regression

In the first model, we ask how the level of agricultural activity in a province relates to its fertility rate, so we fit a simple linear model:

model1 = lm(Fertility ~ Agriculture, data = swiss)

Here, Fertility (a standardized fertility measure) is the dependent variable and Agriculture (the percentage of males working in agriculture) is the independent variable. The model assumes the relationship can be written as:

\[ \text{Fertility} = \beta_0 + \beta_1 \, \text{Agriculture} + \varepsilon \]
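With a single predictor, the OLS estimates have a closed form: the slope is the sample covariance of the predictor and the response divided by the sample variance of the predictor, and the intercept makes the fitted line pass through the point of means. The sketch below computes both by hand and checks them against lm(); the names beta0_hat and beta1_hat are illustrative only.

```r
# Closed-form OLS for one predictor:
#   beta1_hat = cov(x, y) / var(x)
#   beta0_hat = mean(y) - beta1_hat * mean(x)
x <- swiss$Agriculture
y <- swiss$Fertility

beta1_hat <- cov(x, y) / var(x)
beta0_hat <- mean(y) - beta1_hat * mean(x)

c(intercept = beta0_hat, slope = beta1_hat)

# Compare with the lm() fit (TRUE means equal up to floating-point tolerance)
model1 <- lm(Fertility ~ Agriculture, data = swiss)
all.equal(unname(coef(model1)), unname(c(beta0_hat, beta1_hat)))
```

These hand-computed values should match the Intercept and Agriculture estimates that summary(model1) reports.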

Model 2: Multiple Linear Regression

Because other factors may also affect fertility, we add a second variable, Education (the percentage of draftees with education beyond primary school), as a control:

model2 = lm(Fertility ~ Agriculture + Education, data = swiss)

This model now includes both Agriculture and Education as predictors of fertility:

\[ \text{Fertility} = \beta_0 + \beta_1 \, \text{Agriculture} + \beta_2 \, \text{Education} + \varepsilon \]
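With several predictors, the OLS coefficients solve the normal equations \( (X^\top X)\hat\beta = X^\top y \), where X is the design matrix (a column of ones plus one column per predictor). The sketch below builds X with model.matrix() and solves the system directly. This is for illustration only: lm() itself uses a QR decomposition, which is numerically more stable than forming X'X.

```r
# Design matrix: intercept column of 1s, Agriculture, Education
X <- model.matrix(~ Agriculture + Education, data = swiss)
y <- swiss$Fertility

# Solve (X'X) beta = X'y without explicitly inverting X'X
beta_hat <- solve(crossprod(X), crossprod(X, y))
beta_hat

# Compare with the lm() fit
model2 <- lm(Fertility ~ Agriculture + Education, data = swiss)
all.equal(as.vector(beta_hat), unname(coef(model2)))
```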

Analyzing the Models:

Calling summary() on each model reports the coefficient estimates with their standard errors, t statistics and p-values, together with overall fit measures (residual standard error, R-squared, and the F-test). These tell us how strongly each predictor is associated with fertility and how much of the variation in fertility each model explains.
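The printed summary is convenient to read, but the same numbers can also be pulled out programmatically, which is useful when reporting or comparing many models. A minimal sketch using standard accessors on an lm fit and its summary; the object names s2 and coefs are illustrative.

```r
model2 <- lm(Fertility ~ Agriculture + Education, data = swiss)
s2 <- summary(model2)

# Coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(s2)
coefs["Education", ]

# Overall fit measures
s2$r.squared
s2$adj.r.squared

# 95% confidence intervals for each coefficient
confint(model2, level = 0.95)
```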

summary(model1)

Call:
lm(formula = Fertility ~ Agriculture, data = swiss)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.5374  -7.8685  -0.6362   9.0464  24.4858 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 60.30438    4.25126  14.185   <2e-16 ***
Agriculture  0.19420    0.07671   2.532   0.0149 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.82 on 45 degrees of freedom
Multiple R-squared:  0.1247,    Adjusted R-squared:  0.1052 
F-statistic: 6.409 on 1 and 45 DF,  p-value: 0.01492
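The fit statistics at the bottom of this output follow directly from the residuals: R-squared is one minus the ratio of residual to total sum of squares, adjusted R-squared penalizes for the number of estimated coefficients, and the residual standard error divides the residual sum of squares by the residual degrees of freedom (47 observations minus 2 coefficients = 45). A sketch recomputing them by hand:

```r
model1 <- lm(Fertility ~ Agriculture, data = swiss)

y   <- swiss$Fertility
rss <- sum(residuals(model1)^2)   # residual sum of squares
tss <- sum((y - mean(y))^2)       # total sum of squares
n   <- nobs(model1)               # number of observations
p   <- length(coef(model1))       # intercept + slope

r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p)
rse    <- sqrt(rss / (n - p))     # residual standard error on n - p df

c(r2 = r2, adj_r2 = adj_r2, rse = rse)
```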

summary(model2)

Call:
lm(formula = Fertility ~ Agriculture + Education, data = swiss)

Residuals:
     Min       1Q   Median       3Q      Max 
-17.3072  -6.6157  -0.9443   8.7028  20.5291 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 84.08005    5.78180  14.542  < 2e-16 ***
Agriculture -0.06648    0.08005  -0.830    0.411    
Education   -0.96276    0.18906  -5.092  7.1e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.479 on 44 degrees of freedom
Multiple R-squared:  0.4492,    Adjusted R-squared:  0.4242 
F-statistic: 17.95 on 2 and 44 DF,  p-value: 2e-06
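The "t value" column is simply each estimate divided by its standard error, and "Pr(>|t|)" is the two-sided tail probability of a t distribution with the residual degrees of freedom (here 47 - 3 = 44). The sketch below reproduces both columns for model2.

```r
model2 <- lm(Fertility ~ Agriculture + Education, data = swiss)
coefs  <- coef(summary(model2))

est <- coefs[, "Estimate"]
se  <- coefs[, "Std. Error"]
rdf <- df.residual(model2)             # residual degrees of freedom

t_val <- est / se                      # t = estimate / standard error
p_val <- 2 * pt(-abs(t_val), df = rdf) # two-sided p-value

cbind(t_val, p_val)
```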

Comparing the Models:

To compare the two models side by side, we can use the texreg package, whose screenreg() function prints a compact regression table to the console:

# Install texreg once if it is not already available:
# install.packages("texreg")
library(texreg)
screenreg(list(model1, model2), single.row = TRUE)

===============================================
             Model 1           Model 2         
-----------------------------------------------
(Intercept)  60.30 (4.25) ***  84.08 (5.78) ***
Agriculture   0.19 (0.08) *    -0.07 (0.08)    
Education                      -0.96 (0.19) ***
-----------------------------------------------
R^2           0.12              0.45           
Adj. R^2      0.11              0.42           
Num. obs.    47                47              
===============================================
*** p < 0.001; ** p < 0.01; * p < 0.05
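The table invites an informal comparison; a formal one for nested models is the F-test from anova(), which asks whether adding Education reduces the residual sum of squares by more than would be expected by chance. This is a standard complement to the table rather than part of the texreg workflow. When exactly one term is added, the F statistic equals the square of that term's t value in the larger model, which the sketch checks.

```r
model1 <- lm(Fertility ~ Agriculture, data = swiss)
model2 <- lm(Fertility ~ Agriculture + Education, data = swiss)

# Nested-model F-test: Model 1 is Model 2 with the Education coefficient fixed at 0
cmp <- anova(model1, model2)
cmp

# With a single added term, F equals the squared t value of that term
t_edu <- coef(summary(model2))["Education", "t value"]
all.equal(cmp[["F"]][2], t_edu^2)
```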

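Reading the two columns side by side shows how controlling for education changes the picture: the Agriculture coefficient moves from a positive, significant 0.19 (p ≈ 0.015) in Model 1 to a small, non-significant -0.07 (p ≈ 0.41) in Model 2, while R^2 rises from 0.12 to 0.45. This suggests that much of the simple association between agriculture and fertility reflects differences in education across provinces.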
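The shift in the Agriculture coefficient can be decomposed exactly with the omitted-variable identity for OLS: the slope in the "short" model equals the slope in the "long" model plus the Education coefficient times the slope from an auxiliary regression of Education on Agriculture. The sketch verifies this identity in-sample; the names aux, short_slope and implied_slope are illustrative.

```r
model1 <- lm(Fertility ~ Agriculture, data = swiss)              # "short" model
model2 <- lm(Fertility ~ Agriculture + Education, data = swiss)  # "long" model

# Auxiliary regression: how Education varies with Agriculture
aux <- lm(Education ~ Agriculture, data = swiss)

# Identity: short slope = long slope + (Education coef) * (auxiliary slope)
short_slope   <- coef(model1)[["Agriculture"]]
implied_slope <- coef(model2)[["Agriculture"]] +
  coef(model2)[["Education"]] * coef(aux)[["Agriculture"]]

all.equal(short_slope, implied_slope)
```

Because Education has a negative coefficient in Model 2 and is itself negatively related to Agriculture, their product is positive, which is why the simple model's Agriculture slope sits above the controlled one.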