OLS regression
Ordinary Least Squares (OLS) regression is a fundamental tool for modeling the relationship between a response variable and one or more predictors. Here, we illustrate OLS regression in R using the built-in swiss dataset, which contains a standardized fertility measure and socio-economic indicators for 47 French-speaking provinces of Switzerland around 1888. We use it to explore how fertility relates to agriculture and education.
Step-by-Step Breakdown:
Model 1: Simple Linear Regression
In the first model, we ask whether provinces with more agricultural activity (Agriculture is the percentage of men working in agriculture) tend to have higher or lower fertility. We therefore fit a simple linear model:
model1 = lm(Fertility ~ Agriculture, data = swiss)
Here, Fertility is our dependent variable, while Agriculture acts as the independent variable. The model assumes the relationship can be described as:
\[ \text{Fertility} = \beta_0 + \beta_1 \, \text{Agriculture} + \varepsilon \]
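For a single predictor, the OLS estimates of \( \beta_0 \) and \( \beta_1 \) have a closed form: the slope is the sample covariance of Agriculture and Fertility divided by the sample variance of Agriculture, and the intercept makes the fitted line pass through the point of means. The short sketch below, which assumes only base R and the built-in swiss data, computes both by hand and prints them next to the coefficients returned by lm(); the two rows should agree up to floating-point rounding.

```r
# Closed-form OLS estimates for one predictor:
#   beta1_hat = cov(x, y) / var(x)
#   beta0_hat = mean(y) - beta1_hat * mean(x)
x <- swiss$Agriculture
y <- swiss$Fertility

beta1_hat <- cov(x, y) / var(x)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Fit the same model with lm() for comparison
model1 <- lm(Fertility ~ Agriculture, data = swiss)

# Name the hand-computed values like coef() does, then show both side by side
manual <- c("(Intercept)" = beta0_hat, Agriculture = beta1_hat)
print(rbind(manual = manual, lm = coef(model1)))
```

lm() does not use these formulas directly; it solves the least-squares problem through a QR decomposition, which generalizes to any number of predictors.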
Model 2: Multiple Linear Regression
Knowing that other factors could also affect fertility rates, we introduce an additional variable, Education, as a control factor:
model2 = lm(Fertility ~ Agriculture + Education, data = swiss)
This model now includes both agriculture and education as predictors of fertility:
\[ \text{Fertility} = \beta_0 + \beta_1 \, \text{Agriculture} + \beta_2 \, \text{Education} + \varepsilon \]
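With several predictors, the OLS estimates solve the normal equations \( X^\top X \hat\beta = X^\top y \), where \( X \) is the design matrix (a column of ones for the intercept plus the Agriculture and Education columns) and \( y \) is the Fertility vector. As a minimal illustration in base R, the sketch below builds \( X \) with model.matrix(), solves the normal equations, and compares the result with coef() from lm().

```r
# Design matrix: intercept column of 1s, Agriculture, Education
X <- model.matrix(~ Agriculture + Education, data = swiss)
y <- swiss$Fertility

# Solve (X'X) b = X'y; crossprod(X) is t(X) %*% X, crossprod(X, y) is t(X) %*% y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Fit the same model with lm() for comparison
model2 <- lm(Fertility ~ Agriculture + Education, data = swiss)

# drop() turns the 3x1 matrix into a named vector matching coef()'s names
print(cbind(normal_equations = drop(beta_hat), lm = coef(model2)))
```

The normal-equation route is shown only because it mirrors the algebra; forming \( X^\top X \) can lose numerical precision, which is why lm() works with a QR decomposition of \( X \) instead.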
Analyzing the Models:
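Calling summary() on each model reports the coefficient estimates with their standard errors, t values, and p-values, along with overall fit statistics (residual standard error, R-squared, adjusted R-squared, and the F-statistic). Together these show which coefficients are statistically distinguishable from zero and how much of the variability in fertility each model explains.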
summary(model1)
Call:
lm(formula = Fertility ~ Agriculture, data = swiss)
Residuals:
     Min       1Q   Median       3Q      Max
-25.5374  -7.8685  -0.6362   9.0464  24.4858
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 60.30438    4.25126  14.185   <2e-16 ***
Agriculture  0.19420    0.07671   2.532   0.0149 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11.82 on 45 degrees of freedom
Multiple R-squared: 0.1247, Adjusted R-squared: 0.1052
F-statistic: 6.409 on 1 and 45 DF, p-value: 0.01492
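The numbers in this printout are also available programmatically from the summary object. The base-R sketch below pulls out the coefficient table and fit statistics and checks two relationships visible in the output: each t value is the estimate divided by its standard error (for example 0.19420 / 0.07671 ≈ 2.532), and with a single predictor the overall F-statistic equals the slope's t value squared (2.532² ≈ 6.41).

```r
model1 <- lm(Fertility ~ Agriculture, data = swiss)
s1 <- summary(model1)

# Coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
ct <- coef(s1)
print(ct)

# Each t value is Estimate / Std. Error
t_manual <- ct[, "Estimate"] / ct[, "Std. Error"]
print(cbind(t_manual, reported = ct[, "t value"]))

# Fit statistics printed at the bottom of summary(model1)
print(c(r_squared = s1$r.squared,
        adj_r_squared = s1$adj.r.squared,
        residual_se = s1$sigma))

# With one predictor, the overall F statistic equals the slope's t value squared
print(c(F = unname(s1$fstatistic["value"]),
        t_squared = ct["Agriculture", "t value"]^2))
```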
summary(model2)
Call:
lm(formula = Fertility ~ Agriculture + Education, data = swiss)
Residuals:
     Min       1Q   Median       3Q      Max
-17.3072  -6.6157  -0.9443   8.7028  20.5291
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 84.08005    5.78180  14.542  < 2e-16 ***
Agriculture -0.06648    0.08005  -0.830    0.411
Education   -0.96276    0.18906  -5.092  7.1e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.479 on 44 degrees of freedom
Multiple R-squared: 0.4492, Adjusted R-squared: 0.4242
F-statistic: 17.95 on 2 and 44 DF, p-value: 2e-06
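Because model1 is nested within model2 (it is model2 with the Education coefficient fixed at zero), the two fits can also be compared with a formal F-test on the reduction in residual sum of squares, using base R's anova(). When the models differ by a single term, this F-statistic equals the squared t value of that term in the larger model (here (-5.092)² ≈ 25.9), and the two p-values coincide. A small sketch:

```r
model1 <- lm(Fertility ~ Agriculture, data = swiss)
model2 <- lm(Fertility ~ Agriculture + Education, data = swiss)

# Nested-model F test: does adding Education reduce the residual
# sum of squares by more than chance would explain?
cmp <- anova(model1, model2)
print(cmp)

# With a single added term, F equals that term's squared t value
t_edu <- coef(summary(model2))["Education", "t value"]
print(c(F = cmp$F[2], t_squared = t_edu^2))
```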
Comparing the Models:
To compare the results of both models side by side, we can use the texreg package, whose screenreg() function prints a neat, readable regression table to the console:
# install.packages("texreg")  # run once if texreg is not yet installed
library(texreg)
screenreg(list(model1, model2), single.row = TRUE)
===============================================
             Model 1           Model 2
-----------------------------------------------
(Intercept)  60.30 (4.25) ***  84.08 (5.78) ***
Agriculture   0.19 (0.08) *    -0.07 (0.08)
Education                      -0.96 (0.19) ***
-----------------------------------------------
R^2           0.12              0.45
Adj. R^2      0.11              0.42
Num. obs.    47                47
===============================================
*** p < 0.001; ** p < 0.01; * p < 0.05
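This comparison shows how including education as a control variable changes the relationship between agriculture and fertility. On its own, Agriculture has a positive and statistically significant coefficient (0.19, p = 0.015). Once Education is added, the Agriculture coefficient becomes small, negative, and not significant (-0.07, p = 0.41), while Education shows a strong negative association with fertility (-0.96, p < 0.001). R-squared also rises from 0.12 to 0.45, so the second model explains considerably more of the variation in fertility.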
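Why does the Agriculture coefficient shrink and change sign? For OLS fits with an intercept on the same data, the omitted-variable-bias identity holds exactly: \( \tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \hat\delta_1 \), where \( \tilde\beta_1 \) is the Agriculture slope from model1, \( \hat\beta_1 \) and \( \hat\beta_2 \) are the Agriculture and Education coefficients from model2, and \( \hat\delta_1 \) is the slope from regressing Education on Agriculture. Plugging in the printed estimates, 0.19420 = -0.06648 + (-0.96276) × \( \hat\delta_1 \) implies \( \hat\delta_1 \approx -0.27 \): provinces with more agriculture tend to have less education, and, holding agriculture fixed, less education goes with higher fertility, so leaving Education out makes Agriculture look positively associated with fertility. The base-R sketch below checks the identity numerically; the auxiliary model name aux is just an illustrative choice.

```r
model1 <- lm(Fertility ~ Agriculture, data = swiss)
model2 <- lm(Fertility ~ Agriculture + Education, data = swiss)

# Auxiliary regression: how Education varies with Agriculture across provinces
aux <- lm(Education ~ Agriculture, data = swiss)
delta <- coef(aux)[["Agriculture"]]

# Rebuild the simple-regression slope from the multiple-regression fit:
#   slope(model1) = b_Agriculture(model2) + b_Education(model2) * delta
rebuilt <- coef(model2)[["Agriculture"]] + coef(model2)[["Education"]] * delta

print(c(delta = delta,
        rebuilt = rebuilt,
        model1_slope = coef(model1)[["Agriculture"]]))
```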