Import SPSS/Stata Data

```r
# Install if necessary
# install.packages("haven")
# Load the library
library(haven)
```

Importing Datasets in RStudio
When working with survey data or datasets from statistical software such as SPSS or Stata, you’ll need to import these files into R. The haven package provides modern, actively maintained tools for reading these file formats while preserving important metadata such as variable labels and value labels.
Why Use haven?
The haven package is specifically designed to read data from SPSS, Stata, and SAS. It has several advantages:
- Preserves labels: Variable labels and value labels are maintained during import
- Handles missing values: Can keep SPSS user-defined missing values, and reads Stata's extended missing values (.a to .z) as tagged NAs
- Modern and maintained: Actively developed as part of the tidyverse ecosystem
- Encoding support: Handles different character encodings correctly
- Metadata retention: Keeps important dataset attributes that other packages might lose
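The label preservation described above can be seen in a minimal sketch. It builds a small made-up dataset in R, writes it to a temporary .sav file with haven's write_sav(), and then reads it back. The id and region variables and their labels are hypothetical.

```r
library(haven)

# Hypothetical example data: an ID and a coded region variable
df <- data.frame(id = 1:3)
df$region <- labelled(
  c(1, 2, 1),
  labels = c(North = 1, South = 2),  # value labels
  label = "Region of residence"      # variable label
)

# Write to a temporary .sav file, then read it back in
path <- tempfile(fileext = ".sav")
write_sav(df, path)
imported <- read_sav(path)

# Both kinds of label survive the round trip
attr(imported$region, "label")   # the variable label
attr(imported$region, "labels")  # the value labels
```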
Importing Stata Files
Stata files typically have the extension .dta. To import them, use the read_dta() function:
```r
# Import a Stata dataset
df <- read_dta("~/Documents/my_data.dta")
```
Common read_dta() Arguments
The read_dta() function has several useful arguments:
```r
df <- read_dta(
  file = "~/Documents/my_data.dta",
  encoding = NULL,  # only needed for files from Stata 13 and earlier (e.g., "latin1")
  col_select = NULL # select specific columns to import
)
```
Key arguments:
- file: The path to your .dta file
- encoding: Character encoding. Files saved by Stata 14 and later are stored as UTF-8, so this is generally only needed for Stata 13 and earlier files that contain non-ASCII text
- col_select: Import only specific columns to save memory with large datasets
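Here is a small self-contained sketch of col_select. It writes a made-up four-column data frame to a temporary file with write_dta() and then reads back only two of the columns. The column names are illustrative assumptions.

```r
library(haven)

# Hypothetical example data written to a temporary .dta file
example <- data.frame(
  id = 1:4,
  age = c(34, 51, 27, 45),
  income = c(42000, 58000, 31000, 50500),
  notes = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".dta")
write_dta(example, path)

# Read only two of the four columns; col_select accepts bare column names
slim <- read_dta(path, col_select = c(id, income))
names(slim)
```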
Importing SPSS Files
SPSS files come in two formats: .sav (standard SPSS format) and .por (portable format). Use read_sav() for .sav files and read_por() for .por files:
```r
# Import an SPSS .sav file
df <- read_sav("~/Documents/my_data.sav")
# Import an SPSS .por file
df <- read_por("~/Documents/my_data.por")
```
Common read_sav() Arguments
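```r
df <- read_sav(
  file = "~/Documents/my_data.sav",
  encoding = NULL,  # NULL uses the encoding stored in the file; override if it is wrong
  user_na = FALSE,  # FALSE (the default) converts user-defined missing values to NA
  col_select = NULL # select specific columns
)
```
Key arguments:
- file: The path to your .sav file (use read_por() for .por files)
- user_na: If FALSE (the default), user-defined missing values are converted to NA. If TRUE, the original codes are kept and the column is read as a labelled_spss vector that records which codes count as missing
- encoding: Character encoding. The default (NULL) uses the encoding recorded in the file, so set it only if that turns out to be wrong
- col_select: Import only specific columns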
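The difference between the two user_na settings is easiest to see on a file that actually contains a user-defined missing code. The sketch below uses a hypothetical income variable in which 999 means "Refused" and is declared missing through labelled_spss(). It writes the file with write_sav() and then reads it both ways.

```r
library(haven)

df <- data.frame(id = 1:3)
# Hypothetical coding: 999 = "Refused", declared as a user-defined missing value
df$income <- labelled_spss(
  c(100, 200, 999),
  labels = c(Refused = 999),
  na_values = 999
)
path <- tempfile(fileext = ".sav")
write_sav(df, path)

as_na <- read_sav(path)                   # default: user_na = FALSE
kept  <- read_sav(path, user_na = TRUE)   # keep the original missing codes

unclass(as_na$income)  # third value has become NA
unclass(kept$income)   # third value is still 999, flagged in the na_values attribute
```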
Understanding File Paths
Using Full Paths (Recommended)
It’s recommended to use full file paths in your scripts for reproducibility:
```r
# Full path examples
df <- read_dta("~/Documents/projects/survey_2024/data/responses.dta")
df <- read_sav("C:/Users/YourName/Documents/survey_data.sav") # Windows
df <- read_sav("/Users/YourName/Documents/survey_data.sav")   # macOS (Linux homes are usually /home/YourName)
```
Benefits of full paths:
- Your script will work regardless of your current working directory
- Easy to identify exactly which file is being imported
- Reduces confusion when sharing code with collaborators
Understanding Path Components
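- ~ represents your home directory (on Windows, R usually treats your Documents folder as home)
- / separates folders (use forward slashes even on Windows)
- Absolute paths start from the root directory (/ on Mac/Linux, or a drive letter such as C:/ on Windows)
- Relative paths start from your current working directory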
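Base R functions can build and inspect these paths for you. The following is a minimal sketch, and the folder and file names in it are placeholders.

```r
# Build a path from its components; file.path() joins them with "/"
p <- file.path("~", "Documents", "my_data.dta")
print(p)  # "~/Documents/my_data.dta"

# Expand the leading "~" to your home directory
# (on Windows this is usually the Documents folder, so adjust the parts accordingly)
full <- path.expand(p)
print(full)

# A relative path is interpreted against the current working directory
rel <- file.path("data", "responses.dta")
print(file.path(getwd(), rel))

# Check that the file really exists before importing it
file.exists(full)
```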
Using RStudio’s Import Dialog (When Paths Are Tricky)
Sometimes constructing the correct file path can be tricky, especially on Windows or when dealing with network drives. RStudio provides a helpful graphical interface for importing data.
Step-by-Step Process
1. Navigate to the Import dialog:
   - Click File → Import Dataset → From SPSS (or From Stata)
2. Browse for your file:
   - Click “Browse” and navigate to your data file
   - Select the file and click “Open”
3. Preview and configure:
   - Review the data preview
   - Adjust import options if needed
   - Note the dataset name
4. Copy the generated code:
   - Look at the “Code Preview” pane in the import dialog
   - Copy the entire import command before clicking “Import”
5. Paste into your script:
   - Paste the copied code into your R script
   - Save the script so the import can be re-run later
Example of Code from Import Dialog
When you use the import dialog, RStudio generates code like this:
```r
# Code generated by RStudio's import dialog
library(haven)
df <- read_sav("C:/Users/YourName/Documents/Projects/survey_2024/data.sav")
```
Important: Always copy this generated code into your script. This ensures:
- You have a record of where the data came from
- You can re-run the import without using the dialog again
- Your analysis is reproducible
Working with Imported Data
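Columns that carry value labels are imported as haven_labelled vectors. The sketch below builds one of these vectors by hand, using hypothetical region codes and labels. It shows the conversion helpers that haven itself provides, independent of the labelled package.

```r
library(haven)

# A hypothetical labelled column like those returned by read_sav() or read_dta()
region <- labelled(c(1, 2, 1, 3), labels = c(North = 1, South = 2, East = 3))

print_labels(region)                    # show the code-to-label mapping
as_factor(region)                       # factor whose levels are the labels
as_factor(region, levels = "values")    # factor whose levels are the raw codes
zap_labels(region)                      # plain numeric vector, labels removed
```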
Once imported, your data will retain its labels. You can inspect them using functions from the labelled package (the final conversion step also uses dplyr):
```r
library(labelled)
library(dplyr)  # provides %>%, mutate() and across()
# View variable labels
look_for(df)
# Check a specific variable's value labels
val_labels(df$region)
# Convert labelled columns to factors (if needed)
df <- df %>%
  mutate(across(where(is.labelled), as_factor))
```
Alternative: The foreign Package
If haven doesn’t work for some reason (e.g., with very old file formats or in legacy R installations), you can use the foreign package as a fallback:
```r
# Install if necessary
# install.packages("foreign")
library(foreign)
# Import Stata files (versions up to 12)
df <- read.dta("~/Documents/my_data.dta")
# Import SPSS files
df <- read.spss("~/Documents/my_data.sav",
                to.data.frame = TRUE,
                use.value.labels = TRUE)
```
Limitations of foreign
- Legacy package: Older and less actively developed than haven
- Limited Stata support: Only works with Stata versions up to 12
- Label handling: Less sophisticated preservation of metadata
- Encoding issues: More prone to character encoding problems
Recommendation: Only use foreign if haven fails. In most modern workflows, haven should be your first choice.
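For comparison, foreign converts Stata value labels into ordinary R factors when it imports a file, instead of keeping labelled vectors. Here is a minimal round trip that uses foreign's own write.dta() on made-up data.

```r
library(foreign)

# Hypothetical data: the factor column is written out with Stata value labels
df <- data.frame(id = 1:3, region = factor(c("North", "South", "North")))
path <- tempfile(fileext = ".dta")
write.dta(df, path)

# read.dta() turns value-labelled columns into plain factors
# (convert.factors = TRUE is the default)
back <- read.dta(path)
str(back)
```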
Common Issues and Solutions
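Most of the issues below appear as errors or odd output right after you call read_dta() or read_sav(). As a sketch, a small wrapper can catch the most common one, a wrong path, and report it with a clearer message. Note that read_stata_checked() is a hypothetical helper and is not part of haven.

```r
library(haven)

# Hypothetical helper: fail early with a clearer message if the file is missing
read_stata_checked <- function(path, ...) {
  full_path <- path.expand(path)
  if (!file.exists(full_path)) {
    stop(
      "File not found: ", full_path,
      "\nCurrent working directory: ", getwd(),
      call. = FALSE
    )
  }
  # Any extra arguments (e.g. encoding) are passed on to read_dta()
  read_dta(full_path, ...)
}

# Usage:
# df <- read_stata_checked("~/Documents/my_data.dta")
```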
Issue: File Not Found
```r
# Error: 'my_data.dta' does not exist in current working directory
```
Solution: Check your working directory and use a full path:
```r
# Check current working directory
getwd()
# Use full path instead
df <- read_dta("~/Documents/my_data.dta")
```
Issue: Encoding Problems
If you see strange characters in your imported data:
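```r
# Specify encoding explicitly (for read_dta(), this matters only for Stata 13 and earlier files)
df <- read_dta("~/Documents/my_data.dta", encoding = "latin1")
# or override the encoding recorded in an SPSS file
df <- read_sav("~/Documents/my_data.sav", encoding = "UTF-8")
```
Issue: Large File Takes Too Long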
For very large datasets, import only the columns you need:
```r
# Import only specific columns
df <- read_dta("~/Documents/large_data.dta",
               col_select = c(id, age, gender, income))
```
Best Practices
- Always use haven as your first choice for SPSS and Stata files
- Use full file paths in your scripts for reproducibility
- Keep a copy of the import code from RStudio’s dialog in your script
- Document your data source with comments in your script:
```r
# Data source: National Survey 2024
# Original file: responses_final_v3.sav
# Date imported: 2024-11-10
df <- read_sav("~/Documents/projects/survey2024/responses_final_v3.sav")
```
- Inspect labels immediately after import using look_for() or str()
- Version control your scripts, not your data files
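To make label inspection a routine step right after import, you could use a helper such as label_table() below, which lists every variable next to its variable label. This helper is hypothetical. It relies only on the "label" attribute that haven sets on imported columns, and the data in the example are made up.

```r
library(haven)

# Hypothetical helper: one row per variable, with its variable label (or NA)
label_table <- function(data) {
  labels <- vapply(
    data,
    function(col) {
      lab <- attr(col, "label", exact = TRUE)
      if (is.null(lab)) NA_character_ else as.character(lab)
    },
    character(1)
  )
  data.frame(variable = names(data), label = unname(labels),
             stringsAsFactors = FALSE)
}

# Example on a small made-up file written and re-read with haven
df <- data.frame(id = 1:2, age = c(30, 45))
attr(df$age, "label") <- "Age in years"
path <- tempfile(fileext = ".sav")
write_sav(df, path)
tab <- label_table(read_sav(path))
tab
```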
Summary of Key Functions
| Package | Function | Purpose | File Types |
|---|---|---|---|
| haven | read_dta() | Import Stata files | .dta (all versions) |
| haven | read_sav() | Import SPSS files | .sav |
| haven | read_por() | Import SPSS portable files | .por |
| foreign | read.dta() | Import Stata files (legacy) | .dta (up to v12) |
| foreign | read.spss() | Import SPSS files (legacy) | .sav |