Chapter 5: Matching

Story: The Search for a Treatment Effect in Crime Reduction Programs

Imagine a city government, concerned with rising crime rates, launches a new community policing initiative aimed at reducing violent crime. Over the course of a year, crime rates in the city seem to decline. However, the decline is not necessarily due to the new program alone; it could also reflect other factors, such as socioeconomic conditions, local policies, or police staffing levels.

The problem is that cities were not randomly assigned to the new program: some chose to adopt the initiative, while others did not. Without randomization, how can we be sure that the reduction in violent crime is really due to the program? The adopting cities may have differed systematically from those that did not adopt it, which would make a direct comparison of the two groups misleading.

Here’s where matching comes into play. Matching lets us compare similar cities, one that adopted the program and one that didn’t, by accounting for their differences on key characteristics such as local crime rates, socioeconomic status, and police staffing levels. By matching on these variables, we aim to create pairs of similar cities so we can more accurately estimate the impact of the new program.

Concept: What Is Matching, and Why Does It Matter?

Matching is a technique used in causal inference to estimate the treatment effect in observational studies, where random assignment is not possible. The core idea behind matching is to find untreated units (in our case, cities without the new policing initiative) that are similar to treated units (cities with the policing initiative) based on a set of observed covariates.

For example, let’s say we are trying to measure the impact of community policing on violent crime rates. Some cities receive the program, and others do not. However, the treated and untreated cities may differ in terms of various factors like population size, unemployment rates, or initial crime rates. These differences can create a bias in our estimate of the program’s effect.
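To see how such differences bias a naive comparison, here is a minimal simulation sketch in R. The data are simulated, not real crime data: `baseline` is a hypothetical pre-program violent crime rate that drives both program adoption and the outcome, and the true program effect is set to -30 by assumption.

```r
# Simulated illustration: the data are generated, not real crime data.
set.seed(1)
n <- 2000

# Hypothetical pre-program violent crime rate per 100k (the confounder)
baseline <- rnorm(n, mean = 400, sd = 100)

# Cities with higher baseline crime are more likely to adopt the program
adopt_prob <- plogis((baseline - 400) / 50)
treated <- rbinom(n, size = 1, prob = adopt_prob)

# Outcome depends on baseline crime; the true program effect is assumed to be -30
true_effect <- -30
outcome <- 50 + 0.9 * baseline + true_effect * treated + rnorm(n, sd = 20)

# Naive comparison of treated vs untreated means
naive_diff <- mean(outcome[treated == 1]) - mean(outcome[treated == 0])
naive_diff  # positive: the program looks harmful even though it lowers crime
```

Because adopters started with more crime, the naive difference mixes the program effect with the baseline gap and even gets the sign wrong.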

By using matching, we can pair each treated city with a similar untreated city, creating a more comparable “control group” for our analysis. This allows us to more accurately estimate the treatment effect by comparing treated and control cities that are as similar as possible on the variables we observe.

One common method of matching is propensity score matching (PSM). In PSM, we estimate the probability (the propensity score) that a city receives the treatment given its observed characteristics (e.g., pre-program violent crime rates, population size, local police funding), typically with a logistic regression. Treated and untreated cities with similar propensity scores are then matched, and the treatment effect is estimated by comparing outcomes between the matched cities.
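The PSM steps can be sketched in base R without a matching package. This is a minimal illustration under stated assumptions: the data are simulated, the covariate names `baseline_violent` and `unemployment` are hypothetical, the propensity score comes from a logistic regression (`glm` with `family = binomial()`), and each treated city is matched to the control city with the closest score, with replacement. Dedicated packages such as MatchIt offer many more options.

```r
# Simulated data only; covariate names are hypothetical.
set.seed(42)
n <- 500
city <- data.frame(
  baseline_violent = rnorm(n, mean = 400, sd = 100),
  unemployment     = rnorm(n, mean = 6, sd = 1.5)
)

# Adoption depends on both covariates (this creates confounding)
lin <- -0.5 + 0.01 * (city$baseline_violent - 400) + 0.3 * (city$unemployment - 6)
city$treatment <- rbinom(n, size = 1, prob = plogis(lin))

# Outcome with an assumed true effect of -30 on treated cities
city$violent_after <- 40 + 0.9 * city$baseline_violent + 8 * city$unemployment -
  30 * city$treatment + rnorm(n, sd = 20)

# Step 1: estimate each city's propensity score with logistic regression
ps_model <- glm(treatment ~ baseline_violent + unemployment,
                data = city, family = binomial())
city$pscore <- fitted(ps_model)

# Step 2: for each treated city, find the control city with the closest score
# (matching with replacement: one control may serve several treated cities)
treated_idx <- which(city$treatment == 1)
control_idx <- which(city$treatment == 0)
match_of <- vapply(treated_idx, function(i) {
  control_idx[which.min(abs(city$pscore[control_idx] - city$pscore[i]))]
}, integer(1))

# Step 3: effect on the treated = mean difference within matched pairs
att_hat <- mean(city$violent_after[treated_idx] - city$violent_after[match_of])
naive_diff <- mean(city$violent_after[treated_idx]) -
  mean(city$violent_after[control_idx])
c(naive = naive_diff, matched = att_hat)
```

Matching with replacement keeps every match as close as possible but reuses some controls, which widens the uncertainty; MatchIt's `replace` argument controls the same choice.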

Insight: Connecting Theory to Practice — The Crime Reduction Example

Let’s revisit our story. Suppose we use propensity score matching to match cities that received the community policing program with cities that did not. We calculate a propensity score for each city based on factors like pre-program violent and property crime rates, socioeconomic status, and other relevant covariates.

Suppose that after matching, the treated and control cities are very similar, with comparable initial levels of crime and socioeconomic conditions. If the covariates we matched on capture all the important confounders, any remaining difference in the outcome (violent crime rates) between the treated and control groups can be attributed to the program itself rather than to pre-existing differences between the cities. Matching cannot, however, balance characteristics we did not measure.
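Whether matched groups are "very similar" is usually checked with the standardized mean difference (SMD) of each covariate rather than by eye. A minimal sketch follows. It divides by the pooled standard deviation, which is one common convention (MatchIt's `summary()` may standardize differently depending on the target estimand), and the |SMD| < 0.1 threshold is a common rule of thumb, not a formal test. The toy values are made up.

```r
# Standardized mean difference: (mean treated - mean control) / pooled SD
smd <- function(x, treat) {
  x1 <- x[treat == 1]
  x0 <- x[treat == 0]
  pooled_sd <- sqrt((var(x1) + var(x0)) / 2)
  (mean(x1) - mean(x0)) / pooled_sd
}

# Toy example with made-up values: one imbalanced and one balanced covariate
toy <- data.frame(
  treatment     = c(1, 1, 1, 0, 0, 0),
  baseline_rate = c(2, 4, 6, 1, 2, 3),
  unemployment  = c(5, 6, 7, 5, 6, 7)
)
balance <- sapply(toy[c("baseline_rate", "unemployment")], smd, treat = toy$treatment)
round(balance, 3)

# Covariates that fail the common |SMD| < 0.1 rule of thumb
names(balance)[abs(balance) >= 0.1]
```

In practice you would compute these values on the sample before and after matching and check that they shrink.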

In this scenario, matching allows us to control for confounders: factors that could influence both the likelihood of receiving the treatment and the outcome of interest. It reduces the bias caused by observed confounders and yields a more credible estimate of the treatment effect.
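The role of confounders can be stated more precisely. In standard potential-outcomes notation (introduced here for illustration, not used elsewhere in this chapter), let $Y(1)$ and $Y(0)$ be a city's violent crime with and without the program, $T$ the treatment indicator, and $X$ the observed covariates. Matching treated cities to similar controls targets the average treatment effect on the treated (ATT), and it identifies that quantity only under the two assumptions below; the first cannot be verified from data.

```latex
\begin{aligned}
\tau_{\mathrm{ATT}} &= E[\,Y(1) - Y(0) \mid T = 1\,], \qquad e(x) = P(T = 1 \mid X = x),\\
&\text{(A1) } Y(0) \perp T \mid X \quad \text{(no unobserved confounding)},\\
&\text{(A2) } e(x) < 1 \text{ for all } x \quad \text{(overlap)},\\
E[\,Y(0) \mid T = 1\,] &= E\big[\,E[\,Y \mid T = 0,\ e(X)\,] \,\big|\, T = 1\,\big] \quad \text{under (A1)--(A2)}.
\end{aligned}
```

The last line is why matching on the single number $e(X)$ can stand in for matching on all of $X$.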

Practice: Applying Matching to the Crime Data

In this section, we’ll apply propensity score matching to a city-level crime dataset. Our goal is to estimate the effect of community policing (or another intervention) on violent crime rates, using matching to control for confounding factors.

Let’s walk through the following steps in R to apply matching on this dataset.

Step 1: Define the Variables

For this example, let’s say the treatment variable is whether a city implemented the community policing initiative (we can represent this as a binary variable, 1 for treated and 0 for untreated). The outcome variable we are interested in is Violent_sum, which represents the total violent crime rate.

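We’ll use the following covariates to match the treated and untreated cities. Ideally, each is measured before the program began. This matters especially for homicides and aggravated assaults, which are themselves counted in violent crime; matching on their post-program values would mean matching on part of the outcome.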

- Property_sum: property crime rate
- Homicide_sum: homicide rate
- AggAssault_per_100k: aggravated assault rate per 100,000 residents
- Burglary_per_100k: burglary rate per 100,000 residents
- Socioeconomic status (if available in the dataset)
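Before matching, it helps to confirm that the treatment is a 0/1 indicator recorded independently of the outcome, and to build the matching formula from the covariate names. Below is a minimal base-R sketch on a toy data frame. `CommunityPolicing` is a hypothetical column name standing in for whatever your data use, and all numeric values are invented.

```r
# Made-up toy data; CommunityPolicing is a hypothetical column name
cities <- data.frame(
  CommunityPolicing   = c("yes", "no", "yes", "no", NA),
  Property_sum        = c(3100, 2800, 3500, 2600, 3000),
  Homicide_sum        = c(12, 9, 15, 7, 10),
  AggAssault_per_100k = c(250, 210, 300, 190, 230),
  Burglary_per_100k   = c(480, 450, 520, 400, 470),
  Violent_sum         = c(900, 820, 1010, 760, 880)
)

# Treatment comes from the program record, never from the outcome
cities$treatment <- as.integer(cities$CommunityPolicing == "yes")

covariates <- c("Property_sum", "Homicide_sum", "AggAssault_per_100k", "Burglary_per_100k")

# Drop rows with a missing treatment, outcome, or covariate
keep <- complete.cases(cities[c("treatment", "Violent_sum", covariates)])
cities <- cities[keep, ]

# Build "treatment ~ Property_sum + ..." from the covariate names
match_formula <- reformulate(covariates, response = "treatment")
print(match_formula)
table(cities$treatment)
```

Building the formula with `reformulate()` keeps the covariate list in one place, so adding a socioeconomic variable later means editing a single vector.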

Step 2: Perform Propensity Score Matching in R

Here’s an R script to perform the matching:

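# Load necessary libraries (install MatchIt first if it is not already available)
if (!requireNamespace("MatchIt", quietly = TRUE)) install.packages("MatchIt")
library(MatchIt)

# Load the dataset (make sure to use the correct path to your data)
data <- read.csv("path_to_your_data.csv")

# Define the treatment variable (1 for treated, 0 for control)
# For this example, we'll assume the variable "CommunityPolicing" (1/0) indicates whether the program was implemented
# Never derive treatment from the outcome (e.g., above-median Violent_sum): that builds the result into the comparison
data$treatment <- as.integer(data$CommunityPolicing == 1)

# Define the covariates to match on (ideally measured before the program started)
covariates <- c("Property_sum", "Homicide_sum", "AggAssault_per_100k", "Burglary_per_100k")

# matchit() does not accept missing values in the covariates, so keep complete cases
data <- data[complete.cases(data[, c("treatment", "Violent_sum", covariates)]), ]

# Perform 1:1 nearest-neighbor propensity score matching
# (distance = "glm" estimates the propensity score with logistic regression)
match_model <- matchit(treatment ~ Property_sum + Homicide_sum + AggAssault_per_100k + Burglary_per_100k,
                       data = data, method = "nearest", distance = "glm")

# Extract the matched dataset (adds distance, weights, and subclass columns)
matched_data <- match.data(match_model)

# Covariate balance before and after matching
summary(match_model)

# Now compare the outcome (Violent_sum) between matched treated and control cities
treated_violent <- matched_data$Violent_sum[matched_data$treatment == 1]
control_violent <- matched_data$Violent_sum[matched_data$treatment == 0]

# Perform a t-test to compare violent crime rates
t.test(treated_violent, control_violent)

# The same comparison as a regression; the weights column matters if you match with replacement
summary(lm(Violent_sum ~ treatment, data = matched_data, weights = weights))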
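With 1:1 matching without replacement, `match.data()` also returns a `subclass` column identifying each matched pair, so the comparison can use within-pair differences instead of treating the two groups as independent samples, as `t.test(treated_violent, control_violent)` does. A minimal sketch under that assumption; the data frame below is simulated to mimic the `subclass`, `treatment`, and `Violent_sum` columns and is not real MatchIt output.

```r
# Simulated stand-in for match.data() output from 1:1 matching (not real data):
# 40 pairs sharing a pair-level component, so pair members are correlated
set.seed(3)
n_pairs <- 40
pair_level <- rnorm(n_pairs, mean = 900, sd = 100)
matched_data <- data.frame(
  subclass    = factor(rep(seq_len(n_pairs), each = 2)),
  treatment   = rep(c(1, 0), times = n_pairs),
  Violent_sum = rep(pair_level, each = 2) + rnorm(2 * n_pairs, sd = 30)
)

# Within-pair difference: treated outcome minus its matched control's outcome
pair_diff <- sapply(split(matched_data, matched_data$subclass), function(p) {
  p$Violent_sum[p$treatment == 1] - p$Violent_sum[p$treatment == 0]
})

# For 1:1 pairs, the mean within-pair difference equals the difference in group means
mean(pair_diff)

# Paired analysis: a one-sample t-test on the within-pair differences
t.test(pair_diff)
```

The point estimate is the same as before; only the standard error changes, because the paired version removes variation shared within each pair.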

Step 3: Analyze the Results

In this script:

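We take the treatment variable from a column recording whether each city implemented community policing (CommunityPolicing is a placeholder name; substitute the one used in your data). The treatment must never be derived from the outcome: labeling cities with above-median violent crime as "treated" would guarantee a difference in violent crime between the groups, whatever the program actually did.
We match cities on several covariates that we believe could influence both treatment assignment and the outcome (violent crime rates), using 1:1 nearest-neighbor matching on a propensity score estimated by logistic regression. The output of summary(match_model) shows whether the covariates are balanced after matching; check it before looking at outcomes.
After matching, we compare the violent crime rates between matched treated and control cities. Because each treated city is paired with a control, this difference estimates the program’s effect on the cities that adopted it (the average treatment effect on the treated).
The t-test assesses whether the difference in violent crime rates between the treated and control cities is statistically significant.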
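Why the treatment must not be defined from the outcome can be shown in a few lines. In this simulated sketch the "program" has no effect at all by construction, yet labeling above-median cities as treated produces a large and highly significant gap, because every "treated" value lies above every "control" value.

```r
# Simulated outcome with NO program effect (not real data)
set.seed(7)
violent <- rnorm(200, mean = 400, sd = 100)

# Flawed rule: "treated" means above-median violent crime
treat_flawed <- as.integer(violent > median(violent))

gap <- mean(violent[treat_flawed == 1]) - mean(violent[treat_flawed == 0])
gap  # large and positive by construction

# A tiny p-value here is meaningless: the split itself created the difference
t.test(violent[treat_flawed == 1], violent[treat_flawed == 0])$p.value
```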

Task: Practice with Your Own Data

Now it’s your turn. Using your own data, follow these steps:

Step 1: Define Your Treatment and Control Groups
- Choose a treatment variable (e.g., did a city implement a crime-reduction program or not?).
- Identify the outcome variable (e.g., violent crime rates, property crime rates).
Step 2: Select Covariates for Matching
- Choose variables, measured before treatment, that could influence both the treatment and the outcome (e.g., property crime rates, homicide rates, socioeconomic status).
Step 3: Perform Matching
- Use the provided R code to perform propensity score matching. Match treated and untreated cities based on the chosen covariates.
Step 4: Compare Outcomes Between Matched Groups
- After matching, compare the outcomes (violent crime rates, property crime rates) between treated and untreated cities.
Step 5: Interpret the Results
- What do the results of your t-test reveal? Is there a statistically significant difference in the violent crime rates between cities that received the treatment (community policing) and those that did not?
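Before interpreting any estimate, check that treated and untreated cities overlap in their propensity scores (the common support condition); treated cities with no comparable controls cannot be matched well. A minimal base-R sketch on simulated data follows. The covariate name `baseline_violent` is hypothetical, and flagging treated cities outside the control range is one simple rule among several (MatchIt's `discard` argument offers related options).

```r
# Simulated data only (not real crime data)
set.seed(11)
n <- 300
baseline_violent <- rnorm(n, mean = 400, sd = 100)
treatment <- rbinom(n, size = 1, prob = plogis((baseline_violent - 450) / 40))

# Estimated propensity scores from logistic regression
pscore <- fitted(glm(treatment ~ baseline_violent, family = binomial()))

# Range of scores in each group
ps_treated <- pscore[treatment == 1]
ps_control <- pscore[treatment == 0]
rbind(treated = range(ps_treated), control = range(ps_control))

# Treated cities whose score falls outside the range of control scores
outside <- ps_treated > max(ps_control) | ps_treated < min(ps_control)
sum(outside)
```

If many treated cities fall outside the control range, the matched estimate rests on poor comparisons, and the t-test result should be read with that in mind.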

In Chapter 5, we explored matching as a tool for causal inference in observational studies. Matching techniques such as propensity score matching let us adjust for observed confounders and make more credible estimates of treatment effects. Through a worked example on city crime data, we showed how matching helps us compare similar units (cities) to estimate the impact of policies, like community policing, on crime reduction.

By working through the practice and task sections, you’ve applied these techniques to your own research questions and datasets. Matching is an essential tool when randomization is not possible, allowing you to make more credible causal claims from observational data, provided that the important confounders are measured and the treated and control groups overlap.
