Chapter 3. Conditional and Unconditional Parallel Trends
Story: Two Cities, Two Trajectories
Imagine two cities — Riverbend and Oakville. Both have similar economic profiles. One day, Riverbend raises its minimum wage while Oakville does not.
Over the next few years, you observe that Riverbend’s employment rates rise slightly faster than Oakville’s. You might think: "The minimum wage helped employment!"
But wait — even before the policy, Riverbend’s employment had been creeping up faster than Oakville’s. Without accounting for these underlying differences in trend, simply comparing "before and after" outcomes would be misleading.
This chapter explores how parallel trends (whether unconditional or conditional) shape our ability to make valid causal claims.
Concept: The Parallel Trends Assumption
At the heart of Difference-in-Differences (DiD) lies a critical assumption: Parallel Trends.
Unconditional Parallel Trends means that, absent treatment, the difference between the treated and control groups would have stayed constant over time — no adjustment needed.
Formally:
E[Y_t(0) - Y_{t-1}(0) \mid D = 1] = E[Y_t(0) - Y_{t-1}(0) \mid D = 0]
where Y_t(0) is the untreated potential outcome at time t and D indicates treatment.
Conditional Parallel Trends relaxes this: trends would have been parallel after adjusting for certain observed covariates (like income or education). Formally:
E[Y_t(0) - Y_{t-1}(0) \mid D = 1, X] = E[Y_t(0) - Y_{t-1}(0) \mid D = 0, X]
where X represents the conditioning covariates.
Why Does This Matter?
Without parallel trends, the simple DiD estimate could be biased:
If Riverbend was already on a faster trajectory, we'd wrongly attribute all improvement to the policy.
If differences stemmed from pre-existing factors, not treatment, the causal claim would fail.
Thus, parallel trends are about having a valid counterfactual — imagining what would have happened to Riverbend if no policy had been enacted.
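To make the bias concrete, here is a tiny numeric sketch; the employment figures are invented purely for illustration, and the point is that a naive before/after comparison absorbs any pre-existing difference in trends.
# Hypothetical employment rates (per 100 residents); numbers are illustrative only
riverbend_before, riverbend_after = 60.0, 64.0  # treated city
oakville_before, oakville_after = 60.0, 62.0    # control city

# Naive DiD: (change in treated) - (change in control)
naive_did = (riverbend_after - riverbend_before) - (oakville_after - oakville_before)
print(f"Naive DiD estimate: {naive_did:.1f}")  # 2.0

# If Riverbend was already trending 1.5 points faster per period before the policy,
# only 2.0 - 1.5 = 0.5 of this estimate is attributable to the policy itself;
# the rest reflects the violated parallel trends assumption.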
Building Causal Intuition
We note:
DiD is a causal method for observational data: treatment is not randomized, so identification rests on the treated and control groups following comparable trends.
Selection Bias (from Chapter 2) threatens DiD unless trends are truly parallel.
Ignorability (Chapter 2) underpins conditional parallel trends: after controlling for key covariates, assignment is as-good-as-random over time.
🔗 If you need a refresher on ignorability or selection bias, please revisit Chapter 2.
Deep Dive: LASSO and Normalized Differences (ND)
When parallel trends do not hold unconditionally, we must find covariates that restore them conditionally.
LASSO: Variable Selection
Why LASSO? We often have many potential predictors — demographics, economics, crime rates, etc. LASSO (Least Absolute Shrinkage and Selection Operator) automatically selects variables that best predict treatment.
It shrinks small coefficients to zero, keeping only important covariates.
This prevents overfitting and identifies a sparse, interpretable model.
In our context: We use pre-treatment data to run LASSO and find covariates that predicted which cities were treated. These covariates are critical — they likely influenced both treatment and trends.
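As a quick illustration of the shrinkage idea, the sketch below uses synthetic data (not the chapter's dataset) and fits an L1-penalized logistic regression in which only two of ten covariates truly drive treatment assignment; the penalty pushes the irrelevant coefficients to exactly zero.
# Toy LASSO-style selection on synthetic data (not the chapter's dataset)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_units, n_covs = 500, 10
X_toy = rng.normal(size=(n_units, n_covs))
# Only covariates 0 and 3 actually influence who gets treated
logits = 1.5 * X_toy[:, 0] - 2.0 * X_toy[:, 3]
treated = rng.binomial(1, 1 / (1 + np.exp(-logits)))

X_std = StandardScaler().fit_transform(X_toy)
l1_logit = LogisticRegression(penalty='l1', solver='saga', C=0.1, max_iter=5000)
l1_logit.fit(X_std, treated)

print(np.round(l1_logit.coef_.ravel(), 2))
# Most entries are exactly 0; roughly only positions 0 and 3 survive the penalty.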
Normalized Difference (ND): Checking Balance
After selecting covariates, we check whether the treated and control groups differ too much.
ND formula:
\mathrm{ND}_{\omega} = \frac{\bar{X}_{\omega,T} - \bar{X}_{\omega,C}}{\sqrt{\left( S^2_{\omega,T} + S^2_{\omega,C} \right) / 2}}
\bar{X}_{\omega,T}: mean of covariate X_\omega in the treated group
\bar{X}_{\omega,C}: mean of covariate X_\omega in the control group
S^2_{\omega,T}, S^2_{\omega,C}: sample variances of X_\omega in the treated and control groups
Threshold:
An absolute normalized difference below 0.25 (|ND| < 0.25) is usually considered acceptable balance (Imbens & Rubin, 2015).
If covariates are unbalanced, naive DiD is risky.
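For example, with made-up numbers: if a covariate has mean 52 among treated units, mean 48 among controls, and a sample variance of 100 in each group, then
\mathrm{ND} = \frac{52 - 48}{\sqrt{(100 + 100)/2}} = \frac{4}{10} = 0.40,
which exceeds 0.25, so this covariate would be flagged as imbalanced.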
Insight: Mind the Gap (and Trend)
Returning to Riverbend and Oakville:
If Riverbend’s employment was already improving faster before the minimum wage hike, unconditional DiD fails.
Conditioning on key covariates (say, % college-educated residents, median income) might restore parallel trends.
Takeaway: Without parallel trends (unconditional or conditional), DiD estimates are biased and misleading. Good causal inference depends on crafting a proper counterfactual.
And beyond research — this explains why you shouldn't believe every online claim: Just because someone says "I took Supplement X and got better" doesn’t prove causality. Without randomization (Chapter 2) or proper trend comparison (Chapter 3), we can't distinguish true effects from coincidences, confounders, or pre-existing trajectories.
Practice: Diagnosing and Conditioning for Parallel Trends
We will:
Use LASSO to select key covariates predicting treatment
Check Normalized Differences (ND) for balance
Plot pre-treatment trends to visually assess parallelism
Step-by-Step Python Code
# === Step 0: Setup ===
# Install libraries if missing
!pip install pandas numpy matplotlib scikit-learn statsmodels
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler
# Load data
data = pd.read_csv('chapter1log.csv')
# Define important variables
outcome_var = "FASSact_per_100k" # Outcome
treatment_var = "Treated" # Treatment indicator (0 or 1)
city_var = "City" # City name
time_var = "Month_Year" # Time ID (e.g., "2020-07")
# Notes:
# - Make sure your 'City' column has names like "San Jose", "Santa Barbara", etc.
# - 'Treated' column: 1 = treated city, 0 = control cities
# === Step 1: Subset Pre-Treatment Data ===
pre_treatment_data = data[data[time_var] < '2023-01']  # treatment begins 2023-01; 'YYYY-MM' strings sort chronologically
# === Step 2: LASSO Variable Selection ===
# Drop the outcome, treatment, and identifier columns; the remaining (numeric) columns are candidate covariates
X = pre_treatment_data.drop(columns=[outcome_var, treatment_var, city_var, time_var])
y = pre_treatment_data[treatment_var]
# Standardize predictors
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# LASSO with cross-validation
lasso = LogisticRegressionCV(
    Cs=10, penalty='l1', solver='saga', cv=5,
    max_iter=5000, random_state=0  # higher max_iter helps the saga solver converge
).fit(X_scaled, y)
# Identify selected variables
selected_vars = X.columns[lasso.coef_.ravel() != 0]
print(f"Selected covariates by LASSO: {list(selected_vars)}")
# === Step 3: Normalized Difference (ND) ===
# Function to calculate ND
def normalized_difference(x_treated, x_control):
    """Difference in means scaled by the pooled standard deviation of the two groups."""
    mean_diff = np.mean(x_treated) - np.mean(x_control)
    pooled_var = (np.var(x_treated, ddof=1) + np.var(x_control, ddof=1)) / 2  # sample variances
    return mean_diff / np.sqrt(pooled_var)
# Calculate ND for selected covariates
for var in selected_vars:
    nd = normalized_difference(
        pre_treatment_data[pre_treatment_data[treatment_var] == 1][var],
        pre_treatment_data[pre_treatment_data[treatment_var] == 0][var],
    )
    print(f"Normalized Difference for {var}: {round(nd, 3)}")
# Interpretation:
# If ND > 0.25 (or <-0.25), imbalance exists and matching or regression adjustment is needed.
# === Step 4: Plot Pre-Treatment Trends ===
# Aggregate outcome by city and time
agg_data = pre_treatment_data.groupby([city_var, time_var])[outcome_var].mean().reset_index()
# Plot
plt.figure(figsize=(12,6))
for city in agg_data[city_var].unique():
    subset = agg_data[agg_data[city_var] == city]
    if city == "San Jose":  # Treated city
        plt.plot(subset[time_var], subset[outcome_var], label=city, color='red', linewidth=3)
    else:
        plt.plot(subset[time_var], subset[outcome_var], label=city, color='gray', alpha=0.5)
plt.axvline(x='2023-01', color='black', linestyle='--')
plt.xlabel('Time')
plt.ylabel(outcome_var)
plt.title('Pre-Treatment Trends')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
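To complement the visual check, here is a minimal regression-based sketch; it is not part of the original recipe, but statsmodels is already installed in Step 0, and the column names are assumed to match those defined above.
# === Optional: a simple pre-trend regression check (sketch) ===
# Regress the pre-treatment outcome on a treated x time-trend interaction;
# a large 'Treated:t' coefficient suggests the treated city's pre-period trend diverges.
import statsmodels.formula.api as smf

pre = pre_treatment_data.copy()
pre["date"] = pd.to_datetime(pre[time_var])                  # "YYYY-MM" -> datetime
pre["t"] = pre["date"].dt.year * 12 + pre["date"].dt.month   # months as a running index
pre["t"] = pre["t"] - pre["t"].min()

pretrend_model = smf.ols(f"{outcome_var} ~ {treatment_var} * t", data=pre).fit()
print(pretrend_model.summary().tables[1])
# Note: these standard errors ignore serial correlation within cities, so treat this
# as a rough diagnostic alongside the plot, not a definitive test.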
Task: Worksheet — Checking Conditional Parallel Trends
🔎 You are the researcher.
Scenario:
Set the treated city to "Santa Barbara".
Set the treatment month to "2019-07".
Instructions:
Load chapter1log.csv again.
Filter the data to observations before July 2019.
Run LASSO to select predictive covariates.
Compute Normalized Differences (ND) for those covariates.
Plot pre-treatment outcome trends.
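A starter sketch for these steps, assuming the same column names as in the Practice section (adjust if your file differs):
# Worksheet starter (sketch): rerun the diagnostics with Santa Barbara treated in 2019-07
treated_city = "Santa Barbara"
treatment_start = "2019-07"

data = pd.read_csv("chapter1log.csv")
# Redefine the treatment indicator for this scenario (assumes 'City' holds plain city names)
data[treatment_var] = (data[city_var] == treated_city).astype(int)

# Keep only observations before the new treatment month
pre_sb = data[data[time_var] < treatment_start]

# From here, reuse Steps 2-4 above with pre_sb in place of pre_treatment_data:
# run LASSO, compute normalized differences, and plot the pre-treatment trends
# (highlight treated_city instead of "San Jose" in the plot).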
Questions:
Which covariates had the largest imbalance?
Were the trends visually parallel before July 2019?
Would you trust an unconditional DiD estimate in this case? Why or why not?
What You Learned
How DiD relies on parallel trends
Why unconditional trends might fail
How to use LASSO and ND to diagnose imbalance
How to visually inspect trends before treatment