Chapter 3. Conditional and Unconditional Parallel Trends

Story: Two Cities, Two Trajectories

Imagine two cities — Riverbend and Oakville. Both have similar economic profiles. One day, Riverbend raises its minimum wage while Oakville does not.

Over the next few years, you observe that Riverbend’s employment rates rise slightly faster than Oakville’s. You might think: "The minimum wage helped employment!"

But wait — even before the policy, Riverbend’s employment had been creeping up faster than Oakville’s. Without accounting for these underlying differences in trend, simply comparing "before and after" outcomes would be misleading.

This chapter explores how parallel trends (whether unconditional or conditional) shape our ability to make valid causal claims.


At the heart of Difference-in-Differences (DiD) lies a critical assumption: Parallel Trends.

  • Unconditional Parallel Trends means that, absent treatment, the difference between the treated and control groups would have stayed constant over time, with no covariate adjustment needed.

    Formally:

$$E[Y_{post}(0) - Y_{pre}(0) \mid \text{Treated}] = E[Y_{post}(0) - Y_{pre}(0) \mid \text{Control}]$$
  • Conditional Parallel Trends relaxes this: trends would have been parallel after adjusting for certain observed covariates (like income, education, etc.). Formally:

$$E[Y_{post}(0) - Y_{pre}(0) \mid X, \text{Treated}] = E[Y_{post}(0) - Y_{pre}(0) \mid X, \text{Control}]$$

where X represents conditioning covariates.


Why Does This Matter?

Without parallel trends, the simple DiD estimate could be biased:

  • If Riverbend was already on a faster trajectory, we'd wrongly attribute all improvement to the policy.

  • If differences stemmed from pre-existing factors, not treatment, the causal claim would fail.

Thus, parallel trends are about having a valid counterfactual — imagining what would have happened to Riverbend if no policy had been enacted.
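
To make the bias concrete, here is a minimal sketch with made-up numbers. The employment rates, the trends, and the `did_estimate` helper are illustrative assumptions, not data from this chapter. Riverbend is given a faster underlying trend than Oakville, so the simple DiD estimate picks up the trend gap on top of the true effect.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Simple 2x2 DiD: change in the treated group minus change in the control group."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical employment rates (percent); every number here is invented.
true_effect = 1.0
riverbend_trend = 2.0   # change Riverbend would have seen without the policy
oakville_trend = 1.0    # change Oakville sees (it is never treated)

riverbend_pre, oakville_pre = 60.0, 60.0
riverbend_post = riverbend_pre + riverbend_trend + true_effect
oakville_post = oakville_pre + oakville_trend

estimate = did_estimate(riverbend_pre, riverbend_post, oakville_pre, oakville_post)
bias = estimate - true_effect

print(f"DiD estimate: {estimate}, true effect: {true_effect}, bias: {bias}")
```

In this toy setup the bias equals the gap between the two untreated trends, which is exactly the quantity the parallel trends assumption sets to zero.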


Building Causal Intuition

We note:

  • DiD is a causal method for observational data. It does not require that treatment be randomized, only that the trends of the treated and control groups be comparable.

  • Selection Bias (from Chapter 2) threatens DiD unless trends are truly parallel.

  • Ignorability (Chapter 2) underpins conditional parallel trends: after controlling for key covariates, assignment is as-good-as-random over time.

🔗 If you need a refresher on ignorability or selection bias, please revisit Chapter 2.


Deep Dive: LASSO and Normalized Differences (ND)

When parallel trends do not hold unconditionally, we must find covariates that restore them conditionally.

LASSO: Variable Selection

Why LASSO? We often have many potential predictors — demographics, economics, crime rates, etc. LASSO (Least Absolute Shrinkage and Selection Operator) automatically selects variables that best predict treatment.

  • It shrinks small coefficients to zero, keeping only important covariates.

  • This prevents overfitting and identifies a sparse, interpretable model.

In our context: We use pre-treatment data to run LASSO and find covariates that predicted which cities were treated. These covariates are critical — they likely influenced both treatment and trends.
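
The selection step can be sketched as follows. This is a simplified illustration, not the chapter's own code: it fits a linear LASSO to a 0/1 treatment indicator by coordinate descent in plain Python, on a tiny hand-built dataset. The covariate names (`median_income`, `rainfall`), the data, the penalty `lam=0.1`, and the helper `lasso_select` are all assumptions made for the example. In practice a library implementation (for instance scikit-learn's L1-penalized models) would likely be used instead.

```python
from statistics import mean, pstdev

def standardize(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; return exactly 0 when |rho| <= lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_select(covariates, treated, lam, n_iter=50):
    """Coordinate-descent LASSO of a 0/1 treatment indicator on standardized covariates.

    Minimizes (1/2n) * sum((y - Z b)^2) + lam * sum(|b|).
    """
    n = len(treated)
    y_bar = mean(treated)
    y = [t - y_bar for t in treated]
    Z = {name: standardize(col) for name, col in covariates.items()}
    beta = {name: 0.0 for name in Z}
    for _ in range(n_iter):
        for j in Z:
            # Residual with covariate j's own contribution left out.
            partial = [
                y[i] - sum(beta[k] * Z[k][i] for k in Z if k != j)
                for i in range(n)
            ]
            rho = sum(Z[j][i] * partial[i] for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam)  # mean(z^2) == 1 after standardizing
    return beta

# Hypothetical pre-treatment city data (invented): 4 treated cities, then 4 controls.
treated = [1, 1, 1, 1, 0, 0, 0, 0]
covariates = {
    "median_income": [30.0, 40.0, 30.0, 40.0, 10.0, 20.0, 10.0, 20.0],
    "rainfall": [30.0, 20.0, 20.0, 30.0, 30.0, 20.0, 20.0, 30.0],
}

beta = lasso_select(covariates, treated, lam=0.1)
selected = [name for name, b in beta.items() if b != 0.0]
print(beta, selected)
```

Here `rainfall` was constructed to be unrelated to treatment, so its coefficient is set exactly to zero, while `median_income` survives the penalty. The surviving covariates are the ones to carry forward into the balance check.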


Normalized Difference (ND): Checking Balance

After selecting covariates, we check whether the treated and control groups differ too much.

ND formula:

$$\text{Norm. Diff}_{\omega} = \frac{\bar{X}_{\omega,T} - \bar{X}_{\omega,C}}{\sqrt{(S^2_{\omega,T} + S^2_{\omega,C})/2}}$$

  • $\bar{X}_{\omega,T}$: Mean of covariate X in the treated group

  • $\bar{X}_{\omega,C}$: Mean of covariate X in the control group

  • $S^2_{\omega,T}$, $S^2_{\omega,C}$: Variances of covariate X in the treated and control groups


Threshold:

  • |ND| < 0.25 is usually considered acceptable balance (Imbens & Rubin, 2015).

If covariates are unbalanced, naive DiD is risky.
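
A minimal sketch of the ND calculation for a single covariate, using only the Python standard library. The function name `normalized_difference` and the sample values are assumptions for illustration, and the sample variance (n - 1 denominator) is used for S^2.

```python
from math import sqrt
from statistics import mean, variance

def normalized_difference(treated_values, control_values):
    """Difference in means scaled by the root of the average sample variance."""
    pooled_sd = sqrt((variance(treated_values) + variance(control_values)) / 2)
    return (mean(treated_values) - mean(control_values)) / pooled_sd

# Hypothetical covariate values (e.g. % college-educated), invented for illustration.
college_treated = [32.0, 35.0, 38.0]
college_control = [29.0, 32.0, 35.0]

nd = normalized_difference(college_treated, college_control)
balanced = abs(nd) < 0.25
print(f"ND = {nd:.2f}, balanced: {balanced}")
```

With these made-up values the group means differ by one pooled standard deviation, far above the 0.25 rule of thumb, so this covariate would be flagged as unbalanced.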


Insight: Mind the Gap (and Trend)

Returning to Riverbend and Oakville:

  • If Riverbend’s employment was already improving faster before the minimum wage hike, unconditional DiD fails.

  • Conditioning on key covariates (say, % college-educated residents, median income) might restore parallel trends.
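
The following sketch illustrates the second point with invented numbers. Cities are split into two strata by a hypothetical covariate (share of college-educated residents); within each stratum, treated and control cities share the same underlying trend, but the treated group is concentrated in the faster-growing stratum. The cell layout, the names, and the weighting by the treated group's stratum shares are assumptions chosen for the example.

```python
# Each record: (treated?, stratum, number of cities, pre mean, post mean).
# All numbers are invented; the true policy effect is built in as 1.0.
cells = [
    (True,  "high_college", 3, 60.0, 63.0),  # trend 2.0 + effect 1.0
    (True,  "low_college",  1, 50.0, 52.0),  # trend 1.0 + effect 1.0
    (False, "high_college", 1, 58.0, 60.0),  # trend 2.0
    (False, "low_college",  3, 48.0, 49.0),  # trend 1.0
]

def mean_change(rows):
    """City-weighted average of (post - pre) over the given cells."""
    total_n = sum(n for _, _, n, _, _ in rows)
    return sum(n * (post - pre) for _, _, n, pre, post in rows) / total_n

treated_rows = [c for c in cells if c[0]]
control_rows = [c for c in cells if not c[0]]

# Unconditional DiD ignores the strata entirely.
unconditional = mean_change(treated_rows) - mean_change(control_rows)

# Conditional DiD: compute DiD within each stratum, then average using
# the treated group's share of cities in each stratum.
n_treated = sum(c[2] for c in treated_rows)
conditional = 0.0
for stratum in ("high_college", "low_college"):
    t = [c for c in treated_rows if c[1] == stratum]
    k = [c for c in control_rows if c[1] == stratum]
    weight = sum(c[2] for c in t) / n_treated
    conditional += weight * (mean_change(t) - mean_change(k))

print(unconditional, conditional)
```

In this constructed case the unconditional estimate overstates the effect, while the stratum-by-stratum estimate recovers the built-in effect of 1.0, because trends are parallel only within strata.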

Takeaway: Without parallel trends (unconditional or conditional), DiD estimates are biased and misleading. Good causal inference depends on crafting a proper counterfactual.

And beyond research — this explains why you shouldn't believe every online claim: Just because someone says "I took Supplement X and got better" doesn’t prove causality. Without randomization (Chapter 2) or proper trend comparison (Chapter 3), we can't distinguish true effects from coincidences, confounders, or pre-existing trajectories.


We will:

  • Use LASSO to select key covariates predicting treatment

  • Check Normalized Differences (ND) for balance

  • Plot Pre-Treatment Trends to visually assess parallelism


Step-by-Step Python Code


🔎 You are the researcher.

Scenario:

  • Set the treated city to "Santa Barbara".

  • Set the treatment month to "2019-07".

Instructions:

  1. Load chapter1log.csv again.

  2. Filter data before July 2019.

  3. Run LASSO to select predictive covariates.

  4. Compute Normalized Differences (ND) for those covariates.

  5. Plot pre-treatment outcome trends.
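
As a starting point for steps 2 and 5, the sketch below filters to pre-treatment months and computes the monthly gap between the treated city and the average of the other cities, which is the quantity a pre-trend plot lets you eyeball. The column names `city`, `month`, and `outcome` are assumptions, since the layout of chapter1log.csv is not shown here; adjust them to the real file. The in-memory sample data is invented so the example runs on its own.

```python
import csv
import io
from collections import defaultdict

TREATED_CITY = "Santa Barbara"
TREATMENT_MONTH = "2019-07"

def pre_treatment_gaps(csv_file, treated_city=TREATED_CITY,
                       treatment_month=TREATMENT_MONTH):
    """Return {month: treated mean - control mean} for months before treatment."""
    sums = defaultdict(lambda: [0.0, 0])  # (month, is_treated) -> [total, count]
    for row in csv.DictReader(csv_file):
        if row["month"] >= treatment_month:  # "YYYY-MM" strings sort chronologically
            continue
        key = (row["month"], row["city"] == treated_city)
        sums[key][0] += float(row["outcome"])
        sums[key][1] += 1
    gaps = {}
    for month in sorted({m for m, _ in sums}):
        if (month, True) in sums and (month, False) in sums:
            t_total, t_n = sums[(month, True)]
            c_total, c_n = sums[(month, False)]
            gaps[month] = t_total / t_n - c_total / c_n
    return gaps

# Invented sample rows standing in for the real file.
sample = io.StringIO(
    "city,month,outcome\n"
    "Santa Barbara,2019-04,10\n"
    "Santa Barbara,2019-05,12\n"
    "Santa Barbara,2019-06,14\n"
    "Santa Barbara,2019-07,30\n"
    "Ventura,2019-04,8\n"
    "Ventura,2019-05,9\n"
    "Ventura,2019-06,10\n"
    "Ventura,2019-07,11\n"
    "Goleta,2019-04,6\n"
    "Goleta,2019-05,7\n"
    "Goleta,2019-06,8\n"
)

gaps = pre_treatment_gaps(sample)
for month, gap in gaps.items():
    print(month, gap)
```

To run this on the real data, pass an open file instead of the sample, for example `with open("chapter1log.csv", newline="") as f: gaps = pre_treatment_gaps(f)`. A roughly constant gap across months is consistent with parallel pre-trends; a steadily widening or narrowing gap, as in the invented sample, is a warning sign.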

Questions:

  • Which covariates had the largest imbalance?

  • Were the trends visually parallel before July 2019?

  • Would you trust an unconditional DiD estimate in this case? Why or why not?


What You Learned

  • How DiD relies on parallel trends

  • Why unconditional trends might fail

  • How to use LASSO and ND to diagnose imbalance

  • How to visually inspect trends before treatment
