Chapter 1. Counterfactuals and Potential Outcomes
Story: Imagining Alternate Realities
Imagine you are offered two job opportunities after graduating from college. One job is in New York City, bustling with opportunity but burdened by a high cost of living. The other is in a quieter town, offering a slower pace of life and a comfortable salary. You choose New York.
Fast forward one year. You wonder: "What if I had chosen the quieter town? Would I have been happier? Would I have saved more money? Would my career have advanced differently?"
You can observe only the life you lived — in New York. The life you didn't live is hidden. That invisible "what if" is called a counterfactual. In causal inference, the goal is to estimate outcomes we cannot observe. We are forever comparing reality to an imagined, unobserved alternative.
When policymakers evaluate a new program, when doctors compare treatments, or when companies assess marketing strategies, they are all, at their core, asking: What would have happened under a different choice? This chapter introduces the formal machinery that allows us to think rigorously about such unobservable comparisons.
Concept: The Potential Outcomes Framework
The Potential Outcomes Framework, introduced by Jerzy Neyman and later generalized by Donald Rubin, offers a clean and powerful way to formalize causal questions.
Suppose each individual (or unit) has two potential outcomes:
Y(1): the outcome if treated (e.g., took the job in New York)
Y(0): the outcome if not treated (e.g., took the job in the quieter town)
Each unit also has a treatment indicator:
T = 1 if the unit is treated
T = 0 if the unit is not treated
Observed Outcome: We only ever observe one of these two outcomes:
Y = T × Y(1) + (1 − T) × Y(0)
This leads to the Fundamental Problem of Causal Inference: for each individual, we can observe Y(1) or Y(0), but never both. We cannot directly observe individual-level causal effects.
Defining Causal Effects:
Individual Treatment Effect: τ_i = Y_i(1) − Y_i(0)
Average Treatment Effect (ATE): ATE = E[Y(1) − Y(0)]
The task of causal inference is to estimate causal effects — often the ATE — despite only observing one outcome per unit.
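These definitions can be made concrete with a small simulation. In simulated data (and only in simulated data) we can create both potential outcomes for every unit, compute the true ATE, and then see that the observed data reveal just one outcome per unit via Y = T × Y(1) + (1 − T) × Y(0). All numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Both potential outcomes for every unit -- possible only in a simulation;
# in real data we never observe both.
y0 = rng.normal(50, 10, n)      # outcome without treatment
y1 = y0 + rng.normal(5, 2, n)   # outcome with treatment

# Individual treatment effects and the true ATE (about 5 by construction)
tau = y1 - y0
true_ate = tau.mean()

# Random treatment assignment
t = rng.integers(0, 2, n)

# The observed outcome: Y = T * Y(1) + (1 - T) * Y(0)
y = t * y1 + (1 - t) * y0

# Under randomization, the difference in observed means estimates the ATE
naive_diff = y[t == 1].mean() - y[t == 0].mean()
print(f"True ATE: {true_ate:.2f}, Estimated ATE: {naive_diff:.2f}")
```

Because treatment was assigned at random here, the simple difference in means lands close to the true ATE; the next sections show why this breaks down without randomization.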
Key Assumptions:
Stable Unit Treatment Value Assumption (SUTVA): No interference between units, and no hidden variations of the treatment. My outcome depends only on my treatment, not on yours.
Consistency: The observed outcome corresponds to the treatment received.
These assumptions ensure that potential outcomes are well-defined and meaningful.
The Intuition: Causal inference isn't about predicting who will succeed. It's about comparing what did happen to what would have happened under different circumstances.
Insight: Why "What-If" Matters
Returning to our story, why is thinking about the "alternate reality" — the town you didn't choose — so important?
Suppose after a year in New York, you are exhausted, overworked, and broke. Does that mean the other option was better? Maybe — or maybe you would have been unhappy in the quieter town too, but for different reasons. Without considering the counterfactual, any conclusion is speculative.
This is why naive comparisons are dangerous. If we simply compare treated and untreated units (say, New York dwellers and small-town dwellers), differences in outcomes could reflect underlying differences between the groups, not the effect of the "treatment" itself.
Causal inference, built on potential outcomes, demands that we explicitly grapple with unseen possibilities. Only by imagining "what if" carefully — and estimating it systematically — can we make credible causal claims.
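The danger of naive comparisons can itself be simulated. In the hypothetical sketch below, a hidden trait ("drive") raises both the chance of choosing the treatment and the outcome itself, so the raw difference in means overstates the true effect. The variable names and numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hidden confounder: "drive" boosts both treatment take-up and the outcome
drive = rng.normal(0, 1, n)

# Units with more drive are more likely to self-select into treatment
t = (drive + rng.normal(0, 1, n) > 0).astype(int)

# The true treatment effect is exactly 2; drive also raises the outcome
y = 2 * t + 3 * drive + rng.normal(0, 1, n)

# The naive comparison mixes the treatment effect with the drive gap
naive = y[t == 1].mean() - y[t == 0].mean()
print(f"True effect: 2.00, Naive difference: {naive:.2f}")
```

The naive difference comes out far above 2 because the treated group had more drive to begin with: the comparison reflects who chose treatment, not just what treatment did.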
Good science depends on good counterfactual thinking.
Practice: Case Study — Evaluating a Job Training Program
Imagine a government launches a job training program to improve employment prospects. Our question is:
Does the program actually increase participants' earnings?
We define:
Y(1) = Earnings if enrolled in the program
Y(0) = Earnings if not enrolled
For each person, we observe two things:
T ∈ {0,1}: Whether they participated (T = 1) or not (T = 0)
Y: Their actual earnings
Challenge: We never observe both Y(1) and Y(0) for the same individual.
How Researchers Handle This
Randomized Experiments: Randomly assign people into treatment or control groups. Randomization ensures groups are comparable.
Matching: In observational studies, find "twins" for treated individuals — people who did not get treated but are similar on key characteristics (age, education, prior income).
The key idea: we try to reconstruct the missing counterfactual as credibly as possible.
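The matching idea can be sketched with a toy example. Below, each treated person is paired with the untreated person closest in years of education (a hypothetical covariate invented for this illustration), and the effect is estimated as the average within-pair earnings difference.

```python
import pandas as pd

# Toy observational data (hypothetical numbers); 'educ' is a confounder
df = pd.DataFrame({
    'Treated':  [1, 1, 1, 0, 0, 0, 0],
    'educ':     [16, 12, 14, 16, 12, 14, 10],
    'Earnings': [62000, 50000, 56000, 55000, 44000, 49000, 40000]
})

treated = df[df['Treated'] == 1]
control = df[df['Treated'] == 0]

# Nearest-neighbor matching: pair each treated unit with the control unit
# whose education level is closest, then average the within-pair differences
diffs = []
for _, row in treated.iterrows():
    match = control.iloc[(control['educ'] - row['educ']).abs().argmin()]
    diffs.append(row['Earnings'] - match['Earnings'])

effect = sum(diffs) / len(diffs)
print(f"Matched estimate of the treatment effect: ${effect:.2f}")
```

Matching only removes bias from the covariates we match on; confounders we cannot observe remain a threat, which is why randomization is the gold standard.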
Walkthrough with Python Code
We are given the following small dataset:
| ID | Treated (T) | Earnings ($) |
|----|-------------|--------------|
| 1  | 1           | 55,000       |
| 2  | 0           | 45,000       |
| 3  | 1           | 60,000       |
| 4  | 0           | 48,000       |
| 5  | 1           | 62,000       |
| 6  | 0           | 46,000       |
Our goal: Estimate the Average Treatment Effect (ATE).
Step 1: Set up the data
```python
import pandas as pd

# Create the dataset
data = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'Treated': [1, 0, 1, 0, 1, 0],
    'Earnings': [55000, 45000, 60000, 48000, 62000, 46000]
})
print(data)
```
Step 2: Calculate mean earnings
```python
# Mean earnings for treated and untreated groups
treated_mean = data[data['Treated'] == 1]['Earnings'].mean()
control_mean = data[data['Treated'] == 0]['Earnings'].mean()

print(f"Mean earnings (treated): ${treated_mean:.2f}")
print(f"Mean earnings (control): ${control_mean:.2f}")
```
Step 3: Calculate ATE
```python
# Average Treatment Effect (ATE): difference in group means
ate = treated_mean - control_mean
print(f"Estimated ATE: ${ate:.2f}")
```
Expected Output:
```
Mean earnings (treated): $59000.00
Mean earnings (control): $46333.33
Estimated ATE: $12666.67
```
Interpretation: On average, individuals who went through the training program earned about $12,667 more than those who did not.
Quick Discussion Points
Exercise 1: Why might this simple difference be biased?
Because treated individuals might differ systematically from untreated ones (e.g., they might be younger, more motivated, or more educated).
Exercise 2: Suppose treated individuals had, on average, higher education levels. How would that affect your interpretation?
The estimated ATE could be biased upward. Part of the earnings difference might be due to education, not the training itself.
Key Lesson: To estimate causal effects credibly, we must control for confounders or use random assignment.
Task: Worksheet — Estimating Treatment Effects Yourself
Now it's your turn.
Here is a new dataset:
| ID | Treated (T) | Earnings ($) |
|----|-------------|--------------|
| 1  | 1           | 52,000       |
| 2  | 0           | 47,000       |
| 3  | 1           | 58,000       |
| 4  | 0           | 49,000       |
| 5  | 0           | 46,000       |
| 6  | 1           | 61,000       |
Instructions:
Calculate the mean earnings for treated and control groups.
Estimate the Average Treatment Effect (ATE).
Reflect: If you learned that treated individuals had, on average, more work experience, how would that affect your interpretation?
(Bonus) Write the Python code to solve this yourself!
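If you want to check your answers, one possible solution for the bonus follows the same steps as the walkthrough above.

```python
import pandas as pd

# Worksheet dataset
data = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'Treated': [1, 0, 1, 0, 0, 1],
    'Earnings': [52000, 47000, 58000, 49000, 46000, 61000]
})

# Group means and their difference (the naive ATE estimate)
treated_mean = data[data['Treated'] == 1]['Earnings'].mean()
control_mean = data[data['Treated'] == 0]['Earnings'].mean()
ate = treated_mean - control_mean

print(f"Mean earnings (treated): ${treated_mean:.2f}")   # $57000.00
print(f"Mean earnings (control): ${control_mean:.2f}")   # $47333.33
print(f"Estimated ATE: ${ate:.2f}")                      # $9666.67
```

For the reflection question: if treated individuals had more work experience on average, part of this $9,666.67 gap would reflect experience rather than the program, biasing the estimate upward.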
What You Learned
Potential outcomes: Y(1) and Y(0) define causal effects.
The fundamental problem of causal inference: You can’t observe both Y(1) and Y(0) for the same person.
Estimating ATE: You compare treated vs. untreated means, but must watch out for confounding (more on that later).
Causal inference requires careful design: Randomization or matching is critical to make fair comparisons.
Causality is about asking the right "what-if" questions—and learning to answer them carefully.