Chapter 2. Randomization and Selection Bias
Story: The Lucky Draw (Why Randomize?)
Imagine a new scholarship program at a university. Hundreds of students apply, but only a few scholarships are available. To decide fairly, the university holds a lottery: names are drawn randomly from the applicant pool. Some students win the scholarship (the treated group), others do not (the control group).
Months later, researchers observe that students who received the scholarship performed better academically than those who did not. Can we confidently say the scholarship caused better academic outcomes?
Because the recipients were chosen randomly, the answer is "Yes." Randomization ensures that, on average, the treated and control groups were similar at baseline — in motivation, background, ability. Thus, differences in outcomes can credibly be attributed to the treatment itself.
Contrast this with a different scenario: suppose the university awarded scholarships based on GPA. Higher-achieving students would be more likely to get scholarships, making it unclear whether better outcomes were due to the scholarship or to pre-existing differences. This is selection bias in action.
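To see this numerically, here is a small simulation of the GPA-based scenario (all numbers are hypothetical): GPA drives both who gets the scholarship and later test scores, and the naive comparison of means drastically overstates the scholarship's true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# GPA drives both selection and outcomes (a confounder)
gpa = rng.normal(3.0, 0.4, n)

# Scholarship awarded to the top 20% of GPAs (non-random assignment)
scholarship = (gpa > np.quantile(gpa, 0.8)).astype(int)

# The true scholarship effect is +2 points; GPA independently boosts scores
score = 70 + 10 * gpa + 2 * scholarship + rng.normal(0, 5, n)

naive_diff = score[scholarship == 1].mean() - score[scholarship == 0].mean()
print(f"Naive difference in means: {naive_diff:.1f}")  # far above the true +2
```

The naive difference comes out around 9 points, even though the scholarship itself only adds 2: the other ~7 points reflect the pre-existing GPA gap between the groups.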
Concept: Why Experiments Are the Gold Standard
At the core of causal inference is a challenge: how do we know whether the treatment, and not some confounding factor, caused the observed difference in outcomes?
Randomization solves this by breaking the link between treatment assignment and potential outcomes.
Formally, randomization implies:
T ⊥ (Y(0), Y(1))
That is, treatment assignment T is independent of both potential outcomes Y(0) (no treatment) and Y(1) (treatment).
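A quick sanity check of this independence (a sketch with made-up numbers): if the potential outcomes exist before assignment and T is a pure coin flip, then Y(0) should look the same in both arms.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Potential outcomes exist before assignment
Y0 = rng.normal(140, 10, n)   # outcome without treatment
Y1 = Y0 - 10                  # outcome with treatment

# Random assignment: a coin flip that ignores Y0 and Y1
T = rng.binomial(1, 0.5, n)

# Independence in action: Y0 has (nearly) the same mean in both arms
print(f"mean Y(0) | T=1: {Y0[T == 1].mean():.1f}")
print(f"mean Y(0) | T=0: {Y0[T == 0].mean():.1f}")
```

Both means come out around 140: knowing a person's treatment status tells us nothing about what their untreated outcome would have been.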
Definitions
Selection Bias: Systematic differences between treated and control groups unrelated to treatment.
Exchangeability: Treated and untreated groups are comparable in distribution.
Ignorability (Unconfoundedness): Given covariates X, treatment assignment is independent of potential outcomes.
If randomization holds, groups are exchangeable and we can simply compare outcomes to estimate causal effects.
Ignorable Treatment Assumption
When randomization is not possible, we need the Ignorable Treatment Assumption:
(Y(0), Y(1)) ⊥ T ∣ X
This means that after controlling for covariates X, treatment assignment is "as if random." Later chapters will introduce methods (like matching and propensity scores) that try to recover ignorability.
But for now, understand:
Randomization ensures comparability by design.
Selection bias arises when treatment is related to potential outcomes.
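The following sketch (with a hypothetical binary covariate and made-up effect sizes) illustrates what ignorability buys us: treatment is more likely when X = 1, so the naive comparison is biased, but within each stratum of X assignment is random, and comparing within strata recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# A binary covariate, e.g., high vs. low baseline severity (hypothetical)
X = rng.binomial(1, 0.5, n)

# Treatment is more likely when X = 1, but random within each stratum
T = rng.binomial(1, np.where(X == 1, 0.8, 0.2))

# X shifts the outcome upward; the true treatment effect is -5
Y = 100 + 20 * X - 5 * T + rng.normal(0, 5, n)

naive = Y[T == 1].mean() - Y[T == 0].mean()
# Within-stratum differences, averaged (X is 50/50, so equal weights
# match the population weights here)
stratified = np.mean([
    Y[(T == 1) & (X == x)].mean() - Y[(T == 0) & (X == x)].mean()
    for x in (0, 1)
])
print(f"Naive estimate: {naive:.1f}")        # biased (wrong sign!)
print(f"Stratified estimate: {stratified:.1f}")  # close to the true -5
```

The naive comparison comes out positive (around +7) because the treated group contains more X = 1 individuals, whose outcomes are higher to begin with; conditioning on X recovers the true effect of about -5.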
Insight: Seeing Through Bias
Returning to our story, we now have the language to explain what happened. Randomization created two groups — scholarship recipients and non-recipients — that were, on average, identical in all respects except for receiving the scholarship. As a result, any differences in future performance could credibly be attributed to the program itself. The random lottery protected us from the bias that might occur if, say, more motivated students were more likely to apply or be selected.
Randomization works because it severs the link between treatment assignment and potential outcomes: T ⊥ (Y(0), Y(1)). In randomized experiments, the treated and untreated groups are exchangeable — any difference we observe later is not due to pre-existing differences but to the treatment itself.
But without randomization, as in observational studies or natural settings, selection bias is always lurking. If we simply compare treated and untreated groups without ensuring comparability, we risk confusing association for causation. The group that received the treatment might have differed in important ways — healthier, wealthier, more educated, or more motivated — and these factors, not the treatment itself, could explain differences in outcomes.
This is why causal inference is hard — and why randomization is so celebrated when possible.
A Real-World Warning: Don't Believe Everything You Hear Online
This principle isn’t just academic — it shapes how we should think about real-world claims, especially around health, wellness, and miracle cures.
How many times have you heard someone say something like:
"My aunt drank blueberry smoothies every day and cured her cancer!" "I started taking supplement X and my chronic pain disappeared!"
These testimonials feel powerful, but without randomization (or at least careful observational study designs), they tell us very little about causality. Maybe the blueberry smoothies helped. But maybe the cancer would have gone into remission anyway. Maybe the pain relief was due to another treatment, or a placebo effect, or natural recovery. Without controlling for confounding factors — like age, prior health, genetics, access to other treatments — we can’t disentangle true effects from coincidence.
In causal inference, we always ask: "Compared to what?" Compared to what would have happened without the supplement, smoothie, or therapy? If we can't credibly reconstruct that counterfactual — ideally through randomization or a robust design — we should be skeptical of strong causal claims.
This isn't to dismiss personal experiences. People's stories matter. But scientifically, evidence for causal claims demands comparability — and randomization remains the gold standard for achieving it.
Whenever you hear an impressive claim about a cure, policy, or product, pause and ask: Was there randomization? If not, what might explain the effect besides the intervention itself?
This small habit of critical thinking is the essence of causal inference — and it will serve you far beyond research: in policy, business, healthcare, and everyday decision-making.
Practice: Walkthrough of a Simple Randomized Controlled Trial (RCT)
Suppose we are tasked with evaluating a new drug designed to lower blood pressure. Let's simulate and analyze an RCT.
Step 1: Simulate Participants
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Set seed for reproducibility
np.random.seed(42)
# Generate 200 participants
n = 200
# Baseline characteristics (e.g., age)
age = np.random.normal(50, 10, n)
# Randomly assign treatment
T = np.random.binomial(1, 0.5, n)
# Potential outcomes
Y0 = 140 + np.random.normal(0, 10, n) # untreated blood pressure
Y1 = Y0 - 10 + np.random.normal(0, 5, n) # treatment lowers BP by ~10 units
# Observed outcome
Y_obs = np.where(T == 1, Y1, Y0)
# Assemble dataset
data = pd.DataFrame({
    'Age': age,
    'Treatment': T,
    'BloodPressure': Y_obs
})
print(data.head())
Step 2: Analyze Outcomes
# Mean BP by group
mean_treated = data[data['Treatment'] == 1]['BloodPressure'].mean()
mean_control = data[data['Treatment'] == 0]['BloodPressure'].mean()
ate = mean_treated - mean_control
print(f"Average Blood Pressure (Treated): {mean_treated:.2f}")
print(f"Average Blood Pressure (Control): {mean_control:.2f}")
print(f"Estimated Treatment Effect (ATE): {ate:.2f}")
You should find that the treated group's blood pressure is about 10 units lower — matching the designed effect.
Step 3: Visualize Results
# Boxplot
plt.figure(figsize=(8,5))
data.boxplot(column='BloodPressure', by='Treatment')
plt.title('Blood Pressure by Treatment Group')
plt.suptitle('')
plt.xlabel('Treatment (0=Control, 1=Treated)')
plt.ylabel('Blood Pressure')
plt.show()
Key Observations
Treatment was randomly assigned.
Groups are comparable.
A simple comparison of means yields an unbiased estimate of the treatment effect.
Task: Worksheet — Designing Your Own RCT
Now it's your turn.
Suppose you are evaluating a new study technique aimed at improving students' test scores.
Instructions:
Simulate 150 students.
Randomly assign half to use the new technique (treatment) and half to study as usual (control).
Assume without the new method, average test score = 75.
Assume the new method improves scores by 5 points.
Add random noise (e.g., standard deviation = 8).
Steps:
Simulate potential outcomes Y(0) and Y(1).
Generate observed outcomes Y based on treatment.
Estimate and report the Average Treatment Effect (ATE).
Plot the test scores by treatment group.
Hint: Follow the structure shown in the Practice section!
Reflection Questions:
How does randomization ensure comparability?
What would go wrong if you had allowed students to choose whether to use the new method?
What You Learned
Randomization breaks the link between treatment and outcomes, ensuring groups are comparable.
Selection bias can severely mislead causal inference when treatment assignment is non-random.
RCTs provide the gold standard for causal claims because they create exchangeable groups.
When randomization is not possible, assumptions like ignorability are necessary — and methods like matching or stratification can help recover causal effects (later chapters).
"Randomization turns the invisible (counterfactual) into something we can credibly estimate."