Preface

In a world overflowing with data, it has never been more important to distinguish correlation from causation. Causal inference gives us the tools to ask “What would happen if we did X?” rather than just “What tends to happen with X?” Most studies in the health, social, and behavioral sciences aim to answer causal questions rather than associative ones (An Introduction to Causal Inference - PMC). Yet for many newcomers (and even experienced analysts), the journey into causal inference can feel intimidating. I wrote A Guide to Causal Inference to make this journey accessible and engaging, while preserving the technical rigor that the subject demands.

Motivation and Audience: As an applied researcher, I often found myself translating between two worlds—the intuitive, story-driven understanding of cause and effect, and the formal, mathematical frameworks developed by statisticians and data scientists. Bridging this gap is challenging. Textbooks often lean heavily on math, leaving practitioners thinking causal inference is beyond their reach. On the other hand, popular science books tell great stories but sometimes shy away from the equations that give precision to those stories. This guide is my attempt to strike a balance. It is written for a broad audience: from undergraduates and curious beginners with minimal math background, to seasoned practitioners and researchers looking to solidify their understanding. Whether you are a policy analyst evaluating interventions, a business analyst exploring A/B test results, or a student encountering causal inference for the first time, this book aims to meet you where you are. You’ll find intuitive explanations and engaging examples if you’re just starting out, with optional deep dives into formulas and derivations for those craving a bit more depth.

What Makes This Book Different: A Guide to Causal Inference is structured unlike a traditional textbook. Each chapter follows a five-part learning model — Story, Concept, Insight, Practice, and Task — to cater to different learning styles and ensure both intuition and rigor:

  • Story: Each chapter opens with a real-world story or anecdote. This isn’t just a gimmick; the story sets the stage for the chapter’s central idea in a relatable way. By seeing causal concepts play out in an everyday scenario or a historical case, readers can develop an intuitive feel for the problem before any equations appear. For example, a chapter might begin with the tale of a new education program in two schools or a policy change in one state but not another. These narratives are engaging and accessible, drawing you into a mystery: What caused the observed change? How could we know?

  • Concept: After the story, the chapter shifts to the theoretical core. This is where we define terms and introduce the formal machinery of causal inference, including some equations (don’t worry, we step through them slowly). This section provides the technical rigor. If the Story poses a question, the Concept section gives us the language and tools to answer it. We’ll introduce frameworks like the potential outcomes model and counterfactuals, discuss why randomization helps us infer causation, derive the Difference-in-Differences estimator, draw directed acyclic graphs (DAGs) to visualize assumptions, and more. Key assumptions and limitations are highlighted, for instance the fundamental problem of causal inference: we can never observe both potential outcomes for the same unit (Potential Outcomes Framework for Causal Inference: Conceptual Foundations of Causal Inference Cheatsheet | Codecademy). The goal is clarity: even complex ideas (like conditional parallel trends or regression discontinuity) are broken down step by step. Equations are included to be illustrative, not intimidating; they are accompanied by plain-language interpretations.

  • Insight: Knowing the theory isn’t enough; we need to connect it back to the intuition. In the Insight section, we bridge the gap between the abstract concepts and the opening story. This is the reflective, “aha!” portion of each chapter. We revisit the story’s scenario and examine it through the lens of the new concepts, extracting lessons and clarifying any counterintuitive points. If you’ve ever thought “Okay, I see the formula, but what does it mean in our real example?”, the Insight section is for you. It solidifies understanding by translating formulas back into real-life implications. By the end of it, the characters or scenarios from the story have “learned their lesson,” and so have we: our intuition and formal understanding come into alignment.

  • Practice: To add further depth, each chapter includes a Practice section where I share examples from my own applied research (or illustrative case studies from the literature) that apply the chapter’s concepts. These are essentially “war stories” or case examples demonstrating how causal inference methods are used in practice, complete with nuances and pitfalls. This might be a brief description of how I used matching to control for confounders in an epidemiological study, or how Difference-in-Differences was used to evaluate a new policy’s effect on employment in my research. These sections show the messiness of real data and how the idealized concepts get implemented: you’ll see the thought process of setting up a causal analysis, the choice of method, and interpretation of results in context. (These applied examples also emphasize that causal inference is truly interdisciplinary—spanning economics, medicine, psychology, policy, etc.—and show how the same principles adapt to different fields.)

  • Task: Finally, each chapter ends with a hands-on Task, essentially a worksheet or exercise set. Here, you get to be the investigator. The tasks are designed with real or simulated data and guided questions to walk you through an analysis. You might be given a small dataset and asked to estimate a treatment effect, check an assumption, or identify confounders. These exercises solidify the material through practice. They are also useful for instructors who adopt this book in a course: the Tasks can serve as homework or lab assignments. Solutions or hints (not included here) will help self-learners check their understanding. By actively engaging with the data and questions, you transition from just reading about causal inference to doing causal inference.
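To give a flavor of what working with simulated data might look like, here is a minimal Python sketch. It is not taken from the book's exercises: the education-program setup, the `motivation` confounder, the effect size of 2.0, and all function names are assumptions made purely for illustration. Because the data are simulated, both potential outcomes are visible for every unit (something real data never allows), so we can see how a naive comparison goes wrong when units select themselves into treatment and how random assignment recovers the true effect.

```python
import random
import statistics


def simulate_units(n=20000, true_effect=2.0, seed=0):
    """Simulate both potential outcomes for each unit (hypothetical data)."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        motivation = rng.gauss(0, 1)                 # hypothetical confounder
        y0 = 50 + 5 * motivation + rng.gauss(0, 1)   # outcome without the program
        y1 = y0 + true_effect                        # outcome with the program
        units.append((motivation, y0, y1))
    return units


def difference_in_means(units, treated_flags):
    """Compare observed outcomes: y1 for treated units, y0 for untreated units."""
    treated = [y1 for (_, _, y1), t in zip(units, treated_flags) if t]
    control = [y0 for (_, y0, _), t in zip(units, treated_flags) if not t]
    return statistics.fmean(treated) - statistics.fmean(control)


units = simulate_units()

# Scenario 1: motivated units opt in, so treatment is confounded with motivation.
self_selected = [motivation > 0 for motivation, _, _ in units]
naive_estimate = difference_in_means(units, self_selected)

# Scenario 2: a coin flip decides treatment, independent of motivation.
coin = random.Random(1)
randomized = [coin.random() < 0.5 for _ in units]
randomized_estimate = difference_in_means(units, randomized)

print(f"True effect:          {2.0:.2f}")
print(f"Self-selected groups: {naive_estimate:.2f}")
print(f"Randomized groups:    {randomized_estimate:.2f}")
```

In this sketch the self-selected comparison mixes the program's effect with the pre-existing difference in motivation, so it lands far above the true effect, while the randomized comparison should sit close to it. The exact numbers depend on the seeds and sample size chosen here.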

I believe this structure (Story → Concept → Insight → Practice → Task) creates a kind of learning cycle. First, it piques curiosity and grounds our thinking (Story). Next, it builds knowledge and formal understanding (Concept). Then it fosters intuition and reflection (Insight). After that, it demonstrates application and relevance (Practice). And finally, it reinforces learning by doing (Task). This cycle repeats and builds as the book progresses.

How to Use This Book: Different readers may interact with this book in different ways—and that’s by design. If you are completely new or just skimming, you might read all the Stories and Insights first, to get a big-picture sense of the concepts in plain language. If you are using the book as a primary textbook for a course or self-study, you might go chapter by chapter, doing the Concepts and Tasks diligently in order. More advanced readers might jump straight to specific chapters that interest them (for example, you might skip to the chapter on Directed Acyclic Graphs or Synthetic Controls if that’s most relevant to your work, and that should be fine—the chapters are structured to be mostly self-contained). However, there is a cumulative logic: early chapters cover fundamentals (like counterfactuals and randomization) that later chapters build on (like difference-in-differences and beyond). So if you find something in a later chapter unfamiliar, flipping back to earlier chapters or the glossary will help.

Throughout the book, you will notice an emphasis on assumptions and study design. All causal claims rest on assumptions—some testable, many not. Rather than hide these, we discuss them openly. The languages of causal inference (potential outcomes notation, DAGs, structural equations) are introduced to help you formulate and scrutinize these assumptions (An Introduction to Causal Inference - PMC). Causal inference has undergone significant evolution in recent decades, with what Judea Pearl called “paradigmatic shifts” in how we approach data. Methods that once belonged to advanced graduate econometrics or biostatistics courses are now becoming part of the standard toolkit in data science. This book tries to keep pace with those developments. In the final chapter, we even look at cutting-edge methods from 2019 onward, giving you a snapshot of where the field is headed (because the causal inference world in 2025 is certainly not the same as it was in 2005).

A note on mathematical level: I’ve tried to keep the mathematics as simple as possible, but no simpler. When equations are presented, they are there to clarify relationships unambiguously, not to scare or overwhelm. If you find an equation daunting, the surrounding text will usually explain it in words. Don’t be discouraged if you skip some of the math on first read—you can absolutely grasp the main ideas without deriving every formula. Over time, you may find that the equations start to make sense as succinct summaries of the concepts.
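As one illustration of how an equation and its plain-language reading can sit side by side, consider the standard two-group, two-period Difference-in-Differences estimator. The notation below is an assumption made for this sketch and may differ from the notation used in later chapters: $\bar{Y}$ denotes a group's average outcome, $T$ and $C$ the treated and comparison groups, and "pre" and "post" the periods before and after the intervention.

```latex
\hat{\tau}_{\text{DiD}}
  = \left( \bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}} \right)
  - \left( \bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}} \right)
```

In words: take the change over time in the treated group and subtract the change over time in the comparison group. Whatever change remains is attributed to the treatment, provided the two groups would have followed parallel trends had the treatment never happened.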

Emerging Tools and Resources: Causal inference is a fast-moving field. New techniques and software libraries are emerging that make sophisticated methods easier to implement (we’ll touch on some of these in Chapter 9). I encourage you to explore the references provided. Each chapter includes pointers to key papers (for those who want to delve deeper into theory) and links to code or libraries (for those who want to implement the methods). The combination of conceptual understanding and practical tools will empower you to not only understand others’ analyses but also conduct your own. By the end of this book, terms like “selection on observables,” “back-door criterion,” “instrumental variable,” “parallel trends,” “synthetic control,” and “regression discontinuity” will be part of your vocabulary, and more importantly, you’ll know when and how to use these tools in real-world problems.

In summary, my motivation in writing this guide is deeply personal: I wished for a resource like this when I was learning these concepts (a one-stop-shop curriculum!). I hope the mix of storytelling and rigor in this book makes causal inference approachable without diluting its power. Ultimately, understanding causality is crucial if we want to make informed decisions, be it in policy, business, medicine, or daily life. My aspiration is that this book serves as a friendly companion in your causal inference journey, one you can return to when you need to recall a concept or find inspiration in an example. I invite you to read actively: engage with the stories, work through the exercises, question the assumptions, and apply the ideas to problems you care about. Causal inference, at its heart, is detective work for data. Together, let’s unravel some mysteries and learn how to trust our conclusions about cause and effect.

Thank you for joining me on this adventure in learning “not just what happened, but why.” Now, let’s dive into the contents of the book and see what lies ahead.

Disclaimer: I do not pretend to know it all or be remotely as smart as other brilliant leading figures in the field who have invented methods and compiled procedures. My sole purpose is to propagate
