Causal reasoning for social scientists

What Would Have Happened Otherwise?

Javier Garcia-Bernardo, ODISSEI Social Data Science (SoDa)

Starting question

This pattern holds across thousands of children.

Is it causal?

The claim as a DAG (Directed Acyclic Graph)

One explanation for Anna’s €14k advantage:

If that arrow is real, neighborhoods are a policy lever.

But an arrow is a hypothesis. What else could draw the same €14k gap?

Why this might be wrong: selection

  • Anna’s parents could afford the rich neighborhood.
  • They also paid directly for tutors, networks, and university.
  • The €14k gap could exist even if neighborhoods did nothing.

The fundamental problem of causal inference

What we want to know: What would Anna have earned if she had grown up in Daniel’s neighborhood?

We see Anna in one life. The other, the counterfactual, is missing.

Potential outcomes: the binary case

Half the table is missing. That missing half is the whole problem.

Child | Observed nbhd | \(Y^{\text{rich}}\) | \(Y^{\text{poor}}\)
Anna | Rich nbhd | €42k | ?
Bo | Rich nbhd | €39k | ?
Carlos | Poor nbhd | ? | €40k
Daniel | Poor nbhd | ? | €28k
Eva | Poor nbhd | ? | €30k
Finn | Rich nbhd | €28k | ?
Grace | Rich nbhd | €39k | ?
Hassan | Poor nbhd | ? | €26k

  • \(Y^{\text{rich}}\) and \(Y^{\text{poor}}\) are each child’s two potential outcomes.
  • Anna’s individual causal effect is \(Y^{\text{rich}} - Y^{\text{poor}}\). We will never see it.
  • The whole job of design is to recover the missing column.
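A minimal Python sketch of the table above, using only the standard library. The incomes (in €k) are copied from the table; the tuple layout and variable names are illustrative assumptions, not part of the original slides. It shows that the data support a comparison of group means but never a within-child difference.

```python
from statistics import mean

# Incomes in €k copied from the table; None marks the unobserved potential outcome.
# The (y_rich, y_poor) tuple layout is an illustrative choice.
potential_outcomes = {
    "Anna":   (42, None),
    "Bo":     (39, None),
    "Carlos": (None, 40),
    "Daniel": (None, 28),
    "Eva":    (None, 30),
    "Finn":   (28, None),
    "Grace":  (39, None),
    "Hassan": (None, 26),
}

# An individual effect needs both columns for the same child, so none can be computed.
individual_effects = {
    child: y_rich - y_poor
    for child, (y_rich, y_poor) in potential_outcomes.items()
    if y_rich is not None and y_poor is not None
}

# What the data do allow: a comparison of observed group means.
observed_rich = [y_rich for y_rich, _ in potential_outcomes.values() if y_rich is not None]
observed_poor = [y_poor for _, y_poor in potential_outcomes.values() if y_poor is not None]
naive_gap = mean(observed_rich) - mean(observed_poor)

print(f"computable individual effects: {len(individual_effects)}")
print(f"observed gap in group means (€k): {naive_gap}")
```

The gap in observed means compares different children with each other. It is not the average of any child's own effect.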

Two causal languages

Goal: estimate how an intervention (treatment, explanatory variable, independent variable, or predictor) affects an outcome (response or dependent variable).

Two formalisations describe the same missing-counterfactual story.


Pearl, structural causal models

Causal claims represented as DAGs.

Identification = closing back-door paths.

Neyman–Rubin, potential outcomes

A causal effect is the gap between two potential outcomes of the same unit. Averaged over units, it is \(\mathbb{E}[Y^{\text{rich}} - Y^{\text{poor}}]\).

Identification = recovering the (average) missing information.

Unit | X | Y(0) | Y(1)
Anna | 1 | ? | €42k
Bo | 1 | ? | €39k
Daniel | 0 | €28k | ?
Eva | 0 | €30k | ?

Estimand, estimator, estimate

Estimand is the target (the dish you want to make). Estimator is the method (the recipe). Estimate is the result (the plate you get).

Confusing them is the most common mistake in applied work.

Term | What it means | Neighborhood example
Estimand | The causal quantity we want | \(\mathbb{E}[Y^{\text{rich}} - Y^{\text{poor}}] = \mathbb{E}[Y^{\text{rich}}] - \mathbb{E}[Y^{\text{poor}}]\)
Estimator | The recipe we use | regression, matching, DiD, IV
Estimate | The number we get | “€6k higher earnings”

The estimand comes from your goal and the counterfactual question, not from the method.

Different estimators can target the same estimand, under their assumptions.

If those assumptions fail, you still get an estimate. Just not the one you ordered.

Why can’t we just compare?

\[ \underbrace{\mathbb{E}[Y^{\text{rich}}] - \mathbb{E}[Y^{\text{poor}}]}_{\text{what we want}} \;\;\neq\;\; \underbrace{\mathbb{E}[Y \mid \text{rich nbhd}] - \mathbb{E}[Y \mid \text{poor nbhd}]}_{\text{observable}} \]

The fundamental problem: Causal inference requires averaging over a column that does not exist. To recover Anna’s missing counterfactual, we need a design and assumptions.
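One standard way to see why the two sides differ is to add and subtract the unobserved average \(\mathbb{E}[Y^{\text{poor}} \mid \text{rich nbhd}]\). This decomposition is an addition to the slides. It assumes consistency: each child's observed income equals the potential outcome for the neighborhood they actually grew up in.

```latex
\[ \mathbb{E}[Y \mid \text{rich nbhd}] - \mathbb{E}[Y \mid \text{poor nbhd}]
   = \underbrace{\mathbb{E}[Y^{\text{rich}} - Y^{\text{poor}} \mid \text{rich nbhd}]}_{\text{effect on children in rich nbhds}}
   + \underbrace{\mathbb{E}[Y^{\text{poor}} \mid \text{rich nbhd}] - \mathbb{E}[Y^{\text{poor}} \mid \text{poor nbhd}]}_{\text{selection bias}} \]
```

The selection term is zero only if children from rich and poor neighborhoods would have earned the same, on average, had they all grown up in poor neighborhoods. Even then, the comparison recovers the effect for children in rich neighborhoods, which equals \(\mathbb{E}[Y^{\text{rich}} - Y^{\text{poor}}]\) only if effects do not differ systematically between the two groups.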

Questions to ask in a causal study

  1. What is the treatment? Growing up in a rich neighborhood
  2. What is the estimand? The impact on future income of growing up in a rich neighborhood compared to a poor neighborhood
  3. What is the comparison? Naive comparisons don’t work. We need a design, and assumptions!
  4. What assumption makes it credible?
  5. Why might that assumption fail?

The temple of causal inference

Core assumptions. Be prepared to defend them

Design: regression, what most papers do

Regression with controls, lm(y ~ x + controls), is not outside causal inference. It just hands you the hardest assumption to defend.

\[ Y_i = \alpha + \color{#c44e52}{\tau}\,\color{#0b789d}{T_i} + \beta' \color{#6f7681}{X_i} + \varepsilon_i \]

  • \(Y_i\) is the outcome: future income for child \(i\)
  • \(\color{#0b789d}{T_i}\) is the treatment: whether child \(i\) grew up in a rich neighborhood
  • \(\color{#c44e52}{\tau}\) is the effect we want: how much the neighborhood shifts income
  • \(\color{#6f7681}{X_i}\) are the controls: the confounders we adjust for, like family wealth


  • Comparison: untreated units made comparable to the treated ones by conditioning on the controls \(X\)
  • Key assumption: once we control for \(X\), no important unmeasured confounders remain. This breaks under omitted variables, bad controls, weak overlap, or the wrong functional form.

Regression as imputation: control the confounder

Model → impute → compare (G-computation, Robins). Unbiased only if the controls close every back-door path, the model is correctly specified, and we do not control for bad variables like colliders or mediators.

\(\widehat{\text{income}} \;=\; \underbrace{\text{€}28\text{k}}_{\hat\alpha} \;+\; \underbrace{\text{€}0\text{k}}_{\hat\beta}\cdot\text{rich} \;+\; \underbrace{\text{€}12\text{k}}_{\hat\gamma}\cdot\text{wealth}\)
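The model → impute → compare steps can be sketched in a few lines of standard-library Python. The incomes come from the potential-outcomes table. The 0/1 `wealthy` flag is a hypothetical assignment chosen for illustration, since the slides do not list family wealth per child, and the small `ols` helper is an assumption of this sketch rather than the method used to produce the slide.

```python
# (name, rich_nbhd, wealthy, income in €k). Incomes are from the table.
# ASSUMPTION: the wealthy flag is hypothetical; the slides do not report it per child.
children = [
    ("Anna",   1, 1, 42),
    ("Bo",     1, 1, 39),
    ("Carlos", 0, 1, 40),
    ("Daniel", 0, 0, 28),
    ("Eva",    0, 0, 30),
    ("Finn",   1, 0, 28),
    ("Grace",  1, 1, 39),
    ("Hassan", 0, 0, 26),
]

def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, solved by Gauss-Jordan."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# 1. Model: income ~ intercept + rich + wealthy.
X = [[1.0, rich, wealthy] for _, rich, wealthy, _ in children]
y = [income for _, _, _, income in children]
alpha, beta, gamma = ols(X, y)

# 2. Impute: predict both potential outcomes for every child.
def predict(rich, wealthy):
    return alpha + beta * rich + gamma * wealthy

y_rich_hat = [predict(1, wealthy) for _, _, wealthy, _ in children]
y_poor_hat = [predict(0, wealthy) for _, _, wealthy, _ in children]

# 3. Compare: average the imputed individual differences.
ate_hat = sum(a - b for a, b in zip(y_rich_hat, y_poor_hat)) / len(children)

print(f"alpha={alpha:.1f}, beta={beta:.1f}, gamma={gamma:.1f}, ATE estimate={ate_hat:.1f}")
```

With this hypothetical wealth flag the fit matches the coefficients in the equation above. In a linear model without interactions the imputed average effect is simply the neighborhood coefficient; the impute step matters once the model includes interactions or nonlinear terms.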

…and what happens when you omit it

Drop family wealth from the model and the groups are not comparable. Rich-neighborhood kids get compared to poor-neighborhood kids who are mostly less wealthy. The neighborhood coefficient absorbs this difference.

\(\widehat{\text{income}} \;=\; \underbrace{\text{€}31\text{k}}_{\hat\alpha} \;+\; \underbrace{\text{€}6\text{k}}_{\hat\beta}\cdot\text{rich}\)

You still get a number, just not the one you want.
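The jump from €0k to €6k follows the textbook omitted-variable-bias identity for least squares, sketched here as an addition to the slides. The symbol \(\hat\delta\) is introduced for illustration: it is the coefficient from regressing wealth on the neighborhood indicator, that is, how much wealthier rich-neighborhood children are on average.

```latex
\[ \hat\beta_{\text{short}} = \hat\beta_{\text{long}} + \hat\gamma\,\hat\delta
   \quad\Longrightarrow\quad
   \text{€}6\text{k} = \text{€}0\text{k} + \text{€}12\text{k}\cdot\hat\delta
   \quad\Longrightarrow\quad
   \hat\delta = 0.5 \]
```

So the two fitted equations agree with each other if the wealth variable is, on average, 0.5 units higher in rich neighborhoods. All of the €6k is wealth speaking through the neighborhood coefficient.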

Bad controls: Three patterns that break regression

The model still runs. The numbers are still numbers. They just don’t mean what you think.

Tip. Predicting future outcomes rules out reverse causality and reduces the risk of collider bias.

The ideal: a randomized experiment

Regression breaks easily because confounders are everywhere: schools, parental attitudes, pollution, etc. We can never be sure we are measuring them all.

The ideal: randomize neighborhoods. Randomization makes the groups exchangeable, so every confounder, measured or not, cancels out.

The observed difference in average income between people growing up in rich vs poor neighborhoods now estimates the average causal effect.
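A small simulation can make the contrast concrete. Everything in it is hypothetical: a true neighborhood effect of €2k, a wealth premium of €12k, and selection probabilities of 0.75 and 0.25 for wealthy and non-wealthy families. It is a sketch of the logic, not a model of any real data.

```python
import random
from statistics import fmean

random.seed(0)

TRUE_EFFECT = 2.0      # hypothetical causal effect of a rich neighborhood, €k
WEALTH_PREMIUM = 12.0  # hypothetical direct effect of family wealth, €k

def simulate(n, randomized):
    """Return (rich_nbhd, observed_income) pairs for n children."""
    rows = []
    for _ in range(n):
        wealthy = random.random() < 0.5
        if randomized:
            # Coin flip: neighborhood is unrelated to wealth.
            rich = random.random() < 0.5
        else:
            # Selection: wealthy families are more likely to live in rich neighborhoods.
            rich = random.random() < (0.75 if wealthy else 0.25)
        y_poor = 28 + WEALTH_PREMIUM * wealthy + random.gauss(0, 5)
        y_rich = y_poor + TRUE_EFFECT
        rows.append((rich, y_rich if rich else y_poor))
    return rows

def naive_gap(rows):
    """Difference in mean observed income, rich minus poor neighborhoods."""
    return fmean(y for rich, y in rows if rich) - fmean(y for rich, y in rows if not rich)

gap_selected = naive_gap(simulate(50_000, randomized=False))
gap_randomized = naive_gap(simulate(50_000, randomized=True))

print(f"gap under selection:     {gap_selected:.2f}")
print(f"gap under randomization: {gap_randomized:.2f}")
```

Under selection the gap mixes the neighborhood effect with the wealth premium. Under randomization the same simple comparison lands close to the true effect, without wealth ever being measured.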

Moving to Opportunity: the benchmark experiment

In the 1990s, the U.S. Department of Housing and Urban Development (HUD) randomly assigned 4,604 families in high-poverty public housing to receive different housing-voucher offers or no voucher offer.

Simplified design

  • Experimental voucher: must move to a census tract with <10% poverty
  • Control: no voucher

One potential estimand

  • ITT (Intention-to-Treat) = effect of being offered the voucher regardless of whether the family actually moved.

Chetty, Hendren & Katz (2016), Figure 2A. Y-axis: experimental-vs-control ITT on adult income ($). X-axis: child’s age at random assignment. Younger movers benefit; the effect vanishes for kids assigned at older ages.

Questions to ask in a causal study: MTO

  1. Treatment? Voucher offer to move to a lower-poverty neighborhood
  2. Estimand? Effect of being offered a voucher young on adult earnings
  3. Comparison? Voucher lottery winners vs losers
  4. Credible? Random assignment balances groups; winning is unrelated to the child’s age (10 vs 14)
  5. Fails? The results hold for a specific population (families in high-poverty public housing), and the ITT is not the key estimand.

Note that the estimand differs from what we actually want to know: the effect of moving to a lower-poverty neighborhood. The paper uses instrumental variables for it.

Designs: a menu

Designs are different ways of rebuilding the missing counterfactual. The source of variation, not the method, is what makes a design credible.

The core move: isolate exogenous variation in X

Every design looks for variation in treatment X that is plausibly exogenous with respect to Y’s potential outcomes, variation that doesn’t come from the same forces driving Y.

Design: Instrumental variables (IV), the shifter

An instrument is a source of variation that shifts whether someone receives the treatment, but affects the outcome only through that treatment. The MTO voucher is an instrument: it does not directly raise your future income. The estimand is different: we estimate the effect of moving for families whose move was caused by the voucher, not the effect of merely being offered a voucher.

The red dashed arrow is the exclusion restriction: Z must affect Y only through X.

  • Source of variation: an exogenous shifter \(Z\) that moves treatment but not the outcome directly
  • Comparison: units whose treatment differs because of the shifter
  • Assumption: the shifter is relevant (a real first stage) and excluded (no direct path to \(Y\)). It breaks when the exclusion restriction fails or the first stage is weak.
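The step from "offered a voucher" to "moved because of the voucher" can be sketched with the Wald ratio: the ITT divided by the first stage. The simulation below is hypothetical throughout: `motivation` stands in for an unobserved confounder, only motivated families use the voucher, nobody moves without an offer, and the effect of moving is set to €3k.

```python
import random
from statistics import fmean

random.seed(1)

EFFECT_OF_MOVING = 3.0  # hypothetical causal effect of moving, €k

def simulate_family():
    motivation = random.gauss(0, 1)     # unobserved confounder
    offered = random.random() < 0.5     # Z: randomized voucher offer
    moved = offered and motivation > 0  # X: only motivated families use the offer
    income = 28 + 4 * motivation + EFFECT_OF_MOVING * moved + random.gauss(0, 3)
    return offered, moved, income

Z, X, Y = 0, 1, 2  # column positions in each simulated row
families = [simulate_family() for _ in range(100_000)]

def gap(rows, group, value):
    """Mean of column `value` where `group` is true, minus the mean where it is false."""
    yes = fmean(r[value] for r in rows if r[group])
    no = fmean(r[value] for r in rows if not r[group])
    return yes - no

itt = gap(families, Z, Y)          # effect of the offer on income
first_stage = gap(families, Z, X)  # effect of the offer on moving (relevance)
wald = itt / first_stage           # effect of moving for compliers
naive = gap(families, X, Y)        # movers vs non-movers, confounded by motivation

print(f"ITT={itt:.2f}, first stage={first_stage:.2f}, Wald={wald:.2f}, naive={naive:.2f}")
```

The naive comparison of movers and non-movers is inflated by motivation. The Wald ratio recovers the effect of moving only because the offer is randomized and, by construction here, affects income through moving alone.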

Design: Regression discontinuity (RDD), the breaker

A regression discontinuity design uses a rule with a sharp cutoff to create a near-experiment (a school-district boundary, an income threshold for a housing voucher, a test-score line). Units just above and just below the cutoff are assumed to be similar, but only one side receives the treatment.

  • Source of variation: a sharp threshold or cutoff in a running variable
  • Comparison: units just above vs just below the cutoff
  • Assumption: continuity. Units can’t precisely sort around the cutoff, so the two sides are otherwise comparable. It breaks under manipulation or sorting at the cutoff, and the estimated effect is only local to the threshold.

Lecture 2 returns to this as a spatial boundary design (school districts, administrative borders).
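A minimal sketch of a sharp RDD on simulated data: fit a separate line on each side of the cutoff within a bandwidth, then compare the two predictions at the cutoff. The cutoff, the jump of 3, the bandwidth, and the data-generating process are all hypothetical choices for illustration.

```python
import random
from statistics import fmean

random.seed(2)

CUTOFF, JUMP, BANDWIDTH = 0.0, 3.0, 0.25  # all hypothetical

def simulate_unit():
    running = random.uniform(-1, 1)
    treated = running >= CUTOFF  # sharp rule: treatment is fully determined by the cutoff
    outcome = 30 + 5 * running + JUMP * treated + random.gauss(0, 1)
    return running, outcome

units = [simulate_unit() for _ in range(20_000)]

def prediction_at_cutoff(points):
    """Fit a least-squares line and return its predicted outcome at the cutoff."""
    xs = [x - CUTOFF for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = fmean(xs), fmean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return y_bar - slope * x_bar

# Keep only units close to the cutoff, split by side.
below = [(x, y) for x, y in units if CUTOFF - BANDWIDTH <= x < CUTOFF]
above = [(x, y) for x, y in units if CUTOFF <= x <= CUTOFF + BANDWIDTH]

rdd_estimate = prediction_at_cutoff(above) - prediction_at_cutoff(below)
print(f"estimated jump at the cutoff: {rdd_estimate:.2f}")
```

The bandwidth is a trade-off: a narrow window keeps the two sides comparable but leaves fewer units, and the estimate still speaks only about units near the threshold.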

Design: Difference-in-differences (DiD), exogenous timing

A difference-in-differences design compares how outcomes change over time in treated neighborhoods versus similar untreated neighborhoods. Example: A city builds a new metro line in 2018. The neighborhoods it touches see better child outcomes by 2024. The neighborhoods it doesn’t touch see no change.

  • Source of variation: exogenous timing of a shock across groups
  • Comparison: the change in the treated vs the change in the untreated. By taking the difference, time-invariant unmeasured confounders are canceled out.
  • Assumption: parallel trends, meaning the groups would have moved together without the shock. It breaks under diverging pre-trends, anticipation, compositional change, or spillovers.

DiD is closely related to fixed-effects (within) models: both exploit within-unit variation over time.
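The two-by-two arithmetic behind the metro example, as a sketch. Every number below is hypothetical and in arbitrary outcome units; here the untouched neighborhoods are allowed to drift by 2 so that the second difference has something to remove.

```python
# Mean child outcomes by group and year. ASSUMPTION: all values are made up.
means = {
    ("metro", 2018): 30.0,
    ("metro", 2024): 35.0,
    ("no_metro", 2018): 32.0,
    ("no_metro", 2024): 34.0,
}

# First difference: change over time within each group.
change_treated = means[("metro", 2024)] - means[("metro", 2018)]
change_untreated = means[("no_metro", 2024)] - means[("no_metro", 2018)]

# Second difference: the treated change beyond the untreated change.
did_estimate = change_treated - change_untreated

# Two tempting alternatives that DiD improves on.
before_after = change_treated  # ignores the common time trend
post_gap = means[("metro", 2024)] - means[("no_metro", 2024)]  # ignores baseline differences

print(f"DiD={did_estimate}, before-after={before_after}, post-period gap={post_gap}")
```

The untreated change stands in for the treated group's missing trend, which is exactly what the parallel-trends assumption licenses.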

Design: within-unit variation (fixed effects)

A fixed-effects design compares units to themselves, or to very similar units within the same group. It asks whether outcomes change when exposure changes within the same unit (e.g. family, school, neighborhood, or cohort). This removes stable background differences, so identification comes from the remaining within-unit variation, such as siblings moving at different ages or cohorts facing different peer mixes.

  • Source of variation: random variability within the same unit once stable characteristics are absorbed by fixed effects
  • Comparison: the same unit observed at different doses, ages, or peer compositions
  • Assumption: after comparing within the same family/school, the remaining exposure differences are not caused by hidden factors that also affect outcomes.
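A sketch of the within transformation on simulated sibling data. The setup is hypothetical: each family has a stable unobserved effect that raises both exposure and income, the true slope is 1, and exposure is on a standardized scale. Demeaning within families removes the stable part.

```python
import random
from statistics import fmean

random.seed(3)

TRUE_SLOPE = 1.0  # hypothetical effect of one unit of exposure on income, €k

rows = []  # (family_id, exposure, income)
for family_id in range(5_000):
    family_effect = random.gauss(0, 4)  # stable and unobserved
    for _ in range(2):                  # two siblings per family
        exposure = 0.5 * family_effect + random.gauss(0, 1)
        income = 30 + TRUE_SLOPE * exposure + family_effect + random.gauss(0, 1)
        rows.append((family_id, exposure, income))

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    return (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )

# Pooled comparison across all children: confounded by the family effect.
pooled = slope([x for _, x, _ in rows], [y for _, _, y in rows])

# Within transformation: subtract each family's own mean from exposure and income.
by_family = {}
for family_id, x, y in rows:
    by_family.setdefault(family_id, []).append((x, y))

x_within, y_within = [], []
for siblings in by_family.values():
    x_bar = fmean(x for x, _ in siblings)
    y_bar = fmean(y for _, y in siblings)
    for x, y in siblings:
        x_within.append(x - x_bar)
        y_within.append(y - y_bar)

within = slope(x_within, y_within)
print(f"pooled slope={pooled:.2f}, within-family slope={within:.2f}")
```

The within estimate is clean here only because the simulated sibling differences in exposure are pure noise. If something that varies between siblings drove both exposure and income, demeaning would not remove it.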

Source: observed similarity

When the world hasn’t run an experiment, and there is no shock and no shifter, what’s left? Find units that look the same on what we measured.

Two designs share this source:

  • Matching / IPW make treated and untreated groups look similar on measured X: matching by pairing, IPW by reweighting.
  • Regression with controls does the same thing implicitly, with stronger assumptions about functional form.
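Reweighting can be sketched on the eight children from the potential-outcomes table. The incomes are from that table. The 0/1 `wealthy` flag is a hypothetical covariate added for illustration, and the propensity score is simply the share of children in rich neighborhoods within each wealth group.

```python
# (name, rich_nbhd, wealthy, income in €k). Incomes are from the table.
# ASSUMPTION: the wealthy flag is hypothetical; the slides do not report it per child.
children = [
    ("Anna",   1, 1, 42),
    ("Bo",     1, 1, 39),
    ("Carlos", 0, 1, 40),
    ("Daniel", 0, 0, 28),
    ("Eva",    0, 0, 30),
    ("Finn",   1, 0, 28),
    ("Grace",  1, 1, 39),
    ("Hassan", 0, 0, 26),
]

def treated_share(wealthy_flag):
    """Estimated propensity score: share in a rich neighborhood within a wealth group."""
    group = [rich for _, rich, wealthy, _ in children if wealthy == wealthy_flag]
    return sum(group) / len(group)

propensity = {flag: treated_share(flag) for flag in (0, 1)}

def weighted_mean(pairs):
    """Weighted mean of (weight, value) pairs, normalized by the sum of weights."""
    total_weight = sum(w for w, _ in pairs)
    return sum(w * y for w, y in pairs) / total_weight

# Treated children get weight 1/e(X); untreated children get weight 1/(1 - e(X)).
treated = [
    (1 / propensity[wealthy], income)
    for _, rich, wealthy, income in children if rich == 1
]
untreated = [
    (1 / (1 - propensity[wealthy]), income)
    for _, rich, wealthy, income in children if rich == 0
]

ipw_estimate = weighted_mean(treated) - weighted_mean(untreated)
print(f"propensity by wealth: {propensity}")
print(f"IPW estimate (€k): {ipw_estimate:.2f}")
```

Children who are rare in their group, such as a non-wealthy child in a rich neighborhood, get large weights, so each weighted group resembles the full sample on wealth. This helps only for covariates that were measured.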

Design: matching, find an untreated twin

Matching tries to find an untreated twin for every treated unit.

Among the poor-neighborhood children, find the ones who look most like Anna, same family income, same parents’ education, and compare.

  • Comparison: untreated children who look like the treated ones on observables
  • Assumption: selection on observables, no important unmeasured confounders remain. It breaks when the variables that matter most go unobserved.
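A minimal sketch of exact matching on the eight children from the potential-outcomes table. The incomes are from that table. The 0/1 `wealthy` flag is a hypothetical stand-in for covariates such as family income and parents' education, which the slides do not list per child.

```python
from statistics import fmean

# (name, rich_nbhd, wealthy, income in €k). Incomes are from the table.
# ASSUMPTION: the wealthy flag is hypothetical; the slides do not report it per child.
children = [
    ("Anna",   1, 1, 42),
    ("Bo",     1, 1, 39),
    ("Carlos", 0, 1, 40),
    ("Daniel", 0, 0, 28),
    ("Eva",    0, 0, 30),
    ("Finn",   1, 0, 28),
    ("Grace",  1, 1, 39),
    ("Hassan", 0, 0, 26),
]

treated = [c for c in children if c[1] == 1]
untreated = [c for c in children if c[1] == 0]

def matched_comparison(wealthy):
    """Mean income of the untreated children with the same wealth flag (the 'twins')."""
    twins = [income for _, _, w, income in untreated if w == wealthy]
    return fmean(twins)

# For each treated child: own income minus the mean income of their untreated twins.
gaps = {
    name: income - matched_comparison(wealthy)
    for name, _, wealthy, income in treated
}

# Average over treated children: the effect on the treated, under selection on observables.
att_hat = fmean(gaps.values())

print(gaps)
print(f"matched estimate for the treated (€k): {att_hat:.2f}")
```

With this hypothetical flag, Anna's only untreated twin is Carlos, so her comparison rests on a single child. Thin matches like this are what weak overlap looks like in practice.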

Designs change the target population

A common misread is that all designs estimate the same estimand. They don’t. The method picks the population.

  • RCT / matching: the experimental or overlap sample
  • DiD: the treated group’s missing trend
  • IV: the compliers, units whose treatment changes when the instrument moves
  • RDD: units local to the cutoff

No identification strategy is causal by default

Design | …only as good as
RCT | randomization actually held
IV | relevance AND exclusion
DiD | the parallel trends bet
RDD | no sorting at the cutoff (continuity)
Fixed effects | no time-varying confounders
Matching | the observed covariates
Regression | the controls + functional form

Every design is a way of approximating the missing counterfactual.

“If the estimates you get are not the estimates you want, the fault lies in the econometrician, not the econometrics.” (Angrist & Pischke)

Binary and continuous treatment

  • Binary: one missing counterfactual per child. Estimand is a single number.
  • Continuous: many missing counterfactuals. Estimand is a dose-response function.

Jigsaw: apply one identification design

Goal: practice choosing a credible comparison.

Teams: each team gets one identification strategy/design: Matching/IPW · Fixed effects · DiD · IV · RDD · Experiment

Step 1: Design it (10 min). Pick a causal question and propose a design. Answer the 5 questions:

  1. What is the treatment?
  2. What is the outcome?
  3. What is the comparison group/counterfactual?
  4. What is the source of identifying variation?
  5. What assumption makes the comparison credible?

Step 2: Defend it (3 min per team). Explain your design. Other teams challenge the assumption.

Step 3: Recap within groups (5 min). What made each design credible? What could break it?

A preview of the spatial problem: funding people in neighborhoods

You want to study the causal impact of giving money to some people in some postcodes. Anna lives in a treated postcode; Daniel lives just across the street in an untreated one.

Space stresses all assumptions:

  • No interference: Anna can change Daniel’s outcome through shared classmates, friends, shops, and public spaces. Daniel is untreated, but not necessarily unaffected.
  • Positivity / overlap: comparable treated and untreated POSTCODE6 areas may not exist. If funding is targeted only to the poorest areas, then some places have no realistic untreated counterpart, and affluent places have no realistic treated counterpart.
  • Consistency / well-defined treatment: “treated postcode” is likely not one sharp dose. It depends on how many people were treated in each postcode.
  • Exchangeability: adjacent postcodes share schools, labour markets, police, housing… there are likely unmeasured confounders impacting both the treatment and the outcome.

If your question is policy impact

If your treatment is a policy intervention, the setup is often:

  • binary treatment: exposed versus not exposed
  • not randomized: adoption depends on institutions, timing, politics, or place

The SoDa materials at causalpolicy.nl are a strong next step for exactly that setting:

  • a one-day workshop on causal effects of policy interventions
  • designed for people already comfortable with regression models
  • focused on impact-assessment designs when randomized experiments are usually not feasible
  • covering methods such as pre-post, DiD, interrupted time series, RDD, and synthetic control

So this lecture gives the general counterfactual backbone. causalpolicy.nl then picks up the common policy-evaluation case in more depth.