Spillovers and confounders everywhere
John Snow, On the Mode of Communication of Cholera (1855). Each black mark is a cholera death; deaths cluster around the Broad Street pump (the pumps are the “PUMP” labels at the street corners). Public domain.
The same five questions you will ask in the practical, applied to 1854.
| # | Question | Snow’s answer |
|---|---|---|
| 1 | What is the treatment? | Which company piped your water |
| 2 | What is the estimand? | Effect of a contaminated supply on the risk of cholera death |
| 3 | What is the comparison? | Neighbouring houses on the same streets served by different companies |
| 4 | What makes it credible? | Which pipe reached a house was fixed years earlier, as good as random given the household → exchangeability |
| 5 | Why might it fail? | If households chose their supplier, or the two service areas differed in wealth or sanitation → confounding returns |
Space puts the most pressure on exchangeability and no interference.
A “control” cell next door is then already partly treated, so the comparison is contaminated.
Two cells with different exposure are almost never comparable on everything else.
These factors cluster across the map: a beautiful spatial pattern can hide several causes at once.
Modifiable Areal Unit Problem (MAUP): change the scale of analysis and the answer can change too.
It stresses consistency (the label “livestock cell” means different things at 80 m vs 1 km) and positivity (the treated/control overlap shifts with the unit).
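A minimal sketch of the MAUP, using four invented cells (not the practical’s data): the same values give a different regression slope depending on how the cells are merged into zones. The helper names `ols_slope` and `aggregate` are hypothetical.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def aggregate(values, zones):
    """Average fine cells into zones; each zone is a list of cell indices."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]


# Four fine cells with toy numbers (assumed for illustration only)
x = [1, 3, 5, 7]  # exposure
y = [3, 1, 7, 5]  # outcome

fine = ols_slope(x, y)

zoning_a = [[0, 1], [2, 3]]  # merge adjacent pairs
zoning_b = [[0, 2], [1, 3]]  # merge alternating cells

slope_a = ols_slope(aggregate(x, zoning_a), aggregate(y, zoning_a))
slope_b = ols_slope(aggregate(x, zoning_b), aggregate(y, zoning_b))

print(fine, slope_a, slope_b)
```

Here the fine-scale slope is 0.6, while the two zonings give 1.0 and −1.0: the unit of analysis is part of the estimand.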
Pattern adapted from Dinesen and Sønderskov (2015), with illustrative 95% intervals: the trust–diversity effect is strong (and noisiest) at micro scales and fades by ~1 km. The practical’s 1 km grid picks one point on this curve, a deliberate choice that also fixes the estimand.
First meet the data, then run the first regression. The grid joins three sources on a 1 km raster: RIVM modelled NH₃ concentrations (2018–2024), firm locations & sectors from Bureau van Dijk (Orbis), and CBS neighbourhood covariates (population density, urbanity).
Part 0, meet the data (~10 min)
One row per 1 km cell. A few real rows:
| nh3 ’20 | nh3 ’24 | farms ’20 | farms ’24 | pop. dens. |
|---|---|---|---|---|
| 5.2 | 5.0 | 1 | 3 | 9 |
| 37.3 | 52.2 | 7 | 7 | 78 |
| 8.4 | 4.9 | 0 | 0 | 24 868 |
Step 1, OLS (~10 min): regress NH₃ on farm counts ± controls.
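A rough sketch of what “± controls” does, on invented numbers where the true farm coefficient is 2 by construction. The variable names (`farms`, `urban`, `nh3`) and the helper `ols` are assumptions for illustration, not the practical’s column names.

```python
import numpy as np

# Toy cells (assumed): urban cells have fewer farms and lower NH3
farms = np.array([0.0, 1.0, 2.0, 3.0])
urban = np.array([1.0, 1.0, 0.0, 0.0])
nh3 = 5.0 + 2.0 * farms - 3.0 * urban  # true farm effect is 2 by construction


def ols(y, *regressors):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_short = ols(nh3, farms)        # no controls
b_long = ols(nh3, farms, urban)  # with the control

print(b_short[1], b_long[1])
```

Without the control the farm slope is 3.2; with it, the slope returns to 2. The control only helps here because the confounder is measured.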
In Lecture 1 we fought confounding by adding controls to the regression. Here we fight spatial dependence the same way, by adding spatial terms to the model. The general one is the Spatial Durbin (SDM):
\[y = \color{#0b789d}{\rho\, W y} + \color{#c44e52}{\beta X} + \color{#e08214}{\theta\,W X} + \varepsilon\]
The two famous models just keep part of this:
\[\text{SAR: } y = \color{#0b789d}{\rho W y} + \color{#c44e52}{\beta X} + \varepsilon \qquad\qquad \text{SEM: } y = \color{#c44e52}{\beta X} + u,\;\; u = \lambda W u + \varepsilon\]
SAR when the outcome itself spreads (prices, congestion, a gas); NH₃ drifts, so SAR is our choice here. SEM when the leftover clustering comes from unmeasured factors that happen to cluster in space (not from the outcome spreading): it corrects the standard errors but leaves the coefficients’ meaning unchanged. SDM adds neighbours’ covariates (\(\theta WX\)) directly.
Moran’s I = spatial autocorrelation: do nearby cells have similar residuals? (≈ +1 clustered, 0 random, < 0 checkerboard.)
| Model | resid. Moran’s I | spatial param |
|---|---|---|
| OLS | 0.49 | n/a |
| SAR | −0.01 | ρ = 0.82 |
| SEM | −0.01 | λ = 0.90 |
| SDM | 0.01 | ρ = 0.78 |
The models drive residual Moran’s I from 0.49 → ~0: the clustering is absorbed. But that is spatial honesty, not identification.
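To make the Moran’s I scale concrete, here is a small sketch with binary contiguity weights on four cells in a row. The function `morans_i` and the toy values are assumptions for illustration; in the practical the inputs would be model residuals and the chosen W.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights.

    neighbours maps each cell index to the list of its neighbour indices.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(nb) for nb in neighbours.values())  # sum of all weights
    cross = sum(z[i] * z[j] for i, nb in neighbours.items() for j in nb)
    return (n / s0) * cross / sum(d * d for d in z)


# Four cells in a row: each cell touches the cells next to it
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i([1, 1, 0, 0], line)     # similar values sit together
checkerboard = morans_i([1, 0, 1, 0], line)  # values alternate

print(clustered, checkerboard)
```

The clustered pattern gives I ≈ 0.33 and the alternating pattern gives I = −1, matching the “clustered vs checkerboard” reading.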
W encodes who is a neighbour of whom, an assumption you choose, not data.
Three common ways to define neighbours:
Then row-standardise so each row sums to 1 → Wy is a neighbour average.
Why it’s a problem: change W and you change the results. No test tells you which W is right, so the spatial answer is only as credible as that choice.
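A small sketch of why W matters, assuming three cells on a line at 0, 1 and 3 km and a 2.5 km distance band. It builds a binary and an inverse-distance W by hand with NumPy (not with libpysal), row-standardises both, and compares the neighbour averages Wy. All numbers and names are illustrative.

```python
import numpy as np


def row_standardise(w):
    """Scale each row to sum to 1 (assumes no cell is without neighbours)."""
    return w / w.sum(axis=1, keepdims=True)


pos = np.array([0.0, 1.0, 3.0])             # cell positions in km (toy)
dist = np.abs(pos[:, None] - pos[None, :])  # pairwise distances
within = (dist > 0) & (dist <= 2.5)         # neighbours inside the band

w_binary = row_standardise(within.astype(float))

inv = np.zeros_like(dist)
inv[within] = 1.0 / dist[within]            # closer neighbours weigh more
w_inverse = row_standardise(inv)

y = np.array([10.0, 20.0, 40.0])
lag_binary = w_binary @ y    # neighbour average under binary W
lag_inverse = w_inverse @ y  # neighbour average under inverse-distance W

print(lag_binary, lag_inverse)
```

For the middle cell the neighbour average is 25 under binary weights and 20 under inverse-distance weights: same data, different Wy, so different spatial estimates.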
Useful models of dependence, not designs for identification: unmeasured confounders are likely present.
from libpysal.weights import DistanceBand
from spreg import ML_Lag, ML_Error

# W = who-neighbours-whom (inverse-distance, 3 km)
w = DistanceBand(coords, threshold=3000, binary=False, alpha=-3)
w.transform = "R"                       # row-standardise
m_sar = ML_Lag(y, X, w=w)               # ρ W y
m_sem = ML_Error(y, X, w=w)             # λ in errors
m_sdm = ML_Lag(y, X, w=w, slx_lags=1)   # + W X θ

slx_lags controls how many orders of neighbours’ X enter:
slx_lags=0 (SAR): \(y = \rho Wy + \beta X + \varepsilon\)
slx_lags=1 (SDM): \(y = \rho Wy + \beta X + \theta_1\,WX + \varepsilon\)
slx_lags=2: \(y = \rho Wy + \beta X + \theta_1\,WX + \theta_2\,W^2X + \varepsilon\) (adds neighbours-of-neighbours)

In SAR/SDM (\(\text{SAR: } y = \color{#0b789d}{\rho W y} + \color{#c44e52}{\beta X} + \varepsilon\)) the outcome sits on both sides: I affect my neighbours, and they affect me back. So a change in \(X\) doesn’t move just one cell; it ripples out and partly returns.

spreg reports the impact decomposition:
\[\underbrace{\text{direct}}_{\text{own cell}} + \underbrace{\text{indirect}}_{\text{spillover + feedback}} = \text{total}\]
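One common way to compute such a decomposition for a SAR model is from the matrix \((I - \rho W)^{-1}\beta\): the average diagonal entry is the direct effect and the average row sum is the total. The sketch below does this by hand for two cells that are each other’s only neighbour; `sar_impacts` is a hypothetical helper, and spreg’s own output may use a different variant depending on its options.

```python
import numpy as np


def sar_impacts(rho, beta, w):
    """Average direct, indirect and total effects of a unit change in x."""
    n = w.shape[0]
    # s[i, j] = effect on cell i of a unit change in x at cell j
    s = np.linalg.inv(np.eye(n) - rho * w) * beta
    direct = np.trace(s) / n  # own-cell effect, including feedback
    total = s.sum() / n       # own-cell effect plus spillovers
    return direct, total - direct, total


# Two cells, each the other's only neighbour (row-standardised W)
w = np.array([[0.0, 1.0],
              [1.0, 0.0]])

direct, indirect, total = sar_impacts(rho=0.5, beta=1.0, w=w)
print(direct, indirect, total)
```

With ρ = 0.5 and β = 1 the direct effect is about 1.33 (above β, because of feedback), the indirect effect about 0.67, and the total 2 = β/(1 − ρ).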
Refit the same regression with spatial weights.
Step 2, SAR / SEM / SDM (≈10 min)
What to notice
A clean residual map means spatially honest standard errors, not causal identification.
Space made exchangeability harder via confounding, no-interference harder via spillovers, consistency / positivity harder via scale.
But geography provides sources of exogenous variation:
| Design | Spatial source of variation | Key question |
|---|---|---|
| IV | Distance, wind, lotteries | Does it shift exposure without directly changing outcomes? |
| DiD | Place-based shocks or staggered rollouts | Would treated and control places have followed parallel trends? |
| RDD | Boundaries or spatial cutoffs | Are units just across the border comparable? |
Each of the next slides takes one design: which assumption it rescues, what comparison it supplies, what new bet we make.
E-ZPass (Currie & Walker, 2011). Toll plazas were automated on a fixed national schedule. Local congestion fell. Birth outcomes improved near the plazas, not far away.
Rescues: exchangeability, the timing of E-ZPass is exogenous to local birth trends.
Practical link: Step 3 uses the same logic, comparing cells that gained livestock farms in 2020–2024 with cells that didn’t.
Black (1999). Two houses 50 m apart, same street, opposite sides of a school attendance boundary. Same bakery, bus stop, neighbors. Different school.
Rescues: exchangeability, locally. The boundary slices off a comparison group that is similar on everything except treatment.
Practical link: the NH₃ grid doesn’t have school boundaries, but we could imagine a municipal manure regulation boundary as the spatial discontinuity for an extension.
The estimand becomes local, the effect near the line, not the population-average effect.
Deryugina, Heutel, Miller, Molitor & Reif (2019, AER). They use daily changes in wind direction as an instrument for fine-particulate (PM₂.₅) exposure, and estimate its effect on mortality and health-care use among older Americans.
Rescues: exchangeability. Where the wind blows is plausibly unrelated to health-care use.
Practical link: an NH₃ extension would use wind direction × upwind farm density as a daily instrument for cell-level NH₃, the same trick as Deryugina et al.
The exclusion restriction is untestable: it’s an assumption you defend with theory.
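The instrument logic can be sketched with a simple ratio (Wald) estimator on invented numbers. Here `z` stands in for a wind-based instrument, `u` for an unmeasured confounder, and the true effect of exposure `x` on outcome `y` is 0.5 by construction; none of this is the data or code of Deryugina et al.

```python
def cov_sum(a, b):
    """Sum of cross-deviations (an unnormalised covariance)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))


# Toy data (assumed): u raises both exposure and the outcome
u = [0, 1, 0, 1]                               # unmeasured confounder
z = [0, 0, 1, 1]                               # instrument, unrelated to u
x = [1 + 2 * zi + ui for zi, ui in zip(z, u)]  # exposure
y = [0.5 * xi + 3 * ui for xi, ui in zip(x, u)]  # outcome, true effect 0.5

ols_slope = cov_sum(x, y) / cov_sum(x, x)  # confounded by u
iv_slope = cov_sum(z, y) / cov_sum(z, x)   # uses only variation driven by z

print(ols_slope, iv_slope)
```

OLS returns 1.1 because `u` moves both variables; the IV ratio returns 0.5. That only works because `z` is unrelated to `u` and affects `y` only through `x`, which is exactly the bet the design makes.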
The timing design, on the NH₃ grid, the comparison this lecture builds toward.
Cells that gained > 1 farm (2020 → 2024) vs cells that didn’t.
Real grid: treated cells sit higher, with a hint of faster decline before 2020.
DiD trusts that the controls are untreated. In space they often aren’t: NH₃ drifts across cell boundaries.
Cell B looks like a control on the map but is partly treated by Cell A’s farm. The no-interference assumption fails in space. We look at three ways to respond.
Keep the timing comparison by taking first-differences (\(\Delta Y_i = Y_{i,2024} - Y_{i,2020}\)), but correct for spillovers with SAR:
\[\Delta Y_i = \alpha + \color{#6a51a3}{\rho\,(W\Delta Y)_i} + \color{#0b789d}{\tau\, T_i} + \varepsilon_i\]
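Solving the equation above for \(\Delta Y\) gives \(\Delta Y = (I - \rho W)^{-1}(\alpha + \tau T + \varepsilon)\). The sketch below evaluates only the treatment part for three cells in a row, with the first cell treated. The values ρ = 0.5 and τ = 1 are illustrative assumptions, not estimates from the grid.

```python
import numpy as np

# Three cells in a row, row-standardised contiguity weights (toy example)
w = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])

rho, tau = 0.5, 1.0            # assumed values, for illustration only
t = np.array([1.0, 0.0, 0.0])  # only the first cell is treated

# Treatment part of the reduced form: (I - rho W)^{-1} (tau T)
effect = np.linalg.solve(np.eye(3) - rho * w, tau * t)
print(effect)
```

The treated cell moves by about 1.17 (more than τ, because of feedback), its neighbour by about 0.33 and the far cell by about 0.17: both “controls” are partly treated.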
Impact decomposition (real grid):
| effect | NH₃ (µg/m³) |
|---|---|
| direct (own cell) | 0.30 |
| indirect (spillover + feedback) | 0.49 |
| total | 0.79 |
| multiplier \(1/(1-\rho)\), ρ ≈ 0.67 | 3.05 |
The total (~0.8) sits modestly above the naïve DiD (~0.5): the spillover adds a little, via the ρ-multiplier, which is sensitive to ρ and to W.
The spatial-lag (SAR) answer is only as good as the weight matrix you assumed.
Modelling spillovers trades “my controls are clean” for “I picked the right W.”
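A quick arithmetic check of how sensitive the multiplier \(1/(1-\rho)\) is to ρ. The function name `sar_multiplier` and the values 0.57 and 0.77 are assumptions chosen to bracket the reported ρ of about 0.67.

```python
def sar_multiplier(rho):
    """Total-effect multiplier 1 / (1 - rho) for a row-standardised W."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 1.0 / (1.0 - rho)


# Shift rho by 0.1 in either direction around 0.67
multipliers = {rho: round(sar_multiplier(rho), 2) for rho in (0.57, 0.67, 0.77)}
print(multipliers)
```

A shift of 0.1 in ρ moves the multiplier from roughly 2.3 to 4.3. The rounded ρ = 0.67 gives 3.03, so the 3.05 in the table presumably reflects the unrounded estimate.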
Instead of modelling the leak, change who counts as a control: keep treated cells, drop controls near a treated cell (partly exposed), and compare to far controls.
The own-cell effect is ~0.5, and the spillover onto controls decays with distance (0–2 km: +0.35; gone by 4–6 km).
| comparison | DiD (µg/m³) |
|---|---|
| naïve (all controls) | +0.47 |
| rings (far controls only) | +0.50 |
Dropping near controls barely moves the headline → the own-cell effect is robust in magnitude; contamination is real but small.
New bet: spillovers die out beyond the buffer, and treated cells do not cluster closely enough to affect each other.
Butts (2024) formalises DiD with spatial spillovers this way.
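The ring idea can be sketched on five invented cells: compute the DiD once with all controls and once keeping only controls at least 4 km from any treated cell. The field names (`delta`, `dist_to_treated_km`), the 4 km buffer and the numbers are assumptions for illustration; this is not Butts’s estimator, only the comparison it builds on.

```python
def mean(xs):
    return sum(xs) / len(xs)


def did(cells, min_control_dist=0.0):
    """Mean change in treated cells minus mean change in eligible controls."""
    treated = [c["delta"] for c in cells if c["treated"]]
    controls = [
        c["delta"]
        for c in cells
        if not c["treated"] and c["dist_to_treated_km"] >= min_control_dist
    ]
    return mean(treated) - mean(controls)


# Toy cells: delta is the 2020 -> 2024 change in NH3 (made-up numbers)
cells = [
    {"treated": True, "dist_to_treated_km": 0.0, "delta": 1.0},
    {"treated": True, "dist_to_treated_km": 0.0, "delta": 1.0},
    {"treated": False, "dist_to_treated_km": 1.0, "delta": 0.6},  # near, exposed
    {"treated": False, "dist_to_treated_km": 5.0, "delta": 0.4},
    {"treated": False, "dist_to_treated_km": 8.0, "delta": 0.4},
]

naive = did(cells)                        # all controls
rings = did(cells, min_control_dist=4.0)  # far controls only

print(naive, rings)
```

The naive estimate (about 0.53) is below the ring estimate (0.6) because the exposed near control pulls the control mean up. The direction matches the pattern in the table above; the magnitudes here are invented.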
| Question | Response 1, model (SAR) | Response 2, rings |
|---|---|---|
| treatment | own gain (spillover via the outcome) | own farm gain (spillover via rings) |
| estimand | total effect incl. spillover (~0.79) | effect on treated vs clean controls (direct + drift the cell receives), ~0.5 |
| comparison | gain + modelled neighbour spillover | treated vs far controls (drop near) |
| credible if | W and ρ are correct | spillovers vanish beyond the buffer |
| fails when | wrong W inflates the multiplier | wrong radius keeps contamination |
Rings clean the control side; with clustered treatment the treated cells still receive drift, so 0.5 is an upper bound on the pure own-cell effect, cf. SAR direct 0.30.
Interference depends on the question you ask:
Make the DiD spillover-aware, and compare the two responses.
Lecture 1, the logic
Lecture 2, in space
Be honest about your design: name treatment → estimand → comparison → assumption → failure