What Changes When Causality Lives in Space?

Spillovers and confounders everywhere

Javier Garcia-Bernardo, ODISSEI Social Data Science (SoDa)

John Snow: A classic spatial causal argument

John Snow, On the Mode of Communication of Cholera (1855). Each black mark is a cholera death; deaths cluster around the Broad Street pump (the pumps are the “PUMP” labels at the street corners). Public domain.

households using different water suppliers were intermixed in the same streets of south London
suppliers: Southwark & Vauxhall (sewage-tainted Thames) vs Lambeth (cleaner, upstream)
this gave Snow a valid comparison, geography was rescuing exchangeability, not just locating cases
the question was already counterfactual: what would cholera risk have been under the other supplier?

Snow’s argument, in five questions

The same five questions you will ask in the practical, applied to 1854.

#	Question	Snow’s answer
1	What is the treatment?	Which company piped your water
2	What is the estimand?	Effect of a contaminated supply on the risk of cholera death
3	What is the comparison?	Neighbouring houses on the same streets served by different companies
4	What makes it credible?	Which pipe reached a house was fixed years earlier, as good as random given the household → exchangeability
5	Why might it fail?	If households chose their supplier, or the two service areas differed in wealth or sanitation → confounding returns

Practical

Remember the assumptions of causal inference

Spatial causal inference

Space puts the most pressure on exchangeability and no interference.

Three characteristics of geospatial analysis

Spatial spillovers stress no interference

A “control” cell next door is then already partly treated, the comparison is contaminated.

Spatial confounding stresses exchangeability

Two cells with different exposure are almost never comparable on everything else.

These factors cluster across the map: a beautiful spatial pattern can hide causes at once.

Scale impacting consistency / positivity

Modifiable Areal Unit Problem (MAUP): change the scale of analysis and the answer can change too.

80 m vs 1 km radius
point, buffer, or administrative unit
where exposure is measured vs where the outcome is measured

It stresses consistency (the label “livestock cell” means different things at 80 m vs 1 km) and positivity (the treated/control overlap shifts with the unit).

Pattern adapted from Dinesen and Sønderskov (2015), with illustrative 95% intervals: the trust–diversity effect is strong (and noisiest) at micro scales and fades by ~1 km. The practical’s 1 km grid picks one point on this curve, a deliberate choice that picks an estimand.

▶ Practice now: Part 0 & Step 1

First meet the data, then run the first regression. The grid joins three sources on a 1 km raster: RIVM modelled NH₃ concentrations (2018–2024), firm locations & sectors from Bureau van Dijk (Orbis), and CBS neighbourhood covariates (population density, urbanity).

Part 0, meet the data (~10 min)

One row per 1 km cell. A few real rows:

nh3 ’20	nh3 ’24	farms ’20	farms ’24	pop. dens.
5.2	5.0	1	3	9
37.3	52.2	7	7	78
8.4	4.9	0	0	24 868

Step 1, OLS (~10 min): regress NH₃ on farm counts ± controls.

Spatial models diagnose dependence: they don’t identify

In Lecture 1 we fought confounding by adding controls to the regression. Here we fight spatial dependence the same way, by adding spatial terms to the model. The general one is the Spatial Durbin (SDM):

\[y = \color{#0b789d}{\rho\, W y} + \color{#c44e52}{\beta X} + \color{#e08214}{\theta\,W X} + \varepsilon\]

ρWy, my outcome tracks my neighbours’ outcome (outcome spillover)
βX, the ordinary regression on my own covariates
θWX, my neighbours’ covariates (their farms) reach me (covariate spillover)

The two famous models just keep part of this:

\[\text{SAR: } y = \color{#0b789d}{\rho W y} + \color{#c44e52}{\beta X} + \varepsilon \qquad\qquad \text{SEM: } y = \color{#c44e52}{\beta X} + u,\;\; u = \lambda W u + \varepsilon\]

SAR when the outcome itself spreads (prices, congestion, a gas), our choice here (NH₃ drifts). SEM when the leftover clustering comes from unmeasured factors that happen to cluster in space (not from the outcome spreading): it corrects the standard errors but leaves the coefficients’ meaning unchanged. SDM adds neighbours’ covariates (\(\theta WX\)) directly. We use SAR here.

Do spatial models clean spatial correlation?

Moran’s I = spatial autocorrelation: do nearby cells have similar residuals? (≈ +1 clustered, 0 random, < 0 checkerboard.)

Model	resid. Moran’s I	spatial param
OLS	0.49	n/a
SAR	−0.01	ρ = 0.82
SEM	−0.01	λ = 0.90
SDM	0.01	ρ = 0.78

The models drive residual Moran’s I from 0.49 → ~0: the clustering is absorbed. But that is spatial honesty, not identification.

A key problem: the weight matrix W

W encodes who is a neighbour of whom, an assumption you choose, not data.

Three common ways to define neighbours:

Rook: share an edge (4 neighbours)
Queen: edge or corner (8)
Distance band: every cell within d, optionally distance-weighted (we use 1/d³ within 3 km)

Then row-standardise so each row sums to 1 → Wy is a neighbour average.

Why it’s a problem: change W and you change the results. No test tells you which W is right, so the spatial answer is only as credible as that choice.

Useful models of dependence, not designs for identification — unmeasured confounders are likely present.

Those models in python

# W = who-neighbours-whom (inverse-distance, 3 km)
w = DistanceBand(coords, threshold=3000,
                 binary=False, alpha=-3); w.transform = "R"
m_sar = ML_Lag(y, X, w=w)              # ρ W y
m_sem = ML_Error(y, X, w=w)            # λ in errors
m_sdm = ML_Lag(y, X, w=w, slx_lags=1)  # + W X θ

slx_lags controls how many orders of neighbours’ X enter:

slx_lags=0 (SAR): \(y = \rho Wy + \beta X + \varepsilon\)
slx_lags=1 (SDM): \(y = \rho Wy + \beta X + \theta_1\,WX + \varepsilon\)
slx_lags=2: \(y = \rho Wy + \beta X + \theta_1\,WX + \theta_2\,W^2X + \varepsilon\) (adds neighbours-of-neighbours)

Another key problem: interpreting coefficients

In SAR/SDM (\(\text{SAR: } y = \color{#0b789d}{\rho W y} + \color{#c44e52}{\beta X} + \varepsilon\)) the outcome sits on both sides: I affect my neighbours, and they affect me back. So a change in \(X\) doesn’t move just one cell, it ripples out and partly returns.

the raw coefficient is not the effect
spreg reports the impact decomposition:

\[\underbrace{\text{direct}}_{\text{own cell}} + \underbrace{\text{indirect}}_{\text{spillover + feedback}} = \text{total}\]

the feedback sums to the spatial multiplier \(\dfrac{1}{1-\rho}\), a larger \(\rho\) inflates the total.

▶ Practice now: Step 2

Refit the same regression with spatial weights.

Step 2, SAR / SEM / SDM (≈10 min)

build \(W\) (inverse-distance decay, 3 km, row-standardised)
fit SAR, SEM, SDM on the livestock belt
compare residual Moran’s I across models

What to notice

Moran’s I collapses toward 0 → dependence absorbed
the livestock coefficient barely moves
read the SAR impacts: direct / indirect / total

A clean residual map is spatially honest SE, not causal identification.

Geography hands us ways to rescue exchangeability

Space made exchangeability harder via confounding, no-interference harder via spillovers, consistency / positivity harder via scale.

But geography provides sources of exogenous variation:

Design	Spatial source of variation	Key question
IV	Distance, wind, lotteries	Does it shift exposure without directly changing outcomes?
DiD	Place-based shocks or staggered rollouts	Would treated and control places have followed parallel trends?
RDD	Boundaries or spatial cutoffs	Are units just across the border comparable?

Each of the next slides takes one design: which assumption it rescues, what comparison it supplies, what new bet we make.

Time-based design: rescues exchangeability through exogenous timing

E-ZPass (Currie & Walker, 2011). Toll plazas were automated on a fixed national schedule. Local congestion fell. Birth outcomes improved near the plazas, not far away.

Rescues: exchangeability, the timing of E-ZPass is exogenous to local birth trends.

Comparison: near vs far × before vs after
New bet: parallel trends between near and far places absent the shock
Failure mode: other local changes coincide with E-ZPass; spillovers (the assumption we just broke) contaminate “far” controls

Practical link: Step 3 uses the same logic, cells that gained livestock farms 2020-2024 vs cells that didn’t.

Boundary design: rescues exchangeability through local continuity

Black (1999). Two houses 50 m apart, same street, opposite sides of a school attendance boundary. Same bakery, bus stop, neighbors. Different school.

Rescues: exchangeability, locally. The boundary slices off a comparison group that is similar on everything except treatment.

Comparison: units just across the line
New bet: local continuity of all confounders at the boundary
Failure mode: households sort to the right side of the line (positivity / selection comes back through the back door)

Practical link: the NH₃ grid doesn’t have school boundaries, but we could imagine a municipal manure regulation boundary as the spatial discontinuity for an extension.

The estimand becomes local, the effect near the line, not the population-average effect.

Instrument design: rescues exchangeability through an exogenous shifter

Deryugina, Heutel, Miller, Molitor & Reif (2019, AER). They use daily changes in wind direction as an instrument for fine-particulate (PM₂.₅) exposure, and estimate its effect on mortality and health-care use among older Americans.

Rescues: exchangeability. Where the wind blows is plausibly unrelated to health-care use

Comparison: units whose exposure differs only because the wind moved.
New bet: exclusion restriction, wind affects health only through pollution; relevance, wind impacts pollution strongly.
Failure mode: wind also changes temperature, mood, indoor activity

Practical link: an NH₃ extension would use wind direction × upwind farm density as a daily instrument for cell-level NH₃, the same trick as Deryugina et al.

The exclusion restriction is untestable, it’s an assumption you defend with theory.

▶ Practice now: Step 3, Difference-in-Differences

The timing design, on the NH₃ grid, the comparison this lecture builds toward.

Cells that gained > 1 farm (2020 → 2024) vs cells that didn’t.

smf.ols("nh3 ~ treated * post", did).fit()

treatment: net farm gain in a cell
estimand: impact of gaining 1+ firms on NH₃ vs not gaining.
comparison: gain vs no-gain, 2020 vs 2024
credible if parallel trends hold
check the pre-trend (2018–2020)

Real grid: treated cells sit higher, with a hint of faster decline before 2020.

But DiD breaks under interference

DiD trusts that the controls are untreated. In space they often aren’t, NH₃ drifts across cell boundaries.

Cell B looks like a control on the map but is partly treated by Cell A’s farm. The no-interference assumption fails in space. We look at three ways to respond.

Response 1: model the spillover (spatial lag / SAR)

Keep the timing comparison by taking first-differences (\(\Delta Y_i = Y_{i,2024} - Y_{i,2020}\)), but correct for spillovers with SAR:

\[\Delta Y_i = \alpha + \color{#6a51a3}{\rho\,(W\Delta Y)_i} + \color{#0b789d}{\tau\, T_i} + \varepsilon_i\]

ML_Lag(dY, X[["treated"]], w=w_fd)

ρ (WΔY), outcome feedback: my ΔNH₃ tracks my neighbours’ ΔNH₃; the multiplier \(1/(1-\rho)\) sums it
τ, my own farm gain (direct)
spillover: a neighbour’s gain reaches me only through the outcome: their ΔNH₃ drifts to me via ρ(WΔY)

Impact decomposition (real grid):

effect	NH₃ (µg/m³)
direct (own cell)	0.30
indirect (spillover + feedback)	0.49
total	0.79
multiplier \(1/(1-\rho)\), ρ = 0.67	3.05

The total (~0.8) sits modestly above the naïve DiD (~0.5): the spillover adds a little, via the ρ-multiplier, which is sensitive to ρ and to W.

The cost of Response 1: it’s all about W

The spatial-lag (SAR) answer is only as good as the weight matrix you assumed.

W is a choice, not data. A different neighbourhood or decay → a different \(\rho\) → a different total. No test tells you W is “right.”
because \(\rho\) enters as \(1/(1-\rho)\), a high \(\rho\) amplifies the own effect: here ρ = 0.67 turns the τ ≈ 0.26 own gain into a ~0.79 total.
the raw \(\tau\) is not the effect, you must read the direct/indirect/total decomposition.
and it’s not only W. SAR pins all the spatial structure of ΔNH₃ on outcome feedback (ρWΔY). If some is common spatial shocks (weather, regional trends hitting neighbours alike), ρ soaks those up too and inflates the total: the same identification gap as SAR-vs-SEM.

Modelling spillovers trades “my controls are clean” for “I picked the right W.”

Response 2: redesign the comparison with distance rings

Instead of modelling the leak, change who counts as a control: keep treated cells, drop controls near a treated cell (partly exposed), and compare to far controls.

The own-cell effect is ~0.5, and the spillover onto controls decays with distance (0–2 km: +0.35; gone by 4–6 km).

comparison	DiD (µg/m³)
naïve (all controls)	+0.47
rings (far controls only)	+0.50

Dropping near controls barely moves the headline → the own-cell effect is robust in magnitude; contamination is real but small.

New bet: spillovers die out beyond the buffer, treatments do not cluster impacting each other

Butts (2024) formalises DiD with spatial spillovers this way.

Two answers

Question	Response 1, model (SAR)	Response 2, rings
treatment	own gain (spillover via the outcome)	own farm gain (spillover via rings)
estimand	total effect incl. spillover (~0.79)	effect on treated vs clean controls (direct + drift the cell receives), ~0.5
comparison	gain + modelled neighbour spillover	treated vs far controls (drop near)
credible if	W and ρ are correct	spillovers vanish beyond the buffer
fails when	wrong W inflates the multiplier	wrong radius keeps contamination

Rings clean the control side; with clustered treatment the treated cells still receive drift, so 0.5 is an upper bound on the pure own-cell effect, cf. SAR direct 0.30.

A third route: make the spillover part of the exposure

Interference depends on the question you ask:

Do this cell’s own farms cause NH₃? → neighbours’ drift contaminates the controls = interference (Responses 1 & 2)
Does the NH₃ you actually breathe harm you? → drift is simply part of your exposure = no interference.

▶ Practice now: Step 4

Make the DiD spillover-aware, and compare the two responses.

Response 1, model it (Step 4)

ML_Lag(dY, X, w=w_fd)  # first-diff SAR (spatial lag)

read the total effect (≈ 0.79) and the multiplier \(1/(1-\rho)\)
remember: ρ drives that number

Response 2, rings (design, optional)

drop “controls” near a treated cell
compare treated vs far controls (≈ 0.5)

Recap: two lectures, three ideas to keep

Lecture 1, the logic

a causal claim is about a missing counterfactual
data help only when there is a credible comparison
the assumptions decide what the estimate means, no method is causal by default

Lecture 2, in space

space doesn’t change the question; it strains the assumptions (confounding → exchangeability, spillovers → interference, scale → consistency/positivity)
credible comparisons depend on the design and the spatial assumptions

Be honest about your design: name treatment → estimand → comparison → assumption → failure