Bayesian Synthetic Control
A NumPyro implementation of Bayesian Synthetic Control Methods.
Synthetic Control Methods (SCM) were first introduced in Abadie and Gardeazabal (2003) and formalised in Abadie, Diamond, and Hainmueller (2010). In the latter publication, the authors investigated the impact of California’s Proposition 99 legislation on cigarette sales. In this notebook I implement a Bayesian variant of SCMs and apply it to the same dataset. We shall see how the original SCM can easily be mapped into a Bayesian framework, before conducting posterior inference and exploring how treatment effects and their corresponding uncertainty may be estimated.
Model Specification
The idea of SCMs is to create a synthetic equivalent of the treated unit, a counterfactual, by learning a weighted combination of untreated control units. This counterfactual should approximate the behaviour of the treated unit in the pre-treatment period. The trajectory of the counterfactual in the post-treatment period can then be interpreted as “what would have happened in the treated unit, had a treatment not been applied”. The difference between the treated unit’s value and the counterfactual then serves as the treatment effect.
Let $Y_{1t}$ be the outcome for the treated unit (California) at time $t$, and let $\mathbf{Y}_{0t}$ be the vector of outcomes for the $D$ control units at time $t$. $\hat{Y}_{1t}$ then denotes the counterfactual outcome. An ordinary SCM can then be written as
$$ \hat{Y}_{1t} = \mathbf{Y}_{0t}\mathbf{w} + \epsilon, \tag{1} $$where $\mathbf{w}\in\Delta^{D}$ and $\Delta^{D}$ is the $D$-dimensional probability simplex. The practical implication of this is that the weights of our counterfactual are constrained to be non-negative and to sum to 1; a form of regularisation that prevents overfitting.
A Bayesian variant of such a model can be obtained by mapping the model in (1) to a Bayesian linear regression model in which the weights are drawn from a Dirichlet distribution. Use of the Dirichlet distribution satisfies the need for our weights to belong to the simplex, and by setting an appropriate hyperprior on the distribution’s concentration parameter, we may even control how sparse our weights are. As the concentration parameter $c$ tends to 0 from above, the samples drawn from the Dirichlet distribution become increasingly sparse, leading to fewer units contributing to the counterfactual unit’s response. This aids interpretability, often improves performance by functioning as a form of regularisation, and can be helpful in practical settings where each additional unit included in the experiment carries a cost.
We may write the model as
$$ \hat{Y}_{1t} \sim \mathcal{N}(\mu_t, \sigma^2) $$where the mean, $\mu_t$, represents the synthetic control:
$$ \mu_t = \alpha + \mathbf{Y}_{0t}\mathbf{w}\,. $$The parameters of this model are assigned the priors:
$$ \begin{align} \mathbf{c} &\sim \text{Gamma}(0.5, 0.5) \\ \mathbf{w} &\sim \text{Dirichlet}(\mathbf{c}) \\ \alpha &\sim \mathcal{N}(0, 5) \\ \sigma &\sim \text{HalfNormal}(1) \end{align} $$
Data Preparation
We use the california_smoking.csv dataset, which contains per-capita cigarette sales for 39 US states from 1970 to 2000. The treated unit is California, where Proposition 99 was passed in 1988 to increase cigarette taxes.
The data is partitioned into a treated unit (California) and a set of control units (the other 38 states). The data for all units is then split into a pre-treatment period (1970-1987) and a post-treatment period (1988-2000).
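A sketch of that partition, assuming the CSV has already been pivoted into a wide DataFrame indexed by year with one column of per-capita sales per state (the names `split_data`, `treated`, and `treatment_year` are illustrative, not from the hidden code):

```python
import pandas as pd


def split_data(df: pd.DataFrame, treated: str = "California", treatment_year: int = 1988):
    """Split a (years x states) frame into treated/control series, pre/post treatment."""
    controls = df.drop(columns=[treated])
    pre = df.index < treatment_year  # 1970-1987; everything else is 1988-2000
    y1_pre, y1_post = df.loc[pre, treated], df.loc[~pre, treated]
    Y0_pre, Y0_post = controls.loc[pre], controls.loc[~pre]
    return y1_pre, y1_post, Y0_pre, Y0_post
```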
Model Fitting and Evaluation
With the model specified and the data prepared, we can perform inference to find the posterior distribution of the parameters. We use a No-U-Turn Sampler (NUTS), which is a self-tuning variant of Hamiltonian Monte Carlo (HMC), to draw samples from the posterior.
Prior Predictive Check
Before fitting the model, it is good practice to perform a prior predictive check. This involves simulating data from the model using only the prior distributions. By comparing the generated data to the actual observed data, we can assess whether the priors are reasonable. The plot below shows that our priors are quite weak, allowing for a wide range of possible outcomes, which is a safe starting point.
*Figure: prior predictive draws compared with the observed data.*
Posterior Predictive Check
After fitting the model on the pre-treatment data, we perform a posterior predictive check. This involves generating data from the model using parameter values from the posterior distribution. The left-hand panel below shows that the fitted model provides a good description of the pre-treatment data. The right-hand panel shows the model’s counterfactual prediction for the post-treatment period, which is what we will use to estimate the treatment effect.
*Figure: posterior predictive check. Left: fit to the pre-treatment data; right: counterfactual prediction for the post-treatment period.*
Effect Estimation
Now that we have the posterior distribution of the model parameters, we can estimate the causal effect of Proposition 99. We do this by generating a counterfactual outcome for California in the post-treatment period. This is the outcome that would have been observed had the intervention not occurred. The counterfactual is generated by taking the posterior predictive distribution for the post-treatment period.
The treatment effect is the difference between the observed outcome and the counterfactual outcome. A negative effect suggests that Proposition 99 led to a decrease in cigarette sales.