Hi there,
I'm conducting a difference-in-differences analysis for the first time. My aim is to compare the proportion of preterm deliveries (in a dataset of deliveries, one row per delivery) before and after a policy change. The control group consists of deliveries during the same calendar period in a year in which no policy change happened. I do not expect any confounders, so I am not planning to use propensity scores or covariate adjustment.
I want to fit a linear probability model to estimate the difference-in-differences in the probability of preterm delivery between the two years, with a robust 95% CI (because one mum could contribute several deliveries to the dataset).
My dataset is structured as follows:
Exposed=1 if delivery in year of policy change, 0 if delivery in year without policy change
Post=1 if delivery after date of policy change, 0 if delivery before date of policy change
MumID Preterm Exposed Post
1 0 1 0
2 1 1 0
3 1 0 1
4 0 0 1
5 0 1 1
I found the following code, which runs. However, since I have not done this analysis before, I'm unsure whether I've implemented it correctly.
proc surveyreg data=dataset;
cluster mumid; *I assume this calculates robust 95% CI by accounting for same Mumid;
class post exposed;
model preterm= post exposed post*exposed / CLPARM solution vadjust=none;
estimate "Diff in Diff" post*exposed 1 -1 -1 1;
lsmeans post*exposed;
run;
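For reference, here is what the model and the ESTIMATE statement compute. Write p_pe for the expected proportion of preterm deliveries in the cell with Post = p and Exposed = e (notation introduced here, with the betas in 0/1 coding). Because the model with both main effects and the interaction is saturated in the four cells, the interaction contrast is exactly the DID. The coefficients 1 -1 -1 1 assume SAS orders the post*exposed levels as (0,0), (0,1), (1,0), (1,1); the row order of the LSMEANS table is a good way to confirm this.

```latex
\begin{aligned}
\mathrm{E}[\text{Preterm}_i] &= \beta_0 + \beta_1\,\text{Post}_i + \beta_2\,\text{Exposed}_i + \beta_3\,\text{Post}_i\,\text{Exposed}_i \\
p_{00} = \beta_0,\quad p_{01} &= \beta_0 + \beta_2,\quad p_{10} = \beta_0 + \beta_1,\quad p_{11} = \beta_0 + \beta_1 + \beta_2 + \beta_3 \\
\text{DID} &= (p_{11} - p_{10}) - (p_{01} - p_{00})
            = 1\cdot p_{00} - 1\cdot p_{01} - 1\cdot p_{10} + 1\cdot p_{11}
            = \beta_3
\end{aligned}
```

With CLASS variables, SAS prints parameter estimates in its own (GLM) coding rather than the 0/1 coding above, so reading the DID and its CI from the "Diff in Diff" ESTIMATE row is the simplest route.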
Does 'cluster mumid' tell SAS to calculate a robust 95% CI?
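For reference, the textbook cluster-robust ("sandwich") variance estimator that clustering of this kind corresponds to is sketched below, with C clusters (here, mums), design-matrix rows x_i and OLS residuals ê_i. It allows deliveries from the same mum to be correlated while treating different mums as independent. The exact finite-sample factors SURVEYREG applies (including the one that VADJUST=NONE switches off) are given in its documentation, so treat this as an approximation of the idea rather than SAS's exact formula.

```latex
\widehat{V}(\hat\beta) = (X^\top X)^{-1}
  \left[\frac{C}{C-1}\sum_{c=1}^{C} g_c\, g_c^\top\right]
  (X^\top X)^{-1},
\qquad
g_c = \sum_{i \in c} x_i\,\hat e_i
```

The standard error of a contrast such as the DID, L'β̂, is then the square root of L'V̂L, and the CI is built from it.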
I got the following preliminary output (sorry, it's in German) and I'm wondering whether I'm interpreting it correctly:
Any insight into whether I'm reading this correctly, or into how to improve my code, would be appreciated.
Many thanks,
Julia
If your response is binary, it is not advisable to use a linear probability model, which treats the response as if it were normally distributed. A difference-in-differences analysis appropriate for a binary response is discussed and illustrated in this note. It uses a logistic model, and the DID is computed with a macro such as the NLMeans macro.
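A minimal sketch of a logistic model that keeps the clustering by mum from the original SURVEYREG code, assuming the dataset and variable names from the question (dataset, mumid, preterm, post, exposed). This is not the note's exact method. The interaction contrast here is on the log-odds scale (exponentiated, it is a ratio of odds ratios), and LSMEANS with ILINK gives the four cell probabilities, from which the probability-scale DID can be formed by hand. A standard error for that probability-scale DID is what the note obtains with a macro such as NLMeans, whose call is not shown here.

```sas
/* Sketch only: logistic DID with standard errors clustered on mum.      */
/* Assumes the variable names from the question; not the note's method.  */
proc surveylogistic data=dataset;
   cluster mumid;                        /* deliveries grouped by mum      */
   class post exposed / param=glm;       /* GLM coding for LSMEANS/ESTIMATE */
   model preterm(event='1') = post exposed post*exposed;
   /* Interaction contrast on the log-odds scale; EXP gives the ratio of  */
   /* odds ratios. Level order assumed: (0,0) (0,1) (1,0) (1,1).          */
   estimate "DiD (log odds)" post*exposed 1 -1 -1 1 / exp cl;
   /* Cell probabilities (ILINK back-transforms from the logit scale)     */
   lsmeans post*exposed / ilink cl;
run;
```

PARAM=GLM is used because LSMEANS and level-wise ESTIMATE coefficients rely on GLM (less-than-full-rank) coding; the default effect coding would not support them in this form.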
If you use the Margins macro to fit the model and do the comparisons as shown in the note I referred to, the interpretation is fairly straightforward. In the binary example in the note, the difference between the two A levels at each B level is given in the "Contrasts of A B Margins" table. There, the A1-A0 difference at B=1 is 0.53-0.46 = 0.07, so the probability of the event (Y=1) increases by 0.07 at that level of B when A goes from 0 to 1. Similarly, the A1-A0 difference at B=0 is 0.72-0.39 = 0.33, meaning that the event probability increases by 0.33 when A goes from 0 to 1 at level 0 of B.
The difference between these two probability differences is 0.07-0.33 = -0.26, which indicates that the change in event probability at B=1 is 0.26 smaller than the change at B=0. This is a measure of the interaction between A and B on the event probability: if the change in event probability were the same at both B levels, there would be no interaction. The tests of all of these differences are significant, as shown in the Pr>ChiSq column. You could, of course, report these differences as percentage-point changes (for example, 0.07 as 7 percentage points) if you prefer.