jspoend
Obsidian | Level 7

Hi there, 

I'm conducting a difference-in-differences analysis for the first time. My aim is to compare the proportion of preterm deliveries (in a dataset of deliveries, one row per delivery) before and after a policy change. The control group consists of deliveries during the same period of a year in which no policy change took place. I do not expect any confounders, so I am not planning to use propensity scores or covariate adjustment.

 

I want to fit a linear probability model to estimate the difference-in-differences in the probability of preterm delivery between the two years, with a robust 95% CI (because one mum could contribute several deliveries to the dataset).

 

My dataset is structured as follows: 

Exposed = 1 if the delivery occurred in the year of the policy change, 0 if it occurred in the year without a policy change

Post = 1 if the delivery occurred after the date of the policy change, 0 if it occurred before that date

 

MumID  Preterm  Exposed  Post
1      0        1        0
2      1        1        0
3      1        0        1
4      0        0        1
5      0        1        1
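As a sketch of the model described above (written with Post and Exposed coded 0/1 as in the table, not with the CLASS coding used in the code below), the linear probability model and the difference-in-differences can be written as follows, where mu_pe is the probability of preterm delivery when Post = p and Exposed = e:

```latex
\begin{align*}
\Pr(\text{Preterm}=1) &= \beta_0 + \beta_1\,\text{Post} + \beta_2\,\text{Exposed}
                         + \beta_3\,(\text{Post}\times\text{Exposed})\\
\mu_{00} &= \beta_0, \quad \mu_{10} = \beta_0 + \beta_1, \quad
\mu_{01} = \beta_0 + \beta_2, \quad \mu_{11} = \beta_0 + \beta_1 + \beta_2 + \beta_3\\
\text{DiD} &= (\mu_{11} - \mu_{01}) - (\mu_{10} - \mu_{00})
            = (\beta_1 + \beta_3) - \beta_1 = \beta_3
\end{align*}
```

Under this 0/1 coding the interaction coefficient beta_3 is the difference-in-differences on the probability scale, and the intercept beta_0 is the preterm proportion in the comparison year before the policy date.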

 

I've found the following code, which seems to run. However, since I have not done this analysis before, I'm unsure whether I've implemented it correctly.

proc surveyreg data=dataset;
cluster mumid; *I assume this calculates robust 95% CI by accounting for same Mumid;
class post exposed;
model preterm= post exposed post*exposed / CLPARM solution vadjust=none;
estimate "Diff in Diff" post*exposed 1 -1 -1 1;
lsmeans post*exposed;
run;
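For reference, here is how the ESTIMATE coefficients 1 -1 -1 1 line up with the four Post x Exposed cells, assuming SAS orders each CLASS variable's levels as 0 then 1 (the usual sorted order for unformatted 0/1 variables). Here mu_pe is the model-based cell mean for Post = p, Exposed = e:

```latex
\begin{align*}
\text{Estimate} &= (+1)\,\mu_{00} + (-1)\,\mu_{01} + (-1)\,\mu_{10} + (+1)\,\mu_{11}\\
                &= (\mu_{11} - \mu_{01}) - (\mu_{10} - \mu_{00})
\end{align*}
```

That is, the post-minus-pre change in the policy year minus the post-minus-pre change in the comparison year. Because the two "matching" cells get +1 and the two mixed cells get -1, the result is the same whichever variable varies fastest in SAS's level ordering, and the main-effect parameters cancel, so putting coefficients only on the interaction term is enough.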

Does 'cluster mumid' tell SAS to calculate robust 95% CIs?
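A general sketch of what a cluster-robust variance does (not SAS's exact formula): it treats all deliveries from the same mum as one cluster and has the sandwich form below, where g indexes mums (G in total), X_g and e_g are the design-matrix rows and residuals for mum g's deliveries, and SE is the square root of the corresponding diagonal element. Summing residual products within each mum before squaring is what allows deliveries from the same mum to be correlated. As I understand the PROC SURVEYREG documentation, its Taylor-series variance with a CLUSTER statement has this form up to finite-sample factors; VADJUST=NONE drops the default (n-1)/(n-p) adjustment, and the t degrees of freedom are the number of clusters minus the number of strata (G - 1 when there is no STRATA statement).

```latex
\begin{align*}
\widehat{V}(\hat\beta) &= (X^\top X)^{-1}
  \Bigl(\sum_{g=1}^{G} X_g^\top e_g\, e_g^\top X_g\Bigr)
  (X^\top X)^{-1}\\
\text{95\% CI} &= \hat\beta_{\text{DiD}} \pm t_{0.975,\;G-1}\,
  \widehat{\mathrm{SE}}(\hat\beta_{\text{DiD}})
\end{align*}
```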

 

I got the following preliminary output (sorry, it's in German) and I'm wondering whether I'm interpreting it correctly:

[Screenshot of SAS output, not reproduced: 2022.04.01_SAS output DID for sas forum.PNG]

I interpret this as follows:
  • The unexposed group (year without the policy change) had 6.5% preterm deliveries before the policy change date.
  • The exposed group (year of the policy change) had 0.1 percentage points more preterm deliveries before the policy change date.
  • I'm not sure how to interpret the estimate for post 0 (-0.0063864). Is this the average change between the pre- and post-policy periods?
  • I interpret the interaction term as my main result, i.e. the difference-in-differences in the proportion of preterm deliveries between the two years is 0.4 per 100 deliveries, which is not statistically significant (p=0.203). So the policy change did not significantly change the proportion of preterm deliveries.
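One thing worth checking when reading the parameter estimates: assuming PROC SURVEYREG uses its default GLM (less-than-full-rank) parameterization for CLASS variables, the parameter for the last level of each variable (here 1) is set to zero, so level 1, not level 0, acts as the reference. Under that assumption, and writing mu_pe for the cell mean at Post = p, Exposed = e, the nonzero estimates map to cell means as below. Comparing the intercept with the LSMEANS value for the Post = 1, Exposed = 1 cell is a quick way to confirm which cell it represents.

```latex
\begin{align*}
\text{Intercept} &= \mu_{11}\\
\text{post } 0 &= \mu_{01} - \mu_{11}\\
\text{exposed } 0 &= \mu_{10} - \mu_{11}\\
\text{post } 0 \times \text{exposed } 0 &= \mu_{00} - \mu_{01} - \mu_{10} + \mu_{11}
\end{align*}
```

Under this parameterization the interaction estimate is the same difference-in-differences as the ESTIMATE statement, while post 0 is the pre-minus-post difference within the policy year rather than an average over both years.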

Any insight into whether I'm reading this correctly, or how to improve my code, would be appreciated, as I have not done this before.

 

Many thanks, 

 

Julia 

 

 

 

StatDave
SAS Super FREQ

If your response is binary, then it is not advisable to use a linear probability model that assumes the response is normally distributed. A difference-in-differences analysis appropriate for a binary response is discussed and illustrated in this note. It uses a logistic model, and the DID is computed using a macro such as the NLMeans macro.
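To make the logistic alternative concrete (a sketch of the general idea, not the contents of the note or of the NLMeans macro): with the same 0/1 terms fitted on the logit scale, the difference-in-differences on the probability scale is a nonlinear function of the coefficients. It is generally not equal to the interaction coefficient beta_3, which is a log ratio of odds ratios, and a standard error for it needs the delta method or a similar approach.

```latex
\begin{align*}
\operatorname{logit}\Pr(\text{Preterm}=1) &= \eta_{pe}
  = \beta_0 + \beta_1\,p + \beta_2\,e + \beta_3\,p\,e\\
p_{pe} &= \frac{1}{1 + \exp(-\eta_{pe})}\\
\text{DiD} &= (p_{11} - p_{01}) - (p_{10} - p_{00})
\end{align*}
```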

jspoend
Obsidian | Level 7
Hi StatDave,
Thanks a lot for your answer. I have thought about that too. However, I've read that using linear models to estimate absolute risk differences is an option, e.g. here:

https://www.statalist.org/forums/forum/general-stata-discussion/general/1408193-binary-dependent-var...

But I am definitely no expert. The problem is that the output is harder to interpret with logistic regression, since I want to quantify simple absolute risk differences. As far as I understand, I can't get those with PROC LOGISTIC?

Thanks a lot.
StatDave
SAS Super FREQ

If you use the Margins macro to fit the model and do the comparisons as shown in the note I referred to, the interpretation is fairly straightforward. In the binary example in the note, the difference between the two A levels is given at each B level in the "Contrasts of A B Margins" table. As can be seen there, the A1-A0 difference in B=1 is 0.53-0.46 = 0.07. So, the probability of the event (Y=1) increases by 0.07 in that level of B when you increase A from 0 to 1. Similarly, the A1-A0 difference in B=0 is 0.72-0.39 = 0.33, meaning that the event probability increases by 0.33 when you increase A in level 0 of B. The difference in these two probability differences is 0.07-0.33 = -0.26, which indicates that the event probability change in B=1 is 0.26 smaller than the probability change in B=0. This is a measure of the interaction between A and B on the event probability. If the change in the event probability were the same in the two B levels, there would be no interaction. The tests of all of these differences are significant, as shown in the Pr>ChiSq column. You could, of course, report these differences as percentage-point changes if you prefer.
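The arithmetic in the example above, written out (the values are the rounded margins quoted from the note):

```latex
\begin{align*}
\Delta_{B=1} &= 0.53 - 0.46 = 0.07\\
\Delta_{B=0} &= 0.72 - 0.39 = 0.33\\
\Delta_{B=1} - \Delta_{B=0} &= 0.07 - 0.33 = -0.26
\end{align*}
```

One way to map this onto the original question is to let A play the role of Post and B the role of Exposed; the last line then corresponds to the difference-in-differences in preterm probability.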
