Re: Difference-in-Difference analysis binary outcome

jspoend · Posted 04-01-2022 09:55 AM

Hi there,

I'm conducting a difference-in-differences analysis for the first time. My aim is to compare the proportion of preterm deliveries (in a dataset of deliveries, 1 line per delivery) before and after a policy change. The control group consists of deliveries during the same time period in a year in which no policy change happened. I do not expect any confounders, so propensity scores or adjustment are not planned.

I want to fit a linear probability model to quantify the difference in differences in the probability of preterm delivery between the two years, with a robust 95% CI (because one mum could potentially contribute several deliveries to the dataset).

My dataset is structured as follows:

Exposed=1 if delivery in year of policy change, 0 if delivery in year without policy change

Post=1 if delivery after date of policy change, 0 if delivery before date of policy change

MumID  Preterm  Exposed  Post
1      0        1        0
2      1        1        0
3      1        0        1
4      0        0        1
5      0        1        1

I've found the following code, which seems to run. However, given that I have not done this analysis before, I'm unsure if I've implemented it correctly.

proc surveyreg data=dataset;
   cluster mumid; *I assume this calculates robust 95% CI by accounting for same Mumid;
   class post exposed;
   model preterm = post exposed post*exposed / clparm solution vadjust=none;
   estimate "Diff in Diff" post*exposed 1 -1 -1 1;
   lsmeans post*exposed;
run;

Does 'cluster mumid' indicate to calculate robust 95% CI?

I got the following preliminary output (sorry, it's in German) and I'm wondering if I'm interpreting this right:

[Image attachment: 2022.04.01_SAS output DID for sas forum.PNG]

I interpret this such that:

The unexposed group (year without intervention) had 6.5% preterms prior to the policy change.
The exposed group (year of policy change) had 0.1 percentage points more preterms prior to the policy change.
I'm not sure how to interpret post 0 = -0.0063864. Is this the average change between pre and post policy change?
I interpreted the interaction term as my main result: i.e. that the difference in differences in the proportion of preterm deliveries between the two years is 0.4 per 100 deliveries, which is not statistically significant (p=0.203). So the policy change did not significantly change the proportion of preterm deliveries.
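One thing that may help when reading the parameter table: assuming the CLASS statement uses the default GLM coding, in which the last level of each variable (here 1) is the reference (the "post 0" label in the output suggests this), the intercept would be the mean for post=1, exposed=1 rather than for post=0, exposed=0. The minimal Python sketch below uses made-up cell proportions (not the values from the output above) to show how the four coefficients of the saturated model map to the four cell means under that assumption, and that the interaction coefficient equals the difference in differences. The function and key names are illustrative only.

```python
# Hypothetical proportions of preterm delivery, keyed by (post, exposed).
# Illustrative numbers only; they are NOT taken from the SAS output.
cell_means = {
    (0, 0): 0.065,  # before policy date, year without policy change
    (0, 1): 0.066,  # before policy date, year of policy change
    (1, 0): 0.060,  # after policy date, year without policy change
    (1, 1): 0.065,  # after policy date, year of policy change
}


def glm_coded_coefficients(m):
    """Coefficients of the saturated 2x2 model when level 1 is the
    reference for both factors (assumed default GLM coding)."""
    return {
        "intercept": m[(1, 1)],                # mean of the reference cell
        "post0": m[(0, 1)] - m[(1, 1)],        # pre minus post, exposed year
        "exposed0": m[(1, 0)] - m[(1, 1)],     # unexposed minus exposed, post period
        "post0*exposed0": m[(0, 0)] - m[(0, 1)] - m[(1, 0)] + m[(1, 1)],
    }


def diff_in_diff(m):
    """(post - pre change in exposed year) minus (post - pre change in control year)."""
    change_exposed = m[(1, 1)] - m[(0, 1)]
    change_unexposed = m[(1, 0)] - m[(0, 0)]
    return change_exposed - change_unexposed


coefs = glm_coded_coefficients(cell_means)
did = diff_in_diff(cell_means)
print(coefs)
print(did)
```

Under this coding, "post 0" would be the pre-minus-post difference within the exposed year only, not an average change across both years.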

Any insight into whether I'm reading this correctly or how to improve my code is appreciated, as I have not done this before.

Many thanks,

Julia

StatDave · Posted 04-01-2022 10:53 AM

If your response is binary, then it is not advisable to use a linear probability model assuming the response is normally distributed. A difference in difference analysis appropriate for a binary response is discussed and illustrated in this note. This uses a logistic model and the DID is computed using a macro such as the NLMeans macro.

jspoend · Posted 04-04-2022 03:09 AM

Hi StatDave,
Thanks a lot for your answer. I have thought about that too. However, I've read that using linear models to quantify absolute risk differences is an option, for example here:

https://www.statalist.org/forums/forum/general-stata-discussion/general/1408193-binary-dependent-var...

But I'm definitely no expert. The problem is that the output is harder to interpret if I use logistic regression, as I want to quantify simple absolute risk differences. As far as I understand, I can't get those with proc logistic?
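To illustrate the distinction I mean: the coefficients of a logistic model are on the log-odds scale (odds ratios after exponentiating), whereas what I want is a difference in probabilities. A minimal Python sketch, using the two pre-period proportions from my first post (6.5% and 6.6%) purely as example inputs; the function names are my own:

```python
def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    return p / (1 - p)


def risk_difference(p1, p0):
    """Absolute difference in probability."""
    return p1 - p0


def odds_ratio(p1, p0):
    """Ratio of odds, the scale on which logistic coefficients live."""
    return odds(p1) / odds(p0)


p_unexposed = 0.065  # example input: 6.5% preterm
p_exposed = 0.066    # example input: 0.1 percentage points more

rd = risk_difference(p_exposed, p_unexposed)
or_ = odds_ratio(p_exposed, p_unexposed)
print(rd, or_)
```

The risk difference can be read directly as "preterm deliveries per 100 deliveries", which is the scale I would like to report.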

Thanks a lot.

StatDave · Posted 04-04-2022 09:34 AM

If you use the Margins macro to fit the model and do the comparisons as shown in the note I referred to, the interpretation is fairly straightforward. In the binary example in the note, the difference in the two A levels is given at each B level in the "Contrasts of A B Margins" table.

As can be seen there, the A1-A0 difference in B=1 is 0.53-0.46 = 0.07. So, the probability of the event (Y=1) increases by 0.07 in that level of B when you increase A from 0 to 1. Similarly, the A1-A0 difference in B=0 is 0.72-0.39 = 0.33, meaning that the event probability increases by 0.33 when you increase A in level 0 of B.

The difference in these two probability differences is 0.07-0.33 = -0.26, which indicates that the event probability change in B=1 is 0.26 smaller than the probability change in B=0. This is a measure of the interaction between A and B on the event probability. If the change in the event probability in the two B levels were the same, then there would be no interaction. The tests of all of these differences are significant as shown in the Pr>ChiSq column. You could, of course, report these differences as percent changes if you prefer.
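The arithmetic above can be retraced with a few lines of Python. This is only a sketch of the subtraction steps using the rounded margins quoted in this post; it does not fit any model, and the dictionary layout is an illustrative choice.

```python
# Predicted event probabilities (margins) as quoted above, rounded to 2 decimals.
margins = {
    ("A1", "B1"): 0.53,
    ("A0", "B1"): 0.46,
    ("A1", "B0"): 0.72,
    ("A0", "B0"): 0.39,
}

# A1 - A0 difference within each level of B.
diff_b1 = margins[("A1", "B1")] - margins[("A0", "B1")]
diff_b0 = margins[("A1", "B0")] - margins[("A0", "B0")]

# Difference in differences: the interaction on the probability scale.
did = diff_b1 - diff_b0

print(round(diff_b1, 2), round(diff_b0, 2), round(did, 2))
```

A value of zero for `did` would correspond to the same change in event probability in both B levels, that is, no interaction on the probability scale.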
