Solved: Re: Difference in difference analysis for a binary outcome

lousam · Posted 11-13-2020 05:18 PM

Hello,

I am trying to conduct a difference in difference (DID) analysis to examine the effect of an intervention on the prevalence of smoking using a national cross-sectional survey dataset (2010-2015). The outcome of interest is a binary variable (smoking: yes/no), and I am comparing states with and without the policy of interest.

I have been asked to run both a linear probability model (using proc surveyreg) and a logit model (using proc surveylogistic) to examine the effect of the policy in treated versus reference states.

I have to get the following estimates:

1) The average difference in the probability of smoking (treated versus reference states) from a linear probability model (i.e., estimate [95% CI])

2) The pre-post changes in the odds of smoking (treated versus reference states) from a logit model (i.e., odds ratio [95% CI])

Question 1: Does the coefficient obtained from the "lsmestimate" and "estimate" statements represent the DID estimate (i.e., the average difference in the probability of smoking)?

If not, how can I get the DID estimate as a probability with a 95% CI? Here is the code I was using:

proc surveyreg data=survey_data;
   domain gender;
   stratum stratum_var;
   cluster cluster_var;
   class policy_time intervention covar1 covar2 covar3;
   weight weight_var;
   model smoking = policy_time intervention policy_time*intervention covar1 covar2 covar3 / clparm solution vadjust=none;
   estimate "Diff in Diff" policy_time*intervention 1 -1 -1 1;
   lsmeans policy_time*intervention;
   lsmestimate policy_time*intervention "Diff in Diff" 1 -1 -1 1;
run;

Question 2: How can I obtain the change in the odds of smoking (treated versus reference states) as an odds ratio with a 95% CI? Here is my sample code:

proc surveylogistic data=survey_data;
   domain gender;
   stratum stratum_var;
   cluster cluster_var;
   class policy_time intervention covar1 covar2 covar3;
   weight weight_var;
   model smoking = policy_time intervention policy_time*intervention covar1 covar2 covar3;
run;

I appreciate any insight you can offer.

SteveDenham · Posted 11-16-2020 09:36 AM

I think your code for Question 1 will give what you need (although the ESTIMATE and LSMESTIMATE statements are redundant). The trick then is to include at least the LSMEANS and LSMESTIMATE statements in the PROC SURVEYLOGISTIC code. You will need to output the results using ODS, and then call the %NLmeans macro to get the differences. The documentation for the %NLmeans macro is in this note: https://support.sas.com/kb/62/362.html
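A minimal sketch of that workflow, assuming the %NLMeans macro has already been downloaded and defined in the session as described in the linked note. The item store name logmod and the data set name coeffs are hypothetical. PARAM=GLM is added to the CLASS statement because LSMEANS requires GLM parameterization, and the DOMAIN statement is left out for brevity. The macro parameters shown (instore=, coef=, link=, contrasts=, title=) should be checked against the note before use.

```sas
/* Sketch only: logmod and coeffs are hypothetical names */
proc surveylogistic data=survey_data;
   stratum stratum_var;
   cluster cluster_var;
   weight weight_var;
   /* LSMEANS requires GLM parameterization of the CLASS variables */
   class policy_time intervention covar1 covar2 covar3 / param=glm;
   /* Which level of smoking is modeled depends on its coding;
      check the Response Profile table in the output */
   model smoking = policy_time intervention policy_time*intervention
                   covar1 covar2 covar3;
   /* E prints the LS-means coefficients; ILINK adds predicted probabilities */
   lsmeans policy_time*intervention / e ilink;
   ods output coef=coeffs;   /* coefficient table passed to the macro */
   store logmod;             /* fitted model saved as an item store */
run;

/* Difference in difference of the four cell probabilities */
%NLMeans(instore=logmod, coef=coeffs, link=logit,
         contrasts=1 -1 -1 1,
         title=Difference in Difference of Means)
```

The contrast coefficients follow the row order of the policy_time*intervention LS-means table, so the sign of the result depends on how the two variables are coded and sorted.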

SteveDenham

StatDave · Posted 11-16-2020 10:34 PM

It's not entirely clear what comparison you want in question 2, but since you say you want an odds ratio, I assume you want to compare one or more pairs of the four combinations. In that case, just use the LSMEANS statement with the DIFF and ODDSRATIO options (and CL if you want confidence intervals) which will give each of the pairwise comparisons and the corresponding odds ratios. For example: lsmeans policy_time*intervention / ilink diff oddsratio cl;
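As a sketch, the suggested LSMEANS statement could sit in the original PROC SURVEYLOGISTIC step as follows. PARAM=GLM is an addition here, since LSMEANS requires GLM parameterization, and the DOMAIN statement is omitted for brevity. The LSMESTIMATE statement with the EXP option is not part of the reply above; it is an assumption about one way to exponentiate the interaction contrast on the logit scale, which gives a ratio of odds ratios rather than a difference in probabilities.

```sas
proc surveylogistic data=survey_data;
   stratum stratum_var;
   cluster cluster_var;
   weight weight_var;
   class policy_time intervention covar1 covar2 covar3 / param=glm;
   model smoking = policy_time intervention policy_time*intervention
                   covar1 covar2 covar3;
   /* Pairwise comparisons of the four cells, reported as odds ratios with CIs */
   lsmeans policy_time*intervention / ilink diff oddsratio cl;
   /* Assumption, not from the reply: exponentiated interaction contrast,
      i.e. a ratio of odds ratios with confidence limits */
   lsmestimate policy_time*intervention "Ratio of odds ratios" 1 -1 -1 1 / exp cl;
run;
```

The direction of each comparison depends on the sort order of the policy_time and intervention levels, so it is worth checking the LS-means table before interpreting a ratio above or below 1.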

If you again want an estimate of the DID on the mean scale, then see the second section of this note that shows how to obtain the DID on the means using the NLMeans macro. Or if pairwise differences in means are needed, that can also be done with NLMeans as shown in this note (though not shown in the context of a model with interaction).

lousam · Posted 11-19-2020 04:22 PM

I apologize if my questions were unclear. For my second question, I wanted to get a DID estimate as an odds ratio from "proc surveylogistic". This odds ratio (95% CI) would represent the pre-post change in smoking among individuals in treated states relative to individuals in untreated states.

I believe there are several articles that provide a DID estimate using the following approaches:

- DID estimate as an odds ratio (obtained using logistic regression)

- DID estimate as the average difference in the probability of having the outcome of interest (obtained using linear regression)

StatDave · Posted 11-19-2020 08:16 PM

"DID" means *difference* in difference. Odds ratios, being ratios, are not differences. So, again, if you want to estimate the difference in differences of the means, then use the NLMeans macro in the note I referred to. If you truly want odds ratios, instead of differences, then use the LSMEANS statement with DIFF and ODDSRATIO options.
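To make the distinction concrete, here is a sketch in notation that does not appear in the thread. Let p_{gt} be the probability of smoking in group g (1 = treated, 0 = reference) in period t (1 = post, 0 = pre), and let \beta_3 be the policy_time*intervention coefficient in a logit model, assuming 0/1 reference coding of both variables.

```latex
\begin{align*}
\text{DID (probability scale)} &= (p_{11} - p_{10}) - (p_{01} - p_{00}) \\[4pt]
\text{Ratio of odds ratios} &= \frac{\text{odds}_{11} / \text{odds}_{10}}{\text{odds}_{01} / \text{odds}_{00}} = \exp(\beta_3),
\qquad \text{odds}_{gt} = \frac{p_{gt}}{1 - p_{gt}}
\end{align*}
```

The first quantity is a difference of differences in probabilities. The second is a ratio of ratios, which is a difference in differences only on the log-odds scale, so the two need not agree in magnitude.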