03-24-2018 03:51 PM
I am new to SAS and am trying to run a difference-in-differences analysis with food insecurity as my outcome of interest. I have a pre and a post period (each with two years of data) and am trying to compare the change in food insecurity rates between my treatment group and my control group. I have coded most of my variables (all except age) as dummy variables (I saw on another post that this may not be ideal) and have a few variables I would like to control for (age, education, marital status, and SNAP). This is where I am with the code:
proc logistic data = march24.march24coding;
model foodinsecure (event = '1') = post treatment post*treatment SnapRecipient prtage maritalstatus EAnoHS EAHSnoBD EABD;
title1 'DiD Attempt';
run;
I am using SAS 9.4 and have included more of my coding below in case it is helpful.
Does anyone have any suggestions?
Thanks so much.
FoodInsecure = 0;
if hrfs12m1 in (2, 3) then FoodInsecure = 1;
if peeduca in (31, 32, 33, 34, 35, 36, 37, 38) then EAnoHS = 1;
else EAnoHS = 0;
if peeduca in (39, 40, 41, 42) then EAHSnoBD = 1;
else EAHSnoBD = 0;
if peeduca in (43,44,45) then EABD = 1;
else EABD = 0;
if pemaritl in (1, 2) then MaritalStatus = 1;
else MaritalStatus = 0;
if gestfips in (04, 08, 09, 10, 15, 17, 21, 24, 27, 32, 33, 36, 38, 39, 44, 50, 54) then Treatment = 1;
else Treatment = 0;
if gestfips in (01, 12, 13, 20, 28, 29, 31, 37, 40, 45, 46, 47, 48, 51, 56, 05, 16, 19, 23, 26, 35, 41, 49, 55, 06, 11, 25, 34, 53) then Control = 1;
else control = 0;
if year in (2012, 2013) then Pre = 1;
else Pre = 0;
if year in (2015, 2016) then Post = 1;
else Post = 0;
if hesp1 in (1) then SnapRecipient = 1;
else SnapRecipient = 0;
03-26-2018 11:37 AM
You say that your data are collected over time. If that means you have repeated measurements on subjects, then you probably don't want to use PROC LOGISTIC as you have done, because it assumes that all of the observations are independent. You can use procedures like GLIMMIX, GEE, or GENMOD to fit logistic models that accommodate repeated measures. The DID tests you want will involve a nonlinear combination of the model parameters. That is discussed and illustrated in this note.
03-26-2018 01:18 PM
Thank you for your response. I do not have repeated measures on subjects. I have data on my control and treatment groups for the pre and post periods, but each period is made up of different individuals from the same states that I designated as control and treatment.
Do you have a suggestion for how to use a DID test in proc logistic for independent observations?
03-26-2018 01:25 PM
In that case, you should be able to proceed as in the note I referred to - add a STORE statement to save the model, and then use the NLEstimate macro to estimate and test the difference in difference effect (assuming you want the estimate on the probability rather than the log odds scale). Because of the presence of the covariates in your model, that estimate will be adjusted for them.
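As a rough sketch of that first step, the model from the question could be refit with a STORE statement added. The item-store name didmodel is a hypothetical placeholder. The WHERE statement is an assumption, not something stated in the thread: in the posted coding, a state in neither list gets Treatment = 0 and a year outside both periods gets Post = 0, so it may be worth restricting the fit to rows that fall in one of the two groups and one of the two periods. Drop it if the data set is already limited to those rows.

```sas
/* Sketch only: didmodel is a hypothetical item-store name */
proc logistic data = march24.march24coding;
   /* Assumption: keep only rows in a defined group and a defined period */
   where (Treatment = 1 or Control = 1) and (Pre = 1 or Post = 1);
   model foodinsecure (event = '1') = post treatment post*treatment
         SnapRecipient prtage maritalstatus EAnoHS EAHSnoBD EABD;
   store didmodel;   /* saves the fitted model for later use */
   title1 'DiD Attempt';
run;

/* List the stored parameters to confirm their order */
proc plm restore = didmodel;
   show parameters;
run;
```

The PROC PLM step only displays the stored parameter estimates; it is a convenient way to check which position each effect occupies before writing any expression in terms of the parameters.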
03-28-2018 01:17 PM
Thank you again. I appreciate the help. I have a couple of questions.
03-28-2018 01:31 PM
The order in the MODEL statement doesn't matter. The DID estimate on the log odds scale is just the interaction estimate, as discussed in the note. It is shown by default in the Parameter Estimates table, so there is no need for an ESTIMATE, LSMEANS, or LSMESTIMATE statement. For the estimate on the probability scale you need the NLEstimate macro. Download it from the link I provided, then run the macro code to define the macro before you call it. This is discussed in the Usage section of the macro documentation at that link. When you call the macro, refer to the model parameters by their names (of the form B_px), as shown in the example in the Results tab of the macro documentation.
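To make the two scales concrete, here is the algebra under the assumption that the model has intercept \beta_0, coefficients \beta_1 (post), \beta_2 (treatment), and \beta_3 (post*treatment), with x'\gamma standing for the covariate terms evaluated at some fixed covariate values. These symbols are illustrative labels only, not the macro's B_px names.

```latex
\begin{align*}
\operatorname{logit} P(\text{foodinsecure}=1)
  &= \beta_0 + \beta_1\,\text{post} + \beta_2\,\text{treatment}
     + \beta_3\,(\text{post}\times\text{treatment}) + x'\gamma \\[4pt]
\text{DID}_{\text{log odds}}
  &= \bigl[(\beta_0+\beta_1+\beta_2+\beta_3+x'\gamma) - (\beta_0+\beta_2+x'\gamma)\bigr]
     - \bigl[(\beta_0+\beta_1+x'\gamma) - (\beta_0+x'\gamma)\bigr]
   = \beta_3 \\[4pt]
\text{DID}_{\text{prob}}
  &= \bigl[\pi(\beta_0+\beta_1+\beta_2+\beta_3+x'\gamma) - \pi(\beta_0+\beta_2+x'\gamma)\bigr]
     - \bigl[\pi(\beta_0+\beta_1+x'\gamma) - \pi(\beta_0+x'\gamma)\bigr],
  \qquad \pi(\eta) = \frac{1}{1+e^{-\eta}}
\end{align*}
```

On the log odds scale the covariate term cancels, which is why the interaction estimate alone is the DID. On the probability scale it does not cancel, so the estimate is a nonlinear function of the parameters and depends on the covariate values at which it is evaluated.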