How do I run a difference-in-differences analysis...


03-24-2018 03:51 PM

Hello,

I am new to SAS and am trying to run a difference-in-differences analysis with food insecurity as my outcome of interest. I have pre and post time points (each with two years of data) and am trying to compare food insecurity rates between my treatment group and my control group over time. I have coded most of my variables (all except age) as dummy variables (I saw in another post that this may not be ideal), and there are several variables I would like to control for (age, education, marital status, and SNAP receipt). This is where I am with the code:

```sas
proc logistic data = march24.march24coding;
   model foodinsecure (event = '1') = post treatment post*treatment SnapRecipient prtage maritalstatus EAnoHS EAHSnoBD EABD;
   title1 'DiD Attempt';
run;
```

I am using SAS 9.4 and have included more of my code below in case it is helpful.

Does anyone have any suggestions?

Thanks so much.

----

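```sas
/* These statements belong inside a DATA step; the DATA and SET
   statements were not included in the post. */

/* Outcome: FoodInsecure = 1 if HRFS12M1 is 2 or 3 */
FoodInsecure = 0;
if hrfs12m1 in (2, 3) then FoodInsecure = 1;

/* Education dummies based on PEEDUCA */
if peeduca in (31, 32, 33, 34, 35, 36, 37, 38) then EAnoHS = 1;
else EAnoHS = 0;
if peeduca in (39, 40, 41, 42) then EAHSnoBD = 1;
else EAHSnoBD = 0;
if peeduca in (43, 44, 45) then EABD = 1;
else EABD = 0;

/* MaritalStatus = 1 if PEMARITL is 1 or 2 */
if pemaritl in (1, 2) then MaritalStatus = 1;
else MaritalStatus = 0;

/* Treatment and control states, by GESTFIPS state code */
if gestfips in (04, 08, 09, 10, 15, 17, 21, 24, 27, 32, 33, 36, 38, 39, 44, 50, 54) then Treatment = 1;
else Treatment = 0;
if gestfips in (01, 12, 13, 20, 28, 29, 31, 37, 40, 45, 46, 47, 48, 51, 56, 05, 16, 19, 23, 26, 35, 41, 49, 55, 06, 11, 25, 34, 53) then Control = 1;
else Control = 0;

/* Pre period (2012-2013) and post period (2015-2016) */
if year in (2012, 2013) then Pre = 1;
else Pre = 0;
if year in (2015, 2016) then Post = 1;
else Post = 0;

/* SnapRecipient = 1 if HESP1 = 1 */
if hesp1 in (1) then SnapRecipient = 1;
else SnapRecipient = 0;

run;
```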


Posted in reply to mary88

03-25-2018 04:29 AM

I am not sure, but maybe you should fit a mixed logistic regression with PROC GLIMMIX and treat the PRE variable as a random effect.


Posted in reply to mary88

03-26-2018 11:37 AM

You say that your data is over time. If that means you have repeated measurements on subjects, then you probably don't want to use PROC LOGISTIC as you have done, because it assumes that all of the observations are independent. You can use procedures like GLIMMIX, GEE, or GENMOD to fit logistic models that accommodate repeated measures. The DID tests you want will involve a nonlinear combination of the model parameters. That is discussed and illustrated in this note.
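For the repeated-measures case only (which, as it turns out below, does not apply to this data), a GEE-style logistic model in PROC GENMOD might look roughly like this. PersonID is a hypothetical subject identifier that the posted data does not contain, and the exchangeable working correlation is just one possible choice.

```sas
/* Sketch only: PersonID is a HYPOTHETICAL variable identifying repeated
   observations on the same person; the posted data has no such variable. */
proc genmod data = march24.march24coding;
   class PersonID;
   model foodinsecure (event = '1') = post treatment post*treatment
         SnapRecipient prtage maritalstatus EAnoHS EAHSnoBD EABD
         / dist = binomial link = logit;
   /* GEE with an exchangeable working correlation within each person */
   repeated subject = PersonID / type = exch;
run;
```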


Posted in reply to StatDave_sas

03-26-2018 01:18 PM

Thank you for your response. I do not have repeated measures on subjects. I have data on my control and treatment groups for both the pre and post periods, but each period is made up of different individuals drawn from the same states that I designated as control and treatment states.

Do you have a suggestion for how to run a DID test in PROC LOGISTIC with independent observations?


Posted in reply to mary88

03-26-2018 01:25 PM

In that case, you should be able to proceed as in the note I referred to: add a STORE statement to save the model, and then use the NLEstimate macro to estimate and test the difference-in-differences effect (assuming you want the estimate on the probability scale rather than the log-odds scale). Because your model includes the covariates, that estimate will be adjusted for them.
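To make the "nonlinear combination" concrete, here is a sketch in notation introduced only for illustration (it is not taken from the note): t is Post, g is Treatment, and x holds the covariates at some chosen values. On the log-odds scale the DID reduces to the interaction coefficient; on the probability scale each linear predictor first passes through the logistic function.

$$
\eta(t,g) = \beta_0 + \beta_1 t + \beta_2 g + \beta_3\, t g + \mathbf{x}^\top\boldsymbol{\gamma},
\qquad
p(t,g) = \frac{1}{1 + e^{-\eta(t,g)}}
$$

$$
\mathrm{DID}_{\text{log odds}} = \big[\eta(1,1) - \eta(0,1)\big] - \big[\eta(1,0) - \eta(0,0)\big] = \beta_3
$$

$$
\mathrm{DID}_{\text{prob}} = \big[p(1,1) - p(0,1)\big] - \big[p(1,0) - p(0,0)\big]
$$

The covariate term cancels in the log-odds difference but not inside the logistic function, so the probability-scale DID depends on the covariate values at which it is evaluated.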


Posted in reply to StatDave_sas

03-28-2018 01:17 PM

Thank you again. I appreciate the help. I have a couple of questions.

Does it matter where I place my covariates in the model statement? Should they all come after the = but before my two predictors of interest?

So, would it be `model foodinsecure (event="1") = prtage snaprecipient eanohs eahsnobd eabd maritalstatus post treatment post*treatment;`?

---

If I want the estimate on the log-odds scale, does the following code still work, given that the parameters enter nonlinearly?

```sas
proc logistic data = march24.march24coding;
   class foodinsecure snaprecipient eanohs eahsnobd eabd maritalstatus post treatment / param=glm ref=first;
   model foodinsecure (event="1") = prtage snaprecipient eanohs eahsnobd eabd maritalstatus post treatment post*treatment;
   estimate "diff in diff" post*treatment 1 -1 -1 1;
   lsmeans post*treatment / ilink;
   lsmestimate post*treatment "diff in diff logodds" 1 -1 -1 1;
run;
```

---

When I try to get the estimate on the probability scale with the NLEstimate macro, I get an error saying that the statement (referring to %NLEstimate) is not valid or is used out of proper order. My code is below. Do you know what I am doing wrong?

```sas
proc logistic data = march24.march24coding;
   class foodinsecure snaprecipient eanohs eahsnobd eabd maritalstatus post treatment;
   model foodinsecure (event="1") = prtage snaprecipient eanohs eahsnobd eabd maritalstatus post treatment post*treatment;
   lsmeans post*treatment / ilink;
   store log;
run;

%NLEstimate (instore = log,
             label = diff in diff means,
             f = (logistic(intercept + post + treatment + post*treatment) - logistic(intercept + treatment)) - (logistic(intercept + post) - logistic(intercept)))
```


Posted in reply to mary88

03-28-2018 01:31 PM

The order in the MODEL statement doesn't matter. The DID estimate on the log odds scale is just the interaction estimate as discussed in the note. It is shown by default in the Parameter Estimates table, so there is no need for an ESTIMATE, LSMEANS, or LSMESTIMATE statement. For the estimate on the probability scale you need the NLEstimate macro, which you need to download from the link I provided and then run the macro code to make the macro available before you can call it. This is discussed in the Usage section in the macro documentation at that link. You then need to use the names for the model parameters (of the form B_px) as shown in the example in the Results tab of the macro documentation.
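Putting those steps together, a minimal sketch might look like the following. The %INCLUDE path and file name are hypothetical (use wherever you saved the downloaded macro code). The B_p numbering assumes the 0/1 dummies are entered as numeric variables with no CLASS statement, so the parameters follow the MODEL statement order: B_p1 = Intercept, B_p2 = post, B_p3 = treatment, B_p4 = post*treatment. Check that order against the Parameter Estimates table before relying on it.

```sas
/* Step 1: make the macro available. HYPOTHETICAL path and file name. */
%include "C:\MyMacros\nlest.sas";

/* Step 2: fit the model and save it in an item store named LOG.
   No CLASS statement, so each 0/1 dummy gets one parameter, in MODEL order:
   B_p1=Intercept, B_p2=post, B_p3=treatment, B_p4=post*treatment,
   B_p5=SnapRecipient, B_p6=prtage, ... (assumed; verify in the output) */
proc logistic data = march24.march24coding;
   model foodinsecure (event = '1') = post treatment post*treatment
         SnapRecipient prtage maritalstatus EAnoHS EAHSnoBD EABD;
   store log;
run;

/* Step 3: DID on the probability scale, with all covariates at 0:
   [p(post=1,trt=1) - p(post=0,trt=1)] - [p(post=1,trt=0) - p(post=0,trt=0)] */
%NLEstimate(instore = log,
            label = diff in diff means,
            f = (logistic(B_p1 + B_p2 + B_p3 + B_p4) - logistic(B_p1 + B_p3))
              - (logistic(B_p1 + B_p2) - logistic(B_p1)))
```

As written, f evaluates each predicted probability with every covariate equal to 0, which is not a realistic value for age. To evaluate at other covariate values, you would add the corresponding terms inside all four logistic() calls; for example, with this MODEL statement prtage would be B_p6, so adding B_p6*40 to each call would evaluate the DID at age 40.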