Programming the statistical procedures from SAS

Difference-in-Difference analysis

Occasional Contributor
Posts: 10

Difference-in-Difference analysis

I am trying to figure out how to conduct a difference-in-difference analysis between coefficients. The paper I am following compares one group to another and one time period to another. Below is basically what they are doing. What I don't know is how to set this up in SAS so I can determine whether the differences between the coefficients shown below are significant. Any help is greatly appreciated!

 

Panel B: Difference-in-Differences Analysis
Loss Sample

Period          Description                Non-Sanctioned   Sanctioned           Diff.
Pre-Sanction    Log-odds switching risk    0.927            2.412760731*         1.486045197
                Measured by                β0               β0 + δ1              δ1
Post-Sanction   Log-odds switching risk    0.716062028      1.85913691           1.275391691
                Measured by                β0 + δ2          β0 + δ1 + δ2 + δ3    δ1 + δ2
Change          Log-odds switching risk    -0.210653506     0.825418811          1.036072317
                Measured by                δ2               δ2 + δ3              δ3
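The "Measured by" rows imply a single model for the four cells: log odds of switching = β0 + δ1·Sanctioned + δ2·Post + δ3·Sanctioned·Post. The short Python sketch below (Python is used only to show the arithmetic; the parameter values are hypothetical, not the estimates in the table) rebuilds every cell and difference from those four parameters. Under this model the post-period difference works out to δ1 + δ3, and the difference in differences is δ3 whichever way it is taken.

```python
# Hypothetical values for the model
#   log odds of switching = b0 + d1*sanctioned + d2*post + d3*sanctioned*post
b0, d1, d2, d3 = 0.9, 1.5, -0.2, 1.0


def log_odds(sanctioned: int, post: int) -> float:
    """Log odds of switching for one cell of the 2x2 (group x period) design."""
    return b0 + d1 * sanctioned + d2 * post + d3 * sanctioned * post


# "Diff." column: sanctioned minus non-sanctioned, within each period
pre_diff = log_odds(1, 0) - log_odds(0, 0)     # = d1
post_diff = log_odds(1, 1) - log_odds(0, 1)    # = d1 + d3

# "Change" row: post minus pre, within each group
change_non = log_odds(0, 1) - log_odds(0, 0)   # = d2
change_sanc = log_odds(1, 1) - log_odds(1, 0)  # = d2 + d3

# Difference in differences: same value in either direction
did = post_diff - pre_diff                     # = d3
did_other_way = change_sanc - change_non       # = d3

print(f"pre diff {pre_diff:.3f}   post diff {post_diff:.3f}")
print(f"change non-sanctioned {change_non:.3f}   change sanctioned {change_sanc:.3f}")
print(f"DiD {did:.3f} = {did_other_way:.3f}")
```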



Super User
Posts: 11,343

Re: Difference-in-Difference analysis

Not much information provided, but I am going to guess that a possible starting place is PROC LOGISTIC with the logit link (LINK=LOGIT on the MODEL statement, which is the default) to generate log-odds output.

 

You may also be looking for the LSMESTIMATE statement to do some of the hypothesis tests, but that's a guess.

Occasional Contributor
Posts: 10

Re: Difference-in-Difference analysis

Hi, and thank you for the response! I am using PROC SURVEYLOGISTIC and clustering by auditor. I have the regression coefficients as seen in the Excel spreadsheet I included. I am following another paper that looks at the differences between the regression results (as you see in the spreadsheet), and I don't know how they determine whether a difference is significant: for example, the difference between β0 and β0 plus the coefficient of interest. I can calculate the difference, but how can I determine whether it is significant? I am new at this and find it very challenging. Because I am using SURVEYLOGISTIC so that I can cluster by auditor, I don't think the logit option is available; I added it to the MODEL statement and it generated an error.


Thank you!!
SAS Employee
Posts: 282

Re: Difference-in-Difference analysis

The interaction parameter in a two-way logistic model with binary variables estimates the difference in difference of log odds (for an ordinary regression model, it estimates the difference in difference of means). If you want to estimate the difference in difference of probabilities, you need to apply the inverse of the logit link to obtain each probability. This can be done either with the NLEstimate macro or with the ESTIMATE statement in PROC NLMIXED.
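To see the distinction numerically, here is a small Python sketch using hypothetical reference-coded estimates (level 2 of A and B as the reference, the same coding as the PROC LOGISTIC example in this post). The log-odds difference in difference reproduces the interaction parameter exactly, while the probability version has to be computed cell by cell through the inverse logit and is a different number.

```python
import math


def logistic(x: float) -> float:
    """Inverse of the logit link: maps log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical estimates: intercept, A=1, B=1, A=1*B=1 (level 2 is the reference)
b0, b1, b2, b3 = -1.0, 0.5, 0.8, 0.6

# Linear predictor (log odds) for each (a, b) cell
eta = {
    (1, 1): b0 + b1 + b2 + b3,
    (1, 2): b0 + b1,
    (2, 1): b0 + b2,
    (2, 2): b0,
}

# Difference in difference on the log-odds scale: equals the interaction b3
did_logodds = (eta[1, 1] - eta[1, 2]) - (eta[2, 1] - eta[2, 2])

# Difference in difference on the probability scale: inverse link first
p = {cell: logistic(x) for cell, x in eta.items()}
did_prob = (p[1, 1] - p[1, 2]) - (p[2, 1] - p[2, 2])

print(f"DiD of log odds:      {did_logodds:.4f}")
print(f"DiD of probabilities: {did_prob:.4f}")
```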

 

For example, if A and B are both binary with values 1 and 2, and Y is binary with values 0 and 1, the PROC LOGISTIC statements below fit the logistic model with interaction and save it with the STORE statement. The interaction parameter estimates the difference in difference of log odds. The two %NLEstimate calls use the stored model to estimate the difference in difference of probabilities and of log odds directly from the parameter estimates. The MEANS, TRANSPOSE, and DATA steps use the saved estimated probabilities and log odds (xbeta) to compute the same two differences in difference from the cell averages. NLMIXED then refits the logistic model. The first ESTIMATE statement shows that the difference in difference of log odds is just the interaction parameter. The second ESTIMATE statement applies the inverse logit link (via the LOGISTIC function) to each part to compute the difference in difference of probabilities.

 

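/* Fit the model with the A*B interaction; PARAM=REF makes level 2 the reference level */
proc logistic data=mydata;
 class a b / param=ref;
 model y(event="1")=a|b;
 output out=out p=p xbeta=xb;   /* predicted probabilities and log odds */
 store log;                     /* save the fitted model for %NLEstimate */
 run;
/* %NLEstimate must already be defined. B_P1-B_P4 are the parameters in order: */
/* intercept, A=1, B=1, A=1*B=1                                                */
%NLEstimate(instore=log, label=diff in diff probs,
  f=(logistic(b_p1+b_p2+b_p3+b_p4)-logistic(b_p1+b_p2)) - (logistic(b_p1+b_p3)-logistic(b_p1)), df=100)
%NLEstimate(instore=log, label=diff in diff log odds,
  f=((b_p1+b_p2+b_p3+b_p4)-(b_p1+b_p2)) - ((b_p1+b_p3)-(b_p1)), df=100)

/* Average p and xb within each of the four a*b cells */
proc means data=out mean nway;
 class a b;
 var p xb;
 output out=probs mean=;
 run;
/* COL1-COL4 hold cells a1b1, a1b2, a2b1, a2b2 */
proc transpose data=probs out=tprobs;
 var p xb;
 run;
data tprobs;
 set tprobs;
 difdif=(col1-col3)-(col2-col4);   /* equals (a1b1-a1b2)-(a2b1-a2b2) */
 run;
proc print noobs;
 var _label_ difdif;
 run;

/* Refit the same model; (a=1) evaluates to 1 when a=1 and 0 otherwise */
proc nlmixed data=mydata;
 p=logistic(b0 + b1*(a=1) + b2*(b=1) + b3*(a=1 and b=1));
 model y ~ binary(p);
 estimate "(a1b1-a1b2)-(a2b1-a2b2)" ((b0+b1+b2+b3)-(b0+b1)) - ((b0+b2)-(b0));
 estimate "(Pa1b1-Pa1b2)-(Pa2b1-Pa2b2)" (logistic(b0+b1+b2+b3)-logistic(b0+b1)) - (logistic(b0+b2)-logistic(b0));
 run;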
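The probability contrast is nonlinear in the parameters, so its standard error needs the delta method, SE = sqrt(g'Vg), where g is the gradient of the contrast with respect to the parameters and V is their estimated covariance matrix. PROC NLMIXED's ESTIMATE statement computes standard errors this way, and NLEstimate is understood to rely on the same approach. The Python sketch below shows that calculation with a central-difference gradient. The estimates and covariance matrix are hypothetical, and this illustrates the idea only; it is not the macro's code.

```python
import math


def logistic(x: float) -> float:
    """Inverse logit: converts log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def did_prob(beta):
    """(P_a1b1 - P_a1b2) - (P_a2b1 - P_a2b2), as in the second NLMIXED ESTIMATE."""
    b0, b1, b2, b3 = beta
    return ((logistic(b0 + b1 + b2 + b3) - logistic(b0 + b1))
            - (logistic(b0 + b2) - logistic(b0)))


def delta_method_se(f, beta, cov, h=1e-6):
    """Delta-method SE of f(beta): sqrt(g' V g), with g from central differences."""
    k = len(beta)
    grad = []
    for j in range(k):
        up = list(beta)
        down = list(beta)
        up[j] += h
        down[j] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))
    return math.sqrt(var)


# Hypothetical estimates (intercept, A=1, B=1, A=1*B=1) and their covariance matrix
beta_hat = [-1.0, 0.5, 0.8, 0.6]
cov_hat = [
    [0.04, -0.04, -0.04, 0.04],
    [-0.04, 0.08, 0.04, -0.08],
    [-0.04, 0.04, 0.09, -0.09],
    [0.04, -0.08, -0.09, 0.20],
]

est = did_prob(beta_hat)
se = delta_method_se(did_prob, beta_hat, cov_hat)
z = est / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
print(f"DiD of probabilities: estimate={est:.4f} se={se:.4f} z={z:.2f} p={p_value:.4f}")
```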
Occasional Contributor
Posts: 10

Re: Difference-in-Difference analysis

Posted in reply to StatDave_sas
Hi, and thank you so much for your response. I tried to incorporate your code into mine, and it is partly working. However, I am not sure how to interpret the results. My logit model is as follows:


proc surveylogistic data=FINALCITYDATA3_means_wins;
cluster auditor_fkey;
model Switch(event='1') = sanc_PY post_violation sanc_PYPost cpa_acm cpaPost
growth absdacc invar gc modop icw tenure
roa loss leverage chglev cash size chgsize
m_a abnrml_fee
yfe2006 yfe2007 yfe2008 yfe2009 yfe2010
yfe2011 yfe2012 yfe2013 yfe2014 yfe2015
sic_2 sic_7 sic_8 sic_9 sic_10 sic_12 sic_13
sic_14 sic_15 sic_16 sic_17 sic_20 sic_21 sic_22 sic_23
sic_24 sic_25 sic_26 sic_27 sic_28 sic_29 sic_30 sic_31
sic_32 sic_33 sic_34 sic_35 sic_36 sic_37 sic_38 sic_39
sic_40 sic_41 sic_42 sic_44 sic_45 sic_46 sic_47 sic_48
sic_49 sic_50 sic_51 sic_52 sic_53 sic_54 sic_55 sic_56
sic_57 sic_58 sic_59 sic_70 sic_72 sic_73 sic_75 sic_78
sic_79 sic_80 sic_82 sic_83 sic_87 sic_89/rsq DF=infinity;
output out = SwitchwCPA_CITYPY p = prob xbeta = logit;
run;

In the paper that I am following, the first thing they compare is the coefficient on the intercept with the coefficient on the intercept plus the sanc_PY variable. Obviously, the difference between those two is the sanc_PY coefficient. They are testing whether that difference is significant. Does the code you provided accomplish this? Sorry, but this is beyond my level of expertise. I am trying!

Thank you again!!
SAS Employee
Posts: 282

Re: Difference-in-Difference analysis

That parameter by itself is just a log odds ratio for the effect of a unit increase in the sanc_PY predictor. That has nothing to do with a difference-in-difference analysis, which involves two binary predictors. It compares the effect of changing the level of the first predictor at one level of the second predictor (the first difference) with the effect of changing the level of the first predictor at the other level of the second predictor (the second difference). The difference of those two differences is the interaction parameter of the two predictors. There is no interaction in your model.
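The claim that the difference in difference is the interaction parameter can be checked directly. In this Python sketch with hypothetical coefficients for two 0/1 predictors, a model without an x1*x2 product term forces the difference in difference of log odds to zero, and adding the product term makes it equal to that term's coefficient. A hand-built product of two indicators, such as sanc_PYPost, plays the same role.

```python
def did_log_odds(b0, b1, b2, b12=0.0):
    """Difference in differences of log odds for two 0/1 predictors x1 and x2.

    b12 is the coefficient on the product x1*x2; leave it at 0 for a model
    that has no interaction term.
    """
    def eta(x1, x2):
        return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2

    effect_when_x2_is_1 = eta(1, 1) - eta(0, 1)  # first difference
    effect_when_x2_is_0 = eta(1, 0) - eta(0, 0)  # second difference
    return effect_when_x2_is_1 - effect_when_x2_is_0


# Hypothetical coefficients
no_interaction = did_log_odds(-0.5, 0.8, 0.3)
with_interaction = did_log_odds(-0.5, 0.8, 0.3, b12=0.4)
print(f"without x1*x2 term: {no_interaction:.6f}")
print(f"with x1*x2 term:    {with_interaction:.6f}")
```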

Occasional Contributor
Posts: 10

Re: Difference-in-Difference analysis

Posted in reply to StatDave_sas

I know the model may be hard to read. The interaction is between the sanc_PY and post_violation variables. Both are indicator variables. Is this what you mean?

Occasional Contributor
Posts: 10

Re: Difference-in-Difference analysis

In the model, this is the sanc_PYPost variable. I am following another paper that did essentially the same analysis.
SAS Super FREQ
Posts: 3,753

Re: Difference-in-Difference analysis

What is the reference for the paper that you are following?

Occasional Contributor
Posts: 10

Re: Difference-in-Difference analysis

Boone et al. (2015)

Did the 2007 PCAOB Disciplinary Order against Deloitte Impose Actual Costs on the Firm or Improve Its Audit Quality?

Solution
08-02-2017 09:40 AM
SAS Employee
Posts: 282

Re: Difference-in-Difference analysis

It would be better to use the CLASS statement to specify your categorical variables rather than creating your own dummy variables separately, as you've apparently done. Then you can specify the interaction as sanc_PY*post_violation. In any case, your interaction parameter is, as I mentioned, the difference-in-difference estimate on the log-odds scale. This is what you show in the spreadsheet in your original post: the delta3 value is presumably your interaction parameter estimate. The test of its significance is in the parameter estimates table from the procedure. Similarly, the delta1 and delta2 differences are the two main-effect parameter estimates for your sanc_PY and post_violation variables. For the differences delta1+delta3 (note that this is incorrectly shown as delta1+delta2 in your spreadsheet) and delta2+delta3, you will need ESTIMATE statements such as the ones below. The NLEstimate macro could also be used, but it isn't necessary because these are linear, not nonlinear, contrasts.

 

estimate 'd1+d3' sanc_PY 1 sanc_PYPost 1;

estimate 'd2+d3' post_violation 1 sanc_PYPost 1;
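For reference, each ESTIMATE row is the linear combination l'β with standard error sqrt(l'Vl), where V is the covariance matrix the procedure estimated (in PROC SURVEYLOGISTIC, the design-based one that accounts for the clustering). The Wald test compares estimate/SE with a normal reference distribution, or equivalently its square with a chi-square on 1 df. The Python sketch below reproduces that arithmetic for the intercept, sanc_PY, post_violation, and sanc_PYPost parameters; every other covariate gets coefficient 0 in these statements, so it drops out. The estimates and covariances are hypothetical. The first row shows that comparing β0 + δ1 with β0 is the contrast (0, 1, 0, 0), which is just the sanc_PY coefficient's own test.

```python
import math


def estimate(label, l, beta, cov):
    """Return l'beta, SE = sqrt(l' V l), z = est/SE and a two-sided normal p-value."""
    k = len(beta)
    est = sum(l[i] * beta[i] for i in range(k))
    var = sum(l[i] * cov[i][j] * l[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    z = est / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals P(chi-square_1 > z**2)
    return {"label": label, "estimate": est, "se": se, "z": z, "p": p}


# Hypothetical estimates, in the order Intercept, sanc_PY (d1),
# post_violation (d2), sanc_PYPost (d3)
beta = [0.9, 1.5, -0.2, 1.0]
# Hypothetical covariance matrix of those four estimates
cov = [
    [0.04, -0.04, -0.04, 0.04],
    [-0.04, 0.08, 0.04, -0.08],
    [-0.04, 0.04, 0.09, -0.09],
    [0.04, -0.08, -0.09, 0.20],
]

rows = [
    estimate("(b0+d1)-b0", [0, 1, 0, 0], beta, cov),  # same as testing d1 alone
    estimate("d1+d3", [0, 1, 0, 1], beta, cov),
    estimate("d2+d3", [0, 0, 1, 1], beta, cov),
]
for r in rows:
    print(f"{r['label']:12s} est={r['estimate']:7.3f} se={r['se']:.3f} "
          f"z={r['z']:.2f} p={r['p']:.4f}")
```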

 

Occasional Contributor
Posts: 10

Re: Difference-in-Difference analysis

Posted in reply to StatDave_sas
Thank you for your assistance! You've been a great help! One last question: I've looked up the ESTIMATE statement, and it seems that I can include the intercept and a variable. In the paper I'm following, Boone et al. appear to test the intercept (β0) together with a variable of interest (Deloitte_py in their paper). When I try to estimate this in my regression, I get 'non-est' as the estimate result.


Please advise.


Thank you again!
☑ This topic is solved.

