Difference-in-Difference analysis

07-27-2017 04:40 PM

I am trying to figure out how to conduct a difference-in-difference analysis between coefficients. The paper I am following compares one group to another and one time period to another. Below is basically what they are doing. What I don't know is how to set this up in SAS so I can determine whether the differences between the coefficients shown below are significant. Any help is greatly appreciated!!!

Panel B: Difference-in-Differences Analysis (Loss Sample)

| Period | Description | Non-Sanctioned | Sanctioned | Diff. |
|---|---|---|---|---|
| Pre-Sanction | Log-odds switching risk | 0.927 | 2.412760731 * | 1.486045197 |
| | Measured by | β0 | β0 + δ1 | δ1 |
| Post-Sanction | Log-odds switching risk | 0.716062028 | 1.85913691 | 1.275391691 |
| | Measured by | β0 + δ2 | β0 + δ1 + δ2 + δ3 | δ1 + δ2 |
| Change | Log-odds switching risk | -0.210653506 | 0.825418811 | 1.036072317 |
| | Measured by | δ2 | δ2 + δ3 | δ3 |

Accepted Solutions

Solution

08-02-2017 09:40 AM

Posted in reply to samme

07-31-2017 11:10 AM

It would be better if you used the CLASS statement to specify your categorical variables rather than create your own dummy variables separately as you've apparently done. Then you can specify the interaction as sanc_PY*post_violation. In any case, your interaction parameter is, as I mentioned, the difference in difference estimate on the log odds scale. And this is what you show in your spreadsheet in your original post - the delta3 value is presumably your interaction parameter estimate. The test of its significance is in the parameter estimates table from the procedure. Similarly, the delta1 and delta2 differences are the two main effect parameter estimates for your sanc_PY and post_violation variables. For the differences delta1+delta3 (note this is incorrectly shown as delta1+delta2 in your spreadsheet) and delta2+delta3, you will need to use ESTIMATE statements. The NLEstimate macro could be used too, but isn't necessary as these are just linear, not nonlinear, contrasts.

estimate 'd1+d3' sanc_PY 1 sanc_PYPost 1;

estimate 'd2+d3' post_violation 1 sanc_PYPost 1;
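For context, here is a minimal sketch of where those ESTIMATE statements would sit, assuming the dummy-variable coding and the data set and variable names from the model posted in this thread. The control variables are trimmed to a handful purely for readability (an assumption of this sketch; the full list would go in their place), and the CL option is added only to request confidence limits.

```sas
proc surveylogistic data=FINALCITYDATA3_means_wins;
  cluster auditor_fkey;
  /* sanc_PYPost is the hand-coded product of sanc_PY and post_violation */
  model Switch(event='1') = sanc_PY post_violation sanc_PYPost
        growth tenure roa loss leverage size  /* illustrative subset of controls */
        / rsq DF=infinity;
  /* Sanctioned minus non-sanctioned log odds in the post period */
  estimate 'd1+d3' sanc_PY 1 sanc_PYPost 1 / cl;
  /* Post minus pre change in log odds for the sanctioned group */
  estimate 'd2+d3' post_violation 1 sanc_PYPost 1 / cl;
run;
```

Each ESTIMATE statement produces its own table with the estimate, standard error, and p-value for that linear combination, so the tests of d1, d2, and d3 come from the parameter estimates table and the two sums come from these extra tables.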

All Replies

Posted in reply to samme

07-27-2017 06:25 PM

Not much information is provided, but my guess is that a possible starting place is PROC LOGISTIC with the LOGIT option on the MODEL statement to generate log-odds output.

You may also be looking for the LSMESTIMATE statement to do some of the hypothesis tests, but that's a guess.

Posted in reply to ballardw

07-27-2017 06:44 PM

Hi and thank you for the response! I am using PROC SURVEYLOGISTIC and am clustering by auditor. I have the regression coefficients, as seen in the Excel spreadsheet I included. I am following another paper that looks at the difference between the regression results (as you see on the spreadsheet). I don't know how they determine whether the difference is significant: for example, the difference between beta zero and beta zero plus the coefficient of interest. I can calculate the difference, but how can I determine if it is significant? I am new at this and find it very challenging. Because I am using SURVEYLOGISTIC so that I can cluster by auditor, I don't think the LOGIT option is available. I added it to the MODEL statement and it generated an error.

Thank you!!

Posted in reply to samme

07-28-2017 01:17 PM

The interaction parameter in a two-way logistic model with binary variables estimates the difference in difference of log odds. (For an ordinary regression model, it estimates the difference in difference of means). If you want to estimate the difference in difference of probabilities, then you need to apply the inverse of the logit link to obtain each probability. This can be done with either the NLEstimate macro or using the ESTIMATE statement in PROC NLMIXED.

For example, if A and B are both binary with values 1 and 2, and Y is binary with values 0 and 1, the PROC LOGISTIC statements below fit the logistic model with interaction. The interaction parameter estimates the difference in difference of log odds. The MEANS, TRANSPOSE, and DATA steps use the saved estimated probabilities and log odds (xbeta) to compute the difference in difference of probabilities and of log odds. NLMIXED then refits the logistic model. The first ESTIMATE statement shows that the difference in difference of log odds is just the interaction parameter. The second ESTIMATE statement applies the inverse logit link (via the LOGISTIC function) to each part to compute the difference in difference of probabilities.

```
/* Fit the logistic model with interaction; save predictions and the fitted model */
proc logistic data=mydata;
  class a b / param=ref;
  model y(event="1")=a|b;
  output out=out p=p xbeta=xb;
  store log;
run;

/* Difference in difference of probabilities (inverse logit applied to each cell) */
%NLEstimate(instore=log, label=diff in diff probs,
  f=(logistic(b_p1+b_p2+b_p3+b_p4)-logistic(b_p1+b_p2)) - (logistic(b_p1+b_p3)-logistic(b_p1)), df=100)

/* Difference in difference of log odds (just the interaction parameter) */
%NLEstimate(instore=log, label=diff in diff log odds,
  f=((b_p1+b_p2+b_p3+b_p4)-(b_p1+b_p2)) - ((b_p1+b_p3)-(b_p1)), df=100)

/* Check by hand: one predicted probability and log odds per a*b cell */
proc means data=out mean nway;
  class a b;
  var p xb;
  output out=probs mean=;
run;
proc transpose data=probs out=tprobs;
  var p xb;
run;
data tprobs;
  set tprobs;
  difdif=(col1-col3)-(col2-col4);
run;
proc print noobs;
  var _label_ difdif;
run;

/* Refit in NLMIXED and estimate both differences in difference */
proc nlmixed data=mydata;
  p=logistic(b0 + b1*(a=1) + b2*(b=1) + b3*(a=1 and b=1));
  model y ~ binary(p);
  estimate "(a1b1-a1b2)-(a2b1-a2b2)" ((b0+b1+b2+b3)-(b0+b1)) - ((b0+b2)-(b0));
  estimate "(Pa1b1-Pa1b2)-(Pa2b1-Pa2b2)" (logistic(b0+b1+b2+b3)-logistic(b0+b1)) - (logistic(b0+b2)-logistic(b0));
run;
```
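To try the steps above without real data, one option is to simulate a small data set first. This is only a sketch: the seed, the cell size, and the "true" coefficients are invented for illustration. Only the data set name mydata and the variables a, b, and y come from the code above.

```sas
/* Hypothetical test data: a and b take values 1 and 2, y is 0/1 */
data mydata;
  call streaminit(2017);                  /* arbitrary seed for reproducibility */
  do a = 1 to 2;
    do b = 1 to 2;
      /* made-up log odds with an interaction of 0.8 for the a=1, b=1 cell */
      eta = -1 + 0.5*(a=1) + 0.3*(b=1) + 0.8*(a=1 and b=1);
      prob = 1 / (1 + exp(-eta));         /* inverse logit */
      do i = 1 to 500;                    /* 500 observations per cell */
        y = rand('bernoulli', prob);
        output;
      end;
    end;
  end;
  keep a b y;
run;
```

Run this before the PROC LOGISTIC step. With this setup the difference in difference of log odds reported by each method should land in the neighborhood of the simulated interaction value, which gives a quick check that the three approaches agree.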

Posted in reply to StatDave_sas

07-28-2017 11:00 PM

Hi and thank you so much for your response. I tried to incorporate your code into mine and it is working somewhat. However, I am not sure how to interpret. My logit model is as follows:

proc surveylogistic data=FINALCITYDATA3_means_wins;

cluster auditor_fkey;

model Switch(event='1') = sanc_PY post_violation sanc_PYPost cpa_acm cpaPost

growth absdacc invar gc modop icw tenure

roa loss leverage chglev cash size chgsize

m_a abnrml_fee

yfe2006 yfe2007 yfe2008 yfe2009 yfe2010

yfe2011 yfe2012 yfe2013 yfe2014 yfe2015

sic_2 sic_7 sic_8 sic_9 sic_10 sic_12 sic_13

sic_14 sic_15 sic_16 sic_17 sic_20 sic_21 sic_22 sic_23

sic_24 sic_25 sic_26 sic_27 sic_28 sic_29 sic_30 sic_31

sic_32 sic_33 sic_34 sic_35 sic_36 sic_37 sic_38 sic_39

sic_40 sic_41 sic_42 sic_44 sic_45 sic_46 sic_47 sic_48

sic_49 sic_50 sic_51 sic_52 sic_53 sic_54 sic_55 sic_56

sic_57 sic_58 sic_59 sic_70 sic_72 sic_73 sic_75 sic_78

sic_79 sic_80 sic_82 sic_83 sic_87 sic_89/rsq DF=infinity;

output out = SwitchwCPA_CITYPY p = prob xbeta = logit;

run;

In the paper that I am following, the first thing they compare is the coefficient on the intercept to the coefficient on the intercept plus the sanc_PY variable. Obviously, the difference between those two is the sanc_PY coefficient. They are testing whether that difference is significant. Does the code you provided accomplish this? Sorry, but this is beyond my level of expertise. I am trying!

Thank you again!!

Posted in reply to samme

07-29-2017 04:47 PM

That parameter by itself is just a log odds ratio for the effect of a unit increase of the sanc_PY predictor. That has nothing to do with a "difference in difference" analysis which involves two binary predictors and looks at the effect of changing the level of the first predictor while at one level of the second predictor (the first difference) compared to the effect of changing the level of first predictor while at the other level of the second predictor (the second difference). The difference of those two differences is the interaction parameter of the two predictors. There is no interaction in your model.
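To make the two differences concrete, here is a sketch in the notation of the table in the original post. It assumes a logit model with two binary predictors, labelled S and P here purely for illustration, plus their product, and it leaves out any control variables.

```latex
\begin{align*}
\operatorname{logit}\Pr(Y=1) &= \beta_0 + \delta_1 S + \delta_2 P + \delta_3\,(S \cdot P) \\[4pt]
\text{first difference } (P=0):\quad & (\beta_0+\delta_1) - \beta_0 = \delta_1 \\
\text{second difference } (P=1):\quad & (\beta_0+\delta_1+\delta_2+\delta_3) - (\beta_0+\delta_2) = \delta_1+\delta_3 \\
\text{difference in difference}:\quad & (\delta_1+\delta_3) - \delta_1 = \delta_3
\end{align*}
```

So under this assumed model the main effect of S is the difference at P = 0 only, and the quantity that answers the difference-in-difference question is the coefficient on the product term.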

Posted in reply to StatDave_sas

07-29-2017 06:25 PM

I know the model may be hard to read. The interaction is between the sanc_PY and post_violation variables. Both are indicator variables. Is this what you mean?

Posted in reply to samme

07-29-2017 06:27 PM

In the model, this is the sanc_PYPost variable. I am following another paper that did basically the same analysis.

Posted in reply to samme

07-29-2017 10:27 PM

What is the reference for the paper that you are following?

Posted in reply to Rick_SAS

07-30-2017 10:28 AM

Boone et al. (2015)

"Did the 2007 PCAOB Disciplinary Order against Deloitte Impose Actual Costs on the Firm or Improve Its Audit Quality?"

Posted in reply to StatDave_sas

07-31-2017 06:08 PM

Thank you for your assistance! You've been a great help! One last question... I've looked up the estimate statement and it seems that I can include the intercept and a variable. As in the paper I'm following, Boone appears to test the Intercept (B0) and a variable of interest (Deloitte_py in their paper). When I try to estimate this in my regression, I get 'non-est' as the estimate result.

Please advise.

Thank you again!