I am trying to figure out how to conduct a difference-in-differences analysis between coefficients. The paper I am following compares one group to another and one time period to another. Below is basically what they are doing. What I don't know is how to set this up in SAS so I can test whether the differences between the coefficients shown below are significant. Any help is greatly appreciated!
Panel B: Difference-in-Differences Analysis (Loss Sample)
| Period | Description | Non-Sanctioned | Sanctioned | | Diff. |
| Pre-Sanction | Log-odds switching risk | 0.927 | 2.412760731 | * | 1.486045197 |
| | Measured by | β0 | β0 + δ1 | | δ1 |
| Post-Sanction | Log-odds switching risk | 0.716062028 | 1.85913691 | | 1.275391691 |
| | Measured by | β0 + δ2 | β0 + δ1 + δ2 + δ3 | | δ1 + δ2 |
| Change | Log-odds switching risk | -0.210653506 | 0.825418811 | | 1.036072317 |
| | Measured by | δ2 | δ2 + δ3 | | δ3 |
It would be better if you used the CLASS statement to specify your categorical variables rather than create your own dummy variables separately as you've apparently done. Then you can specify the interaction as sanc_PY*post_violation. In any case, your interaction parameter is, as I mentioned, the difference in difference estimate on the log odds scale. And this is what you show in your spreadsheet in your original post - the delta3 value is presumably your interaction parameter estimate. The test of its significance is in the parameter estimates table from the procedure. Similarly, the delta1 and delta2 differences are the two main effect parameter estimates for your sanc_PY and post_violation variables. For the differences delta1+delta3 (note this is incorrectly shown as delta1+delta2 in your spreadsheet) and delta2+delta3, you will need to use ESTIMATE statements. The NLEstimate macro could be used too, but isn't necessary as these are just linear, not nonlinear, contrasts.
estimate 'd1+d3' sanc_PY 1 sanc_PYPost 1;
estimate 'd2+d3' post_violation 1 sanc_PYPost 1;
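For context, here is a minimal sketch of where those two ESTIMATE statements would sit, assuming the model is fit with the hand-made dummy variables. The data set name `loss` and the outcome name `switched` are hypothetical placeholders, not names from the original post; `sanc_PYPost` is taken to be the product of the two indicators.

```sas
/* Hypothetical names: data set LOSS, binary outcome SWITCHED (1 = event). */
/* sanc_PY, post_violation, and sanc_PYPost are assumed to be 0/1 numeric  */
/* dummies, with sanc_PYPost = sanc_PY * post_violation.                   */
proc logistic data=loss;
   model switched(event='1') = sanc_PY post_violation sanc_PYPost;
   /* Sanctioned vs. non-sanctioned difference in the post period */
   estimate 'd1+d3' sanc_PY 1 sanc_PYPost 1;
   /* Post vs. pre change within the sanctioned group */
   estimate 'd2+d3' post_violation 1 sanc_PYPost 1;
run;
```

The tests of d1, d2, and d3 themselves would come from the parameter estimates table, so only the two sums need ESTIMATE statements.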
Not much information was provided, but I am going to guess that a possible starting place is PROC LOGISTIC with the LINK=LOGIT option on the MODEL statement to generate log-odds output.
You may also be looking for the LSMESTIMATE statement to do some of the hypothesis tests, but that's a guess.
The interaction parameter in a two-way logistic model with binary variables estimates the difference in difference of log odds. (For an ordinary regression model, it estimates the difference in difference of means). If you want to estimate the difference in difference of probabilities, then you need to apply the inverse of the logit link to obtain each probability. This can be done with either the NLEstimate macro or using the ESTIMATE statement in PROC NLMIXED.
For example, if A and B are both binary with values 1 and 2, and Y is binary with values 0 and 1, the PROC LOGISTIC statements below fit the logistic model with interaction. The interaction parameter estimates the difference in difference of log odds. The MEANS, TRANSPOSE, and DATA steps use the saved estimated probabilities and log odds (xbeta) to compute the difference in difference of probabilities and of log odds. NLMIXED then refits the logistic model. The first ESTIMATE statement shows that the difference in difference of log odds is just the interaction parameter. The second ESTIMATE statement applies the inverse logit link (via the LOGISTIC function) to each part to compute the difference in difference of probabilities.
/* Fit the logistic model with interaction; save predictions and store the model */
proc logistic data=mydata;
   class a b / param=ref;
   model y(event="1")=a|b;
   output out=out p=p xbeta=xb;
   store log;
run;

/* NLEstimate macro calls (the macro must already be defined in the session). */
/* b_p1..b_p4 are the intercept, a, b, and a*b parameters.                    */
%NLEstimate(instore=log, label=diff in diff probs,
   f=(logistic(b_p1+b_p2+b_p3+b_p4)-logistic(b_p1+b_p2)) - (logistic(b_p1+b_p3)-logistic(b_p1)), df=100)
%NLEstimate(instore=log, label=diff in diff log odds,
   f=((b_p1+b_p2+b_p3+b_p4)-(b_p1+b_p2)) - ((b_p1+b_p3)-(b_p1)), df=100)

/* Average the predicted probabilities and log odds within each a*b cell */
proc means data=out mean nway;
   class a b;
   var p xb;
   output out=probs mean=;
run;

/* One row per statistic; col1-col4 are the cells a1b1, a1b2, a2b1, a2b2 */
proc transpose data=probs out=tprobs;
   var p xb;
run;

data tprobs;
   set tprobs;
   difdif=(col1-col3)-(col2-col4);
run;

proc print noobs;
   var _label_ difdif;
run;

/* Refit the same model in NLMIXED and estimate both differences in difference */
proc nlmixed data=mydata;
   p=logistic(b0 + b1*(a=1) + b2*(b=1) + b3*(a=1 and b=1));
   model y ~ binary(p);
   estimate "(a1b1-a1b2)-(a2b1-a2b2)" ((b0+b1+b2+b3)-(b0+b1)) - ((b0+b2)-(b0));
   estimate "(Pa1b1-Pa1b2)-(Pa2b1-Pa2b2)" (logistic(b0+b1+b2+b3)-logistic(b0+b1)) - (logistic(b0+b2)-logistic(b0));
run;
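As a sketch of what the two NLMIXED ESTIMATE statements compute, written with the same b0 through b3 parameters (the symbol Λ for the inverse logit is notation introduced here, not from the code):

```latex
\Lambda(x) = \frac{1}{1 + e^{-x}}

% Difference in difference of log odds: everything cancels except b3
\bigl[(b_0+b_1+b_2+b_3) - (b_0+b_1)\bigr] - \bigl[(b_0+b_2) - b_0\bigr] = b_3

% Difference in difference of probabilities: no cancellation, because
% \Lambda is nonlinear
\Delta_P = \bigl[\Lambda(b_0+b_1+b_2+b_3) - \Lambda(b_0+b_1)\bigr]
         - \bigl[\Lambda(b_0+b_2) - \Lambda(b_0)\bigr]
```

Because the second quantity depends on all four parameters, a nonlinear estimator is needed for it, while the first is just the interaction parameter.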
That parameter by itself is just a log odds ratio for the effect of a unit increase of the sanc_PY predictor. That has nothing to do with a "difference in difference" analysis which involves two binary predictors and looks at the effect of changing the level of the first predictor while at one level of the second predictor (the first difference) compared to the effect of changing the level of first predictor while at the other level of the second predictor (the second difference). The difference of those two differences is the interaction parameter of the two predictors. There is no interaction in your model.
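Written out for a model that does include the interaction, and using the δ notation from the table in the original post, the reasoning can be sketched as follows. Here S and T are assumed names for the sanctioned and post-period indicators.

```latex
\operatorname{logit} P(Y=1) = \beta_0 + \delta_1 S + \delta_2 T + \delta_3 S T

% Log odds in the four cells
(S,T)=(0,0):\ \beta_0 \qquad (1,0):\ \beta_0+\delta_1 \qquad
(0,1):\ \beta_0+\delta_2 \qquad (1,1):\ \beta_0+\delta_1+\delta_2+\delta_3

% Sanctioned minus non-sanctioned, by period
\text{pre: } (\beta_0+\delta_1) - \beta_0 = \delta_1
\qquad
\text{post: } (\beta_0+\delta_1+\delta_2+\delta_3) - (\beta_0+\delta_2) = \delta_1+\delta_3

% Difference in difference
(\delta_1+\delta_3) - \delta_1 = \delta_3
```

Without the S·T term, δ3 is fixed at zero and the two period-specific differences are forced to be equal.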
I know the model may be hard to read. The interaction is between the sanc_cy and post_violation variables. Both are indicator variables. Is this what you mean?
What is the reference for the paper that you are following?
Boone et al. (2015), "Did the 2007 PCAOB Disciplinary Order against Deloitte Impose Actual Costs on the Firm or Improve Its Audit Quality?"