10-14-2013 06:49 AM
Hello Expert SAS users,
I’m in the midst of doing some SRD analysis (Sharp Regression Discontinuity) in SAS.
The technique is still somewhat rare, so I’ve had significant problems finding textbook examples. There are a bunch of theoretical articles, but not many that seem to explain how to implement it in SAS.
The few sources I have found don’t seem to agree at all on how to implement it. Some say it requires writing a macro in SAS; others claim that you can just run two regressions and calculate the difference in means between the two; still others suggest a pooled regression, which would let you use the tests carried out in the regression, but don’t say how to do it. Even the regressions themselves seem to be a topic of debate, as the suggested regressors vary a great deal. (Some even suggest that it doesn’t matter whether you use an OLS or GLS regression.)
The few people I know who have used this technique have all used STATA, as it apparently has an RD package, but I prefer to use SAS, as all my other analyses are done here.
So my question is: What is the best way to do SRD in SAS, and are there any good textbooks/articles, preferably with examples and code, that I can use? I have searched the internet a lot, which has given me plenty of ideas, but also plenty of questions.
My data is the following:
I look at the number of applicants for a government program and the number accepted into it. My hypothesis is that the number of applicants rose sharply because of a shift in policy, which led to a decrease in the quality of applications. I therefore use RD to compare the applicants before and after the policy shift, to see whether applicants after the shift have a much lower acceptance rate than those before it.
If the acceptance rate is lower, it is evidence of a lower ‘quality’ of applicants.
Any help is appreciated.
10-15-2013 08:38 AM
This is a long shot, but take a look at Example 55.12 Change Point Models in the PROC MCMC documentation.
I am basing everything on a single graph, presented at the end of the example, that looks something like what I would expect from a regression discontinuity.
10-16-2013 07:51 AM
Thanks for the suggestion, but as far as I can gather, the example is about a change in the coefficient/slope after a given point, not a jump in the mean as seen in an RD design.
10-23-2013 12:54 PM
I think you could get around that by parameterizing the model slightly differently, so that it has a change in both the slope and intercept, which would be equivalent to the jump in the mean.
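One way such a parameterization could be written is sketched below. The notation is assumed here for illustration and is not taken from the PROC MCMC example: x is the running variable, c the change point, and D an indicator for observations at or after c.

```latex
y_i = \beta_0 + \beta_1 (x_i - c) + \tau D_i + \beta_2 D_i (x_i - c) + \varepsilon_i,
\qquad D_i = \mathbf{1}[x_i \ge c]
```

With x centered at c, the fitted line just left of the change point approaches \beta_0 and the line just right of it approaches \beta_0 + \tau, so \tau is the jump in the mean, while \beta_2 is the change in slope.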
12-26-2013 11:36 AM
I too have had a similar dilemma with RD designs, especially in terms of estimation in SAS. I've combed the internet as well as past SAS Global Forum papers and found very little.
I understand RD designs as a quasi-experimental method in the sense of Angrist and Pischke. They loosely claim that even a basic regression with a dummy intercept capturing the discontinuity (see more details here on my interpretation of this) will do a pretty good job of capturing the treatment effect.
In terms of estimation in SAS, I have taken results from the RD function in R (which allows for more complicated local linear regression techniques and bandwidth selection, similar to STATA), compared them to an estimation in SAS using PROC GLM or GLIMMIX with interactions, and obtained very similar estimates of the treatment effect. I also try to keep most of my analysis in SAS for production analytics and interoperability with other analysts.
Again, I'm thinking of RD as a quasi-experimental method that identifies the treatment effect using the variation generated by near-random treatment assignment near the cutoff Xo. Your description does seem like an analysis of a discontinuity, but not exactly in a context I'm familiar with. Could your treatment effect also be estimated using other methods that exploit discontinuities, such as a difference-in-differences approach or interrupted time series? I have had the same issue with DD as with RD in terms of finding good references for doing the estimation in SAS. Interrupted time series is much better covered in terms of examples and documentation.
I'm very new to these methods, so I'm interested in learning more from the responses that hopefully will follow.
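For anyone who wants a starting point, here is a minimal sketch of the PROC GLM approach with an interaction described above. It is not the code behind the comparison with R; the data are simulated, and the data set name rd_sim, the variable names period, post and acc_rate, and the cutoff at period 0 are all assumptions made up for illustration.

```sas
/* Simulated example data (hypothetical): one row per period.       */
/*   period   = running variable, centered so the cutoff is at 0    */
/*   post     = 1 at or after the policy shift, 0 before            */
/*   acc_rate = acceptance rate, built with a true jump of -0.10    */
data rd_sim;
  call streaminit(20131226);
  do period = -24 to 23;
    post = (period >= 0);
    acc_rate = 0.60 + 0.002*period - 0.10*post + rand('normal', 0, 0.03);
    output;
  end;
run;

/* Pooled regression with a separate slope on each side of the cutoff. */
/* Because period is centered at the cutoff, the coefficient on post   */
/* estimates the jump in the mean at the cutoff.                       */
proc glm data=rd_sim;
  model acc_rate = post period post*period / solution clparm;
run;
quit;
```

The SOLUTION option prints the parameter estimates and CLPARM adds confidence limits, so the test on post comes straight out of the regression. To restrict the fit to observations near the cutoff, a statement such as `where abs(period) <= 12;` can be added inside the PROC GLM step; the window of 12 is arbitrary here, and this sketch does no data-driven bandwidth selection.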