Hello. I hope I've posted to the correct location. I'm using SAS version 9.4. I'm not a novice SAS user, but I'm not an expert. I use SAS in the context of applied public health, and any assistance using straightforward language, in addition to/instead of technical jargon, would be greatly appreciated.
I'm working on a difference-in-differences (DID) analysis of the impact of public housing demolition on violent crime at the census tract level (data sample attached). I am using annual data, and my timeline runs from five years before the intervention (demolition) to five years after it. My intervention group is a set of 8 "target" census tracts where public housing was demolished. My comparison group is a set of 8 "other public housing" (OPH) census tracts where public housing underwent routine maintenance. I have already conducted a couple of DID analyses using pooled data. My question stems from the third analysis, an event study. The demolitions in the target tracts occurred in different years, so in order to use the OPH tracts as a comparison group I matched each target tract to an OPH tract based on the % of males ages 15-34, and then assigned the 5-year pre-post timing of the target tract to its OPH pair. Therefore I have a variable for each tract, as well as a "couplet" variable for each target/OPH pair. The variable C00 is included to account for the fact that the data come from two censuses.
My questions are:
1) I think I have a multilevel model here. I'm looking at census tracts and couplets of census tracts, but this isn't quite the same as looking at, for example, appointments of patients in clinics as in this SAS guide (http://support.sas.com/kb/40/724.html). I used this code first:
random int/subject=tract;
random int/subject=couplet(tract) type=un;
thinking that the tract level was the larger level and the couplet was the smaller one. The model converges in 13 iterations, with no error messages. Then someone told me they saw my data the other way around: "couplet" is the larger level and tract is the smaller level. So I switched the code to reflect that. The results were the same as before, but I had to use the NLOPTIONS statement to increase the number of iterations so the model would converge, and I received the message "Estimated G matrix is not positive definite." Was my first crack at the code correct then?
2) Am I using the RANDOM statement correctly? I'm not sure I need both lines. Maybe I just need the second line (random int/subject=couplet(tract) type=un;). I've attempted to use the COVTEST statement to test this, but I don't fully understand how to order the "0 ." values or what's going on behind the scenes in SAS, so I can't interpret the output.
3) One last, model-building oriented question: none of my covariates contributed statistically significantly to the model. This makes sense given the similarity of the target and OPH tract demographics. Can I then exclude C00 from this model, since C00 is meant to tell the model that the census data (which I'm not including) came from two separate censuses?
I included all of the event study code and side notes I have at present to show my thought process. Thank you very much for reading all of this, and for your feedback. I hope I've described my process clearly.
proc glimmix data=main.dataset;
  class tract couplet exposed (ref="0") timeline (ref="-1") c00;
  model totcrime = timeline*pre timeline*post exposed timeline*pre*exposed timeline*post*exposed c00 /
        solution
        dist = negbin
        link = log
        offset = logpopyrs
        cl;
  random int/subject=tract; *Original code-- the model converges in 13 iterations;
  random int/subject=couplet(tract) type=un; *Using this line of code by itself and with the one above comes up with the same results;
  /*random int/subject=couplet type=un; *edited code-as suggested by someone else;*/
  /*random int/subject=tract(couplet) type=un;*edited code-as suggested by someone else;*/
  /*random int/subject=couplet type=un; *tested this most recently. Results don't make sense based on background knowledge.;*/
  nloptions maxiter=120;
  /*covtest 'No random couplet effect' zeroG; *P<.001 says the couplet effect (first random statement) is necessary in the model (http://support.sas.com/kb/40/724.html);*/
  /*covtest 'var(couplet)=0' 0 ./estimates;*/
  covtest 'var(tract)=0' . 0;
  /*covtest 'var(couplet(tract))=0' 0 .;*/
run;
For questions 1) and 2), you can definitely use more than one RANDOM statement. Whether you need more than one depends in part on the structure of your data: multiple RANDOM statements in PROC GLIMMIX are appropriate when you are fitting a hierarchical (multilevel) model to nested data. The code you posted requests a random intercept for tract and a random intercept for couplet nested within tract. It might be helpful to add the SOLUTION option (abbreviated S) after the slash in your RANDOM statements so that you can see the estimates of the random effects.
For 3), it might depend on the conventions in your discipline. Sometimes researchers leave control variables like c00 in the model to show that the parameter estimates are produced while adjusting for it.
I don't know how to do difference-in-difference designs so I cannot provide much help about the specific model building questions with those designs.
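To make the SOLUTION suggestion concrete, here is a minimal sketch that reuses the dataset and variable names from the original post. It assumes the nesting that was suggested to the poster (couplet as the larger level, with two tracts inside each couplet); whether that is the right structure for this design is a judgment call, not something this sketch settles. TYPE=UN is left out because each RANDOM statement has a single random intercept, so there is only one variance to estimate per statement.

```sas
proc glimmix data=main.dataset;
  class tract couplet exposed (ref="0") timeline (ref="-1") c00;
  model totcrime = timeline*pre timeline*post exposed
                   timeline*pre*exposed timeline*post*exposed c00
        / solution dist=negbin link=log offset=logpopyrs cl;
  /* Assumed higher level: one random intercept per matched pair (couplet) */
  random intercept / subject=couplet solution;
  /* Assumed lower level: one random intercept per tract within its couplet */
  random intercept / subject=tract(couplet) solution;
run;
```

With SOLUTION on each RANDOM statement, the output should include a table of predicted random intercepts for every couplet and every tract, which makes it easier to see which level is actually absorbing variation.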
Thank you very much for your response.
I think I found an answer for my second question here: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_mixed_examples05.htm
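For anyone landing here with the same COVTEST question, this is a rough sketch of how the value list appears to line up with the model. It assumes the couplet / tract(couplet) nesting and that the "Covariance Parameter Estimates" table lists the parameters in the order of the RANDOM statements, with the negative binomial scale parameter last. Check that table in your own output before trusting the positions. A 0 fixes the parameter in that position at zero under the null hypothesis, and a missing value (.) leaves it free.

```sas
proc glimmix data=main.dataset;
  class tract couplet exposed (ref="0") timeline (ref="-1") c00;
  model totcrime = timeline*pre timeline*post exposed
                   timeline*pre*exposed timeline*post*exposed c00
        / solution dist=negbin link=log offset=logpopyrs cl;
  random intercept / subject=couplet;          /* assumed position 1 */
  random intercept / subject=tract(couplet);   /* assumed position 2 */
  /* assumed position 3: negative binomial scale parameter */

  /* H0: couplet variance is zero; the other parameters are re-estimated */
  covtest 'var(couplet)=0'        0 . . / estimates;
  /* H0: tract-within-couplet variance is zero */
  covtest 'var(tract(couplet))=0' . 0 . / estimates;
  /* H0: no G-side random effects at all */
  covtest 'no random effects' zeroG;
run;
```

If the RANDOM statements are listed in a different order, the positions in the value list shift with them, so a label such as 'var(tract)=0' only matches its test when the 0 sits in the position of the tract variance.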
One issue with DID designs is that the differenced value may be negative. The log of a negative number is undefined, so with a log link those observations would be dropped. You may be better off modeling the actual values, with the baseline value as a covariate.
SteveDenham
There is a difference between obtaining a negative estimate for a parameter in the log space, and having a negative number as a result of calculating difference in difference. The former may arise if there is a decrease in the response variable considered as a whole within a parameter of interest (say age group), where you may get a smaller difference in difference for an older age cohort. My concern was what happens when a measurement for a subject is negative. In that case, trying to use a log link will result in a missing value for that subject as the log of a negative number cannot be calculated.
As far as the other (exponentiating the estimates) question, you can do that through the use of an LSMESTIMATE statement and the ILINK option.
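A minimal sketch of what that could look like, assuming the model from the original post and using the exposed effect purely as an illustration. The labels are hypothetical, and the coefficients follow the level order shown in the "Class Level Information" table, so check that table before interpreting the sign (the REF= option can change which level comes first). It also assumes the least squares means for exposed are estimable in this model, which is not guaranteed given the interaction terms.

```sas
proc glimmix data=main.dataset;
  class tract couplet exposed (ref="0") timeline (ref="-1") c00;
  model totcrime = timeline*pre timeline*post exposed
                   timeline*pre*exposed timeline*post*exposed c00
        / solution dist=negbin link=log offset=logpopyrs cl;
  random intercept / subject=couplet;
  random intercept / subject=tract(couplet);

  /* ILINK: report a single least squares mean on the inverse-linked (data) scale */
  lsmestimate exposed 'first listed level of exposed' 1 0 / ilink cl;

  /* EXP: exponentiate a difference on the log scale, which reads as a ratio */
  lsmestimate exposed 'first vs second listed level' 1 -1 / exp cl;
run;
```

ILINK is the natural choice for a single mean, while EXP is usually the more interpretable option for a contrast, since the exponentiated difference of two log-scale means is a ratio rather than a mean.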
SteveDenham