DID
Fluorite | Level 6

Hello. I hope I've posted to the correct location. I'm using SAS version 9.4. I'm not a novice SAS user, but I'm not an expert. I use SAS in the context of applied public health, and any assistance using straightforward language, in addition to/instead of technical jargon, would be greatly appreciated.

 

I'm working on a difference-in-differences (DID) analysis of the impact of public housing demolition on violent crime at the census tract level (data sample attached). I am using annual data, and my timeline runs from five years before the intervention (demolition) to five years after it. My intervention group is a set of 8 "target" census tracts where public housing was demolished. My comparison group is a set of 8 "other public housing" (OPH) census tracts where public housing underwent routine maintenance. I have already conducted a couple of DID analyses using pooled data. My question stems from the third analysis, an event study. The demolitions in the target tracts occurred in different years, so in order to use the OPH tracts as a comparison group I matched the target and OPH tracts on the % of males ages 15-34, and then assigned the 5-year pre/post timing of each target tract to its OPH pair. I therefore have a variable for each tract, as well as a "couplet" variable for each target/OPH pair. The variable C00 accounts for the fact that there are data from two censuses in the model.

 

My questions are:

1.) I think I have a multilevel model here. I'm looking at census tracts and couplets of census tracts, but this isn't quite the same as looking at, for example, appointments of patients in clinics as in this SAS guide (http://support.sas.com/kb/40/724.html). I used this code first:

random int/subject=tract;
random int/subject=couplet(tract) type=un;

thinking that the tract level was the larger level and the couplet was the smaller one. The model converges in 13 iterations, with no error messages. Then someone told me they saw my data the other way around--that "couplet" was the larger level and tract was the smaller level. So I switched the code to reflect that. The results were the same as before, but I had to use the "nloptions" command to increase the number of iterations so the model would converge, and I received an error message ("Estimated G matrix is not positive definite."). Was my first crack at the code correct then?

 

2.) Am I using the "random" code correctly? I'm not sure I need both lines. And maybe I just need the second line (random int/subject=couplet(tract) type=un; ). I've attempted to use the covtest code to test this, but I'm not completely understanding how to order the "0 ." and what's going on behind the scenes in SAS so that I can interpret the output.

 

3.) One last, model-building oriented question: None of my covariates were statistically significantly contributing to the model. This makes sense given the similarity of the target and oph tract demographics. Can I then exclude C00 from this model since C00 is meant to tell the model the census data (which I'm not including) came from two separate censuses?

 

I included all of the event study code and side notes I have at present to show my thought process. Thank you very much for reading all of this, and for your feedback. I hope I've described my process clearly.

 

proc glimmix data=main.dataset;
  class tract couplet exposed (ref="0") timeline (ref="-1") c00;
  model totcrime = timeline*pre timeline*post exposed timeline*pre*exposed timeline*post*exposed c00 /
        solution
        dist = negbin
        link = log
        offset = logpopyrs
        cl;

  random int/subject=tract; *Original code: the model converges in 13 iterations;
  random int/subject=couplet(tract) type=un; *Using this line of code by itself and with the one above comes up with the same results;

  /*random int/subject=couplet type=un; *edited code, as suggested by someone else;*/
  /*random int/subject=tract(couplet) type=un; *edited code, as suggested by someone else;*/

  /*random int/subject=couplet type=un; *tested this most recently. Results don't make sense based on background knowledge.;*/

  nloptions maxiter=120;
  /*covtest 'No random couplet effect' zeroG; *P<.001 says the couplet effect (first random statement) is necessary in the model (http://support.sas.com/kb/40/724.html);*/

  /*covtest 'var(couplet)=0' 0 ./estimates;*/
  covtest 'var(tract)=0' . 0;
  /*covtest 'var(couplet(tract))=0' 0 .;*/
run;

Accepted Solution

svh
Lapis Lazuli | Level 10

For questions 1) and 2), you can definitely use more than one RANDOM statement. Whether to use more than one depends in part on the structure of your data: multiple RANDOM statements in PROC GLIMMIX are appropriate when you are fitting a hierarchical (multilevel) model to nested data. The code you posted requests a random intercept for tract and a random intercept for couplet nested within tract. It might be helpful to add the "s" (SOLUTION) option after the slash in your RANDOM statements so that you can see the estimated random effects.
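As a rough sketch of the nesting question: assuming each tract belongs to exactly one couplet (which the matching described in the question implies), subject=couplet(tract) identifies the same clusters as subject=tract, so using both asks for two intercept variances for the same groups. The layout below instead treats couplet as the upper level with tract nested inside it. The data set and variable names come from the question; the one-term fixed-effects part is only a placeholder, not a suggested model.

```sas
/* Sketch only: couplet as the upper level, tract nested within couplet. */
/* The fixed-effects part is a placeholder for the full event-study terms. */
proc glimmix data=main.dataset;
  class tract couplet exposed;
  model totcrime = exposed / solution dist=negbin link=log offset=logpopyrs;
  random intercept / subject=couplet solution;         /* one intercept per matched pair */
  random intercept / subject=tract(couplet) solution;  /* one intercept per tract within its pair */
run;
```

TYPE=UN is left out here because a single random intercept has only one variance to estimate, so the covariance structure should make no difference.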

 

For 3), it might depend on the conventions in your discipline. Sometimes researchers leave control variables like c00 in the model to show that the parameter estimates are produced while adjusting for them.

 

I don't know how to do difference-in-difference designs so I cannot provide much help about the specific model building questions with those designs. 

DID
Fluorite | Level 6

Thank you very much for your response.

DID
Fluorite | Level 6

@svh: Thank you for the suggestion of adding "s" to the random statements. Two follow-up questions, if you don't mind. Please let me know if it would be easier to answer these questions with screenshots provided.

Using both of these lines of code:
random int/subject=tract s;
random int/subject=couplet(tract) type=un s;

followed by these:
covtest 'var(tract)=0' . 0;
covtest 'var(couplet(tract))=0' 0 .;

I get output for the covtests that says that the random effects are not significant (p=1.000 for both, note: MI); however, if I run a model using each random statement individually and perform the covtest for the respective random statement, the output says each random effect is significant (p<.0001, MI). Also, the estimates provided by "s" are the same in the two models where the random statements are added individually. If I use both random statements in the model, the estimates produced by each random statement (tract and couplet(tract)) are different.

1. Does this mean I should use just the "random int/subject=couplet(tract) type=un s;" statement since it indicates to SAS that the model is hierarchical? Whether I use one random statement or both, the fixed effects parameters stay the same.

2. Since the output for the random effects estimates provides intercepts, then after exponentiating the estimates, the interpretation is the baseline level of violent crime/1000 person years for each census tract, correct?

Thanks very much.
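A sketch of how the COVTEST value list lines up with the RANDOM statements. It assumes the covariance parameters are listed in the order tract intercept, couplet(tract) intercept, negative binomial scale; that order is an assumption, and the "Covariance Parameter Estimates" table from the actual run is the place to confirm it. A number fixes the parameter in that position and a period leaves it free. The fixed-effects part is again only a placeholder.

```sas
/* Sketch only: matching COVTEST positions to covariance parameters. */
proc glimmix data=main.dataset;
  class tract couplet exposed;
  model totcrime = exposed / solution dist=negbin link=log offset=logpopyrs;
  random intercept / subject=tract;           /* covariance parameter 1 */
  random intercept / subject=couplet(tract);  /* covariance parameter 2 */
  /* parameter 3 is assumed to be the negative binomial scale */
  covtest 'var(tract)=0'          0 . .;  /* fix parameter 1 at zero, leave the others free */
  covtest 'var(couplet(tract))=0' . 0 .;  /* fix parameter 2 at zero, leave the others free */
  covtest 'no G-side effects' zeroG;      /* set all random-intercept variances to zero */
run;
```

Read this way, the labels in the two covtest statements quoted above look swapped relative to the positions they constrain: ". 0" fixes the second parameter and "0 ." fixes the first. If both intercepts describe the same clusters (one tract per couplet(tract) level), dropping either one should leave the fit essentially unchanged, which would be consistent with p=1.000 for each test when both are present and p<.0001 when only one is.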
SteveDenham
Jade | Level 19

One issue with DID designs is that the value may be negative, which would mean that you are going to drop all of those values due to the log link.  You may be better off modeling the actual values, with the baseline value as a covariate.

 

SteveDenham

DID
Fluorite | Level 6
Thank you for your reply Steve. As I've understood things so far, I would exponentiate the fixed effects estimates to obtain a ratio of rate ratios. So exponentiating a negative fixed effect estimate (e^Beta) provided by SAS would provide a percentage decrease in my intervention group vs. the control group. Is that how you understand it? I've wondered if there is a SAS option to ask SAS to exponentiate the fixed effects estimates for me. Do you know of one?
Additionally, I asked on a forum at one time about modeling the violent crime rates rather than using the log link, and I was told that doing that would risk imprecision in the error terms, and it was better to use the log link. Interested in your thoughts. Thank you.
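A worked version of the "ratio of rate ratios" reading, for a simplified two-group, two-period model with a log link and a log person-years offset. The symbols (E for exposed, P for post, \lambda for the crime rate, \beta_3 for the interaction) are notation introduced here for illustration, and the value -0.20 is made up.

```latex
\log \lambda_{ep} = \beta_0 + \beta_1 E + \beta_2 P + \beta_3 (E \times P),
\qquad E, P \in \{0, 1\}

e^{\beta_3} = \frac{\lambda_{11} / \lambda_{10}}{\lambda_{01} / \lambda_{00}}
\qquad \text{(ratio of rate ratios)}

\text{percent change} = 100 \left( e^{\beta_3} - 1 \right)

\text{Example: } \beta_3 = -0.20 \;\Rightarrow\; e^{-0.20} \approx 0.819
\;\Rightarrow\; \text{about an } 18\% \text{ relative decrease}
```

So the exponentiated estimate is itself a ratio; the percentage decrease comes from subtracting 1, not from reading the exponentiated value directly as a percentage.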
SteveDenham
Jade | Level 19

There is a difference between obtaining a negative estimate for a parameter in the log space, and having a negative number as a result of calculating difference in difference.  The former may arise if there is a decrease in the response variable considered as a whole within a parameter of interest (say age group), where you may get a smaller difference in difference for an older age cohort.  My concern was what happens when a measurement for a subject is negative.  In that case, trying to use a log link will result in a missing value for that subject as the log of a negative number cannot be calculated.

 

As far as the other (exponentiating the estimates) question, you can do that through the use of an LSMESTIMATE statement and the ILINK option.
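A minimal sketch of that suggestion for a pooled two-period DID rather than the event-study model. It assumes post is a 0/1 indicator and treats it as a CLASS variable, which differs from the code in the question; the LS-means order noted in the comment is also an assumption, and the E option prints the coefficients so the order can be checked. With a log link, the EXP option exponentiates the estimate of the contrast.

```sas
/* Sketch only: pooled two-period DID with an exponentiated interaction contrast. */
proc glimmix data=main.dataset;
  class tract exposed post;
  model totcrime = exposed post exposed*post
        / solution dist=negbin link=log offset=logpopyrs;
  random intercept / subject=tract;
  /* Assumed LS-means order: (exposed=0,post=0) (0,1) (1,0) (1,1) */
  lsmestimate exposed*post 'DID, log scale' 1 -1 -1 1 / exp cl e;
run;
```

Using REF= in the CLASS statement can change the level order, in which case the signs of the coefficients may need to be flipped.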

 

SteveDenham
