- Am I using the RANDOM statement properly? (PROC GLIMMIX, event study, ...

☑ This topic is **solved**.
Posted 05-11-2022 05:33 PM
(683 views)

Hello. I hope I've posted to the correct location. I'm using SAS version 9.4. I'm not a novice SAS user, but I'm not an expert. I use SAS in the context of applied public health, and any assistance using straightforward language, in addition to/instead of technical jargon, would be greatly appreciated.

I'm working on a difference-in-difference (DID) analysis of the impact of public housing demolition on violent crime at the census tract level (data sample attached). I am using annual data, and my timeline runs from five years before the intervention (demolition) to five years after it. My intervention group is a set of 8 "target" census tracts where public housing was demolished. My comparison group is a set of 8 "other public housing (OPH)" census tracts where public housing underwent routine maintenance. I have already conducted a couple of DID analyses using pooled data. My question stems from the third analysis, an event study. The demolitions in the target tracts occurred in different years, so in order to use the OPH tracts as a comparison group I matched the target and OPH tracts on the % of males ages 15-34, and then assigned the 5-year pre-post timing of each target tract to its OPH tract pair. Therefore I have a variable for each tract, as well as a "couplet" variable for each target/OPH pair. The variable C00 accounts for the fact that the model includes data from two censuses.

My questions are:

1) I think I have a multilevel model here. I'm looking at census tracts and couplets of census tracts, but this isn't quite the same as looking at, for example, appointments of patients in clinics as in this SAS guide (http://support.sas.com/kb/40/724.html). I used this code first:

random int/subject=tract;

random int/subject=couplet(tract) type=un;

thinking that the tract level was the larger level and the couplet was the smaller one. The model converges in 13 iterations, with no error messages. Then someone told me they saw my data the other way around--that "couplet" was the larger level and tract was the smaller level. So I switched the code to reflect that. The results were the same as before, but I had to use the "nloptions" command to increase the number of iterations so the model would converge, and I received an error message ("Estimated G matrix is not positive definite."). Was my first crack at the code correct then?

2.) Am I using the "random" code correctly? I'm not sure I need both lines. And maybe I just need the second line (random int/subject=couplet(tract) type=un; ). I've attempted to use the covtest code to test this, but I'm not completely understanding how to order the "0 ." and what's going on behind the scenes in SAS so that I can interpret the output.

3.) One last, model-building oriented question: None of my covariates were statistically significantly contributing to the model. This makes sense given the similarity of the target and oph tract demographics. Can I then exclude C00 from this model since C00 is meant to tell the model the census data (which I'm not including) came from two separate censuses?

I included all of the event study code and side notes I have at present to show my thought process. Thank you very much for reading all of this, and for your feedback. I hope I've described my process clearly.

proc glimmix data=main.dataset;

class tract couplet exposed (ref="0") timeline (ref="-1") c00;

model totcrime = timeline*pre timeline*post exposed timeline*pre*exposed timeline*post*exposed c00/

solution

dist = negbin

link = log

offset = logpopyrs

cl;

random int/subject=tract; *Original code-- the model converges in 13 iterations;

random int/subject=couplet(tract) type=un; *Using this line of code by itself and with the one above comes up with the same results;

/*random int/subject=couplet type=un; *edited code-as suggested by someone else;*/

/*random int/subject=tract(couplet) type=un;*edited code-as suggested by someone else;*/

/*random int/subject=couplet type=un; *tested this most recently. Results don't make sense based on background knowledge.;*/

nloptions maxiter=120;

/*covtest 'No random couplet effect' zeroG; *P<.001 says the couplet effect (first random statement) is necessary in the model (http://support.sas.com/kb/40/724.html);*/

/*covtest 'var(couplet)=0' 0 ./estimates;*/

covtest 'var(tract)=0' . 0;

/*covtest 'var(couplet(tract))=0' 0 .;*/

run;

1 ACCEPTED SOLUTION


For questions 1) and 2), you can definitely use more than one RANDOM statement. Whether you need more than one depends in part on the structure of your data: multiple RANDOM statements are appropriate in PROC GLIMMIX when you are fitting a hierarchical (multilevel) model to nested data. The code you posted requests a random intercept for tract and a random intercept for couplet nested within tract. It might be helpful to add the "s" (SOLUTION) option at the end of your RANDOM statements so that you can see the estimates of the random effects.

For 3), it might depend on the conventions in your discipline. Sometimes researchers leave control variables like c00 in the model to show that the parameter estimates are adjusted for them.

I don't know how to do difference-in-difference designs so I cannot provide much help about the specific model building questions with those designs.
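For readers comparing the two nesting orders discussed in the question, here is a minimal sketch of the "couplet as the upper level" version, with the "s" option added. It reuses the data set and variable names from the original post and is not a recommendation for this particular design. Two assumptions: `type=un` is left off because, for a single random intercept, it should be equivalent to the default structure; and the values in each COVTEST statement are matched to the covariance parameters by position, in the order of the RANDOM statements (0 fixes a parameter at zero, a dot leaves it free). The quoted text in COVTEST is only a label, so it is worth checking it against the "Covariance Parameter Estimates" table.

```sas
/* Sketch only: couplet as the upper level, tract nested within couplet. */
/* Data set and variable names are taken from the original post.         */
proc glimmix data=main.dataset;
  class tract couplet exposed (ref="0") timeline (ref="-1") c00;
  model totcrime = timeline*pre timeline*post exposed
                   timeline*pre*exposed timeline*post*exposed c00
        / solution dist=negbin link=log offset=logpopyrs cl;

  /* Covariance parameter 1: variance of the couplet intercepts */
  random intercept / subject=couplet solution;

  /* Covariance parameter 2: variance of the tract-within-couplet intercepts */
  random intercept / subject=tract(couplet) solution;

  /* Position 1 = couplet, position 2 = tract(couplet).          */
  /* The negative binomial scale parameter is left unconstrained. */
  covtest 'var(couplet)=0'        0 .;
  covtest 'var(tract(couplet))=0' . 0;
run;
```

With the original ordering (tract first, couplet(tract) second), the same positional rule would mean that `. 0` tests the second statement, not the first.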

7 REPLIES 7


Thank you very much for your response.


@svh: Thank you for the suggestion of adding "s" to the random statements. Two follow-up questions, if you don't mind. Please let me know if it would be easier to answer these questions with screenshots provided.

Using both of these lines of code:

random int/subject=tract s;

random int/subject=couplet(tract) type=un s;

followed by these:

covtest 'var(tract)=0' . 0;

covtest 'var(couplet(tract))=0' 0 .;

I get output for the covtests that says that the random effects are not significant (p=1.000 for both, note: MI); however, if I run a model using each random statement individually and perform the covtest for the respective random statement, the output says each random effect is significant (p<.0001, MI). Also, the estimates provided by "s" are the same in the two models where the random statements are added individually. If I use both random statements in the model, the estimates produced by each random statement (tract and couplet(tract)) are different.

1. Does this mean I should use just the "random int/subject=couplet(tract) type=un s;" statement since it indicates to SAS that the model is hierarchical? Whether I use one random statement or both, the fixed effects parameters stay the same.

2. Since the output for the random effects estimates provides intercepts, then after exponentiating the estimates, the interpretation is the baseline level of violent crime/1000 person years for each census tract, correct?

Thanks very much.
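One possible reading of the p=1.000 results, offered as a hedged note and not as a confirmed diagnosis: if every tract belongs to exactly one couplet, as the description of the matching suggests, then `couplet(tract)` has exactly the same levels as `tract`. The two RANDOM statements would then describe the same set of intercepts, and their variances could not be separated, which would fit both the identical single-statement results and the non-significant tests when both are present. A quick way to check the structure, using the data set and variable names from the original post:

```sas
/* Sketch: inspect how tracts and couplets are nested in the data. */
proc sql;
  /* How many distinct couplets does each tract appear in? */
  select tract, count(distinct couplet) as n_couplets
    from main.dataset
    group by tract;

  /* How many distinct tracts does each couplet contain? */
  select couplet, count(distinct tract) as n_tracts
    from main.dataset
    group by couplet;
quit;
```

If `n_couplets` is 1 for every tract and `n_tracts` is 2 for every couplet, tract is nested within couplet, and `subject=couplet` with `subject=tract(couplet)` would be the specification that matches that structure.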



One issue with DID designs is that the value may be negative, which would mean that you are going to drop all of those values due to the log link. You may be better off modeling the actual values, with the baseline value as a covariate.

SteveDenham
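A simple way to see whether this concern applies is to look at the range of the response before fitting. This is a minimal sketch that assumes `totcrime` is the raw yearly count and `logpopyrs` is the offset, as in the original post. As far as I understand it, the log link is applied to the modeled mean and not to each observed value, and `dist=negbin` expects non-negative counts, so negative values would only arise if the response were itself a computed difference.

```sas
/* Sketch: check for negative or missing values in the response and offset. */
proc means data=main.dataset n nmiss min max;
  var totcrime logpopyrs;
run;
```

A minimum of zero or more for `totcrime`, with no missing offsets, would suggest that no observations are being dropped for this reason.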


Thank you for your reply Steve. As I've understood things so far, I would exponentiate the fixed effects estimates to obtain a ratio of rate ratios. So exponentiating a negative fixed effect estimate (e^Beta) provided by SAS would provide a percentage decrease in my intervention group vs. the control group. Is that how you understand it? I've wondered if there is a SAS option to ask SAS to exponentiate the fixed effects estimates for me. Do you know of one?

Additionally, I asked on a forum at one time about modeling the violent crime rates rather than using the log link, and I was told that doing that would risk imprecision in the error terms, and it was better to use the log link. Interested in your thoughts. Thank you.


There is a difference between obtaining a negative estimate for a parameter in the log space, and having a negative number as a result of calculating difference in difference. The former may arise if there is a decrease in the response variable considered as a whole within a parameter of interest (say age group), where you may get a smaller difference in difference for an older age cohort. My concern was what happens when a measurement for a subject is negative. In that case, trying to use a log link will result in a missing value for that subject as the log of a negative number cannot be calculated.

As far as the other (exponentiating the estimates) question, you can do that through the use of an LSMESTIMATE statement and the ILINK option.

SteveDenham
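As an alternative to writing LSMESTIMATE statements, the fixed-effects table can be captured with ODS and exponentiated in a DATA step. The sketch below is a simplified version, not the thread's accepted approach: it keeps only the tract intercept for brevity, and the output data set names (`work.fixed_est`, `work.rate_ratios`) and new variable names are made up for illustration. It relies on the `solution` and `cl` options already present in the original MODEL statement. I believe the ESTIMATE and LSMESTIMATE statements also accept an EXP option, which may be a more direct fit when the quantity of interest is a ratio.

```sas
/* Sketch: save the fixed-effects solutions, then exponentiate them. */
ods output ParameterEstimates=work.fixed_est;

proc glimmix data=main.dataset;
  class tract couplet exposed (ref="0") timeline (ref="-1") c00;
  model totcrime = timeline*pre timeline*post exposed
                   timeline*pre*exposed timeline*post*exposed c00
        / solution dist=negbin link=log offset=logpopyrs cl;
  random intercept / subject=tract;
run;

data work.rate_ratios;
  set work.fixed_est;
  /* exp(estimate) is a rate ratio for every row except the intercept, */
  /* where it is a baseline rate on the scale of the offset.           */
  rate_ratio = exp(Estimate);
  /* Reference levels have no confidence limits, so guard against missing. */
  if not missing(Lower) then rr_lower = exp(Lower);
  if not missing(Upper) then rr_upper = exp(Upper);
  /* Percentage change implied by the ratio, e.g. 0.80 gives -20. */
  pct_change = 100 * (rate_ratio - 1);
run;

proc print data=work.rate_ratios noobs;
  var Effect Estimate rate_ratio rr_lower rr_upper pct_change;
run;
```

For the `timeline*post*exposed` rows, `rate_ratio` would be read as the ratio of rate ratios described earlier in the thread.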
