Re: Random _residual_ (was Re: Specifying Random Effects)

researchstats · Posted 01-07-2021 12:04 PM

Hello,

I'm trying to build a mixed-effects model with the following vars:

Fixed effects - race, gender, age group, region, orthopaedic component, season, comorbidity.

Random effects - year of procedure, region, hospital npi, surgeon.

Outcome - 90-day readmission.

I'm not sure if I have the proper syntax to properly specify random effects. There is nesting (i.e. random intercept / subject = surg(npi region yr)), but each of these variables are also random effects. For the model I wish to build, does anyone have any suggestions on how they might code/specify random effects?

Code

proc glimmix data=in method=laplace;
class race (ref="1");
class gender (ref="1");
class Age_Group (ref="1");
class Region (ref="1");
class Component_Major (ref="1");
class Season (ref="1");
class Comorbidity (ref="1");
class yr;
class surg;
class npi;
class region;
model Read90 (event='1') = race gender Age_Group region component_major season Comorbidity / dist=binary link=logit solution oddsratio;
*yr, region, npi, surg are all random effects;
*surgeons clustered within hospitals clustered within regions clustered within years;
random intercept / subject = yr;
random intercept / subject = region;
random intercept / subject = npi;
random intercept / subject = surg;
random intercept / subject = surg(npi region yr);
lsmeans racekey gender Age_Group region component_major season Comorbidity /
oddsratio ilink diff cl adjust=bon;
run;

Thank you!!

ChrisNZ · Posted 01-17-2021 04:14 PM

Moved to statistics community.

High-Performance SAS Coding - Third Edition

SteveDenham · Posted 01-18-2021 08:26 AM

I would suggest using a single CLASS statement, although the code you have should work.

For the random effects, you have some redundancy that will likely lead to convergence issues. I would successively fit random intercepts using the following:

*surgeons clustered within hospitals clustered within regions clustered within years;
random intercept / subject = yr;
random intercept / subject = region(yr);
random intercept / subject = npi(region yr);
random intercept / subject = surg(npi region yr);

This assumes that different hospitals are measured in different regions in different years. If hospitals are repeatedly measured within region or year, these statements might be simplified.

Now it comes down to how much data you have per surgeon. With 7 fixed effects, you will likely need at least 70 observations (rule of thumb=10) to have a chance at convergence, with 10 hospitals, 10 regions and 10 years, for a rough total of around 7000 observations. Otherwise, you are almost certain to end up with a message in the output that the G matrix is not positive definite. You may also get a non-convergence message as the default number of iterations is 20. To get around that, try an NLOPTIONS maxiter=1000; statement.

SteveDenham

pdortho · Posted 01-24-2021 03:37 PM

Thanks for your response to the first post Steve. I am working on that same project with the original poster, and we are both new to multi-level modeling, so your response has been very helpful to us. I have a few follow up questions about our random effects specification.

We are working with a large health care claims database. We are measuring the same hospitals within each region every year. How should we handle year in the random statement, given that the same hospitals are measured in each year? I have a similar question regarding the surgeon variable. For the most part, surgeons are nested within hospitals, although some surgeons perform procedures at more than one hospital. So, would we have to specify surgeons the same way as year, since they are not completely nested within hospitals?

Thanks again for your help!

-Patrick

SteveDenham · Posted 01-25-2021 08:56 AM

I hope this makes sense. In my opinion, year is a repeated effect. Surgeon is the experimental/observational unit, nested within hospital, which in turn is nested within region. The subject for the repeated effect is surgeon within hospital, which accounts for those surgeons that have privileges at more than one hospital. I am assuming that there are no surgeons that have privileges in more than one region. If so, then the subject is surgeon within hospital by region. That gives something like the following code. Note that I have switched to a pseudo-likelihood method from the integrated Laplace method (see Stroup and Claassen, 2020, <doi.org/10.1007/s13253-020-00402-6> paywall, unless you are an ASA member).

proc glimmix data=in method=rspl;
class race (ref="1") gender (ref="1") Age_Group (ref="1") Region (ref="1")
 Component_Major (ref="1") Season (ref="1") Comorbidity (ref="1") yr surg npi region; 

model Read90 (event='1') = race gender Age_Group region component_major season Comorbidity / dist=binary link=logit solution oddsratio ddfm=bw;

random yr /residual type=ar(1) subject = surg(npi region);

random intercept / subject = region;
random intercept / subject = npi(region);
random intercept / subject = surg(npi region);

lsmeans racekey gender Age_Group region component_major season Comorbidity /
oddsratio ilink diff cl adjust=bon;
run;

Try running this. If there are odd degrees of freedom, etc. there are some changes that might need to be made. In particular, you may need to aggregate observations over surgeons within hospital so that a trials/events syntax can be used and thus utilize the better behaved binomial distribution over the binary.

SteveDenham

pdortho · Posted 01-25-2021 12:03 PM

That makes sense that year is a repeated effect. You are correct that surgeons will only have privileges in one region. I am glad we can use pseudo-likelihood as well, since that should run faster with our large dataset. We will give this model a try.

Thanks again for your response. You have helped us out a great deal.

Best regards,

Patrick

RosieSAS · Posted 01-25-2021 01:40 PM

When year was used as a repeated measure, should include year in model statement as a fixed effect, or another random statement as a random factor? As I know,

random yr /residual type=ar(1) subject = surg(npi region);

is equivalent to

random /residual type=ar(1) subject = surg(npi region);

because surg(npi region) was repeated measured on years, so no need to specify yr in this statement.

jiltao · Posted 01-25-2021 01:54 PM

This statement is not a valid syntax in PROC GLIMMIX --

random /residual type=ar(1) subject = surg(npi region);

You have to specify the random effect, such as --

random yr /residual type=ar(1) subject = surg(npi region);

or

random _residual_ / type=ar(1) subject = surg(npi region);

SteveDenham · Posted 01-25-2021 02:36 PM

Thanks @jiltao . And while both of these give similar tests, lsmeans and standard errors, they do NOT necessarily yield the same log likelihood and consequent different information critera. Anyone have any ideas why this might be the case?

SteveDenham

ps. I renamed the subject as I think we are about to hijack the original poster's thread.

jiltao · Posted 01-25-2021 02:46 PM

They should give you the same results unless you have missing values or the data is not sorted chronologically for each subject. When you specify RANDOM YR /....., the values for YR would identify the order of the repeated measures; when you specify RANDOM _RESIDUAL_ / .... without specifying the effect, the order in which the values appear in the data set determines the order of the repeated measures and can produce different results (that might not be what you expected) when you have missing values for some repeated measures or the data is not sorted by YR for a subject.

A general recommendation is to always specify the repeated effect, and this is true for PROC GLIMMIX and PROC MIXED.

RosieSAS · Posted 01-25-2021 04:45 PM

Thanks for correcting my code. However, I still have the question that yr is a random factor or a fixed factor? Should yr be include in the model. If yes, how?

's code didn't include yr as a random or fixed factor, except putting it in a RANDOM statement to specify R matrix structure.

The code

random yr /residual type=ar(1) subject = surg(npi region);

only specify the correlation between yrs from the same surg.

jiltao · Posted 01-25-2021 05:17 PM

You can put YR in the MODEL statement.

The fact that it is in the RANDOM statement does not necessarily always make it a (G-side) random effect. The following specification --

random yr /residual type=ar(1) subject = surg(npi region);

actually specifies that YR is an R-side random effect, or the repeated effect if it were a PROC MIXED program. It identifies the repeated effect (YR) and is not really entered into the model as a random effect (G-side). So you probably want to specify YR in the MODEL statement as a fixed effect.

researchstats · Posted 01-26-2021 10:04 AM

First, thank you all for your input, which has been both informative and helpful. We might end up removing 'region' as a random effect, but even with the code below, the model will not run. It does not produce an error, but rather runs for about 10 hours until our serves reboot. Here is the code:

proc glimmix data=add8 method=rspl;
class Region (ref="1"); class Season (ref="1"); class RaceKey (ref="1"); class gender (ref="1"); class agegrp (ref="1");
class Component_Major (ref="1"); class yr; class surg; class npi; class region;
model allreads90 (event='1') = region season racekey gender agegrp component_major wgtcci yr /
dist=binary link=logit solution oddsratio ddfm=bw;
*Specify random effects;
random yr /residual type=ar(1) subject = surg(npi);
random intercept / subject = npi;
random intercept / subject = surg(npi);
lsmeans region season /oddsratio ilink diff cl adjust=bon;
run;

Any suggestions on how to get the model to run? We did try memsize options and in the 2nd to last line of code we are only seeking ORs for two variables.

Thank you,

Patrick

SteveDenham · Posted 01-26-2021 10:34 AM

This may be a case where "rolling up" the data may be beneficial. Consider the variable "surg". Currently, I assume that this is a single record of allreads90 for each instance, where it is either a 1 or a zero, so that there are several observations per surgeon per hospital per year. Can you consolidate these to the events/trials syntax? This would have a numerator of the number of records with a 1 and a denominator of the number of records per surgeon per hospital per year. So, suppose surgeon A had three observations of allreads90 at Memorial Hospital in 2006. Two of these were allreads90=1 and one was allreads90=0. In the new data set events=2 and trials=3. Then fit the model with a binomial (not binary) distribution. The binary distribution in large datasets is notorious for long run times.

You can get from the individual records to the events/trials syntax using PROC MEANS, PROC SUMMARY or PROC SQL (I would recommend PROC SQL). There should be several examples around the internet on how to do this.

Also, I would really recommend using the STORE statement, and then post-processing lsmeans, odds ratios, etc. using PROC PLM. That way if you want additional info on other terms in the model, you wouldn't have to re-run the whole thing.

Another method would be to subsample your data so that the model will run on the subsamples and bootstrap the estimates.

SteveDenham.

SteveDenham · Posted 01-26-2021 08:20 AM

True @RosieSAS . I misspecified the model - YR should be included.

SteveDenham

Catch up on SAS Innovate 2026