Re: Does this PROC GLIMMIX Code make sense?

Posted 07-26-2017 08:16 PM
(1467 views)

Hi All,

I've been struggling to run PROC GLIMMIX on a very large dataset (70 variables and approximately 130,000 records). It's a mixed-effects model with nested random effects and a Poisson distribution: records nested in patients, nested in geographic regions. The big problem was listing patient_ID in the CLASS statement; there are so many patients that it keeps causing a memory error. After a lot of research I've found code that gives me results, but I'm not totally confident that they are correct. In short, my code runs, but I can't tell whether it works properly. If someone could take a look and see whether my code makes sense, it would be much appreciated.

Here's what I'm running:

```
proc glimmix data=work.dataset ic=q;
class geography patient_ID;
model outcome_var = a b c d / solution dist=poisson link=log;
random intercept / solution subject=geography;
random _residual_ / solution subject=patient_ID(geography);
covtest / wald;
nloptions tech=nmsimp;
run;
```

Any insights as to whether I can trust the results coming out of this code would be much appreciated.

Thanks so much.

3 REPLIES

I don't think I like the second RANDOM statement, because PATIENT_ID is not the "bottom" unit in the design (i.e., the "residual"); RECORD is. I don't know what you've tried, but this is where I would start:

```
proc glimmix data=work.dataset ic=q;
class geography patient_ID;
model outcome_var = a b c d / solution dist=poisson link=log;
random intercept / solution subject=geography;
random intercept / solution subject=patient_ID(geography);
run;
```

If there were evidence of overdispersion, you could add a scale parameter:

```
proc glimmix data=work.dataset ic=q;
class geography patient_ID;
model outcome_var = a b c d / solution dist=poisson link=log;
random intercept / solution subject=geography;
random intercept / solution subject=patient_ID(geography);
random _residual_;
run;
```

or add an observation-level variance

```
proc glimmix data=work.dataset ic=q;
class geography patient_ID record;
model outcome_var = a b c d / solution dist=poisson link=log;
random intercept / solution subject=geography;
random intercept / solution subject=patient_ID(geography);
random intercept / solution subject=record(patient_ID geography);
run;
```

or switch from Poisson to negative binomial, or to a generalized Poisson.
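As a minimal sketch of the negative binomial option, assuming the same data set and variable names as in the examples above, only the DIST= value changes. I have not run this on data of this size, so treat it as a starting point rather than a tested recipe. (A generalized Poisson is not a built-in DIST= choice and would need to be programmed separately, so it is not shown here.)

```
proc glimmix data=work.dataset ic=q;
class geography patient_ID;
/* dist=negbin replaces dist=poisson; the extra scale parameter absorbs overdispersion */
model outcome_var = a b c d / solution dist=negbin link=log;
random intercept / solution subject=geography;
random intercept / solution subject=patient_ID(geography);
run;
```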

Speculating wildly and noting that GLIMMIX allows the *random-effects* to be either classification or continuous (see the documentation for the RANDOM statement in GLIMMIX), you could try incorporating patient_ID (and possibly geography) as continuous effects, i.e., remove them from the CLASS statement. Extrapolating from the documentation for the GROUP option on the RANDOM statement, a continuous random-effect might execute more quickly and use less memory; but I am just guessing here. For sure, you'd have to sort the dataset correctly.
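A rough, untested sketch of what that attempt might look like is below. It assumes patient_ID is a numeric variable, and work.sorted is a hypothetical name for the sorted copy of the data. Because a continuous subject variable is not sorted by the procedure, a new subject is assumed whenever its value changes from one record to the next, so the PROC SORT step matters. Whether GLIMMIX accepts the nested form patient_ID(geography) with a continuous patient_ID is something I have not verified.

```
/* sort so that all records for a patient are contiguous within each region */
proc sort data=work.dataset out=work.sorted;
by geography patient_ID;
run;

proc glimmix data=work.sorted ic=q;
/* patient_ID deliberately left out of the CLASS statement */
class geography;
model outcome_var = a b c d / solution dist=poisson link=log;
random intercept / subject=geography;
/* SOLUTION omitted here to avoid printing one estimate per patient */
random intercept / subject=patient_ID(geography);
run;
```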

Whoa, 70 variables is a lot. Variables a, b, c, and d are incorporated as continuous variables in your example model, so you're doing linear regression. Lots of challenges: linearity (on the link scale), multicollinearity possibilities, influential observations. You don't identify the level at which these predictors are observed (geography, patient or record). Regression in a mixed model is also known as a random coefficients model; the models above are only random intercepts: they assume that slopes have no random variance. You could assess random slopes--theoretically. In practice, if you are already having memory problems, adding more covariance parameters to the estimation list is not going to make your modeling life easier.
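For illustration only, a random-slope version might look like the sketch below. It assumes that predictor a varies within geography, which the original post does not confirm, and it adds the slope at the geography level only, to keep the number of extra covariance parameters small.

```
proc glimmix data=work.dataset ic=q;
class geography patient_ID;
model outcome_var = a b c d / solution dist=poisson link=log;
/* random intercept and random slope for a across regions;
   type=un also estimates the intercept-slope covariance */
random intercept a / subject=geography type=un;
random intercept / subject=patient_ID(geography);
run;
```

With TYPE=UN this adds two covariance parameters (the slope variance and the intercept-slope covariance) on top of the random-intercept model.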

If you try the continuous random effects thing, I'd be curious to know how that works out.

Hi Sld,

Thanks so much for your thoughtful reply. I’ll give some of these things a try and get back to you. Your first example is how I had the code written originally, and it gave me the memory error. I think that’s because I have approximately 80,000 different subjects in my study, and that’s a lot of levels for GLIMMIX to handle. I’ve read that anything over 1,000 can be tricky.

That being said, your point about the patient_ID not being the “bottom” unit is well taken.

I’ll play with some things and see what I can do in terms of adding patient_ID as a continuous effect and report back!

Stay tuned,

Rightcoast.

My clients do not have big data sets, to put it mildly, so memory issues are not my forte. I'm intrigued by incorporating random factors as continuous rather than classification. And if that didn't work...there are folks out there that deal with memory issues and I would not hesitate to touch base with SAS Tech Support.

Good luck and have fun!
