Programming the statistical procedures from SAS

Mixed model with large data set and 3 levels of nesting

Occasional Contributor bg

Does anyone have experience running PROC MIXED on large data sets with 3 levels of nesting?

I am working with a 10% random sample of a huge data set.

The 10% sample has 28000 people (id) with measurements from 1 or 2 years.

The people are nested within 4600 providers, who are nested within 1300 chains.

How should I specify the random effects for proc mixed?

I have tried

random chain provider(chain);

repeated /subject=id(provider*chain);

and

random intercept/subject=chain;

random intercept/subject=provider(chain);

repeated/subject=id(provider*chain) type=cs;

and other constructs,

but I get this error message:

ERROR: Integer overflow on computing amount of memory required. A request to allocate 8905.72M bytes of memory can not be honored.

NOTE: The SAS System stopped processing this step because of insufficient memory.

The model will run if I drop the variance component involving id.

I have also been able to run the model using STATA.

Thanks for any advice!

Respected Advisor

This looks like something that PROC HPMIXED may be able to address. From the documentation:

"The HPMIXED procedure is designed to solve large mixed model problems by using sparse matrix techniques. A mixed model can be large in many ways: a large number of observations, a large number of columns in the X matrix, a large number of columns in the Z matrix, and a large number of covariance parameters. The aim of the HPMIXED procedure is parameter estimation, inference, and prediction in linear mixed models with large X and/or Z matrices and many observations, but with relatively few covariance parameters."

I can't guarantee anything here, but it looks like this might be a tool to try.

Steve Denham

Occasional Contributor bg

Thanks very much for the suggestion!

Some info in case anyone else is interested...

I was not able to run the analysis using PROC MIXED.

I was able to run the analysis using PROC HPMIXED.

The HPMIXED analysis ran and used about 3 minutes of CPU time (about 5 minutes real time)!

Using STATA xtmixed also took about 3 minutes of CPU time.

HPMIXED also produced the same covariance estimates as STATA xtmixed. Note that STATA reports standard deviations and SAS reports variances; the STATA values, when squared, agree with the SAS values to 5 decimal places.

The SAS random statement I used is

random chain provider(chain) id(provider*chain);
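
In context, that statement would sit inside a PROC HPMIXED step roughly like the sketch below. The data set name (mydata), the outcome (y), and the fixed effects (x1, year) are hypothetical placeholders, not taken from the actual analysis; only the RANDOM statement comes from the post above.

```sas
/* Sketch only: mydata, y, x1, and year are hypothetical placeholder names. */
proc hpmixed data=mydata;
   /* All grouping variables must be declared as CLASS variables. */
   class chain provider id year;
   /* Continuous outcome with placeholder fixed effects. */
   model y = x1 year / solution;
   /* Three nested variance components: chain, provider within chain,
      and person within provider within chain. */
   random chain provider(chain) id(provider*chain);
run;
```

The residual variance is estimated by default, so with a random intercept for each person this plays the same role as the compound-symmetry REPEATED statement tried earlier in PROC MIXED.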

Contributor

Does PROC HPMIXED work for a binary (0,1) outcome? PROC GLIMMIX is not working for me. I have a similar situation with three random statements (one is a random residual statement).

Also, can GENMOD GEE models handle multiple levels of nesting? I know it has a subcluster command, but I don't know how it works. Thanks.

Occasional Contributor bg

PROC HPMIXED would not be appropriate for a binary (0,1) outcome.

The outcome should be continuous.

I have not used Genmod with more than one level of clustering.

Let me know if you figure out how to use the subcluster command.

Good luck!

Valued Guide

If the dataset is too large for PROC GLIMMIX for binary data, you should try the %hpglimmix macro.

http://www.jstatsoft.org/v58/i08

This is written for large-scale problems and nonnormal data. It is not a SAS product (not distributed by SAS), but a macro written for SAS. It will take you some time to get used to the syntax (unless you used the old, obsolete %glimmix macro). You can contact the senior author of the article for specific assistance.

Valued Guide

By the way, a program with

random chain provider(chain) id(provider*chain);

will run much faster and use less memory if written as:

random int provider id*provider / sub=chain;

These have equivalent meanings, but the latter works better computationally.
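
As a rough sketch of how the SUBJECT= form fits into a complete step, assuming it is used with PROC HPMIXED as in the earlier reply: the data set name (mydata), the outcome (y), and the fixed effects (x1, year) are hypothetical placeholders, and the comment shows how each term maps back to the nested notation.

```sas
/* Sketch only: mydata, y, x1, and year are hypothetical placeholder names. */
proc hpmixed data=mydata;
   class chain provider id year;
   model y = x1 year / solution;
   /* With SUBJECT=chain, each effect is implicitly nested within chain:
        int          corresponds to  chain
        provider     corresponds to  provider(chain)
        id*provider  corresponds to  id(provider*chain)   */
   random int provider id*provider / subject=chain;
run;
```

The usual explanation for the speedup is that SUBJECT= tells the procedure that chains are independent, so the random-effects design and covariance matrices can be handled as one block per chain rather than as a single large matrix.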
