bg
Calcite | Level 5

Does anyone have experience running PROC MIXED on large data sets with 3 levels of nesting?

I am working with a 10% random sample of a huge data set.

The 10% sample has 28000 people (id), each with measurements from 1 or 2 years.

The people are nested within 4600 providers, who are nested within 1300 chains.

How should I specify the random effects for proc mixed?

I have tried

random chain provider(chain);

repeated /subject=id(provider*chain);

and

random intercept/subject=chain;

random intercept/subject=provider(chain);

repeated/subject=id(provider*chain) type=cs;

and other constructs, but I get an error message:

ERROR: Integer overflow on computing amount of memory required. A request to allocate 8905.72M bytes of memory can not be honored.

NOTE: The SAS System stopped processing this step because of insufficient memory.

The model will run if I drop the variance component involving id.

Also, I have been able to run the model using STATA.

Thanks for any advice!

6 REPLIES
SteveDenham
Jade | Level 19

This looks like something that PROC HPMIXED may be able to address. From the documentation:

"The HPMIXED procedure is designed to solve large mixed model problems by using sparse matrix techniques. A mixed model can be large in many ways: a large number of observations, a large number of columns in the X matrix, a large number of columns in the Z matrix, and a large number of covariance parameters. The aim of the HPMIXED procedure is parameter estimation, inference, and prediction in linear mixed models with large X and/or Z matrices and many observations, but with relatively few covariance parameters."

I can't guarantee anything here, but it looks like this might be a tool to try.

Steve Denham

bg
Calcite | Level 5

Thanks very much for the suggestion!

Some info in case anyone else is interested...

I was not able to run the analysis using PROC MIXED.

I was able to run the analysis using PROC HPMIXED.

The HPMIXED analysis ran and used about 3 minutes of CPU time (and about 5 minutes of real time)!

Using STATA xtmixed also took about 3 minutes of CPU time.

HPMIXED also produced the same covariance parameter estimates as STATA xtmixed. Note that STATA reports the SDs and SAS reports the variances; the values from STATA, when squared, agree with the values from SAS to 5 decimal places.

The SAS random statement I used is

random chain provider(chain) id(provider*chain);
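
In case it helps, here is roughly what the whole step looks like around that RANDOM statement. This is only a sketch: the data set name (sample10), the outcome (y), and the fixed effect (year) are placeholder names made up for illustration, not the ones from my actual analysis. Only the RANDOM statement is the one quoted above.

```sas
/* Sketch only: sample10, y, and year are hypothetical placeholder names. */
proc hpmixed data=sample10;
   /* The grouping variables used in the RANDOM statement are CLASS variables */
   class chain provider id;
   /* Hypothetical continuous outcome with one fixed effect; SOLUTION prints the estimates */
   model y = year / solution;
   /* Variance components: chain, provider within chain, person within provider within chain */
   random chain provider(chain) id(provider*chain);
run;
```

The variance components in the output can then be compared with the squared SDs from STATA.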

proctice
Quartz | Level 8

Does PROC HPMIXED work for a binary (0,1) outcome? I ask because PROC GLIMMIX is not working for me. I have a similar situation with three random statements (one is a random residual statement).

Also, can GENMOD GEE models handle multiple levels of nesting? I know it has a subcluster command, but I don't know how it works. Thanks.

bg
Calcite | Level 5

PROC HPMIXED would not be appropriate for a binary (0,1) outcome.

The outcome should be continuous.

I have not used Genmod with more than one level of clustering.

Let me know if you figure out how to use the subcluster command.

Good luck!

lvm
Rhodochrosite | Level 12

If the dataset is too large for PROC GLIMMIX for binary data, you should try the %hpglimmix macro.

http://www.jstatsoft.org/v58/i08

This is written for large-scale problems and nonnormal data. It is not a SAS product (it is not distributed by SAS), but a macro written for SAS. It will take you some time to get used to the syntax (unless you used the old/obsolete %glimmix macro). You can contact the senior author of the article for specific assistance.

lvm
Rhodochrosite | Level 12

By the way, a program with

random chain provider(chain) id(provider*chain);

will run much faster and use less memory if written as:

random int provider id*provider / sub=chain;

These have equivalent meanings, but the latter works better computationally.
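
To make the mapping between the two forms explicit, here is a sketch of the SUBJECT= version inside a full step. The data set name (sample10), outcome (y), and fixed effect (year) are hypothetical placeholders, since the original post did not show the rest of the program. With chain as the subject, the intercept plays the role of chain, provider plays the role of provider(chain), and id*provider plays the role of id(provider*chain).

```sas
/* Sketch only: sample10, y, and year are hypothetical placeholder names. */
proc hpmixed data=sample10;
   class chain provider id;
   /* Hypothetical continuous outcome and fixed effect */
   model y = year / solution;
   /* Same three variance components as
        random chain provider(chain) id(provider*chain);
      but written with chain as the SUBJECT= effect:
        intercept   -> chain
        provider    -> provider(chain)
        id*provider -> id(provider*chain)                */
   random intercept provider id*provider / subject=chain;
run;
```

The same RANDOM statement can also be tried in PROC MIXED, although I cannot promise it will get around the memory error reported at the top of the thread.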

