Does anyone have experience running PROC MIXED on large data sets with 3 levels of nesting?
I am working with a 10% random sample of a huge data set.
The 10% sample has 28,000 people (id) with measurements from 1 or 2 years.
The people are nested within 4,600 providers, who are nested within 1,300 chains.
How should I specify the random effects for PROC MIXED?
I have tried
random chain provider(chain);
repeated /subject=id(provider*chain);
and
random intercept/subject=chain;
random intercept/subject=provider(chain);
repeated/subject=id(provider*chain) type=cs;
and other constructs,
but I get an error message:
ERROR: Integer overflow on computing amount of memory required. A request to allocate 8905.72M bytes of memory can not be honored.
NOTE: The SAS System stopped processing this step because of insufficient memory.
The model will run if I drop the variance component involving id.
Also, I have been able to run the model using Stata.
Thanks for any advice!
This looks like something that PROC HPMIXED may be able to address. From the documentation:
"The HPMIXED procedure is designed to solve large mixed model problems by using sparse matrix techniques. A mixed model can be large in many ways: a large number of observations, a large number of columns in the X matrix, a large number of columns in the Z matrix, and a large number of covariance parameters. The aim of the HPMIXED procedure is parameter estimation, inference, and prediction in linear mixed models with large X and/or Z matrices and many observations, but with relatively few covariance parameters."
I can't guarantee anything here, but it looks like this might be a tool to try.
Steve Denham
Thanks very much for the suggestion!
Some info in case anyone else is interested...
I was not able to run the analysis using PROC MIXED.
I was able to run the analysis using PROC HPMIXED.
The HPMIXED analysis ran and used about 3 minutes of CPU time (and about 5 minutes real time)!
Using Stata xtmixed also took about 3 minutes of CPU time.
HPMIXED also produced the same covariance estimates as Stata xtmixed. Note that Stata reports the SDs and SAS reports the variances; the values from Stata, when squared, agree with the values from SAS to 5 decimal places.
The SAS random statement I used is
random chain provider(chain) id(provider*chain);
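For anyone who wants to see that RANDOM statement in context, here is a minimal sketch of a full PROC HPMIXED step. The data set name (sample10), the outcome (y), and the covariates (year, x1) are hypothetical placeholders, not the names from my actual analysis; only the CLASS variables and the RANDOM statement come from this thread.

```sas
/* Sketch only: sample10, y, year, and x1 are hypothetical names. */
proc hpmixed data=sample10;
   /* All three nesting levels must be CLASS variables. */
   class chain provider id;
   /* Fixed effects; SOLUTION prints the fixed-effect estimates. */
   model y = year x1 / solution;
   /* Random intercepts for chain, provider within chain, */
   /* and person within provider within chain.            */
   random chain provider(chain) id(provider*chain);
run;
```

The residual variance is estimated by default, so no REPEATED statement is needed for this variance-components structure.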
Does PROC HPMIXED work for a binary (0,1) outcome? PROC GLIMMIX is not working for me. I have a similar situation with three random statements (one is a random residual statement).
Also, can GENMOD GEE models handle multiple levels of nesting? I know it has a subcluster command, but I don't know how it works. Thanks.
PROC HPMIXED would not be appropriate for a binary (0,1) outcome.
The outcome should be continuous.
I have not used Genmod with more than one level of clustering.
Let me know if you figure out how to use the subcluster command.
Good luck!
If the dataset is too large for PROC GLIMMIX for binary data, you should try the %hpglimmix macro.
http://www.jstatsoft.org/v58/i08
This is written for large-scale problems and nonnormal data. It is not a SAS product (not distributed by SAS), but a macro written for SAS. It will take you some time to get used to the syntax (unless you used the old/obsolete %glimmix macro). You can contact the senior author of the article for specific assistance.
By the way, a program with
random chain provider(chain) id(provider*chain);
will run much faster and use less memory if written as:
random int provider id*provider / sub=chain;
These have equivalent meanings, but the latter works better computationally.
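Written out as a complete step, the SUBJECT= form might look like the sketch below. The data set name (sample10), outcome (y), and covariates (year, x1) are made-up placeholders for illustration; only the RANDOM statement is taken from the discussion above.

```sas
/* Sketch only: sample10, y, year, and x1 are hypothetical names. */
proc hpmixed data=sample10;
   class chain provider id;
   model y = year x1 / solution;
   /* Within each chain (the subject):                        */
   /*   int          -> chain-level random intercept          */
   /*   provider     -> provider within chain                 */
   /*   id*provider  -> person within provider within chain   */
   random int provider id*provider / subject=chain;
run;
```

With SUBJECT=chain, the procedure can process the random effects chain by chain rather than as one large set of effects, which is where the speed and memory savings come from.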