Mixed model with large data set and 3 levels of ne...

10-06-2011 04:11 PM

Does anyone have experience running PROC MIXED on large data sets with 3 levels of nesting?

I am working with a 10% random sample of a huge data set.

The 10% sample has 28,000 people (id), each with measurements from 1 or 2 years.

The people are nested within 4,600 providers, who are nested within 1,300 chains.

How should I specify the random effects for proc mixed?

I have tried

random chain provider(chain);

repeated /subject=id(provider*chain);

and

random intercept/subject=chain;

random intercept/subject=provider(chain);

repeated/subject=id(provider*chain) type=cs;

and other constructs, but I get this error message:

ERROR: Integer overflow on computing amount of memory required. A request to allocate 8905.72M bytes of memory can not be honored.

NOTE: The SAS System stopped processing this step because of insufficient memory.

The model will run if I drop the variance component involving id.

I have also been able to run the model in Stata.

Thanks for any advice!

10-07-2011 08:06 AM

This looks like something that PROC HPMIXED may be able to address. From the documentation:

"The HPMIXED procedure is designed to solve large mixed model problems by using sparse matrix techniques. A mixed model can be large in many ways: a large number of observations, a large number of columns in the **X** matrix, a large number of columns in the **Z** matrix, and a large number of covariance parameters. The aim of the HPMIXED procedure is parameter estimation, inference, and prediction in linear mixed models with large **X** and/or **Z** matrices and many observations, but with relatively few covariance parameters."

I can't guarantee anything here, but it looks like this might be a tool to try.

Steve Denham

Posted in reply to SteveDenham

10-07-2011 01:12 PM

Thanks very much for the suggestion!

Some info in case anyone else is interested...

I was not able to run the analysis using PROC MIXED.

I was able to run the analysis using PROC HPMIXED.

The HPMIXED analysis ran and used about 3 minutes of CPU time (and about 5 minutes real time)!

Stata's xtmixed also took about 3 minutes of CPU time.

HPMIXED also produced the same covariance parameter estimates as Stata's xtmixed. (Note that Stata reports standard deviations and SAS reports variances; the Stata values, when squared, agree with the SAS values to 5 decimal places.)

The SAS random statement I used is

random chain provider(chain) id(provider*chain);
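For anyone who wants to see that RANDOM statement in context, a minimal sketch of a complete PROC HPMIXED step might look like the following. The data set name `sample10`, the response `y`, and the fixed effect `year` are placeholder names assumed for illustration, not names from the original analysis; only the three grouping variables and the RANDOM statement come from the post.

```sas
/* Sketch only: sample10, y, and year are hypothetical placeholder names. */
proc hpmixed data=sample10;
   /* The three grouping variables are declared as classification effects */
   class chain provider id;
   /* Fixed-effects part; replace with the real response and covariates */
   model y = year / solution;
   /* Chain, provider within chain, and person within provider within chain */
   random chain provider(chain) id(provider*chain);
run;
```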

11-21-2011 06:59 PM

Does PROC HPMIXED work for a binary (0,1) outcome? PROC GLIMMIX is not working for me. I have a similar situation with three random statements (one is a random residual statement).

Also, can GENMOD GEE models handle multiple levels of nesting? I know it has a subcluster option, but I don't know how it works. Thanks.

Posted in reply to proctice

11-22-2011 05:24 PM

PROC HPMIXED would not be appropriate for a binary (0,1) outcome.

The outcome should be continuous.

I have not used GENMOD with more than one level of clustering.

Let me know if you figure out how to use the subcluster command.

Good luck!
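One possible starting point, based on a reading of the GENMOD documentation rather than on hands-on experience: SUBCLUSTER= appears to be an option on the REPEATED statement that applies only to alternating logistic regressions with nested log odds ratios (LOGOR=NEST1 or LOGOR=NESTK). That would give two levels of clustering for a binary outcome, not arbitrary nesting. The sketch below assumes hypothetical names `mydata`, `y`, and `x`, and should be checked against the documentation before use.

```sas
/* Sketch only: mydata, y, and x are hypothetical placeholder names. */
proc genmod data=mydata descending;
   class chain provider;
   /* Binary outcome with a logit link; DESCENDING models the event y=1 */
   model y = x / dist=binomial link=logit;
   /* Alternating logistic regressions: chains are the clusters,
      providers are the subclusters within each chain */
   repeated subject=chain / logor=nest1 subcluster=provider;
run;
```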

10-28-2014 02:03 PM

If the dataset is too large for PROC GLIMMIX for binary data, you should try the %hpglimmix macro.

http://www.jstatsoft.org/v58/i08

This is written for large-scale problems and non-normal data. It is not a SAS product (not distributed by SAS), but a macro written for SAS. It will take you some time to get used to the syntax (unless you used the old, now obsolete, %glimmix macro). You can contact the senior author of the article for specific assistance.

10-28-2014 02:06 PM

By the way, a program with

random chain provider(chain) id(provider*chain);

will run much faster and use less memory if written as:

random int provider id*provider / sub=chain;

These have equivalent meanings, but the latter works better computationally.
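As a sketch of how the SUBJECT= form would sit in a complete step, assuming the placeholder names `sample10`, `y`, and `year` (none of which come from the thread):

```sas
/* Sketch only: sample10, y, and year are hypothetical placeholder names. */
proc hpmixed data=sample10;
   class chain provider id;
   model y = year / solution;
   /* Same three variance components, set up chain by chain:
      int         -> chain
      provider    -> provider within chain
      id*provider -> person within provider within chain */
   random int provider id*provider / subject=chain;
run;
```

The same RANDOM statement syntax is also accepted by PROC MIXED, so it may be worth trying there before switching procedures.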