Proc Mixed Integer Overflow


Posted 06-24-2019 06:33 PM

Hello,

To start off, I am using the full version of SAS, not the University Edition. I have reasonable processing resources (32 GB of RAM and a modern processor), and I have updated the config file to maximize the RAM available for analysis.

In spite of these efforts, I continue to encounter an 'integer overflow' error and am unable to run this PROC MIXED code due to insufficient memory.

I have 2 questions for the esteemed experts who frequent this forum.

1) Is there anything in my code that strikes you as particularly resource demanding that I might be able to modify?

2) Is it possible that elements of how I have structured the 2 time parameters are problematic?

This is part of a large, longitudinal analysis. Participants have from 1 to 14 assessment times. In the PROC MIXED model, both independent variables are time parameters. The first, age_curve, is the subject's age at the time of assessment. The second, time_discharge, is the time in years, to one decimal place, before a critical event (i.e., discharge from one program to another).

I am running this analysis as a Change Point Model. The age_curve acts as a sort of control group and the time_discharge is for only those who fall out of the main program and are discharged. The dependent variable is a measure of functional independence (for those who are interested), I'll call that IND.

One thing that has me curious is whether the decision to leave measurement occasions prior to the change point in with the age_curve variable is causing excessive memory demands. For example, subject x has assessments at 1, 2, 3, 4, 5, and 6 years. I set the change point at 3.5 years, so assessments 4, 5, and 6 are assigned to the time_discharge variable while assessments 1, 2, and 3 remain in the age_curve variable. Would splitting the subjects across these two variables be problematic in terms of processing demands?
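
As a concrete sketch of that splitting, a data step along these lines could do the assignment (hypothetical code for illustration only; `assess_age` as the name of the raw assessment time and the fixed 3.5-year change point are assumptions, not taken from the original macro):

```sas
/* Hypothetical sketch of a linear-spline ("broken stick") coding
   around a fixed change point at 3.5 years. assess_age is an
   assumed name for the raw time of assessment. */
data have_split;
    set have;
    cp = 3.5;                             /* fixed change point */
    if assess_age <= cp then do;
        age_curve      = assess_age;      /* pre-change-point time */
        time_discharge = 0;               /* no post-event time yet */
    end;
    else do;
        age_curve      = cp;              /* first slope held at the knot */
        time_discharge = assess_age - cp; /* years past the change point */
    end;
run;
```

Under a coding like this, the split itself adds no extra rows and only the two predictor columns, so any memory pressure would come from the model structure rather than from the splitting per se.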

I have data from about 150,000 participants, so roughly 4 times that many total observations as the average number of assessments is about 4.

```sas
proc mixed data=have noclprint noitprint covtest method=ml;
    class IDVar;
    model IND = age_curve time_discharge / solution notest;
    random intercept age_curve time_discharge / type=un subject=IDVar gcorr;
run;
```

Accepted Solution

As it turns out, my split-plot design has imbalanced groups. I was able to address this imbalance adequately by moving from 'containment' degrees of freedom to the Fai-Cornelius approximation (ddfm=satterth). I found this solution in *Schaalje et al., Approximations to Distributions of Test Statistics in Complex Mixed Linear Models Using SAS PROC MIXED* (see attached, page 2).

This solution radically improved the processing resource demands: the run went from taking over 15 minutes to process 5% of my sample to running the entire sample in under 2 minutes. This was extremely helpful in my context, as I am using an enumerative iteration process in the macro and running 70 iterations across 6 conditions, i.e., I needed to run the entire sample 420 times (now complete). I also believe the results are more accurate using this new method.
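
For concreteness, the change described amounts to adding the ddfm= option to the model statement of the code from the original post (everything else unchanged):

```sas
proc mixed data=have noclprint noitprint covtest method=ml;
    class IDVar;
    /* ddfm=satterth requests Satterthwaite-type (Fai-Cornelius)
       degrees of freedom instead of the containment default
       used when a RANDOM statement is present */
    model IND = age_curve time_discharge / solution notest ddfm=satterth;
    random intercept age_curve time_discharge / type=un subject=IDVar gcorr;
run;
```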

Any further thoughts or feedback are most welcome!

6 REPLIES


Thanks for the feedback, Rick.

I have dabbled with the HPMIXED approach and will continue educating myself about it.

Is it safe to infer from your comment that you don't see anything unusual in the syntax I posted? Also, do you have any thoughts about my decision to split the subject observations across the two time variables?

All feedback is welcome.


I think I will leave the statistical analysis of your design to others who have more experience with large-scale mixed models. My naive comments are:

1. It seems to me that specifying an unstructured covariance matrix for 150,000 participants is a time- and memory-intensive undertaking.
2. If the average number of repeated measurements per individual is only 4, it seems like you are fitting a lot of parameters with very little data. But perhaps I am misunderstanding your data.

Maybe an expert like @sld will be able to offer you her opinions.


Do all subjects have the same changepoint (e.g., 3.5) or does the changepoint differ among subjects?

Do you know what the changepoint is, or does it need to be estimated (i.e., is it a model parameter)?

Do all subjects have a changepoint (do all subjects experience a critical event)?

Can you reasonably expect that the distribution of IND (conditional on the predictors) is normal?

You've specified a random coefficients model with type=un that will estimate 3 variances (intercept, slope with age_curve, slope with time_discharge) and 3 covariances. That's a relatively small number of estimates, and you have a lot of data, so in and of itself, I don't think the random statement is overly ambitious.

But I am puzzled by the model statement: I am not convinced that it does what I think you want (but see my questions above), or that the data structure is compatible with the model specification. That might be one cause of your problem.

You might also have a problem with a limited number of observations for some subjects. In a sense, the model statement is fitting a multiple regression to each subject, and estimation will not be well supported if a subject has only one (or a few) observations, and if it lacks enough observations for both pre- and post-event periods. So that might be a cause of your problem.

I suspect that your model specification needs more thought and revision.

I hope this helps.


Thanks SLD.

The proc mixed syntax is part of a much larger macro that estimates various change points. The model is already running well using a subset of the data and findings are consistent with my hypothesis. My challenge is specific to getting the entire dataset to run without memory overflow.

I have a solid theoretical argument for eliminating participants with only one assessment although my understanding is that with ML estimation this shouldn't be necessary.

I have read that the containment method for degrees of freedom may be resource intensive, so I'll look at options. I also appreciate a previous comment about the unstructured covariance matrix contributing to resource demands. Examining these options within the HPMIXED approach sounds promising so I'll get after these options in the next day or two.
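
An HPMIXED version of the model above might look roughly like this (a sketch only; HPMIXED supports a narrower set of options and covariance structures than PROC MIXED, so PROC MIXED options such as gcorr are omitted here, and support for type=un on the random statement should be checked against the HPMIXED documentation for your SAS version):

```sas
/* Sketch of the same model in PROC HPMIXED, which uses
   sparse-matrix techniques intended for large mixed models. */
proc hpmixed data=have;
    class IDVar;
    model IND = age_curve time_discharge / solution;
    random intercept age_curve time_discharge / type=un subject=IDVar;
run;
```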

Stay tuned 🙂

