I am attempting to fit a GLIMMIX model for a binary outcome for a few million individuals. Each individual has one observation. Part of the problem is that the individuals belong to families, and individuals within a family will have the same exposure, though not necessarily the same outcome. There are about half as many family units as individuals, though some families have one person, some two, some three, etc.
I initially tried a PROC GENMOD like this:
proc genmod data = dataset descending;
   class outcome (ref = "0") exposure (ref = "0") family / param = reference;
   model outcome = exposure / dist = binomial link = logit;
   repeated subject = family / corr = un;
   estimate "exposure" exposure 1 / exp;
run;
The problem with this method is that I was getting the warning: "The working correlation has been ridged with a maximum value to avoid a singularity." The ridge value depended on how the data was sorted (I always sorted by family first, but sometimes I added secondary sort keys). In turn, this produced very different parameter estimates: always in the same direction, but the strength of the relationship would vary quite a bit.
After reviewing the boards, I tried a PROC GLIMMIX:
proc glimmix data = dataset;
   class exposure (ref = "0") family;
   model outcome (event = '1') = exposure / solution dist = binomial link = logit;
   random intercept / subject = family;
   estimate "exposure" exposure 1 / exp;
run;
This runs for about 30 minutes before throwing the error: "Model is too large to be fit by PROC GLIMMIX in a reasonable amount of time on this system. Consider changing your model."
I have tried a few more options in my GLIMMIX:
nloptions technique = nrridg;
and
method = laplace and method = laplace empirical.
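For clarity, the METHOD= and EMPIRICAL options go on the PROC GLIMMIX statement and TECHNIQUE= on the NLOPTIONS statement, i.e. roughly this (same model as above):
proc glimmix data = dataset method = laplace empirical;
   nloptions technique = nrridg;
   class exposure (ref = "0") family;
   model outcome (event = '1') = exposure / solution dist = binomial link = logit;
   random intercept / subject = family;
   estimate "exposure" exposure 1 / exp;
run;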
The same results. The part that troubles me is that I am on a system with 64 GB of memory and a quad-core processor. Watching Windows Task Manager, it seems that SAS is using very little of the available resources: about 500 MB of memory, roughly 14% CPU, and less than 3% disk (with plenty of available hard drive space).
So some questions:
1. Can I get SAS to use more resources to run this procedure?
2. Can I alter the procedure so that it will run?
3. Is there something else entirely you'd recommend?
In advance, thank you for all your help.
Run proc options and post your MEMSIZE setting. It is possible this is set way too low for what you are trying to do.
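For example, this prints the current setting to the log:
proc options option = memsize;
run;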
Millions of obs is too big for GLIMMIX.
If you could, try PROC GEE or PROC GENMOD + REPEATED statement.
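For example, a minimal PROC GEE sketch, assuming your original dataset and variable names:
proc gee data = dataset descending;
   class exposure (ref = "0") family;
   model outcome = exposure / dist = binomial link = logit;
   repeated subject = family;   /* choose a working correlation with TYPE= */
run;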
I agree with @Ksharp . It will be impossible to do with GLIMMIX, because this procedure will estimate a parameter for each family.
It should be possible to estimate with generalized estimating equations (GENMOD + REPEATED) or GEE, because this method works in a non-parametric way.
I would, though, not use type=un but rather type=cs, as CS assumes the same correlation between any two individuals within a family.
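Applied to the GENMOD call from the original post, the only change is on the REPEATED statement:
proc genmod data = dataset descending;
   class outcome (ref = "0") exposure (ref = "0") family / param = reference;
   model outcome = exposure / dist = binomial link = logit;
   repeated subject = family / corr = cs;   /* exchangeable: one common within-family correlation */
   estimate "exposure" exposure 1 / exp;
run;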
@SASKiwi Currently MEMSIZE=34359738368 (32 GB). The thing is that Windows Task Manager tells me SAS is only using a fraction of the available memory.
@Ksharp and @JacobSimonsen I was originally trying a GENMOD, but the working correlation was being ridged, and my estimates depended on how the data was sorted. I always had familyID as the first sort key, but second/third-level sorts would alter my parameter estimates. Updating the correlation matrix to CS allowed GENMOD to complete without ridging. That was an excellent catch, thank you. I was struggling to understand which matrix to go with, so I appreciate the suggestion immensely.
@IanS8 - Are you running 32 or 64-bit SAS? Most people run 64-bit SAS these days so it is probably the procedure's processing limits causing the problem.
I am running 64-bit SAS. Unfortunately, it seems like GLIMMIX is out of the question.
I realize this is from some months ago, but I'll add some insight I learned recently. The default method for determining degrees of freedom in many GLIMMIX models (containment) is extremely resource-intensive. I had a model that would give the exact same out-of-resources error with the default method, but it would run with a different one. I wound up using ddfm = residual. I was told by SAS that with a large sample size, the differences between residual and containment would be negligible (e.g. an F test with a hundred thousand degrees of freedom isn't going to give a different p-value than an F test with a hundred thousand degrees of freedom plus a few).
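With the model from the original post, that is one extra option on the MODEL statement (a sketch, assuming the same dataset and variable names):
proc glimmix data = dataset;
   class exposure (ref = "0") family;
   model outcome (event = '1') = exposure / solution dist = binomial link = logit ddfm = residual;
   random intercept / subject = family;
run;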
Second, SAS often estimates how many resources it would need to fit the model before actually trying. Your message that you don't have enough resources comes from this up-front estimate, not from actually trying and running out. This is why your Task Manager shows so little resource use. (It's annoying, I know! I spent weeks with tech support trying to get my system to use more than 3 GB of RAM until we realized it wasn't actually trying the model.)
Lastly, even with GLIMMIX, I find that sort order matters. I don't fully understand why. My situation was a little more complicated, because it had a group= variable.