Hi All,
I posted a similar question a little while ago, but I think I can articulate the problem better now, so I thought I'd try again. I've managed to pick myself a doozy of a model, and I'm hoping that someone can give me some insight. I'm running a regression on a population of approximately 75,000 individuals with approximately 150,000 total records (multiple records per individual). These individuals are also nested within postal codes.
Here's the code I'm running:
proc glimmix data=(data) ic=q;
  class Postal_code study_id catvar1 catvar2 catvar3;
  model (binary outcome variable) = catvar1 catvar2 catvar3 fixedvar1-fixedvar20
        / offset=logexposure solution dist=poisson link=log ddfm=satterthwaite;
  random intercept / solution subject=postal_code;
  random _residual_ / solution subject=study_id(postal_code);
  covtest / wald;
  nloptions tech=nrridg;
run;
A couple of notes:
1. I'm running _residual_ on the second RANDOM statement because I kept getting memory errors with INTERCEPT as the random effect; I'm guessing that's because I have so many levels in the study_id class. I think this should be fine, but I'm open to other solutions.
2. The model actually converges fine without the offset, but as soon as I include it something breaks and I no longer have convergence.
I've been struggling to get this right for over a week now. Any insight into a) whether the _residual_ approach is a legitimate way to get around the memory issues, or b) why the offset might be affecting convergence would be very much appreciated.
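One quick sanity check on the offset variable before digging further (a minimal sketch, assuming the analysis data set is named mydata; logexposure is the offset from the code above): exposures of zero or missing make the log undefined, and extreme offset values can also make the optimization unstable.
/* Look for missing values and extreme ranges in the offset variable. */
proc means data=mydata n nmiss min max mean;
  var logexposure;
run;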
Thanks so much.
Mike
For those who are interested, the previous post is here: https://communities.sas.com/t5/SAS-Statistical-Procedures/Does-this-PROC-GLIMMIX-Code-make-sense/m-p...
Something old/something new thoughts:
1. I still don't think the second RANDOM statement is appropriate.
2. If the response variable is binary, why specify DIST=POISSON instead of BINARY? I'm guessing that the response variable is actually a count?
3. Have you tried using study_id as a continuous variable with the dataset sorted appropriately? If so, how did that work out? I'm actually quite curious about that....
4. Have you tried METHOD=LAPLACE on the PROC GLIMMIX statement? Sometimes that option gives me convergence when I didn't have it before; sometimes it provides awful results. (See the sketch after this list.)
5. If the response variable is a count, perhaps it does not follow a Poisson distribution. Overdispersed? Zero-altered?
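To make points 4 and 5 concrete, here's a minimal sketch of how those options could look together, reusing the variable names from your code (mydata stands in for your data set name, count_outcome is just a placeholder for the response, and DIST=NEGBIN is only one of several ways to allow for overdispersion). Note that METHOD=LAPLACE does not permit R-side (_residual_) random effects, so only the G-side postal-code intercept is shown:
/* Sketch only: count_outcome is a placeholder for the actual response variable. */
proc glimmix data=mydata method=laplace;
  class Postal_code study_id catvar1 catvar2 catvar3;
  model count_outcome = catvar1 catvar2 catvar3 fixedvar1-fixedvar20
        / offset=logexposure solution dist=negbin link=log;
  random intercept / solution subject=Postal_code;
run;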
Hi Mike,
1. Oh, yes, this is an excellent reparameterization trick! To achieve the same model, you must add TYPE=CS: a G-side random intercept for study_id(postal_code) induces a compound-symmetry covariance among the records that share a study_id, so the R-side version has to spell out that structure explicitly.
random _residual_ / solution subject=study_id(postal_code) type=cs;
Here's a demo
data demo2;
  /* Simulate data with a fixed effect of a, a random effect for b,
     a random effect for c nested within b, and residual error. */
  call streaminit(87564);
  mu = 5;
  array aa[3] _temporary_ (1 0.5 0);
  do b = 1 to 20;
    var_b = rand("normal", 0, 2);
    do c = 1 to 10;
      var_c = rand("normal", 0, 1);
      do a = 1 to 3;
        y = mu + aa[a] + var_b + var_c + rand("normal", 0, 1);
        output;
      end;
    end;
  end;
run;
/* G-side specification: random intercept for c nested within b. */
proc glimmix data=demo2;
  class a b c;
  model y = a / solution ddfm=kr2;
  random intercept / subject=b;
  random a / subject=b;
  random intercept / subject=c(b);
run;
/* Equivalent R-side specification: compound symmetry within c(b)
   replaces the c(b) random intercept plus independent residual. */
proc glimmix data=demo2;
  class a b c;
  model y = a / solution ddfm=kr2;
  random intercept / subject=b;
  random a / subject=b;
  random _residual_ / subject=c(b) type=cs;
run;
2. The original variable with multiple, non-ordered categorical outcomes strikes me as a multinomial variable, not a count variable; I don't think the Poisson distribution was ever pertinent. When you collapse the various outcomes into event/nonevent, you've created a variable with a binary (Bernoulli) distribution, so your model is now a binary logistic regression. Using an offset with a binary distribution to accommodate varying exposures is not as straightforward as replacing "poisson" with "binary". One approach is to use the complementary log-log link (there's a sketch after this list); see https://www.r-bloggers.com/modelling-occurence-of-events-with-some-exposure/ and https://rpubs.com/bbolker/logregexp, or search on "logistic regression offset" for more hits.
3. Too bad. I notice that this approach is presented on p 3 of the Kiernan et al paper you link to in your response. But maybe once you fix the distribution specification, you might try it again with more success.
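To illustrate point 2, here is a minimal sketch of a binary model with a complementary log-log link and the log-exposure offset, in the spirit of the links above (event_flag and mydata are placeholders; only the postal-code intercept is shown to keep it short):
/* Sketch only: a cloglog link plus a log-exposure offset corresponds to
   modeling the probability of at least one event under a Poisson-rate
   assumption, which is how exposure enters a binary model here. */
proc glimmix data=mydata;
  class Postal_code study_id catvar1 catvar2 catvar3;
  model event_flag(event='1') = catvar1 catvar2 catvar3 fixedvar1-fixedvar20
        / offset=logexposure solution dist=binary link=cloglog;
  random intercept / solution subject=Postal_code;
run;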
Let me know what works and doesn't!
Did you ever finalize this model? I'm having a similar problem with a 3-level model nesting hospital admissions (n=32,000) within patients (n=20,000) and patients within hospitals (n=9).
If you are still having issues with convergence or memory, and if you can provide the data set, I can see whether there are any recommendations that will help your model run without errors. Please keep in mind you may also contact Technical Support for help when you encounter convergence or out-of-memory errors, and we will try to provide recommendations.
Kathleen