By the way, the output says you have 87,303 observations. You may have a problem with your datafile. It comes back tot he Log: what does it say?
The number of observations is right. I am working on a subset (87,303) of the whole sample (250,000) and after weighting this number of sub-sample goes up to almost 400,000.
Strange enough, none of the models that worked successfully on the earlier small dataset is working on this new larger dataset.
The log is showing me the warning as below:
WARNING: Obtaining minimum variance quadratic unbiased estimates as starting values for the covariance parameters failed.
Never encountered it before and not sure what it means. Do you have any idea what it exactly means and how to fix it?
Thanks for your time and help!
Ashwini
The default parameters for your covariance matrix (a single variance in your case) are not allowable, given your data.This problem can be difficult to solve. If you search for the the exact warning message in Google, you get plenty of hits. You can try specifying the initial variance value with:
parms (#);
where # is some constant for the initial guess of the variance. Try putting in the estimate you were getting with your smaller sample (before you had trouble). Or try a much smaller number or much larger.
I still think the model is over-parameterized, so that for this particular dataset, there is a quasi-separation problem.
With 26 class variables, assuming they are all binary, you have ( I think) 2 to the 26th power possible cells in a cross-tab. That is a little over 67 million cells, and with only 400000 (really only 87000) cases, you are bound to have massive numbers of cells with no observations. Thus, when GLIMMIX is trying to get started, there isn't a way to estimate the variance due to hospid.
You are going to have to attack this in chunks, looking at smaller numbers of independent variables in multiple fits of the data.
Steve Denham
I am pretty sure that Steve is right, your model is overparameterized. I suggest you start with a simple fixed-effects part of the model. You could have just a few predictor variables. Then add terms in steps to see when things blow up.
Thank you Steve and @lvm for your time and ideas abotu this particular problem.
I have already tried to run the model with bare minimum predictors and it just won't give me a complete output. And these are the models that worked just fine with the smaller number of cases. This is actaully the NIS-HCUP dataset. And now I have merged the 3 years worth data to get some valid results and I thought it would work fine with a large sample size. But unfortunately I am running into this particular warning. I shall keep trying..
Thanks again!
Ashwini
So it works with all of those independent variables on a smaller dataset? That is interesting.
Consider how the data are sorted. The sweep operator for matrix inversion can often be influenced by the data order, especially in these very complex models. If you "stacked" the 3 years, this may be the cause. Try resorting, and make sure the class statement and model statement reflect the sort order.
Steve Denham
Yes, it works with all of those independent variables on a smaller dataset..
Yes, I have "stacked" the 3 years of data
What do you exactly mean by making sure that the class statement and model statement reflect the sort order? I am not qute clear about this. Would you please elaborate on this?
Are you saying that you can't even fit an intercept-only model?
model died (event=last) =/ dist=binary link=logit solution ;
I am thinking that you have a problem with your new data file that is confusing GLIMMIX. There is no reason why this model could not be fitted.
The intercept model works and parsimonious models work too.
I am running total 10 models on different sub-samples from this data. For Instance..the sample with STEMI type of heart attack and those who received no procedure or STEMI with PCI- Stenting procedure and such.. Below are some examples..
And now each of these and rest of the similar models are running with a very few and different set of predictor variables. Out of total 10 models, only 5 worked after I removed race and income from them. I am not sure how I can get the rest 5 give me complete output.
proc glimmix data=library.nismi091011_gender5 ; *using oddsratio*/;
class hospid gender dm_all cm_htn_c cm_aids cm_alcohol cm_ANEMDEF cm_arth cm_bldloss cm_chf
cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para cm_perivasc cm_psych cm_pulmcirc cm_renlfail cm_wghtloss cararrhythmia;
model died (event='1') = age gender dm_all cm_htn_c cm_aids cm_alcohol cm_ANEMDEF cm_arth cm_bldloss cm_chf
cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para cm_perivasc cm_psych cm_pulmcirc cm_renlfail cm_wghtloss cararrhythmia /oddsratio dist=binary link=logit ddfm=bw solution;
random intercept / subject=hospid;
where noproc=1 and stemi=0;
weight discwt;
title 'proc glimmix for in-hosp mortality NSTEMI patients with no procedure';
run;
quit;
proc glimmix data=library.nismi091011_gender5 ; *using oddsratio*/;
class hospid gender dm_all cm_htn_c cm_aids cm_alcohol cm_ANEMDEF cm_arth cm_bldloss cm_chf
cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para cm_perivasc cm_psych cm_pulmcirc cm_renlfail cm_wghtloss cararrhythmia;
model died (event='1') = age gender dm_all cm_htn_c cm_aids cm_alcohol cm_ANEMDEF cm_arth cm_bldloss cm_chf
cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para cm_perivasc cm_psych cm_pulmcirc cm_renlfail cm_wghtloss cararrhythmia /oddsratio dist=binary link=logit ddfm=bw solution;
random intercept / subject=hospid;
where noproc=1 and stemi=1;
weight discwt;
title 'proc glimmix for in-hosp mortality in STEMI patients with no procedure ';
run;
proc glimmix data=library.nismi091011_gender5 ; *using oddsratio*/;
class hospid gender dm_all cm_htn_c cm_ANEMDEF cm_arth cm_bldloss cm_chf
cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para cm_perivasc cm_pulmcirc cm_renlfail cm_wghtloss cararrhythmia;
model died (event='1') = age gender dm_all cm_htn_c cm_ANEMDEF cm_arth
cm_bldloss cm_chf cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para cm_perivasc cm_pulmcirc cm_renlfail cm_wghtloss cararrhythmia /oddsratio dist=binary link=logit ddfm=bw solution;
random intercept / subject=hospid;
where pcionly=1 and stemi=0;
title 'proc glimmix for in-hosp mortality in POST-PCIonly NSTEMI patients ';
weight discwt;
run;
quit;
proc glimmix data=library.nismi091011_gender5 ; *using oddsratio*/;
class hospid gender dm_all cm_htn_c cm_ANEMDEF cm_arth cm_bldloss cm_chf
cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para cm_perivasc cm_pulmcirc cm_renlfail cm_wghtloss cararrhythmia;
model died (event='1') = age gender dm_all cm_htn_c cm_ANEMDEF cm_arth
cm_bldloss cm_chf cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para cm_perivasc cm_pulmcirc cm_renlfail cm_wghtloss cararrhythmia /oddsratio dist=binary link=logit ddfm=bw solution;
random intercept / subject=hospid;
where pcionly=1 and stemi=1;
weight discwt;
title 'proc glimmix for in-hosp mortality in POST-PCIonly STEMI patients ';
run;
quit;
Is there any way to collapse the data to the events/trial syntax, where dist=binomial can be used? I would collapse on hospid, but it is going to involve the independent variables. Out put from PROC FREQ crosstabs should give what you need. Many times this method can overcome starting value problems encountered with dist=binary.
Steve Denham
The hospid is the hospital ID variable with 700-800 levels, depending on the sub-sample. Not sure how I can collapse that. All the independent variables, except race and income are binary variables...
OK. To get events/trial syntax numbers, you need the number of events, at each hospid, for the cross-tabbed independent variables. Something like:
proc means noprint nway;
class (same as above);
var died;
output out=eventtrial;
run;
You would then have the counts in the _freq_ variable in this dataset. Combine the 0 and 1 counts to get the number of trials.
The covariate that most concerns me is age. If this is handled as a continuous variable, all of this will be for naught, I think, as there would be too few 'trials' for a given age-hospid combination. However, if reasonable discrete ranges for age can be used, then this should lead to an events/trial dataset that makes sense.
Steve Denham
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.