BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ashwini_uci
Obsidian | Level 7

Hi,

I am working on hierarchical models using PROC GLIMMIX. This is my first time working on this proc.

I tried following code but the log returned an error that says "ERROR: Invalid or missing data."

I am not sure if my code is incorrect ot there is anything missing in the dataset. 

Also, what do Insert in place of n in th e=model statement.


proc glimmix data=  xxxxx ;

class hospid female dm dmcx htn_c  aids alcohol ANEMDEF arth race1   ZIPINC_QRTL hosp_location h_contrl hosp_teach

bldloss chf chrnlung coag depress drug hypothy liver lymph lytes mets neuro obese para perivasc psych pulmcirc renlfail tumor

ulcer valve wghtloss cararrhythmia;

model died/n = female  dm dmcx htn_c aids alcohol ANEMDEF arth race1 ZIPINC_QRTL hosp_location h_contrl hosp_teach

bldloss chf chrnlung coag depress drug hypothy liver lymph lytes mets neuro obese para perivasc psych pulmcirc renlfail tumor

ulcer valve wghtloss cararrhythmia / solution;

random intercept / subject=hospid;

run;


Any help will be greatly appreciate!


Thanks

Ashwini

1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

I still think the model is over-parameterized, so that for this particular dataset, there is a quasi-separation problem.

With 26 class variables, assuming they are all binary, you have ( I think) 2 to the 26th power possible cells in a cross-tab.  That is a little over 67 million cells, and with only 400000 (really only 87000) cases, you are bound to have massive numbers of cells with no observations.  Thus, when GLIMMIX is trying to get started, there isn't a way to estimate the variance due to hospid.

You are going to have to attack this in chunks, looking at smaller numbers of independent variables in multiple fits of the data.

Steve Denham

View solution in original post

37 REPLIES 37
SteveDenham
Jade | Level 19

Hi Ashwini,

That error indicates that something is missing in the dataset, and if you have the highlighted "n" in the model statement, then it is very likely that is what is missing.

So, if we can answer that question, then it will probably solve some of your problems.  The events/trials syntax that you are using implies that you know how many events (in this case, died, assuming that is a summed value across the many class variables) and how many trials are available.  For this it could be died plus lived, or if it is an intent to treat analysis, it could be number enrolled.  Just be sure that the number reflects the cell "total".

With this many categorical variables, I would watch out for quasi-separation.  You may need to collapse categories or eliminate some variables if the crosstabs reveal zeroes.

Steve Denham

Ashwini_uci
Obsidian | Level 7

Hi Steve,

Thanks for your response.

I read a little more and ran another code which is as follows:

proc glimmix data=library.nismicathcabg1_3;

class hospid;

model died (event=last) = female  dm dmcx htn_c  aids alcohol ANEMDEF arth race1 ZIPINC_QRTL hosp_location h_contrl hosp_teach bldloss chf chrnlung coag depress drug hypothy liver lymph lytes mets neuro obese para perivasc psych pulmcirc renlfail tumor  ulcer valve wghtloss cararrhythmia / dist=binary link=logit ddfm=bw solution ;

random intercept / subject=hospid solution;

run;



It returned me incomplete output, where at the end it said "Did not converge".

I am attaching the output for your reference to the original post.

I am not sure why the output is incomplete and showing "did not converge".


Appreciate your advice !

Thanks

Ashwini


SteveDenham
Jade | Level 19

The default number of iterations in GLIMMIX is twenty.  I see that the objective function looks to be leveling out, but that there is still some iterating to do.

Add the following line to your GLIMMIX code:

nloptions maxiter=200;

If it still does not converge, try adding pconv=1e-6 to the PROC GLIMMIX statement.

Steve Denham

Ashwini_uci
Obsidian | Level 7

Hi Steve,

Thanks again for your advice. i tried both but to no avail. I still get the same output that has "Did not converge" in it.

I also tried using your suggestions in a very parsimonious model but that didn't help either.

Not sure what is wrong with the code...

SteveDenham
Jade | Level 19

I really don't think anything is wrong with your code.  Could you please share the output, especially the iteration history part?  I can be more helpful if I see how that is behaving.

Steve Denham

Ashwini_uci
Obsidian | Level 7

These are the codes

proc glimmix data=library.nismicathcabg1_3;

class hospid;

model died (event=last) = female  dm dmcx htn_c  aids alcohol ANEMDEF arth race1 ZIPINC_QRTL hosp_location h_contrl hosp_teach

bldloss chf chrnlung coag depress drug hypothy liver lymph lytes mets neuro obese para perivasc psych pulmcirc renlfail tumor

ulcer valve wghtloss cararrhythmia / dist=binary link=logit ddfm=bw solution ;

random intercept / subject=hospid solution;

nloptions maxiter=200;

run;


proc glimmix data=library.nismicathcabg1_3 pconv=1e-6;

class hospid;

model died (event=last) = female  dm dmcx htn_c  aids alcohol ANEMDEF arth race1 ZIPINC_QRTL hosp_location h_contrl hosp_teach

bldloss chf chrnlung coag depress drug hypothy liver lymph lytes mets neuro obese para perivasc psych pulmcirc renlfail tumor

ulcer valve wghtloss cararrhythmia / dist=binary link=logit ddfm=bw solution ;

random intercept / subject=hospid solution;

run;

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

You have a very large number of predictor variables in your model. I am guessing that the log likelihood function for your data and model is fairly flat in the region of the maximum value, so that the optimization method is bouncing around and not converging at the actual maximum. I suggest you try a much simpler fixed effect part of your model. Start with just a few predictor variables so that you can obtain convergence. Then add more predictor variables to see if you can find the circumstances where problems develop.

Other comments: GLMMs with a binary conditional distribution (your situation) can give biased results. You should try METHOD=LAPLACE in the procedure statement. You should also try fitting the model without a random effect, just to look at the parameter values (to give you a sense of the magnitude and sign of the coefficient estimates). This would be just for exploratory work (you should not ignore the random effect for the final analysis). You should be able to see that many of the predictors will likely not be significant (due to an overfit).

SteveDenham
Jade | Level 19

In addition, presenting the iteration history helps me in interpreting/troubleshooting.  Note that from iteration 5 onward, the objective function moves by a nearly fixed quantity (20 and a tiny fraction) and from iteration 13 on the tiny fraction goes away.  This is a sign of an over parameterized model, with a collinearity problem.  Some of these predictors are nearly completely confounded, such that the effective X matrix has only 17 linearly independent columns (I think that's right.  Hopefully, @lvm will correct me if I messed this up), and the 20 redundant columns just keep adding on with each iteration.

So, use subject matter knowledge to boil the independent variables down to a reasonable list.  If this is an exploratory study, you could try PROC GLMSELECT and use the LASSO option to get a handle on the independent variables.  At least run PROC REG and check the collinearity diagnostics.

Steve Denham


Ashwini_uci
Obsidian | Level 7

Thanks @Ivm and Steve for your useful suggestions! I tried to minimize the number of predictor variables and I ran the following model. And I could get some output which seems right. But it gives me only estimates and p-values. I am attaching the output for your reference. Is there any way I can also get the odds ratio by using proc glimmix?

proc glimmix data=library.data;

class hospid female dm htn_c  ANEMDEF arth race1   ZIPINC_QRTL bldloss chf chrnlung coag depress drug hypothy

liver lymph lytes mets neuro obese para perivasc pulmcirc renlfail  wghtloss cararrhythmia;

model died (event='1') = female  dm htn_c   ANEMDEF arth race1 ZIPINC_QRTL bldloss chf chrnlung coag depress drug hypothy liver lymph lytes mets neuro obese para perivasc  pulmcirc renlfail wghtloss cararrhythmia /dist=binary link=logit ddfm=bw solution;

random intercept / subject=hospid;

run;

Ashwini_uci
Obsidian | Level 7

Hello Steve,

As discussed in this post, I finally got the right code for Proc Glimmix and I have been using it thus far. Only today when I started using it on much larger dataset (over 250000 records) . It is the same code as I have posted above. The Proc Glimmix worked fine on the 100000 cases. But on the new dataset witha around 250000 cases, the output I am getting is not complete. Here is the code.I am not sure why it is not yielding a complete. Any help is greatly appreciated. I am also attaching the output for your reference.

proc glimmix data=library.nismi091011_gender5 ; *using oddsratio*/;

class hospid gender racenew ZIPINC_QRTL_1  dm_all   cm_htn_c cm_ANEMDEF cm_arth cm_bldloss cm_chf

cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para

cm_perivasc  cm_pulmcirc cm_renlfail  cm_wghtloss cararrhythmia;

model died (event='1') = age gender  racenew ZIPINC_QRTL_1  dm_all   cm_htn_c cm_ANEMDEF cm_arth

cm_bldloss cm_chf cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para cm_perivasc  cm_pulmcirc cm_renlfail  cm_wghtloss cararrhythmia /oddsratio dist=binary link=logit ddfm=bwsolution;

random intercept / subject=hospid;

where pcionly=1 and stemi=0;

weight discwt;

title 'Logi Reg in-hosp mortality vs gender in POST-PCI MI patients using "where" option with RACE +income for nonstemi with pcionly';

run;

quit;

Ashwini_uci
Obsidian | Level 7

Thanks @Ivm and Steve for your useful suggestions! I tried to minimize the number of predictor variables and I ran the following model. And I could get some output which seems right. But it gives me only estimates and p-values. I am attaching the output for your reference. Is there any way I can also get the odds ratio by using proc glimmix?

proc glimmix data=library.data;

class hospid female dm htn_c  ANEMDEF arth race1   ZIPINC_QRTL bldloss chf chrnlung coag depress drug hypothy

liver lymph lytes mets neuro obese para perivasc pulmcirc renlfail  wghtloss cararrhythmia;

model died (event='1') = female  dm htn_c   ANEMDEF arth race1 ZIPINC_QRTL bldloss chf chrnlung coag depress drug hypothy liver lymph lytes mets neuro obese para perivasc  pulmcirc renlfail wghtloss cararrhythmia /dist=binary link=logit ddfm=bw solution;

random intercept / subject=hospid;

run;

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

Just add the ODDSRATIO option on the model statement. See the documentation for suboptions, but this should give you what you want.

Ashwini_uci
Obsidian | Level 7

Hello lvm,

As discussed in this post, I finally got the right code for Proc Glimmix and I have been using it thus far. Only today when I started using it on much larger dataset (over 250000 records) . It is the same code as I have posted above. The Proc Glimmix worked fine on the 100000 cases. But on the new dataset witha around 250000 cases, the output I am getting is not complete. Here is the code.I am not sure why it is not yielding a complete. Any help is greatly appreciated. I am also attaching the output for your reference.

proc glimmix data=library.nismi091011_gender5 ; *using oddsratio*/;

class hospid gender racenew ZIPINC_QRTL_1  dm_all   cm_htn_c cm_ANEMDEF cm_arth cm_bldloss cm_chf

cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para

cm_perivasc  cm_pulmcirc cm_renlfail  cm_wghtloss cararrhythmia;

model died (event='1') = age gender  racenew ZIPINC_QRTL_1  dm_all   cm_htn_c cm_ANEMDEF cm_arth 

cm_bldloss cm_chf cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para cm_perivasc  cm_pulmcirc cm_renlfail  cm_wghtloss cararrhythmia /oddsratio dist=binary link=logit ddfm=bw solution;

random intercept / subject=hospid;

where pcionly=1 and stemi=0;

weight discwt;

title 'Logi Reg in-hosp mortality vs gender in POST-PCI MI patients using "where" option with RACE +income for nonstemi with pcionly';

run;

quit;

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

You may simply be running out of memory. What does the Log say?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 37 replies
  • 8029 views
  • 12 likes
  • 3 in conversation