01-15-2014 02:28 PM

Hi,

I am working on hierarchical models using PROC GLIMMIX. This is my first time working on this proc.

I tried following code but the log returned an error that says **"ERROR: Invalid or missing data."**

I am not sure whether my code is incorrect or something is missing in the dataset.

Also, what do I insert in place of n in the MODEL statement?

```sas
proc glimmix data= xxxxx ;
   class hospid female dm dmcx htn_c aids alcohol ANEMDEF arth race1 ZIPINC_QRTL hosp_location h_contrl hosp_teach
         bldloss chf chrnlung coag depress drug hypothy liver lymph lytes mets neuro obese para perivasc psych pulmcirc renlfail tumor
         ulcer valve wghtloss cararrhythmia;
   model died/n = female dm dmcx htn_c aids alcohol ANEMDEF arth race1 ZIPINC_QRTL hosp_location h_contrl hosp_teach
         bldloss chf chrnlung coag depress drug hypothy liver lymph lytes mets neuro obese para perivasc psych pulmcirc renlfail tumor
         ulcer valve wghtloss cararrhythmia / solution;
   random intercept / subject=hospid;
run;
```

Any help will be greatly appreciated!

Thanks

Ashwini

Accepted Solutions


03-19-2014 03:01 PM

I still think the model is over-parameterized, so that for this particular dataset, there is a quasi-separation problem.

With 26 class variables, assuming they are all binary, you have (I think) 2 to the 26th power possible cells in a cross-tab. That is a little over 67 million cells, and with only 400000 (really only 87000) cases, you are bound to have massive numbers of cells with no observations. Thus, when GLIMMIX is trying to get started, there isn't a way to estimate the variance due to hospid.
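To put numbers on that estimate (a rough check, assuming as above that all 26 class variables are binary, and taking the 87000 usable cases):

$$
2^{26} = 67\,108\,864 \ \text{cells}, \qquad \frac{87\,000}{67\,108\,864} \approx 0.0013 \ \text{cases per cell},
$$

so even if every case landed in a different cell, more than 99.8% of the cells would still be empty.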

You are going to have to attack this in chunks, looking at smaller numbers of independent variables in multiple fits of the data.

Steve Denham

All Replies


Posted in reply to Ashwini_uci

01-15-2014 03:16 PM

Hi Ashwini,

That error indicates that something is missing in the dataset, and if you have the highlighted "n" in the model statement, then it is very likely that is what is missing.

So, if we can answer that question, it will probably solve some of your problems. The events/trials syntax you are using implies that you know how many events there are (in this case, died, assuming that is a count summed within each combination of the many class variables) and how many trials are available. For n, that could be deaths plus survivors, or, for an intent-to-treat analysis, the number enrolled. Just be sure that n reflects the cell total.
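If the data are one row per patient with DIED coded 0/1, n is not a variable you already have. Below is a minimal sketch of one way to build it, assuming that 0/1 coding and using only two covariates for brevity: PROC SQL collapses the rows into cells and counts deaths (events) and patients (trials). The table WORK.AGG and the variables DIED_SUM and N are hypothetical names; the input dataset is the one used later in this thread.

```sas
/* Collapse patient-level rows into one row per hospital x covariate pattern.
   Assumes DIED is numeric 0/1; FEMALE and DM stand in for the full list. */
proc sql;
   create table work.agg as
   select hospid, female, dm,
          sum(died) as died_sum,   /* events: deaths in the cell        */
          count(*)  as n           /* trials: patients in the same cell */
   from library.nismicathcabg1_3
   where died is not missing
   group by hospid, female, dm;
quit;

/* Events/trials response: DIED_SUM deaths out of N patients per cell */
proc glimmix data=work.agg;
   class hospid female dm;
   model died_sum/n = female dm / dist=binomial link=logit solution;
   random intercept / subject=hospid;
run;
```

With patient-level rows, the simpler alternative (the one used later in this thread) is to skip the aggregation and model the binary outcome directly, e.g. `model died(event='1') = ... / dist=binary;`.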

With this many categorical variables, I would watch out for quasi-separation. You may need to collapse categories or eliminate some variables if the crosstabs reveal zeroes.
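A quick way to run those crosstabs is sketched below, assuming DIED is the 0/1 outcome; only a handful of the predictors from the original post are listed, so extend the list as needed. A zero count in the DIED=1 (or DIED=0) column for some level of a predictor is a warning sign for (quasi-)separation.

```sas
/* Outcome-by-predictor crosstabs; look for zero cells */
proc freq data=library.nismicathcabg1_3;
   tables died*(female dm aids alcohol para) / norow nocol nopercent;
run;
```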

Steve Denham


Posted in reply to SteveDenham

01-15-2014 08:41 PM

Hi Steve,

Thanks for your response.

I read a little more and ran another program, as follows:

```sas
proc glimmix data=library.nismicathcabg1_3;
   class hospid;
   model died (event=last) = female dm dmcx htn_c aids alcohol ANEMDEF arth race1 ZIPINC_QRTL hosp_location h_contrl hosp_teach
         bldloss chf chrnlung coag depress drug hypothy liver lymph lytes mets neuro obese para perivasc psych pulmcirc renlfail tumor
         ulcer valve wghtloss cararrhythmia / dist=binary link=logit ddfm=bw solution;
   random intercept / subject=hospid solution;
run;
```

It returned incomplete output, which ended with "Did not converge".

I have attached the output to the original post for your reference.

I am not sure why the output is incomplete and shows "Did not converge".

I appreciate your advice!

Thanks

Ashwini


Posted in reply to Ashwini_uci

01-16-2014 12:46 PM

The default number of iterations in GLIMMIX is twenty. I see that the objective function looks to be leveling out, but that there is still some iterating to do.

Add the following line to your GLIMMIX code:

```sas
nloptions maxiter=200;
```

If it still does not converge, try adding pconv=1e-6 to the PROC GLIMMIX statement.

Steve Denham


Posted in reply to SteveDenham

01-17-2014 06:21 PM

Hi Steve,

Thanks again for your advice. I tried both, but to no avail. I still get the same output with "Did not converge" in it.

I also tried using your suggestions in a very parsimonious model but that didn't help either.

Not sure what is wrong with the code...


Posted in reply to Ashwini_uci

01-20-2014 02:26 PM

I really don't think anything is wrong with your code. Could you please share the output, especially the iteration history part? I can be more helpful if I see how that is behaving.

Steve Denham


Posted in reply to SteveDenham

01-20-2014 04:35 PM

Here are the two programs I ran:

```sas
proc glimmix data=library.nismicathcabg1_3;
   class hospid;
   model died (event=last) = female dm dmcx htn_c aids alcohol ANEMDEF arth race1 ZIPINC_QRTL hosp_location h_contrl hosp_teach
         bldloss chf chrnlung coag depress drug hypothy liver lymph lytes mets neuro obese para perivasc psych pulmcirc renlfail tumor
         ulcer valve wghtloss cararrhythmia / dist=binary link=logit ddfm=bw solution;
   random intercept / subject=hospid solution;
   nloptions maxiter=200;
run;

proc glimmix data=library.nismicathcabg1_3 pconv=1e-6;
   class hospid;
   model died (event=last) = female dm dmcx htn_c aids alcohol ANEMDEF arth race1 ZIPINC_QRTL hosp_location h_contrl hosp_teach
         ulcer valve wghtloss cararrhythmia / dist=binary link=logit ddfm=bw solution;
   random intercept / subject=hospid solution;
run;
```


Posted in reply to Ashwini_uci

01-22-2014 09:57 AM

You have a very large number of predictor variables in your model. I am guessing that the log likelihood function for your data and model is fairly flat in the region of the maximum value, so that the optimization method is bouncing around and not converging at the actual maximum. I suggest you try a much simpler fixed effect part of your model. Start with just a few predictor variables so that you can obtain convergence. Then add more predictor variables to see if you can find the circumstances where problems develop.
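One way to organize that build-up is a small macro that refits the same model with a growing predictor list, so the iteration histories can be compared side by side. This is a sketch only: the macro name FIT_SUBSET is hypothetical, and the predictor subsets are arbitrary picks from the thread's variable list, not a recommendation.

```sas
/* Refit the same random-intercept logistic model for a given predictor list */
%macro fit_subset(preds);
   proc glimmix data=library.nismicathcabg1_3;
      class hospid;
      model died(event=last) = &preds / dist=binary link=logit solution;
      random intercept / subject=hospid;
   run;
%mend fit_subset;

/* Start small, then grow the list and watch where convergence breaks down */
%fit_subset(female dm chf)
%fit_subset(female dm chf renlfail coag lytes)
```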

Other comments: GLMMs with a binary conditional distribution (your situation) can give biased results with the default pseudo-likelihood estimation method. You should try METHOD=LAPLACE in the PROC GLIMMIX statement. You should also try fitting the model without a random effect, just to look at the parameter estimates (to get a sense of the magnitude and sign of the coefficients). This would be for exploratory work only; you should not ignore the random effect in the final analysis. You will likely see that many of the predictors are not significant (due to overfitting).
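Both suggestions are sketched below with a short, illustrative predictor list (an assumption, not a recommended model). The first step replaces the default pseudo-likelihood with a Laplace approximation to the marginal likelihood; the second simply drops the RANDOM statement for an exploratory fixed-effects-only fit.

```sas
/* Laplace approximation instead of the default pseudo-likelihood */
proc glimmix data=library.nismicathcabg1_3 method=laplace;
   class hospid;
   model died(event=last) = female dm chf renlfail / dist=binary link=logit solution;
   random intercept / subject=hospid;
run;

/* Exploratory only: same fixed effects, no hospital random effect */
proc glimmix data=library.nismicathcabg1_3;
   model died(event=last) = female dm chf renlfail / dist=binary link=logit solution;
run;
```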


01-24-2014 03:05 PM

In addition, seeing the iteration history helps me interpret what is going on. Note that from iteration 5 onward, the objective function changes by a nearly fixed amount (20 plus a tiny fraction), and from iteration 13 on the tiny fraction disappears. This is a sign of an over-parameterized model with a collinearity problem. Some of these predictors are nearly completely confounded, so the effective X matrix has only 17 linearly independent columns (I think that's right; hopefully @lvm will correct me if I messed this up), and the 20 redundant columns just keep adding on with each iteration.

So, use subject matter knowledge to boil the independent variables down to a reasonable list. If this is an exploratory study, you could try PROC GLMSELECT and use the LASSO option to get a handle on the independent variables. At least run PROC REG and check the collinearity diagnostics.
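Both screening steps are sketched below. They assume DIED is numeric 0/1 and list only some of the binary predictors from the thread; multi-level variables such as race1 would need a CLASS statement in GLMSELECT. Both fit linear models to the 0/1 outcome, so treat them purely as screening tools, not as the final analysis.

```sas
/* Collinearity diagnostics: the linear fit is used only for VIF and COLLIN */
proc reg data=library.nismicathcabg1_3;
   model died = female dm dmcx htn_c aids alcohol chf chrnlung renlfail / vif collin;
run;
quit;

/* LASSO screening of the candidate predictors (exploratory) */
proc glmselect data=library.nismicathcabg1_3;
   model died = female dm dmcx htn_c aids alcohol chf chrnlung renlfail
         / selection=lasso(choose=sbc stop=none);
run;
```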

Steve Denham


Posted in reply to SteveDenham

01-24-2014 07:45 PM

Thanks @lvm and Steve for your useful suggestions! I reduced the number of predictor variables and ran the following model. I got output that seems right, but it gives only estimates and p-values. I am attaching the output for your reference. Is there any way to also get odds ratios from PROC GLIMMIX?

```sas
proc glimmix data=library.data;
   class hospid female dm htn_c ANEMDEF arth race1 ZIPINC_QRTL bldloss chf chrnlung coag depress drug hypothy
         liver lymph lytes mets neuro obese para perivasc pulmcirc renlfail wghtloss cararrhythmia;
   model died (event='1') = female dm htn_c ANEMDEF arth race1 ZIPINC_QRTL bldloss chf chrnlung coag depress drug hypothy
         liver lymph lytes mets neuro obese para perivasc pulmcirc renlfail wghtloss cararrhythmia
         / dist=binary link=logit ddfm=bw solution;
   random intercept / subject=hospid;
run;
```


Posted in reply to SteveDenham

03-18-2014 04:47 PM

Hello Steve,

As discussed in this post, I finally got the right code for PROC GLIMMIX and have been using it since. Today I started using it on a much larger dataset (over 250,000 records). It is the same code I posted above. PROC GLIMMIX worked fine on the 100,000 cases, but on the new dataset with around 250,000 cases, the output I get is incomplete. Here is the code. I am not sure why it is not producing complete output. Any help is greatly appreciated. I am also attaching the output for your reference.

```sas
proc glimmix data=library.nismi091011_gender5; /* using oddsratio */
   class hospid gender racenew ZIPINC_QRTL_1 dm_all cm_htn_c cm_ANEMDEF cm_arth cm_bldloss cm_chf
         cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para
         cm_perivasc cm_pulmcirc cm_renlfail cm_wghtloss cararrhythmia;
   model died (event='1') = age gender racenew ZIPINC_QRTL_1 dm_all cm_htn_c cm_ANEMDEF cm_arth
         cm_bldloss cm_chf cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro
         cm_obese cm_para cm_perivasc cm_pulmcirc cm_renlfail cm_wghtloss cararrhythmia
         / oddsratio dist=binary link=logit ddfm=bw solution;
   random intercept / subject=hospid;
   where pcionly=1 and stemi=0;
   weight discwt;
   title 'Logi Reg in-hosp mortality vs gender in POST-PCI MI patients using "where" option with RACE +income for nonstemi with pcionly';
run;
quit;
```



Posted in reply to Ashwini_uci

01-25-2014 08:12 PM

Just add the ODDSRATIO option to the MODEL statement. See the documentation for its suboptions, but this should give you what you want.
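A minimal sketch of where the option goes, using a shortened predictor list from the model above (the subset is only for illustration). ODDSRATIO adds an "Odds Ratio Estimates" table to the output; how levels are compared is controlled by its suboptions, so check the documentation before interpreting the reference levels.

```sas
proc glimmix data=library.data;
   class hospid female dm chf;
   model died(event='1') = female dm chf
         / dist=binary link=logit ddfm=bw solution oddsratio;
   random intercept / subject=hospid;
run;
```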


03-18-2014 04:46 PM

Hello lvm,

As discussed in this post, I finally got the right code for PROC GLIMMIX and have been using it since. Today I started using it on a much larger dataset (over 250,000 records). It is the same code I posted above. PROC GLIMMIX worked fine on the 100,000 cases, but on the new dataset with around 250,000 cases, the output I get is incomplete. Here is the code. I am not sure why it is not producing complete output. Any help is greatly appreciated. I am also attaching the output for your reference.

```sas
proc glimmix data=library.nismi091011_gender5; /* using oddsratio */
   class hospid gender racenew ZIPINC_QRTL_1 dm_all cm_htn_c cm_ANEMDEF cm_arth cm_bldloss cm_chf
         cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para
         cm_perivasc cm_pulmcirc cm_renlfail cm_wghtloss cararrhythmia;
   model died (event='1') = age gender racenew ZIPINC_QRTL_1 dm_all cm_htn_c cm_ANEMDEF cm_arth
         cm_bldloss cm_chf cm_chrnlung cm_coag cm_depress cm_drug cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro
         cm_obese cm_para cm_perivasc cm_pulmcirc cm_renlfail cm_wghtloss cararrhythmia
         / oddsratio dist=binary link=logit ddfm=bw solution;
   random intercept / subject=hospid;
   where pcionly=1 and stemi=0;
   weight discwt;
   title 'Logi Reg in-hosp mortality vs gender in POST-PCI MI patients using "where" option with RACE +income for nonstemi with pcionly';
run;
quit;
```


Posted in reply to Ashwini_uci

03-18-2014 05:00 PM

You may simply be running out of memory. What does the Log say?
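To check whether memory is the limit, one option is to look at the session's memory ceiling and have SAS log per-step resource usage before rerunning the GLIMMIX step. This is a sketch of a diagnostic, not a fix: MEMSIZE itself can only be changed when SAS starts (for example, in the configuration file or on the command line).

```sas
/* Show the maximum memory available to this SAS session */
proc options option=memsize;
run;

/* Write detailed resource statistics (including memory) for each step to the log */
options fullstimer;
```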