10-31-2011 10:08 PM
I am trying to run the following program but keep getting the following warning messages:
The information matrix is singular and thus the convergence is questionable.
Model Convergence Status: Quasi-complete separation of data points detected.
Warning: The maximum likelihood estimate may not exist.
Warning: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.
The program is below:
proc logistic data=library.nismicathcabg4 descending;
   class gender (ref=first) dm dmcx htn_c aids alcohol ANEMDEF arth race1 (ref=first) income (ref=first)
         bldloss chf chrnlung coag depress drug hypothy liver lymph lytes mets neuro obese para perivasc psych pulmcirc renlfail tumor
         ulcer valve wghtloss cararrhythmia / param=ref;
   model died = age gender dm dmcx htn_c aids alcohol ANEMDEF arth race1 income
         bldloss chf chrnlung coag depress drug hypothy liver lymph lytes mets neuro obese para perivasc psych pulmcirc renlfail tumor
         ulcer valve wghtloss cararrhythmia;
   where diagcath=1 and stemi=0;
   title 'Logi Reg in-hosp mortality vs gender in POST-diagcath MI patients using "where" option with RACE +income for nonstemi';
run;
quit;
I would highly appreciate any suggestions, solutions, or ideas for running the program without these warnings.
11-01-2011 07:44 AM
Simplify the model, or collapse some of the categories in your class variables. To see what is going on, look at a cross-tabulation of the data. I am willing to bet that there are enough sampling zeroes that you are facing quasi-separation.
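One way to run that check is sketched below. It reuses the dataset, WHERE filter, and variable names from the original post; the choice of PROC FREQ and of the table options is an assumption made for illustration, not part of the advice above.

```sas
/* Cross-tabulate the outcome against every class predictor, using the
   same subset of patients that the model is fitted on. A zero (or very
   small) cell in any table marks a variable that can produce
   quasi-complete separation. */
proc freq data=library.nismicathcabg4;
   where diagcath=1 and stemi=0;
   tables (gender race1 income dm dmcx htn_c aids alcohol ANEMDEF arth
           bldloss chf chrnlung coag depress drug hypothy liver lymph
           lytes mets neuro obese para perivasc psych pulmcirc renlfail
           tumor ulcer valve wghtloss cararrhythmia) * died
          / norow nocol nopercent missing;
run;
```

Any predictor whose table shows an empty cell for one level of died is a candidate for collapsing or dropping.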
Steve Denham
11-01-2011 09:47 AM
Usually, it's either because of multicollinearity in your data set or the reason mentioned above by Steve. Your model seems to be overfitted; try to build a more parsimonious model with fewer categorical variables. Also, check for correlation among the predictor variables included in your model.
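A rough collinearity screen might look like the sketch below. It borrows PROC REG only for its diagnostics (the VIF, TOL, and COLLINOINT options on the MODEL statement), so the linear fit itself should be ignored. It assumes the listed predictors are stored as numeric 0/1 variables; gender, race1, and income are left out because they may be character or multi-level and would need dummy coding first.

```sas
/* Collinearity diagnostics only: the regression estimates are not of
   interest here. Assumes every predictor listed is numeric. */
proc reg data=library.nismicathcabg4;
   where diagcath=1 and stemi=0;
   model died = age dm dmcx htn_c aids alcohol ANEMDEF arth
                bldloss chf chrnlung coag depress drug hypothy liver
                lymph lytes mets neuro obese para perivasc psych
                pulmcirc renlfail tumor ulcer valve wghtloss cararrhythmia
         / vif tol collinoint;
run;
quit;
```

Large variance inflation factors, or a note in the log that the model is not full rank, would point to predictors that carry nearly the same information.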
Hope it helps.
11-01-2011 11:34 AM
In the spirit of the two who have already answered, you have an overparameterized model. Without knowing the sample size and the distributions that Steve suggests, it is difficult to make specific suggestions.
Harrell (in "Regression Modeling Strategies") has a rule of thumb of 15 persons in the smaller group PER DEGREE OF FREEDOM. It looks like you are running at about 40-45 d.f., so you need 600-700 observations in the smaller group to have a chance of a stable model.
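The arithmetic behind that range is 15 x 40 = 600 and 15 x 45 = 675. To compare it with the data, a simple frequency count of the outcome in the modeled subset is enough; the sketch below assumes the dataset and WHERE filter from the original post.

```sas
/* Count deaths and survivors among the patients the model uses. The
   smaller of the two counts is the number to compare against
   15 x (model degrees of freedom). */
proc freq data=library.nismicathcabg4;
   where diagcath=1 and stemi=0;
   tables died;
run;
```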
If you are replicating a published model, one approach might be to replace the variables that are in that model with the logit from that model (to adjust for the published impact on outcome) and then to just look at the variables of interest or that you have reason to believe don't behave as in the general model. That can reduce the DF significantly and make for a more stable model with smaller N's.
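A minimal sketch of that idea follows. Everything numeric in it is a placeholder: the intercept and coefficients are hypothetical and must be replaced with the values from the published model, and the three covariates shown are only an example of what such a model might contain. The names work.with_logit and pub_logit are likewise made up for illustration, and chf and renlfail are assumed to be numeric 0/1.

```sas
/* HYPOTHETICAL coefficients: placeholders only, not taken from any
   published model. Substitute the published intercept and slopes. */
data work.with_logit;
   set library.nismicathcabg4;
   where diagcath=1 and stemi=0;
   pub_logit = -3.0 + 0.05*age + 0.40*chf + 0.60*renlfail;
run;

/* The published logit enters as one covariate (1 d.f.) in place of the
   individual risk factors; gender remains as the variable of interest. */
proc logistic data=work.with_logit descending;
   class gender (ref=first) / param=ref;
   model died = pub_logit gender;
run;
```

This spends one degree of freedom on the risk adjustment instead of roughly forty, at the cost of trusting the published coefficients.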
Doc Muhlbaier
Duke
11-01-2011 02:52 PM
All the categorical variables are dichotomous except race and income. Race has 5 levels and income has 4 levels.
I checked the correlation matrix and I need to keep all the variables in the model. The sample size is 84000. As Steve has suggested, I will need to check the cross-tabulation of the data.