08-04-2013 04:22 AM
I need your help to understand something weird in the SAS output of the logistic regression.
I have an ordinal dependent variable with 3 levels: 0 = no outcome, 1 = normal outcome, 2 = double outcome.
I am trying to fit a model with several independent variables, some discrete and some continuous. However, my main variable of interest is discrete with no fewer than 5 levels! This variable, called "Type", represents 5 treatments, of which the last 2 are combinations of two of the first three treatments.
Before controlling for other variables, I wish to fit a simple model with this independent variable only. I ran a logistic regression with both SAS and JMP. The results were identical. My SAS code was simple:
proc logistic data=A descending;
class Type_ (ref='5') ;
model Y = Type_;
run;
My main problem is that the full model was not significant, yet one of the dummy variables created by SAS is, and I don't understand why. I am attaching some of my output: the frequency histograms of both Y and Type (X); a mosaic plot from JMP showing the two-way table (blue = no outcome, gray = normal outcome, red = double outcome; the X axis shows the categories of the IV); and most of SAS's output from the logistic regression.
The P value of Type=1 is significant, but the model isn't. I understand why Type=1 can be significant: you can see that it has the most red area, while Type=5, which is the reference, has no red at all. But why isn't the model significant, then? A chi-square test and Fisher's exact test were both NOT significant.
My second question about this analysis concerns what happened when I added variables. Two continuous variables I added were very significant, making the entire model significant (and destroying the significant P value of Type=1). However, when I drew box plots (the reverse analysis) of the two variables vs. Y, I couldn't see any difference: the boxes were parallel, and perhaps the medians were slightly different, but nothing that catches the eye. How can I tell whether the significant result is due to a large sample size, due to a type I error, or real? For a t-test we have effect size; what should I do here?
Thank you in advance!
08-04-2013 01:21 PM
The joint test of statistical significance for all levels of TYPE_ has four degrees of freedom (five levels minus the reference level), so it can be less statistically powerful than the one-degree-of-freedom Wald test of an individual level of TYPE_. Thus, one level of TYPE_ can differ statistically significantly from zero even though the joint test that all levels equal zero is not statistically significant. A chi-square test or a Fisher's exact test is also less statistically powerful than ordinal logistic regression, because neither accounts for the ordering of the dependent variable categories. The continuous variables may account better for the variation in the dependent variable than the category level TYPE_=1 did. You should also add the option PARAM=REF to your CLASS statement to enable "reference" coding of the classification variable TYPE_, since PROC LOGISTIC defaults to "effect" coding, which is much more difficult to interpret than "reference" coding.
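As a minimal sketch of the PARAM=REF suggestion, the original step could be rewritten roughly as follows. The dataset name A and the variables Y and Type_ come from the question; nothing else is assumed, and the code has not been run against the poster's data.

```sas
/* Sketch only: A, Y and Type_ are the names used in the question. */
proc logistic data=A descending;
   /* PARAM=REF after the slash requests reference (dummy) coding,   */
   /* so each Type_ estimate compares that level with level '5'.     */
   class Type_ (ref='5') / param=ref;
   model Y = Type_;
run;
```

With reference coding, each row for Type_ in the parameter estimates table is a one-degree-of-freedom comparison against Type_=5, while the joint test of Type_ still uses four degrees of freedom.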