06-03-2013 06:48 AM
I have 90 data divided between four types of organism, let's say fruit. I've tested them all for something and decided they each have none of it, some of it or all of it, so I have three ordinal levels.
I would like to run a logistic regression to see if the type of fruit is significantly associated with level of "something".
So, I would normally run a logistic regression and Proc Logistic would use a cumulative logit function.
The problem I have is that I could only test six fruit at a time, so my data are stratified. When I add a Strata statement to the Proc Logistic model I receive this message in the log:
NOTE: Conditional logistic regression with polytomous response data is not supported.
So, what are my options?
I could collapse the data into binary, except that isn't really satisfactory in this particular case: having "some of it" is treatable and not terminal, so it's useful to know whether "some of it" is associated or not.
Could I take out the Strata statement and simply include a strata variable in my model? I'm wrestling with the underlying implications of that possibility - maybe I have been at this for too long because it is not immediately apparent to me if that would be legitimate or not. I suspect not, so...
Any advice will be welcome.
06-03-2013 08:54 AM
The STRATA statement in PROC LOGISTIC is used to define variables that identify matched sets of observations so that these matched sets can be analyzed using conditional logistic regression, not the usual unconditional logistic regression.
Given your description, include your "stratum" variable in the CLASS statement (specifying a reference level) and in the MODEL statement as an independent variable. Then specify your ordinal variable as the dependent variable in the MODEL statement, which PROC LOGISTIC will interpret as an ordinal variable under the default LINK option specification.
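A minimal sketch of that setup, assuming a dataset called FRUITDATA with variables RESPONSE (the ordinal outcome), FRUIT and BATCH (the testing session of six fruit). All of these names are hypothetical, so substitute your own:

```sas
/* Hypothetical names: FRUITDATA, RESPONSE, FRUIT, BATCH.            */
/* BATCH enters as an ordinary fixed effect (unconditional model),   */
/* not through a STRATA statement.                                   */
proc logistic data=fruitdata;
   class fruit (ref='d') batch (ref=first) / param=ref;
   /* A response with 3 ordered levels gets the cumulative logit link by default */
   model response = fruit batch;
run;
```

If every batch held six fruit, 90 observations would mean 15 batches and so 14 extra parameters for BATCH, which is a lot for this sample size; it is worth checking the log for convergence or quasi-separation warnings.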
06-03-2013 10:43 AM
That was my first reaction when the stratified cumulative regression wouldn't work, but I know my limits, and understanding the mechanics of SAS is among them, so it's best to check.
06-27-2013 09:24 AM
Okay, me again, still working on this analysis.
So, I have four types of fruit: a, b, c and d.
Initially the reference fruit was d so, after I noted the p-values and ORs with 95% CIs for a:d, b:d, and c:d, I wanted to see if a is significantly different to b and c, and if b is different to c, so I changed the fruit around so the four types would still be a, b, c and d, but the letters would designate two of the fruit differently from the first analysis (I swapped b and d around, then later swapped c and d).
So, my p-value for apples vs. oranges was originally 0.0186 but when I swapped the letters around the value for oranges vs. apples was 0.0005, which is the difference between reporting a finding or not. In total I have 6 statistical tests (a:d, b:d, c:d, a:b, a:c, b:c) so I will accept significance at 0.01 instead of 0.05 (I figure the risk of a false positive with six independent tests at 1% is roughly equal to that of one test at 5%).
I've checked the data carefully, and there appears to have been no error when I changed the letters around. There are only 90 data so visually checking is quick and easy.
Of course, I could just include two fruit at a time in the model and run six pairwise logistic regressions but that wouldn't explain why the values should change in a four-fruit model when I merely swap the reference levels around.
Does anyone have any advice on how I might proceed, or suggestions regarding the cause of this? It seems odd to me.
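For reference, the arithmetic behind the 0.01 threshold can be written out as follows. This treats the six tests as independent, as the post does; that is an assumption, since pairwise comparisons from the same data share observations:

```latex
\begin{aligned}
P(\text{at least one false positive}) &= 1 - (1 - 0.01)^{6} \\
                                      &= 1 - 0.9415 \\
                                      &\approx 0.0585
\end{aligned}
```

So six tests at 1% carry a combined false-positive risk of about 5.9%, close to the 5% of a single test.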
06-27-2013 04:31 PM
I don't know why you obtained the output you received. In this output, check the table, "Class Level Information", to be sure that you have parameterized the independent, classification variable, FRUIT, properly. Unfortunately, the default parameterization in PROC LOGISTIC is effect coding, which is often difficult to interpret. Use reference cell coding instead, as described next.
Use the CLASS statement of PROC LOGISTIC to specify the variable, FRUIT, as a classification variable with reference cell coding (PARAM=REFERENCE):
    PROC LOGISTIC RORDER=. . . .;
    CLASS FRUIT(PARAM=REFERENCE ORDER=FORMATTED) . . .;
    MODEL RESPONSE=FRUIT . . . ;
The RORDER option of the PROC LOGISTIC statement specifies the ordering of the response/dependent variable, RESPONSE. The ORDER option associated with the classification variable, FRUIT, specifies the order of this independent variable in the model (including its reference level and type of ordering).
If necessary, you can specify this reference level in a PROC FORMAT VALUE statement and include a FORMAT statement in the PROC LOGISTIC paragraph. Assuming that FRUIT is a character variable:
    PROC FORMAT;
    VALUE $FRUIT "A"="Fruit A" "B"="Fruit B" "C"="Fruit C" "D"="} Fruit D (reference)";
    RUN;
The right brace, "}", in the PROC FORMAT VALUE statement "sorts" the values of FRUIT so that level D becomes the last level, making it the reference category. Then a FORMAT statement in PROC LOGISTIC applies this FORMAT value to the character-valued classification variable, FRUIT:
    FORMAT FRUIT $FRUIT.;
Changing the position of this right brace in the VALUE statement allows you to specify different levels of the FRUIT variable as the reference category.
Then, use the CONTRAST statement to specify the specific contrasts you want to make among the four different kinds of FRUIT, including comparing one kind against another kind, one kind against two other kinds, one kind against three other kinds, or two kinds against the other two kinds.
For example, if D were the reference-level fruit, a contrast of fruit A vs. the combination of fruits B and C using reference cell coding defined in the CLASS statement would be something like the following:
    CONTRAST "A vs. B+C" FRUIT 2 -1 -1 / E ESTIMATE=BOTH;
You can include multiple CONTRAST statements in the same run of PROC LOGISTIC.
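As a sketch of how the pairwise comparisons asked about earlier could all come from one fit, without swapping reference levels: the dataset and variable names below (FRUITDATA, RESPONSE, FRUIT, BATCH) are hypothetical, and the code assumes FRUIT holds the lowercase values a, b, c and d with d as the reference.

```sas
/* Hypothetical names: FRUITDATA, RESPONSE, FRUIT, BATCH.                 */
/* With PARAM=REF and d as the reference, the design columns for FRUIT   */
/* are a, b, c in that order, so each contrast takes three coefficients. */
proc logistic data=fruitdata;
   class fruit (ref='d') batch / param=ref;
   model response = fruit batch;
   contrast 'a vs b' fruit 1 -1  0 / estimate=both;
   contrast 'a vs c' fruit 1  0 -1 / estimate=both;
   contrast 'b vs c' fruit 0  1 -1 / estimate=both;
   /* All pairwise odds ratios among the four fruit, with confidence limits */
   oddsratio fruit / diff=all;
run;
```

The comparisons against d (a:d, b:d, c:d) are already the three FRUIT parameter estimates under reference coding, so the CONTRAST statements only need to cover the remaining three pairs.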
06-28-2013 05:32 AM
Wow, that's really a very detailed answer, thank you for taking the time to reply so comprehensively.
I will spend the afternoon experimenting with this syntax to ensure it removes the anomaly.
FRUIT was indeed in the CLASS statement but I had not specified PARAM=REFERENCE;
Once again, many thanks
02-18-2014 04:13 AM
Hi again ("rebonjour" they say here)
Can someone point me to a paper describing how to interpret the two intercepts that are produced when regressing to a trinary outcome variable?
I would like to put the effect sizes into the model and estimate probabilities of outcomes 0, 1 or 2. How the two intercepts fit into the final model equation (or equations if there are two) is what I am after.
Something I can cite would be handy (as a bonus, but not essential).
All the best
02-18-2014 01:25 PM
The two intercepts come from the cumulative approach. The first is for outcome 0 vs. 1 or 2, with all predictors at the zero level; the second is for outcome 0 or 1 (at most 1) vs. 2, again with all predictors at the zero level.
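Written out, and assuming the default PROC LOGISTIC setup (cumulative logit link, lower ordered values modeled first, no DESCENDING option), the model would look like this. The symbols \alpha_1, \alpha_2, \beta and x are notation chosen here, not names from the SAS output:

```latex
\begin{aligned}
\operatorname{logit} P(Y \le 0 \mid x) &= \alpha_1 + x^\top \beta \\
\operatorname{logit} P(Y \le 1 \mid x) &= \alpha_2 + x^\top \beta \\[4pt]
P(Y = 0 \mid x) &= \frac{1}{1 + e^{-(\alpha_1 + x^\top \beta)}} \\
P(Y = 1 \mid x) &= \frac{1}{1 + e^{-(\alpha_2 + x^\top \beta)}} - P(Y = 0 \mid x) \\
P(Y = 2 \mid x) &= 1 - \frac{1}{1 + e^{-(\alpha_2 + x^\top \beta)}}
\end{aligned}
```

So there are two equations that share one set of slopes \beta and differ only in the intercept, and the three probabilities sum to 1 by construction.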
04-10-2014 04:22 AM
Thanks for that advice. I’m about to submit our paper and I wondered if you would mind if we acknowledge your assistance? Please e-mail your name and affiliation to me at email@example.com and I’ll simply thank you for your ‘statistical advice’. I am happy to send you our draft before I submit, in case you would rather not be associated with it.
Next question: now that the guy down the end of the hall has discovered I can handle polychotomous ordinal outcomes, he has asked me to analyse his data, which have five ordinal outcomes. I’ve run the lot through proc logistic and hey, good times, the group you are in is significantly associated with your outcome (p=0.01). Thing is, now I have four intercepts.
Great! (I love this job) – ha ha.
So, my next question is: how do I interpret these intercepts?
The model is very simple, outcome=group
Thanks again, and I hope someone can help me out.
04-12-2014 12:24 PM
Actually, I think I've figured it out myself.
I can compare each of the five outcomes against the others grouped together in five binary logistic regressions.
Then I can compare the ten combinations of two vs. three.
Then I will have fifteen intercept values and I can "match" four of them to the four intercepts from the cumulative logit model (they should be close enough to recognise which is which) and then I will 'know' what each intercept in the cumulative model relates to. From that I can estimate the size of the effect of moving up a group in terms of probability of each outcome.
Message was edited by: Peter Buzzacott
04-14-2014 03:29 AM
Hmmm.... many of the intercepts appear to be similar, so that didn't work.
Okay, trying to figure out how the cumulative logit works then...
04-14-2014 08:15 AM
Hoo-ray, I've finally figured it out over lunch.
There are 5 outcome states, let's say 1-5
alpha1 is for the logit of the probability of outcome 1
alpha2 is for the logit of the probability of outcomes 1 or 2
alpha3 is for outcomes 1, 2 or 3
alpha4 is for outcomes 1-4
and the probability of outcome 5 is simply 1 - (probability of outcomes 1 to 4), found by using alpha4.
Pretty much an extrapolation of Steve's answer above.
I'll calculate the probabilities and double-check they sum to 1 for each line of data.
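One way to do that check without hand-calculating from the intercepts is to let PROC LOGISTIC write out the probabilities. This is only a sketch: the names HALLDATA, OUTCOME, GROUP, PRED and CHECK are made up, and it assumes the outcome is coded 1 to 5, so the output variables come out as IP_1 to IP_5 and CP_1 to CP_5.

```sas
/* Hypothetical names: HALLDATA, OUTCOME (coded 1-5), GROUP, PRED, CHECK. */
proc logistic data=halldata;
   model outcome = group;
   /* IP_1-IP_5: probability of each outcome; CP_1-CP_5: cumulative ones */
   output out=pred predprobs=(individual cumulative);
run;

data check;
   set pred;
   total = sum(of ip_:);   /* should equal 1 on every line, up to rounding */
run;

proc means data=check min max;
   var total;
run;
```

The CP_ variables should correspond to the alpha1 to alpha4 logits described above (CP_1 for outcome 1, CP_2 for outcomes 1 or 2, and so on), which gives a second check on the hand calculation.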
02-27-2014 03:51 PM
It may be simpler to take a nonmodeling approach if you just have one predictor (type of fruit). This can be done with the CMH option in PROC FREQ. For example:
table strata*FruitType*response / cmh noprint;
Of the three CMH statistics that are produced, the second one (Row mean scores differ) treats the row variable (FruitType) as nominal and the response as ordinal.
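As a complete step, that might look like the following sketch, where FRUITDATA, STRATA, FRUITTYPE and RESPONSE are assumed names for the dataset and its variables:

```sas
/* Hypothetical names: FRUITDATA, STRATA, FRUITTYPE, RESPONSE.           */
/* In a three-way TABLES request the last two variables form the rows    */
/* and columns; the first one (STRATA) is the stratification variable.   */
proc freq data=fruitdata;
   tables strata*FruitType*response / cmh noprint;
run;
```

NOPRINT suppresses the individual stratum tables, so only the CMH statistics appear in the output.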
04-10-2014 04:19 AM
Interesting, I will look into that.
Also, I should find out how to get notified when someone replies. Currently, it's rather a coincidental affair.