Solved: Re: Proc Logistic Question

K_Wils15 · Posted 04-15-2021 12:27 AM

Hi,

I can't set the reference group and would like some help. I am trying to get odds ratios for my project. I have a variable called Timeline that is divided into four timeframes: all of 2019 and each of the first three quarters of 2020. I want to compare each of the three quarters of 2020 to all of 2019 to see if there are any notable differences. The first variable I want to compare is Ethnicity, which has three subgroups: Hispanic/Latino, Non-Hispanic, and Missing. This is the table I am hoping to fill in:

Here is what I have so far:

proc logistic data= caphs12 descending ;
class Ethnicity (ref='1') Timeline (ref='1') / param=ref ;
model ethnicity = timeline;
run;

However, I get these notes in the log and I cannot set a different reference group:

NOTE: The REF= option for the response variable is ignored.
NOTE: PROC LOGISTIC is fitting the cumulative logit model. The probabilities modeled are summed
over the responses having the lower Ordered Values in the Response Profile table.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.

Is there anything I can do here to change the reference group or am I thinking about this in the wrong way?

sbxkoenk · Posted 04-15-2021 05:24 AM

The CLASS statement names the classification variables to be used as explanatory variables in the analysis. Response variables do not need to be specified in the CLASS statement.

You can use the

REFERENCE='category'

as a response variable option in the left-hand side of the MODEL statement.
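
As a minimal sketch of how that option could be applied to the code in the original question: the dataset and variable names below are taken from that post, and treating Ethnicity as a nominal response via LINK=GLOGIT is an assumption about what is wanted. As far as I know, the response REF= option is only honored for binary and generalized logit models, which would explain the "ignored" note under the cumulative logit model.

```sas
proc logistic data=caphs12;
   /* Only the predictor is listed in CLASS; level '1' of Timeline is its reference */
   class Timeline (ref='1') / param=ref;
   /* REF= on the response sets its reference category;
      LINK=GLOGIT fits a nominal (generalized logit) model */
   model Ethnicity (ref='1') = Timeline / link=glogit;
run;
```

If the variables carry formats, the value given in REF= would need to be the formatted value rather than the raw code.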

StatDave · Posted 04-15-2021 11:07 AM

Is ETHNICITY really your *response* variable - the thing you want your model to predict? Typically this would be used as a predictor. If there is some binary response that is the response you want to model, then that should appear on the left side of the equal sign. You could then compare the ETHNICITY levels with respect to the probability of the response event level. If ETHNICITY is truly your response variable, then it surely is not ordinal, but your code is fitting an ordinal response model. You should use the LINK=GLOGIT option to treat the response variable as nominal (unordered levels).

K_Wils15 · Posted 04-15-2021 12:02 PM

Thank you. Yes, I found LINK=GLOGIT last night and it did fix the ordinal issue. I am still having an issue, maybe with the math side of it. I redid the chart, and I will now focus on how people responded to a survey question by ethnicity. However, I am trying to look at four different timeframes. What would be the best way to go about doing this? I recoded the survey question so that it is a binary response. I tried the code below, but I am confused about why timeframe 1 is omitted from the results. When I don't specify a reference level for the timeframe variable, it just picks one anyway.

proc logistic data= s12 descending ;
class Ethnicity (Ref ='1') timeframe (ref = '1')/ param=ref ;
model question_1 (ref='1') = ethnicity*timeframe / link=glogit;
oddsratio question_1;
run;

StatDave · Posted 04-15-2021 01:33 PM

So, people's responses to a question are the actual response variable you want to model. From what you say, it sounds like that question has multiple possible values. If those values are ordered (like strongly disagree to strongly agree), then an ordinal model is reasonable, and you do not need to turn it into a binary variable. If, for example, the question response has 5 ordered levels and you want to model the probability of the higher levels, then you could use code like the following. It will model the higher levels of QUESTION_1 (because of the DESCENDING option), and the model will contain both the main effects and the interaction of ETHNICITY and TIMEFRAME. The REF='1' options specify that 1 is the reference level for each of those predictors. Because they are reference levels, those levels have a 0 parameter estimate, which is why timeframe 1 does not get an estimate of its own. Don't worry about that or the multiple intercepts. The LSMEANS statement will give you cumulative predicted probabilities (in the Mean column of the LSMEANS table) for the levels of ETHNICITY, and the Differences table shows the odds ratios comparing the ETHNICITY levels. Note that you have to use the PARAM=GLM option in order to use the LSMEANS statement. You could use exactly the same code if you really want to use a binary version of your response variable.

proc logistic data=s12;
class Ethnicity(Ref ='1') timeframe(ref = '1') / param=glm;
model question_1(descending) = ethnicity timeframe ethnicity*timeframe;
lsmeans ethnicity / ilink diff oddsratio cl;
run;
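
If the goal is to compare each later timeframe with timeframe 1 within each ETHNICITY level, one possible variation is to add an ODDSRATIO statement. This is only a sketch and was not part of the original reply. It assumes the same s12 dataset and variable names used above, and that '1' codes the baseline timeframe.

```sas
proc logistic data=s12;
   class Ethnicity (ref='1') timeframe (ref='1') / param=glm;
   /* Main effects plus the Ethnicity-by-timeframe interaction */
   model question_1 (descending) = ethnicity timeframe ethnicity*timeframe;
   /* Odds ratios for each timeframe versus its reference level,
      computed separately at every level of Ethnicity */
   oddsratio timeframe / at(ethnicity=all) diff=ref cl=wald;
run;
```

Because the model includes the interaction, the timeframe odds ratios can differ across ETHNICITY levels, which is why the AT option is used to request them at each level.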

K_Wils15 · Posted 04-15-2021 04:41 PM

Thank you!