ODY7
Fluorite | Level 6

PROC LOGISTIC often ignores all REF= options when used with a multinomial model. Can SAS output results or a dataset indicating which reference category was selected for the outcome (dependent) variable? The highest category is generally selected by default, but it would be nice to have output to confirm this, particularly if one wishes to change the reference category and confirm that it has changed. For example, can this be determined from the OUTMODEL= dataset? The model in question is a (partial) proportional odds cumulative logit model with three different categories (although this applies to any predictor with at least three categories). This is SAS version 9.4M7.

 
Code is:
proc logistic data=datain order=internal outest=testout_est outmodel=testout_model;
    class outcome(ref=first) catvar1(ref=first) catvar2(ref=last) / param=ref ref=first; * param=effect is default, which compares average effect across all values. param=ref compares to reference category;
    model outcome(event='1') = catvar1 catvar2 / link=clogit unequalslopes=(catvar1);
run;
This returns the following notes:
NOTE: The REF= option for the response variable is ignored.
NOTE: PROC LOGISTIC is fitting the cumulative logit model. The probabilities modeled are summed over the responses having the lower Ordered Values in the Response Profile table.

 

OUTMODEL output (testout_model). Some _MISC_ values have been changed slightly or replaced with xxx.

_TYPE_ _NAME_ _CATEGORY_ _NAMEIDX_ _CATIDX_ _MISC_
L    7
MNYYNYNNN   7
Goutcomeoutcome=00010
Goutcomeoutcome=10110
Goutcomeoutcome=202-10
Goutcome -1013
Goutcome -118
Goutcome -1235
Goutcome -1-2-16
Gcatvar1110-1
Gcatvar12111
Gcatvar1 -2-13
Gcatvar1 -2-2-6
Gcatvar21202
Gcatvar32212
Gcatvar4322-2
Gcatvar5 -3-13
Gcatvar6 -3-2-11
EInterceptE00xxx
EInterceptE01xxx
EEFFECTG001
EEFFECTX001
EEFFECTE00xxx
EEFFECTE01xxx
EEFFECTQ000
EEFFECTG103
EEFFECTX103
EEFFECTE10xxx
EEFFECTE11xxx
EEFFECTQ111
EEFFECTV 0xxx
EEFFECTV 1xxx
EEFFECTV 2xxx
EEFFECTV 3xxx
EEFFECTV 4xxx
EEFFECTV 5xxx
EEFFECTV 6xxx
EEFFECTV 7xxx
EEFFECTV 8xxx
EEFFECTV 9xxx
EEFFECTV 10xxx
EEFFECTV 11xxx
EEFFECTV 12xxx
EEFFECTV 13xxx
EEFFECTV 14xxx
EEFFECTV 15xxx
EEFFECTV 16xxx
EEFFECTV 17xxx
EEFFECTV 18xxx
EEFFECTV 19xxx
EEFFECTV 20xxx
X52 27232xxx
Accepted Solutions
StatDave
SAS Super FREQ

First, you should never specify the response variable (OUTCOME) in the CLASS statement. Any options that you want to apply to the response levels should be specified in parentheses after the response variable in the MODEL statement. These are called the response variable options. You are fitting a cumulative logit model for an ordered response, so only the sorting and ordering of the response levels are relevant. Neither the EVENT= option, which applies only to a binary response, nor the REF= option is relevant, so both are ignored. Since your response is ordinal, you should be concerned with whether the response levels are in proper ascending or descending order. The order being used is shown in the Response Profile table. For instance, if the response has levels High, Medium, and Low, you don't want the Response Profile table showing the response levels in the order Medium, Low, High. If the displayed order is not properly ascending or descending, you can use the ORDER= response variable option or you can create a format for the response whose values will sort properly. If they are in proper descending order but you want to model probabilities of higher response levels, then also add the DESCENDING response variable option. See Response Level Ordering in the Details section of the LOGISTIC documentation and this note.
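As a hedged illustration of the advice above, the sketch below moves every response option into the MODEL statement and saves the Response Profile table to a data set so that the ordering can be checked. It reuses the data set and variable names from the question; the output data set name resp_profile is an arbitrary choice, and DESCENDING is included only as an example of modeling the higher response levels.

```sas
/* Capture the Response Profile table so the level ordering can be inspected */
ods output ResponseProfile=resp_profile;

proc logistic data=datain;
   /* Only predictors go in CLASS; the response is left out */
   class catvar1(ref=first) catvar2(ref=last) / param=ref;
   /* Response options go in parentheses after the response variable */
   model outcome(order=internal descending) = catvar1 catvar2
         / link=clogit unequalslopes=(catvar1);
run;

/* Each response level is listed with its ordered value */
proc print data=resp_profile noobs;
run;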

9 REPLIES
Ksharp
Super User

ods output ClassLevelInfo=ClassLevelInfo;
proc logistic data=sashelp.heart;
class bp_status sex;
model status=bp_status sex weight height;
run;


ODY7
Fluorite | Level 6

Apologies; I should have clarified that I am referring to the reference category of the outcome variable, not the predictor variables. I have edited this into the OP.

ODY7
Fluorite | Level 6

This was intended for a multinomial outcome (i.e. not binary). For example:

 

proc logistic data=sashelp.heart;
class bp_status sex;
model Chol_Status = bp_status sex weight height;
output out=want p=pred;
run;

This shows predicted values for each observation at the Borderline and Desirable response levels. That makes me think that the other category, High, is the reference category.
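One way to check this is to tabulate the _LEVEL_ variable in the OUTPUT data set. The sketch below assumes, as the documentation describes for ordinal responses, that the data set holds one row per observation for each response level except the last ordered one. For a cumulative logit fit the predicted value is a cumulative probability, so the last level is absent because its cumulative probability is always 1, not because it serves as a reference category.

```sas
proc logistic data=sashelp.heart;
   class bp_status sex;
   model Chol_Status = bp_status sex weight height;
   output out=want p=pred;
run;

/* Which response levels received predicted values? */
proc freq data=want;
   tables _LEVEL_ / nocum nopercent;
run;

/* Look at a few rows: one row per observation and level */
proc print data=want(obs=6);
   var Chol_Status _LEVEL_ pred;
run;
```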

 

PaigeMiller
Diamond | Level 26

In the output, the reference category will not have a parameter estimate. In the example below, "Acura" does not have a parameter estimate. It will also show up in the Odds Ratio table, where all the non-reference levels are compared to the reference level. It also shows up in the Class Level Information output (I leave it as a homework assignment for you to look at the table and determine the reference level).

 

proc logistic data=sashelp.cars(obs=100);
    class make(ref='Acura');
    model origin=enginesize weight make;
run;


--
Paige Miller
SteveDenham
Jade | Level 19

The only way I know of specifying the reference level for the response variable is to shift to fitting a generalized logit model to the multinomial distribution. If you go that way, you can specify any particular level of the response variable as the reference using the REF='level' response option. The NOTE about the reference category of the response variable comes from the link chosen: you have to specify LINK=GLOGIT for the reference level to be applied. There are other PROCs that operate similarly (HPGENSELECT and GLIMMIX, for example).
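A minimal sketch of the generalized logit approach, using the sashelp.heart example from earlier in the thread; choosing 'Desirable' as the reference level and the data set name resp_profile are assumptions made for illustration only.

```sas
/* Save the Response Profile table for checking the response levels */
ods output ResponseProfile=resp_profile;

proc logistic data=sashelp.heart;
   class bp_status sex / param=ref;
   /* REF= is honored for the response because the link is GLOGIT */
   model Chol_Status(ref='Desirable') = bp_status sex weight height
         / link=glogit;
run;

proc print data=resp_profile noobs;
run;
```

With this link, the printed results should state below the Response Profile table which response level the logits use as the reference category, which gives a direct confirmation of the setting.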

 

Just thought of another way, but it requires formatting the levels of the response variable. Just set up your format so that the level you want as the reference is either LAST (default) or FIRST (needs a REF=FIRST in either the MODEL or CLASS statement).
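The format approach might look like the sketch below. The format name $cholref and the numbered labels are hypothetical; the idea is only that the formatted values sort so that the desired reference level (here 'Desirable', chosen for illustration) comes last, which is the default reference for a generalized logit model.

```sas
proc format;
   /* Labels are prefixed so that the wanted reference sorts last */
   value $cholref 'High'       = '1 High'
                  'Borderline' = '2 Borderline'
                  'Desirable'  = '3 Desirable';
run;

proc logistic data=sashelp.heart;
   format Chol_Status $cholref.;
   class bp_status sex / param=ref;
   /* ORDER=FORMATTED sorts the response by its formatted values */
   model Chol_Status(order=formatted) = bp_status sex weight height
         / link=glogit;
run;
```

To use the first formatted level as the reference instead, the REF=FIRST response option can be added inside the parentheses in the MODEL statement.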

 

SteveDenham  

Ksharp
Super User


Since your Y variable Chol_Status is a multinomial variable with three levels, you can fit two separate logistic models, each in its own PROC LOGISTIC step (character values in the WHERE statement are case-sensitive):

1)
where Chol_Status in ('Borderline' 'High');
model Chol_Status (event='Borderline')=
2)
where Chol_Status in ('Desirable' 'High');
model Chol_Status (event='Desirable')=

SteveDenham
Jade | Level 19

For GLIMMIX and HPGENSELECT, you must specify the response variable in the CLASS statement if you are fitting a generalized logit link to a multinomial distribution. For LOGISTIC, it is as @StatDave says: don't put the response variable in the CLASS statement. I don't know why it isn't consistent.

 

SteveDenham
