Solved: Re: Is it possible for SAS to output the reference category for PROC LOGISTIC?

ODY7 · Posted 06-13-2024 03:27 AM

PROC LOGISTIC often overrides all ref= options when used with a multinomial model. Can SAS output results or a dataset indicating which reference category was selected for the outcome (dependent) variable? The highest category is generally selected by default, but it would be nice to have output to confirm this, particularly if one wishes to change the reference category and confirm that it has changed. For example, can this be determined from the OUTMODEL= dataset? The model in question is a (partial) proportional odds cumulative logit model for an outcome with three categories (although this applies to any outcome with at least three categories). This is SAS version 9.4M7.

Code is:

proc logistic data=datain order=internal outest=testout_est outmodel=testout_model;
    * param=effect is the default, which compares to the average effect across all values. param=ref compares to the reference category;
    class outcome(ref=first) catvar1(ref=first) catvar2(ref=last) / param=ref ref=first;
    model outcome(event='1') = catvar1 catvar2 / link=clogit unequalslopes=(catvar1);
run;

This returns the following notes:

NOTE: The REF= option for the response variable is ignored.
NOTE: PROC LOGISTIC is fitting the cumulative logit model. The probabilities modeled are summed over the responses having the lower Ordered Values in the Response Profile table.

OUTMODEL output (testout_model). Some _MISC_ values have been changed slightly or replaced with xxx.

_TYPE_	_NAME_	_CATEGORY_	_NAMEIDX_	_CATIDX_	_MISC_
L					7
M	NYYNYNNN				7
G	outcome	outcome=0	0	0	10
G	outcome	outcome=1	0	1	10
G	outcome	outcome=2	0	2	-10
G	outcome		-1	0	13
G	outcome		-1	1	8
G	outcome		-1	2	35
G	outcome		-1	-2	-16
G	catvar1	1	1	0	-1
G	catvar1	2	1	1	1
G	catvar1		-2	-1	3
G	catvar1		-2	-2	-6
G	catvar2	1	2	0	2
G	catvar3	2	2	1	2
G	catvar4	3	2	2	-2
G	catvar5		-3	-1	3
G	catvar6		-3	-2	-11
E	Intercept	E	0	0	xxx
E	Intercept	E	0	1	xxx
E	EFFECT	G	0	0	1
E	EFFECT	X	0	0	1
E	EFFECT	E	0	0	xxx
E	EFFECT	E	0	1	xxx
E	EFFECT	Q	0	0	0
E	EFFECT	G	1	0	3
E	EFFECT	X	1	0	3
E	EFFECT	E	1	0	xxx
E	EFFECT	E	1	1	xxx
E	EFFECT	Q	1	1	1
E	EFFECT	V		0	xxx
E	EFFECT	V		1	xxx
E	EFFECT	V		2	xxx
E	EFFECT	V		3	xxx
E	EFFECT	V		4	xxx
E	EFFECT	V		5	xxx
E	EFFECT	V		6	xxx
E	EFFECT	V		7	xxx
E	EFFECT	V		8	xxx
E	EFFECT	V		9	xxx
E	EFFECT	V		10	xxx
E	EFFECT	V		11	xxx
E	EFFECT	V		12	xxx
E	EFFECT	V		13	xxx
E	EFFECT	V		14	xxx
E	EFFECT	V		15	xxx
E	EFFECT	V		16	xxx
E	EFFECT	V		17	xxx
E	EFFECT	V		18	xxx
E	EFFECT	V		19	xxx
E	EFFECT	V		20	xxx
X	52		27	232	xxx

StatDave · Posted 06-13-2024 10:47 AM

First, you should never specify the response variable (OUTCOME) in the CLASS statement. Any options that you want to apply to the response levels should be specified in parentheses after the response variable in the MODEL statement. These are called the response variable options. You are fitting a cumulative logit model for an ordered response, so only the sorting and ordering of the response levels are relevant. Neither the EVENT= option, which applies only to a binary response, nor the REF= option is relevant, and both are ignored. Since your response is ordinal, you should be concerned with whether the response levels are in proper ascending or descending order. The order being used is shown in the Response Profile table. For instance, if the response has levels High, Medium, and Low, you don't want the Response Profile table showing the response levels in the order Medium, Low, High. If the displayed order is not properly ascending or descending, you can use the ORDER= response variable option or you can create a format for the response whose values will sort properly. If they are in proper descending order but you want to model probabilities of higher response levels, then also add the DESCENDING response variable option. See Response Level Ordering in the Details section of the LOGISTIC documentation and this note.
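As a rough sketch of the ordering advice above (not StatDave's own code), the following uses the SASHELP.HEART data that appears elsewhere in this thread. The format name $cholord and the data set name resp_profile are made up for illustration. The format makes the formatted values of Chol_Status sort as Desirable, Borderline, High, and ODS OUTPUT saves the Response Profile table so the ordering that was actually used can be checked in a data set rather than only in the listing.

```sas
/* Hypothetical format: prefixes force the formatted values to sort in clinical order */
proc format;
    value $cholord
        'Desirable'  = '1 Desirable'
        'Borderline' = '2 Borderline'
        'High'       = '3 High';
run;

/* Save the Response Profile table; resp_profile is an arbitrary data set name */
ods output ResponseProfile=resp_profile;

proc logistic data=sashelp.heart;
    format Chol_Status $cholord.;
    class bp_status sex / param=ref;
    /* The response is NOT listed in CLASS; response options would go in
       parentheses after Chol_Status, e.g. Chol_Status(descending) */
    model Chol_Status = bp_status sex weight height / link=clogit;
run;

/* Inspect the ordered values that the cumulative logits are built on */
proc print data=resp_profile noobs;
run;
```

The printed data set should mirror the Response Profile table, so the level with the lowest ordered value is the one whose probability is accumulated first.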

Ksharp · Posted 06-13-2024 04:47 AM


ods output ClassLevelInfo=ClassLevelInfo;
proc logistic data=sashelp.heart;
    class bp_status sex;
    model status = bp_status sex weight height;
run;

ODY7 · Posted 06-13-2024 04:54 AM

Apologies; I should have clarified that I am referring to the reference category of the outcome variable, not the predictor variables. I have edited this into the OP.

Ksharp · Posted 06-13-2024 05:08 AM

ODY7 · Posted 06-13-2024 05:52 AM

This was intended for a multinomial outcome (i.e. not binary). For example:

proc logistic data=sashelp.heart;
class bp_status sex;
model Chol_Status = bp_status sex weight height;
output out=want p=pred;
run;

This shows values for each observation at the Borderline and Desirable response levels, which makes me think that the other category, High, is the reference category.
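One hedged way to check which response levels the predicted values refer to is to tabulate the _LEVEL_ variable that PROC LOGISTIC adds to the OUT= data set when the response has more than two levels. The sketch below reuses the data set name want from the code above. Because no LINK= is given, an ordered response gets the cumulative logit model, so each value of pred is a cumulative probability up to the listed level rather than a comparison against a reference category.

```sas
proc logistic data=sashelp.heart;
    class bp_status sex;
    model Chol_Status = bp_status sex weight height;
    output out=want p=pred;
run;

/* One row per observation per modeled level; the last ordered level has no row */
proc freq data=want;
    tables _LEVEL_;
run;
```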

PaigeMiller · Posted 06-13-2024 07:42 AM

In the output, the reference category will not have a parameter estimate. In this case, "Acura" does not have a parameter estimate. It will also show up in the Odds Ratio table, where all the non-reference levels are compared to the reference level. It also shows up in the Class Level Information output (I leave it as a homework assignment for you to look at the table and determine the reference level).

proc logistic data=sashelp.cars(obs=100);
    class make(ref='Acura');
    model origin=enginesize weight make;
run;

--
Paige Miller

SteveDenham · Posted 06-13-2024 10:40 AM

The only way I know of to specify the reference level for the response variable is to switch to fitting a generalized logit model to the multinomial distribution. If you go that way, you can specify any particular level of the response variable as the reference using the ref=' ' method. The NOTE about the reference category of the response variable is due to the link chosen. As I mentioned in the first sentence, you have to specify LINK=GLOGIT for the reference level to be applied. Other PROCs operate similarly (HPGENSELECT and GLIMMIX, for example).

Just thought of another way, but it requires formatting the levels of the response variable. Just set up your format so that the level you want as the reference is either LAST (default) or FIRST (needs a REF=FIRST in either the MODEL or CLASS statement).
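A minimal sketch of the generalized logit approach, assuming the SASHELP.HEART data used elsewhere in this thread and an arbitrary choice of 'Desirable' as the reference level. The REF= option is given as a response variable option in the MODEL statement, and LINK=GLOGIT is what allows it to take effect.

```sas
proc logistic data=sashelp.heart;
    class bp_status sex / param=ref;
    /* REF= is honored here because the link is the generalized logit */
    model Chol_Status(ref='Desirable') = bp_status sex weight height / link=glogit;
run;
```

With this link the log should name the reference category that was used, and the parameter estimates are reported separately for each non-reference response level, which gives a way to confirm the choice.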

SteveDenham

Ksharp · Posted 06-13-2024 08:33 PM

Since your Y variable Chol_Status is a multinomial variable with three levels, you can fit two logistic models separately:

1)
where Chol_Status in ('Borderline' 'High');
model Chol_Status(event='Borderline') =
2)
where Chol_Status in ('Desirable' 'High');
model Chol_Status(event='Desirable') =

SteveDenham · Posted 06-13-2024 11:12 AM

For GLIMMIX and HPGENSELECT, you must specify the response variable in the CLASS statement if you are fitting a generalized logit link to a multinomial distribution. For LOGISTIC, it is as @StatDave says: don't put the response variable in the CLASS statement. I don't know why it isn't consistent.

SteveDenham
