greenie
Obsidian | Level 7

Hi, I want to include age, height, BMI, race, and history of hormone therapy in my logistic regression model. How should I write the code?

age, height are numerical

BMI

1="0-18.5" 2="18.5-25" 3="25-30" 4="30+" 5=missing 

race I set as:

NHW =(race = 1);

NHB =(race = 2);

hisp =(race = 3);

otherrace = (race = 4) and (race = 5) and (race= 6);

and I want NHW as reference.

hormone

.F="No Form" .G="Wrong Gender" .M="Not Answered" 0="No" 1="Yes" 2="Don't Know" 

and I set hormone as dummy variables

hormone0 =(horm_f = 0);/*No*/

hormone1 =(horm_f = 1);/*Yes*/

 

So my code for the logistic regression is:

proc logistic data = e.tmp;
where em_f in (0,1);
model emf (event = '1')= lm_cat age height BMI NHB hisp otherrace hormone0 hormone1/cl;
run;

But the results showed:

Note:

The following parameters have been set to 0, since the variables are a linear combination of other variables as shown

otherrace =

0

 

I don't know why. Can anyone help me?

5 REPLIES
ballardw
Super User

With SAS and Proc Logistic, and indeed many regression procedures, you do not need to "set dummy variables". Categorical variables belong on a CLASS statement. SAS will create any internal dummies needed for calculations.

 

You would indicate which is the reference level you want on the CLASS statement.

The CLASS statement must come before the model statement.

 

 

For your "race" variable I would suggest creating a custom format and using it to create a group like "otherrace":

 

proc format;
value myrace
1='NHW'
2='NHB'
3='Hisp'
4,5,6 = 'Other race'
;
run;

Proc logistic data=e.tmp;
   where em_f in (0,1);
   class race (ref='NHW')
         horm_f
         bmi
   ;
   format race myrace. ;
   model emf (event = '1') = lm_cat age height race horm_f bmi /cl
   ;
run;

For variables on the CLASS statement, if you want to specify the reference level, use the FORMATTED value in the (REF= ) option.

Since I don't have your formats or data, I can't test for some possible problems in the code you posted.

For example, the code above assumes there is a variable named EM_F that is different from the response variable on the MODEL statement.

One might guess that your LM_CAT variable is also some sort of categorical variable. If so it also likely belongs on the CLASS statement.
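
If LM_CAT does turn out to be categorical, the step might look roughly like the sketch below. This is untested and rests on assumptions: that LM_CAT really is categorical, that the MYRACE. format defined above exists in the session, and that the EM_F / EMF names are as posted. The PARAM=REF option is my addition, not something from the original code: without it PROC LOGISTIC uses effect coding for CLASS variables, so the parameter estimates are not direct comparisons with the reference level.

```sas
/* Sketch only: assumes LM_CAT is categorical and MYRACE. is already defined. */
proc logistic data=e.tmp;
   where em_f in (0,1);            /* variable names copied from the post as written */
   class race (ref='NHW')          /* REF= uses the formatted value */
         horm_f
         bmi
         lm_cat                    /* assumption: categorical predictor */
         / param=ref;              /* reference-cell coding instead of the default effect coding */
   format race myrace.;
   model emf (event = '1') = lm_cat age height race horm_f bmi / cl;
run;
```

Variables with no REF= given (here horm_f, bmi and lm_cat) fall back to the default reference level, which is the last level in sorted order.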

 

 

The particular error you show I would expect to see if you actually had included the NHW variable you describe on the model statement.

The Otherrace is indeed dependent on the other "race" variables you created. The way you created them otherrace would be 1 only when all the others are 0, and 0 only when one of the others is a 1. So it is a linear combination of the other race variables.

 

mkeintz
PROC Star

You have 4 race categories, but only 3 degrees of freedom among them.  If you know the values of NHW, NHB, and HISP you automatically know the value for OTHERRACE - or more generally if you know three of the race dummies, you know what the fourth must be.

 

It appears that proc logistic is implicitly setting OTHERRACE as the reference condition, and all the race-variable betas will be with respect to otherrace.

 

You could drop otherrace from the model specification, which should eliminate the note, but get the same parameter estimates.

 

But if you want NHW as the reference, drop it instead and keep the other three.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
greenie
Obsidian | Level 7

Thank you. Yeah I want to set NHW as the ref. So I didn't include NHW in my model, but I keep other three there (NHB, hisp, Otherrace). I think SAS would take NHW as ref, but why it takes Otherrace as ref?

mkeintz
PROC Star

@greenie wrote:

Thank you. Yeah I want to set NHW as the ref. So I didn't include NHW in my model, but I keep other three there (NHB, hisp, Otherrace). I think SAS would take NHW as ref, but why it takes Otherrace as ref?


I'm just guessing but I suppose that proc logistic takes the rightmost predictor listed as the reference when that predictor is a linear combination of predictors to its left.

 

This is the thing about SAS and many other computer programming tasks.  The computer is always ready for you to experiment to answer such questions as this.

 

But frankly I think you might be better off using @ballardw's suggestion, because its syntax allows you to clearly specify the reference category, and also avoid SAS needing to notify you that you have a linear relationship among the 4 final categories.

 

My comments were more directed at WHY you got that note.

FreelanceReinh
Jade | Level 19

@greenie wrote:

otherrace = (race = 4) and (race = 5) and (race= 6);


If this is really how you coded otherrace, i.e., with ANDs rather than ORs, then this "predictor" is constantly zero. Which I think is what the log means by


Note:

The following parameters have been set to 0, since the variables are a linear combination of other variables as shown

otherrace =

0

 


(0 as a trivial "linear combination" of zero or more variables).
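
For illustration, a corrected version of the dummy coding could look like the sketch below. It is untested and assumes RACE is numeric with codes 1 to 6 as described in the question; the output data set name WORK.TMP_RACE is made up for the example.

```sas
/* Sketch only: assumes e.tmp contains a numeric RACE variable coded 1-6. */
data work.tmp_race;                    /* hypothetical output data set */
   set e.tmp;
   NHW       = (race = 1);
   NHB       = (race = 2);
   hisp      = (race = 3);
   otherrace = (race in (4, 5, 6));    /* same as race=4 or race=5 or race=6 */
run;

/* Cross-tabulate to confirm that otherrace is 1 for codes 4-6 and 0 otherwise. */
proc freq data=work.tmp_race;
   tables race*otherrace / missing norow nocol nopercent;
run;
```

One thing to watch with this style of coding: a missing RACE makes every comparison false, so all four dummies are 0 for that observation. If NHW is left out of the model, those observations are silently treated as the reference group.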

 

As @ballardw wrote, you should really use the CLASS statement for the categorical predictors. It was a great improvement to PROC LOGISTIC when this statement was introduced in SAS version 8.

Discussion stats
  • 5 replies
  • 1428 views
  • 0 likes
  • 4 in conversation