Roxxanne
Fluorite | Level 6

Hi all,

I am attempting an ordinal regression analysis, and for some reason SAS does not recognize one of the levels of my variable. I have a variable called 'cancer' with two levels, 'yes' and 'no'. When I set the reference value to 'no', I receive an error saying the reference is invalid. When I try 'yes' instead, the procedure seems to work; however, the output contains no estimates for my cancer variable. This is my code:

proc logistic data=WORK.THESIS desc;
class age(ref='18 to 24 years') sex(ref='Male') education(ref='Some formal education') income(ref='Less than $30,000')
relationship(ref='Immediate family member') Healthcare(ref='No') Cancer(ref='No') / param=reference;
model category1 = age sex education income healthcare relationship cancer;
run;

Please help! Thank you. 

Reeza
Super User

Is there a format applied on the variable? SAS expects the formatted value.

 

Please post the actual log as well; it helps to indicate where the error may be. If you can't do that, what happens if you run PROC FREQ on the CANCER variable?

 

proc freq data=thesis;
table cancer*category1 / missing;
run;

 




 

Roxxanne
Fluorite | Level 6

Hi Reeza,

Thank you for your reply. Here is the log when I put the code mentioned previously:

Roxxanne_0-1626999593626.png

 

When I run proc freq on the cancer variable everything looks ok to me (I am aware there are some missing values), here is the output:

Roxxanne_1-1626999687087.png
I am not sure if there is a format applied. 

Thanks again!

 

ballardw
Super User

The actual value for your Cancer variable could be " No", with one or more leading spaces. Proc Freq and almost all of the tabular output from SAS procs will left-justify values, removing such spaces. For a reference level you need the actual value.

 

Proc contents run against the data set will show all of the formats and variable types. Or click on the column header in SAS table viewer to get details about the variable.

It is very common for numeric 1/0 coded variables to have a format that displays Yes/No. If you see that the variable is numeric, you might try using 0 as the reference.
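
A minimal sketch of both checks, assuming Cancer is a character variable in WORK.THESIS (the $HEX format applies only to character values, and the width of 16 is an arbitrary choice). PROC CONTENTS reports the type and any attached format, and printing the stored bytes exposes leading blanks that ordinary output hides:

```sas
/* Report each variable's type, length, and attached format */
proc contents data=WORK.THESIS;
run;

/* Show the stored bytes of Cancer; a blank appears as hex 20.
   Assumes Cancer is character; the width 16 is an arbitrary choice. */
proc freq data=WORK.THESIS;
format cancer $hex16.;
tables cancer / missing;
run;
```

A value stored as "No" begins with 4E6F, while a value with one leading blank begins with 204E6F.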

StatDave
SAS Super FREQ

This sort of problem in any modeling procedure is usually due to missing values in the other variables involved in the model. An observation is ignored if any variable involved in the model (response, predictor, weight, freq, offset, etc.) is missing. The issue is probably the result of missing values in other variables excluding all observations in one level of your cancer variable. Note that observations may be excluded as a result of having missing values in different variables, so all observations can be excluded even if the cancer variable has nonmissing values in both levels.
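
One way to check this directly is to count, for each level of the cancer variable, how many observations have no missing value in any model variable. The following is only a sketch: the variable list is copied from the MODEL statement in the original post, and the data set name CHECK and the flag COMPLETE are hypothetical names chosen for illustration.

```sas
/* Flag observations that the model can use: no missing value in the
   response or in any predictor named in the MODEL statement.
   CHECK and COMPLETE are hypothetical names. */
data check;
set WORK.THESIS;
complete = (cmiss(category1, age, sex, education, income,
                  healthcare, relationship, cancer) = 0);
run;

/* A Cancer level with no rows where complete=1 never reaches the model */
proc freq data=check;
tables cancer*complete / missing norow nocol nopercent;
run;
```

CMISS accepts both character and numeric arguments, so the same call works regardless of how each variable is stored.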

Rick_SAS
SAS Super FREQ

You can test whether Dave's conjecture is correct by looking at the pattern of missing values. See the first PROC MI example in the article, "Visualize patterns of missing values."

StatDave
SAS Super FREQ

You might want to examine the observations that were ignored in the model fit to understand the cause. An easy way to do that is to add an OUTPUT statement in your PROC LOGISTIC step to save the predicted values. Predicted values are missing if any variable, other than the response, was missing. You can then easily create a data set of the observations that were ignored and examine them. For example, these statements create a data set, NotUsed, that contains all of the observations that were ignored. 

proc logistic data=WORK.THESIS desc;
class age(ref='18 to 24 years') sex(ref='Male') education(ref='Some formal education') income(ref='Less than $30,000')
relationship(ref='Immediate family member') Healthcare(ref='No') Cancer(ref='No') / param=reference;
model category1 = age sex education income healthcare relationship cancer;
output out=out p=p;
run;
/* Keep the observations that were ignored: the predicted value or the response is missing */
data NotUsed;
set out;
if cmiss(of p category1);
run;
Roxxanne
Fluorite | Level 6

Thank you for your reply. 

I ran the code you sent, but I am not sure what I am supposed to be looking for. I am aware my dataset has a fair amount of missing data, since all of our predictor variables were optional questions, so every variable has some missing values. I am not sure whether this is useful information, but the variable 'relationship' only has values for individuals who answered yes to the 'cancer' variable. I also tried removing relationship from the model; however, I still receive the same error as before (invalid reference for cancer).

I also tried what Rick mentioned by running PROC MI; however, I received the same error for all my variables:

Roxxanne_0-1627067116898.png

I also tried this code:

Roxxanne_1-1627067251515.png


I am not sure if the issue is because all my predictor variables are categorical? 

StatDave
SAS Super FREQ

All variables in PROC MEANS must be numeric, so the errors from PROC MEANS are because your variables are character. 

 

In the NotUsed data set, you need to see if every observation with Cancer="No" contains a missing value for some variable in your model - any of the predictors or the response. You will find that that is the case. Note that the observations will probably have missings in different variables, but as long as there is a missing in any of the model variables, then the observation will be ignored.

Reeza
Super User
Since your variables are categorical, PROC FREQ is more appropriate than MEANS.

When running a model, SAS excludes any row that has a missing value for any variable referenced in the PROC. Because you only have a few Cancer cases, is it possible that those are being excluded entirely because of other missing values in your data?
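
A possible way to count missing values with PROC FREQ, sketched under the assumption that the model variables are character, as noted above. The format name $missfmt is a user-defined name chosen for illustration; it collapses each variable to two categories so that every table shows only missing versus non-missing counts.

```sas
/* User-defined format: blank values are labeled Missing, everything else Not missing.
   The name $missfmt is hypothetical. */
proc format;
value $missfmt ' ' = 'Missing' other = 'Not missing';
run;

/* One two-row table per character variable in the data set */
proc freq data=WORK.THESIS;
format _character_ $missfmt.;
tables _character_ / missing nocum nopercent;
run;
```

Any numeric variables in the data set would need a separate numeric format and a `_numeric_` list.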
