_maldini_
Barite | Level 11

"ERROR: Invalid reference value for mj_user."

I am getting the above error message for this syntax:

PROC LOGISTIC DATA=xxx_ed.dataset_1 ORDER=FORMATTED;
	CLASS ed_visit (REF="No")/PARAM=REF;
	MODEL mj_user (REF="Never") = ed_visit /LINK=GLOGIT RSQ;
	FORMAT mj_user mj_user_. ed_visit ed_visit_.;
RUN;

But not this syntax: 

PROC LOGISTIC DATA=xxx_ed.dataset_1 ORDER=FORMATTED;
	CLASS gender (REF="Female")/PARAM=REF;
	MODEL mj_user (REF="Never") = gender /LINK=GLOGIT RSQ;
	FORMAT mj_user mj_user_. gender gender_.;
RUN;

This is the format:

PROC FORMAT;
	VALUE mj_user_
		1 = "Current, irregular"
		2 = "Current, regular"
		3 = "Former"
		4 = "Never";
RUN;

There are other examples like this. As far as I can see, the only difference in the syntax is the predictor variable.

I don't get it. Any thoughts?

Accepted Solution
Reeza
Super User

You don't specify a REF value for your outcome; you specify the EVENT of interest instead.

See the documentation for details.

PROC LOGISTIC DATA=xxx_ed.dataset_1 ORDER=FORMATTED;
	CLASS gender (REF="Female")/PARAM=REF;
	MODEL mj_user (EVENT="Never") = gender /LINK=GLOGIT RSQ;
	FORMAT mj_user mj_user_. gender gender_.;
RUN;

When I include a valid REF value instead of EVENT, I get the following note in the log, but no errors:

NOTE: The REF= option for the response variable is ignored.

Without the exact log or data it's hard to tell what is causing your code to generate the error, but my guess is that your variables have missing data and that different observations are being excluded in each model, which messes up the data. If you post the exact log, that may help. You can check whether this is the issue by examining the output and looking at the number of observations used in the model, or by running frequency tables of the variables against each other with the SPARSE option to look for empty cells.

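/* Cross-tabulate the response against each predictor. LIST prints one row per */
/* combination, SPARSE also lists combinations with zero counts, and MISSING    */
/* treats missing values as a level so they appear in the tables.               */
proc freq data=xxx_ed.dataset_1;
     table mj_user*gender / list sparse missing;
     table mj_user*ed_visit / list sparse missing;
     table mj_user*ed_visit*gender / list sparse missing;
run;
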
_maldini_
Barite | Level 11

Here is the exact log: 

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 PROC LOGISTIC DATA=ucsd_ed.dataset_1 ORDER=FORMATTED;
74 CLASS med_or_rec (REF="For medical purposes only")/PARAM=REF;
75 MODEL mj_user (REF="Never") = med_or_rec /LINK=GLOGIT RSQ;
76 FORMAT mj_user mj_user_. med_or_rec med_or_rec_.;
77 TITLE "XXX";
78 RUN;

ERROR: Invalid reference value for mj_user.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2577 observations read from the data set UCSD_ED.DATASET_1.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 1.35 seconds
cpu time 0.91 seconds

79
80
81 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
93

When I change REF to EVENT, PROC LOGISTIC seems to run fine.

Here is the log: 

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 PROC LOGISTIC DATA=ucsd_ed.dataset_1 ORDER=FORMATTED;
74 CLASS med_or_rec (REF="For medical purposes only")/PARAM=REF;
75 MODEL mj_user (EVENT="Never") = med_or_rec /LINK=GLOGIT RSQ;
76 FORMAT mj_user mj_user_. med_or_rec med_or_rec_.;
77 TITLE "XXX";
78 RUN;

NOTE: The EVENT= option for the response variable is ignored for LINK=GLOGIT.
NOTE: PROC LOGISTIC is fitting the generalized logit model. The logits modeled contrast each response level against the reference
level (mj_user='Current, regular'). Use the response variable option REF= if you want to change the reference level.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 2577 observations read from the data set UCSD_ED.DATASET_1.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 4.99 seconds
cpu time 3.40 seconds


79
80
81 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
93

"The logits modeled contrast each response level against the reference level (mj_user='Current, regular'). Use the response variable option REF= if you want to change the reference level."

I do want to change the reference level, but when I do, I get the original error message.

Yes, there is a lot of missing data in the variables that produce the error. These data are from a complex survey, so MOST of the data are missing for certain variables, including these.

_maldini_
Barite | Level 11
Also, the documentation you referred me to states, "EVENT=’category’ | keyword specifies the event category for the binary response model. PROC LOGISTIC models the probability of the event category. The EVENT= option has no effect when there are more than two response categories."

The response variable (mj_user) is NOT binary. It has 4 response levels. Should I be using a different PROC?
Reeza
Super User
Examples 3 and 4 in the LOGISTIC documentation show how to do ordinal and nominal logistic regression. Your categories appear nominal, though I guess you could reorder them to be ordinal if you wanted.
Which one are you trying to do here?
_maldini_
Barite | Level 11
I'm trying to do nominal and define the reference category.

Thank you for your help!
Reeza
Super User
Then you'll need to recode your variable so that your reference group is the lowest value in the group and it will be the default value. The examples go through the rest of how to handle a nominal logistic regression and how to interpret the output.

If you have survey weighted data, you may need to use SURVEYLOGISTIC, but that's beyond my knowledge.
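
A minimal sketch of what the same nominal model might look like in PROC SURVEYLOGISTIC, assuming the survey provides stratum, cluster, and weight variables. The names strata_var, psu_var, and wt_var are hypothetical placeholders, not variables from this data set; substitute the design variables given in the survey documentation. This would not by itself fix the error: if no "Never" respondents remain once rows with missing values are dropped, REF="Never" still has nothing to refer to.

```sas
/* Hypothetical design variables: replace strata_var, psu_var, and wt_var */
/* with the ones documented for your survey.                              */
proc surveylogistic data=ucsd_ed.dataset_1 order=formatted;
   strata strata_var;
   cluster psu_var;
   weight wt_var;
   class med_or_rec (ref="For medical purposes only") / param=ref;
   /* Generalized (nominal) logit with "Never" as the reference response level */
   model mj_user (ref="Never") = med_or_rec / link=glogit;
   format mj_user mj_user_. med_or_rec med_or_rec_.;
run;
```
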
_maldini_
Barite | Level 11
I understand what you're saying, but that doesn't explain why the original syntax works for some variables and not others.
Defining the reference level in the MODEL statement has worked fine in the past with other data sets, and it works fine for SOME variables in this data set... Hmm.

Thanks again for your help.
Reeza
Super User
If any variable mentioned in the PROC is missing, that entire row is deleted. I'm guessing that sometimes this removes all of your NEVER observations, which you can confirm with the FREQ code I suggested earlier with the SPARSE option.
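
One way to check this directly is to restrict to the rows PROC LOGISTIC can actually use (no missing value in the response or the predictor) and list the response levels that remain. This is a minimal sketch assuming the library, variable, and format names from the log above. If "Never" does not appear in the output, REF="Never" has no matching level in the analysis data. The EVENT= log points the same way: because the default reference level for LINK=GLOGIT is the last ordered response level, a reference of 'Current, regular' under ORDER=FORMATTED suggests that neither 'Former' nor 'Never' was present in the med_or_rec analysis sample.

```sas
/* Mirror PROC LOGISTIC's listwise deletion: keep only rows where both */
/* the response and the predictor are non-missing.                      */
proc freq data=ucsd_ed.dataset_1 order=formatted;
   where not missing(mj_user) and not missing(med_or_rec);
   tables mj_user;
   format mj_user mj_user_.;
run;
```
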
_maldini_
Barite | Level 11

The output for the PROC FREQ with the SPARSE option is attached.

If I'm understanding the output correctly, it looks like data is absent for mj_user at the levels "Current, irregular" and "Current, regular". Data in the "Never" category for mj_user (which I was defining as the reference level) is not absent for either level of ed_visit, but it's infrequent for "Yes" (n=2).

Does this confirm your hypothesis?

I hate to ask such a basic question, but if the data are not there, I can't proceed with the regression analysis, correct? I would have to combine levels of mj_user to avoid the missing data problem?
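
If combining levels is appropriate for the research question (which depends on what you are trying to estimate), one way to try it without changing the stored data is a second format that maps several codes to the same label. PROC LOGISTIC builds response levels from formatted values, so codes that share a label are analyzed as a single level. The format name mj_user3_ and the choice to pool codes 1 and 2 into "Current" are hypothetical, for illustration only.

```sas
/* Hypothetical collapsed format: both "current" codes share one label */
proc format;
   value mj_user3_
      1, 2 = "Current"
      3    = "Former"
      4    = "Never";
run;

proc logistic data=xxx_ed.dataset_1 order=formatted;
   class ed_visit (ref="No") / param=ref;
   /* Response levels come from mj_user3_, so codes 1 and 2 are pooled */
   model mj_user (ref="Never") = ed_visit / link=glogit rsq;
   format mj_user mj_user3_. ed_visit ed_visit_.;
run;
```

Re-running the PROC FREQ check with the new format shows whether every pooled level, including "Never", still has observations in the analysis sample.
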
Reeza
Super User
There are a lot of 0s and missing values in that table, so that is the issue.

I don't know what your research question is, how you collected the data, or what you're trying to do, so I can't recommend next steps.

