"ERROR: Invalid reference value for mj_user."
I am getting the above error message for this syntax:
PROC LOGISTIC DATA=xxx_ed.dataset_1 ORDER=FORMATTED;
CLASS ed_visit (REF="No")/PARAM=REF;
MODEL mj_user (REF="Never") = ed_visit /LINK=GLOGIT RSQ;
FORMAT mj_user mj_user_. ed_visit ed_visit_.;
RUN;
But not this syntax:
PROC LOGISTIC DATA=xxx_ed.dataset_1 ORDER=FORMATTED;
CLASS gender (REF="Female")/PARAM=REF;
MODEL mj_user (REF="Never") = gender /LINK=GLOGIT RSQ;
FORMAT mj_user mj_user_. gender gender_.;
RUN;
This is the format:
VALUE mj_user_ 1 = "Current, irregular" 2 = "Current, regular" 3 = "Former" 4 = "Never";
There are other examples. There is no difference in the syntax - that I can see - other than the predictor variable.
I don't get it. Any thoughts?
You don't specify a REF= value for your outcome; you specify the EVENT= of interest instead.
See the documentation for details.
PROC LOGISTIC DATA=xxx_ed.dataset_1 ORDER=FORMATTED;
CLASS gender (REF="Female")/PARAM=REF;
MODEL mj_user (EVENT="Never") = gender /LINK=GLOGIT RSQ;
FORMAT mj_user mj_user_. gender gender_.;
RUN;
When I include a valid REF value instead of EVENT I get a note in the log as follows but no errors:
NOTE: The REF= option for the response variable is ignored.
Without the exact log or data it's hard to tell what is causing the error, but my guess is that your variables have missing data and that different observations are being excluded for each model, so a level of mj_user may be absent from the rows the model actually uses. If you post the exact log, that may help. You can check whether this is the issue by looking at the number of observations used in the model output, or by running frequency tables of the variables against each other with the SPARSE option to look for empty cells.
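/* Cross-tabulate the response against each predictor.           */
/* SPARSE lists empty cells; MISSING treats missing as a level.  */
proc freq data=xxx_ed.dataset_1;
table mj_user*gender / list sparse missing;
table mj_user*ed_visit / list sparse missing;
table mj_user*ed_visit*gender / list sparse missing;
run;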
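A more direct check is to count the response levels among only the rows that have both variables present. This is a minimal sketch: it assumes PROC LOGISTIC drops any observation with a missing response or predictor value (its default behaviour), and the WHERE clause is an addition for illustration, not something from the original code.

```sas
/* Keep only the rows the ed_visit model could use, then list   */
/* which formatted levels of mj_user remain.                    */
proc freq data=xxx_ed.dataset_1;
  where not missing(mj_user) and not missing(ed_visit);
  tables mj_user;
  format mj_user mj_user_.;
run;
```

If "Never" does not appear in this table, REF="Never" has no matching level in the analysis rows, which would be consistent with the "Invalid reference value" error.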
Here is the exact log:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 PROC LOGISTIC DATA=ucsd_ed.dataset_1 ORDER=FORMATTED;
74 CLASS med_or_rec (REF="For medical purposes only")/PARAM=REF;
75 MODEL mj_user (REF="Never") = med_or_rec /LINK=GLOGIT RSQ;
76 FORMAT mj_user mj_user_. med_or_rec med_or_rec_.;
77 TITLE "XXX";
78 RUN;
ERROR: Invalid reference value for mj_user.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2577 observations read from the data set UCSD_ED.DATASET_1.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 1.35 seconds
cpu time 0.91 seconds
79
80
81 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
93
When I change REF to EVENT, the PROC LOGISTIC seems to run fine.
Here is the log:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 PROC LOGISTIC DATA=ucsd_ed.dataset_1 ORDER=FORMATTED;
74 CLASS med_or_rec (REF="For medical purposes only")/PARAM=REF;
75 MODEL mj_user (EVENT="Never") = med_or_rec /LINK=GLOGIT RSQ;
76 FORMAT mj_user mj_user_. med_or_rec med_or_rec_.;
77 TITLE "XXX";
78 RUN;
NOTE: The EVENT= option for the response variable is ignored for LINK=GLOGIT.
NOTE: PROC LOGISTIC is fitting the generalized logit model. The logits modeled contrast each response level against the reference
level (mj_user='Current, regular'). Use the response variable option REF= if you want to change the reference level.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 2577 observations read from the data set UCSD_ED.DATASET_1.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 4.99 seconds
cpu time 3.40 seconds
79
80
81 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
93
"The logits modeled contrast each response level against the reference level (mj_user='Current, regular'). Use the response variable option REF= if you want to change the reference level."
I do want to change the reference level, but when I do...I get the original error message.
Yes, there is a lot of missing data in the variables that produce the error. These data are from a complex survey so MOST of the data are missing for these/certain variables.
The output for the PROC FREQ with the SPARSE option is attached.
If I'm understanding the output correctly, it looks like data is absent for mj_user at the levels "Current, irregular" and "Current, regular". Data in the "Never" category for mj_user (which I was defining as the reference level) is not absent for either level of ed_visit, but it's infrequent for "Yes" (n=2).
Does this confirm your hypothesis?
I hate to ask such a basic question, but if the data are not there, I can't proceed with the regression analysis, correct? I would have to combine levels of mj_user to avoid the missing data problem?
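If combining levels turns out to be the way forward, one common approach is to collapse them with a format rather than recoding the data. The sketch below is only an illustration: the format name mj_user3_ and the choice to merge the two "Current" categories are assumptions, and it relies on PROC LOGISTIC defining response levels by their formatted values.

```sas
/* Hypothetical collapsed format: the name and the grouping are */
/* assumptions for illustration, not part of the original data. */
proc format;
  value mj_user3_
    1, 2 = "Current"
    3    = "Former"
    4    = "Never";
run;

/* Same model as before, with the collapsed format on mj_user.  */
proc logistic data=xxx_ed.dataset_1 order=formatted;
  class ed_visit (ref="No") / param=ref;
  model mj_user (ref="Never") = ed_visit / link=glogit rsq;
  format mj_user mj_user3_. ed_visit ed_visit_.;
run;
```

This only helps if the chosen reference level still has observations once rows with missing values are dropped; collapsing cannot create data for a level that is entirely absent from the analysis rows.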