desireatem
Pyrite | Level 9

I do not know what is wrong with my code:

Below is the distribution of deposi:

deposi  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0       498        59.50    498                   59.50
1       239        28.55    737                   88.05
2       100        11.95    837                   100.00

 

I ran the SAS procedure below and got the following error:

proc genmod data=stroke;
class gender(ref="0") race_ethnic(ref="0") deposi(ref="0") pay_source(ref="0") cmg_tier(ref="a") ;
model mrs_discharge= age gender race_ethnic deposi pay_source cmg_tier report_cmg weight_cmg expected_los/dist=gamma link=log type3;
run;

 

ERROR: Invalid reference value for deposi.
ERROR: No valid observations due to invalid or missing values in the response, explanatory, offset,
frequency, or weight variable.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE GENMOD used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

 

ACCEPTED SOLUTION
FreelanceReinh
Jade | Level 19

@desireatem wrote:


_c1=20202020202020202020203120202020
_c2=20202020202020202020202020202020
_c3=20202020202020202020202020202020
_c4=20202020202020202020202020202020
_c5=20202020202020202020202020202020
NOTE: There were 230 observations read from the data set WORK.STROKE.
WHERE CMISS(age, gender, race_ethnic, pay_source, cmg_tier, report_cmg, weight_cmg,
expected_los)=0;


Thanks for this helpful information. So, the third bullet point of my list of potential issues described what happened: Missing values of one or more other predictor variables in the observations with deposi=0 have made ref="0" an invalid reference level specification for deposi. The values of _c1, ..., _c5 indicate that all observations used in the analysis have deposi=1. Therefore, ref="1" is the only possible reference value for deposi. But even this is actually useless because the parameter estimate of the constant variable deposi will be zero, i.e., you can remove deposi from the CLASS and MODEL statements without losing more information.
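For readers unsure how to read the hexadecimal values quoted above, here is a minimal sketch, assuming the first value is copied by hand from the log. Each hex pair 20 is a blank and 31 is the digit 1, so _c1 holds a right-aligned "1" and the other four entries are entirely blank. The variable names decoded and stripped are illustrative only.

```sas
data _null_;
  length decoded $16 stripped $16;
  /* The $HEX32. informat turns 32 hex digits back into 16 characters */
  decoded  = input('20202020202020202020203120202020', $hex32.);
  /* Drop the leading and trailing blanks to see the bare level */
  stripped = strip(decoded);
  put stripped=;
run;
```

The log should show stripped=1, i.e. the only formatted value of deposi found among the complete cases.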

 

Given that 72.5% of the 837 observations in work.stroke (with non-missing deposi) have been excluded because of missing values in age, gender, race_ethnic, pay_source, cmg_tier, report_cmg, weight_cmg and/or expected_los, you should check if

  1. that many values are correctly missing
  2. most of the missing values are from a single variable (or from only two, ...) -- Reeza's suggestion will be helpful for this
  3. you can remove those largely missing variables from the model or impute their missing values.

Otherwise, you won't get a very useful model from PROC GENMOD as it would be based on a relatively small, not representative subset of your analysis dataset, disregarding information (e.g., about the relationship between deposi and mrs_discharge) which you could gain from the rest of the dataset.
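One way to see which levels of deposi survive the exclusion of incomplete observations can be sketched as follows. This is only an illustration under the assumption that the variable list matches the WHERE clause used earlier in the thread; the data set name check and the flag complete are hypothetical.

```sas
/* Flag complete cases: 1 if none of the other predictors is missing */
data check;
  set stroke;
  complete = (cmiss(age, gender, race_ethnic, pay_source, cmg_tier,
                    report_cmg, weight_cmg, expected_los) = 0);
run;

/* Cross-tabulate deposi against the flag to see which levels remain */
proc freq data=check;
  tables deposi*complete / missing norow nocol nopercent;
run;
```

A level of deposi with a zero count in the complete=1 column cannot serve as a reference level in the model.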

 


9 REPLIES
PaigeMiller
Diamond | Level 26

Whenever you get an error in the log, SHOW US THE LOG. That is, show us the entire log, 100% of it, with nothing chopped out.

 

Do not show us error messages disconnected from the code. Do not pick and choose parts of the log to show us while leaving out others.

--
Paige Miller
desireatem
Pyrite | Level 9

1204 proc genmod data=stroke;
1205 class gender(ref="0") ert(ref="0") race_ethnic(ref="0") deposi(ref="0") pay_source(ref="0")
1205! cmg_tier(ref="a") ;
1206 model mrs_discharge= age gender race_ethnic deposi ert*deposi pay_source cmg_tier report_cmg
1206! weight_cmg expected_los/dist=gamma link=log type3;
1207 run;

ERROR: Invalid reference value for deposi.
ERROR: No valid observations due to invalid or missing values in the response, explanatory, offset,
frequency, or weight variable.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE GENMOD used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

 

Reeza
Super User
One of the variables in the model has too many missing values.
See how you can figure that out here:
https://blogs.sas.com/content/iml/2016/04/18/patterns-of-missing-data-in-sas.html
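
A common way to count missing values per variable, not necessarily the method in the linked article, is to collapse every value into "Missing" or "Not missing" with a format and tabulate. The sketch below assumes the variable names from the model in this thread; the format names nmissfmt and $cmissfmt are hypothetical.

```sas
proc format;
  value  nmissfmt  .  = 'Missing' other = 'Not missing';
  value $cmissfmt ' ' = 'Missing' other = 'Not missing';
run;

/* The FORMAT statement applies only within this step */
proc freq data=stroke;
  format _numeric_ nmissfmt. _character_ $cmissfmt.;
  tables age gender race_ethnic deposi pay_source cmg_tier
         report_cmg weight_cmg expected_los mrs_discharge / missing nocum;
run;
```

Note that per-variable percentages can each look small while the share of observations with at least one missing value is much larger.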
desireatem
Pyrite | Level 9

Thank you, but less than 10% is missing.

ballardw
Super User

Any chance you typed a capital "O" instead of a zero (or vice versa)?

What is the format assigned to that variable?

 

If that variable is character, you may also need to check whether the values have leading blanks. Most output tables justify the text, which hides the leading spaces, but the reference value has to match the actual level.

Or perhaps just use the keyword FIRST instead of a level.

 

To see what I am talking about run the code below:

data example;
   x="    0";
run;

proc freq data=example;
   tables x;
run;

The single value of X actually has several leading blanks. In the PROC FREQ output the blanks have been "justified" away and are not visible.
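
To make such hidden blanks visible, one option is to print the value in hexadecimal. This is a minimal sketch that reuses the value from the example above; the variable n_lead is a hypothetical name.

```sas
data _null_;
  x="    0";
  /* Each 20 is a blank and 30 is the digit 0 */
  put x= $hex10.;
  /* Number of leading blanks: full length minus left-aligned length */
  n_lead = length(x) - length(left(x));
  put n_lead=;
run;
```

For this value the log should show x=2020202030 and n_lead=4.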

FreelanceReinh
Jade | Level 19

Hello @desireatem,

 

To check your data for all three potential issues mentioned by ballardw and Reeza, i.e.,

  • unformatted or inappropriately formatted reference level for formatted variable deposi
  • hidden character(s) in the reference value of variable deposi or its format label (leading or trailing blanks would not matter, though)
  • missing values of one or more other predictor variables in the observations with deposi=reference value (missing values in the dependent variable are largely tolerated),

you could run this DATA step and post the log, in particular the values of _c1, _c2, ..., if any:

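data _null_;
/* Collect up to five distinct formatted values of deposi */
array _c[5] $16 (5*' ');
set stroke end=last;
/* Keep only observations with non-missing values in all other predictors */
where cmiss(age, gender, race_ethnic, pay_source, cmg_tier, report_cmg, weight_cmg, expected_los)=0;
/* Store a formatted value the first time it is seen */
if vvalue(deposi) ~in: _c & i<dim(_c) then do;
  i+1;
  _c[i]=vvalue(deposi);
end;
if last;
/* Write the collected values in hexadecimal to expose hidden characters */
put (_c[*])(=$hex32. /);
run;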
desireatem
Pyrite | Level 9


366 data _null_;
367 array _c[5] $16 (5*' ');
368 set stroke end=last;
369 where cmiss(age, gender, race_ethnic, pay_source, cmg_tier, report_cmg, weight_cmg,
369! expected_los)=0;
370 if vvalue(deposi) ~in: _c & i<dim(_c) then do;
371 i+1;
372 _c[i]=vvalue(deposi);
373 end;
374 if last;
375 put (_c[*])(=$hex32. /);
376 run;

_c1=20202020202020202020203120202020
_c2=20202020202020202020202020202020
_c3=20202020202020202020202020202020
_c4=20202020202020202020202020202020
_c5=20202020202020202020202020202020
NOTE: There were 230 observations read from the data set WORK.STROKE.
WHERE CMISS(age, gender, race_ethnic, pay_source, cmg_tier, report_cmg, weight_cmg,
expected_los)=0;
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

Ksharp
Super User
As Reeza pointed out, too many of your variables have missing values. As a result, no observation with "deposi" equal to 0 is left in the analysis.
