deengyn
Obsidian | Level 7

I'm using a data set called groin; it collects data on groin dissection surgery. 

Y is a binary dummy variable for any complications. 

X's include patient, disease, and treatment characteristics. This is but a sample of my data and it includes wound infection (binary), wound necrosis (binary), mitosis (binary), patient cancer status (binary), and operative time in min (continuous). 

 

I've done assessments in frequencies and univariate logistic models; everything went smoothly. 

data groin (keep = patient_id comp wnd_inf wnd_nec mitosis stats op_time); 
set raw_data; 
/*Complications*/
if comp_w = 'yes' then comp = '1'; 
if comp_w = 'no' then comp = '0';
/*Wound infection*/
if wnd_w = 'yes' then wnd_inf= '1'; 
if wnd_w  = 'no' then wnd_inf = '0';
/*Wound necrosis*/
if nec_w = 'yes' then wnd_nec = '1'; 
if nec_w  = 'no' then wnd_nec = '0';
/*Mitosis*/
if mit_w = ' 'yes' then mitosis = '1'; 
if mit_w  = 'no' then mitosis = '0';
/*Current cancer status*/
if pt_stat = 'cancer free' then stats = '0';
if pt_stat = 'recurred' then stats = '1'; run; 

/*Example of univariate log reg*/
proc logistic data = groin descending; 
class mitosis (ref = '0') /param = ref; 
model comp = mitosis/ clodds = wald orpvalue; run; 

However, when I run the full model with the code below, the error "Invalid reference value" appears for mitosis:

proc logistic data = groin descending; 
class mitosis (ref = '0') 
wnd_inf (ref = '0')
wnd_nec (ref = '0')
stats (ref = '0')
op_time
/param = ref; model comp = wnd_inf wnd_nec mitosis stats op_time/ clodds = wald orpvalue; run;

Why would it work in the small model but not the full?

 

 

Here is the log that accompanies my real code

/*Small model*/
9543          proc logistic data = ga_2 descending;   /*Desceding puts complication = 1 as the
9543! reference group*/
9544                                                              /*Therefore it acesses the
9544! probability of having complication*/
9545                              class   sex_num  (ref = '0')
9546                                      thickness_num   (ref = '0')
9547                                      hist_num (ref = '0')
9548                                      mitosis_bin (ref = '0')
9549                                      ulcer_num   (ref = '0')
9550                                      lvi_num (ref = '0')
9551                                      regression_num (ref = '0')
9552                                      smoker_num  (ref = '0')
9553                                      diabetes_num (ref = '0')
9554                                      card_num (ref = '0')
9555                                      hypo_th_num (ref = '0')
9556                                      staff       (ref = '1')
9557                                      present_num (ref = '0')
9558                                      indication_num (ref = '0')
9559                                      dissec_num  (ref = '0')
9560                                      op_time
9561                                      blood_loss
9562                                      dur_immobile
9563                                      dur_postop_ab
9564                                      hosp_stay
9565                                      tot_ln
9566                                      tot_pos_ln
9567                                      neo_num (ref = '0')
9568                                      sys_num (ref = '0')
9569                                      adj_rt_num (ref = '0')
9570                                      recur_site_num  (ref = '1')
9571                                      /param = ref;
9572
9573                              model COMP_NUM =
9574                                  sex_num
9575                                  thickness_num   hist_num
9576                                  mitosis_bin
9577                                  ulcer_num       lvi_num     regression_num
9578                                  smoker_num  diabetes_num    card_num
9579                                  hypo_th_num
9580                                  staff   present_num     indication_num
9581                                  dissec_num  op_time  blood_loss     no_drains_left
9581! dur_immobile    dur_postop_ab   hosp_stay
9582                                  tot_ln  tot_pos_ln
9583                                  neo_num     sys_num     adj_rt_num
9584                                  recur_site_num
9585                                  / clodds = wald orpvalue;
9586                              title 'Multivariate Analyses, all variables'; run;

ERROR: Invalid reference value for mitosis_bin.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 141 observations read from the data set WORK.GA_2.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
Tom
Super User

Did your data step run?  There are extra quotes in the code you posted.

if wnd_w = ' 'yes' then wnd_inf= '1';
deengyn
Obsidian | Level 7
Apologies, it did indeed run; this is a simplified example and I mistyped it.
ballardw
Super User

Show your log for that data step.

The code you show for mitosis (and WND_inf and WND_NEC) will generate errors:

if mit_w = ' 'yes' then mitosis = '1'; 

See the two ' after the equals sign? The comparison value is then just a quoted blank, yes is treated as a variable name in an invalid position, and I expect you get many other errors from the unbalanced quotes.

 

Even the code you show has color coding indicating that it has unbalanced quotes. Look at the places where YES appears black: that is a variable, versus the colored version inside quotes, such as after COMP_W.
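For comparison, a minimal sketch of the mitosis recode with balanced quotes might look like the following. It reuses the names raw_data, mit_w and mitosis from the posted data step; the LENGTH statement and the ELSE are additions for illustration and are not part of the original code.

```sas
/* Sketch only: recode mit_w into a one-character flag with balanced quotes. */
data groin;
  set raw_data;
  length mitosis $1;                         /* assumed length, added for illustration */
  if mit_w = 'yes' then mitosis = '1';
  else if mit_w = 'no' then mitosis = '0';   /* any other value leaves mitosis blank */
run;
```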

 

General hint: ALWAYS show code from the log with all the messages.

Did you check the log after the data step? It is quite likely that your Proc Logistic code is using a previous version of the data set, possibly with different variables.
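One way to check which variables the current data set actually contains is PROC CONTENTS. A minimal sketch, assuming the data set is WORK.GROIN as in the posted code:

```sas
/* List the variables in creation order, with their type and length. */
proc contents data=groin varnum;
run;
```

If a variable is absent or has an unexpected type, the data step probably did not run as intended.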

 

 

 

deengyn
Obsidian | Level 7
Apologies, the code shared above is an edited version of what I have on my set (since it has a total of 23 variables).
The original code does indeed run.

Thank you, I'll add the log now and transfer it from the server.
ballardw
Super User

Your first post references Mitosis, but the error refers to Mitosis_bin, which does not appear in the data step, so one suspects more is missing.

Perhaps paste the code from the LOG for the data step as well. Or perhaps Proc Freq output for the variable Mitosis_bin.
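A minimal sketch of that frequency check, assuming the data set and variable names shown in the posted log (ga_2 and mitosis_bin):

```sas
/* Frequency table for the CLASS variable named in the error message. */
proc freq data=ga_2;
  tables mitosis_bin / missing;   /* MISSING lists missing values as their own level */
run;
```

The output shows which values of mitosis_bin exist and how many observations are missing.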

deengyn
Obsidian | Level 7

So, here is the frequency output: 

[Screenshot: frequency output, image not available]

 

In general, these are all the variables:

#    Variable                      Obs. used   Obs. missing   Variable description
1    Sex                           141         0              Binary
2    Thickness                     141         0              Categorical, discrete
3    Histology, subtype            141         0              Binary
4    Mitosis                       68          73             Binary
5    Ulceration                    78          63             Binary
6    LVI                           69          72             Binary
7    Regression                    63          78             Binary
8    Smoker                        116         25             Binary
9    Diabetes                      141         0              Binary
10   Cardiac disease               141         0              Binary
11   Hypothyroidism                141         0              Binary
12   Staff/surgeon                 141         0              Categorical, discrete
13   Presentation                  141         0              Categorical, discrete
14   Indication                    141         0              Categorical, discrete
15   Dissection type               139         2              Categorical, discrete
16   OP time                       133         8              Continuous
17   Blood loss                    91          50             Continuous
18   Number of drains left         141         0              Continuous
19   Duration of immobility        79          62             Continuous
20   Duration of post-op AB use    80          61             Continuous
21   Total hospital stay           80          61             Continuous
22   Total LN                      140         1              Continuous
23   Total positive LN             141         0              Continuous
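Several of these variables have many missing values, and PROC LOGISTIC drops any observation that is missing the response or any explanatory variable. One thing that may be worth checking (an assumption, not something confirmed in this thread) is which levels of mitosis_bin remain among the complete cases. A rough sketch follows; complete_cases is a hypothetical data set name, and the variable list is taken from the log on the assumption that those names correspond to the table rows with missing values.

```sas
/* Keep only rows with no missing value in the listed variables.
   CMISS counts missing values across numeric and character arguments. */
data complete_cases;   /* hypothetical data set name */
  set ga_2;
  if cmiss(of comp_num mitosis_bin ulcer_num lvi_num regression_num
              smoker_num dissec_num op_time blood_loss dur_immobile
              dur_postop_ab hosp_stay tot_ln) = 0;
run;

/* Which levels of mitosis_bin survive among those rows? */
proc freq data=complete_cases;
  tables mitosis_bin;
run;
```

If '0' does not appear among the remaining levels, that would be consistent with the "Invalid reference value" error showing up only in the full model.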

 

The code runs after removing MITOSIS_BINARY, but the numbers all look odd. Some variables don't show up, and one of the continuous variables is treated as categorical. 

[Screenshot: PROC LOGISTIC output, image not available]
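Any variable listed in the CLASS statement is treated as categorical, which may be why a continuous variable shows up with levels. A minimal sketch using the sample model from the first post, with op_time left out of CLASS so that it enters as a continuous predictor; whether this matches the intended analysis is an assumption.

```sas
/* Sketch: only the categorical predictors appear in CLASS;
   op_time is listed in MODEL alone and is treated as continuous. */
proc logistic data=groin descending;
  class mitosis (ref='0')
        wnd_inf (ref='0')
        wnd_nec (ref='0')
        stats   (ref='0')
        / param=ref;
  model comp = wnd_inf wnd_nec mitosis stats op_time
        / clodds=wald orpvalue;
run;
```

The same idea would apply to the other continuous variables in the full model (blood loss, hospital stay and so on).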

 
