I'm using a data set called groin; it collects data on groin dissection surgery.
Y is a binary dummy variable for any complications.
X's include patient, disease, and treatment characteristics. This is but a sample of my data and it includes wound infection (binary), wound necrosis (binary), mitosis (binary), patient cancer status (binary), and operative time in min (continuous).
I've run frequency tables and univariate logistic models, and everything went smoothly.
data groin (keep = patient_id comp wnd_inf wnd_nec mitosis stats op_time);
set raw_data;
/*Complications*/
if comp_w = 'yes' then comp = '1';
if comp_w = 'no' then comp = '0';
/*Wound infection*/
if wnd_w = 'yes' then wnd_inf= '1';
if wnd_w = 'no' then wnd_inf = '0';
/*Wound necrosis*/
if nec_w = 'yes' then wnd_nec = '1';
if nec_w = 'no' then wnd_nec = '0';
/*Mitosis*/
if mit_w = ' 'yes' then mitosis = '1';
if mit_w = 'no' then mitosis = '0';
/*Current cancer status*/
if pt_stat = 'cancer free' then stats = '0';
if pt_stat = 'recurred' then stats = '1'; run;
/*Example of univariate log reg*/
proc logistic data = groin descending;
class mitosis (ref = '0') /param = ref;
model comp = mitosis/ clodds = wald orpvalue; run;
However, when I run the full model with the following code, the error "Invalid reference value" shows up for mitosis:
proc logistic data = groin descending;
class mitosis (ref = '0')
wnd_inf (ref = '0')
wnd_nec (ref = '0')
stats (ref = '0')
op_time
/param = ref;
model comp = wnd_inf wnd_nec mitosis stats op_time
/ clodds = wald orpvalue; run;
Why would it work in the small model but not the full?
Here is the log that accompanies my real code
/*Small model*/
9543 proc logistic data = ga_2 descending; /*Desceding puts complication = 1 as the
9543! reference group*/
9544 /*Therefore it acesses the
9544! probability of having complication*/
9545 class sex_num (ref = '0')
9546 thickness_num (ref = '0')
9547 hist_num (ref = '0')
9548 mitosis_bin (ref = '0')
9549 ulcer_num (ref = '0')
9550 lvi_num (ref = '0')
9551 regression_num (ref = '0')
9552 smoker_num (ref = '0')
9553 diabetes_num (ref = '0')
9554 card_num (ref = '0')
9555 hypo_th_num (ref = '0')
9556 staff (ref = '1')
9557 present_num (ref = '0')
9558 indication_num (ref = '0')
9559 dissec_num (ref = '0')
9560 op_time
9561 blood_loss
9562 dur_immobile
9563 dur_postop_ab
9564 hosp_stay
9565 tot_ln
9566 tot_pos_ln
9567 neo_num (ref = '0')
9568 sys_num (ref = '0')
9569 adj_rt_num (ref = '0')
9570 recur_site_num (ref = '1')
9571 /param = ref;
9572
9573 model COMP_NUM =
9574 sex_num
9575 thickness_num hist_num
9576 mitosis_bin
9577 ulcer_num lvi_num regression_num
9578 smoker_num diabetes_num card_num
9579 hypo_th_num
9580 staff present_num indication_num
9581 dissec_num op_time blood_loss no_drains_left
9581! dur_immobile dur_postop_ab hosp_stay
9582 tot_ln tot_pos_ln
9583 neo_num sys_num adj_rt_num
9584 recur_site_num
9585 / clodds = wald orpvalue;
9586 title 'Multivariate Analyses, all variables'; run;
ERROR: Invalid reference value for mitosis_bin.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 141 observations read from the data set WORK.GA_2.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Did your data step run? There are extra quotes in the code you posted.
if wnd_w = ' 'yes' then wnd_inf= '1';
Show your log for that data step.
The code you show for mitosis (and WND_inf and WND_NEC) will generate errors:
if mit_w = ' 'yes' then mitosis = '1';
See the two ' after the equals sign? Those two quotes form the comparison value, so yes is then read as a variable name in an invalid position, and I expect you get many other errors from the unbalanced quotes.
Even the color coding of the code you posted indicates unbalanced quotes. Look at the places where YES appears black (that is a variable) versus the colored version inside quotes, such as after COMP_W.
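For comparison, here is a minimal sketch of the mitosis recode with balanced quotes. It assumes mit_w holds exactly 'yes' or 'no' in lowercase, as in the posted code; the output data set name groin_check and the LENGTH statement are my own additions for illustration.

```sas
/* Sketch only: groin_check is a hypothetical name, raw_data is from the post */
data groin_check;
  set raw_data;
  length mitosis $1;                       /* one-character flag */
  if mit_w = 'yes' then mitosis = '1';
  else if mit_w = 'no' then mitosis = '0';
  /* any other value of mit_w leaves mitosis blank (missing) */
run;
```

With the quotes balanced, every literal should show in the string color in the editor, and the log for the step should be free of errors.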
General hint: ALWAYS show code from the log with all the messages.
Did you check the log after the data step? It is quite likely that your Proc Logistic code is using a previous version of the data set, possibly with different variables.
@deengyn wrote:
Apologies, the code shared above is an edited version of what I have on my set (since it has a total of 23 variables).
The original code does indeed run.
Thank you, I'll add the log now and transfer it from the server.
Your first post references Mitosis, but the error names Mitosis_bin, which is not shown in the data step. So one suspects more is missing.
Perhaps paste the code from the LOG for the data step as well. Or perhaps Proc Freq output for the variable Mitosis_bin.
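Something along these lines would show what PROC LOGISTIC actually sees for that variable. It is only a sketch, using the data set and variable names from the log above (ga_2, mitosis_bin).

```sas
/* Type, length, and any attached format of each variable in the data set */
proc contents data=ga_2;
run;

/* The actual levels of the CLASS variable, with missing values counted */
proc freq data=ga_2;
  tables mitosis_bin / missing;
run;
```

The REF= value in the CLASS statement has to match one of the formatted values listed by PROC FREQ, so this output shows whether '0' is really one of the levels.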
So, here is the frequency output:
In general, these are all the variables:
| Variable | Number of observations used | Number of observations missing | Variable description |
|---|---|---|---|
| 1. Sex | 141 | 0 | Binary |
| 2. Thickness | 141 | 0 | Categorical, discrete |
| 3. Histology, subtype | 141 | 0 | Binary |
| 4. Mitosis | 68 | 73 | Binary |
| 5. Ulceration | 78 | 63 | Binary |
| 6. LVI | 69 | 72 | Binary |
| 7. Regression | 63 | 78 | Binary |
| 8. Smoker | 116 | 25 | Binary |
| 9. Diabetes | 141 | 0 | Binary |
| 10. Cardiac disease | 141 | 0 | Binary |
| 11. Hypothyroidism | 141 | 0 | Binary |
| 12. Staff/surgeon | 141 | 0 | Categorical, discrete |
| 13. Presentation | 141 | 0 | Categorical, discrete |
| 14. Indication | 141 | 0 | Categorical, discrete |
| 15. Dissection type | 139 | 2 | Categorical, discrete |
| 16. OP time | 133 | 8 | Continuous |
| 17. Blood loss | 91 | 50 | Continuous |
| 18. Number of drains left | 141 | 0 | Continuous |
| 19. Duration of immobility | 79 | 62 | Continuous |
| 20. Duration of post-op AB use | 80 | 61 | Continuous |
| 21. Total hospital stay | 80 | 61 | Continuous |
| 22. Total LN | 140 | 1 | Continuous |
| 23. Total positive LN | 141 | 0 | Continuous |
The code runs after removing MITOSIS_BINARY, but the numbers all look strange. Some variables don't show up, and one of the continuous variables is treated as categorical.
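One possible explanation, offered as an assumption rather than a diagnosis: PROC LOGISTIC excludes any observation with a missing value in the response, in a model effect, or in a CLASS variable, so with this many missing values the full model may be fit on far fewer than 141 rows. If none of the remaining rows has mitosis_bin equal to '0', then ref = '0' has nothing to match. Separately, any variable listed in the CLASS statement is treated as categorical, which would apply to op_time and the other continuous variables listed there. The sketch below uses variable names from the log; complete_cases is a hypothetical data set name, and the variable list is only a subset chosen for illustration.

```sas
/* Keep rows that are complete on a few of the sparsely filled predictors */
data complete_cases;
  set ga_2;
  if cmiss(mitosis_bin, ulcer_num, lvi_num, regression_num,
           smoker_num, blood_loss, dur_immobile) = 0;
run;

/* Which levels of mitosis_bin survive among the complete rows? */
proc freq data=complete_cases;
  tables mitosis_bin / missing;
run;

/* Only categorical predictors go in CLASS; op_time stays continuous */
proc logistic data=ga_2 descending;
  class sex_num (ref = '0') / param = ref;
  model comp_num = sex_num op_time / clodds = wald orpvalue;
run;
```

If the PROC FREQ table shows a single level (or no rows at all), that would be consistent with the reference value error appearing only in the full model.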