Solly7
Pyrite | Level 9

Hi, I need help getting a cut-off value from the deciles. I built my model using the steps below:

1. Split full_data into training and validation datasets

2. Oversampled the training dataset

3. Trained the model on the oversampled dataset

4. Binned the predicted probabilities from step 3 into deciles (10 bins)
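
Steps 1 and 2 are not shown in the code further down, so here is a rough sketch of what they might look like. The 70/30 split rate, the seeds, the 20% retention rate for non-events, and the dataset name split_flagged are all assumptions for illustration; only full_data, oversample and Status come from the post, and "oversampling" is assumed here to mean keeping every event and sampling the non-events.

```sas
/* Step 1 (sketch): random split; samprate and seed are assumed values */
proc surveyselect data=full_data out=split_flagged
                  method=srs samprate=0.7 seed=12345 outall;
run;

/* OUTALL keeps every row and adds Selected (1 = sampled, 0 = not sampled) */
data training validation;
  set split_flagged;
  if Selected = 1 then output training;
  else output validation;
  drop Selected;
run;

/* Step 2 (sketch): keep every event and roughly 20% of the non-events.
   The 0.2 rate is a placeholder, not a value from the post. */
data oversample;
  set training;
  if Status = 1 or ranuni(12345) < 0.2;
run;
```

The validation dataset is left at its natural event rate here, which is what makes it usable later for checking a cut-off.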

 

My question: should I use only the oversampled dataset to determine the cut-off value, or should I use the entire non-oversampled dataset? See the code and data below.

Binning results

bin    HIV_Positive  Share_of_HIV_Positive  Cumulative_share
1      316           0.162970603            0.1629706
2      293           0.151108819            0.31407942
3      254           0.130995358            0.44507478
4      227           0.117070655            0.56214544
5      206           0.10624033             0.66838577
6      173           0.089221248            0.75760701
7      156           0.080453842            0.83806086
8      131           0.067560598            0.90562145
9      106           0.054667354            0.96028881
10     77            0.039711191            1
Total  1939

 

 

/* Fit a stepwise logistic model on the oversampled data and save it for later scoring */
proc logistic data=oversample outmodel=hiv_scoring descending plots=all;
  class Alcohol_Consumption Existing_cover Marital_status Lead_type Education
        gender product_line_id SMOKING_STATUS;
  model Status = SUM_INSURED Monthly_income AGE_AT_INCEPT Alcohol_Consumption
                 Existing_cover Marital_status Lead_type Education gender
                 product_line_id SMOKING_STATUS / selection=stepwise;
  /* predicted holds the estimated event probability for each row */
  output out=outreg predprobs=individual p=predicted;
run;



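/* Sort highest predicted probability first, so that bin 1 is the top decile
   (this matches the table above, where bin 1 holds the most positives) */
proc sort data=outreg;
  by descending predicted;
run;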

/* Keep the first 3870 rows so they divide evenly into 10 bins of 387 */
data outreg;
  set outreg;
  if _n_ <= 3870;
run;

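/* Binning the data to find a suitable cut-off value:
   build bin labels i = 1..10 with 387 rows each (3870 rows in total) */
data test;
  do i = 1 to 10;
    do j = 1 to 387;
      output;
    end;
  end;
  drop j;
run;

/* One-to-one merge (no BY statement): row k of outreg gets the bin label from row k of test */
data outreg;
  merge outreg test;
run;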

/* Count the HIV-positive cases in each bin */
proc sql;
  create table summary as
  select i as bin,
         sum(Status=1) as HIV_Positive
  from outreg
  group by i;
quit;
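
As an alternative to the hard-coded 3870 and 387 row counts, the same decile binning could be sketched with PROC RANK, which also makes it easy to report the probability range in each bin, the natural candidates for a cut-off. This is only a sketch: the dataset names ranked and summary_rank and the column names n, min_predicted and max_predicted are made up for illustration, and the bins come out numbered 0 to 9 rather than 1 to 10.

```sas
/* GROUPS=10 with DESCENDING puts the highest predicted probabilities in bin 0 */
proc rank data=outreg out=ranked groups=10 descending;
  var predicted;
  ranks bin;
run;

/* Positives and probability range per bin; min_predicted is the lower
   boundary of each decile and therefore a candidate cut-off value */
proc sql;
  create table summary_rank as
  select bin,
         count(*)        as n,
         sum(Status=1)   as HIV_Positive,
         min(predicted)  as min_predicted,
         max(predicted)  as max_predicted
  from ranked
  group by bin;
quit;
```

Unlike the manual merge, this does not drop any rows, so the bin sizes can differ slightly when the row count is not a multiple of 10 or when predicted values are tied.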

 

1 REPLY 1
Reeza
Super User
Why are you binning your predicted probabilities?
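
If the goal is a classification cut-off, one option is to check candidate values on data that was not oversampled. The sketch below assumes a dataset named validation with the same variables as the training data, assumes Status is coded 0/1 (so the scored event probability is named P_1), and uses placeholder values for the population event rate (0.05) and the candidate cut-off (0.10). None of those values come from the post.

```sas
/* Score the non-oversampled validation data with the saved model.
   PRIOREVENT= adjusts the probabilities for oversampling;
   0.05 is a placeholder for the true event rate in the population. */
proc logistic inmodel=hiv_scoring;
  score data=validation out=valid_scored priorevent=0.05;
run;

/* Flag rows at or above a candidate cut-off (0.10 is a placeholder) */
data valid_flagged;
  set valid_scored;
  predicted_positive = (P_1 >= 0.10);
run;

/* Cross-tabulate actual status against the flag to see how the cut-off performs */
proc freq data=valid_flagged;
  tables Status*predicted_positive / norow nocol nopercent;
run;
```

Repeating the last two steps for several candidate cut-offs shows how the trade-off between missed positives and false alarms changes at the natural event rate.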

