Solly7
Pyrite | Level 9

Hi, I need help getting a cut-off value from the deciles. I built my model using the steps below:

1. Split full_data into training and validation sets

2. Oversampled the training dataset

3. Trained the model on the oversampled dataset

4. Binned the predicted probabilities from step 3 into 10 deciles

 

My question is: should I use only the oversampled dataset to determine the cut-off value, or should I use the entire non-oversampled dataset? See the code and data below.

Binning results

Bin    HIV_Positive   Share_of_HIV_Positive   Cumulative_share
1      316            0.162970603             0.1629706
2      293            0.151108819             0.31407942
3      254            0.130995358             0.44507478
4      227            0.117070655             0.56214544
5      206            0.10624033              0.66838577
6      173            0.089221248             0.75760701
7      156            0.080453842             0.83806086
8      131            0.067560598             0.90562145
9      106            0.054667354             0.96028881
10     77             0.039711191             1
Total  1939

 

 

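/* Fit the stepwise logistic model on the oversampled training data */
proc logistic data=oversample outmodel=hiv_scoring desc plots=all;
  class Alcohol_Consumption Existing_cover Marital_status Lead_type Education gender product_line_id SMOKING_STATUS;
  model Status = SUM_INSURED Monthly_income AGE_AT_INCEPT Alcohol_Consumption Existing_cover Marital_status Lead_type Education gender product_line_id SMOKING_STATUS / selection=stepwise;
  output out=outreg predprobs=individual p=predicted;
run;

/* Sort by descending predicted probability so that bin 1 holds the highest
   scores, matching the binning results above */
proc sort data=outreg;
  by descending predicted;
run;

/* Keep the first 3870 rows (10 bins of 387 rows each) */
data outreg;
  set outreg;
  if _n_ <= 3870;
run;

/* Binning the data to find a proper cut-off value:
   build bin labels i = 1..10, 387 rows per bin */
data test;
  do i = 1 to 10;
    do j = 1 to 387;
      output;
    end;
  end;
  drop j;
run;

/* One-to-one merge (no BY statement) attaches bin i to each sorted row */
data outreg;
  merge outreg test;
run;

/* Count the HIV-positive cases in each bin */
proc sql;
  create table summary as
  select i as bin,
         sum(Status=1) as HIV_Positive
  from outreg
  group by i;
quit;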
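
For comparison, the same decile binning could probably be done with PROC RANK instead of the manual DO-loop merge. The following is only a sketch: it assumes outreg still holds the predicted and Status columns produced above, and the names ranked, decile, summary_rank, n_obs and min_predicted are made up for illustration. Note that PROC RANK numbers the groups 0 to 9, with group 0 holding the highest predicted probabilities when DESCENDING is used.

```sas
/* Sketch: assign deciles with PROC RANK (dataset and variable names are hypothetical) */
proc rank data=outreg out=ranked groups=10 descending;
  var predicted;   /* score to be binned */
  ranks decile;    /* 0 = highest predicted probabilities, 9 = lowest */
run;

/* Positives per decile, plus the lowest score in each decile,
   which is one candidate cut-off boundary */
proc sql;
  create table summary_rank as
  select decile,
         count(*)       as n_obs,
         sum(Status=1)  as HIV_Positive,
         min(predicted) as min_predicted
  from ranked
  group by decile;
quit;
```

Unlike the fixed 387-row bins, this does not require truncating the data to 3870 rows, although tied predicted values may make the group sizes slightly unequal.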

 

Reeza
Super User
Why are you binning your predicted probabilities?
