Hi, I need help getting a cut-off value from the deciles. I built my model using the steps below:
1. Split full_data into training and validation sets
2. Oversampled the training dataset
3. Trained the model on the oversampled dataset
4. Binned the predicted probabilities from step 3 into 10 deciles
My question is: should I use only the oversampled dataset to determine the cut-off value, or should I use the entire non-oversampled dataset? See the code and data below.
Binning results
Bin | HIV_Positive | Share of HIV_Positive | Cumulative share |
1 | 316 | 0.162970603 | 0.1629706 |
2 | 293 | 0.151108819 | 0.31407942 |
3 | 254 | 0.130995358 | 0.44507478 |
4 | 227 | 0.117070655 | 0.56214544 |
5 | 206 | 0.10624033 | 0.66838577 |
6 | 173 | 0.089221248 | 0.75760701 |
7 | 156 | 0.080453842 | 0.83806086 |
8 | 131 | 0.067560598 | 0.90562145 |
9 | 106 | 0.054667354 | 0.96028881 |
10 | 77 | 0.039711191 | 1 |
Total | 1939 |
/* fit a stepwise logistic model on the oversampled training data */
proc logistic data=oversample outmodel=hiv_scoring desc plots=all;
  class Alcohol_Consumption Existing_cover Marital_status Lead_type Education gender product_line_id SMOKING_STATUS;
  model Status = SUM_INSURED Monthly_income AGE_AT_INCEPT Alcohol_Consumption Existing_cover Marital_status Lead_type Education gender product_line_id SMOKING_STATUS / selection=stepwise;
  output out=outreg predprobs=individual p=predicted;
run;
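/* Sketch only, not part of the run above: the model stored in hiv_scoring
   could also be applied to the non-oversampled validation split from step 1.
   The dataset names validation and valid_scored are hypothetical. */
proc logistic inmodel=hiv_scoring;
  score data=validation out=valid_scored;
run;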
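/* ascending sort: the lowest predicted probabilities come first;
   BY DESCENDING predicted would put the highest scores in bin 1 */
proc sort data=outreg;
  by predicted;
run;
/* keep the first 3870 rows (10 bins of 387); any rows beyond that are dropped */
data outreg;
  set outreg;
  if _n_ <= 3870;
run;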
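/* binning the data to find a proper cut-off value */
/* build the bin labels: i = 1 to 10, 387 rows per bin */
data test;
  do i = 1 to 10;
    do j = 1 to 387;
      output;
    end;
  end;
  drop j;
run;
/* one-to-one merge (no BY statement): row k of outreg gets the bin label from row k of test */
data outreg;
  merge outreg test;
run;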
proc sql;
  create table summary as
  select i as bin,
         sum(Status=1) as HIV_Positive
  from outreg
  group by i;
quit;
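As a possible alternative to the manual binning above, PROC RANK can assign the deciles directly without hard-coding the row counts. This is only a sketch: the names ranked, bin0 and summary_rank are made up for illustration, and it assumes Status is numeric 0/1 as in the code above. The min_prob column reports the lowest predicted probability in each bin, which is the candidate cut-off at that decile boundary.

```sas
/* groups=10 assigns rank values 0-9; with DESCENDING, the highest
   predicted probabilities land in group 0 */
proc rank data=outreg out=ranked groups=10 descending;
  var predicted;
  ranks bin0;   /* hypothetical name for the decile variable */
run;

proc sql;
  create table summary_rank as
  select bin0 + 1            as bin,
         count(*)            as n,
         sum(Status=1)       as HIV_Positive,
         min(predicted)      as min_prob,   /* lower edge of the bin */
         max(predicted)      as max_prob
  from ranked
  group by calculated bin
  order by calculated bin;
quit;
```

Because tied predicted values share a rank, the bins may not all be exactly the same size.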