cut_off_value using deciles

Solly7 — Tue, 13 Jul 2021 07:27:26 GMT

Hi, I need help getting a cut-off value from the deciles. I built my model using the steps below:

1. Split full_data into training and validation

2. Oversampled the training dataset

3. Trained the model on the oversampled dataset

4. Binned the predicted probabilities from step 3 into 10 deciles

My question: should I use only the oversampled dataset to determine the cut-off value, or should I use the entire non-oversampled dataset? See the code and data below.

Binning results

bin	HIV_Positive	Share_of_HIV_Positive	Cumulative_Share
1	316	0.162970603	0.1629706
2	293	0.151108819	0.31407942
3	254	0.130995358	0.44507478
4	227	0.117070655	0.56214544
5	206	0.10624033	0.66838577
6	173	0.089221248	0.75760701
7	156	0.080453842	0.83806086
8	131	0.067560598	0.90562145
9	106	0.054667354	0.96028881
10	77	0.039711191	1
Total	1939
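
For reference, the last two columns of the table are each bin's count divided by the total (1939) and the running sum of those shares. A minimal sketch of that arithmetic, using the counts from the table; the dataset and variable names (bin_counts, bin_shares, share, cumulative, total_pos) are made up for illustration:

```sas
/* Per-bin counts copied from the table above */
data bin_counts;
    input bin HIV_Positive;
    datalines;
1 316
2 293
3 254
4 227
5 206
6 173
7 156
8 131
9 106
10 77
;
run;

/* Total number of positives across all bins (1939 for these counts) */
proc sql noprint;
    select sum(HIV_Positive) into :total_pos trimmed
    from bin_counts;
quit;

/* Share of positives per bin and its running total */
data bin_shares;
    set bin_counts;
    share = HIV_Positive / &total_pos;
    cumulative + share;   /* sum statement: value is retained across rows */
run;
```

For bin 1 this gives 316 / 1939, roughly 0.16297, which matches the first row of the table.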

/* Stepwise logistic regression fitted on the oversampled training data */
proc logistic data=oversample outmodel=hiv_scoring desc plots=all;
    class Alcohol_Consumption Existing_cover Marital_status Lead_type Education gender product_line_id SMOKING_STATUS;
    model Status = SUM_INSURED Monthly_income AGE_AT_INCEPT Alcohol_Consumption Existing_cover Marital_status Lead_type Education gender product_line_id SMOKING_STATUS / selection=stepwise;
    /* Write the predicted probabilities to outreg */
    output out=outreg predprobs=individual p=predicted;
run;



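/* Sort ascending by predicted probability */
proc sort data=outreg;
    by predicted;
run;

/* Keep the first 3870 rows so the data split into 10 bins of 387; */
/* after the ascending sort, any rows beyond 3870 (the highest probabilities) are dropped */
data outreg;
    set outreg;
    if _n_ <= 3870;
run;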

/* Binning the data to find a proper cut-off value: */
/* build an index dataset with 387 rows for each bin i = 1..10 */
data test;
    do i = 1 to 10;
        do j = 1 to 387;
            output;
        end;
    end;
    drop j;
run;

/* One-to-one merge (no BY statement): row k of outreg gets the bin index from row k of test */
data outreg;
    merge outreg test;
run;

/* Count the positives (Status=1) in each bin */
proc sql;
    create table summary as
    select i as bin,
           sum(Status=1) as HIV_Positive
    from outreg
    group by i
    ;
quit;
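
If the deciles were built on the non-oversampled validation split instead, one possible sketch is to score that split with the stored model and let PROC RANK assign the bins. This is an illustration under stated assumptions, not a recommendation from the thread: VALIDATION is a hypothetical name for the validation dataset from step 1, Status is assumed to be coded 0/1 (so the scored event probability is named P_1), and the output names (valid_scored, valid_ranked, valid_summary, decile) are made up.

```sas
/* Score the validation split with the model saved by OUTMODEL=hiv_scoring. */
/* VALIDATION is a hypothetical dataset name.                               */
proc logistic inmodel=hiv_scoring;
    score data=validation out=valid_scored;
run;

/* Assign deciles on the event probability.                   */
/* Assumes Status is coded 0/1, so the probability is in P_1. */
proc rank data=valid_scored out=valid_ranked groups=10 descending;
    var P_1;
    ranks decile;   /* 0 = highest predicted probabilities, 9 = lowest */
run;

/* Positives and probability range per decile */
proc sql;
    create table valid_summary as
    select decile,
           count(*)      as n_obs,
           sum(Status=1) as HIV_Positive,
           min(P_1)      as min_prob,
           max(P_1)      as max_prob
    from valid_ranked
    group by decile
    ;
quit;
```

PROC RANK with GROUPS=10 avoids trimming the data to a multiple of 10, and the min_prob and max_prob columns show the probability range that each decile covers.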

Re: cut_off_value using deciles

Reeza — Tue, 13 Jul 2021 16:08:51 GMT

Why are you binning your predicted probabilities?
