Solved: Gini credit risk model

Ronein · Posted 12-21-2024 12:45 PM

Hello,
I built a credit risk model on 100,000 customers. I split the data into 70% train and 30% test and built the model on the train data. The results are Gini 79.5% on train and 78.5% on test. My question: is this difference of 1% okay, or does it indicate a problem?

Ksharp · Posted 12-21-2024 09:22 PM

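1) You could use a nonparametric test to check whether the scores from TRAIN and TEST follow the same distribution. The example below uses the EDF option of PROC NPAR1WAY, which produces the Kolmogorov-Smirnov (KS) test; the WILCOXON option would give the Wilcoxon rank-sum test instead.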

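* Demo data only: split sashelp.heart into two groups that stand in for TRAIN and TEST;
data train test;
 set sashelp.heart(keep=status ageatstart);
 if status='Alive' then output train;
  else output test;
 rename ageatstart=score;
run;

* Stack both data sets and record which one each row came from;
data all;
 set train test indsname=indsname;
 dsn=indsname;
run;

* EDF requests the empirical distribution function tests, including Kolmogorov-Smirnov;
proc npar1way data=all edf;
 class dsn;
 var score;
run;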

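Here D is the KS statistic, which is > 0.3, with a p-value < .0001.

That means the difference is significant: the score distributions in TRAIN and TEST are different. Applied to your data, a significant result like this would indicate that the Gini of 79.5% on train and 78.5% on test differ from each other.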
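
If the Wilcoxon rank-sum test itself is wanted rather than the KS test, a minimal sketch on the same `all` data set could look like this. It assumes the `all` data set built above, with `dsn` as the group variable and `score` as the model score:

```sas
/* Sketch: Wilcoxon rank-sum test of score between the two groups in ALL */
proc npar1way data=all wilcoxon;
  class dsn;   /* group indicator: WORK.TRAIN vs WORK.TEST */
  var score;   /* score whose distribution is compared */
run;
```

The Wilcoxon test is mainly sensitive to a shift in location, while the KS test reacts to any difference in the shape of the two distributions.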

2) You can also use ANOVA if the scores from TRAIN and TEST are both normally distributed.


* One-way ANOVA: does the mean score differ between TRAIN and TEST;
proc glm data=all;
 class dsn;
 model score=dsn / solution;
run;
quit;

3) You could also compare the two ROC curves with a chi-square test.

https://support.sas.com/kb/45/339.html
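
As a rough illustration of the idea, one common way to compare two AUCs from independent samples is a 1-df chi-square statistic, (AUC1 - AUC2)^2 / (SE1^2 + SE2^2). The sketch below is not taken from the linked note. It converts the Gini values from the question to AUC using Gini = 2*AUC - 1. The two standard errors are hypothetical placeholders and must be replaced with the values from your own ROC analysis (for example, the standard error of the area reported by PROC LOGISTIC when a ROC statement is used):

```sas
/* Sketch: chi-square test for two AUCs estimated on independent samples */
data auc_compare;
  /* Gini = 2*AUC - 1, so AUC = (Gini + 1) / 2 */
  auc_train = (0.795 + 1) / 2;
  auc_test  = (0.785 + 1) / 2;

  /* HYPOTHETICAL standard errors, for illustration only: replace with your own */
  se_train = 0.002;
  se_test  = 0.003;

  /* Squared difference divided by the sum of the variances, 1 degree of freedom */
  chisq   = (auc_train - auc_test)**2 / (se_train**2 + se_test**2);
  p_value = 1 - probchi(chisq, 1);
run;

proc print data=auc_compare noobs;
run;
```

The result depends entirely on the standard errors, so with 70,000 and 30,000 customers even a 1% gap may or may not be significant.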

4) Calling @StatDave
