New SAS User

Adrián_cyc · Posted 11-12-2019 06:58 AM

Hello,

We have a question about the value of the Kolmogorov-Smirnov (KS) statistic for one of our models. The model is a scorecard developed in SAS Enterprise Miner, and recently the KS statistic has been taking very high values, up to 0.65. We would like to know why this is happening and, if possible, how to fix it.

Thank you very much for your attention.

Ksharp · Posted 11-12-2019 07:24 AM

Your logistic model's AUC must be close to 0.9, right?

That suggests your model is overfit.

Do you have a lot of independent variables, or a very large data set?

OR

One or two variables are extremely significant in your model.

You could check this with PROC FREQ:

proc freq data=have;
   tables good_bad*X1;
run;

OR

Check the WOE of each X variable and see whether any value is very large (like 500) or very small (like -500).
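For anyone who wants to run that WOE check outside Enterprise Miner, here is a minimal sketch. It assumes a data set named HAVE, a target GOOD_BAD coded 'good'/'bad', and a variable X1 that is already binned into a few levels; all of those names are placeholders. It uses the ln(%good / %bad) convention for WOE, which may have the opposite sign from, or a different scale than, what your EM node reports.

```sas
/* Sketch only: HAVE, GOOD_BAD ('good'/'bad') and the binned X1 are assumed names. */
proc sql noprint;
   /* Good/bad counts per level of X1; missing X1 forms its own group */
   create table counts as
   select X1,
          sum(good_bad = 'good') as n_good,
          sum(good_bad = 'bad')  as n_bad
   from have
   group by X1;

   /* Overall totals, stored in macro variables */
   select sum(n_good), sum(n_bad)
      into :tot_good trimmed, :tot_bad trimmed
   from counts;
quit;

data woe_x1;
   set counts;
   pct_good = n_good / &tot_good;
   pct_bad  = n_bad  / &tot_bad;
   /* WOE is undefined when a level has no goods or no bads; leave it missing */
   if n_good > 0 and n_bad > 0 then do;
      woe     = log(pct_good / pct_bad);
      iv_part = (pct_good - pct_bad) * woe;
   end;
run;

proc print data=woe_x1;
run;

/* Information value = sum of the per-level contributions */
proc sql;
   select sum(iv_part) as iv from woe_x1;
quit;
```

Levels with a very small count of goods or bads are the ones that produce extreme WOE values, so it is worth looking at N_GOOD and N_BAD next to the WOE column, especially for the missing level.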

OR

Calling @Rick_SAS

Rick_SAS · Posted 11-12-2019 08:27 AM

You need to explain how the KS statistic is being used. What are you modeling and what hypothesis test is being run? The KS statistic is used for many things, including the modeling of a distribution or the normality of residuals.

There is a picture in the article "What is Kolmogorov's D statistic?", which shows the geometric meaning of the KS statistic. The value represents the maximum deviation between an empirical CDF and the CDF of a reference distribution (often the normal distribution). The situation you describe indicates that the empirical distribution of the data is very different from the reference distribution, such as the artificial example I've created below. One possible reason is that you are specifying a parameter for the reference distribution (for example, a threshold parameter) that is very different from the best choice for that parameter.
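Written as a formula, the "maximum deviation" described above is the one-sample statistic on the left. The right-hand form is an assumption about the scorecard setting, where the KS is usually the two-sample version that compares the score distributions of the good and bad groups (the notation F_good and F_bad is mine, not from the original posts):

```latex
\[
D_n = \sup_x \left| F_n(x) - F(x) \right|
\quad\text{(one-sample)},
\qquad
D = \sup_s \left| F_{\text{good}}(s) - F_{\text{bad}}(s) \right|
\quad\text{(two-sample)}
\]
```

Here F_n is the empirical CDF of the data and F is the reference CDF. Read in the two-sample sense, a value of 0.65 means there is some score at which the cumulative proportions of goods and bads differ by 65 percentage points.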

Ksharp · Posted 11-13-2019 06:35 AM

If I am right, the missing-value level of each X variable must have a very large or very small WOE, like 400 or -400.

And if I am also right, every one of your X variables must have a very large IV, like 0.8 or 0.6.

That is to say, your data sampling method (or data quality) does not look right.

You should keep missing values out of all your X variables.

Rick,

The OP's code should look something like this:

data final_total_score;
input good_bad $ total_score;
cards;
good 600
good 620
bad  520
bad  440
..........
;
title "KS test";
proc npar1way data=final_total_score plots=edfplot edf;
class good_bad;
var total_score;
run;

But the OP gets this KS from SAS/EM (SAS Enterprise Miner), a GUI client for SAS, like SAS/EG.

Adrián_cyc · Posted 11-13-2019 06:42 AM

The problem is that this model worked well for years. The issue only started two months ago, and we had never seen it before. When the model was developed, the WOE and IV values for each variable looked correct.

Ksharp · Posted 11-13-2019 06:57 AM

Does any variable have an IV greater than 0.5 or 0.6? If so, your model cannot be trusted, and you should drop those high-IV variables.

ballardw · Posted 11-13-2019 06:06 PM

@Adrián_cyc wrote:

The problem is that this model worked well for years. The issue only started two months ago, and we had never seen it before. When the model was developed, the WOE and IV values for each variable looked correct.

Quite often when something has been working reasonably well and then stops you might want to investigate something other than the model code.

Did the data collection methods change?

Did any variables change meaning but use the same values?

Did precision of an instrument change?

Did the number of records involved for any sort of grouping variable(s) change? Or change for just some grouping variable values?

Does anyone examine the logs of the step that brings the data into SAS? Are there warnings that weren't there before? Data conversion notes?
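One way to start on the questions above is to put the development sample and a recent sample side by side and compare the distribution of each input. This is only a sketch: the data set names DEV and RECENT and the binned variable X1 are assumptions, not anything taken from the original poster's project.

```sas
/* Sketch only: DEV, RECENT and X1 are assumed names. */
data both;
   set dev(in=in_dev) recent;
   length period $6;
   /* Flag which sample each record came from */
   period = ifc(in_dev, 'dev', 'recent');
run;

/* Column percentages give the distribution of X1 within each period.
   The MISSING option keeps missing X1 as its own level, so a jump in
   missing values between periods is visible. */
proc freq data=both;
   tables X1*period / missing norow nopercent;
run;
```

A level whose column percentage shifts sharply between the two periods, or a missing level that appears only in the recent data, points to a change upstream rather than in the model itself.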

It may help to actually post an example of the model code you are using. Someone familiar with the proc may be able to point out places

Spurious values of the kolmogorov smirnov statistic
