Adrián_cyc
Calcite | Level 5

Hello,

We have a question about the Kolmogorov-Smirnov (KS) statistic for one of our models. The model is a scorecard developed using SAS Enterprise Miner, and recently the KS statistic has been taking very high values, up to 0.65. We would like to know why this is happening and, if possible, how to fix it.

Thank you very much for your attention.

6 REPLIES
Ksharp
Super User

Is the AUC of your logistic model close to 0.9?

That would suggest your model is overfit.

Do you have a lot of independent variables, or a very large data set?

OR

One or two variables are extremely significant in your model.

You could check this with PROC FREQ:

proc freq data=have;
   tables good_bad*X1;
run;

 

OR

Check the WOE of each X variable and see whether any value is very large (like 500) or very small (like -500).
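One way to run that WOE check by hand is sketched below. It is only an illustration: the data set name HAVE, the target GOOD_BAD (coded 'good'/'bad') and the binned predictor X1 are assumed names, and the WOE here is the plain natural log of (share of goods / share of bads), so the scale may differ from what your tool reports.

```sas
/* Sketch only: HAVE, GOOD_BAD ('good'/'bad') and X1 are assumed names. */
proc sql;
   create table woe_x1 as
   select a.X1,
          a.n_good,
          a.n_bad,
          /* WOE is undefined when a level has no goods or no bads */
          case when a.n_good > 0 and a.n_bad > 0
               then log( (a.n_good / t.tot_good) / (a.n_bad / t.tot_bad) )
               else .
          end as woe
   from (select X1,
                sum(good_bad = 'good') as n_good,   /* goods in this level */
                sum(good_bad = 'bad')  as n_bad     /* bads in this level  */
         from have
         group by X1) as a,
        (select sum(good_bad = 'good') as tot_good, /* overall goods */
                sum(good_bad = 'bad')  as tot_bad   /* overall bads  */
         from have) as t;
quit;

proc print data=woe_x1;
run;
```

Levels with a missing WOE or a very small count in either column are the ones worth a closer look.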

 

OR

Calling @Rick_SAS 

Rick_SAS
SAS Super FREQ

You need to explain how the KS statistic is being used. What are you modeling and what hypothesis test is being run? The KS statistic is used for many things, including the modeling of a distribution or the normality of residuals.

 

There is a picture in the article "What is Kolmogorov's D statistic?", which shows the geometric meaning of the KS statistic. The value represents the maximum deviation between an empirical CDF and the CDF of a reference distribution (often the normal distribution). The situation you describe indicates that the empirical distribution of the data is very different from the reference distribution, such as the artificial example I've created below. Reasons might include that you are specifying some parameters for the reference distribution (for example, a threshold parameter) that are very different from the best choice for those parameters.

 

 

[Image: Capture.PNG]
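For reference, the "maximum deviation between an empirical CDF and a reference CDF" described above is usually written as follows. The notation is the standard textbook one, not taken from the linked article: F_n is the empirical CDF of the n observations and F is the CDF of the reference distribution.

```latex
D_n = \sup_{x} \left| F_n(x) - F(x) \right|
```

Because both CDFs take values between 0 and 1, the statistic itself always lies between 0 and 1.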

Ksharp
Super User

If I am right, the missing-value level of each X variable must have a very large or very small WOE, such as 400 or -400.

And if I am also right, every X variable must have a very large IV, such as 0.8 or 0.6.

That is to say, your data sampling method (or data quality) does not look right.

You should keep missing values out of all your X variables.

 

Rick,

The OP's code should look something like this:

data final_total_score;
input good_bad $ total_score;
cards;
good 600
good 620
bad 520
bad 440
..........
;


/* Two-sample KS test comparing the score distributions of goods and bads */
title "KS test";
proc npar1way data=final_total_score plots=edfplot edf;
   class good_bad;
   var total_score;
run;

But the OP gets this KS from SAS/EM, a GUI component of SAS, like SAS/EG.

Adrián_cyc
Calcite | Level 5

The problem is that this model worked well for years. The issue started two months ago, and we had never seen it before. When the model was developed, the WOE and IV values for each variable seemed correct.

Ksharp
Super User

Is the IV of any variable greater than 0.5 or 0.6? If so, your model cannot be trusted, and you should drop those high-IV variables.

ballardw
Super User

@Adrián_cyc wrote:

The problem is that this model worked well for years. The issue started two months ago, and we had never seen it before. When the model was developed, the WOE and IV values for each variable seemed correct.


Quite often, when something has been working reasonably well and then stops, you might want to investigate something other than the model code.

 

Did the data collection methods change?

Did any variables change meaning but use the same values?

Did precision of an instrument change?

Did the number of records involved for any sort of grouping variable(s) change? Or change for just some grouping variable values?

 

Does anyone examine the logs of the step that brings the data into SAS? Are there warnings that weren't there before? Data conversion notes?
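A quick way to start on these questions is to look at the scored data month by month and see when the change appears. The sketch below is only an illustration under assumed names: a data set SCORED with one row per account, a SCORE_MONTH variable, the GOOD_BAD flag and TOTAL_SCORE. None of these names come from the original model.

```sas
/* Sketch only: SCORED, SCORE_MONTH, GOOD_BAD and TOTAL_SCORE are assumed names. */
proc sort data=scored out=scored_sorted;
   by score_month;
run;

/* Two-sample KS of goods vs. bads, computed separately for each month */
proc npar1way data=scored_sorted edf;
   by score_month;
   class good_bad;
   var total_score;
run;

/* Record counts, missing counts and score summaries per month and outcome */
proc means data=scored_sorted n nmiss mean std min max;
   class score_month good_bad;
   var total_score;
run;
```

A sudden shift in the counts, the missing counts or the score range in the same month the KS jumps would point toward the data rather than the model.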

 

 

It may help to actually post an example of the model code you are using. Someone familiar with the proc may be able to point out places
