
KS calculation in SAS Miner

Contributor
Posts: 21


Hello,

 

Has any of you ever tried to replicate the KS output of the Scorecard node in Miner? I am not asking about the theoretical answer; I know how to calculate KS. The problem is that when I try to replicate the KS values, I get something different for the validation data set (I have a partition node). I can replicate the TRAIN KS exactly, but not the VALIDATION KS: the VALIDATION KS that I calculate agrees with Miner only up to the second decimal place, whereas the TRAIN KS matches exactly.

 

I am replicating the KS with my own code on data extracted from the SAS Miner environment, and I have also tried PROC NPAR1WAY.

 

Any help is appreciated! I have auditors replicating my values, and we cannot validate the KS.

Super User
Posts: 19,770

Re: KS calculation in SAS Miner

Posted in reply to rogelio_mancisidor

Given the importance, you should consider contacting SAS Tech Support.

Contributor
Posts: 21

Re: KS calculation in SAS Miner

Good point! I am contacting them now :)

SAS Super FREQ
Posts: 306

Re: KS calculation in SAS Miner

Posted in reply to rogelio_mancisidor

I will look into it. I just need you to clarify whether it is the KS in the Fit Statistics table or the one in the KS table used in the KS Plot.

Thanks!

Wendy Czika

SAS Enterprise Miner R&D

Contributor
Posts: 21

Re: KS calculation in SAS Miner

Posted in reply to WendyCzika

Thanks for helping, Wendy :)

 

The KS that we want to replicate is the KS in the Fit Statistics table. Let me explain more carefully what we are doing and what we already know about the KS calculation in Miner.

 

- Roughly speaking, my project contains a partition node (70% train, 30% test), then an Interactive Grouping node (IGN), then a Scorecard node, then a Reject Inference (RI) node, then another IGN, and finally another Scorecard node.

 

- In the same diagram, but in a separate flow, I have a new data set for OOT analysis. The structure is as above but without the RI part. One peculiarity of this flow is that in the last Scorecard node we freeze the scorecard points to match the partial scores calculated in the Scorecard node that follows the RI node in the flow above.

 

Therefore, we have 3 Scorecard nodes whose KS we must replicate. These are 1) KGB, 2) AGB and 3) OOT.

 

1) KGB:

- For this scorecard, we have now managed to replicate exactly the same KS as in the Fit Statistics table. The 'thing' is that Miner fits the logistic regression on the train data and uses those coefficients to calculate the KS in both Train and Test. Makes sense. We have replicated the KS using both PROC NPAR1WAY and my own code, which calculates KS using queries and data steps.
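
For reference, a minimal Python sketch of the unweighted two-sample KS described above, that is, the largest gap between the cumulative score distributions of bads and goods. It is an illustration only, not the SAS code mentioned in this post; the function name and the toy data are hypothetical.

```python
from collections import Counter

def ks_statistic(scores, targets):
    """Two-sample KS: max |F_bad(s) - F_good(s)| over the distinct scores."""
    bad = Counter(s for s, t in zip(scores, targets) if t == 1)
    good = Counter(s for s, t in zip(scores, targets) if t == 0)
    n_bad, n_good = sum(bad.values()), sum(good.values())
    if n_bad == 0 or n_good == 0:
        raise ValueError("both target classes must be present")
    cum_bad = cum_good = 0
    ks = 0.0
    # Walk the distinct scores in ascending order. Rows with tied scores are
    # accumulated together before the gap is measured.
    for s in sorted(set(bad) | set(good)):
        cum_bad += bad[s]
        cum_good += good[s]
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks

# Toy data (hypothetical): target 1 = bad, 0 = good.
scores = [0.1, 0.2, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9]
targets = [0, 0, 1, 0, 0, 1, 1, 1]
print(ks_statistic(scores, targets))
```

The coefficients are fitted on Train only; the same scoring is then applied to Train and Test, and the function is called once per partition.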

 

2) AGB

- Calculating this KS is more complicated because we need to take into account the _FREQ_ (or weight) column calculated in the RI node. Hence we cannot use NPAR1WAY and have to use my own code, which takes the _FREQ_ column into account. As I mentioned before, my own code replicates exactly the same KS for the KGB model, which is a good indication that it is bug free. However, when I try to replicate the KS for the AGB model, my KS equals the one in the Fit Statistics table only up to the second decimal place.
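
A minimal Python sketch of a frequency-weighted KS, assuming that _FREQ_ acts as a per-row weight that may be fractional. This is an illustration under that assumption, not the Miner implementation and not the SAS code referred to above; the function name and the toy data are hypothetical.

```python
from collections import defaultdict

def weighted_ks(scores, targets, freqs):
    """KS in which row i counts freqs[i] times (weights may be fractional)."""
    bad = defaultdict(float)
    good = defaultdict(float)
    for s, t, w in zip(scores, targets, freqs):
        (bad if t == 1 else good)[s] += w
    w_bad, w_good = sum(bad.values()), sum(good.values())
    if w_bad <= 0 or w_good <= 0:
        raise ValueError("both target classes need positive total weight")
    cum_bad = cum_good = 0.0
    ks = 0.0
    # Same walk as the unweighted KS, but cumulating weights instead of counts.
    for s in sorted(set(bad) | set(good)):
        cum_bad += bad.get(s, 0.0)
        cum_good += good.get(s, 0.0)
        ks = max(ks, abs(cum_bad / w_bad - cum_good / w_good))
    return ks

# Toy data (hypothetical): the weights change the result.
scores = [0.2, 0.3, 0.5, 0.8]
targets = [0, 1, 0, 1]
print(weighted_ks(scores, targets, [1, 1, 1, 1]))        # all weights equal
print(weighted_ks(scores, targets, [1, 0.5, 1, 1.5]))    # fractional weights
```

With all frequencies equal to 1 this reduces to the unweighted KS. When comparing two implementations, one thing worth checking is whether both treat fractional frequencies the same way (used as-is versus rounded or truncated), since that alone can shift the result slightly.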

 

3) OOT

- Here we have tested two different approaches: 1) use the coefficients from the AGB model to calculate the KS on the OOT sample (we think this is the correct way to do OOT analysis), and 2) fit a new logistic regression on the OOT sample and use the new parameters to calculate the KS. Neither approach matches the KS in the Fit Statistics table. However, approach 2) gives the same KS up to the second decimal place. For this test we are using both PROC NPAR1WAY and my own code.
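
To make approach 1) concrete, here is a minimal Python sketch of scoring a new sample with frozen logistic-regression parameters, with no refitting. The variable names, the intercept and the coefficients are all hypothetical placeholders, not values from this project.

```python
import math

def score_with_fixed_coefficients(rows, intercept, coefs):
    """Predicted probabilities from a logistic model whose parameters are frozen."""
    probs = []
    for row in rows:
        logit = intercept + sum(coefs[name] * row[name] for name in coefs)
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs

# Hypothetical frozen parameters (in practice taken from the earlier model).
intercept = -1.5
coefs = {"woe_age": -0.8, "woe_income": -1.1}

# Hypothetical OOT rows with the same input variables.
oot_rows = [
    {"woe_age": -0.4, "woe_income": 0.3},
    {"woe_age": 0.2, "woe_income": -0.1},
    {"woe_age": 0.5, "woe_income": 0.6},
]
probs = score_with_fixed_coefficients(oot_rows, intercept, coefs)
print(probs)
```

Because the logistic transform is strictly increasing, the KS computed on the logit and on the probability is the same. Rounding scores, for example to integer scorecard points, can introduce ties and move the KS slightly.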

 

Thanks for helping us with this 'little' issue. The regulators are so strict that they must be able to replicate exactly the same values.
