rogelio_mancisidor
Calcite | Level 5

Hello,

I am wondering whether any of you have ever tried to replicate the KS output of the Scorecard node in Miner. I am not asking about the theoretical answer; I know how to calculate KS. The problem is that when I try to replicate the KS values, I get something different for the validation data set (I have a Partition node). I can replicate the TRAIN KS, but not the VALIDATION KS: the VALIDATION KS that I calculate matches Miner only up to the second decimal place, whereas for the TRAIN data set it matches exactly.

I am replicating the KS with my own code, using data extracted in the SAS Miner environment, and I have also tried PROC NPAR1WAY.
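For readers following along, a minimal Python sketch of the quantity being replicated may help: the two-sample KS, i.e. the largest absolute gap between the empirical cumulative distributions of the scores for bads and for goods. It assumes the target is coded 1 = bad and 0 = good, and it measures the gap only after all rows tied at a score have been counted. This illustrates the standard statistic, not Miner's internal code; other tie or binning conventions can shift the result slightly.

```python
def ks_statistic(scores, targets):
    """Two-sample KS: the maximum absolute gap between the empirical CDFs of
    the bads (target == 1) and the goods (target == 0), checked at every
    distinct score value."""
    n_bad = sum(1 for t in targets if t == 1)
    n_good = len(targets) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one good and one bad")
    # Sort by score so cumulative counts can be accumulated in one pass.
    pairs = sorted(zip(scores, targets))
    cum_bad = cum_good = 0
    ks, i = 0.0, 0
    while i < len(pairs):
        s = pairs[i][0]
        # Absorb every observation tied at score s before measuring the gap,
        # so a tie never contributes a partial step.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks


print(ks_statistic([1, 2, 3, 4], [1, 0, 1, 0]))  # 0.5
```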

Any help is appreciated! I have auditors replicating my values, and we cannot validate the KS.

Reeza
Super User

Given the importance, you should consider contacting SAS Tech Support.

rogelio_mancisidor
Calcite | Level 5

Good point! I am contacting them now 🙂

WendyCzika
SAS Employee

I will look into it. I just need you to clarify: is it the KS in the Fit Statistics table, or the one in the KS table used for the KS Plot?

Thanks!

Wendy Czika

SAS Enterprise Miner R&D

rogelio_mancisidor
Calcite | Level 5

Thanks for helping, Wendy 🙂

The KS that we want to replicate is the one in the Fit Statistics table. Let me explain more carefully what we are doing and what we already know about the KS calculation in Miner.

- Roughly speaking, my project contains a Partition node (70% train, 30% test), followed by an IGN (Interactive Grouping node), a Scorecard node, an RI (Reject Inference) node, another IGN, and finally another Scorecard node.

- In the same diagram, but in a separate flow, I have a new data set for OOT analysis. The structure is as above but without the RI part. One peculiarity of this flow is that in the last Scorecard node we freeze the scorecard points to match the partial scores calculated in the Scorecard node after the RI node in the flow above.
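As a rough illustration of what freezing the points means for the OOT flow, here is a minimal Python sketch. It assumes each OOT record already carries the grouped attribute (bin label) for every characteristic; the characteristic names, bin labels and point values are hypothetical placeholders, not the project's scorecard. The total score is just the sum of the frozen points, so the OOT scores do not change with any refit.

```python
# Hypothetical frozen points table: (characteristic, attribute) -> points.
# In the real flow these would be copied from the Scorecard node after RI.
FROZEN_POINTS = {
    ("age", "<30"): 10,
    ("age", "30-50"): 25,
    ("age", ">50"): 35,
    ("income", "low"): 5,
    ("income", "high"): 30,
}


def total_score(record, points=FROZEN_POINTS):
    """Sum the frozen points for every characteristic in the record.

    `record` maps characteristic name -> attribute (bin label).
    A KeyError means the record has an attribute the frozen table lacks.
    """
    return sum(points[(char, attr)] for char, attr in record.items())


oot_records = [
    {"age": "<30", "income": "low"},
    {"age": "30-50", "income": "high"},
    {"age": ">50", "income": "high"},
]
print([total_score(r) for r in oot_records])  # [15, 55, 65]
```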

Therefore, we have three Scorecard nodes whose KS values we must replicate: 1) KGB (known good/bad), 2) AGB (all good/bad) and 3) OOT (out-of-time).

1) KGB:

- For this scorecard, we have now managed to replicate exactly the same KS as in the Fit Statistics table. The key point is that Miner fits the logistic regression on the train data and uses those coefficients to calculate the KS on both Train and Test, which makes sense. We have replicated the KS using both PROC NPAR1WAY and my own code, which calculates the KS using queries and DATA steps.
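The KGB logic described above (fit on TRAIN, then reuse the same coefficients to score both partitions) can be sketched in Python as follows. The coefficients and data are made-up placeholders standing in for the TRAIN fit, and `score` is a hypothetical helper, not a Miner function.

```python
import math


def ks_statistic(scores, targets):
    """Max gap between the empirical CDFs of bads (1) and goods (0)."""
    n_bad = sum(targets)
    n_good = len(targets) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one good and one bad")
    pairs = sorted(zip(scores, targets))
    cum_bad = cum_good = 0
    ks, i = 0.0, 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:  # absorb ties at s
            cum_bad += pairs[i][1]
            cum_good += 1 - pairs[i][1]
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks


def score(intercept, coefs, rows):
    """Predicted probability of bad using fixed (TRAIN-fitted) coefficients."""
    return [1.0 / (1.0 + math.exp(-(intercept + sum(b * v for b, v in zip(coefs, x)))))
            for x in rows]


# Hypothetical coefficients, standing in for the ones fitted on TRAIN only.
INTERCEPT, COEFS = -1.0, [2.0]

train_X, train_y = [[0.1], [0.35], [0.4], [0.8]], [0, 1, 0, 1]
test_X, test_y = [[0.2], [0.9]], [0, 1]

# Same coefficients applied to both partitions; only the data differ.
ks_train = ks_statistic(score(INTERCEPT, COEFS, train_X), train_y)
ks_test = ks_statistic(score(INTERCEPT, COEFS, test_X), test_y)
print(ks_train, ks_test)  # 0.5 1.0
```

Because the KS depends only on the ordering of the scores, computing it on the linear predictor (logit) instead of the probability gives the same value.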

2) AGB

- Calculating this KS is more complicated because we need to take into account the _FREQ_ (weight) column calculated in the RI node. Hence we cannot use NPAR1WAY, so we use my own code, which takes the _FREQ_ column into account. As I mentioned before, my own code replicates exactly the same KS for the KGB model, which is a good indication that it is bug-free. However, when I try to replicate the KS for the AGB model, my KS matches the one in the Fit Statistics table only up to the second decimal place.
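A weighted KS can be sketched by letting each row add its weight, rather than 1, to the cumulative good or bad total. This minimal Python sketch assumes the _FREQ_ column from the RI node acts as exactly such a per-row weight and that 1 = bad; whether Miner applies the weights this way internally is an assumption, not something confirmed in this thread.

```python
def weighted_ks(scores, targets, weights):
    """KS with per-row weights.

    Assumption: `weights` plays the role of _FREQ_ (row multiplicities,
    possibly fractional). Each row contributes its weight, rather than 1,
    to the cumulative good or bad total.
    """
    w_bad = sum(w for t, w in zip(targets, weights) if t == 1)
    w_good = sum(w for t, w in zip(targets, weights) if t == 0)
    if w_bad <= 0 or w_good <= 0:
        raise ValueError("need positive total weight for goods and bads")
    rows = sorted(zip(scores, targets, weights))
    cum_bad = cum_good = 0.0
    ks, i = 0.0, 0
    while i < len(rows):
        s = rows[i][0]
        # Absorb all rows tied at score s before measuring the gap.
        while i < len(rows) and rows[i][0] == s:
            _, t, w = rows[i]
            if t == 1:
                cum_bad += w
            else:
                cum_good += w
            i += 1
        ks = max(ks, abs(cum_bad / w_bad - cum_good / w_good))
    return ks


print(round(weighted_ks([1, 2, 3], [1, 0, 1], [1.0, 1.0, 2.0]), 4))  # 0.6667
```

With all weights equal to 1 it reduces to the ordinary unweighted KS, which is a useful sanity check against the KGB results.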

3) OOT

- Here we have tested two different approaches: 1) use the coefficients from the AGB model to calculate the KS on the OOT sample (we think this is the correct way to do OOT analysis), and 2) fit a new logistic regression on the OOT sample and use the new parameters to calculate the KS. Neither approach matches the KS in the Fit Statistics table, although approach 2) agrees up to the second decimal place. For this test we are using both PROC NPAR1WAY and my own code.
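The two OOT approaches can be contrasted with a small Python sketch. Approach 1 scores the OOT rows with frozen coefficients; approach 2 refits on the OOT rows. The AGB coefficients and data are hypothetical, and `fit_logistic` is a plain gradient-ascent stand-in, not the estimation routine Miner uses, so its coefficients only approximate a proper maximum-likelihood fit.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ks_statistic(scores, targets):
    """Max gap between the empirical CDFs of bads (1) and goods (0)."""
    n_bad = sum(targets)
    n_good = len(targets) - n_bad
    pairs = sorted(zip(scores, targets))
    cum_bad = cum_good = 0
    ks, i = 0.0, 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:  # absorb ties at s
            cum_bad += pairs[i][1]
            cum_good += 1 - pairs[i][1]
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks


def predict(b0, b, rows):
    return [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x))) for x in rows]


def fit_logistic(rows, targets, lr=0.5, epochs=5000):
    """Batch gradient ascent on the mean log-likelihood, no regularisation.
    A stand-in for a proper fitting routine, used only for illustration."""
    p, n = len(rows[0]), len(rows)
    b0, b = 0.0, [0.0] * p
    for _ in range(epochs):
        errs = [y - pr for y, pr in zip(targets, predict(b0, b, rows))]
        b0 += lr * sum(errs) / n
        b = [bj + lr * sum(e * x[j] for e, x in zip(errs, rows)) / n
             for j, bj in enumerate(b)]
    return b0, b


# Hypothetical OOT sample: two predictors, 1 = bad.
oot_X = [[0, 1], [1, 0], [1, 1], [2, 0], [0, 0], [2, 1], [1, 1], [0, 0]]
oot_y = [0, 0, 1, 1, 0, 1, 0, 1]

# Approach 1: frozen AGB coefficients (hypothetical values).
AGB_B0, AGB_B = -1.0, [0.5, 1.0]
ks_frozen = ks_statistic(predict(AGB_B0, AGB_B, oot_X), oot_y)

# Approach 2: refit the logistic regression on the OOT sample itself.
b0, b = fit_logistic(oot_X, oot_y)
ks_refit = ks_statistic(predict(b0, b, oot_X), oot_y)

print(ks_frozen, round(ks_refit, 4))  # first value is 0.25 for this toy data
```

Since the KS depends only on how the scores rank the rows, the two approaches can give different KS values only when the refitted coefficients reorder the OOT observations.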

Thanks for helping us with this 'little' issue. Regulators are so strict that they must be able to replicate exactly the same values.
