03-07-2016 03:31 PM
Has anyone tried to replicate the KS output of the Scorecard node in Miner? I am not asking about the theory; I know how to calculate KS. The problem is that when I try to replicate the KS values, I get something different for the validation data set (I have a Partition node). I can replicate the TRAIN KS exactly, but not the VALIDATION KS: the value I calculate agrees with Miner's only up to the second decimal place, whereas the TRAIN value matches exactly.
I am replicating the KS with my own code, extracting the data in the SAS Miner environment, and I have also tried PROC NPAR1WAY.
Any help is appreciated! I have auditors replicating my values, and we cannot validate the KS.
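For reference, the kind of calculation being replicated can be sketched as follows. This is a minimal Python illustration, not the SAS code used in the project and not Miner's internal method; the function name `ks_statistic` and its arguments are hypothetical. It assumes a binary target (1 = bad, 0 = good) and takes KS as the largest absolute gap between the cumulative score distributions of the two groups, compared only after all records tied at a score have been counted.

```python
def ks_statistic(scores, targets):
    """Largest absolute gap between the cumulative score distributions
    of bads (target 1) and goods (target 0)."""
    pairs = sorted(zip(scores, targets))
    n_bad = sum(1 for _, t in pairs if t == 1)
    n_good = len(pairs) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("both target classes must be present")

    cum_bad = 0
    cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Count every record tied at this score before comparing,
        # so the gap is measured only at distinct score values.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks
```

How ties and rounded scores are grouped changes the result, so it is worth checking that both implementations being compared treat them the same way.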
03-08-2016 03:52 PM
I will look into it. Could you clarify whether you mean the KS in the Fit Statistics table or the one in the KS table used for the KS Plot?
SAS Enterprise Miner R&D
03-09-2016 02:44 AM
Thanks for helping, Wendy.
The KS that we want to replicate is the one in the Fit Statistics table. Let me explain more carefully what we are doing and what we already know about the KS calculation in Miner.
- Roughly speaking, my project contains a Partition node (70% train, 30% test), then an Interactive Grouping node (IGN), then a Scorecard node, then a Reject Inference (RI) node, then another IGN, and finally another Scorecard node.
- In the same diagram, but in a separate flow, I have a new data set for out-of-time (OOT) analysis. The structure is as above but without the RI part. One peculiarity of this flow is that in the last Scorecard node we freeze the scorecard points to match the partial scores calculated in the Scorecard node after the RI node in the flow above.
Therefore, we have three Scorecard nodes whose KS we must replicate: 1) KGB, 2) AGB, and 3) OOT.
- KGB: For this scorecard, we have now managed to replicate exactly the same KS as in the Fit Statistics table. The key point is that Miner fits the logistic regression on the train data and uses those coefficients to calculate KS for both Train and Test, which makes sense. We replicated the KS using both PROC NPAR1WAY and my own code, which calculates KS using queries and data steps.
- AGB: Calculating this KS is more complicated because we need to take into account the _FREQ_ (weight) variable calculated in the RI node. Hence we cannot use NPAR1WAY and must use my own code, which takes the _FREQ_ column into account. As I mentioned, my own code replicates exactly the same KS for the KGB model, which is a good indication that it is bug free. However, when I try to replicate the KS for the AGB model, it matches the value in the Fit Statistics table only up to the second decimal place.
- OOT: Here we have tested two different approaches: 1) use the coefficients from the AGB model to calculate the KS on the OOT sample (we think this is the correct way to do OOT analysis), and 2) fit a new logistic regression on the OOT sample and use the new parameters to calculate the KS. Neither approach matches the KS in the Fit Statistics table; however, approach 2) agrees up to the second decimal place. For this test we are using both PROC NPAR1WAY and my own code.
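To make the weighted calculation concrete, here is a minimal Python sketch of a KS that uses a frequency column such as _FREQ_. It is an illustration under stated assumptions, not the SAS code used in the project and not necessarily how Miner computes the Fit Statistics value: the function name `weighted_ks` and its arguments are hypothetical, the target is assumed binary (1 = bad, 0 = good), and each record contributes its frequency instead of a count of 1.

```python
def weighted_ks(scores, targets, freqs):
    """KS between bads (target 1) and goods (target 0), where each
    record contributes its frequency (possibly fractional)."""
    rows = sorted(zip(scores, targets, freqs))
    total_bad = sum(f for _, t, f in rows if t == 1)
    total_good = sum(f for _, t, f in rows if t == 0)
    if total_bad <= 0 or total_good <= 0:
        raise ValueError("both classes need positive total frequency")

    cum_bad = 0.0
    cum_good = 0.0
    ks = 0.0
    i = 0
    while i < len(rows):
        score = rows[i][0]
        # Accumulate the frequencies of all records tied at this score
        # before comparing the two cumulative distributions.
        while i < len(rows) and rows[i][0] == score:
            _, target, freq = rows[i]
            if target == 1:
                cum_bad += freq
            else:
                cum_good += freq
            i += 1
        ks = max(ks, abs(cum_bad / total_bad - cum_good / total_good))
    return ks
```

With all frequencies equal to 1 this reduces to the unweighted KS, which is a useful sanity check before comparing against the AGB value.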
Thanks for helping us with this 'little' issue. Regulators are so strict that they must be able to replicate exactly the same values.