KS calculation in SAS Miner

03-07-2016 03:31 PM

Hello,

Has any of you ever tried to replicate the KS output of the Scorecard node in Miner? I am not asking about the theoretical answer; I know how to calculate KS. The problem is that when I try to replicate the KS values, I get something else for the validation data set (I have a partition node). I can replicate the TRAIN KS exactly, but not the VALIDATION KS: the VALIDATION KS that I calculate agrees with Miner only up to the second decimal place, not exactly as it does for the TRAIN data set.

I am replicating the KS with my own code, extracting the data in the SAS Miner environment, and I have also tried **PROC NPAR1WAY**.

**Any help is appreciated! I have auditors replicating my values, and we cannot validate the KS.**

03-07-2016 06:46 PM

Given the importance, you should consider contacting SAS Tech Support.

03-08-2016 02:13 AM

Good point! I am contacting them now.

03-08-2016 03:52 PM

I will look into it. I just need you to clarify whether it is the KS in the Fit Statistics table or the one in the KS table used for the KS Plot.

Thanks!

Wendy Czika

SAS Enterprise Miner R&D

03-09-2016 02:44 AM

Thanks for helping, Wendy.

The KS that we want to replicate is the KS in the Fit Statistics table. Let me explain more carefully what we are doing and what we already know about the KS calculation in Miner.

- Roughly speaking, my project contains a partition node (70% train, 30% test), then an Interactive Grouping node (IGN), then a scorecard node, then a Reject Inference (RI) node, then another IGN, and finally another scorecard node.

- In the same diagram, but in a separate flow, I have a new data set for OOT analysis. The structure is as above but without the RI part. One peculiarity of this flow is that in the last scorecard node we freeze the scorecard points to match the partial scores calculated in the scorecard node after the RI node in the flow above.

Therefore, we have 3 scorecard nodes whose KS we must replicate: 1) KGB, 2) AGB and 3) OOT.

1) KGB:

- For this scorecard, we have now managed to replicate exactly the same KS as in the Fit Statistics table. The 'thing' is that Miner fits the logistic regression on the train data and uses those coefficients to calculate KS in both Train and Test. Makes sense. We have replicated the KS using both the NPAR1WAY procedure and my own code, which calculates KS using queries and data steps.
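For reference, the unweighted calculation described for the KGB case can be sketched as follows. This is a minimal illustration in Python rather than SAS, and it is not Miner's implementation: the function name `ks_statistic` and the sample scores are hypothetical. It computes the largest absolute gap between the empirical score distributions of goods and bads, which is the quantity a two-sample KS test reports as its D statistic.

```python
from bisect import bisect_right


def ks_statistic(scores_good, scores_bad):
    """Largest absolute gap between the two empirical CDFs (unweighted)."""
    good = sorted(scores_good)
    bad = sorted(scores_bad)
    ks = 0.0
    # Evaluate both CDFs at every distinct score; ties are counted together.
    for s in sorted(set(good) | set(bad)):
        cdf_good = bisect_right(good, s) / len(good)
        cdf_bad = bisect_right(bad, s) / len(bad)
        ks = max(ks, abs(cdf_good - cdf_bad))
    return ks


# Hypothetical scores, for illustration only.
example_ks = ks_statistic([0.1, 0.2, 0.3, 0.4], [0.3, 0.4, 0.5, 0.6])
```

Because both distributions are evaluated only at distinct score values, tied scores are handled the same way for goods and bads.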

2) AGB

- Calculating this KS is more complicated because we need to take into account the _FREQ_ (or weight) calculated in the RI node. Hence we cannot use NPAR1WAY and must rely on my own code, which takes the _FREQ_ column into account. As I mentioned before, my own code replicates exactly the same KS for the KGB model, which is a good indication that it is bug free. However, when I try to replicate the KS for the AGB model, the result agrees with the one in the Fit Statistics table only up to the second decimal place.
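A frequency-weighted KS of the kind described for the AGB case could look like the sketch below. It is written in Python for illustration and is an assumption about one reasonable way to use a weight column, not a description of what the Scorecard node does internally. The function name `weighted_ks`, the coding of the target (1 = bad, 0 = good) and the sample values are all hypothetical.

```python
from collections import defaultdict


def weighted_ks(scores, targets, freqs):
    """KS where each row contributes its frequency weight to its class CDF.

    Assumes targets are coded 1 = bad, 0 = good (a labeling assumption).
    """
    good_w = defaultdict(float)
    bad_w = defaultdict(float)
    for s, t, w in zip(scores, targets, freqs):
        if t == 1:
            bad_w[s] += w
        else:
            good_w[s] += w
    total_good = sum(good_w.values())
    total_bad = sum(bad_w.values())

    cum_good = cum_bad = 0.0
    ks = 0.0
    # Walk the distinct scores in ascending order, accumulating weight.
    for s in sorted(set(good_w) | set(bad_w)):
        cum_good += good_w.get(s, 0.0)
        cum_bad += bad_w.get(s, 0.0)
        ks = max(ks, abs(cum_good / total_good - cum_bad / total_bad))
    return ks


# Hypothetical rows: score, target, fractional weight.
example = weighted_ks([1, 2, 2, 3], [0, 1, 0, 1], [1.0, 0.5, 3.0, 1.5])
```

With all weights equal to 1 this reduces to the ordinary two-sample KS. When comparing against another tool, candidates worth checking (as hypotheses only) are whether fractional weights are truncated or rounded, whether scores are rounded or binned before the CDFs are built, and how tied scores are handled.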

3) OOT

- Here we have tested two different approaches: 1) use the coefficients from the AGB model and calculate the KS in the OOT sample based on them (we think this is the correct way to do an OOT analysis), and 2) fit a new logistic regression on the OOT sample and use these new parameters to calculate the KS. Neither approach matches the KS in the Fit Statistics table. However, approach 2) gives a KS that is equal up to the second decimal place. For this test we are using both the NPAR1WAY procedure and my own code.
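Approach 1), scoring a new sample with frozen coefficients, can be sketched as below. The coefficient names and values (`woe_age`, `woe_income`, the intercept) are invented for illustration and do not come from the models in this thread. The sketch relies on the fact that KS depends only on how the scores rank the observations, so the linear predictor, the predicted probability and any strictly increasing rescaling of them give the same KS, provided no rounding introduces new ties.

```python
import math


def linear_predictor(row, intercept, coefs):
    """Apply fixed (previously fitted) coefficients to one observation."""
    return intercept + sum(coefs[name] * row[name] for name in coefs)


def predicted_probability(row, intercept, coefs):
    """Logistic transform of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(row, intercept, coefs)))


# Hypothetical frozen coefficients on WOE-coded inputs.
frozen_intercept = -2.0
frozen_coefs = {"woe_age": -0.8, "woe_income": -1.2}

oot_rows = [
    {"woe_age": 1.0, "woe_income": 0.5},
    {"woe_age": -0.3, "woe_income": 0.2},
    {"woe_age": 0.0, "woe_income": -0.7},
]
lp = [linear_predictor(r, frozen_intercept, frozen_coefs) for r in oot_rows]
prob = [predicted_probability(r, frozen_intercept, frozen_coefs) for r in oot_rows]

# Both score versions order the rows identically, so a KS built on either agrees.
same_order = (sorted(range(len(lp)), key=lp.__getitem__)
              == sorted(range(len(prob)), key=prob.__getitem__))
```

If the scores being compared are scorecard points rounded to integers, the rounding can merge observations that the unrounded probability keeps apart. That is one possible (unconfirmed) reason for small differences in KS between two calculations.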

Thanks for helping us with this 'little' issue. Regulators are so strict that they must be able to replicate exactly the same values.