Hello everyone,
I am currently trying to determine whether two data sets (one of which is the model data set) are consistent with each other. To this end, I first performed PSI (Population Stability Index), SSI (Stability Statistic Index) and default rate analyses. To understand the picture properly, the Gini value should also be examined; however, the two data sets share only 69% of their model variables.
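For reference, PSI is commonly computed by binning the score (often into about ten bands defined on the reference population), then summing (actual% − expected%) × ln(actual% / expected%) over the bins. The same formula can be applied to the bins of an individual model variable. Below is a minimal Python sketch, assuming the bin counts for both populations are already available. The function name `psi` and the `eps` floor for empty bins are illustrative choices, not part of any SAS procedure.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected_counts: counts per bin in the reference (model) population, e.g. A
    actual_counts:   counts in the SAME bins for the comparison population, e.g. B
    eps:             floor on bin shares so an empty bin does not cause log(0)
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    total_e = sum(expected_counts)
    total_a = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pe = max(e / total_e, eps)  # expected share of this bin
        pa = max(a / total_a, eps)  # actual share of this bin
        value += (pa - pe) * math.log(pa / pe)
    return value

# Illustrative counts only (not real data): uniform reference vs. a skewed sample
print(round(psi([25, 25, 25, 25], [10, 20, 30, 40]), 4))
```

A commonly quoted rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift; these are industry conventions rather than formal statistical tests.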
Let's call these data sets populations. Here are more details:
I have two populations: population “A” (the model data set) and population “B”. I have a scoring code for population A, while population B contains only 69% of population A's model variables. I applied population A's scoring code to population B and then ran a logistic regression on the results in Enterprise Guide. Although the other analyses (PSI, SSI and default rate) all indicate that the populations are inconsistent, the Gini (Somers' D) comes out at 0.800 and c (area under the ROC curve) at 0.900.
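The two reported figures are linked: Somers' D equals 2c − 1, so 2 × 0.900 − 1 = 0.800. In SAS, PROC LOGISTIC (which Enterprise Guide's logistic regression task typically runs) reports both in its "Association of Predicted Probabilities and Observed Responses" table. The Python sketch below shows how c and Gini can be computed directly from scores and default flags, without refitting a regression. It assumes a higher score means higher default risk, and the helper names are hypothetical. It uses the O(n²) pairwise definition for clarity, so it suits small samples only.

```python
def auc_c_statistic(scores, defaults):
    """c statistic (area under the ROC curve) by pairwise comparison.

    scores:   model scores; assumed higher = riskier
    defaults: 1 for a defaulted (bad) account, 0 for a good one
    A tie between a bad and a good counts as half-concordant.
    """
    bads = [s for s, y in zip(scores, defaults) if y == 1]
    goods = [s for s, y in zip(scores, defaults) if y == 0]
    if not bads or not goods:
        raise ValueError("need at least one bad and one good")
    concordant = 0.0
    for b in bads:
        for g in goods:
            if b > g:
                concordant += 1.0
            elif b == g:
                concordant += 0.5
    return concordant / (len(bads) * len(goods))

def gini(scores, defaults):
    # Gini (accuracy ratio) = Somers' D = 2 * c - 1
    return 2.0 * auc_c_statistic(scores, defaults) - 1.0

# Toy example: 4 of the 6 bad/good pairs are ranked correctly, so c = 4/6
toy_scores = [0.9, 0.4, 0.6, 0.2, 0.5]
toy_defaults = [1, 1, 0, 0, 0]
print(auc_c_statistic(toy_scores, toy_defaults), gini(toy_scores, toy_defaults))
```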
Here are my questions about this case:
1. How can the other analyses come out inconsistent for these populations while the Gini and ROC values are so high? How is this possible?
2. Is it correct to apply the model data set's (population A) scoring code to the new data set (population B) in order to assess the consistency between the data sets and obtain the Gini value?
3. What other methods could achieve my aim? How should I decide whether the data sets are consistent with each other?
4. Are there other ways to compute the Gini value, or other statistics that check whether the data sets are consistent?
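A possible way to see why a high Gini and a poor PSI need not contradict each other: PSI measures where the scores fall relative to the reference bins, while Gini and c measure only how well the scores rank bads above goods within one population (and default rate measures the level of risk, not its ranking). The sketch below uses simulated, hypothetical data, not the poster's. It shifts every score in "B" by a constant, which leaves the ranking untouched, so Gini is identical while PSI becomes large. The helper names, bin edges and the +30 shift are all illustrative assumptions.

```python
import math
import random

def psi(expected, actual, eps=1e-6):
    # Population Stability Index over matching bins (eps guards empty bins)
    te, ta = sum(expected), sum(actual)
    total = 0.0
    for e, a in zip(expected, actual):
        pe, pa = max(e / te, eps), max(a / ta, eps)
        total += (pa - pe) * math.log(pa / pe)
    return total

def c_statistic(scores, defaults):
    # Pairwise c statistic: share of (bad, good) pairs where the bad scores higher
    bads = [s for s, y in zip(scores, defaults) if y == 1]
    goods = [s for s, y in zip(scores, defaults) if y == 0]
    pairs = 0.0
    for b in bads:
        for g in goods:
            pairs += 1.0 if b > g else (0.5 if b == g else 0.0)
    return pairs / (len(bads) * len(goods))

def bin_counts(scores, edges):
    # Count scores into bins defined by the reference population's edges
    counts = [0] * (len(edges) + 1)
    for s in scores:
        counts[sum(s > e for e in edges)] += 1
    return counts

random.seed(42)
n = 2000
# Hypothetical population A: integer scores 0..99, higher = riskier
scores_a = [random.randint(0, 99) for _ in range(n)]
# Assumed illustrative link between score and default probability
defaults_a = [1 if random.random() < s / 150 else 0 for s in scores_a]

# Hypothetical population B: same ranking, every score shifted up by 30 points
scores_b = [s + 30 for s in scores_a]
defaults_b = defaults_a

edges = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # bands defined on A's score range
psi_value = psi(bin_counts(scores_a, edges), bin_counts(scores_b, edges))
gini_a = 2 * c_statistic(scores_a, defaults_a) - 1
gini_b = 2 * c_statistic(scores_b, defaults_b) - 1
print(f"PSI A vs B: {psi_value:.2f}")
print(f"Gini A: {gini_a:.3f}  Gini B: {gini_b:.3f}")
```

In this toy setup the scorecard still discriminates equally well in both populations, yet the score distribution has clearly moved. A Gini computed on B therefore answers "does A's scorecard rank B's accounts well?", which is a different question from "is B distributed like A?".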
What I have:
- A population: model data set; scoring code available; full set of model variables.
- B population: new data set; no scoring code; 69% of A population's model variables.
Thank you,