ddddddddd1223
Calcite | Level 5

Hi everyone,

I am stuck trying to understand how the Interactive Binning node in SAS Enterprise Miner works.

As far as I know, the Gini index is a measure of impurity, so the lower the value, the better. When I read about the Gini cutoff in the Miner client, I was surprised that the software rejects variables with a Gini statistic lower than the chosen cutoff (I expected it to reject the ones with higher values). I checked the syntax, and it turns out that the definition of the Gini statistic is different from the one I know, e.g. the one used in decision tree training. According to this definition, you also use the product of the number of events in the group you are considering and the number of non-events in the previous groups. I can't see this product in the common definition of the Gini index. Finally, this formula gives a value close to one if the target variable has the same value within each group (while the classic one, if I am not wrong, gives a value close to 0 under the same condition).
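To make the contrast concrete, here is a minimal Python sketch of the two quantities. It rests on an assumption: that the binning statistic is the rank-based Gini coefficient (Somers' D, equal to 2·AUC − 1), computed from groups sorted by descending event rate with the events-times-previous-non-events product described above. The function names and the exact form of the second formula are illustrative and not taken from SAS code, and SAS may scale the result differently (for example to 0-100).

```python
def gini_impurity(events, nonevents):
    """Size-weighted Gini impurity over groups (decision-tree sense).

    0 means every group is pure; 0.5 is the maximum for a binary target.
    """
    total = sum(events) + sum(nonevents)
    impurity = 0.0
    for e, n in zip(events, nonevents):
        size = e + n
        if size == 0:
            continue  # skip empty groups
        p = e / size  # event rate inside the group
        impurity += (size / total) * 2 * p * (1 - p)
    return impurity


def gini_statistic(events, nonevents):
    """Rank-based Gini (ASSUMED form, not copied from SAS).

    Groups are sorted by descending event rate. 'cross' pairs each group's
    events with the non-events of all previous groups; 'ties' pairs events
    and non-events inside the same group.
    """
    groups = [(e, n) for e, n in zip(events, nonevents) if e + n > 0]
    groups.sort(key=lambda g: g[0] / (g[0] + g[1]), reverse=True)

    total_events = sum(e for e, _ in groups)
    total_nonevents = sum(n for _, n in groups)

    cross = 0
    ties = 0
    nonevents_before = 0  # cumulative non-events in previous groups
    for e, n in groups:
        cross += e * nonevents_before
        ties += e * n
        nonevents_before += n

    return 1 - (2 * cross + ties) / (total_events * total_nonevents)


# Perfectly separating variable: each group holds only one target value.
pure_events, pure_nonevents = [50, 0], [0, 50]
# Uninformative variable: same event rate in every group.
flat_events, flat_nonevents = [25, 25], [25, 25]

print(gini_impurity(pure_events, pure_nonevents), gini_statistic(pure_events, pure_nonevents))
print(gini_impurity(flat_events, flat_nonevents), gini_statistic(flat_events, flat_nonevents))
```

Under this assumption the two measures point in opposite directions: pure groups give an impurity of 0 and a statistic of 1, while groups with identical event rates give an impurity of 0.5 and a statistic of 0. That would be consistent with rejecting variables whose statistic falls below the cutoff.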

 

I cannot find any reference for this form of the Gini index. Can anyone help me?

Thanks in advance. 

3 REPLIES
ddddddddd1223
Calcite | Level 5

Yes, I saw it, but I can find just the formula there, which doesn't seem to me to be the standard one (for example, the formula described on the Wikipedia page). That's why I was looking for references on the statistical method. I suppose this binning algorithm is based on some papers.

Dan19
Calcite | Level 5

Hello,
I have the same question: the Gini test provided is not the standard one. Is there any reference, such as an article, that explains how that formula was derived?

