Gini Coefficient - Variable Importance Measure

06-23-2015 02:37 PM

There is a white paper on selecting important variables for a linear regression model: http://support.sas.com/resources/papers/proceedings15/3242-2015.pdf .

It explains that the Gini coefficient can be used to check linearity in the model, and that variables can be ranked by their Gini coefficient. A higher Gini coefficient suggests a higher potential for the variable to be useful in a linear regression. If a numeric variable is high on IV rank but low on Gini coefficient, it usually suggests a lack of linearity.

My question: Is this the Gini coefficient derived from a decision tree, or is it related to the area under the curve (AUC), i.e. Gini = 2*AUC - 1? What is the exact calculation of this Gini coefficient, and how can it be used to check linearity? I googled a lot, but all I found is that it is used in economic theory to measure inequality.

Any help would be highly appreciated. Thanks!

Accepted Solutions

Solution

07-05-2017
07:31 AM


Posted in reply to Ujjawal

06-26-2015 01:35 PM

The Gini coefficient, or Somers' D statistic, gives a measure of concordance in logistic models. It is a rank-based statistic, where all results are paired (all observed with all predicted). In linear regression, it is a transformation of the Pearson correlation coefficient.

Here is a nice paper that covers a lot of what is buried in the SGF paper.

http://www.imperial.ac.uk/nhli/r.newson/miscdocs/intsomd1.pdf

Steve Denham.
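
To make the "all results are paired" idea concrete, here is a minimal Python sketch of Somers' D computed by brute force over pairs. The function name `somers_d` and the toy data are illustrative assumptions, not something taken from either paper, and the definition used (concordant minus discordant pairs, divided by the pairs not tied on the first variable) is the usual textbook one rather than a confirmed description of what the SGF paper computes.

```python
from itertools import combinations


def somers_d(x, y):
    """Somers' D of y with respect to x.

    (concordant - discordant) / (number of pairs not tied on x).
    Brute force over all pairs, so only suitable for small inputs.
    """
    concordant = discordant = untied_on_x = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue  # pairs tied on x are excluded from the denominator
        untied_on_x += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means tied on y: counts in the denominator only
    if untied_on_x == 0:
        raise ValueError("all pairs are tied on x")
    return (concordant - discordant) / untied_on_x


# Hypothetical binary outcome with predicted probabilities.
observed = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
d_binary = somers_d(observed, predicted)

# A monotone but non-linear relationship still scores 1.0,
# because only the ordering of the values matters.
x = [1, 2, 3, 4, 5]
y_squared = [v ** 2 for v in x]
d_monotone = somers_d(x, y_squared)

print(d_binary, d_monotone)
```

Because the statistic depends only on ranks, a perfectly monotone curve such as y = x^2 reaches the maximum value, so on its own it measures monotone association rather than linearity in the strict sense.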

All Replies


Posted in reply to Ujjawal

06-24-2015 09:14 AM

Check PROC UNIVARIATE, which can calculate Gini.


Posted in reply to Ksharp

06-24-2015 02:31 PM

Thanks, Xia. The Gini that PROC UNIVARIATE produces is a measure of statistical dispersion. Correct me if I am wrong. A low Gini coefficient indicates a more equal distribution, with 0 corresponding to complete equality. How can it be used to check linearity? How can it be used in the modeling process to select important linear variables?
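
For contrast with the concordance-based Gini, here is a minimal Python sketch of the dispersion sense described above: the inequality coefficient built from absolute differences between all pairs of values. The function name `gini_inequality` and the sample data are hypothetical, and this is not claimed to reproduce the exact figure PROC UNIVARIATE prints, which may be scaled differently.

```python
def gini_inequality(values):
    """Gini coefficient of inequality for non-negative values.

    Sum of |a - b| over all ordered pairs, divided by 2 * n^2 * mean.
    Returns 0 when every value is identical (complete equality).
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    # 2 * n^2 * mean simplifies to 2 * n * total.
    return abs_diff_sum / (2 * n * total)


equal = gini_inequality([5, 5, 5, 5])          # everyone holds the same amount
concentrated = gini_inequality([0, 0, 0, 10])  # one unit holds everything

print(equal, concentrated)
```

This quantity describes how one variable is spread out. It involves no response variable, which is why it does not by itself say anything about linearity between a predictor and y.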


Posted in reply to Ujjawal

06-25-2015 07:46 AM

Sorry. I will leave it to Steve or lvm.


Posted in reply to Ksharp

06-25-2015 05:01 PM

Thanks Xia for looking into it.


Posted in reply to SteveDenham

06-26-2015 04:55 PM

Thanks a ton, Steve, for your answer. I know Somers' D and the Gini coefficient: Gini coefficient = 2*AUC - 1, and AUC = % concordant + 0.5 * % tied pairs. It would be great if you could share an article supporting "In linear regression, it is a transformation of the Pearson correlation coefficient." I am more interested in the application of the Gini coefficient in linear regression. I did not find a single article to support it.

Am I correct?

In logistic regression, if the Gini coefficient is high, is the logit function monotonically related to the independent variable?

In linear regression, if the Gini coefficient is high, is y linearly related to the independent variable?
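
The two identities quoted in this post (AUC from the concordant and tied percentages, and Gini = 2*AUC - 1) can be checked on a small example. The following Python sketch uses a hypothetical helper `concordance_summary` and made-up scores; it pairs every event with every non-event, as a logistic-regression association table does.

```python
def concordance_summary(outcomes, scores):
    """Pair every event (1) with every non-event (0) and summarize.

    Returns the concordant and tied fractions, the AUC, and Gini = 2*AUC - 1.
    """
    events = [s for o, s in zip(outcomes, scores) if o == 1]
    non_events = [s for o, s in zip(outcomes, scores) if o == 0]
    total_pairs = len(events) * len(non_events)
    if total_pairs == 0:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(1 for e in events for n in non_events if e > n)
    tied = sum(1 for e in events for n in non_events if e == n)
    pct_concordant = concordant / total_pairs
    pct_tied = tied / total_pairs
    auc = pct_concordant + 0.5 * pct_tied
    return {
        "pct_concordant": pct_concordant,
        "pct_tied": pct_tied,
        "auc": auc,
        "gini": 2 * auc - 1,
    }


# Made-up data: two events, three non-events, one tied pair.
outcomes = [1, 1, 0, 0, 0]
scores = [0.9, 0.5, 0.5, 0.3, 0.7]
summary = concordance_summary(outcomes, scores)

print(summary)
```

Here 4 of the 6 pairs are concordant, 1 is tied and 1 is discordant, so the AUC is 0.75 and the Gini is 0.5. That matches (concordant - discordant) / pairs = (4 - 1) / 6, the Somers' D form of the same number.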


Posted in reply to Ujjawal

06-29-2015 08:51 AM

Regarding correctness of interpretation, that is the way I would interpret it.

The quote is from the Imperial College paper I linked to.

Steve Denham
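
A likely candidate for the "transformation of the Pearson correlation coefficient" is Greiner's relation, tau = (2/pi) * arcsin(rho). It holds for Kendall's tau when the two variables are bivariate normal, and Somers' D coincides with Kendall's tau-a when neither variable has ties. Treating this as what the quoted sentence means is an assumption. The function names in this Python sketch are illustrative only.

```python
import math


def somers_d_from_pearson(rho):
    """Greiner's relation: D = (2 / pi) * arcsin(rho).

    Assumes bivariate normal data with no ties.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return (2.0 / math.pi) * math.asin(rho)


def pearson_from_somers_d(d):
    """Inverse of Greiner's relation: rho = sin(pi * D / 2)."""
    if not -1.0 <= d <= 1.0:
        raise ValueError("d must lie in [-1, 1]")
    return math.sin(math.pi * d / 2.0)


d_half = somers_d_from_pearson(0.5)       # arcsin(0.5) = pi / 6, so D = 1 / 3
rho_back = pearson_from_somers_d(d_half)  # round trip back to 0.5

print(d_half, rho_back)
```

Under these assumptions a high Somers' D goes with a high Pearson correlation. Away from normality the two can disagree, since D only tracks the ordering of the values.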


Posted in reply to Ujjawal

07-13-2016 12:00 PM

What is the best Gini cutoff value for selecting significant variables? I understand that if the Gini value is high, the variable is good at separating goods and bads.