Programming the statistical procedures from SAS

Gini Coefficient - Variable Importance Measure

Regular Contributor
Posts: 181

There is a white paper on selecting important variables for a linear regression model: http://support.sas.com/resources/papers/proceedings15/3242-2015.pdf

It explains that the Gini coefficient can be used to check linearity in the model, and that variables can be ranked by their Gini coefficient. A higher Gini coefficient suggests a higher potential for the variable to be useful in a linear regression. If a numeric variable ranks high on IV but low on Gini coefficient, it usually suggests a lack of linearity.

My question: is this the Gini coefficient derived from decision trees, or is it the one related to the area under the curve (Gini = 2*AUC - 1)? What is the exact calculation of this Gini coefficient, and how can it be used to check linearity? I googled a lot, but all I found is that it is used in economics to measure inequality.

Any help would be highly appreciated. Thanks!

Grand Advisor
Posts: 9,447

Re: Gini Coefficient - Variable Importance Measure

Check PROC UNIVARIATE, which can calculate Gini.

Regular Contributor
Posts: 181

Re: Gini Coefficient - Variable Importance Measure

Thanks Xia. The Gini that PROC UNIVARIATE produces is a measure of statistical dispersion. Correct me if I am wrong: a low Gini coefficient indicates a more equal distribution, with 0 corresponding to complete equality. How can it be used to check linearity? How can it be used in the modeling process to select important linear variables?
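
For readers following along, here is a minimal Python sketch of a dispersion-type Gini statistic. It assumes the statistic in question is Gini's mean difference (the average absolute difference over all pairs of observations); that is an assumption about what PROC UNIVARIATE reports, not something confirmed in this thread. The function name `gini_mean_difference` is made up for illustration.

```python
from itertools import combinations

def gini_mean_difference(x):
    """Average absolute difference over all distinct pairs of values.

    Assumption: this is the dispersion measure meant above. It is 0 when
    every value is identical and grows as the values spread out.
    """
    pairs = list(combinations(x, 2))
    if not pairs:
        raise ValueError("need at least two observations")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

equal_values = gini_mean_difference([5, 5, 5, 5])
spread_values = gini_mean_difference([1, 2, 3, 4])
print(equal_values, spread_values)
```

Note that this statistic describes a single variable. It says nothing about the relationship between a predictor and a response, which is why it does not answer the linearity question.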

Grand Advisor
Posts: 9,447

Re: Gini Coefficient - Variable Importance Measure

Sorry. I will leave it to Steve or lvm.

Regular Contributor
Posts: 181

Re: Gini Coefficient - Variable Importance Measure

Thanks Xia for looking into it.

Respected Advisor
Posts: 2,655

Re: Gini Coefficient - Variable Importance Measure

The Gini coefficient, or Somers' D statistic, gives a measure of concordance in logistic models. It is a rank-based statistic, where all results are paired (all observed with all predicted). In linear regression, it is a transformation of the Pearson correlation coefficient.

Here is a nice paper that covers a lot of what is buried in the SGF paper.

http://www.imperial.ac.uk/nhli/r.newson/miscdocs/intsomd1.pdf

Steve Denham

Regular Contributor
Posts: 181

Re: Gini Coefficient - Variable Importance Measure

Thanks a ton Steve for your answer. I know Somers' D and the Gini coefficient: Gini coefficient = 2*AUC - 1 and AUC = % concordant + 0.5 * % tied pairs. It would be great if you could share an article supporting the statement "In linear regression, it is a transformation of the Pearson correlation coefficient." I am more interested in the application of the Gini coefficient in linear regression. I did not find a single article to support it.

Am I correct?

In logistic regression, if the Gini coefficient is high, the logit function is monotonically related to the independent variable?

In linear regression, if the Gini coefficient is high, y is linearly related to the independent variable?
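
The pairwise definitions mentioned in this post (Gini = 2*AUC - 1, with AUC built from concordant and tied pairs) can be sketched in Python for a binary outcome. This is an illustrative sketch, not the calculation from the SAS paper; the function name `concordance_summary` and the toy data are made up.

```python
from itertools import product

def concordance_summary(y, score):
    """Compare every event (y == 1) with every non-event (y == 0).

    A pair is concordant when the event has the higher score,
    discordant when it has the lower score, and tied otherwise.
    """
    events = [s for label, s in zip(y, score) if label == 1]
    non_events = [s for label, s in zip(y, score) if label == 0]
    concordant = discordant = tied = 0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1
        elif e < n:
            discordant += 1
        else:
            tied += 1
    total = concordant + discordant + tied
    pct_c, pct_d, pct_t = concordant / total, discordant / total, tied / total
    auc = pct_c + 0.5 * pct_t
    return {
        "concordant": pct_c,
        "discordant": pct_d,
        "tied": pct_t,
        "auc": auc,
        "gini": 2 * auc - 1,          # Gini = 2*AUC - 1
        "somers_d": pct_c - pct_d,    # same value, written as C - D
    }

# Toy data: three events, three non-events, one tied pair (0.4 vs 0.4).
y = [1, 1, 1, 0, 0, 0]
score = [0.9, 0.7, 0.4, 0.4, 0.3, 0.1]
result = concordance_summary(y, score)
print(result)
```

Because the statistic depends only on the ordering of the scores, any monotone transformation of the predictor gives the same value. A high value therefore indicates a monotone relationship, not necessarily a linear one.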

Respected Advisor
Posts: 2,655

Re: Gini Coefficient - Variable Importance Measure

Regarding correctness of interpretation, that is the way I would interpret it.

The quote is from the Imperial College paper I linked to.

Steve Denham

User
Posts: 1

Re: Gini Coefficient - Variable Importance Measure

What is the best Gini cutoff value for selecting significant variables? I understand that if the Gini value is high, the variable is good at separating goods from bads.