Ujjawal
Quartz | Level 8

There is a whitepaper on selecting important variables in a linear regression model: http://support.sas.com/resources/papers/proceedings15/3242-2015.pdf

It explains that the Gini coefficient can be used to check linearity in the model, and that variables can be ranked by their Gini coefficient. A higher Gini coefficient suggests a higher potential for the variable to be useful in a linear regression. If a numeric variable ranks high on IV but low on the Gini coefficient, it usually suggests a lack of linearity.

My question: Is this the Gini coefficient derived from a decision tree, or is it related to the area under the curve (Gini = 2*AUC - 1)? What is the exact calculation of this Gini coefficient, and how can it be used to check linearity? I googled a lot, but all I found is that it is used in economics to measure inequality.

Any help would be highly appreciated. Thanks!

1 ACCEPTED SOLUTION

SteveDenham
Jade | Level 19

The Gini coefficient, or Somers' D statistic, gives a measure of concordance in logistic models. It is a rank-based statistic, where all results are paired (all observed with all predicted). In linear regression, it is a transformation of the Pearson correlation coefficient.
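
As a rough illustration of the last sentence: one well-known form of such a transformation is Greiner's relation, tau = (2/pi) * arcsin(rho), which holds when the two variables are bivariate normal. With continuous data and no ties, Somers' D coincides with Kendall's tau-a. The sketch below is not taken from the SGF paper; the function names, sample size, and seed are assumptions chosen for illustration, and it simply checks the relation on simulated data.

```python
import math
import random


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                score += 1      # concordant pair
            elif prod < 0:
                score -= 1      # discordant pair
    return score / (n * (n - 1) / 2)


def simulate_bivariate_normal(n, rho, seed=0):
    """Draw n pairs with standard normal margins and correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return xs, ys


rho = 0.6  # assumed population Pearson correlation
x, y = simulate_bivariate_normal(600, rho)
tau = kendall_tau_a(x, y)
expected = 2 / math.pi * math.asin(rho)  # Greiner's relation
print(f"sample tau-a = {tau:.3f}, (2/pi)*asin(rho) = {expected:.3f}")
```

The two printed numbers should be close but not identical, since the sample tau-a carries sampling error. Outside the bivariate normal case the relation need not hold.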

Here is a nice paper that covers a lot of what is buried in the SGF paper.

http://www.imperial.ac.uk/nhli/r.newson/miscdocs/intsomd1.pdf

Steve Denham.

8 REPLIES 8
Ksharp
Super User

Check PROC UNIVARIATE, which can calculate Gini.

Ujjawal
Quartz | Level 8

Thanks Xia. The Gini that PROC UNIVARIATE produces is a measure of statistical dispersion. Correct me if I am wrong. A low Gini coefficient indicates a more equal distribution, with 0 corresponding to complete equality. How can it be used to check linearity? How can it be used in the modeling process to select important linear variables?
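
To make the dispersion sense of "Gini" concrete, here is a minimal sketch of two related quantities: Gini's mean difference (the average absolute difference between distinct observations) and the inequality coefficient used in economics. The function names are hypothetical, and whether PROC UNIVARIATE reports exactly the first of these is an assumption that should be checked against the SAS documentation.

```python
def gini_mean_difference(values):
    """Average absolute difference over all ordered pairs of distinct observations."""
    n = len(values)
    total = sum(abs(a - b) for a in values for b in values)
    return total / (n * (n - 1))


def gini_coefficient(values):
    """Economics-style inequality coefficient.

    Assumes non-negative values with a positive mean.
    0 means complete equality; values near 1 mean high inequality.
    """
    n = len(values)
    mean = sum(values) / n
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mean)


equal = [5, 5, 5, 5]
unequal = [0, 0, 0, 4]
print(gini_coefficient(equal), gini_coefficient(unequal))
print(gini_mean_difference(equal), gini_mean_difference(unequal))
```

Both quantities describe the spread of a single variable, which is why they say nothing by themselves about the relationship between a predictor and a response.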

Ksharp
Super User

Sorry. I will leave it to Steve or lvm.

Ujjawal
Quartz | Level 8

Thanks Xia for looking into it.

Ujjawal
Quartz | Level 8

Thanks a ton, Steve, for your answer. I know Somers' D and the Gini coefficient: Gini coefficient = 2*AUC - 1, and AUC = (% concordant + 0.5 * % tied pairs) / 100. It would be great if you could share an article supporting "In linear regression, it is a transformation of the Pearson correlation coefficient." I am more interested in the application of the Gini coefficient in linear regression, and I did not find a single article to support it.

Am I correct?

In logistic regression, if the Gini coefficient is high, is the logit function monotonically related to the independent variable?

In linear regression, if the Gini coefficient is high, is y linearly related to the independent variable?
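
The relationship between concordance, AUC, and the Gini coefficient for a binary outcome can be sketched as follows. This is a minimal illustration with made-up data and a hypothetical function name, not the procedure from the whitepaper: every event is paired with every non-event, AUC is the share of concordant pairs plus half the share of tied pairs, and Gini = 2*AUC - 1, which equals Somers' D computed as (concordant - discordant) / total pairs.

```python
def concordance_summary(outcomes, scores):
    """Pair every event (1) with every non-event (0) and count pair types."""
    events = [s for o, s in zip(outcomes, scores) if o == 1]
    nonevents = [s for o, s in zip(outcomes, scores) if o == 0]
    concordant = discordant = tied = 0
    for e in events:
        for ne in nonevents:
            if e > ne:
                concordant += 1   # event scored higher than non-event
            elif e < ne:
                discordant += 1   # event scored lower than non-event
            else:
                tied += 1
    total = concordant + discordant + tied
    auc = (concordant + 0.5 * tied) / total
    return {
        "concordant": concordant,
        "discordant": discordant,
        "tied": tied,
        "auc": auc,
        "gini": 2 * auc - 1,
        "somers_d": (concordant - discordant) / total,
    }


# Made-up outcomes and predicted scores, for illustration only.
outcomes = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
result = concordance_summary(outcomes, scores)
print(result)
```

Because only the ordering of the scores matters, any monotone transformation of the scores gives the same Gini value.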

SteveDenham
Jade | Level 19

Regarding correctness of interpretation, that is the way I would interpret it.

The quote is from the Imperial College paper I linked to.

Steve Denham

Srikanthg
Calcite | Level 5
What is the best Gini cutoff value for selecting significant variables? I understand that if the Gini value is high, the variable is good at separating goods from bads.
