06-23-2015 02:37 PM
There is a whitepaper for selecting important variables in a linear regression model. The URL of the whitepaper is http://support.sas.com/resources/papers/proceedings15/3242-2015.pdf .
It explains that the Gini coefficient can be used to check linearity in the model, and that variables can be ranked by their Gini coefficient. A higher Gini coefficient suggests a higher potential for the variable to be useful in a linear regression. If a numeric variable ranks high on IV but low on Gini coefficient, it usually suggests a lack of linearity.
My question: is this the Gini coefficient derived from a decision tree, or is it related to the area under the curve (AUC), i.e. Gini = 2*AUC - 1? What is the exact calculation of this Gini coefficient, and how can it be used to check linearity? I googled a lot, but all I found is that it is used in economics to measure inequality.
Any help would be highly appreciated. Thanks!
06-24-2015 02:31 PM
Thanks, Xia. The Gini that PROC UNIVARIATE produces is a measure of statistical dispersion. Correct me if I am wrong. A low Gini coefficient indicates a more equal distribution, with 0 corresponding to complete equality. How can it be used to check linearity? How can it be used in the modeling process to select important linear variables?
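To make the dispersion reading above concrete, here is a minimal Python sketch of the inequality-style Gini coefficient (mean absolute difference divided by twice the mean). The function name `gini_coefficient` and the sample data are made up for illustration, and this is not claimed to reproduce the exact statistic or scaling that PROC UNIVARIATE reports.

```python
def gini_coefficient(values):
    """Inequality-style Gini: mean absolute difference / (2 * mean).

    Assumes non-negative values; returns 0.0 when every value is zero.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    # Sum |a - b| over all ordered pairs (including a paired with itself).
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2 * n * n * mean)


# Hypothetical data: one perfectly equal sample, one highly unequal sample.
equal = [10, 10, 10, 10]
unequal = [0, 0, 0, 40]

gini_equal = gini_coefficient(equal)      # complete equality
gini_unequal = gini_coefficient(unequal)  # one observation holds everything
print(gini_equal, gini_unequal)
```

The equal sample gives 0 and the unequal sample gives a value close to 1, which matches the "0 corresponds to complete equality" description. Note that this quantity is computed from a single variable, so on its own it says nothing about the relationship between a predictor and a response.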
06-26-2015 01:35 PM
The Gini coefficient, or Somers' D statistic, gives a measure of concordance in logistic models. It is a rank-based statistic, where all results are paired (all observed with all predicted). In linear regression, it is a transformation of the Pearson correlation coefficient.
Here is a nice paper that covers a lot of what is buried in the SGF paper.
06-26-2015 04:55 PM
Thanks a ton, Steve, for your answer. I know Somers' D and the Gini coefficient: Gini coefficient = 2*AUC - 1, and AUC = %Concordant + 0.5 * %Tied pairs. It would be great if you could share an article supporting the statement "In linear regression, it is a transformation of the Pearson correlation coefficient." I am more interested in the application of the Gini coefficient in linear regression, and I did not find a single article to support it.
Am I correct?
In logistic regression, if the Gini coefficient is high, is the logit function monotonically related to the independent variable?
In linear regression, if the Gini coefficient is high, is y linearly related to the independent variable?
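For the logistic case, the pairing of events with non-events and the identity Gini = 2*AUC - 1 quoted earlier in the thread can be sketched in Python as follows. The function name `concordance_stats` and the toy data are assumptions for illustration only; this is a brute-force sketch, not how PROC LOGISTIC computes its association statistics internally.

```python
def concordance_stats(y, scores):
    """Pair every event (y == 1) with every non-event (y == 0).

    Returns (auc, gini, somers_d), where
      auc      = share concordant + 0.5 * share tied
      gini     = 2 * auc - 1
      somers_d = share concordant - share discordant
    """
    events = [s for label, s in zip(y, scores) if label == 1]
    non_events = [s for label, s in zip(y, scores) if label == 0]
    total = len(events) * len(non_events)
    if total == 0:
        raise ValueError("need at least one event and one non-event")

    concordant = discordant = tied = 0
    for e in events:
        for ne in non_events:
            if e > ne:
                concordant += 1   # event scored higher than non-event
            elif e < ne:
                discordant += 1   # event scored lower than non-event
            else:
                tied += 1

    auc = (concordant + 0.5 * tied) / total
    gini = 2 * auc - 1
    somers_d = (concordant - discordant) / total
    return auc, gini, somers_d


# Hypothetical outcomes and predicted probabilities.
y = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.6, 0.3, 0.2]

auc, gini, somers_d = concordance_stats(y, scores)
print(auc, gini, somers_d)
```

Because only the ordering of the scores enters the calculation, any strictly increasing transformation of the predictor gives the same Gini. A high value therefore indicates a monotonic association between the predictor and the outcome rather than a specifically linear one.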
07-13-2016 12:00 PM