Hello
Let's say that I run a logistic regression and then get the model results (model coefficients).
I would like to know how to calculate the relative contribution of each predictor (in %)?
data Train_Data;
  call streaminit(614325);
  do trt=1 to 3;
    do rep=1 to ceil(rand('uniform')*35);
      x1=rand('uniform')*10;
      x2=rand('normal')*2;
      x3=rand('normal')*3;
      e=rand('normal');
      y=2 + trt + x1 + 0*x2 + 1.4*x3 + e;
      output;
    end;
  end;
run;
/*** Build the regression model and get model coefficients ***/
proc glmselect data=Train_Data;
  class trt;
  model y=trt x1 x2 x3 / selection=stepwise;
  store out=OurModel;
run;
@Ronein wrote:
Hello
Let's say that I run a logistic regression and then get the model results (model coefficients).
I would like to know how to calculate the relative contribution of each predictor (in %)?

data Train_Data;
  call streaminit(614325);
  do trt=1 to 3;
    do rep=1 to ceil(rand('uniform')*35);
      x1=rand('uniform')*10;
      x2=rand('normal')*2;
      x3=rand('normal')*3;
      e=rand('normal');
      y=2 + trt + x1 + 0*x2 + 1.4*x3 + e;
      output;
    end;
  end;
run;

/*** Build the regression model and get model coefficients ***/
proc glmselect data=Train_Data;
  class trt;
  model y=trt x1 x2 x3 / selection=stepwise;
  store out=OurModel;
run;
Because of multicollinearity, there is really no such thing as a "relative contribution of predictor (in %)". It only makes sense in the case where your predictors are uncorrelated, and that only happens in designed experiments.
You can standardize the coefficients and compare them to one another, to determine which has the biggest impact and which has the smallest impact, and so on.
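As a sketch of that suggestion (in Python rather than SAS, purely for illustration), a standardized coefficient is beta_j × sd(x_j) / sd(y). The simulated data below mimics the Train_Data step, with the trt effect omitted for brevity; all names are stand-ins:

```python
import numpy as np

# Simulate data roughly like the Train_Data step (trt omitted for brevity)
rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(0, 10, n)
x2 = rng.normal(0, 2, n)
x3 = rng.normal(0, 3, n)
y = 2 + x1 + 0 * x2 + 1.4 * x3 + rng.normal(size=n)

# OLS fit with an intercept column
X = np.column_stack([x1, x2, x3])
A = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(A, y, rcond=None)[0]

# Standardized coefficient: beta_j * sd(x_j) / sd(y)
beta_std = beta[1:] * X.std(axis=0) / y.std()
for name, b in zip(["x1", "x2", "x3"], beta_std):
    print(f"{name}: standardized beta = {b:+.3f}")
```

Here x2 (true coefficient 0) comes out with a standardized beta near zero, while x1 and x3 dominate, which matches the data-generating model.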
Thank you.
In credit risk models it is common to report the contribution of predictors (in %).
Could you please show how to do it?
No such calculation exists. And if you could program it, it would still be meaningless.
Is there a term for the statistics that you're looking for?
Shapley values are one method, but this is typically done in ML/AI.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/casactml/casactml_explainmodel_details28.htm
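To make the Shapley idea concrete outside of CAS, here is a small Python sketch of the Shapley (LMG) decomposition of R² for a linear model: each predictor's contribution is its average marginal gain in R² over all subsets of the other predictors, and the contributions sum exactly to the full-model R², so they *can* be expressed in %. This is only an illustration, not the explainModel action from the link; the simulated data mirrors the Train_Data step (trt omitted):

```python
import itertools
from math import factorial
import numpy as np

def r2(cols_list, y):
    """In-sample R^2 of an OLS fit of y on the given columns (plus intercept)."""
    A = np.column_stack([np.ones(len(y))] + cols_list) if cols_list else np.ones((len(y), 1))
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return 1 - resid.var() / y.var()

# Simulated data (trt omitted for brevity)
rng = np.random.default_rng(1)
n = 400
cols = {"x1": rng.uniform(0, 10, n),
        "x2": rng.normal(0, 2, n),
        "x3": rng.normal(0, 3, n)}
y = 2 + cols["x1"] + 0 * cols["x2"] + 1.4 * cols["x3"] + rng.normal(size=n)
names = list(cols)
p = len(names)

# Shapley value of each predictor: weighted average marginal gain in R^2
# over all subsets S of the other predictors
shapley = {}
for j in names:
    others = [k for k in names if k != j]
    total = 0.0
    for r in range(len(others) + 1):
        for S in itertools.combinations(others, r):
            w = factorial(len(S)) * factorial(p - len(S) - 1) / factorial(p)
            gain = r2([cols[k] for k in S] + [cols[j]], y) - r2([cols[k] for k in S], y)
            total += w * gain
    shapley[j] = total

full = r2([cols[k] for k in names], y)  # the contributions sum to this exactly
for j in names:
    print(f"{j}: {100 * shapley[j] / full:.1f}% of R^2")
```

Because the Shapley values average over all orderings, they handle correlated predictors more gracefully than a single stepwise ordering, at the cost of 2^p model fits.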
@Ronein wrote:
Thank you.
In credit risk models it is common to report the contribution of predictors (in %).
Could you please show how to do it?
If you are talking about a credit risk scorecard model, this is called IV (Information Value).
Generally one keeps the variables whose IV is greater than 0.1.
Here is the code to calculate IV for a variable, but first you need to make a GROUP variable for it, whether it is character or numeric.

%let var=marital;
title "Variable: &var";
proc sql;
create table woe_&var as
  select &var as group,
         sum(good_bad='bad') as n_bad label='count of bad',
         sum(good_bad='good') as n_good label='count of good',
         sum(good_bad='bad')/(select sum(good_bad='bad') from have) as bad_dist format=percent7.2 label='share of bad',
         sum(good_bad='good')/(select sum(good_bad='good') from have) as good_dist format=percent7.2 label='share of good',
         log(calculated bad_dist/calculated good_dist) as woe
  from have
  group by &var
  order by woe;
select *, sum((bad_dist-good_dist)*woe) as iv from woe_&var;
quit;
title ' ';
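For readers more comfortable outside SAS, the same WOE/IV arithmetic can be sketched in a few lines of Python. This is only an illustration of the formula above; the tiny good/bad dataset is made up:

```python
import math

# Toy scored accounts: (group, outcome); "group" comes from binning a predictor
data = [
    ("single", "bad"), ("single", "good"), ("single", "bad"), ("single", "bad"),
    ("married", "good"), ("married", "good"), ("married", "bad"), ("married", "good"),
    ("divorced", "bad"), ("divorced", "good"),
]

n_bad = sum(1 for _, o in data if o == "bad")
n_good = sum(1 for _, o in data if o == "good")

iv = 0.0
for g in sorted({grp for grp, _ in data}):
    bad = sum(1 for grp, o in data if grp == g and o == "bad")
    good = sum(1 for grp, o in data if grp == g and o == "good")
    bad_dist, good_dist = bad / n_bad, good / n_good  # distributions across groups
    woe = math.log(bad_dist / good_dist)              # weight of evidence
    iv += (bad_dist - good_dist) * woe                # IV term for this group
    print(f"{g}: WOE = {woe:+.3f}")
print(f"IV = {iv:.3f}")
```

A commonly cited scorecard rule of thumb: IV below about 0.02 is unpredictive and above about 0.3 is strong, though cutoffs vary by shop.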
Again, I agree with you @Ksharp, that would be another way to determine (and rank) which variables are important. But @Ronein was specifically asking about the "relative contribution of each predictor (in %)" in the context of stepwise regression, and as far as I know, no method ranks the predictors as percentages in that setting; there is no such thing as a "relative contribution of each predictor (in %)" in stepwise regression.
1) Try a Partial Least Squares model.
data Train_Data;
  call streaminit(614325);
  do trt=1 to 3;
    do rep=1 to ceil(rand('uniform')*35);
      x1=rand('uniform')*10;
      x2=rand('normal')*2;
      x3=rand('normal')*3;
      e=rand('normal');
      y=2 + trt + x1 + 0*x2 + 1.4*x3 + e;
      output;
    end;
  end;
run;
/*** Build the regression model and get model coefficients ***/
proc pls data=Train_Data missing=em nfac=2 plot=(ParmProfiles VIP) details;
  class trt;
  model y=trt x1 x2 x3;
run;
2) You can use the estimated coefficients as weights to evaluate the importance of variables: the bigger abs(beta), the more important the variable.
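Point 2 can be sketched in Python: fit on z-scored predictors so the coefficients are on a common scale, then (if a % figure is really wanted) normalize the |beta| values to sum to 100. This is illustrative only; the data are simulated in the spirit of Train_Data (trt omitted):

```python
import numpy as np

# Simulated data in the spirit of Train_Data (trt omitted for brevity)
rng = np.random.default_rng(2)
n = 300
X = np.column_stack([rng.uniform(0, 10, n),
                     rng.normal(0, 2, n),
                     rng.normal(0, 3, n)])
y = 2 + X[:, 0] + 0 * X[:, 1] + 1.4 * X[:, 2] + rng.normal(size=n)

# Fit OLS on z-scored predictors so the |beta| values are comparable
Z = (X - X.mean(axis=0)) / X.std(axis=0)
A = np.column_stack([np.ones(n), Z])
beta = np.linalg.lstsq(A, y, rcond=None)[0][1:]

# Normalize |beta| so the "weights" sum to 100%
weights = np.abs(beta) / np.abs(beta).sum() * 100
for name, w in zip(["x1", "x2", "x3"], weights):
    print(f"{name}: {w:.1f}%")
```

Keep in mind the caveat raised earlier in the thread: with correlated predictors these percentages are a ranking device, not a true variance decomposition.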
@Ksharp wrote:
1) Try a Partial Least Squares model.

data Train_Data;
  call streaminit(614325);
  do trt=1 to 3;
    do rep=1 to ceil(rand('uniform')*35);
      x1=rand('uniform')*10;
      x2=rand('normal')*2;
      x3=rand('normal')*3;
      e=rand('normal');
      y=2 + trt + x1 + 0*x2 + 1.4*x3 + e;
      output;
    end;
  end;
run;

/*** Build the regression model and get model coefficients ***/
proc pls data=Train_Data missing=em nfac=2 plot=(ParmProfiles VIP) details;
  class trt;
  model y=trt x1 x2 x3;
run;

2) You can use the estimated coefficients as weights to evaluate the importance of variables: the bigger abs(beta), the more important the variable.
Yes, you can certainly get measures of how important a variable is in a regression. The above method is fine. I also mentioned above that you can standardize the coefficients to get a (different) measure of importance.
None of this relates to the original request of "how to calculate the relative contribution of each predictor (in %)": in a regression with correlated x-variables, that request is meaningless. There is no such thing as the relative contribution of each predictor in percent in this case.
My thoughts:
1. Decide on your target index (how good your model is); I tend to use the Gini (or Somers' D).
2. Re-estimate the model n times (n being the number of variables in the model), each time eliminating one variable. The difference in Gini between the original model and the new one (variable i eliminated) is the contribution of variable i.
3. To make the results nicer, you can rescale them so that they sum to 1 (100%).
It's not exactly a "relative contribution", but I think it's as close as possible 🙂
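Those three steps can be sketched in Python. For brevity this sketch substitutes in-sample R² for the Gini/Somers' D as the fit index and fits by OLS; the data and variable names are made up:

```python
import numpy as np

def r2(X, y):
    """In-sample R^2 of an OLS fit of y on the columns of X (plus intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return 1 - resid.var() / y.var()

# Simulated data (trt omitted for brevity)
rng = np.random.default_rng(3)
n = 400
X = np.column_stack([rng.uniform(0, 10, n),
                     rng.normal(0, 2, n),
                     rng.normal(0, 3, n)])
y = 2 + X[:, 0] + 0 * X[:, 1] + 1.4 * X[:, 2] + rng.normal(size=n)
names = ["x1", "x2", "x3"]

# Step 2: refit n times, dropping one variable each time; the drop in fit
# is that variable's contribution
full = r2(X, y)
drops = np.array([full - r2(np.delete(X, j, axis=1), y) for j in range(X.shape[1])])

# Step 3: rescale so the contributions sum to 100%
contrib = drops / drops.sum() * 100
for name, c in zip(names, contrib):
    print(f"{name}: {c:.1f}%")
```

One caveat with leave-one-out drops: when two predictors are highly correlated, dropping either one barely hurts the fit, so both can look unimportant even if the pair matters a lot together.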