determine the relative contribution of predictors in multiple regression model

Ronein · Posted 03-04-2022 09:33 AM

Hello

Let's say that I run a logistic regression and then I get the model results (model coefficients).

I would like to know how to calculate the relative contribution of each predictor (in %)?

data Train_Data;
   call streaminit(614325);
   do trt=1 to 3;
      do rep=1 to ceil(rand('uniform')*35);
         x1=rand('uniform')*10;
         x2=rand('normal')*2;
         x3=rand('normal')*3;
         e=rand('normal');
         y=2 + trt + x1 + 0*x2 + 1.4*x3 + e;
         output;
      end;
   end;
run;

/*** Build the regression model and get the model coefficients ***/
proc glmselect data=Train_Data;
   class trt;
   model y=trt x1 x2 x3 / selection=stepwise;
   store out=OurModel;
run;

PaigeMiller · Posted 03-04-2022 09:37 AM


Because of multicollinearity, there is really no such thing as a "relative contribution of predictor (in %)". It only makes sense in the case where your predictors are uncorrelated, and that only happens in designed experiments.

You can standardize the coefficients and compare them to one another, to determine which has the biggest impact and which has the smallest impact, and so on.
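
A minimal sketch of the standardization idea, assuming the simulated Train_Data set from the question. It uses the STB option of the MODEL statement in PROC GLMSELECT, with selection turned off so that every effect stays in the model. The standardized estimates can be ranked by absolute value, but they are not percentages and do not add up to 100%.

```sas
/* Sketch: standardized coefficients for the simulated Train_Data.      */
/* selection=none keeps all effects; stb adds the standardized          */
/* estimates to the parameter estimates table.                          */
proc glmselect data=Train_Data;
   class trt;
   model y = trt x1 x2 x3 / selection=none stb;
run;
```

Because trt is a CLASS variable, it appears as several dummy-variable rows, so its importance cannot be read from a single standardized coefficient.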

--
Paige Miller

Ronein · Posted 03-04-2022 09:40 AM

Thank you.

In credit risk models it is common to report the contribution of predictors (in %).

Could you please show how to do it?

PaigeMiller · Posted 03-04-2022 09:41 AM

No such calculation exists. And if you could program it, it would still be meaningless.

--
Paige Miller

Reeza · Posted 03-04-2022 01:09 PM

Is there a term for the statistics that you're looking for?

Shapley values are one method, but this is typically done in ML/AI.

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/casactml/casactml_explainmodel_details28.htm

https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainab...


Ksharp · Posted 03-06-2022 04:44 AM

If you are talking about a credit risk scorecard model, it is called IV (information value).

Generally you would keep the variables whose IV is greater than 0.1.

Here is code to calculate the IV for a variable. You first need to create a grouped (binned) version of the variable, whether it is character or numeric.

%let var=marital;

title "Variable: &var";
proc sql;
   /* WOE for each group of &var */
   create table woe_&var as
   select &var as group,
          sum(good_bad='bad') as n_bad label='Number of bad',
          sum(good_bad='good') as n_good label='Number of good',
          sum(good_bad='bad')/(select sum(good_bad='bad') from have) as bad_dist format=percent7.2 label='Share of bad',
          sum(good_bad='good')/(select sum(good_bad='good') from have) as good_dist format=percent7.2 label='Share of good',
          log(calculated bad_dist/calculated good_dist) as woe
   from have
   group by &var
   order by woe;

   /* IV: sum over all groups of (bad_dist - good_dist)*woe */
   select *, sum((bad_dist-good_dist)*woe) as iv
   from woe_&var;
quit;
title ' ';
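
Written out as formulas, the code above computes the following. The symbols are introduced here only for illustration: $b_g$ is the share of all "bad" cases that fall in group $g$ (bad_dist) and $d_g$ is the share of all "good" cases in group $g$ (good_dist).

```latex
\[
\mathrm{WOE}_g = \ln\!\left(\frac{b_g}{d_g}\right),
\qquad
\mathrm{IV} = \sum_{g} \left(b_g - d_g\right)\,\mathrm{WOE}_g
\]
```

Each term of the sum is non-negative, because $b_g - d_g$ and $\ln(b_g/d_g)$ always have the same sign. Swapping the roles of good and bad flips the sign of every WOE but leaves the IV unchanged. A group with no bad or no good cases makes the logarithm undefined, which is one reason the variable has to be grouped first.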

PaigeMiller · Posted 03-06-2022 05:56 AM

Again, I agree with you @Ksharp: that would be another way to determine (and rank) which variables are important. But @Ronein was specifically asking about the "relative contribution of each predictor (in %)" in the context of stepwise regression, and no method mentioned so far ranks the predictors in % (I'm not even sure what he means by that). As far as I know, there is no such thing as a "relative contribution of each predictor (in %)" in the context of stepwise regression.

--
Paige Miller

Ksharp · Posted 03-07-2022 07:11 AM

"(not even sure what he means by that)"
Yeah, I agree with that.
In the credit risk field, the most used model is the scorecard (a.k.a. PROC LOGISTIC). Therefore I assume the OP is talking about IV, WOE, and so on.

Ksharp · Posted 03-05-2022 05:42 AM

1) Try a partial least squares model.

data Train_Data;
   call streaminit(614325);
   do trt=1 to 3;
      do rep=1 to ceil(rand('uniform')*35);
         x1=rand('uniform')*10;
         x2=rand('normal')*2;
         x3=rand('normal')*3;
         e=rand('normal');
         y=2 + trt + x1 + 0*x2 + 1.4*x3 + e;
         output;
      end;
   end;
run;

/*** Fit the PLS model and plot the parameter profiles and VIP ***/
proc pls data=Train_Data  missing=em   nfac=2 plot=(ParmProfiles VIP) details;
   class trt;
   model y=trt x1 x2 x3;
run;

2) You can use the estimated coefficients as weights to evaluate the importance of the variables: the bigger abs(beta) is, the more important the variable is.

PaigeMiller · Posted 03-05-2022 07:18 AM


Yes, you can certainly get measures of how important a variable is in a regression. The above method is fine. I also mentioned above that you can standardize the coefficients to get a (different) measure of importance.

None of this relates to the original request of "how to calculate the relative contribution of each predictor (in %)". In the context of regression using correlated x-variables, this request is meaningless. There is no such thing as relative contribution of each predictor in percent in this case.

--
Paige Miller

Haim · Posted 09-16-2024 02:52 AM

My thoughts:

1. Decide what your target index is (how good your model is). I tend to use the Gini (or Somers' D).

2. Re-estimate the model n times (n is the number of variables in the model), each time eliminating one variable. The difference in Gini between the original model and the new one (with variable i eliminated) is the contribution of variable i.

3. To make the results "nicer" you can rescale them so that they sum to 1 (100%).

It's not exactly 'relative contribution', but I think it's as close as possible 🙂
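
The three steps above could be sketched roughly as follows. Everything named here is an assumption for illustration: a data set HAVE with a binary target GOOD_BAD (values 'good' and 'bad', as in the IV code earlier in the thread), numeric predictors x1 x2 x3, and the macro name itself. The Gini is read as Somers' D from the Association table that PROC LOGISTIC writes through ODS.

```sas
/* Hypothetical names: HAVE, GOOD_BAD ('good'/'bad') and x1 x2 x3 are    */
/* assumptions for illustration. Assumes at least two predictors.        */
%macro drop_one_gini(data=have, target=good_bad, event=bad, vars=x1 x2 x3);
   %local i j n v reduced;
   %let n = %sysfunc(countw(&vars, %str( )));

   ods exclude all;                    /* suppress printed output */

   /* Step 1: Somers' D (Gini) of the full model */
   ods output Association=_assoc;
   proc logistic data=&data;
      model &target(event="&event") = &vars;
   run;

   data _gini_full;
      set _assoc;
      if index(Label2, 'Somers') > 0;  /* row that holds Somers' D */
      gini_full = nValue2;
      keep gini_full;
   run;

   /* Clear results left over from an earlier call, if any */
   proc datasets lib=work nolist;
      delete _gini_drop;
   quit;

   /* Step 2: refit n times, leaving out one variable each time */
   %do i = 1 %to &n;
      %let v = %scan(&vars, &i, %str( ));
      %let reduced = ;
      %do j = 1 %to &n;
         %if &j ne &i %then %let reduced = &reduced %scan(&vars, &j, %str( ));
      %end;

      ods output Association=_assoc;
      proc logistic data=&data;
         model &target(event="&event") = &reduced;
      run;

      data _one;
         length variable $32;
         set _assoc;
         if index(Label2, 'Somers') > 0;
         variable = "&v";              /* the variable that was left out */
         gini_drop = nValue2;
         keep variable gini_drop;
      run;

      proc append base=_gini_drop data=_one;
      run;
   %end;

   ods exclude none;

   /* Step 3: contribution = loss in Gini, rescaled to sum to 1 */
   proc sql;
      create table gini_contrib as
      select d.variable,
             f.gini_full,
             d.gini_drop,
             f.gini_full - d.gini_drop as contribution,
             (f.gini_full - d.gini_drop) / sum(f.gini_full - d.gini_drop)
                as share format=percent8.1
      from _gini_drop as d, _gini_full as f;
   quit;

   proc print data=gini_contrib noobs;
   run;
%mend drop_one_gini;

%drop_one_gini(data=have, target=good_bad, event=bad, vars=x1 x2 x3);
```

With correlated predictors, dropping one variable can leave the Gini almost unchanged or even raise it slightly, so individual contributions may be close to zero or negative, and the rescaled shares then become hard to interpret.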

Ksharp · Posted 09-16-2024 03:21 AM

For convenience, you could also try PROC GLMSELECT or PROC HPGENSELECT.
