Ronein
Meteorite | Level 14

Hello

Let's say that I run a logistic regression and then get the model results (the model coefficients).

I would like to know how to calculate the relative contribution of each predictor (in %)?

data Train_Data;
  call streaminit(614325);   /* fix the seed so the simulated data are reproducible */
  do trt=1 to 3;
    do rep=1 to ceil(rand('uniform')*35);
      x1=rand('uniform')*10;
      x2=rand('normal')*2;
      x3=rand('normal')*3;
      e=rand('normal');
      y=2 + trt + x1 + 0*x2 + 1.4*x3 + e;   /* x2 has no true effect on y */
      output;
    end;
  end;
run;

/*** Build the regression model and get the model coefficients ***/
proc glmselect data=Train_Data;
  class trt;
  model y=trt x1 x2 x3 / selection=stepwise;
  store out=OurModel;
run;

 

PaigeMiller
Diamond | Level 26


Because of multicollinearity, there is really no such thing as a "relative contribution of predictor (in %)". It only makes sense in the case where your predictors are uncorrelated, and that only happens in designed experiments.
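One way to see this, as a sketch using standard least-squares algebra (the symbols are introduced here for illustration and do not appear elsewhere in the thread): write $\beta_j^{*}$ for the standardized coefficient of predictor $x_j$ and $r_{y x_j}$ for its correlation with $y$ in a linear model with $p$ continuous predictors.

```latex
% General case: R^2 splits into one term per predictor
R^2 = \sum_{j=1}^{p} \beta_j^{*} \, r_{y x_j}

% Mutually uncorrelated predictors: \beta_j^{*} = r_{y x_j}, so
R^2 = \sum_{j=1}^{p} r_{y x_j}^{2},
\qquad
\text{share}_j = \frac{r_{y x_j}^{2}}{R^2}
```

With uncorrelated predictors every term is non-negative and the shares add up to 100%. With correlated predictors an individual term $\beta_j^{*} r_{y x_j}$ can be negative or larger than $R^2$, so the terms cannot be read as percentages.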

 

You can standardize the coefficients and compare them to one another, to determine which has the biggest impact and which has the smallest impact, and so on.
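A minimal sketch of that comparison, assuming the Train_Data set from the question (Train_Std is a name made up for this example). PROC STDIZE rescales the continuous predictors to mean 0 and standard deviation 1, so each coefficient becomes the change in y per one standard deviation of its predictor. The CLASS variable trt is left as is, and SELECTION=NONE is used so that every coefficient is reported.

```sas
/* Standardize the continuous predictors (mean 0, std 1). */
proc stdize data=Train_Data out=Train_Std method=std;
  var x1 x2 x3;
run;

/* Refit without selection so that every coefficient is reported. */
proc glmselect data=Train_Std;
  class trt;
  model y = trt x1 x2 x3 / selection=none;
run;
```

The absolute values of the x1, x2, and x3 estimates can then be ranked against one another. They are a ranking, not percentages.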

--
Paige Miller
Ronein
Meteorite | Level 14

Thank you.

In credit risk models it is common to report the contribution of predictors (in %).

Could you please show how to do it?

 

PaigeMiller
Diamond | Level 26

No such calculation exists. And if you could program it, it would still be meaningless.

--
Paige Miller
Reeza
Super User

Is there a term for the statistic that you're looking for?

 

Shapley values are one method, but this is typically done in ML/AI. 

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/casactml/casactml_explainmodel_details28.htm

 

https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainab...

 



Ksharp
Super User

If you are talking about a credit risk scorecard model, it is called IV (information value).

Generally you would keep the variables whose IV is greater than 0.1.

Here is code to calculate the IV for one variable. You first need to create a GROUP (binned) version of the variable, whether it is character or numeric.

 

%let var=marital;

title "Variable: &var";
proc sql;
  /* per-group counts, shares of all bads/goods, and weight of evidence (WOE) */
  create table woe_&var as
  select &var as group,
         sum(good_bad='bad')  as n_bad  label='Number of bad',
         sum(good_bad='good') as n_good label='Number of good',
         sum(good_bad='bad')/(select sum(good_bad='bad') from have)   as bad_dist  format=percent7.2 label='Share of bad',
         sum(good_bad='good')/(select sum(good_bad='good') from have) as good_dist format=percent7.2 label='Share of good',
         log(calculated bad_dist/calculated good_dist) as woe
  from have
  group by &var
  order by woe;

  /* IV = sum over groups of (bad_dist - good_dist)*woe, repeated on every row */
  select *, sum((bad_dist-good_dist)*woe) as iv
  from woe_&var;
quit;
title ' ';
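For a numeric variable, the GROUP variable mentioned above could be built by binning. A minimal sketch, assuming a numeric column named income in have (income, have_binned, and income_grp are hypothetical names) and an arbitrary choice of five roughly equal-sized bins:

```sas
/* Bin a numeric predictor into 5 rank-based groups (values 0 to 4). */
proc rank data=have groups=5 out=have_binned;
  var income;
  ranks income_grp;
run;
```

The IV code can then be run with %let var=income_grp; and with have_binned in place of have.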
PaigeMiller
Diamond | Level 26

Again, I agree with you @Ksharp; that would be another way to determine (and rank) which variables are important. But @Ronein was specifically asking about the "relative contribution of each predictor (in %)" in the context of stepwise regression. No method mentioned so far expresses the predictors' importance in % (I'm not even sure what he means by that), and as far as I know there is no such thing as a "relative contribution of each predictor (in %)" in that context.

--
Paige Miller
Ksharp
Super User
"(not even sure what he means by that)"
Yes, I agree with that.
In the credit risk field, the most widely used model is the scorecard (a.k.a. PROC LOGISTIC). Therefore I assume the OP is talking about IV, WOE, and so on.
Ksharp
Super User

1) Try a partial least squares (PLS) model.

data Train_Data;
  call streaminit(614325);
  do trt=1 to 3;
    do rep=1 to ceil(rand('uniform')*35);
      x1=rand('uniform')*10;
      x2=rand('normal')*2;
      x3=rand('normal')*3;
      e=rand('normal');
      y=2 + trt + x1 + 0*x2 + 1.4*x3 + e;
      output;
    end;
  end;
run;

/*** Fit the PLS model and request the parameter-profile and VIP plots ***/
proc pls data=Train_Data missing=em nfac=2 plots=(ParmProfiles VIP) details;
  class trt;
  model y=trt x1 x2 x3;
run;

[Image: Ksharp_0-1646476860584.png]

 

2) You can use the estimated coefficients as weights to evaluate the importance of the variables: the bigger abs(beta), the more important the variable.

PaigeMiller
Diamond | Level 26


Yes, you can certainly get measures of how important a variable is in a regression. The above method is fine. I also mentioned above that you can standardize the coefficients to get a (different) measure of importance.

 

None of this relates to the original request of "how to calculate the relative contribution of each predictor (in %)". In the context of regression with correlated x-variables, that request is meaningless: there is no such thing as the relative contribution of each predictor in percent in this case.

--
Paige Miller
Haim
Calcite | Level 5

My thoughts:

1. Decide what your target index is (how good your model is). I tend to use the Gini (or Somers' D).

2. Re-estimate the model n times (n is the number of variables in the model), each time eliminating one variable. The difference in Gini between the original model and the new one (with variable i eliminated) is the contribution of variable i.

3. To make the results "nicer", you can rescale them so that they sum to 1 (100%).

It's not exactly 'relative contribution', but I think it's as close as possible 🙂
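The three steps above could be sketched as a macro along the following lines. This is an illustration under assumptions, not a tested production routine: the data set have, the binary target good_bad with event level 'bad', and the predictors x1 x2 x3 are hypothetical names, and at least two predictors are expected. It relies on the Association ODS table of PROC LOGISTIC, reading Somers' D from its Label2/nValue2 columns; check that layout in your release.

```sas
/* Sketch only: HAVE, GOOD_BAD and x1 x2 x3 are hypothetical names. */
%macro drop_one(data=have, target=good_bad, event=bad, vars=x1 x2 x3);
  %local i j n keep dropped;
  %let n = %sysfunc(countw(&vars, %str( )));

  /* Start from an empty results table. */
  proc datasets lib=work nolist nowarn;
    delete _results;
  quit;

  /* i = 0 fits the full model; i = 1..n leaves out the i-th predictor. */
  %do i = 0 %to &n;
    %let keep = ;
    %do j = 1 %to &n;
      %if &j ne &i %then %let keep = &keep %scan(&vars, &j, %str( ));
    %end;
    %if &i = 0 %then %let dropped = _none_;
    %else %let dropped = %scan(&vars, &i, %str( ));

    /* Capture the association statistics without printing the output. */
    ods exclude all;
    ods output Association=_assoc;
    proc logistic data=&data;
      model &target(event="&event") = &keep;
    run;
    ods exclude none;

    data _one;
      length dropped $32;
      set _assoc;
      if index(Label2, 'Somers') > 0;   /* keep only the Somers D row */
      dropped  = "&dropped";
      somers_d = nValue2;
      keep dropped somers_d;
    run;

    proc append base=_results data=_one;
    run;
  %end;

  /* Loss in Somers D when a variable is left out, rescaled to sum to 1. */
  proc sql;
    create table contrib as
    select r.dropped, f.somers_d - r.somers_d as loss
    from _results as r, _results as f
    where f.dropped = '_none_' and r.dropped ne '_none_';

    select dropped, loss, loss / sum(loss) as share format=percent8.1
    from contrib;
  quit;
%mend drop_one;

%drop_one(data=have, target=good_bad, event=bad, vars=x1 x2 x3);
```

If leaving out a variable happens to improve Somers' D, its loss is negative and the shares no longer behave like percentages, which is one reason to treat the result as a rough ranking.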

 

Ksharp
Super User
For convenience, you could also try PROC GLMSELECT or PROC HPGENSELECT.
