MariaD
Barite | Level 11

Hi folks,

We're using PROC LOGISTIC with the cloglog link. Our variables are mostly categorical. Suppose we had the following results:

 

Screen Shot 2022-02-01 at 11.45.48.png

 

What would be the best way to illustrate the variable importance? Should we use a scale based on the biggest Wald Chi-Square, as shown below?

 

Screen Shot 2022-02-01 at 11.45.53.png

 

Or create a scale based on the sum of the Wald Chi-Square values?

 

Screen Shot 2022-02-01 at 11.45.58.png

 

Thanks,

10 REPLIES 10
PaigeMiller
Diamond | Level 26

There may be some standard way of computing variable importance for CLOGLOG link that I am not aware of.

 

However, in my mind, the importance of a variable is given by the regression coefficients and not the Wald chi-square value. Variables with large regression coefficients (either large positive or large negative) are the important variables. (In some cases you may get large regression coefficients on variables that are not statistically significant, and then it's up to you whether or not you consider these to have high importance; I do not.)

--
Paige Miller
MariaD
Barite | Level 11

Thanks @PaigeMiller. Normally, we use regression coefficients, but in this case all the variables included are categorical, so we have multiple coefficients for each variable (one for each category).

 

Since the Wald chi-square is the effect size squared ((estimate/stderr)**2), I'm wondering whether it could be used to represent the variable importance.
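A note on the identity quoted above: (estimate/stderr)**2 is the Wald chi-square for a single parameter. For a categorical effect with several parameters, the Wald test of the whole effect is, as far as I know, the joint quadratic form sketched below. The symbols are notation introduced here for illustration: $\hat\beta_j$ is the vector of the $k_j$ estimated parameters for effect $j$, and $\hat V_{jj}$ is the matching block of the estimated covariance matrix.

```latex
W_j \;=\; \hat\beta_j^{\top}\,\hat V_{jj}^{-1}\,\hat\beta_j ,
\qquad
W_j \sim \chi^2_{k_j} \ \text{under } H_0:\ \beta_j = 0 ,
\qquad
k_j = 1 \;\Rightarrow\; W_j = \left(\frac{\hat\beta_j}{\operatorname{se}(\hat\beta_j)}\right)^{2}
```

Because $W_j$ depends on the standard errors and on the degrees of freedom $k_j$, its value tends to change with sample size and with the number of levels, which is worth keeping in mind before comparing it across variables.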

PaigeMiller
Diamond | Level 26

Since I have little experience with CLOGLOG, I again point out that there may be some standard method of determining variable importance that I am not aware of.

 

However, the Wald chi-square measures whether the variable is statistically significant. It does not measure how important the variable is. I don't see a translation of Wald chi-square to variable importance. Important variables (big regression coefficients) can be only barely significant (p-value = 0.049), and much less important variables (smaller regression coefficients) can be extremely significant (p-value = 0.0001).
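To make the distinction concrete, here is a minimal sketch with two made-up single-parameter terms. The names BigNoisy and SmallPrecise and their estimates and standard errors are hypothetical values chosen only for illustration; they do not come from the model in this thread.

```sas
/* Hypothetical estimates and standard errors, for illustration only */
data WaldDemo;
   length Term $12;
   input Term $ Estimate StdErr;
   WaldChiSq = (Estimate / StdErr)**2;       /* 1-df Wald chi-square         */
   PValue    = 1 - probchi(WaldChiSq, 1);    /* upper-tail chi-square, 1 df  */
   datalines;
BigNoisy 2.00 1.00
SmallPrecise 0.20 0.04
;
run;

proc print data=WaldDemo noobs;
run;
```

The term with the coefficient of 2.00 has a Wald chi-square of 4 (p-value about 0.046), while the term with the coefficient of 0.20 has a Wald chi-square of 25 (p-value far below 0.0001). The ranking by chi-square is the reverse of the ranking by coefficient size.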

 

If your variable VAR1 has 5 df, then it has six levels, and each individual level may have a big coefficient, or not. Example: If the six levels are DOG CAT PIG CHICKEN AMOEBA GORILLA, it could be that GORILLA has a huge coefficient, much bigger than the rest which are close to zero. Then the coefficient for GORILLA has high variable importance, and the rest have lower or nearly zero importance.


And what @Rick_SAS said applies too, although I think it doesn't matter in the case of categorical predictors.

 

 

--
Paige Miller
MariaD
Barite | Level 11

Hi @PaigeMiller, that's correct. But in this case, we'd like to understand whether VAR1 as a whole, for example, is more or less important than VAR2 as a whole and, of course, how important each one is.

PaigeMiller
Diamond | Level 26

Ok, I understand what you want, but to be very blunt, since speaking diplomatically hasn't gotten anywhere: Wald's Chi Square does not measure importance.

 

If you really want to get somewhere close to importance, you can compute the LSMEANS for each level of VAR1, and compute what the maximum difference is between the LSMEANS; then do this for VAR2 and VAR3 and so on. That will get you something that is close to what I think importance means.
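The LSMEANS idea above could be sketched roughly as follows. Everything here is an assumption made for illustration: the simulated data set Sim, the variable names Y, VAR1, and VAR2, and the true effects used to generate the data. The sketch also assumes that the LSMeans ODS table contains Effect and Estimate columns; running PROC CONTENTS on the output data set is a quick way to confirm that.

```sas
/* Simulated data, for illustration only: VAR1 has a larger true effect than VAR2 */
data Sim;
   call streaminit(2022);
   do i = 1 to 2000;
      VAR1 = ceil(3 * rand("Uniform"));              /* 3 levels */
      VAR2 = ceil(2 * rand("Uniform"));              /* 2 levels */
      eta  = -1 + 0.8*(VAR1 = 3) + 0.2*(VAR2 = 2);   /* linear predictor */
      p    = 1 - exp(-exp(eta));                     /* inverse cloglog link */
      Y    = rand("Bernoulli", p);
      output;
   end;
   drop i eta p;
run;

/* LSMEANS in PROC LOGISTIC requires GLM parameterization of CLASS variables */
proc logistic data=Sim;
   class VAR1 VAR2 / param=glm;
   model Y(event='1') = VAR1 VAR2 / link=cloglog;
   lsmeans VAR1 VAR2;
   ods output LSMeans=LSM;      /* estimates are on the cloglog (link) scale */
run;

/* Largest difference between LS-means within each effect */
proc sql;
   select Effect,
          max(Estimate) - min(Estimate) as LSMeanRange
   from LSM
   group by Effect
   order by LSMeanRange desc;
quit;
```

The range is computed on the link scale here. Whether that scale, or the probability scale, is the more meaningful one for "importance" is a judgment call that depends on the application.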

--
Paige Miller
Rick_SAS
SAS Super FREQ

The Wald chi-square statistic tests the null hypothesis beta = 0. It is used to compute the p-value: a large chi-square corresponds to a small p-value.

 

I think Paige meant to say that the magnitude of the STANDARDIZED regression coefficient can be used to assess the strength of the relationship between the response and an explanatory variable. You can use the STB option on the MODEL statement to obtain standardized estimates. If you don't standardize, then you can make any coefficient larger or smaller just by changing the units in which the variable is measured. For example, if you measure fuel efficiency in "feet per gallon," the unstandardized estimate is 5280 times smaller than if you measure fuel efficiency in "miles per gallon," whereas the standardized estimate is the same in both models:

 

data Have;
   set sashelp.cars;
   where Origin ^= "Europe";          /* keep two levels (Asia, USA) for a binary response */
   FeetPerGallon = MPG_City * 5280;   /* 1 mile = 5280 feet */
run;

/* predictor measured in feet per gallon */
proc logistic data=Have plots=None;
   model Origin = FeetPerGallon / STB;
   ods select ParameterEstimates;
run;

/* same predictor measured in miles per gallon */
proc logistic data=Have plots=None;
   model Origin = MPG_City / STB;
   ods select ParameterEstimates;
run;

 

StatDave
SAS Super FREQ

See this note. The PCORR option in PROC LOGISTIC is based on the Wald chi-square statistic. The RsquareV macro is another measure that can be used. 

Rick_SAS
SAS Super FREQ

@MariaD In case you missed Stat_Dave's suggestion, I think you should read the SAS Note that he mentions, and the documentation for the PCORR option on the MODEL statement (which is somewhat terse). The note discusses several statistics and (if I am reading it correctly) suggests that the PCORR statistic is a good choice for continuous main effects where each variable has a single parameter. The partial correlation is based on the Wald chi-square value, so it might suit your needs.
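For reference, the PCORR statistic, as I recall it from the PROC LOGISTIC documentation (worth verifying there before relying on it), is computed per parameter as shown below. Here $\chi_i^2$ is the Wald chi-square for parameter $i$ and $\log L_0$ is the log likelihood of the intercept-only model.

```latex
r_i \;=\; \operatorname{sign}(\hat\beta_i)\,
          \sqrt{\frac{\chi_i^{2} - 2}{-2\log L_0}} ,
\qquad
r_i = 0 \ \text{ if } \ \chi_i^{2} < 2
```

Since it is defined for one parameter at a time, a categorical variable with several levels gets several partial correlations rather than a single value, which fits the remark above that the statistic suits effects with a single parameter.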

 

At the bottom of Dave's note is a link to D. Thompson's 2009 paper on this topic. It discusses a total of six statistics, including the Wald chi-square, and explains the difference between them. Based on Thompson's remarks, you should be able to make a wise choice as to which statistic you want to use and how to interpret it.

Ksharp
Super User

Another way is to use PROC PLS.

 

Ksharp_0-1643789990822.png

 

 

Or you could try a decision tree model.

Ksharp_1-1643790167621.png

 
