Proc Logistic: Wald Chi-Square to classify variable importance

MariaD · Posted 02-01-2022 09:57 AM

Hi folks,

We're using PROC LOGISTIC with the CLOGLOG link. Our variables are mostly categorical. Suppose we had the following results:

What would be the best way to illustrate variable importance? Use a scale based on the largest Wald chi-square, as shown below?

Or create a scale based on the sum of the Wald chi-squares?

Thanks,

PaigeMiller · Posted 02-01-2022 10:02 AM

There may be some standard way of computing variable importance for CLOGLOG link that I am not aware of.

However, in my mind, the importance of a variable is given by the regression coefficients and not the Wald chi-square value. Variables with large regression coefficients (either large positive or large negative) are the important variables. (In some cases you may get large regression coefficients on variables that are not statistically significant, and then it's up to you whether or not you consider these to have high importance; I do not consider these to have high importance.)

--
Paige Miller

MariaD · Posted 02-01-2022 10:15 AM

Thanks @PaigeMiller. Normally, we use regression coefficients, but in this case all the variables included are categorical, so we have multiple coefficients for each variable (one for each category).

Given that the Wald chi-square is the squared ratio of the estimate to its standard error ((estimate/stderr)**2), I am wondering whether it could be used to represent variable importance.
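As a minimal sketch of that identity, the step below recomputes (estimate/stderr)**2 for each parameter and prints it next to the WaldChiSq column that PROC LOGISTIC reports. The data set (sashelp.cars), the predictors, and the names Have, PE, T3, CheckWald, and WaldByHand are assumptions chosen for illustration, not the model from this thread. Note that the identity holds per parameter; for a CLASS variable with several levels, the Type 3 table reports a joint Wald test with more than one degree of freedom.

```sas
/* Illustrative data only: sashelp.cars stands in for the real data */
data Have;
   set sashelp.cars;
   where Origin ^= "Europe";          /* keep two response levels: Asia, USA */
run;

proc logistic data=Have plots=none;
   class DriveTrain / param=ref;
   model Origin = DriveTrain MPG_City / link=cloglog;
   ods output ParameterEstimates=PE   /* one row per parameter */
              Type3=T3;               /* one row per effect (joint test) */
run;

/* Recompute the per-parameter Wald chi-square by hand */
data CheckWald;
   set PE;
   WaldByHand = (Estimate / StdErr)**2;
run;

proc print data=CheckWald noobs;
   var Variable ClassVal0 Estimate StdErr WaldChiSq WaldByHand;
run;

/* Joint Wald test for each effect, e.g. 2 df for DriveTrain */
proc print data=T3 noobs;
run;
```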

PaigeMiller · Posted 02-01-2022 10:26 AM

Since I have little experience with CLOGLOG, I again point out that there may be some standard method of determining variable importance that I am not aware of.

However, the Wald chi-square measures whether the variable is statistically significant. It does not measure how important the variable is. I don't see a translation of Wald chi-square to variable importance. My mind can't seem to do that. Important variables (big regression coefficients) can be only barely significant (p-value = 0.049), and much less important variables (smaller regression coefficients) can be extremely significant (p-value = 0.0001).

If your variable VAR1 has 5 df, then it has six levels, and each individual level may have a big coefficient, or not. Example: If the six levels are DOG CAT PIG CHICKEN AMOEBA GORILLA, it could be that GORILLA has a huge coefficient, much bigger than the rest which are close to zero. Then the coefficient for GORILLA has high variable importance, and the rest have lower or nearly zero importance.

And what @Rick_SAS said applies too, although I think it doesn't matter in the case of categorical predictors.

--
Paige Miller

MariaD · Posted 02-01-2022 04:06 PM

Hi @PaigeMiller, that's correct. But in this case, we'd like to understand whether VAR1 as a whole, for example, is more or less important than VAR2 as a whole and, of course, how much more or less important it is.

PaigeMiller · Posted 02-01-2022 04:35 PM

Ok, I understand what you want, but to be very blunt, since speaking diplomatically hasn't gotten anywhere: Wald's Chi Square does not measure importance.

If you really want to get somewhere close to importance, you can compute the LSMEANS for each level of VAR1, and compute what the maximum difference is between the LSMEANS; then do this for VAR2 and VAR3 and so on. That will get you something that is close to what I think importance means.
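A rough sketch of that idea follows, under stated assumptions: sashelp.cars and the CLASS variables DriveTrain and Type stand in for VAR1 and VAR2, and the names Have, LSM, and LSMeanRange are made up for illustration. The LSMEANS statement in PROC LOGISTIC needs the GLM parameterization, and the estimates are on the linear-predictor (cloglog) scale. The Effect and Estimate column names are what I expect in the LSMeans ODS table; check them with PROC CONTENTS before relying on this.

```sas
/* Illustrative data only; Hybrid is dropped because it has very few cars */
data Have;
   set sashelp.cars;
   where Origin ^= "Europe" and Type ^= "Hybrid";
run;

proc logistic data=Have plots=none;
   class DriveTrain Type / param=glm;   /* LSMEANS requires PARAM=GLM */
   model Origin = DriveTrain Type / link=cloglog;
   lsmeans DriveTrain Type;             /* LS-means for every level of each effect */
   ods output LSMeans=LSM;
run;

/* Largest difference between LS-means within each effect */
proc sql;
   select Effect,
          max(Estimate) - min(Estimate) as LSMeanRange
   from LSM
   group by Effect
   order by LSMeanRange desc;
quit;
```

An effect with a wider range moves the linear predictor more between its most extreme levels, which is close to the notion of importance described above.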

--
Paige Miller

MariaD · Posted 02-01-2022 04:56 PM

Thanks a lot!

Rick_SAS · Posted 02-01-2022 10:23 AM

The Wald chi-square statistic tests the null hypothesis beta = 0. It is used to compute the p-value: a large chi-square value corresponds to a small p-value.

I think Paige meant to say that the magnitude of the STANDARDIZED regression coefficient can be used to assess the strength of the relationship between the response and an explanatory variable. You can use the STB option on the MODEL statement to obtain standardized estimates. If you don't standardize, then you can make any coefficient larger or smaller just by changing the units in which the variable is measured. For example, if you measure fuel efficiency in "feet per gallon" you will get an estimate that is 5280 times smaller than if you measure fuel efficiency in "miles per gallon", while the standardized estimate is unchanged:

data Have;
set sashelp.cars;
where Origin ^= "Europe";          /* keep two response levels */
FeetPerGallon = MPG_City * 5280;   /* 1 mile = 5280 feet */
run;

/* fuel efficiency in feet per gallon */
proc logistic data=Have plots=None;
model Origin = FeetPerGallon / STB;
ods select ParameterEstimates;
run;

/* same model, fuel efficiency in miles per gallon */
proc logistic data=Have plots=None;
model Origin = MPG_City / STB;
ods select ParameterEstimates;
run;

StatDave · Posted 02-01-2022 10:25 AM

See this note. The PCORR option in PROC LOGISTIC is based on the Wald chi-square statistic. The RsquareV macro is another measure that can be used.

Rick_SAS · Posted 02-01-2022 04:56 PM

@MariaD In case you missed StatDave's suggestion, I think you should read the SAS Note that he mentions, and the documentation for the PCORR option on the MODEL statement (which is somewhat terse). The note discusses several statistics and (if I am reading it correctly) suggests that the PCORR statistic is a good choice for continuous main effects where each variable has a single parameter. The partial correlation is based on the Wald chi-square value, so it might suit your needs.
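For reference, a minimal sketch of requesting the statistic, assuming a SAS/STAT release that supports PCORR on the MODEL statement. The data set and the continuous predictors are placeholders from sashelp.cars, not the model discussed in this thread, and I have not checked how the option behaves for CLASS effects with several parameters.

```sas
/* Illustrative data only: sashelp.cars stands in for the real data */
data Have;
   set sashelp.cars;
   where Origin ^= "Europe";          /* keep two response levels */
run;

proc logistic data=Have plots=none;
   /* PCORR requests a partial correlation statistic for each parameter */
   model Origin = MPG_City Horsepower Weight / link=cloglog pcorr;
   ods select ParameterEstimates;
run;
```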

At the bottom of Dave's note is a link to D. Thompson's 2009 paper on this topic. It discusses a total of six statistics, including the Wald chi-square, and explains the difference between them. Based on Thompson's remarks, you should be able to make a wise choice as to which statistic you want to use and how to interpret it.

Ksharp · Posted 02-02-2022 03:22 AM

Another way is to use PROC PLS.

Or you could try a decision tree model.
