Ujjawal
Quartz | Level 8

I am using the CLASS statement with the PARAM=REF option in PROC LOGISTIC to include categorical variables. My question: when I run PROC LOGISTIC with SELECTION=STEPWISE, it does not check the significance of the individual levels (groups) of a categorical variable. It only checks whether the categorical variable as a whole is significant. In other words, even if one category of a categorical variable is insignificant, that category is not excluded. But if I create dummy variables manually, leaving out a reference category, stepwise removes any dummy variable that is insignificant. I understand that it then treats each dummy as a separate variable. But isn't it statistically incorrect? Is there any workaround?
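
The two setups described above can be sketched roughly as follows. The data set `work.churn`, the response `churned`, and the predictors `tenure`, `monthly_spend`, and `plan_type` (with levels 'Basic', 'Plus', 'Premium') are hypothetical names chosen for illustration; they are not taken from the original post.

```sas
/* Setup 1: CLASS with reference coding. Stepwise enters or removes
   plan_type as a single effect (all K-1 design columns together). */
proc logistic data=work.churn;
  class plan_type (ref='Basic') / param=ref;
  model churned(event='1') = tenure monthly_spend plan_type
        / selection=stepwise slentry=0.05 slstay=0.05;
run;

/* Setup 2: hand-made 0/1 dummies with 'Basic' as the omitted reference.
   Stepwise now treats each dummy as a separate candidate variable. */
data work.churn_dummies;
  set work.churn;
  plan_plus    = (plan_type = 'Plus');
  plan_premium = (plan_type = 'Premium');
run;

proc logistic data=work.churn_dummies;
  model churned(event='1') = tenure monthly_spend plan_plus plan_premium
        / selection=stepwise slentry=0.05 slstay=0.05;
run;
```

In the first call the variable is judged by one joint test across all of its levels; in the second, each dummy is judged by its own test, which is why individual levels can drop out.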

plf515
Lapis Lazuli | Level 10

Everything in STEPWISE is, at best, highly questionable and, at worst, outright wrong.

However, here, you have shown that you can make stepwise behave in either of two ways: Treat the categorical variable as a single variable or treat each level as a single variable.  I recommend the first. Perhaps you want to exclude any variable that is insignificant at any level?  I think that would be an (added) mistake, but you could certainly do it by  hand (e.g. by removing that variable from the list).

Ujjawal
Quartz | Level 8

Do you recommend backward or forward selection? I don't want to remove a variable. I want to remove that level from a variable. It may overestimate or underestimate my predicted probability.

SteveDenham
Jade | Level 19

It appears that you want to collapse levels within a categorical variable, but I may be misinterpreting.

Why would you want to do that?  Please explain.

Steve Denham

Ujjawal
Quartz | Level 8

It's a marketing (churn) model. Most of the significant variables are continuous; only two character variables appear, and they make sense in terms of both business logic and statistical significance. To check their significance, I put them in the CLASS statement with the PARAM=REF option and ran stepwise. Some levels come out insignificant at the 5% level, and even at the 10% level, so I thought it better to ignore these categories (levels). But SAS does not check levels while selecting variables via STEPWISE or any other selection technique. I guess it's better to ignore these levels and make the model more parsimonious, with fewer degrees of freedom.

plf515
Lapis Lazuli | Level 10

No, I don't think it's better to delete some levels of a categorical variable.  That winds up being an uninterpretable model.

E.g. suppose the variable is race and you have White, Black, Asian, Other.  Suppose only White and Asian are significant.  Then if you delete the other levels, you are comparing Whites to Asians without controlling for Black or Other.  Keep all levels.

Parsimony is often the enemy

SteveDenham
Jade | Level 19

Good point, Peter, about parsimony.

Proceeding from the maxim "All models are wrong, but some models are useful" using parsimony as the only tool to select a model is, at least to me, akin to choosing the nearest rock as a weapon when a dragon attacks, while ten feet farther away is a sword designed especially for dragon slaying.  It may take a little more work to get to the sword, and it takes some skill to use it, but one is far likelier to be happy with the results.

Steve Denham

Rick_SAS
SAS Super FREQ

It sounds like you are looking for the SPLIT option, which is supported in the CLASS statement of HPLOGISTIC and HPGENSELECT.

SAS/STAT(R) 13.2 User's Guide: High-Performance Procedures

I think most (all?) of the HP regression procedures that support variable selection also support the SPLIT option.

Ujjawal
Quartz | Level 8

Thanks! No, I don't want any interaction between variables. These are dummy variables with K-1 coding: one value is set as the reference category, and then the significance of each remaining category of the variable is evaluated.

Rick_SAS
SAS Super FREQ

I was wrong. The doc says that the SPLIT option is only available for HPREG.

eisforendo
Calcite | Level 5

It is the same thing. Since your reference level is not part of your regression (it is dropped), removing an insignificant dummy is essentially the same as combining that level with your reference. So you implicitly end up with a new reference category.
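
A minimal sketch of that equivalence, under assumed names: `work.churn`, `churned`, `tenure`, `monthly_spend`, and `plan_type` (levels 'Basic', 'Plus', 'Premium', with 'Basic' as the original reference) are all hypothetical, and 'Plus' stands in for whichever level's dummy was dropped.

```sas
/* Hypothetical case: the 'Plus' dummy was dropped as insignificant.
   Folding 'Plus' into the reference level reproduces a model that
   keeps only the 'Premium' dummy. */
data work.churn_collapsed;
  set work.churn;
  length plan_grp $16;
  if plan_type in ('Basic', 'Plus') then plan_grp = 'Basic or Plus';
  else plan_grp = plan_type;
run;

proc logistic data=work.churn_collapsed;
  class plan_grp (ref='Basic or Plus') / param=ref;
  model churned(event='1') = tenure monthly_spend plan_grp;
run;
```

The 'Premium' coefficient is then a comparison against the pooled 'Basic or Plus' group rather than against 'Basic' alone, which is the interpretation issue raised earlier in the thread.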

 

