Programming the statistical procedures from SAS

Proc Logistic for categorical variables

Regular Contributor
Posts: 181

Proc Logistic for categorical variables

I am using the CLASS statement with the PARAM=REF option in PROC LOGISTIC to include categorical variables. My question: when I run PROC LOGISTIC with SELECTION=STEPWISE, it does not check the significance of the individual levels (groups) of a categorical variable. It only checks whether the categorical variable as a whole is significant. In other words, even if one category of a categorical variable is insignificant, that category is not excluded. But if I create the dummy variables manually, leaving out the reference category, stepwise removes any dummy variable that is insignificant. I understand that it then treats each dummy as a separate variable. But isn't that statistically incorrect? Is there any workaround?
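For readers unfamiliar with K-1 reference coding, here is a minimal sketch of the idea in Python (not SAS, and not what PROC LOGISTIC runs internally). The function name `reference_code` and the `plan` variable are hypothetical, made up for illustration. Each non-reference level gets its own 0/1 column, and the reference level is the row of all zeros. Entering these columns individually in a model is what lets stepwise drop them one at a time.

```python
def reference_code(values, reference):
    """Return (columns, rows): one 0/1 dummy per level, omitting `reference`.

    Hypothetical helper for illustration only.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")
    columns = [lv for lv in levels if lv != reference]
    rows = [[1 if v == col else 0 for col in columns] for v in values]
    return columns, rows


# Hypothetical categorical predictor with K = 3 levels.
plan = ["basic", "plus", "premium", "plus", "basic"]
columns, rows = reference_code(plan, reference="basic")

print(columns)
for value, row in zip(plan, rows):
    print(value, row)
```

With K = 3 levels there are K - 1 = 2 dummy columns, and every "basic" observation is coded as all zeros.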

Frequent Contributor
Posts: 140

Re: Proc Logistic for categorical variables

Everything in STEPWISE is, at best, highly questionable and, at worst, outright wrong.

However, here, you have shown that you can make stepwise behave in either of two ways: treat the categorical variable as a single variable, or treat each level as a separate variable. I recommend the first. Perhaps you want to exclude any variable that is insignificant at any level? I think that would be an (added) mistake, but you could certainly do it by hand (e.g. by removing that variable from the list).

Regular Contributor
Posts: 181

Re: Proc Logistic for categorical variables

Do you recommend backward or forward selection? I don't want to remove a variable. I want to remove that level from the variable. It may overestimate or underestimate my predicted probability.

Respected Advisor
Posts: 2,655

Re: Proc Logistic for categorical variables

It appears that you want to collapse levels within a categorical variable, but I may be misinterpreting.

Why would you want to do that?  Please explain.

Steve Denham

Regular Contributor
Posts: 181

Re: Proc Logistic for categorical variables

It's a marketing (churn) model. Most of the significant variables are continuous; only two character variables appear, and they make sense in terms of both business logic and statistical significance. To check their significance, I put them in the CLASS statement with the PARAM=REF option and ran stepwise. Some levels come out insignificant at the 5% level, and even at the 10% level, so I thought it better to ignore those categories (levels). But SAS does not check individual levels when selecting variables via STEPWISE or any other selection technique. I guess it's better to ignore these levels and make the model more parsimonious, with fewer degrees of freedom.

Frequent Contributor
Posts: 140

Re: Proc Logistic for categorical variables

No, I don't think it's better to delete some levels of a categorical variable.  That winds up being an uninterpretable model.

E.g. suppose the variable is race and you have White, Black, Asian, Other.  Suppose only White and Asian are significant.  Then if you delete the other levels, you are comparing Whites to Asians without controlling for Black or Other.  Keep all levels.

Parsimony is often the enemy

Respected Advisor
Posts: 2,655

Re: Proc Logistic for categorical variables

Good point, Peter, about parsimony.

Proceeding from the maxim "All models are wrong, but some models are useful," using parsimony as the only tool to select a model is, at least to me, akin to choosing the nearest rock as a weapon when a dragon attacks, while ten feet farther away is a sword designed especially for dragon slaying. It may take a little more work to get to the sword, and it takes some skill to use it, but one is far likelier to be happy with the results.

Steve Denham

SAS Super FREQ
Posts: 3,307

Re: Proc Logistic for categorical variables

It sounds like you are looking for the SPLIT option, which is supported in the CLASS statement of HPLOGISTIC and HPGENSELECT.

SAS/STAT(R) 13.2 User's Guide: High-Performance Procedures

I think most (all?) of the HP regression procedures that support variable selection also support the SPLIT option.

Regular Contributor
Posts: 181

Re: Proc Logistic for categorical variables

Thanks! No, I don't want any interaction between variables. It's a dummy variable with K-1 coding: one value is set as the reference category, and then the significance of each remaining category of the variable is evaluated.

SAS Super FREQ
Posts: 3,307

Re: Proc Logistic for categorical variables

I was wrong. The doc says that the SPLIT option is only available for HPREG.

Regular Learner
Posts: 1

Re: Proc Logistic for categorical variables

It is the same thing. Since your reference level is not part of your regression (its dummy is dropped), removing an insignificant dummy is essentially the same as combining that level with your reference. So you implicitly just have a new reference category.
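This equivalence can be checked on the design matrix itself. The sketch below is in Python rather than SAS and reuses the race example from earlier in the thread with made-up data; `reference_code` and `drop_column` are hypothetical helpers. It assumes "Other" is the reference level and "Black" is the dummy that gets dropped. Route A codes all levels and then deletes the "Black" column; route B merges "Black" into "Other" first and then codes. The two routes give identical columns, so any model fitted to them is the same model.

```python
def reference_code(values, reference):
    """Return (columns, rows): one 0/1 dummy per level, omitting `reference`."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")
    columns = [lv for lv in levels if lv != reference]
    rows = [[1 if v == col else 0 for col in columns] for v in values]
    return columns, rows


def drop_column(columns, rows, name):
    """Remove the dummy column called `name` from a coded design matrix."""
    idx = columns.index(name)
    kept = [c for c in columns if c != name]
    return kept, [[x for j, x in enumerate(row) if j != idx] for row in rows]


# Made-up data for illustration only.
race = ["White", "Black", "Asian", "Other", "Black", "White"]

# Route A: code with "Other" as reference, then drop the "Black" dummy.
cols_all, rows_all = reference_code(race, reference="Other")
cols_a, rows_a = drop_column(cols_all, rows_all, "Black")

# Route B: merge "Black" into "Other" first, then code.
merged = ["Other" if r == "Black" else r for r in race]
cols_b, rows_b = reference_code(merged, reference="Other")

same = (cols_a == cols_b) and (rows_a == rows_b)
print(cols_a, same)
```

In both routes the "Black" observations end up as all-zero rows, which is exactly how the reference group is coded. The remaining coefficients therefore compare each kept level against the pooled "Black or Other" group, not against "Other" alone.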

 
