Ruth
Fluorite | Level 6

I am using PROC GENMOD to run a logistic regression on my data. There are many explanatory variables (more than 25), most of which are nominal with multiple levels.

It seems that I have to put all variables into the model and manually exclude them one at a time until only significant variables remain. With this many variables, that is not a clever way to proceed. I searched but could not find any automatic selection method in PROC GENMOD.

I am posting my problem here. Any good ideas?

Ruth

Accepted Solutions
JacobSimonsen
Barite | Level 11

You can use the procedure HPGENSELECT. It works much like GENMOD, except that it also provides variable selection methods.

For example, you can write:

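PROC HPGENSELECT data=MYDATA;
   /* Nominal predictors; each is expanded into dummy variables */
   CLASS classvariables1-classvariables20;
   /* Binary response with a logit link, i.e. a logistic regression */
   model y=classvariables1-classvariables20 / dist=binary link=logit;
   /* Stepwise variable selection */
   selection method=stepwise;
RUN;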

It requires SAS 9.4.


7 Replies
Ruth
Fluorite | Level 6

Does PROC GENMOD really have no automatic variable selection method? It will be cumbersome to rerun the model and manually delete the nonsignificant terms.

Any thoughts?

lvm
Rhodochrosite | Level 12

There is no automatic variable selection in GENMOD. You should know that such methods are controversial in statistics, and many argue strongly against automatic methods. At best, you should use the methods in an exploratory sense, to help you understand your data.

Since you have binary data, you can use PROC LOGISTIC instead of GENMOD. LOGISTIC does have variable selection methods. Check out the SELECTION= option on the model statement.
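A minimal sketch of what lvm describes, assuming a data set MYDATA with a binary response Y coded 0/1 and nominal predictors X1-X25 (all names are hypothetical placeholders):

```sas
/* Hypothetical data set MYDATA: binary Y (0/1), nominal predictors X1-X25 */
proc logistic data=mydata;
   /* CLASS expands each nominal predictor into design (dummy) columns */
   class x1-x25 / param=ref;
   /* Model P(Y=1) with stepwise selection; entry and stay levels set explicitly */
   model y(event='1') = x1-x25
         / selection=stepwise slentry=0.05 slstay=0.05;
run;
```

Treat the selected model as exploratory, for the reasons given in this reply.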

Ruth
Fluorite | Level 6

Thanks for your idea, lvm.

I don't think I can use PROC LOGISTIC, because most of my nominal variables have more than 5 levels. It seems better to use PROC GENMOD, which automatically creates dummy variables.

Anyway, I can do the stepwise deletion of variables manually.

Dale
Pyrite | Level 9

Ruth,

Why do you say that you cannot use PROC LOGISTIC?  The LOGISTIC procedure has a CLASS statement.  The purpose of the CLASS statement is to expand categorical variables so that the design matrix has dummy variables representing all of the levels of the predictor variables.  It is only through the mechanisms of the CLASS statement that PROC GENMOD is able to expand a nominal predictor variable into a set of dummy variables.
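To illustrate, here is a sketch assuming a hypothetical nominal predictor REGION with more than five levels and a 0/1 response Y. One detail worth knowing: PROC LOGISTIC's CLASS statement defaults to effect coding (PARAM=EFFECT), whereas PROC GENMOD uses less-than-full-rank GLM coding, so PARAM=GLM makes the LOGISTIC dummy variables match GENMOD's and the two fits should agree.

```sas
/* Hypothetical data: binary Y (0/1), nominal REGION with many levels */
proc logistic data=mydata;
   /* PARAM=GLM: one 0/1 column per level, last level as reference, as in GENMOD */
   class region / param=glm;
   model y(event='1') = region;
run;

/* The same model in PROC GENMOD, for comparison.
   DESCENDING makes GENMOD model P(Y=1) instead of P(Y=0). */
proc genmod data=mydata descending;
   class region;
   model y = region / dist=binomial link=logit;
run;
```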

By the way, I would be careful about applying stepwise selection methods when you have categorical predictor variables. If categorical predictor variable A has 5 levels, then your stepwise selection may keep a couple of levels of A as important predictors and remove the other levels of A as unimportant predictors. This can lead to some real model confusion. I don't know whether the stepwise selection methods in PROC LOGISTIC operate this way (selecting one column at a time from the design matrix), but that is a typical implementation of stepwise selection. It is rare to find an implementation of stepwise selection that tests all levels of a categorical predictor variable for simultaneous inclusion in or exclusion from the model.

There are other statistical issues with stepwise selection methods. They typically produce incorrect models. As lvm has stated, stepwise selection should only be used for exploratory analysis. Models suggested by stepwise selection methods should be confirmed in a separate investigation.
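On the question of how PROC LOGISTIC handles this: as I read its documentation, its selection methods work on effects rather than on individual design columns, so all dummy columns of a CLASS variable should enter or leave the model together. The DETAILS option prints the entry and removal tests at each step, so you can check this in your own output. A sketch with hypothetical predictors A (5 levels), B and C:

```sas
/* Hypothetical: A has 5 levels; B and C are other nominal predictors */
proc logistic data=mydata;
   class a b c / param=ref;
   /* DETAILS shows the score test for each effect eligible for entry.
      If A is treated as one effect, its test should have DF = 4
      (5 levels minus the reference level). */
   model y(event='1') = a b c / selection=stepwise details;
run;
```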
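One simple way to confirm a stepwise result separately is a split-sample check. A minimal sketch, assuming the hypothetical names MYDATA, Y and X1-X25; the seed, the 50% split, and the effect list X1 X4 X7 in the last step are placeholders to replace with whatever the first run actually selects:

```sas
/* Randomly flag half of the observations. OUTALL keeps every row and
   adds the variable SELECTED (1 = training half, 0 = holdout half). */
proc surveyselect data=mydata out=split samprate=0.5 outall seed=20240101;
run;

/* Exploratory stepwise selection on the training half only */
proc logistic data=split(where=(selected=1));
   class x1-x25 / param=ref;
   model y(event='1') = x1-x25 / selection=stepwise;
run;

/* Refit the selected effects on the holdout half, without selection.
   X1 X4 X7 are placeholders for the effects chosen above. */
proc logistic data=split(where=(selected=0));
   class x1 x4 x7 / param=ref;
   model y(event='1') = x1 x4 x7;
run;
```

If the effects that looked important in the training half are weak in the holdout half, that is a warning sign about the selected model.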

Ruth
Fluorite | Level 6

Hi Dale,

Thanks for correcting me. I am using the book: Logistic Regression Using SAS: Theory and Application, by Paul D. Allison.

Following your suggestion, I checked and found that much of the book's content is out of date. For example, it says that dummy variables must be created manually for PROC LOGISTIC and that multiplicative terms (i.e., interactions) cannot be specified in the MODEL statement. New SAS releases have added many features to these procedures since then.

The book was published in 1999. :(
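For comparison with the book's older approach, a sketch of what current PROC LOGISTIC accepts directly, assuming hypothetical nominal predictors A and B and a continuous covariate AGE:

```sas
proc logistic data=mydata;
   /* No hand-made dummy variables: CLASS builds the design columns */
   class a b / param=ref;
   /* Main effects plus the A-by-B interaction; A|B is shorthand for A B A*B */
   model y(event='1') = a b a*b age;
run;
```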

Thanks again.

ccasboy
Calcite | Level 5


Dear Ruth

I have exactly the same problem. My data has about 70 variables to be used as predictors in a logistic regression (all nominal, with multiple levels). I started by running a Pearson chi-square test between each of them and the binary outcome, and then kept only the significant ones for the logistic regression, but there are still too many (27 of them). I tried fitting the model with PROC GENMOD, but it does not converge, I suppose because of the large number of predictors. I thought of using PROC LOGISTIC, but the problem is that PROC LOGISTIC does not allow you to specify the reference category in the CLASS statement, and I want particular categories as references. The advice I got from a friend is to compute Spearman rank correlations between the predictors and then drop one of each pair of highly correlated variables. I think this approach is not that bad, and I suggest you try it.
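For reference, PROC LOGISTIC's CLASS statement does accept a per-variable REF= option (a formatted value in quotes, or FIRST/LAST). A sketch of the steps described above, assuming hypothetical predictors X1-X70, a 0/1 outcome Y, and placeholder reference values '3' and 'B':

```sas
/* Step 1: Pearson chi-square test of each predictor against Y */
proc freq data=mydata;
   tables (x1-x70)*y / chisq;
run;

/* Association between two nominal predictors: the CHISQ option also
   reports Cramer's V, an alternative to Spearman correlation when the
   categories have no natural order */
proc freq data=mydata;
   tables x1*x2 / chisq;
run;

/* Logistic model with chosen reference levels (values are placeholders) */
proc logistic data=mydata;
   class x1(ref='3') x2(ref='B') / param=ref;
   model y(event='1') = x1 x2;
run;
```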

Best

Kingsly

