Reeza
Super User

I'm doing variable selection and need to use the Wald criterion, which is not available in GLMSELECT.

 

However, I have to dummy code my categorical variables, and I'm wondering how to ensure that all the dummy variables for a given categorical variable are included or dropped together.

 

For example, if I have 3 dummy variables for ethnicity, and I want ethnicity to remain in the model, is there a setting to force them to stay together in the selection?

I.e., I want eth1-eth3 to enter and leave the model at the same time.

 

I'm open to using another proc if it supports this functionality. 

 

proc reg data=myData;
   model outcome = var1 var2 var3 cat1 cat2 cat3 eth1 eth2 eth3;
run;

Any help or advice is appreciated. 
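
For reference, one way the grouping described above might be expressed in PROC REG itself is to enclose each set of dummy variables in braces in the MODEL statement, which makes the members of a group enter or leave together under the FORWARD, BACKWARD, and STEPWISE methods. The sketch below assumes cat1-cat3 are dummies for a single categorical variable; the significance levels and the group names 'category' and 'ethnicity' are illustrative choices, not taken from the post. PROC REG bases entry and removal on F statistics, so whether this satisfies the Wald requirement depends on what is meant by it.

```sas
proc reg data=myData;
   /* Braces define a group: its variables enter or leave the model together. */
   /* Variables outside braces are treated as single-variable groups.         */
   model outcome = var1 var2 var3 {cat1 cat2 cat3} {eth1 eth2 eth3}
         / selection=stepwise
           slentry=0.05 slstay=0.05   /* illustrative thresholds (assumption) */
           groupnames='var1' 'var2' 'var3' 'category' 'ethnicity';
run;
quit;
```

GROUPNAMES= only labels the groups in the output; if it is omitted, the groups are named GROUP1, GROUP2, and so on.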

PaigeMiller
Diamond | Level 26

My advice, as always, is to not use variable selection methods at all. People usually turn to variable selection to avoid the problems caused by correlation among the predictor variables. But variable selection methods have many drawbacks, including the one you have run into.

 

The alternative is Partial Least Squares, or PLS, which allows you to include all the candidate variables in the model and is not (much) affected by the correlation between the variables. With PLS, there is no problem keeping all of your ethnicity levels in the model. However, PLS has no equivalent of the Wald test.
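
A minimal sketch of what that could look like with PROC PLS follows. It assumes the data set still contains the original categorical variables, here given the hypothetical names cat and eth, so that a CLASS statement can replace the hand-made dummy variables; the choice of leave-one-out cross validation is also just an illustration.

```sas
proc pls data=myData method=pls cv=one;   /* cv=one: leave-one-out cross validation */
   class cat eth;                         /* hypothetical un-dummied categoricals   */
   model outcome = var1 var2 var3 cat eth;
run;
```

All levels of cat and eth stay in the model here, because PLS extracts factors from the full set of predictors instead of selecting among them.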

 

Essentially, you are trading off the advantages and disadvantages of regression variable selection techniques for the advantages and disadvantages of PLS. In my opinion, the advantages of PLS far outweigh the disadvantages of PLS, and make it a clear winner over variable selection methods. PLS is used successfully in many applications; in some applications it is the gold standard for modeling.

 

That said, I understand that not everyone wants to give up variable selection methods; they may weigh the advantages and disadvantages differently.

--
Paige Miller
Rick_SAS
SAS Super FREQ

Can you clarify what the WALD criterion is? I did not find it in the PROC REG documentation.

Ksharp
Super User
Did you check PROC HPGENSELECT?
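
As a rough sketch of that suggestion: PROC HPGENSELECT accepts a CLASS statement, so a categorical variable can be entered or removed as a single effect instead of as separate dummy columns. The variable names cat and eth (the categorical variables before dummy coding), the normal distribution, and the significance levels are all assumptions made for illustration. Whether the entry and removal tests it uses correspond to the Wald criterion should be checked against the procedure's documentation.

```sas
proc hpgenselect data=myData;
   class cat eth;                                  /* hypothetical un-dummied categoricals */
   model outcome = var1 var2 var3 cat eth
         / distribution=normal;                    /* assumes a continuous outcome         */
   selection method=stepwise(slentry=0.05 slstay=0.05);   /* illustrative thresholds       */
run;
```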
