efibbi
Calcite | Level 5

Hi,

I am using the unsupervised option in the Variable Selection node in SAS Model Studio to reduce the number of variables in my dataset, which contains both interval and categorical variables. I'm trying to understand what is actually happening under the hood, i.e. how SAS handles the categorical variables.

To my understanding, the Variable Selection node is essentially based on the VARREDUCE procedure. I've read the documentation for this procedure, but it's not clear to me how categorical variables are handled when the unsupervised option is selected.

I know that the GLM parameterization is used in general, but is it safe to assume that it is also used for unsupervised variable selection?

Thanks

Accepted Solution
JasonColon
SAS Employee

Hi there!

The documentation for the VARREDUCE procedure states that it supports the GLM method of class variable parameterization. As you're probably aware, since you mentioned the GLM parameterization, this type of parameterization overparameterizes the model: every level of a categorical column gets its own indicator column and therefore its own parameter. Because those indicator columns always sum to the intercept column, the resulting design matrix is not of full rank.
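
To make the GLM coding concrete, here is a small Python/NumPy sketch of what it does to a single categorical variable. It is only an illustration, not SAS code, and the function name `glm_design` is hypothetical. Every level gets its own 0/1 column next to the intercept, so the design matrix has more columns than its rank.

```python
import numpy as np

def glm_design(categories):
    """Hypothetical helper: intercept column plus one 0/1 indicator column
    per level (GLM-style, less-than-full-rank coding)."""
    levels = sorted(set(categories))
    intercept = np.ones((len(categories), 1))
    # One indicator column per level: no reference level is dropped.
    indicators = np.array(
        [[1.0 if c == lvl else 0.0 for lvl in levels] for c in categories]
    )
    return np.hstack([intercept, indicators]), levels

X, levels = glm_design(["red", "green", "blue", "green", "red"])
print(levels)                    # ['blue', 'green', 'red']
print(X.shape)                   # (5, 4): intercept + 3 indicators
print(np.linalg.matrix_rank(X))  # 3: the indicators sum to the intercept
```

A reference-cell coding would drop one indicator per variable to restore full rank. The GLM coding keeps all of them, which is why it is described as overparameterized.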

As you've also noted, the Variable Selection node in Model Studio isn't explicit about how it parameterizes class variables. But consider what this selection method is for: removing redundant inputs, i.e. inputs that explain the same variability as other inputs. For that purpose it makes sense to parameterize each level of a categorical variable, so that all of the possible relationships are explored. The documentation doesn't say this directly, but experimenting with the node and examining its output supports the conclusion that each level is parameterized when using this selection method.
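
To illustrate the idea of dropping inputs that explain the same variability, here is a simplified Python/NumPy sketch. It rests on my own assumptions and is not the VARREDUCE algorithm: the greedy forward search, the explained-variance criterion, and the names `explained_fraction`, `greedy_unsupervised_select`, and `target` are all hypothetical. The example expands a three-level categorical variable into GLM-style indicator columns, adds an interval input `x1` and a rescaled copy `x2`, and then repeatedly adds whichever column most increases the share of total variance explained by projecting all columns onto the selected ones.

```python
import numpy as np

# Hypothetical illustration only; not SAS's VARREDUCE implementation.

def explained_fraction(X, cols):
    """Fraction of the total variance of centered X captured by projecting
    every column of X onto the span of the columns listed in `cols`."""
    Xc = X - X.mean(axis=0)
    total = np.sum(Xc ** 2)
    if not cols:
        return 0.0
    S = Xc[:, cols]
    # Least-squares projection of all columns onto the selected ones.
    coef, *_ = np.linalg.lstsq(S, Xc, rcond=None)
    residual = Xc - S @ coef
    return 1.0 - np.sum(residual ** 2) / total

def greedy_unsupervised_select(X, target=0.999):
    """Greedy forward selection: add the column that most increases the
    explained fraction until `target` is reached."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and explained_fraction(X, selected) < target:
        best = max(remaining, key=lambda j: explained_fraction(X, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
x2 = 2.0 * x1                                  # perfectly redundant with x1
cats = np.array(["a", "b", "c"] * (n // 3))
dummies = np.column_stack([(cats == lvl).astype(float) for lvl in ["a", "b", "c"]])
X = np.column_stack([x1, x2, dummies])         # columns: x1, x2, c=a, c=b, c=c

selected = greedy_unsupervised_select(X)
print(selected, explained_fraction(X, selected))
```

Because `x2` is just a rescaled `x1`, only one of the two is selected, and only two of the three indicator columns are needed: once the columns are centered, the third indicator is a linear combination of the other two.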

Hope this helps!  