efibbi
Calcite | Level 5

Hi,

I am using the unsupervised option within the Variable Selection node in SAS Model Studio to reduce the number of variables in my dataset, which contains both interval and categorical variables, and I'm trying to understand what is actually going on under the hood, i.e. how SAS is handling categorical variables.

To my understanding, the Variable Selection node is essentially based on the VARREDUCE procedure. Now, I've read the documentation for this procedure but it's not clear to me as to how categorical variables are handled when the unsupervised option is selected.

I know that in general the GLM parametrization is used, but is it safe to assume that it's used for unsupervised variable selection as well?

 

Thanks

Accepted Solutions
JasonColon
SAS Employee

Hi there!

The documentation for the VARREDUCE procedure states that it supports the GLM method of class variable parametrization. As you're probably aware, since you mentioned the GLM parametrization, this is a parametrization in which the model is over-parametrized: every level of a categorical column is treated as its own parameter, rather than one level being dropped as a reference.
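To make the over-parametrization concrete, here is a minimal sketch in plain Python. It is not SAS code and is not meant to show how VARREDUCE works internally; the function name `glm_design` and the toy `colors` data are made up for illustration. It only shows what "one parameter per level" looks like as indicator columns.

```python
def glm_design(values):
    """GLM-style coding: one 0/1 indicator column per distinct level.

    Hypothetical helper for illustration only; levels are ordered
    alphabetically here, which is an assumption of this sketch.
    """
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Toy categorical input with three levels (made-up data).
colors = ["red", "green", "blue", "green", "red"]
levels, rows = glm_design(colors)

print(levels)
for value, row in zip(colors, rows):
    print(value, row)

# Each observation belongs to exactly one level, so the indicator columns
# always add up to 1 in every row. That is the same as an intercept column,
# which is why this coding is called over-parametrized.
row_sums = [sum(row) for row in rows]
print(row_sums)
```

A three-level variable therefore contributes three columns, not two, and any one of them can be reconstructed from the other two plus the intercept.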

As you've also noted, the Variable Selection node in Model Studio isn't very explicit about how it parametrizes the inputs. But if you think about it, we're ultimately using this selection method to remove redundant inputs from the model (i.e. inputs that explain the same variability as other inputs). For this reason, it would make sense to parametrize each level of a categorical variable so that all of the possible relationships are explored. While the documentation isn't explicit in this regard, experimenting with the node and exploring its output supports the conclusion that each level is parametrized when this selection method is used.

Hope this helps!
