- Target variables and Variable Clustering vs. Varia...

05-18-2016 10:57 AM

I have a data set with a large number of input variables, many of which are highly correlated. The Variable Clustering node does a nice job of reducing the number of variables and selecting a cluster representative, but I have a question about the algorithm that the documentation doesn't seem to address.

What role does the target variable play in the Variable Clustering node? Are the variables in a cluster selected just because they are similar, or do they have to have a similar relationship to the target variable as well?

In contrast, the Variable Selection node takes into account the strength of association between an input and the target.

05-18-2016 01:23 PM

The target variable is not used in the Variable Clustering node. It is an unsupervised method similar to Principal Component Analysis that only looks at the relationship among the input variables. Hope that helps!
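
To illustrate what "unsupervised" means here, the following is a minimal Python sketch that groups inputs using only the correlations among them. It is not the algorithm used by the Variable Clustering node or PROC VARCLUS; it is a simplified greedy threshold grouping, and the names `pearson`, `cluster_inputs`, the `threshold` value, and the simulated data are all assumptions made for the example. The point is only that `target` exists but is never passed to the clustering step.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_inputs(inputs, threshold=0.8):
    """Greedy grouping (illustrative only): a variable joins the first
    cluster whose seed variable it correlates with at |r| >= threshold;
    otherwise it starts a new cluster."""
    clusters = []
    for name, values in inputs.items():
        for cluster in clusters:
            seed = cluster[0]
            if abs(pearson(values, inputs[seed])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Simulated data: x2 is a noisy copy of x1, x4 is a noisy negative copy of x3.
random.seed(0)
n = 500
a = [random.gauss(0, 1) for _ in range(n)]
b = [random.gauss(0, 1) for _ in range(n)]
inputs = {
    "x1": a,
    "x2": [v + 0.1 * random.gauss(0, 1) for v in a],
    "x3": b,
    "x4": [-v + 0.1 * random.gauss(0, 1) for v in b],
}

# The target is defined but deliberately never given to cluster_inputs.
target = [u + v + random.gauss(0, 1) for u, v in zip(a, b)]

clusters = cluster_inputs(inputs)
print(clusters)
```

Because the grouping depends only on `inputs`, replacing `target` with any other column would leave the clusters unchanged.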

05-19-2016 10:10 PM

Hi,

If you believe the variance associated with each observation depends on the target variable, you could list the target variable in the WEIGHT statement of PROC VARCLUS.

05-24-2016 09:29 AM - edited 05-24-2016 09:30 AM

I should just withdraw the question. If two variables act alike, then they would be correlated with the target in the same way as well. Since my goal was to use clustering for variable selection before constructing a regression model, variables that were aligned enough to be in the same cluster would necessarily have similar relationships with the target for regression. There was no need to consider the target variable in the Variable Clustering node other than to withhold it from all of the clusters.
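
The intuition above can be made precise for Pearson correlation. The short derivation below assumes that the two inputs $x_1$, $x_2$ and the target $y$ are standardized (mean 0, variance 1); the symbols $r_{12}$, $r_{1y}$, $r_{2y}$ are notation introduced here for the pairwise correlations, not something taken from the node's documentation.

```latex
% Assumption: x_1, x_2, y are standardized (mean 0, variance 1),
% so that r_{ab} = E[ab] is the Pearson correlation.
\begin{align*}
r_{1y} - r_{2y} &= E[(x_1 - x_2)\,y] \\
|r_{1y} - r_{2y}| &\le \sqrt{E[(x_1 - x_2)^2]}\,\sqrt{E[y^2]}
  && \text{(Cauchy--Schwarz)} \\
&= \sqrt{2\,(1 - r_{12})}
\end{align*}
```

So two inputs with $r_{12} = 0.98$ can differ in their correlation with the target by at most 0.2. The bound loosens quickly, though: at $r_{12} = 0.8$ the gap can be as large as about 0.63, so how similar the relationships with the target are depends on how tightly the cluster members are correlated.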

I actually got fairly strong results from regression using clustering as my method of variable selection, although a LARS node with the LASSO option proved to be the best model.