Does clustering help improve regression predictions?


Posted 11-24-2021 02:51 PM
(646 views)

Hi,

I would be grateful if anyone could tell me whether clustering of observations, or clustering of variables, improves the results of regression.

Regards,

Shibbir

1 ACCEPTED SOLUTION


Hello @shibbir63 ,

I agree with this statement by @PaigeMiller:

> You could always try it both ways and see how well the regressions fit.

**Clustering of observations:**

I suppose you mean splitting your observations into a few segments and then fitting one model per segment. For example, one model for smokers and another for non-smokers, or one model per gender.

This can be beneficial, provided each segment remains big enough.

It can also be done with one overall model, but that model can become quite complex and full of interaction terms if the explanatory variables relate to the response / target very differently across your segments.
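To make the "try it both ways" idea concrete, here is a minimal sketch in plain Python (not SAS) that fits one pooled line and one line per segment, then compares the sums of squared errors. The toy data, the segment names, and the helper functions `fit_line` and `sse` are all made up for illustration. Note that a per-segment fit can never have a worse in-sample error than the pooled fit, so in practice you would compare on held-out data.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data: the slope differs strongly between the two segments.
noise = [0.1, -0.1, 0.0, 0.1, -0.1]
xs = [1, 2, 3, 4, 5]
data = {
    "smoker": (xs, [2.0 + 3.0 * x + e for x, e in zip(xs, noise)]),
    "non_smoker": (xs, [10.0 - 1.0 * x + e for x, e in zip(xs, noise)]),
}

# Way 1: one overall model that ignores the segments.
all_x = [x for seg_x, _ in data.values() for x in seg_x]
all_y = [y for _, seg_y in data.values() for y in seg_y]
a, b = fit_line(all_x, all_y)
pooled_sse = sse(all_x, all_y, a, b)

# Way 2: one model per segment, errors added up.
segment_sse = 0.0
for seg_x, seg_y in data.values():
    a_s, b_s = fit_line(seg_x, seg_y)
    segment_sse += sse(seg_x, seg_y, a_s, b_s)

print(f"pooled SSE:      {pooled_sse:.2f}")
print(f"per-segment SSE: {segment_sse:.2f}")
```

With data like this, where the two segments have opposite slopes, the pooled model fits poorly unless you add a segment-by-x interaction, which is the complexity trade-off described above.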

**Clustering of variables:**

That's a very good idea, for example to combat multicollinearity and to reduce dimensionality.

I almost always do it.

A decade ago I always used PROC VARCLUS (SAS/STAT) for this, but nowadays there are multiple techniques you can use, such as feature construction from variable clusters. See the Model Studio (SAS Viya) documentation.
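As a rough illustration of the idea behind variable clustering, the sketch below (plain Python, not SAS) groups variables whose absolute Pearson correlation with a cluster's first member exceeds a threshold, then keeps one representative per cluster. This greedy rule is a much simpler stand-in and is not the algorithm PROC VARCLUS uses; the data, the 0.9 threshold, and the function names are assumptions for illustration only.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def cluster_variables(columns, threshold=0.9):
    """Greedy grouping: a variable joins the first cluster whose seed
    (first member) it correlates with at |r| >= threshold."""
    clusters = []
    for name, values in columns.items():
        for cluster in clusters:
            seed = cluster[0]
            if abs(pearson(values, columns[seed])) >= threshold:
                cluster.append(name)
                break
        else:
            # No sufficiently correlated seed found: start a new cluster.
            clusters.append([name])
    return clusters

# Hypothetical toy data: two near-duplicate height columns and an unrelated one.
columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [40, 25, 33, 51, 29],
}

clusters = cluster_variables(columns)
representatives = [cluster[0] for cluster in clusters]
print(clusters)
print(representatives)
```

Using only the representatives as regression inputs removes the near-duplicate column, which is the multicollinearity benefit mentioned above.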

Good luck,

Koen

6 REPLIES


I think it depends on the data, and I'm not aware of any general advice here (maybe others have some). It would also help if you described your data and explained why you think clusters exist and would help.

You could always try it both ways and see how well the regressions fit.

--

Paige Miller


Agreed, it depends on the context and the data.


Interesting 🙂

I interpreted 'clustering of observations' more as a way of defining strata, but strata that you don't know in advance.


Hi Koen,

Many thanks. This is really helpful!

Regards,

Shibbir


Yes. I think so.

@Rick_SAS might have some words?
