Solved: Does clustering help improve regression predictions?

shibbir63 · Posted 11-24-2021 02:51 PM

Hi,

I would be grateful if anyone could tell me whether clustering of observations and clustering of variables improve the results of regression.

Regards,

Shibbir

PaigeMiller · Posted 11-24-2021 03:29 PM

I think it depends on the data, and I'm not aware of any global advice here (maybe others do have some advice). It would also help if you described your data and why you think clustering exists and would help.

You could always try it both ways and see how good the regressions fit.

--
Paige Miller

Reeza · Posted 11-24-2021 03:33 PM

Agreed, it's context and data dependent.

sbxkoenk · Posted 11-24-2021 03:41 PM

Hello @shibbir63 ,

I am in favor of this statement made by @PaigeMiller .

> You could always try it both ways and see how good the regressions fit.

Clustering of observations:

I suppose you mean splitting your observations into a few segments and then fitting one model per segment. For example, one model for smokers and another for non-smokers, or one model per gender.
This can be beneficial, provided the segments remain big enough.
It can also be done with one overall model, but that model can become quite complex and full of interactions if the explanatory variables relate to the response / target very differently across your segments.

Clustering of variables:
That's a very good idea (to combat multicollinearity and to do dimension reduction, for example).
I do it almost always.
A decade ago I always used PROC VARCLUS (SAS/STAT) for this, but nowadays there are multiple techniques you can use (feature construction from variable clusters). See the Model Studio (SAS Viya) doc.

Good luck,

Koen

Reeza · Posted 11-24-2021 04:11 PM

Interesting 🙂
I interpreted 'clustering of observations' more as how you'd define a stratum, but essentially one you don't know in advance.

shibbir63 · Posted 11-24-2021 04:12 PM

Hi Koen,

Many thanks. This is really helpful!

Regards,

Shibbir

Ksharp · Posted 11-25-2021 08:03 AM

Yes. I think so.
@Rick_SAS might have something to add?
