🔒 This topic is solved and locked.
shibbir63
Calcite | Level 5

Hi,

I would be grateful if anyone could tell me whether clustering observations, or clustering variables, improves the results of a regression.

Regards,

Shibbir

Accepted Solution
sbxkoenk
SAS Super FREQ

Hello @shibbir63,

I agree with this statement by @PaigeMiller:

You could always try it both ways and see how well the regressions fit.

Clustering of observations:

I suppose you mean splitting your observations into a few segments and then fitting one model per segment, for example one model for smokers and another for non-smokers, or one model per gender.
This can certainly be beneficial, provided the segments remain large enough.
It can also be done with one overall model, but that model can become quite complex and full of interactions if the explanatory variables relate to the response / target very differently across your segments.
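To make the segment-versus-interaction point concrete, here is a minimal Python/NumPy sketch on simulated data (the variable `x`, the 0/1 segment flag `seg`, and the slopes are all made up for illustration). It fits one regression per segment, then one overall regression with a segment dummy and a segment-by-x interaction, and translates the overall model back into per-segment intercepts and slopes.

```python
import numpy as np


def ols(X, y):
    """Ordinary least-squares coefficients via NumPy's least-squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
seg = rng.integers(0, 2, size=n)  # hypothetical 0/1 segment flag (e.g. non-smoker / smoker)
# Simulated response: intercept and slope both differ by segment.
y = np.where(seg == 1, 1.0 + 2.0 * x, 3.0 + 0.5 * x) + rng.normal(scale=0.3, size=n)

# Option A: one model per segment (intercept + slope in each).
per_segment = {}
for s in (0, 1):
    m = seg == s
    X_s = np.column_stack([np.ones(m.sum()), x[m]])
    per_segment[s] = ols(X_s, y[m])

# Option B: one overall model with a segment dummy and a segment-by-x interaction.
X_all = np.column_stack([np.ones(n), x, seg, seg * x])
b0, b1, b2, b3 = ols(X_all, y)

# Translate the overall model back into per-segment (intercept, slope) pairs.
pooled_as_segments = {0: np.array([b0, b1]), 1: np.array([b0 + b2, b1 + b3])}
for s in (0, 1):
    print(s, per_segment[s].round(3), pooled_as_segments[s].round(3))
```

The point estimates agree exactly. What differs is that the overall model assumes one common error variance for all segments, and with many explanatory variables every one of them needs its own interaction term, which is where the complexity comes from.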

Clustering of variables:
That's a very good idea (for example, to combat multicollinearity and to reduce dimensionality).
I do it almost always.
A decade ago I always used PROC VARCLUS (SAS/STAT) for this, but nowadays there are multiple techniques you can use (feature construction from variable clusters). See the Model Studio (SAS Viya) documentation.
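A rough Python/NumPy sketch of the idea, assuming simulated predictors and an arbitrary correlation threshold of 0.7. It is not the PROC VARCLUS algorithm, only a simple stand-in with the same goal: columns whose absolute correlation exceeds the threshold are grouped (as connected components), and each group is replaced by the first principal component of its standardized columns, so the regression sees fewer, less collinear inputs.

```python
import numpy as np


def cluster_variables(X, threshold=0.7):
    """Label the columns of X so that columns linked by |correlation| > threshold share a label.

    Clusters are the connected components of the "strongly correlated" graph.
    This is a simple stand-in, not the PROC VARCLUS algorithm.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    p = X.shape[1]
    labels = -np.ones(p, dtype=int)  # -1 means "not assigned yet"
    current = 0
    for start in range(p):
        if labels[start] >= 0:
            continue
        # Depth-first search collecting every column reachable from `start`.
        labels[start] = current
        stack = [start]
        while stack:
            j = stack.pop()
            for k in np.flatnonzero(corr[j] > threshold):
                if labels[k] < 0:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels


def cluster_components(X, labels):
    """Replace each variable cluster by the first principal component of its standardized columns."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    comps = []
    for c in np.unique(labels):
        block = Z[:, labels == c]
        # First right singular vector = loadings of the first principal component
        # (its sign is arbitrary).
        _, _, vt = np.linalg.svd(block, full_matrices=False)
        comps.append(block @ vt[0])
    return np.column_stack(comps)


# Simulated predictors (hypothetical): x0-x2 share factor f1, x3-x4 share f2, x5 is independent.
rng = np.random.default_rng(1)
n = 500
f1, f2 = rng.normal(size=(2, n))
X = np.column_stack([f1, f1, f1, f2, f2, rng.normal(size=n)])
X[:, :5] += 0.2 * rng.normal(size=(n, 5))  # measurement noise on the correlated columns

labels = cluster_variables(X)
C = cluster_components(X, labels)
print(labels)   # [0 0 0 1 1 2]
print(C.shape)  # (500, 3)
```

The three columns of `C` can then be used as regressors in place of the six original, partly collinear ones.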


Good luck,

Koen


6 Replies
PaigeMiller
Diamond | Level 26

I think it depends on the data, and I'm not aware of any general advice here (maybe others have some). It would also help if you described your data and explained why you think clusters exist and would help.

You could always try it both ways and see how well the regressions fit.
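One way to "try it both ways" is to compare the fit on held-out data. A minimal Python/NumPy sketch, assuming simulated data with a known 0/1 segment flag and a simple 75/25 train/test split (both are illustrative assumptions): it compares a single regression that ignores the segments with one regression per segment.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least-squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def design(x):
    """Design matrix with an intercept column and one predictor."""
    return np.column_stack([np.ones(len(x)), x])


rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)
seg = rng.integers(0, 2, size=n)  # hypothetical known segment flag
y = np.where(seg == 1, 2.0 * x, -1.0 * x) + rng.normal(scale=0.5, size=n)

# Hold out roughly 25% of the rows to judge fit on data the models have not seen.
test = rng.random(n) < 0.25
train = ~test

# Way 1: a single model, ignoring the segments.
beta_all = fit_ols(design(x[train]), y[train])
pred_all = design(x[test]) @ beta_all

# Way 2: one model per segment, each used only for its own test rows.
pred_seg = np.empty(test.sum())
for s in (0, 1):
    beta_s = fit_ols(design(x[train & (seg == s)]), y[train & (seg == s)])
    in_test = seg[test] == s
    pred_seg[in_test] = design(x[test][in_test]) @ beta_s

mse_all = np.mean((y[test] - pred_all) ** 2)
mse_seg = np.mean((y[test] - pred_seg) ** 2)
print(f"held-out MSE, single model: {mse_all:.3f}; per-segment models: {mse_seg:.3f}")
```

With segments this different, the per-segment models should win clearly. With similar segments, or segments too small to estimate their own coefficients reliably, the single model can do as well or better.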

--
Paige Miller
Reeza
Super User
Agreed, it's context- and data-dependent.
Reeza
Super User
Interesting 🙂
I interpreted 'clustering of observations' more as defining strata, except that they are strata you don't know in advance.
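If the strata are unknown, one possible approach is to cluster the observations first and then fit one regression per cluster. A minimal Python/NumPy sketch, assuming simulated data in which a hypothetical covariate `z` separates two hidden groups, and using a plain k-means (Lloyd's algorithm) written out by hand rather than any particular library:

```python
import numpy as np


def kmeans(data, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for each row of `data`."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each row to its nearest center (squared Euclidean distance).
        dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its rows (keep it if the cluster is empty).
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


rng = np.random.default_rng(3)
n = 300
group = rng.integers(0, 2, size=n)  # the "true" strata, treated as unknown below
z = np.where(group == 1, 3.0, -3.0) + rng.normal(size=n)  # hypothetical covariate separating them
x = rng.normal(size=n)
y = np.where(group == 1, 1.5 * x, -0.5 * x) + rng.normal(scale=0.3, size=n)

# Cluster on z only (not on the response), then fit one regression per cluster.
labels = kmeans(z[:, None], k=2)
slopes = []
for c in range(2):
    m = labels == c
    X_c = np.column_stack([np.ones(m.sum()), x[m]])
    beta, *_ = np.linalg.lstsq(X_c, y[m], rcond=None)
    slopes.append(beta[1])
print(np.sort(np.round(slopes, 2)))
```

The number of clusters and the variables used to form them are judgment calls, and the per-cluster fits should still be compared against a single model.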

shibbir63
Calcite | Level 5
Hi Koen,

Many thanks. This is really helpful!

Regards,

Shibbir
Ksharp
Super User
Yes, I think so.
@Rick_SAS might have something to add?
