Solved: Multivariate regression by category or subgroup

laurenhosking · Posted 11-18-2020 05:59 AM

I’m using a multivariate regression to model the total volume as a function of two variables, Units and Price.

However, I’m not sure whether to model the total volume as a whole or by brand subgroups. How would I go about finding this out?

I have performed cluster analysis for the category as a whole and by brand, but I am unsure how to interpret the results in order to answer this question.

Rick_SAS · Posted 11-18-2020 06:50 AM

I think you are asking about a classical ANOVA model that asks whether the mean Volume differs according to the brand. All you need to do is use the CLASS statement to specify the brand variable, then include that variable in your model. For example, if you are using PROC GLM, the code looks like this:

proc glm data=Have plots=all;
   class Brand;
   model Volume = Units Price Brand;
run;
quit;
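
If the Brand effect turns out to be significant, a natural follow-up is which brands differ from which. The following is a minimal sketch of one common way to look at that, assuming the same data set (Have) and variables as above. The LSMEANS statement with Tukey-adjusted pairwise comparisons is an illustrative choice, not something prescribed in this thread:

```sas
proc glm data=Have;
   class Brand;
   model Volume = Units Price Brand / solution;
   /* Brand means adjusted for Units and Price, with pairwise
      differences (PDIFF), a Tukey-type multiplicity adjustment,
      and confidence limits (CL) */
   lsmeans Brand / pdiff adjust=tukey cl;
run;
quit;
```

The pairwise p-values show which brands have adjusted mean volumes that can be distinguished from one another.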

laurenhosking · Posted 11-18-2020 06:55 AM

Fabulous, thank you. And if they do differ, would I then need to model my regression by subgroup, that is, by brand?

Rick_SAS · Posted 11-18-2020 07:05 AM

Correct. You can look at the F-tests and p-values for the Type 3 sums of squares to assess whether an effect is statistically significant.
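
One point that may be worth checking: with Brand entered only as a main effect, the model shifts the intercept by brand but keeps a single slope for Units and a single slope for Price. A hedged sketch of how one might test whether the slopes themselves differ by brand, and then fit each brand separately if they do, is below. It assumes the data set Have from the earlier example; HaveSorted is a hypothetical name introduced here for illustration.

```sas
/* Step 1: add Brand interactions; the Type 3 tests for Units*Brand
   and Price*Brand assess whether the slopes differ across brands */
proc glm data=Have;
   class Brand;
   model Volume = Units Price Brand Units*Brand Price*Brand / solution ss3;
run;
quit;

/* Step 2 (only if the interactions matter): one regression per brand.
   BY-group processing requires the data to be sorted by Brand. */
proc sort data=Have out=HaveSorted;   /* HaveSorted: hypothetical name */
   by Brand;
run;

proc glm data=HaveSorted;
   by Brand;
   model Volume = Units Price;
run;
quit;
```

If the interaction terms are not significant, a single model with Brand as a CLASS effect may be enough, and it uses all of the data to estimate the common slopes.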

laurenhosking · Posted 11-18-2020 10:33 AM

I have observed that two of my groups behave the same but one behaves differently, so I decided to use brand as a subgroup.

However, having done this, I’ve been told my R-square value should be close to 1, but it isn’t. It’s as low as 0.3 in certain cases. Does this matter?

PaigeMiller · Posted 11-18-2020 10:58 AM

In an ideal world, R-squared should be close to 1. But different data have different amounts of noise, so an R-squared of 0.3 may be the proper value for this data. The real question in my mind is whether the root mean square error reported by SAS represents an acceptable level of variation. If, for example, you have some idea of measurement variability or sample-to-sample variability, and the root mean square error is reasonably close to it, then I'd say that's fine. Or, if the confidence intervals around your predictions or around your parameter estimates are usable, then that's fine as well. All of this is context- and problem-dependent. There are no rules of thumb: every data set is different, every application is different, every use is different.

--
Paige Miller
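
For readers who want to look at the quantities mentioned above, here is a minimal sketch, assuming the same data set (Have) and model as earlier in the thread. The output data set Pred and the variables PredVolume, Lower, and Upper are hypothetical names introduced for illustration.

```sas
proc glm data=Have;
   class Brand;
   /* CLPARM adds confidence limits to the parameter estimates;
      Root MSE is reported in the default fit statistics table */
   model Volume = Units Price Brand / solution clparm;
   /* Save predicted values with confidence limits for the mean prediction */
   output out=Pred predicted=PredVolume lclm=Lower uclm=Upper;
run;
quit;
```

Whether the Root MSE and the width of these intervals are acceptable depends on what the predictions will be used for, as noted above.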
