laurenhosking
Quartz | Level 8
I’m using a multivariate regression to model the total volume as a function of two variables, Units and Price.

However, I’m not sure whether to model the total volume as a whole or by subgroups of brand. How would I go about finding this out?

I have performed cluster analysis for the category as a whole and by brand, but I am unsure how to interpret the results in order to answer this question.
Rick_SAS
SAS Super FREQ

I think you are asking about a classical ANOVA model that asks whether the mean Volume differs according to the brand. All you need to do is use the CLASS statement to specify the brand variable, then include that variable in your model. For example, if you are using PROC GLM, the code looks like this:

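proc glm data=Have plots=all;
   class Brand;                       /* treat Brand as a categorical effect */
   model Volume = Units Price Brand;  /* Volume as a function of Units, Price, and Brand */
run;
quit;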
laurenhosking
Quartz | Level 8
Fabulous, thank you. And if they do differ, then I would need to model my regression by subgroup, that is, by brand?
Rick_SAS
SAS Super FREQ

Correct. You can look at the F-tests and p-values for the Type 3 sums of squares to assess whether an effect is statistically significant.
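
One way to check whether the brands need separate regressions, rather than only separate intercepts, is to add interaction terms and look at their Type 3 tests. The following is a minimal sketch, assuming the same data set Have and the variables Volume, Units, Price, and Brand from the earlier example. The interaction terms, the BY-group fit, and the data set name HaveSorted are illustrative assumptions, not something stated in the thread.

```sas
/* Sketch: test whether the Units and Price slopes differ by brand. */
proc glm data=Have;
   class Brand;
   /* Units*Brand and Price*Brand allow brand-specific slopes */
   model Volume = Units Price Brand Units*Brand Price*Brand / solution ss3;
run;
quit;

/* If the interactions matter, fit one regression per brand. */
proc sort data=Have out=HaveSorted;   /* HaveSorted is a hypothetical name */
   by Brand;
run;

proc glm data=HaveSorted;
   by Brand;                          /* separate fit for each brand */
   model Volume = Units Price;
run;
quit;
```

Small p-values for the interaction terms suggest that the slopes vary across brands. If only the Brand main effect is significant, a single model with Brand as a CLASS effect may be enough.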

laurenhosking
Quartz | Level 8
I have observed that 2 of my groups behave the same but 1 behaves differently. So I decided to use brand as a subgroup.

However, having done this, I’ve been told my R-square value should be close to 1, but it’s not: it’s as low as 0.3 in certain cases. Does this matter?
PaigeMiller
Diamond | Level 26

In an ideal world, R-squared would be close to 1. But different data sets have different amounts of noise, so an R-squared of 0.3 may be the proper value for this data. The real question, in my mind, is whether the root mean square error reported by SAS is an acceptable level of variation. If, for example, you have some idea of the measurement variability or sample-to-sample variability, and the root mean square error is reasonably close to it, then I'd say that's fine. Or, if the confidence intervals around your predictions or around your parameter estimates are usable, then that's fine as well. All of this depends on the context and the problem. There are no rules of thumb: every data set is different, every application is different, every use is different.
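
As a rough sketch of where these quantities can be found in PROC GLM, assuming the same data set Have and the variable names used earlier in the thread: Root MSE appears in the fit statistics table, the CLPARM option requests confidence limits for the parameter estimates, and the OUTPUT statement can save prediction limits. The output data set name Pred and the new variable names are illustrative assumptions.

```sas
/* Sketch: Root MSE, parameter confidence limits, and prediction limits. */
proc glm data=Have;
   class Brand;
   /* SOLUTION prints the estimates; CLPARM adds their confidence limits */
   model Volume = Units Price Brand / solution clparm;
   /* save predicted values and limits for individual predictions */
   output out=Pred predicted=PredVolume lcl=LowerPred ucl=UpperPred;
run;
quit;
```

Comparing the width of LowerPred to UpperPred against the precision the application needs is one way to judge whether the fit is usable, whatever the R-squared turns out to be.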

--
Paige Miller
