Hello,
I have the below data structure.
Dress_ID | Style | Price | Rating | Size | Season | NeckLine | SleeveLength | waiseline | Total Sales |
1006032852 | Sexy | Low | 4.6 | M | Summer | o-neck | sleevless | empire | 2114 |
1212192089 | Casual | Low | 0 | L | Summer | o-neck | Petal | natural | 2160 |
1190380701 | vintage | High | 0 | L | Automn | o-neck | full | natural | 2206 |
966005983 | Brief | Average | 4.6 | L | Spring | o-neck | full | natural | 2252 |
876339541 | cute | Low | 4.5 | M | Summer | o-neck | butterfly | natural | 2298 |
1068332458 | bohemian | Low | 0 | M | Summer | v-neck | sleevless | empire | 2344 |
1220707172 | Casual | Average | 0 | XL | Summer | o-neck | full | null | 2390 |
1219677488 | Novelty | Average | 0 | free | Automn | o-neck | short | natural | 2436 |
1113094204 | Flare | Average | 0 | free | Spring | v-neck | short | empire | 2482 |
985292672 | bohemian | Low | 0 | free | Summer | v-neck | sleevless | natural | 2528 |
1117293701 | party | Average | 5 | free | Summer | o-neck | full | natural | 2574 |
898481530 | Flare | Average | 0 | free | Spring | v-neck | short | null | 2620 |
I have to find out which factors are affecting the Total Sales variable. Since most of the variables (or rather, all of them) are categorical, is ANOVA a better option than regression?
Regards,
Aditya
Regression. ANOVA primarily deals with one variable at a time, whereas regression does not, and you can include interaction terms in a regression model.
Make sure you understand the assumptions for your regression model and how categorical values are handled.
I would agree with @Reeza, and go with categorical regression using PROC GLM. Hopefully, there is much more data than this, as I see at least 18 different levels across the categorical variables, and only 12 results, so finding a least squares solution may be impossible for this small sample.
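As a rough illustration of the PROC GLM approach, here is a minimal sketch. It assumes the data have already been read into a dataset called have with a numeric variable TotalSales (both names are assumptions here), and it deliberately uses only three of the predictors, because with 12 rows a model containing every categorical variable would leave no degrees of freedom for error.

```sas
/* Sketch only: main-effects model with a few categorical predictors. */
/* Assumes a dataset HAVE with a numeric variable TotalSales.         */
proc glm data=have;
  /* CLASS tells GLM to treat these variables as categorical */
  class Price Size Season;
  /* SOLUTION prints parameter estimates; SS3 requests Type III tests */
  model TotalSales = Price Size Season / solution ss3;
  /* With much more data, an interaction such as Price*Season
     could be added to the right-hand side of the MODEL statement */
run;
quit;
```

Which predictors to keep is a judgment call that depends on how much data is really available; the three chosen here are only for illustration.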
Steve Denham
I would not agree with Reeza. Your data do not follow a normal distribution, so they are not suited to ANOVA. I would try a log-linear model instead.

data have;
  infile cards expandtabs truncover;
  input Dress_ID :$20. Style :$20. Price :$20. Rating $ Size :$20. Season :$20. NeckLine :$20. SleeveLength :$20. waiseline :$20. TotalSales;
  cards;
1006032852 Sexy Low 4.6 M Summer o-neck sleevless empire 2114
1212192089 Casual Low 0 L Summer o-neck Petal natural 2160
1190380701 vintage High 0 L Automn o-neck full natural 2206
966005983 Brief Average 4.6 L Spring o-neck full natural 2252
876339541 cute Low 4.5 M Summer o-neck butterfly natural 2298
1068332458 bohemian Low 0 M Summer v-neck sleevless empire 2344
1220707172 Casual Average 0 XL Summer o-neck full null 2390
1219677488 Novelty Average 0 free Automn o-neck short natural 2436
1113094204 Flare Average 0 free Spring v-neck short empire 2482
985292672 bohemian Low 0 free Summer v-neck sleevless natural 2528
1117293701 party Average 5 free Summer o-neck full natural 2574
898481530 Flare Average 0 free Spring v-neck short null 2620
;
run;

proc catmod data=have;
  weight totalsales;
  model Dress_ID*Style*Price*Rating*Size*Season*NeckLine*SleeveLength*waiseline = _response_ / noparm pred=freq;
  loglin Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline;
quit;

Your data are actually count data, so you could also try Poisson regression. I would doubt the result, though, because you have so many levels for each variable, especially Dress_ID. With N Dress_ID values and N observations, I doubt the model would converge.

proc genmod data=have;
  class Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline;
  model totalsales = Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline / dist=poisson link=log;
run;
@Ksharp I didn't recommend ANOVA 😉
I suggest you remove Dress_ID from the model if you don't consider it an influencing variable.
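For illustration, the Poisson model from the earlier reply with Dress_ID dropped might look like the sketch below. It reuses the dataset have and the variable names from that reply. With only 12 rows it is still heavily over-parameterized, so treat it as a template for a larger dataset rather than something expected to give stable estimates here.

```sas
/* Sketch only: Poisson regression without the identifier Dress_ID. */
/* Dress_ID is unique per row, so it would fit each observation     */
/* exactly and explain nothing about the other factors.             */
proc genmod data=have;
  class Style Price Rating Size Season NeckLine SleeveLength waiseline;
  model totalsales = Style Price Rating Size Season NeckLine SleeveLength waiseline
        / dist=poisson link=log;
run;
```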