# What to use: ANOVA or a regression model?

Hello,

I have the following data structure.

| Dress_ID | Style | Price | Rating | Size | Season | NeckLine | SleeveLength | waiseline | Total Sales |
|---|---|---|---|---|---|---|---|---|---|
| 1006032852 | Sexy | Low | 4.6 | M | Summer | o-neck | sleevless | empire | 2114 |
| 1212192089 | Casual | Low | 0 | L | Summer | o-neck | Petal | natural | 2160 |
| 1190380701 | vintage | High | 0 | L | Automn | o-neck | full | natural | 2206 |
| 966005983 | Brief | Average | 4.6 | L | Spring | o-neck | full | natural | 2252 |
| 876339541 | cute | Low | 4.5 | M | Summer | o-neck | butterfly | natural | 2298 |
| 1068332458 | bohemian | Low | 0 | M | Summer | v-neck | sleevless | empire | 2344 |
| 1220707172 | Casual | Average | 0 | XL | Summer | o-neck | full | null | 2390 |
| 1219677488 | Novelty | Average | 0 | free | Automn | o-neck | short | natural | 2436 |
| 1113094204 | Flare | Average | 0 | free | Spring | v-neck | short | empire | 2482 |
| 985292672 | bohemian | Low | 0 | free | Summer | v-neck | sleevless | natural | 2528 |
| 1117293701 | party | Average | 5 | free | Summer | o-neck | full | natural | 2574 |
| 898481530 | Flare | Average | 0 | free | Spring | v-neck | short | null | 2620 |

I need to find out which factors are affecting the Total Sales variable. Since most of the variables (in fact, all of them) are categorical, is ANOVA a better option than regression?

Regards,

Re: What to use anova or regression model

Regression. ANOVA primarily deals with one variable at a time, whereas regression does not, and in a regression model you can also include interaction terms.
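To illustrate the interaction point, here is a minimal SAS sketch. It assumes the sample rows are in a data set named `have` with the response stored as `TotalSales` (as in the data step posted later in this thread); picking `Price` and `Season` is arbitrary and only for illustration.

```sas
/* Sketch only: categorical regression with one interaction term.   */
/* Assumes data set HAVE with the question's variables; Price and   */
/* Season are chosen purely for illustration.                       */
proc glm data=have;
   class Price Season;                        /* treat both as categorical */
   model TotalSales = Price Season Price*Season / solution; /* main effects + interaction */
run;
quit;
```

In PROC GLM, `Price|Season` is shorthand for the same three terms (`Price Season Price*Season`).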

Make sure you understand the assumptions of your regression model and how categorical variables are handled (coded).
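One way to see how SAS handles a categorical predictor is to look at the design matrix it builds. PROC GLM uses a less-than-full-rank coding: an intercept plus one 0/1 indicator column per level, so a three-level variable such as `Price` gets three indicator columns, and with the SOLUTION option the last level in sort order is the one whose estimate is set to zero. PROC GLMMOD builds that same design matrix without fitting the model. This is a sketch assuming the data set `have` from this thread; the output data set names `design` and `parms` are arbitrary.

```sas
/* Sketch: inspect the design matrix SAS builds for a CLASS variable. */
/* Assumes data set HAVE from this thread; DESIGN and PARMS are       */
/* arbitrary names for the output data sets.                          */
proc glmmod data=have outdesign=design outparm=parms;
   class Price;
   model TotalSales = Price;
run;

proc print data=parms;  run;   /* maps each design column to its effect/level */
proc print data=design; run;   /* intercept column plus one 0/1 column per level */
```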


## Re: What to use anova or regression model

I agree with @Reeza and would go with categorical regression using PROC GLM. Hopefully there is much more data than this: I see at least 18 different levels across the categorical variables and only 12 observations, so a unique least squares solution may be impossible to obtain from this small sample.
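To check the levels-versus-observations problem directly, PROC FREQ can report the number of distinct levels per variable. Counting by hand from the 12 sample rows (leaving out Dress_ID and Rating), Style, Price, Size, Season, NeckLine, SleeveLength and waiseline have 9 + 3 + 4 + 3 + 2 + 5 + 3 = 29 levels, so even a main-effects-only model needs 1 + 8 + 2 + 3 + 2 + 1 + 4 + 2 = 23 parameters, more than the 12 observations. A sketch, assuming the data set `have` from this thread:

```sas
/* Sketch: count distinct levels of each categorical predictor.  */
/* Assumes data set HAVE (12 observations) from this thread.     */
/* NOPRINT suppresses the frequency tables; the NLEVELS summary  */
/* table is still displayed.                                     */
proc freq data=have nlevels;
   tables Style Price Size Season NeckLine SleeveLength waiseline / noprint;
run;
```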

Steve Denham

Re: What to use anova or regression model

I would not agree with Reeza.
Your data do not look normally distributed, so they are not suited to ANOVA. I would try a log-linear model instead.

```sas
/* Read the sample rows; all predictors are read as character */
data have;
infile cards expandtabs truncover;
input Dress_ID :$20.    Style :$20. Price :$20. Rating $ Size :$20.
Season  :$20. NeckLine  :$20. SleeveLength  :$20. waiseline :$20. TotalSales ;
cards;
1006032852  Sexy    Low 4.6 M   Summer  o-neck  sleevless empire   2114
1212192089  Casual  Low 0   L   Summer  o-neck  Petal   natural 2160
1190380701  vintage High    0   L   Automn  o-neck  full    natural 2206
966005983   Brief   Average 4.6 L   Spring  o-neck  full    natural 2252
876339541   cute    Low 4.5 M   Summer  o-neck  butterfly natural 2298
1068332458  bohemian    Low 0   M   Summer  v-neck  sleevless empire 2344
1220707172  Casual  Average 0   XL  Summer  o-neck  full    null 2390
1219677488  Novelty Average 0   free    Automn  o-neck  short   natural 2436
1113094204  Flare   Average 0   free    Spring  v-neck  short   empire  2482
985292672   bohemian    Low 0   free    Summer  v-neck  sleevless natural 2528
1117293701  party   Average 5   free    Summer  o-neck  full    natural 2574
898481530   Flare   Average 0   free    Spring  v-neck  short   null    2620
;
run;

/* Log-linear model: TotalSales is used as the cell count (weight) */
proc catmod data=have;
weight totalsales;
model Dress_ID*Style*Price*Rating*Size*Season*NeckLine*SleeveLength*waiseline
=_response_
/ noparm pred=freq;
loglin Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline;
run;
quit;
```
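The normality assumption of ANOVA and linear regression applies to the model's errors rather than to the raw response, so one way to check the claim is to fit a model and test its residuals. A sketch, assuming the data set `have` created above; the single factor `Price` and the names `resids` and `resid` are illustrative choices, not a recommended final model.

```sas
/* Sketch: check normality of model residuals rather than raw sales. */
/* Assumes data set HAVE; Price is an illustrative single factor.    */
proc glm data=have;
   class Price;
   model TotalSales = Price;
   output out=resids r=resid;            /* save residuals in RESIDS */
run;
quit;

proc univariate data=resids normal;      /* NORMAL requests normality tests */
   var resid;
   qqplot resid / normal(mu=est sigma=est);  /* visual check against a normal line */
run;
```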

Your response is actually count data, so you could try Poisson regression. However, I would doubt the result, since you have so many levels for each variable, especially Dress_ID: with N observations you also have N distinct Dress_ID values, so I doubt the model would converge.

```sas
/* Poisson regression with a log link; every predictor is categorical */
proc genmod data=have;
class Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline;
model totalsales= Dress_ID Style Price Rating
Size Season NeckLine SleeveLength waiseline / dist=poisson link=log;
run;
```

Re: What to use anova or regression model

@Ksharp I didn't recommend ANOVA.

Re: What to use anova or regression model

I suggest you remove Dress_ID from the model if you don't consider it a variable that influences sales.
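Following that suggestion, here is a sketch of the Poisson model posted above with Dress_ID dropped, since it has one level per observation and so cannot be separated from the residual variation. It assumes the data set `have` from this thread. Note that with only 12 rows the remaining CLASS variables still need more parameters than there are observations, so the sample-size concern raised earlier in the thread still applies.

```sas
/* Sketch: the Poisson model from this thread without Dress_ID,   */
/* which has one level per observation. Assumes data set HAVE.    */
proc genmod data=have;
   class Style Price Rating Size Season NeckLine SleeveLength waiseline;
   model TotalSales = Style Price Rating Size Season NeckLine
                      SleeveLength waiseline / dist=poisson link=log;
run;
```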