AdityaKir
Fluorite | Level 6

Hello,

 

I have the below data structure.

 

Dress_ID    Style     Price    Rating  Size  Season  NeckLine  SleeveLength  waiseline  Total Sales
1006032852  Sexy      Low      4.6     M     Summer  o-neck    sleevless     empire     2114
1212192089  Casual    Low      0       L     Summer  o-neck    Petal         natural    2160
1190380701  vintage   High     0       L     Automn  o-neck    full          natural    2206
966005983   Brief     Average  4.6     L     Spring  o-neck    full          natural    2252
876339541   cute      Low      4.5     M     Summer  o-neck    butterfly     natural    2298
1068332458  bohemian  Low      0       M     Summer  v-neck    sleevless     empire     2344
1220707172  Casual    Average  0       XL    Summer  o-neck    full          null       2390
1219677488  Novelty   Average  0       free  Automn  o-neck    short         natural    2436
1113094204  Flare     Average  0       free  Spring  v-neck    short         empire     2482
985292672   bohemian  Low      0       free  Summer  v-neck    sleevless     natural    2528
1117293701  party     Average  5       free  Summer  o-neck    full          natural    2574
898481530   Flare     Average  0       free  Spring  v-neck    short         null       2620

 

 

I have to find out which factors are affecting the Total Sales variable. Since most of the variables are categorical (or rather, all of them are), is ANOVA a better option than regression?

 

Regards,

 

Aditya

1 ACCEPTED SOLUTION

Reeza
Super User

Regression. ANOVA primarily deals with one variable at a time, whereas regression does not, and you can consider interaction terms in a regression model.

 

Make sure you understand the assumptions for your regression model and how categorical values are handled. 
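
As a rough illustration of those two points, the sketch below fits a linear model with an interaction term and requests residual diagnostics. It assumes the posted rows have been read into a data set named have with a numeric TotalSales, as in the DATA step posted later in this thread. The choice of Price and NeckLine as predictors is only for illustration, not a recommendation.

```sas
ods graphics on;

/* CLASS makes PROC GLM build indicator (dummy) columns for each level */
proc glm data=have plots=diagnostics;
  class Price NeckLine;
  /* Price|NeckLine expands to Price NeckLine Price*NeckLine */
  model TotalSales = Price|NeckLine / solution;
run;
quit;
```

With the SOLUTION option, PROC GLM prints the parameter estimates under its default parameterization, where the last level of each CLASS variable (in sorted order) acts as the reference and is set to zero. In the 12 posted rows some Price by NeckLine combinations do not occur, so some interaction estimates will likely be flagged as non-estimable.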


SteveDenham
Jade | Level 19

I would agree with @Reeza, and go with categorical regression using PROC GLM.  Hopefully, there is much more data than this, as I see at least 18 different levels across the categorical variables, and only 12 results, so finding a least squares solution may be impossible for this small sample.
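
One way to check the point about levels versus observations, assuming the data set have from the DATA step posted later in this thread: the NLEVELS option of PROC FREQ reports the number of distinct levels of each variable, which can be compared with the number of rows before fitting anything. The PROC GLM call that follows is only a sketch; it uses a deliberately small subset of predictors (an assumption made here, not part of the original advice) so that some error degrees of freedom should remain with 12 rows.

```sas
/* Count distinct levels per categorical variable (frequency tables suppressed) */
proc freq data=have nlevels;
  tables Style Price Size Season NeckLine SleeveLength waiseline / noprint;
run;

/* Main-effects categorical regression on a reduced set of predictors */
proc glm data=have;
  class Price Season NeckLine;
  model TotalSales = Price Season NeckLine / solution ss3;
run;
quit;
```

With more data, the remaining categorical variables could be added to both the CLASS and MODEL statements in the same way.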

 

Steve Denham

Ksharp
Super User
I would not agree with Reeza.
Your data do not look normally distributed, so they are not well suited to ANOVA. I would try a log-linear model instead.

data have;
infile cards expandtabs truncover;
input Dress_ID :$20.    Style :$20. Price :$20. Rating $ Size :$20.
Season  :$20. NeckLine  :$20. SleeveLength  :$20. waiseline :$20. TotalSales ;
cards;
1006032852  Sexy    Low 4.6 M   Summer  o-neck  sleevless empire   2114
1212192089  Casual  Low 0   L   Summer  o-neck  Petal   natural 2160
1190380701  vintage High    0   L   Automn  o-neck  full    natural 2206
966005983   Brief   Average 4.6 L   Spring  o-neck  full    natural 2252
876339541   cute    Low 4.5 M   Summer  o-neck  butterfly natural 2298
1068332458  bohemian    Low 0   M   Summer  v-neck  sleevless empire 2344
1220707172  Casual  Average 0   XL  Summer  o-neck  full    null 2390
1219677488  Novelty Average 0   free    Automn  o-neck  short   natural 2436
1113094204  Flare   Average 0   free    Spring  v-neck  short   empire  2482
985292672   bohemian    Low 0   free    Summer  v-neck  sleevless natural 2528
1117293701  party   Average 5   free    Summer  o-neck  full    natural 2574
898481530   Flare   Average 0   free    Spring  v-neck  short   null    2620
;
run;
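proc catmod data=have;
  /* each row is treated as one cell of the table, with TotalSales as its count */
  weight totalsales;
  model Dress_ID*Style*Price*Rating*Size*Season*NeckLine*SleeveLength*waiseline
        =_response_
        / noparm pred=freq;
  /* main-effects log-linear model */
  loglin Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline;
run;
quit;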






Your data are actually count data, so you could also try Poisson regression. But I would doubt the result, since you have so many levels for each variable, especially for Dress_ID.
You have N distinct Dress_ID values and N observations, so I doubt the model would converge.

proc genmod data=have;
class Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline;
model totalsales= Dress_ID Style Price Rating 
Size Season NeckLine SleeveLength waiseline / dist=poisson link=log;
run;


Reeza
Super User

@Ksharp I didn't recommend ANOVA 😉

Ksharp
Super User
I suggest you remove Dress_ID from the model if you don't consider it an influencing variable.
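
A sketch of that change, assuming the same have data set as above: Dress_ID is dropped from both the CLASS and MODEL statements, since it takes a distinct value on every row. The TYPE3 option is an addition made here for illustration; it requests a test for each remaining factor.

```sas
proc genmod data=have;
  /* Dress_ID removed: one level per row carries no information about other dresses */
  class Style Price Rating Size Season NeckLine SleeveLength waiseline;
  model totalsales = Style Price Rating Size Season NeckLine SleeveLength waiseline
        / dist=poisson link=log type3;
run;
```

Even without Dress_ID, the remaining factors have more levels in total than there are rows in the posted sample, so on these 12 rows alone the fit is likely to be saturated. More data would be needed for the tests to be meaningful.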
