AdityaKir
Fluorite | Level 6

Hello,

 

I am trying to use N-way ANOVA to check how my response variable depends on several other variables. The problem is that the execution has been running for around 1 hour and still has not finished. Below is the data structure. Does it really take this long to process the data?

 

Bounces  Exits  Continent  Sourcegroup                 Timeinpage  Uniquepageviews  Visits  BouncesNew
0        0      OC         (direct)                    18          1                0       0
0        0      N.America  (direct)                    4           1                0       0
0        0      N.America  Others                      35          1                0       0
0        0      N.America  public.tableausoftware.com  70          1                0       0
0        0      N.America  public.tableausoftware.com  81          1                0       0
0        0      N.America  public.tableausoftware.com  75          1                0       0
0        0      N.America  public.tableausoftware.com  186         1                0       0
0        0      N.America  (direct)                    710         1                0       0
0        0      OC         (direct)                    712         1                1       0
0        0      AS         Others                      344         1                1       0
0        0      EU         Others                      27          1                1       0
0        0      EU         visualisingdata.com         0           1                1       0
0        0      N.America  Others                      294         1                1       0
0        0      N.America  public.tableausoftware.com  111         1                1       0
0        0      SA         (direct)                    1430        1                1       0
0        0      N.America  (direct)                    29          1                1       0
0        0      N.America  Others                      637         1                1       0

 

I have included only an excerpt here; in total, the CSV file has around 32,000 rows.

 

My dependent variable is Exits, and I have added all the other variables as categorical variables.

 

Regards,

 

Aditya


6 REPLIES
Reeza
Super User

No. I find that SAS Studio will sometimes hang on an error rather than show it.

 

Check your selections. Also, how many unique combinations do you have compared to your N? 

 

I.e., 3 levels x 2 levels x ... x 2 levels = # of combinations
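
The comparison of combinations against N can be sketched in a few lines. This is a minimal Python illustration rather than SAS code. It assumes the 17 posted rows split into Continent, Sourcegroup, Timeinpage and Visits in the way listed in ROWS (the split of the run-together digits is an inference), and level_counts and possible_cells are hypothetical helper names.

```python
from math import prod

# Four columns taken from the posted excerpt. ASSUMPTION: the run-together
# digits are split as Timeinpage followed by single-digit counts.
COLUMNS = ("Continent", "Sourcegroup", "Timeinpage", "Visits")
ROWS = [
    ("OC", "(direct)", 18, 0),
    ("N.America", "(direct)", 4, 0),
    ("N.America", "Others", 35, 0),
    ("N.America", "public.tableausoftware.com", 70, 0),
    ("N.America", "public.tableausoftware.com", 81, 0),
    ("N.America", "public.tableausoftware.com", 75, 0),
    ("N.America", "public.tableausoftware.com", 186, 0),
    ("N.America", "(direct)", 710, 0),
    ("OC", "(direct)", 712, 1),
    ("AS", "Others", 344, 1),
    ("EU", "Others", 27, 1),
    ("EU", "visualisingdata.com", 0, 1),
    ("N.America", "Others", 294, 1),
    ("N.America", "public.tableausoftware.com", 111, 1),
    ("SA", "(direct)", 1430, 1),
    ("N.America", "(direct)", 29, 1),
    ("N.America", "Others", 637, 1),
]


def level_counts(rows, columns):
    """Count distinct values per column, treating every column as categorical."""
    return {name: len({row[i] for row in rows}) for i, name in enumerate(columns)}


def possible_cells(rows, columns):
    """Multiply the level counts: the number of cells in a full factorial layout."""
    return prod(level_counts(rows, columns).values())


counts = level_counts(ROWS, COLUMNS)
cells = possible_cells(ROWS, COLUMNS)
print(counts)
print(f"possible cells = {cells}, observations = {len(ROWS)}")
```

On these 17 rows the product is 5 x 4 x 17 x 2 = 680 cells, far more than the number of observations. Timeinpage alone contributes 17 levels because every value is distinct, which suggests that a time-like variable entered as categorical can inflate the model considerably.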

AdityaKir
Fluorite | Level 6

Hello,

 

To fix the problem, I ran a correlation analysis and found that one of the variables had a very weak correlation (0.00132) with the dependent variable. When I removed it from the N-way ANOVA, I got the result in 5 minutes.

 

Is this the right approach?

 

Regards,

 

Aditya

Reeza
Super User

Why not run individual ANOVAs first and reduce the # of variables?
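
As a rough illustration of what one individual (one-way) ANOVA computes, here is a minimal Python sketch, not SAS output. The function name one_way_anova_f and the numbers in exits_by_continent are hypothetical and are not taken from the poster's data. It returns only the F statistic and its degrees of freedom; a p-value would additionally need the F distribution.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict mapping level -> responses."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = fmean(all_values)
    k = len(groups)       # number of levels of the factor
    n = len(all_values)   # total number of observations

    # Variation of the level means around the grand mean.
    ss_between = sum(
        len(ys) * (fmean(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    # Variation of the observations around their own level mean.
    ss_within = sum(
        (y - fmean(ys)) ** 2 for ys in groups.values() for y in ys
    )

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# HYPOTHETICAL toy responses, grouped by one factor.
exits_by_continent = {
    "EU": [1, 2, 3],
    "AS": [2, 3, 4],
    "OC": [5, 6, 7],
}

f_stat, df_b, df_w = one_way_anova_f(exits_by_continent)
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

Running the same calculation once per candidate factor is the screening idea: each factor is tested against the response on its own, so no run has to handle the full cross of all factors at once.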

AdityaKir
Fluorite | Level 6

Hello,

 

Are you suggesting that I run individual ANOVAs and omit from the final model the variables whose p-values are not significant?

 

Regards,

 

Aditya

Reeza
Super User

Yes, with the caution that this generally isn't recommended. But it sounds like you have more categories than observations, so you're running into dimensionality problems. You can also look at clustering techniques to reduce your input variables. Perhaps PROC VARCLUS, except I'm not sure how that will work when you have a lot of categorical data. Categorical data analysis is a weakness of mine 😞

AdityaKir
Fluorite | Level 6

Thanks
