Hello,
I am trying to use N-way ANOVA to check how my variables affect my response variable. The problem is that the execution has been running for around an hour and still hasn't finished. Below is the data structure. Does it really take this long to process the data?
Bounces | Exits | Continent | Sourcegroup | Timeinpage | Uniquepageviews | Visits | BouncesNew |
0 | 0 | OC | (direct) | 18 | 1 | 0 | 0 |
0 | 0 | N.America | (direct) | 4 | 1 | 0 | 0 |
0 | 0 | N.America | Others | 35 | 1 | 0 | 0 |
0 | 0 | N.America | public.tableausoftware.com | 70 | 1 | 0 | 0 |
0 | 0 | N.America | public.tableausoftware.com | 81 | 1 | 0 | 0 |
0 | 0 | N.America | public.tableausoftware.com | 75 | 1 | 0 | 0 |
0 | 0 | N.America | public.tableausoftware.com | 186 | 1 | 0 | 0 |
0 | 0 | N.America | (direct) | 710 | 1 | 0 | 0 |
0 | 0 | OC | (direct) | 712 | 1 | 1 | 0 |
0 | 0 | AS | Others | 344 | 1 | 1 | 0 |
0 | 0 | EU | Others | 27 | 1 | 1 | 0 |
0 | 0 | EU | visualisingdata.com | 0 | 1 | 1 | 0 |
0 | 0 | N.America | Others | 294 | 1 | 1 | 0 |
0 | 0 | N.America | public.tableausoftware.com | 111 | 1 | 1 | 0 |
0 | 0 | SA | (direct) | 1430 | 1 | 1 | 0 |
0 | 0 | N.America | (direct) | 29 | 1 | 1 | 0 |
0 | 0 | N.America | Others | 637 | 1 | 1 | 0 |
I have posted only an excerpt here; in total, the CSV file has around 32,000 rows.
My dependent variable is Exits, and I have added all the other variables as categorical variables.
Regards,
Aditya
No. I find that SAS Studio sometimes hangs on an error instead of displaying it.
Check your selections. Also, how many unique combinations do you have compared to your N?
I.e., 3 levels x 2 levels x ... x 2 levels = number of combinations.
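To make the combinations question concrete, here is a minimal Python sketch (not SAS) that counts the distinct levels of each factor in the 17 sample rows from the original post and multiplies them. It assumes Continent, Sourcegroup, Timeinpage and Visits are all treated as categorical, as described above; the helper names (`level_counts`, `cell_count`, `main_effects_params`) are hypothetical. The product of the level counts is the number of cells in a full-factorial layout, which is also the parameter count of a model that includes every interaction; a main-effects-only model needs one intercept plus (levels - 1) parameters per factor.

```python
from math import prod

# Factor columns, in the order used in each SAMPLE tuple below.
FACTORS = ("Continent", "Sourcegroup", "Timeinpage", "Visits")

# The 17 sample rows from the post (only these four columns); the full file has ~32,000 rows.
SAMPLE = [
    ("OC", "(direct)", 18, 0),
    ("N.America", "(direct)", 4, 0),
    ("N.America", "Others", 35, 0),
    ("N.America", "public.tableausoftware.com", 70, 0),
    ("N.America", "public.tableausoftware.com", 81, 0),
    ("N.America", "public.tableausoftware.com", 75, 0),
    ("N.America", "public.tableausoftware.com", 186, 0),
    ("N.America", "(direct)", 710, 0),
    ("OC", "(direct)", 712, 1),
    ("AS", "Others", 344, 1),
    ("EU", "Others", 27, 1),
    ("EU", "visualisingdata.com", 0, 1),
    ("N.America", "Others", 294, 1),
    ("N.America", "public.tableausoftware.com", 111, 1),
    ("SA", "(direct)", 1430, 1),
    ("N.America", "(direct)", 29, 1),
    ("N.America", "Others", 637, 1),
]


def level_counts(rows):
    """Number of distinct levels per factor (tuple position i maps to FACTORS[i])."""
    return {name: len({row[i] for row in rows}) for i, name in enumerate(FACTORS)}


def cell_count(counts):
    """Full-factorial cells = product of the level counts.
    This equals the parameter count of a model containing every interaction."""
    return prod(counts.values())


def main_effects_params(counts):
    """Main-effects-only model: one intercept plus (levels - 1) parameters per factor."""
    return 1 + sum(c - 1 for c in counts.values())


if __name__ == "__main__":
    counts = level_counts(SAMPLE)
    print(counts)
    print("observations:", len(SAMPLE))
    print("cells (all interactions):", cell_count(counts))
    print("main-effects parameters:", main_effects_params(counts))
```

On this excerpt, Timeinpage alone has 17 distinct values in 17 rows, so even the main-effects model needs more parameters (25) than there are observations (17), and the full-factorial layout has 680 cells. The full 32,000-row file will give different counts, so it is worth running the same check on it before fitting.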
Hello,
To fix the problem, I did a correlation analysis and found that one of the variables had a very weak relationship with the dependent variable (a correlation of 0.00132). When I removed it from the N-way ANOVA, I got the result in 5 minutes.
Is this the right approach?
Regards,
Aditya
Why not run individual ANOVAs first and reduce the number of variables?
Hello,
Are you suggesting that I run individual ANOVAs and omit from the final model the variables whose p-values are not significant?
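A minimal sketch of the one-variable-at-a-time screening being discussed, written in Python with only the standard library rather than in SAS. It computes the one-way ANOVA F statistic and its degrees of freedom for each factor separately; the function names and the list-of-dicts row format are assumptions for illustration. It does not compute p-values: those come from the F distribution with (df_between, df_within) degrees of freedom, for example `scipy.stats.f.sf(F, df_between, df_within)` if SciPy is available. Keep in mind that each test ignores the other factors, so a variable that looks weak on its own can still matter jointly.

```python
from collections import defaultdict
from math import inf, nan


def one_way_anova_f(pairs):
    """One-way ANOVA on (level, y) pairs.

    Returns (F, df_between, df_within), where
    F = (SS_between / df_between) / (SS_within / df_within).
    """
    groups = defaultdict(list)
    for level, y in pairs:
        groups[level].append(float(y))

    n = sum(len(g) for g in groups.values())
    k = len(groups)
    df_between, df_within = k - 1, n - k
    if df_between < 1 or df_within < 1:
        raise ValueError("need at least 2 levels and more observations than levels")

    grand_mean = sum(sum(g) for g in groups.values()) / n
    means = {lv: sum(g) / len(g) for lv, g in groups.items()}

    # Between-level variation: how far each level's mean sits from the grand mean.
    ss_between = sum(len(g) * (means[lv] - grand_mean) ** 2 for lv, g in groups.items())
    # Within-level variation: spread of observations around their own level's mean.
    ss_within = sum((y - means[lv]) ** 2 for lv, g in groups.items() for y in g)

    if ss_within == 0:
        # No within-level spread: F is infinite if the means differ, undefined otherwise.
        return (inf if ss_between > 0 else nan), df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def screen_factors(rows, response, factors):
    """Run a separate one-way ANOVA of `response` on each factor in `factors`."""
    return {f: one_way_anova_f((row[f], row[response]) for row in rows) for f in factors}
```

Usage would look like `screen_factors(rows, "Exits", ["Continent", "Sourcegroup"])`, with `rows` as a list of dicts read from the CSV (for example via `csv.DictReader`, converting Exits to a number).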
Regards,
Aditya
Yes, but use caution: this generally isn't recommended. It sounds like you have more categories than observations, so you're running into dimensionality problems. You can also look at clustering techniques to reduce your input variables, perhaps VARCLUS, although I'm not sure how well that works when you have a lot of categorical data. Categorical data analysis is a weakness of mine 😞
Thanks