Solved: N-way Anova duration

AdityaKir · Posted 09-28-2016 04:42 AM

Hello,

I am trying to use N-way ANOVA to check how my response variable depends on several other variables. The problem is that the execution has been running for around 1 hour and still hasn't finished. Below is the data structure. Does it really take this long to process the data?

Bounces	Exits	Continent	Sourcegroup	Timeinpage	Uniquepageviews	Visits	BouncesNew
0	0	OC	(direct)	18	1	0	0
0	0	N.America	(direct)	4	1	0	0
0	0	N.America	Others	35	1	0	0
0	0	N.America	public.tableausoftware.com	70	1	0	0
0	0	N.America	public.tableausoftware.com	81	1	0	0
0	0	N.America	public.tableausoftware.com	75	1	0	0
0	0	N.America	public.tableausoftware.com	186	1	0	0
0	0	N.America	(direct)	710	1	0	0
0	0	OC	(direct)	712	1	1	0
0	0	AS	Others	344	1	1	0
0	0	EU	Others	27	1	1	0
0	0	EU	visualisingdata.com	0	1	1	0
0	0	N.America	Others	294	1	1	0
0	0	N.America	public.tableausoftware.com	111	1	1	0
0	0	SA	(direct)	1430	1	1	0
0	0	N.America	(direct)	29	1	1	0
0	0	N.America	Others	637	1	1	0

I have included just an excerpt here; in total, the CSV file has around 32,000 rows.

My dependent variable is Exits, and I have added all the rest of the variables as categorical variables.

Regards,

Aditya

Reeza · Posted 09-28-2016 05:21 AM

No. I find that SAS Studio will sometimes hang on an error rather than show it.

Check your selections. Also, how many unique combinations do you have compared to your N?

I.e., 3 levels x 2 levels x ... x 2 levels = # of combinations
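
The levels-versus-N check above can be sketched in a few lines. This is only an illustration, not something from the thread: it is written in Python rather than SAS, the helper name `level_counts` is made up, and it runs on just the 17 excerpt rows from the original post, not the full ~32,000-row file.

```python
from math import prod

# Column names and excerpt rows copied from the original post.
HEADER = ["Bounces", "Exits", "Continent", "Sourcegroup",
          "Timeinpage", "Uniquepageviews", "Visits", "BouncesNew"]
ROWS = [
    (0, 0, "OC", "(direct)", 18, 1, 0, 0),
    (0, 0, "N.America", "(direct)", 4, 1, 0, 0),
    (0, 0, "N.America", "Others", 35, 1, 0, 0),
    (0, 0, "N.America", "public.tableausoftware.com", 70, 1, 0, 0),
    (0, 0, "N.America", "public.tableausoftware.com", 81, 1, 0, 0),
    (0, 0, "N.America", "public.tableausoftware.com", 75, 1, 0, 0),
    (0, 0, "N.America", "public.tableausoftware.com", 186, 1, 0, 0),
    (0, 0, "N.America", "(direct)", 710, 1, 0, 0),
    (0, 0, "OC", "(direct)", 712, 1, 1, 0),
    (0, 0, "AS", "Others", 344, 1, 1, 0),
    (0, 0, "EU", "Others", 27, 1, 1, 0),
    (0, 0, "EU", "visualisingdata.com", 0, 1, 1, 0),
    (0, 0, "N.America", "Others", 294, 1, 1, 0),
    (0, 0, "N.America", "public.tableausoftware.com", 111, 1, 1, 0),
    (0, 0, "SA", "(direct)", 1430, 1, 1, 0),
    (0, 0, "N.America", "(direct)", 29, 1, 1, 0),
    (0, 0, "N.America", "Others", 637, 1, 1, 0),
]


def level_counts(header, rows, response):
    """Count distinct levels per column, skipping the response variable.

    Hypothetical helper: every non-response column is treated as a
    categorical factor, as described in the original post.
    """
    counts = {}
    for i, name in enumerate(header):
        if name == response:
            continue
        counts[name] = len({row[i] for row in rows})
    return counts


counts = level_counts(HEADER, ROWS, response="Exits")
n_cells = prod(counts.values())  # levels x levels x ... x levels

for name, n_levels in counts.items():
    print(f"{name}: {n_levels} levels")
print(f"combinations = {n_cells}, N = {len(ROWS)}")
```

On the excerpt alone, Timeinpage has 17 distinct values in 17 rows, so treating it as categorical gives 5 x 4 x 17 x 2 = 680 combinations for 17 observations. The same check on the full file would show whether the number of combinations exceeds N.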

AdityaKir · Posted 09-28-2016 07:22 AM

Hello,

To fix the problem I ran a correlation analysis and found that one of the variables had a very weak relationship (0.00132) with the dependent variable. When I removed it from the N-way ANOVA, I got the result in 5 minutes.

Is this the right approach?

Regards,

Aditya

Reeza · Posted 09-28-2016 07:59 AM

Why not run individual ANOVAs first to reduce the # of variables?
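
A rough sketch of what a single one-way ANOVA computes, for illustration only. It is plain Python rather than SAS, and the function name `one_way_anova_f` is made up. Because Exits is 0 in every excerpt row, the example uses Timeinpage as a stand-in response grouped by Visits, purely to exercise the function. It returns the F statistic and degrees of freedom; a p-value would additionally need the F distribution, which is not included here.

```python
from collections import defaultdict


def one_way_anova_f(values, labels):
    """Return (F, df_between, df_within, ss_between, ss_within).

    Hypothetical helper implementing the textbook one-way ANOVA
    decomposition: total variation = between-group + within-group.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    n = len(values)
    k = len(groups)
    grand_mean = sum(values) / n
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}

    # Variation of the group means around the grand mean.
    ss_between = sum(len(vs) * (means[g] - grand_mean) ** 2
                     for g, vs in groups.items())
    # Variation of the observations around their own group mean.
    ss_within = sum((v - means[g]) ** 2
                    for g, vs in groups.items() for v in vs)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within, ss_between, ss_within


# Timeinpage and Visits columns from the 17 excerpt rows in the original post.
TIME_IN_PAGE = [18, 4, 35, 70, 81, 75, 186, 710,
                712, 344, 27, 0, 294, 111, 1430, 29, 637]
VISITS = [0, 0, 0, 0, 0, 0, 0, 0,
          1, 1, 1, 1, 1, 1, 1, 1, 1]

f_stat, df_b, df_w, ss_b, ss_w = one_way_anova_f(TIME_IN_PAGE, VISITS)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

Running one such test per candidate variable is cheap compared with a full N-way model, since each run estimates only one factor's levels.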

AdityaKir · Posted 09-28-2016 08:21 AM

Hello,

Are you suggesting that I run individual ANOVAs and omit from the final model the variables whose p-values are not significant?

Regards,

Aditya

Reeza · Posted 09-28-2016 08:33 AM

Yes - with the caution that this generally isn't recommended. But it sounds like you have more categories than observations, so you're running into dimensionality problems. You can also look at clustering techniques to reduce your input variables. Perhaps PROC VARCLUS, except I'm not sure how that will work when you have a lot of categorical data. Categorical data analysis is a weakness of mine 😞

AdityaKir · Posted 09-28-2016 08:35 AM

Thanks
