Hello,
I'm looking to get the between-groups sum of squares (SS) for each categorical variable of a model.
For instance, I have:
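Data HAVE;
/* Y is written with a decimal comma, so read it as a number with the COMMAX informat (with $ it would be character and could not be analyzed); */
/* :$9. keeps 9-character VAR_1 levels such as 0,76_0,80 from being truncated to 8 characters */
INPUT VAR_1 :$9. Var_2 $ Y :commax10.;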
DATALINES;
0,50_A N 24,14
0,50_A N 30,47
0,50_B N 20,41
0,50_B N 17,6
0,50_B N 34,67
0,50_B N 25,29
0,50_B N 26,14
0,50_B N 22,89
0,50_B N 27,36
0,50_B O 41,85
0,50_B O 34,31
0,50_B O 22,82
0,50_B O 14,15
0,50_B O 20,87
0,50_B O 20,38
0,50_B O 20,33
0,76_0,80 O 20,3
0,76_0,80 O 42,98
0,81_0,86 O 23,61
0,9 O 24,91
;
run;
My model is Y = VAR_1 VAR_2.
In Excel, I computed the between-groups SS for VAR_2 (1.81), which matches the PROC ANOVA output. However, when I do the same calculation for VAR_1, I get an SS of 88.82, whereas PROC ANOVA outputs 872.3818586.
The formula I use is this one: https://arc.lib.montana.edu/book/statistics-with-r-textbook/meta/img/Equation2.5.jpeg
where J is the number of groups (2 for VAR_2 and 5 for VAR_1) and n_j is the number of observations in group j.
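Assuming the linked image shows the standard between-groups sum of squares (which matches the definitions of J and n_j above), the formula can be written as below, with ȳ_j the mean of Y in group j and ȳ the overall mean of Y. The last two lines check it against VAR_2, using the N group (9 observations, total 228.97) and the O group (11 observations, total 286.51) from the data above:

```latex
\begin{aligned}
SS_{\text{between}} &= \sum_{j=1}^{J} n_j \left(\bar{y}_j - \bar{y}\right)^2 \\
\bar{y}_N &= 228.97/9 \approx 25.441, \qquad \bar{y}_O = 286.51/11 \approx 26.046, \qquad \bar{y} = 515.48/20 = 25.774 \\
SS_{\text{between}}(\text{VAR\_2}) &= 9\,(25.441 - 25.774)^2 + 11\,(26.046 - 25.774)^2 \approx 1.81
\end{aligned}
```

Applying the same formula to the five VAR_1 groups gives about 88.8, consistent with the value quoted above.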
My questions are: how can I get the between-groups SS (i.e., the 88.82 I got with the formula) automatically in SAS for all the variables of my model? And how is the PROC ANOVA SS for each variable calculated?
Thank you for your help.
Hi @Mathis1
Here is what I get with PROC ANOVA and PROC GLM (88.8). How did you get 872.3818586? Could you please share the code you used?
Data HAVE;
/* :$9. keeps 9-character levels such as 0,76_0,80 from being truncated to 8 characters */
INPUT VAR_1 :$9. Var_2 $ Y ;
DATALINES;
0,50_A N 24.14
0,50_A N 30.47
0,50_B N 20.41
0,50_B N 17.6
0,50_B N 34.67
0,50_B N 25.29
0,50_B N 26.14
0,50_B N 22.89
0,50_B N 27.36
0,50_B O 41.85
0,50_B O 34.31
0,50_B O 22.82
0,50_B O 14.15
0,50_B O 20.87
0,50_B O 20.38
0,50_B O 20.33
0,76_0,80 O 20.3
0,76_0,80 O 42.98
0,81_0,86 O 23.61
0,9 O 24.91
;
run;
proc glm data=have;
class VAR_1 VAR_2;
model Y = VAR_1 VAR_2;
run;
Output (my apologies for the French display):
You get similar results with PROC ANOVA:
proc anova data=have;
class VAR_1 VAR_2;
model Y = VAR_1 VAR_2;
run;
Best,
Hello @ed_sas_member, and thank you very much for your reply.
Actually, I hadn't tried running PROC ANOVA on this table, and I'm glad it produced those results. However, I had run PROC ANOVA on another table with more variables but exactly the same groups for these two variables, and of course the same Y variable.
Please find attached the table I'm talking about. You will find the same VAR_1, VAR_2 and Y variables, plus some extra variables.
When executing:
proc anova data = HAVE_2 outstat= ANOVA ;
class VAR_1 VAR_2;
model Y = VAR_1 VAR_2;
run;
I get the 872 I was mentioning earlier. I don't see why the presence of extra columns would change the results...
Thank you 😉
Hi @Mathis1
I have compared the datalines with your SAS dataset.
It seems that there are some discrepancies in which group each Y value is allocated to.
I think this is why you get different results.
Best,
PROC ANOVA should not be used here. It should only be used when the data are balanced (equal numbers in each cell) or for a one-way analysis of variance (which this is not). So I would ignore the PROC ANOVA results.
Paige Miller
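One quick way to see the imbalance described above is to cross-tabulate the two class variables. This is a minimal sketch, assuming the HAVE data set created by the DATA step earlier in the thread; CELLS is a hypothetical output name.

```sas
/* Cell counts for every VAR_1 x VAR_2 combination. Unequal counts
   (including empty cells) mean the design is unbalanced, which is the
   situation where PROC ANOVA's sums of squares should not be trusted. */
proc freq data=have;
  tables VAR_1*VAR_2 / norow nocol nopercent out=cells;
run;
```

With the data posted in this thread, the cell counts range from 0 to 7 (for example, 0,50_A only occurs with VAR_2 = N), so the design is clearly unbalanced.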
Totally agree with @PaigeMiller
-> please see the warning message in the log when you run PROC ANOVA:
WARNING: PROC ANOVA has determined that the number of observations in each cell is not equal.
PROC GLM may be more appropriate.
Best,
Thank you very much, ed_sas_member, this is where the problem came from!
About PROC ANOVA: it is the only way I know to get the between-groups SS. Is there any way to get the SS given by PROC ANOVA with PROC GLM and one of its options?
Thanks 🙂
@ed_sas_member already showed you where the SS for groups is in the PROC GLM output:
The part in yellow is the sum of squares.
SteveDenham
This part gives me the SS for all the variables together. PROC ANOVA gives me the between-groups SS for each variable, and I can't find those results in the PROC GLM output.
I'm talking about these SS, corresponding to the table attached to my earlier post (where VAR_1 became CRM2 and VAR_2 became PetitRouleur, but that doesn't matter).
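A minimal sketch of one way to get, with PROC GLM, the between-groups SS of each variable taken on its own (the quantity from the formula in the question): fit a separate one-way model per variable, because in a one-way model the model SS is exactly the sum of n_j(ȳ_j − ȳ)² over the groups. The macro BETWEEN_SS, its parameters, and the data set names are hypothetical. Whether this reproduces what PROC ANOVA prints for unbalanced multi-factor data is an assumption, although in the question the formula value for VAR_2 (1.81) did match PROC ANOVA. These SS are not adjusted for the other variables.

```sas
/* Data from the thread, with decimal points and a 9-character VAR_1 */
data have;
  input VAR_1 :$9. VAR_2 $ Y;
  datalines;
0,50_A N 24.14
0,50_A N 30.47
0,50_B N 20.41
0,50_B N 17.6
0,50_B N 34.67
0,50_B N 25.29
0,50_B N 26.14
0,50_B N 22.89
0,50_B N 27.36
0,50_B O 41.85
0,50_B O 34.31
0,50_B O 22.82
0,50_B O 14.15
0,50_B O 20.87
0,50_B O 20.38
0,50_B O 20.33
0,76_0,80 O 20.3
0,76_0,80 O 42.98
0,81_0,86 O 23.61
0,9 O 24.91
;
run;

/* Hypothetical macro: one one-way PROC GLM per class variable */
%macro between_ss(data=, y=, vars=, out=between_ss);
  %local i n var;
  %let n = %sysfunc(countw(&vars));
  %do i = 1 %to &n;
    %let var = %scan(&vars, &i);
    /* Save the SS table of the one-way model Y = <var> */
    ods output ModelANOVA=_ss_&i;
    proc glm data=&data;
      class &var;
      model &y = &var / ss1;
    run;
    quit;
  %end;
  /* Stack the results: one row (Source, DF, SS) per class variable */
  data &out;
    length Source $32;
    set
      %do i = 1 %to &n;
        _ss_&i
      %end;
    ;
    keep Source DF SS;
  run;
%mend between_ss;

%between_ss(data=have, y=Y, vars=VAR_1 VAR_2)

proc print data=between_ss noobs;
run;
```

With this data, the table should show about 88.83 for VAR_1 (4 DF) and 1.81 for VAR_2 (1 DF), the values computed by hand in the question.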
Hi @Mathis1
Please try this option:
proc glm data=have;
class VAR_1 VAR_2;
model Y = VAR_1 VAR_2 / e3;
run;
Hi @Mathis1,
Head over to https://support.sas.com/documentation/onlinedoc/stat/141/glm.pdf and go down to the Getting Started section; its second page gives some example outputs. There are Type I and Type III sums of squares for each variable. You will likely want the Type III.
SteveDenham
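To get the Type I and Type III sums of squares for each variable into a data set rather than only the printed output, a minimal sketch, assuming the HAVE data set created by the DATA step earlier in the thread (GLM_SS is a hypothetical name). SS1 and SS3 are PROC GLM MODEL statement options, and ModelANOVA is the ODS table that holds these rows.

```sas
/* Save the Type I and Type III SS of the two-way model.
   Type I SS are sequential (VAR_1 first, then VAR_2 adjusted for VAR_1);
   Type III SS adjust each variable for the other one. */
ods output ModelANOVA=glm_ss;
proc glm data=have;
  class VAR_1 VAR_2;
  model Y = VAR_1 VAR_2 / ss1 ss3;
run;
quit;

proc print data=glm_ss noobs;
run;
```

Because VAR_1 enters the model first, its Type I row should be the 88.8 reported earlier in the thread.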
Remember that you have unbalanced data. Running PROC ANOVA on unbalanced data will give the following in the log window (using interactive SAS):
WARNING: PROC ANOVA has determined that the number of observations in each cell is not equal.
PROC GLM may be more appropriate.
This warning in the log is telling you that the SS presented by PROC ANOVA are NOT accurate for your data. Please stop assuming that PROC ANOVA is a gold standard. For unbalanced data, if you want SS, use the Type III sums of squares from PROC GLM.
SteveDenham
I'm aware of the Type I and Type III SS in PROC GLM and of the warning message, thank you 🙂. I was just naively wondering whether there was any way to get the same SS output by PROC ANOVA with PROC GLM, although (and I completely agree with you) they may be totally irrelevant. I'm not assuming that PROC ANOVA is a gold standard. This morning I wasn't even sure how the SS output by this procedure were computed. That's how little I'm assuming about PROC ANOVA 🙂
But thank you for helping anyway...