Hello,
I'm looking to get the SS between groups for each categorical variable of a model.
For instance, I have :
Data HAVE;
INPUT VAR_1 $ Var_2 $ Y $ ;
DATALINES;
0,50_A N 24,14
0,50_A N 30,47
0,50_B N 20,41
0,50_B N 17,6
0,50_B N 34,67
0,50_B N 25,29
0,50_B N 26,14
0,50_B N 22,89
0,50_B N 27,36
0,50_B O 41,85
0,50_B O 34,31
0,50_B O 22,82
0,50_B O 14,15
0,50_B O 20,87
0,50_B O 20,38
0,50_B O 20,33
0,76_0,80 O 20,3
0,76_0,80 O 42,98
0,81_0,86 O 23,61
0,9 O 24,91
;
run;
Where my model is Y = VAR_1 VAR_2
In Excel I managed to compute the SS between groups for VAR_2 (1,81), which matches the PROC ANOVA output. However, when I do the same calculation for VAR_1, I get an SS of 88,82, whereas PROC ANOVA outputs 872.3818586.
The formula I use is this one : https://arc.lib.montana.edu/book/statistics-with-r-textbook/meta/img/Equation2.5.jpeg.
Where J is the number of groups (2 for VAR_2 and 5 for VAR_1) and nj is the number of observations in each group.
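For readers who cannot open the image: the linked equation is not reproduced here, but assuming it is the usual between-group sum of squares, it would read as follows, with \bar{y}_j the mean of Y in group j and \bar{\bar{y}} the overall mean of Y. Applied by hand to the 20 observations above, this form gives about 88.8 for VAR_1 and 1.81 for VAR_2, consistent with the values quoted.

```latex
SS_{\text{between}} = \sum_{j=1}^{J} n_j \left( \bar{y}_j - \bar{\bar{y}} \right)^2
```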
My questions are: how can I get the SS between groups (i.e. the 88,82 that I got with the formula) automatically with SAS for all the variables of my model? Also, how is the ANOVA SS for each variable calculated?
Thank you for your help.
Hi @Mathis1
Here is what I get with PROC ANOVA and PROC GLM (88.8). How did you get 872.3818586? Could you please share the code you used?
Data HAVE;
INPUT VAR_1 $ Var_2 $ Y ;
DATALINES;
0,50_A N 24.14
0,50_A N 30.47
0,50_B N 20.41
0,50_B N 17.6
0,50_B N 34.67
0,50_B N 25.29
0,50_B N 26.14
0,50_B N 22.89
0,50_B N 27.36
0,50_B O 41.85
0,50_B O 34.31
0,50_B O 22.82
0,50_B O 14.15
0,50_B O 20.87
0,50_B O 20.38
0,50_B O 20.33
0,76_0,80 O 20.3
0,76_0,80 O 42.98
0,81_0,86 O 23.61
0,9 O 24.91
;
run;
proc glm data=have;
class VAR_1 VAR_2;
model Y = VAR_1 VAR_2;
run;
Output (my apologies for the French display):
You get similar results with PROC ANOVA:
proc anova data=have;
class VAR_1 VAR_2;
model Y = VAR_1 VAR_2;
run;
Best,
Hello @ed_sas_member and thank you very much for your reply.
Actually, I didn't try running PROC ANOVA on this table, and I'm glad that it output those results. However, I ran PROC ANOVA on another table with more variables but exactly the same groups for those 2 variables, and of course the same Y variable.
Find attached the table I'm talking about. You will find the same VAR_1, VAR_2 and Y variables, but also extra variables.
When executing :
proc anova data = HAVE_2 outstat= ANOVA ;
class VAR_1 VAR_2;
model Y = VAR_1 VAR_2;
run;
I find the 872 I was mentioning earlier. I don't see why the presence of extra columns would change the results...
Thank you 😉
Hi @Mathis1
I have compared datalines and your SAS dataset.
It seems that there are some discrepancies between group allocations for Y values.
I think this is why you get different results.
Best,
PROC ANOVA should not be used here. It should only be used for cases where the data is balanced (equal numbers in each cell) or a one-way analysis of variance (which this is not). So I would ignore the PROC ANOVA results.
Totally agree with @PaigeMiller
-> please see the warning message in the log when you run PROC ANOVA:
WARNING: PROC ANOVA has determined that the number of observations in each cell is not equal.
PROC GLM may be more appropriate.
Best,
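A quick way to see whether the design is balanced is to tabulate the cell counts. This is only a sketch, assuming the HAVE dataset created by the DATA step earlier in the thread; if the counts in the VAR_1 by VAR_2 cells differ, the data are unbalanced.

```sas
/* Cross-tabulate the two CLASS variables to inspect the cell sizes. */
/* Unequal (or empty) cells mean the design is unbalanced.           */
proc freq data=have;
  tables VAR_1*VAR_2 / norow nocol nopercent;
run;
```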
Thank you very much ed_sas_member, this is where the problem came from !
About PROC ANOVA: this is the only way I know of getting the SS between groups. Is there any way to get this variance given by PROC ANOVA with PROC GLM and one of its options?
Thanks 🙂
@ed_sas_member already showed you where the SS for groups is in the PROC GLM output:
The part in yellow is the sum of squares.
SteveDenham
This part gives me the SS for groups for all the variables together. PROC ANOVA gives me the SS between groups for each variable. I can't find those results in the PROC GLM output.
I'm talking about these SS, corresponding to the attached table in my earlier post (and where VAR_1 became CRM2 and VAR_2 became PetitRouleur, but it doesn't matter).
Hi @Mathis1
Please try this option:
proc glm data=have;
class VAR_1 VAR_2;
model Y = VAR_1 VAR_2 / e3;
run;
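Note that E3 asks PROC GLM to display the Type III estimable functions; the sums of squares themselves are controlled by SS1 to SS4, and Type I and Type III are printed by default. As a minimal sketch, the per-variable SS can also be written to a dataset with OUTSTAT=, as was done with PROC ANOVA earlier in the thread. The dataset name GLM_SS is an assumption made for illustration.

```sas
/* Write the per-effect sums of squares to a dataset.             */
/* work.glm_ss is a hypothetical name, not from the thread.       */
proc glm data=have outstat=work.glm_ss;
  class VAR_1 VAR_2;
  model Y = VAR_1 VAR_2 / ss1 ss3;   /* request Type I and Type III SS */
run;
quit;

/* _TYPE_ distinguishes ERROR, SS1 and SS3 rows; _SOURCE_ names the effect. */
proc print data=work.glm_ss noobs;
  var _SOURCE_ _TYPE_ DF SS;
run;
```

In this output, the Type I SS of the first variable listed in the MODEL statement is unadjusted, each later variable is adjusted for those listed before it, and every Type III SS is adjusted for all other effects in the model.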
Hi @Mathis1 ,
Head here: https://support.sas.com/documentation/onlinedoc/stat/141/glm.pdf . Go to the Getting Started section. The second page will give you some example outputs. There are Type I and Type III sums of squares for each variable. You will likely want the Type III.
SteveDenham
Remember that you have unbalanced data. Running PROC ANOVA on unbalanced data will give the following in the log window (using interactive SAS):
WARNING: PROC ANOVA has determined that the number of observations in each cell is not equal.
PROC GLM may be more appropriate.
This warning in the log is telling you that the SS presented by PROC ANOVA are NOT accurate for your data. Please stop assuming that PROC ANOVA is a gold standard. For unbalanced data, if you want SS, use the Type III sums of squares from PROC GLM.
SteveDenham
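If what is wanted is literally the unadjusted one-way "SS between groups" from the original question (the 88,82 for VAR_1 and 1,81 for VAR_2 computed in Excel), it can be computed directly for one variable at a time. The sketch below is one possible way, not the method used by any poster; the macro name SS_BETWEEN and its parameters are hypothetical, and it assumes Y is numeric as in the corrected DATA step.

```sas
/* Hypothetical helper: one-way between-group SS of &y for one grouping variable. */
/* Computes sum over groups of n_j * (group mean - grand mean)**2.                */
%macro ss_between(data=, y=, group=);
  proc sql;
    select sum(g.n * (g.grp_mean - t.grand_mean)**2)
             as SS_between label="Between-group SS for &group"
    from (select &group,
                 count(&y) as n,          /* non-missing observations in the group */
                 mean(&y)  as grp_mean    /* group mean                            */
            from &data
            where &y is not missing
            group by &group) as g,
         (select mean(&y) as grand_mean   /* overall mean, a single row            */
            from &data) as t;
  quit;
%mend ss_between;

%ss_between(data=have, y=Y, group=VAR_1)
%ss_between(data=have, y=Y, group=VAR_2)
```

The same unadjusted value should also appear as the model SS when PROC GLM is run with a single variable in the CLASS and MODEL statements.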