Mathis1
Quartz | Level 8

Hello, 

I'm looking to get the SS between groups for each categorical variable of a model. 

 

For instance, I have :

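Data HAVE;
LENGTH VAR_1 $ 12; /* "0,76_0,80" is longer than the default 8 characters */
INPUT VAR_1 $ Var_2 $ Y :NUMX8. ; /* NUMX reads the decimal comma, so Y is numeric */
DATALINES;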
0,50_A N 24,14
0,50_A N 30,47
0,50_B N 20,41
0,50_B N 17,6
0,50_B N 34,67
0,50_B N 25,29
0,50_B N 26,14
0,50_B N 22,89
0,50_B N 27,36
0,50_B O 41,85
0,50_B O 34,31
0,50_B O 22,82
0,50_B O 14,15
0,50_B O 20,87
0,50_B O 20,38
0,50_B O 20,33
0,76_0,80 O 20,3
0,76_0,80 O 42,98
0,81_0,86 O 23,61
0,9 O 24,91
;
run;

Where  my model is Y = VAR_1 VAR_2

 

In Excel I managed to reproduce the SS between groups for VAR_2 (1,81), which matches the PROC ANOVA output. However, when I do the same calculation for VAR_1, I get an SS of 88,82, whereas PROC ANOVA outputs 872.3818586.

 

The formula I use is this one : https://arc.lib.montana.edu/book/statistics-with-r-textbook/meta/img/Equation2.5.jpeg

Where J is the number of groups (2 for VAR_2 and 5 for VAR_1) and nj is the number of observations in each group. 
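
For reference, and assuming the linked image shows the usual one-way between-group sum of squares, the formula can be written as below. The notation is an assumption: $\bar{y}_j$ is the mean of Y in group $j$ and $\bar{\bar{y}}$ is the grand mean of Y.

```latex
SS_{\text{between}} = \sum_{j=1}^{J} n_j \left( \bar{y}_j - \bar{\bar{y}} \right)^2
% Worked check for VAR_2 on the 20 observations above (J = 2):
%   group N: n = 9,  mean \approx 25.441
%   group O: n = 11, mean \approx 26.046
%   grand mean \approx 25.774
SS_{\text{VAR\_2}} \approx 9\,(25.441 - 25.774)^2 + 11\,(26.046 - 25.774)^2 \approx 1.81
```

The same calculation over the five VAR_1 groups gives roughly 88,82.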

 

My questions are: how can I get the SS between groups (i.e. the 88,82 that I got with the formula) automatically with SAS for all the variables of my model? And how is the ANOVA SS for each variable calculated?

Thank you for your help.

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @Mathis1 

I have compared datalines and your SAS dataset.

It seems that there are some discrepancies between group allocations for Y values.

I think this is why you get different results.

Best,

Capture d’écran 2020-05-05 à 12.33.03.png


13 REPLIES 13
ed_sas_member
Meteorite | Level 14

Hi @Mathis1 

Here is what I get with PROC ANOVA and PROC GLM (88.8). How did you get 872.3818586? Could you please share the code you used?

Data HAVE;
INPUT VAR_1 $ Var_2 $ Y ;
DATALINES;
0,50_A N 24.14
0,50_A N 30.47
0,50_B N 20.41
0,50_B N 17.6
0,50_B N 34.67
0,50_B N 25.29
0,50_B N 26.14
0,50_B N 22.89
0,50_B N 27.36
0,50_B O 41.85
0,50_B O 34.31
0,50_B O 22.82
0,50_B O 14.15
0,50_B O 20.87
0,50_B O 20.38
0,50_B O 20.33
0,76_0,80 O 20.3
0,76_0,80 O 42.98
0,81_0,86 O 23.61
0,9 O 24.91
;
run;

proc glm data=have;
	class VAR_1 VAR_2;
	model Y = VAR_1 VAR_2;
run;

Output (my apologies for the French display):

Capture d’écran 2020-05-05 à 11.47.00.png

 

You get similar results with PROC ANOVA:

proc anova data=have;
	class VAR_1 VAR_2;
	model Y = VAR_1 VAR_2;
run;

Capture d’écran 2020-05-05 à 11.48.35.png

Best,

Mathis1
Quartz | Level 8

Hello @ed_sas_member and thank you very much for your reply. 

 

Actually, I didn't try running PROC ANOVA on this table, and I'm glad that it output those results. However, I ran PROC ANOVA on another table with more variables but exactly the same groups for those 2 variables, and of course the same Y variable.

The table I'm talking about is attached. You will find the same VAR_1, VAR_2 and Y variables, plus some extra variables.

 

When executing : 

proc anova data = HAVE_2 outstat= ANOVA ;
class VAR_1 VAR_2;
model Y = VAR_1 VAR_2;
run; 

I get the 872 I was mentioning earlier. I don't see why the presence of extra columns would change the results...

Thank you 😉

 

ed_sas_member
Meteorite | Level 14

Hi @Mathis1 

I have compared datalines and your SAS dataset.

It seems that there are some discrepancies between group allocations for Y values.

I think this is why you get different results.

Best,

Capture d’écran 2020-05-05 à 12.33.03.png
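
One way to check this kind of discrepancy is to compare the per-group counts and means of Y in both tables. The following is only a sketch: it assumes both datasets are named HAVE and HAVE_2 as in the posts above, that Y is numeric in both, and that the grouping variable is called VAR_1 in both (in the attached table it may carry a different name).

```sas
/* Per-group n and mean of Y in the posted data */
proc means data=have n mean;
    class VAR_1;
    var Y;
run;

/* Same summary for the attached table; differing n or means per group
   indicate observations assigned to different groups */
proc means data=have_2 n mean;
    class VAR_1;
    var Y;
run;
```

If the two summaries differ for the same group label, the between-group SS will differ as well, whatever procedure is used.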

PaigeMiller
Diamond | Level 26

PROC ANOVA should not be used here. It should only be used for cases where the data is balanced (equal numbers in each cell) or a one-way analysis of variance (which this is not). So I would ignore the PROC ANOVA results.
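
As a quick illustration of what "balanced" means here, a cross-tabulation of the two class variables shows the number of observations in each cell. This is a minimal sketch assuming the HAVE dataset posted earlier in the thread.

```sas
/* Count observations per VAR_1 x VAR_2 cell;
   unequal (or empty) cells mean the design is unbalanced */
proc freq data=have;
    tables VAR_1*VAR_2 / norow nocol nopercent;
run;
```

With the data posted above the cell counts are clearly unequal, which is the situation the PROC ANOVA log warning refers to.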

--
Paige Miller
ed_sas_member
Meteorite | Level 14

Totally agree with @PaigeMiller 

-> please see the warning message in the log when you run PROC ANOVA:

 WARNING: PROC ANOVA has determined that the number of observations in each cell is not equal. 
PROC GLM may be more appropriate.

Best,

Mathis1
Quartz | Level 8

Thank you very much ed_sas_member, this is where the problem came from !

About PROC ANOVA: it is the only way I know of getting the SS between groups. Is there any way to get the SS reported by PROC ANOVA from PROC GLM, using one of its options?

 

Thanks 🙂

SteveDenham
Jade | Level 19

@ed_sas_member already showed you where the SS for groups is in the PROC GLM output:

 

SteveDenham_4-1588686300070.png

The part in yellow is the sum of squares.

 

SteveDenham

 

Mathis1
Quartz | Level 8

This part gives me the SS for groups for all the variables together. PROC ANOVA gives me the SS between groups for each variable. I can't find those results in the PROC GLM output.

 

Capture_ANOVA.PNG

 

I'm talking about these SS, corresponding to the attached table in my earlier post (and where VAR_1 became CRM2 and VAR_2 became PetitRouleur, but it doesn't matter). 

ed_sas_member
Meteorite | Level 14

Hi @Mathis1 

Please try this option:


proc glm data=have;
	class VAR_1 VAR_2;
	model Y = VAR_1 VAR_2 / e3;
run;
Mathis1
Quartz | Level 8
Hmmm, it doesn't seem to give me the SS reported by PROC ANOVA. It's not a big deal anyway; I can just run PROC ANOVA when I want those SS between groups... I only wanted to simplify my code with an option on PROC GLM.
SteveDenham
Jade | Level 19

Hi @Mathis1 ,

 

Head to https://support.sas.com/documentation/onlinedoc/stat/141/glm.pdf and go down to the Getting Started section. The second page gives some example outputs. There are Type I and Type III sums of squares for each variable. You will likely want the Type III.
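
A point that may help with the original question: the Type I (sequential) SS of the first term listed in the MODEL statement is the SS explained by that term alone, which is the one-way between-group SS from the hand formula. The sketch below assumes the HAVE dataset with numeric Y from the earlier reply and simply lists each variable first in turn.

```sas
/* Type I SS of the FIRST term = one-way between-group SS for that variable */
proc glm data=have;
    class VAR_1 VAR_2;
    model Y = VAR_1 VAR_2 / ss1;   /* first row of the Type I table: VAR_1 alone */
run;
quit;

proc glm data=have;
    class VAR_1 VAR_2;
    model Y = VAR_2 VAR_1 / ss1;   /* first row of the Type I table: VAR_2 alone */
run;
quit;
```

On the 20 observations posted above, the first rows should come out at roughly 88.82 for VAR_1 and 1.81 for VAR_2, the values obtained by hand. The second row in each table is adjusted for the first term, so it is not a plain between-group SS.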

 

SteveDenham

SteveDenham
Jade | Level 19

Remember that you have unbalanced data.  Running PROC ANOVA on unbalanced data will give the following in the log window (using interactive SAS):

 

WARNING: PROC ANOVA has determined that the number of observations in each cell is not equal.
PROC GLM may be more appropriate.

 

This warning in the log is telling you that the SS presented by PROC ANOVA are NOT accurate for your data. Please stop assuming that PROC ANOVA is a gold standard. For unbalanced data, if you want SS, use the Type III sums of squares from PROC GLM.
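
If the Type III SS are needed in a dataset rather than only in the printed output, one option is to capture the corresponding ODS table. This is a sketch, assuming the HAVE dataset from earlier in the thread; the output dataset name glm_ss is hypothetical.

```sas
/* Save the model-term SS table from PROC GLM as a dataset */
ods output ModelANOVA=glm_ss;      /* glm_ss: hypothetical dataset name */
proc glm data=have;
    class VAR_1 VAR_2;
    model Y = VAR_1 VAR_2 / ss3;   /* request only the Type III SS */
run;
quit;

/* One row per model term, with its DF, SS, MS, F value and p-value */
proc print data=glm_ss;
    var Source DF SS MS FValue ProbF;
run;
```

This is roughly the PROC GLM counterpart of the OUTSTAT= dataset used with PROC ANOVA earlier in the thread.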

 

SteveDenham

Mathis1
Quartz | Level 8
Hello and thank you for your reply.
I'm aware of the Type I and Type III SS in PROC GLM and of the warning message, thank you 🙂. I was just naively wondering whether there was any way to get the same SS that PROC ANOVA outputs from PROC GLM, although (and I completely agree with you) they may be totally irrelevant. I'm not assuming that PROC ANOVA is a gold standard. This morning I was not even sure how the SS output by this procedure were computed. That's how little I'm assuming about PROC ANOVA 🙂

But thank you for helping though...
