audreyk
Calcite | Level 5

I have a 2x2 design. To keep it simple, here's a small data set with two factors, A and B, each with 2 levels, and a DV, Y. The design is perfectly balanced.
The results from the ANOVA portion of PROC GLM differ from those of the SOLUTION output, which doesn't make sense to me. The SOLUTION option in the MODEL statement provides regression estimates of the effects of the dummy-coded variables. Given that both IVs have only 2 levels, each estimable effect corresponds to an effect in the design (i.e., A, B and A*B). Therefore the p-values for the F-tests should be identical to those of the parameter estimates from the regression solution, correct?

data exp;
   input A $ B $ Y @@;
   datalines;
A1 B1 12 A1 B1 14     A1 B2 11 A1 B2 9
A1 B1 13 A1 B1 12     A1 B2 10 A1 B2 8
A2 B1 20 A2 B1 18     A2 B2 17 A2 B2 10
A2 B1 22 A2 B1 20     A2 B2 19 A2 B2 11
;

proc glm data=exp;
   class A B;
   model Y=A B A*B/SOLUTION;
run;

Dependent Variable: Y

                                  Sum of
Source              DF           Squares     Mean Square    F Value    Pr > F
Model                3       231.2500000      77.0833333      12.42    0.0005
Error               12        74.5000000       6.2083333
Corrected Total     15       305.7500000

R-Square     Coeff Var      Root MSE        Y Mean
0.756337      17.64002      2.491653      14.12500

Source              DF         Type I SS     Mean Square    F Value    Pr > F
A                    1       144.0000000     144.0000000      23.19    0.0004
B                    1        81.0000000      81.0000000      13.05    0.0036
A*B                  1         6.2500000       6.2500000       1.01    0.3355

Source              DF       Type III SS     Mean Square    F Value    Pr > F
A                    1       144.0000000     144.0000000      23.19    0.0004
B                    1        81.0000000      81.0000000      13.05    0.0036
A*B                  1         6.2500000       6.2500000       1.01    0.3355

                                        Standard
Parameter            Estimate              Error    t Value    Pr > |t|
Intercept         14.25000000 B       1.24582637      11.44      <.0001
A     A1          -4.75000000 B       1.76186454      -2.70      0.0195
A     A2           0.00000000 B        .                .         .
B     B1           5.75000000 B       1.76186454       3.26      0.0068
B     B2           0.00000000 B        .                .         .
A*B   A1 B1       -2.50000000 B       2.49165273      -1.00      0.3355
A*B   A1 B2        0.00000000 B        .                .         .
A*B   A2 B1        0.00000000 B        .                .         .
A*B   A2 B2        0.00000000 B        .                .         .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve
      the normal equations.  Terms whose estimates are followed by the letter 'B' are not
      uniquely estimable.

Shouldn't the p-value for the effect of B in the Type III SS F-test (p=.0036) be equal to the p-value for the parameter B B1 (p=.0068)?

Accepted Solutions
JamesRoger
Calcite | Level 5

The t test in the Solution for the B main effect is testing the difference between A2B1 and A2B2. This is because of the constraint that SAS is using: the last level of each factor is the reference level, and its parameters are set to zero. So it is looking at [A2B1 - A2B2].

On the other hand, the F test is testing the main effect of B in the presence of an interaction A*B using Type III. That is, it tests in the margin, assuming equal weight for each level of A. So this is testing ([A2B1 - A2B2] + [A1B1 - A1B2])/2.
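As a rough check of the two contrasts above, both can be requested explicitly with ESTIMATE statements. This is only a sketch: the labels are made up, and the coefficient order for A*B assumes the level ordering shown in the Solution table (A1 B1, A1 B2, A2 B1, A2 B2). From the cell means of the posted data (12.75, 9.5, 20, 14.25), the first estimate should come out as 20 - 14.25 = 5.75 and the second as (5.75 + 3.25)/2 = 4.5.

```sas
proc glm data=exp;
   class A B;
   model Y = A B A*B / solution;
   /* Simple effect of B within A2: the contrast behind the "B B1" row (p = 0.0068) */
   estimate 'B1 - B2 at A2'           B 1 -1  A*B 0 0 1 -1;
   /* B averaged over both levels of A: the contrast behind the Type III F test (p = 0.0036) */
   estimate 'B1 - B2 averaged over A' B 1 -1  A*B 0.5 -0.5 0.5 -0.5;
run;
quit;
```

The squared t value of the second estimate should match the Type III F value for B (13.05).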

Finally, several very erudite statisticians would claim that you should not be using either of these. They would say "Never test a main effect in the presence of an interaction". That is, as soon as you believe in the presence of an interaction, you need to ask about the role of B when A=A1, and again the role of B when A=A2.
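If you do want to look at B separately within each level of A, one possible way (a sketch, not the only approach) is the SLICE= option on the LSMEANS statement:

```sas
proc glm data=exp;
   class A B;
   model Y = A B A*B;
   /* Tests the effect of B separately within A1 and within A2 */
   lsmeans A*B / slice=A;
run;
quit;
```

With only two levels of B, the slice for A2 should reproduce the p-value of the "B B1" row in the Solution table (0.0068), since it tests the same difference.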

Hope that this helps.

James.

4 REPLIES
ballardw
Super User

Different tests use different test statistics from different calculations, so they are very likely to have different p-values.

lvm
Rhodochrosite | Level 12

The question from the OP has come up several times in the past, including for a more complex model that has bearing on this question. For a simple model with one factor and two levels, as an example, there is agreement between the different tests. But once you have interactions, things are different. This is because the tests in the ANOVA table are for expected values (such as differences of means), not just for individual parameters. With an interaction, a main effect mean is a linear combination of the coefficient(s) for the listed effect (say, B) AND some of the coefficients for the interaction (say, for B*A). You can see which parameters are used for each test by adding an E option on the MODEL statement.
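For the OP's data, that could look something like the following. E prints the general form of the estimable functions, and I have also added E3 here (my addition) to print the Type III functions specifically:

```sas
proc glm data=exp;
   class A B;
   /* E: general form of estimable functions; E3: Type III estimable functions */
   model Y = A B A*B / solution e e3;
run;
quit;
```

In the Type III table for B, the A*B rows should carry nonzero coefficients, which shows that the F test for B involves the interaction parameters as well.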

audreyk
Calcite | Level 5

Indeed, and this constraint is not trivial. The SOLUTION option in PROC GLM uses dummy coding, which renders the parameter estimates for main effects uninterpretable as main effects (philosophical question of whether one should interpret main effects notwithstanding). If regression estimates are desired for the main effects in a 2x2, orthogonal contrast coding is required. Normally when running an experiment one is not interested in the regression coefficients. However, there are cases where the coefficients are desirable, for example when testing mediation and estimating indirect effects. In such cases, DON'T use the SOLUTION option in PROC GLM.
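For anyone wanting to try the contrast-coding route, here is a minimal sketch. The data set name exp_c and the variables a_c, b_c and ab_c are hypothetical names of my own, and the +/-0.5 scaling is just one common choice (+/-1 works too, but the coefficients are then half as large).

```sas
/* Build contrast-coded predictors; all names here are illustrative */
data exp_c;
   set exp;
   a_c  = ifn(A = 'A1', 0.5, -0.5);   /* A1 = +0.5, A2 = -0.5 */
   b_c  = ifn(B = 'B1', 0.5, -0.5);   /* B1 = +0.5, B2 = -0.5 */
   ab_c = a_c * b_c;                  /* interaction contrast */
run;

/* Ordinary regression on the coded variables (no CLASS statement) */
proc reg data=exp_c;
   model Y = a_c b_c ab_c;
run;
quit;
```

With this coding the coefficient for b_c should be the B1 - B2 difference averaged over A (4.5 for this data set), and the three p-values should line up with the Type III F tests (0.0004, 0.0036, 0.3355).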
