Programming the statistical procedures from SAS

Why are there different p-values for F-tests and solution t-tests in proc glm with a balanced design?

New Contributor
Posts: 2

I have a 2x2 design. To make it simple, here’s a small data set with two factors, A and B, each with 2 levels, and a DV, Y. The design is perfectly balanced.
The ANOVA portion of the proc glm output gives different results than the solution, which doesn’t make sense to me. The solution option in the model statement provides regression estimates of the effects of the dummy-coded variables. Given that both IVs have only 2 levels, each estimable effect corresponds to an effect in the design (i.e., A, B and A*B). Therefore the p-values for the F-tests should be identical to those of the parameter estimates from the regression solution, correct?

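data exp;
   /* @@ holds the input line so several observations can be read from it */
   input A $ B $ Y @@;
   datalines;
A1 B1 12 A1 B1 14     A1 B2 11 A1 B2 9
A1 B1 13 A1 B1 12     A1 B2 10 A1 B2 8
A2 B1 20 A2 B1 18     A2 B2 17 A2 B2 10
A2 B1 22 A2 B1 20     A2 B2 19 A2 B2 11
;
run;

proc glm data=exp;
   class A B;
   /* SOLUTION prints the dummy-coded parameter estimates */
   model Y = A B A*B / solution;
run;
quit;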

Dependent Variable: Y

                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                        3     231.2500000      77.0833333      12.42    0.0005
Error                       12      74.5000000       6.2083333
Corrected Total             15     305.7500000

R-Square     Coeff Var      Root MSE        Y Mean
0.756337      17.64002      2.491653      14.12500

Source                      DF       Type I SS     Mean Square    F Value    Pr > F
A                            1     144.0000000     144.0000000      23.19    0.0004
B                            1      81.0000000      81.0000000      13.05    0.0036
A*B                          1       6.2500000       6.2500000       1.01    0.3355

Source                      DF     Type III SS     Mean Square    F Value    Pr > F
A                            1     144.0000000     144.0000000      23.19    0.0004
B                            1      81.0000000      81.0000000      13.05    0.0036
A*B                          1       6.2500000       6.2500000       1.01    0.3355

                                        Standard
Parameter              Estimate            Error    t Value    Pr > |t|
Intercept           14.25000000 B     1.24582637      11.44      <.0001
A         A1        -4.75000000 B     1.76186454      -2.70      0.0195
A         A2         0.00000000 B      .                .         .
B         B1         5.75000000 B     1.76186454       3.26      0.0068
B         B2         0.00000000 B      .                .         .
A*B       A1 B1     -2.50000000 B     2.49165273      -1.00      0.3355
A*B       A1 B2      0.00000000 B      .                .         .
A*B       A2 B1      0.00000000 B      .                .         .
A*B       A2 B2      0.00000000 B      .                .         .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve
      the normal equations.  Terms whose estimates are followed by the letter 'B' are not
      uniquely estimable.

Shouldn’t the p-value for the effect of B in the Type III SS F-test (p=.0036) be equal to the p-value for the parameter B B1 (p=.0068)?


Accepted Solutions
Solution
‎03-10-2015 02:29 PM
New Contributor
Posts: 3

Re: Why are there different p-values for F-tests and solution t-tests in proc glm with a balanced design?

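The t test in the solution for the B main effect is testing the difference between A2B1 and A2B2. This is because of the constraint that SAS is using: the parameters for the last level of each effect are set to zero, so A2 and B2 act as the reference levels. The B B1 estimate is therefore [A2B1 - A2B2].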

On the other hand, the F test is testing the main effect of B in the presence of the A*B interaction using Type III sums of squares. That is, it tests in the margin, giving equal weight to each level of A. So it is testing ([A2B1 - A2B2] + [A1B1 - A1B2])/2.
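Both quantities can be requested explicitly with ESTIMATE statements. This is only a sketch, assuming the `exp` data set from the question is already in the WORK library; the labels are made up for illustration. The coefficients follow from writing each cell mean as intercept + A + B + A*B, with the A*B parameters in the order A1B1, A1B2, A2B1, A2B2.

```sas
proc glm data=exp;
   class A B;
   model Y = A B A*B / solution;

   /* [A2B1 - A2B2]: the simple effect of B at the reference level A2. */
   /* This should reproduce the solution row for B B1.                 */
   estimate 'B1 - B2 at A2'           B 1 -1  A*B 0 0 1 -1;

   /* ([A1B1 - A1B2] + [A2B1 - A2B2])/2: B averaged over A with equal  */
   /* weights. The square of this t value should equal the Type III F. */
   estimate 'B1 - B2 averaged over A' B 1 -1  A*B 0.5 -0.5 0.5 -0.5;
run;
quit;
```

With the cell means in this data (12.75, 9.5, 20, 14.25), the first estimate works out to 5.75 and the second to 4.5. The second has standard error sqrt(6.2083/4), about 1.246, so t is about 3.61 and t squared is about 13.05, which is the F value for B in the ANOVA table.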

Finally, several very erudite statisticians would claim that you should not be using either of these. They would say "Never test a main effect in the presence of an interaction". That is, as soon as you believe an interaction is present, you need to ask about the role of B when A=A1, and again the role of B when A=A2.
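One way to ask those two questions directly is to slice the interaction LS-means by A. A minimal sketch, again assuming the `exp` data set from the question exists:

```sas
proc glm data=exp;
   class A B;
   model Y = A B A*B;

   /* SLICE=A gives one F test of B within A1 and one within A2. */
   lsmeans A*B / slice=A;
run;
quit;
```

The slice for A2 tests the same difference as the solution row for B B1, so its F value should be the square of that t value.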

Hope that this helps.

James.



All Replies
Grand Advisor
Posts: 10,026

Re: Why are there different p-values for F-tests and solution t-tests in proc glm with a balanced design?

Different tests => different test statistics from different calculations, so they are very likely to have different p-values.

Valued Guide
Posts: 673

Re: Why are there different p-values for F-tests and solution t-tests in proc glm with a balanced design?

The question from the OP has come up several times in the past. See, for instance,

for a more complex problem (more complex model) that has bearing on the question. For a simple model with one factor and two levels, as an example, there is agreement between the different tests. But once you have interactions, things are different. This is because the tests in the ANOVA table are for expected values (such as differences of means), not just for individual parameters. With an interaction, a main effect mean is a linear combination of the coefficient(s) for the listed effect (say, B) AND some of the coefficients for the interaction (say, for B*A). You can see which parameters are used for each test by adding an E option to the model statement.
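For the data in the question, that could look like the sketch below (assuming the `exp` data set already exists). E prints the general form of the estimable functions, and E3 prints the ones behind the Type III tests.

```sas
proc glm data=exp;
   class A B;
   /* E: general form of estimable functions          */
   /* E3: estimable functions used for Type III tests */
   model Y = A B A*B / solution e e3;
run;
quit;
```

In the Type III function for B you should see nonzero coefficients on the A*B parameters as well as on B, which is the linear combination described above.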

New Contributor
Posts: 2

Re: Why are there different p-values for F-tests and solution t-tests in proc glm with a balanced design?

Indeed, and this constraint is not trivial. The "solution" option in proc glm uses dummy coding, which renders the parameter estimates for main effects uninterpretable as main effects (the philosophical question of whether one should interpret main effects notwithstanding). If regression estimates are desired for the main effects in a 2x2, orthogonal contrast coding is required. Normally when running an experiment one is not interested in the regression coefficients. However, there are cases where the coefficients are desirable, for example when testing mediation and estimating indirect effects. In such cases, DON'T use the 'solution' option in proc glm.
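A minimal sketch of the contrast-coded version, assuming the `exp` data set from the original post. The variable names `ac`, `bc`, `acbc` and the choice of +/-0.5 codes are my own for illustration; +/-1 codes would give the same tests with slopes half as large.

```sas
data exp_coded;
   set exp;
   /* +/-0.5 codes: each main-effect slope is a difference of marginal means */
   ac   = ifn(A = 'A1', 0.5, -0.5);
   bc   = ifn(B = 'B1', 0.5, -0.5);
   /* product term carries the interaction contrast */
   acbc = ac * bc;
run;

proc reg data=exp_coded;
   model Y = ac bc acbc;
run;
quit;
```

Because the design is balanced, the three coded columns are orthogonal, and the t-tests for `ac`, `bc` and `acbc` should give the same p-values as the F-tests for A, B and A*B in proc glm (.0004, .0036, .3355).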

☑ This topic is SOLVED.
