- Why are there different p-values for F-tests and s...

03-09-2015 10:24 PM

I have a 2x2 design. To keep it simple, here’s a small data set with two factors, A and B, each with 2 levels, and a DV, Y. The design is perfectly balanced.

The results from the ANOVA portion of proc glm produce different findings than the solution, which doesn’t make sense. The solution option in the model statement provides regression estimates of effects of the dummy-coded variables. Given that both IVs are only 2 levels, each estimable effect corresponds to the effects in the design (i.e., A, B and A*B). Therefore the p-values for the F-tests should be identical to those of the parameter estimates from the regression solution, correct?

    data exp;
    input A $ B $ Y @@;
    datalines;
    A1 B1 12 A1 B1 14 A1 B2 11 A1 B2 9
    A1 B1 13 A1 B1 12 A1 B2 10 A1 B2 8
    A2 B1 20 A2 B1 18 A2 B2 17 A2 B2 10
    A2 B1 22 A2 B1 20 A2 B2 19 A2 B2 11
    ;

    proc glm data=exp;
    class A B;
    model Y=A B A*B/SOLUTION;
    run;

Dependent Variable: Y

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 3 | 231.2500000 | 77.0833333 | 12.42 | 0.0005 |
| Error | 12 | 74.5000000 | 6.2083333 | | |
| Corrected Total | 15 | 305.7500000 | | | |

| R-Square | Coeff Var | Root MSE | Y Mean |
|---|---|---|---|
| 0.756337 | 17.64002 | 2.491653 | 14.12500 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| A | 1 | 144.0000000 | 144.0000000 | 23.19 | 0.0004 |
| B | 1 | 81.0000000 | 81.0000000 | 13.05 | 0.0036 |
| A*B | 1 | 6.2500000 | 6.2500000 | 1.01 | 0.3355 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| A | 1 | 144.0000000 | 144.0000000 | 23.19 | 0.0004 |
| B | 1 | 81.0000000 | 81.0000000 | 13.05 | 0.0036 |
| A*B | 1 | 6.2500000 | 6.2500000 | 1.01 | 0.3355 |

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | 14.25000000 B | 1.24582637 | 11.44 | <.0001 |
| A A1 | -4.75000000 B | 1.76186454 | -2.70 | 0.0195 |
| A A2 | 0.00000000 B | . | . | . |
| B B1 | 5.75000000 B | 1.76186454 | 3.26 | 0.0068 |
| B B2 | 0.00000000 B | . | . | . |
| A*B A1 B1 | -2.50000000 B | 2.49165273 | -1.00 | 0.3355 |
| A*B A1 B2 | 0.00000000 B | . | . | . |
| A*B A2 B1 | 0.00000000 B | . | . | . |
| A*B A2 B2 | 0.00000000 B | . | . | . |

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

Shouldn’t the p-value for the effect of B in the Type III SS F-test (p=.0036) be equal to the p-value for the parameter B B1 (p=.0068)?

Accepted Solutions


03-10-2015 02:29 PM

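The t test for B in the SOLUTION output tests the difference between A2B1 and A2B2, not the main effect of B. This follows from the constraint SAS uses: the last level of each factor is the reference level, and its parameters are set to zero. So the test is looking at [A2B1 - A2B2].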

The F test, on the other hand, tests the main effect of B in the presence of the A*B interaction using Type III sums of squares. That is, it tests in the margin, giving equal weight to each level of A. So it is testing ([A2B1 - A2B2] + [A1B1 - A1B2])/2.
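As a rough check of the arithmetic, here are both quantities worked out from the posted data. The cell means are computed by hand from the data lines ($\bar y_{11}=12.75$, $\bar y_{12}=9.50$, $\bar y_{21}=20.00$, $\bar y_{22}=14.25$, where $\bar y_{ij}$ is the mean of the cell A$i$, B$j$; the notation is mine, not SAS's), with $n=4$ observations per cell and $\mathrm{MSE}=6.2083$ from the ANOVA table:

$$
\begin{aligned}
\text{SOLUTION } t \text{ test:}\quad & \bar y_{21}-\bar y_{22} = 5.75, \qquad \mathrm{SE}=\sqrt{\mathrm{MSE}\left(\tfrac14+\tfrac14\right)} = 1.7619, \qquad t = 3.26 \\
\text{Type III } F \text{ test:}\quad & \tfrac12\left[(\bar y_{21}-\bar y_{22})+(\bar y_{11}-\bar y_{12})\right] = \tfrac12(5.75+3.25) = 4.50, \\
& \mathrm{SE}=\sqrt{\mathrm{MSE}/4} = 1.2458, \qquad t = 3.61, \qquad t^2 = 13.05 = F
\end{aligned}
$$

The first line matches the B B1 row of the parameter table and the second matches the Type III F for B, so the two p-values differ because the two contrasts differ.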

Finally, several very erudite statisticians would claim that you should not be using either of these. They would say, "Never test a main effect in the presence of an interaction." That is, as soon as you believe an interaction is present, you need to ask about the role of B when A=A1, and again about the role of B when A=A2.
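One way to ask those two questions in PROC GLM is sketched below. It assumes the `exp` data set from the original post and the default CLASS level ordering (A1, A2 and B1, B2), so the A*B coefficients are listed in the order A1B1, A1B2, A2B1, A2B2. The quoted labels are made up for illustration.

```sas
proc glm data=exp;
  class A B;
  model Y = A B A*B / solution;

  /* Simple effect of B (B1 - B2) within each level of A */
  estimate 'B1 - B2 at A1' B 1 -1 A*B 1 -1 0 0;
  estimate 'B1 - B2 at A2' B 1 -1 A*B 0 0 1 -1;

  /* Marginal effect of B: the average of the two simple effects */
  estimate 'B1 - B2 averaged over A' B 1 -1 A*B 0.5 -0.5 0.5 -0.5;

  /* F tests of B within each level of A */
  lsmeans A*B / slice=A;
run;
quit;
```

With these data, the 'at A2' estimate should reproduce the B B1 row of the SOLUTION output (5.75, p=.0068), and the 'averaged over A' estimate should carry the same p-value as the Type III F test for B (p=.0036).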

Hope that this helps.

James.

All Replies


03-10-2015 10:49 AM

Different tests use different test statistics from different calculations, so they are very likely to have different p-values.


03-10-2015 06:03 PM

The question from the OP has come up several times in the past. See, for instance,

for a more complex problem (a more complex model) that has bearing on the question. For a simple model with one factor and two levels, for example, the different tests agree. But once you have interactions, things are different. This is because the tests in the ANOVA table are for expected values (such as differences of means), not just for individual parameters. With an interaction, a main-effect mean is a linear combination of the coefficient(s) for the listed effect (say, B) AND some of the coefficients for the interaction (say, A*B). You can see which parameters are used for each test by adding the E option to the MODEL statement.
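A minimal sketch of that suggestion, assuming the `exp` data set from the original post. E requests the general form of the estimable functions, and E3 (added here as a convenience, not part of the suggestion above) requests the ones behind the Type III tests:

```sas
proc glm data=exp;
  class A B;
  /* E: general form of estimable functions; E3: Type III estimable functions */
  model Y = A B A*B / solution e e3;
run;
quit;
```

In the Type III listing for B, the nonzero coefficients on the A*B parameters are what distinguish the F test from the single-parameter t test in the SOLUTION output.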


03-10-2015 08:11 PM

Indeed, and this constraint is not trivial. The SOLUTION option in proc glm uses dummy coding, which renders the parameter estimates for main effects uninterpretable as main effects (the philosophical question of whether one should interpret main effects notwithstanding). If regression estimates are desired for the main effects in a 2x2, orthogonal contrast coding is required. Normally, when running an experiment, one is not interested in the regression coefficients. However, there are cases where the coefficients are desirable, for example when testing mediation and estimating indirect effects. In such cases, do NOT use the SOLUTION option in proc glm.
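A minimal sketch of orthogonal contrast coding for this 2x2, assuming the `exp` data set from the original post. The data set name `exp_coded` and the variables `a_c`, `b_c` and `ab_c` are hypothetical names chosen for illustration, and the +1/-1 scaling is one of several possible choices:

```sas
data exp_coded;
  set exp;
  /* +1/-1 contrast codes; the interaction code is their product */
  a_c  = ifn(A = 'A1', 1, -1);
  b_c  = ifn(B = 'B1', 1, -1);
  ab_c = a_c * b_c;
run;

proc reg data=exp_coded;
  model Y = a_c b_c ab_c;
run;
quit;
```

Because the design is balanced, the coefficient for `b_c` is half the B1 - B2 difference averaged over A, and its t test should carry the same p-value as the Type III F test for B.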