Obtaining the fitted values of main effects and interactions in PROC G...


🔒 This topic is **solved** and **locked**.

Posted 02-08-2019 06:33 PM
(1327 views)

In short, I need to decompose the fitted values from an ANOVA into components corresponding to each term in the MODEL statement. This problem looks trivial, but it becomes tricky when GLM coding is used for the design matrix. Let's take a balanced 2 x 2 ANOVA with 2 observations per cell.

Y = A + B + A*B

GLM coding produces an 8 x 9 design matrix:

- Intercept: column 0
- Main effect of A: columns 1-2
- Main effect of B: columns 3-4
- Interaction: columns 5-8
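A minimal Python sketch (not SAS) of how this less-than-full-rank coding can be built, assuming the level order A1, A2 and B1, B2 and the row order of the data posted later in the thread. The helpers `glm_row` and `rank` are illustrative names. The check confirms that the 9 columns span only a rank-4 space, which is why only four linearly independent combinations of the coefficients are estimable.

```python
from fractions import Fraction
from itertools import product


def glm_row(a, b, levels_a=("A1", "A2"), levels_b=("B1", "B2")):
    """One row of the GLM-coded design matrix: intercept,
    one indicator per A level, one per B level, one per A*B cell."""
    row = [1]
    row += [1 if a == la else 0 for la in levels_a]
    row += [1 if b == lb else 0 for lb in levels_b]
    row += [1 if (a, b) == cell else 0 for cell in product(levels_a, levels_b)]
    return row


def rank(matrix):
    """Matrix rank by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


# Balanced 2 x 2 layout with 2 observations per cell (row order of the posted data)
cells = [("A1", "B1"), ("A1", "B2"), ("A1", "B1"), ("A1", "B2"),
         ("A2", "B1"), ("A2", "B2"), ("A2", "B1"), ("A2", "B2")]
X = [glm_row(a, b) for a, b in cells]

print(len(X), "x", len(X[0]), "design matrix, rank", rank(X))  # prints: 8 x 9 design matrix, rank 4
```

The intercept column equals the sum of the A columns, of the B columns, and of the A*B columns, so five of the nine columns are redundant.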

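The estimated coefficients b look like this (4 nonzero estimates, matching the rank of the design matrix, as expected):

| b0 | b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8 |
|---|---|---|---|---|---|---|---|---|
| 5.5 | -0.915 | 0 | 1.15 | 0 | -0.36 | 0 | 0 | 0 |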

Let's take the covariate pattern:

1 1 0 1 0 1 0 0 0

for which the fitted value is 5.375. One could say that it decomposes as

5.375 = 5.5 (Int) - 0.915 (A main effect) + 1.15 (B main effect) - 0.36 (A*B interaction)
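The arithmetic behind this decomposition is just the inner product x′b of the covariate pattern with the coefficient vector. A short Python check using the numbers from the post:

```python
# Coefficients b0..b8 as posted (GLM coding; redundant columns estimated as 0)
b = [5.5, -0.915, 0, 1.15, 0, -0.36, 0, 0, 0]
# Covariate pattern from the post (1 = column switched on)
x = [1, 1, 0, 1, 0, 1, 0, 0, 0]

# Fitted value = x'b
fitted = sum(xi * bi for xi, bi in zip(x, b))

# The nonzero terms of x'b, labeled by the model term they belong to
terms = {"Int": b[0], "A": b[1], "B": b[3], "A*B": b[5]}

print(round(fitted, 6), terms)
```

The printed fitted value is 5.375, and the four labeled terms add up to it exactly.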

but that's not the case. The linear restrictions for the Type III hypotheses that test the significance of the two main effects and the interaction are created as specified here. In particular, to test

H0: Main effect of A = 0,

we test whether k′b = 0, where k is (up to a scalar multiple):

0 0.894427 -0.894427 0 0 0.447214 0.447214 -0.447214 -0.447214

This shows that, for the given covariate pattern, the point estimate of the A main effect depends on both b1 and b5, not just on b1.
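To see what k′b actually measures, here is a minimal Python sketch with the posted coefficients. It assumes (the post does not say so) that the nine columns follow SAS's usual order Int | A1 A2 | B1 B2 | A1B1 A1B2 A2B1 A2B2, and it rescales k so that its A entries are +1 and -1. Under that assumption, k′b equals the difference between the average fitted value of the A1 cells and that of the A2 cells: a comparison of marginal means, not an additive piece of one cell's fitted value.

```python
b = [5.5, -0.915, 0, 1.15, 0, -0.36, 0, 0, 0]

# Type III estimable function for A as posted (only defined up to a multiple)
k_posted = [0, 0.894427, -0.894427, 0, 0, 0.447214, 0.447214, -0.447214, -0.447214]
# Rescale so the A coefficients become +1 / -1 (the interaction entries become ~0.5)
scale = k_posted[1]
k = [v / scale for v in k_posted]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Design rows of the four cells, ASSUMING SAS's column order
# Int | A1 A2 | B1 B2 | A1B1 A1B2 A2B1 A2B2
rows = {
    "A1B1": [1, 1, 0, 1, 0, 1, 0, 0, 0],
    "A1B2": [1, 1, 0, 0, 1, 0, 1, 0, 0],
    "A2B1": [1, 0, 1, 1, 0, 0, 0, 1, 0],
    "A2B2": [1, 0, 1, 0, 1, 0, 0, 0, 1],
}
fitted = {cell: dot(x, b) for cell, x in rows.items()}

# k'b involves both b1 and b5
estimate_A = dot(k, b)
# Average fitted value over the A1 cells minus average over the A2 cells
marginal_diff = (fitted["A1B1"] + fitted["A1B2"]) / 2 - (fitted["A2B1"] + fitted["A2B2"]) / 2

print(round(estimate_A, 4), round(marginal_diff, 4))
```

Both numbers come out at about -1.095; the tiny discrepancy in `estimate_A` comes only from the six-digit rounding of the posted k.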

To test H0: Main effect of B = 0, we use the restriction:

0 0 0 0.894427 -0.894427 0.447214 -0.447214 0.447214 -0.447214

and for H0: A*B = 0 it is:

0 0 0 0 0 1 -1 -1 1

These linear restrictions are called "Type III estimable functions" in PROC GLM and "Type III coefficients" in PROC MIXED. The problem is that I don't know how to use all this information to obtain the point estimates of the main effects and the interaction for a given covariate pattern.

**Accepted solution**


These ESTIMATE statements are linear combinations of the parameters that give the mean for each covariate pattern, and they match the LSMEANS output for FactorA*FactorB:

```
estimate "A1B1" intercept 1 FactorA 1 0 FactorB 1 0 FactorA*FactorB 1 0 0 0;
estimate "A1B2" intercept 1 FactorA 1 0 FactorB 0 1 FactorA*FactorB 0 1 0 0;
estimate "A2B1" intercept 1 FactorA 0 1 FactorB 1 0 FactorA*FactorB 0 0 1 0;
estimate "A2B2" intercept 1 FactorA 0 1 FactorB 0 1 FactorA*FactorB 0 0 0 1;
```
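Each ESTIMATE row is just a coefficient vector L applied to a solution b. A minimal Python sketch using the data posted in the thread: the solution with the last level of each effect set to zero is derived by hand from the cell means (an assumption about what the SOLUTION option reports, not taken from SAS output). It also adds a null-space direction to b to show that these estimable quantities do not depend on which solution is used.

```python
data = [("A1", "B1", 4.48), ("A1", "B2", 5.53), ("A1", "B1", 4.69), ("A1", "B2", 5.22),
        ("A2", "B1", 5.54), ("A2", "B2", 6.4), ("A2", "B1", 5.46), ("A2", "B2", 6.9)]

# Cell means: the fitted values of the full A + B + A*B model
cell_ys = {}
for a, b_level, y in data:
    cell_ys.setdefault(a + b_level, []).append(y)
m = {cell: sum(ys) / len(ys) for cell, ys in cell_ys.items()}

# One solution with the last level of each effect set to 0 (assumed convention).
# Column order: Int | A1 A2 | B1 B2 | A1B1 A1B2 A2B1 A2B2
b = [m["A2B2"], m["A1B2"] - m["A2B2"], 0.0, m["A2B1"] - m["A2B2"], 0.0,
     m["A1B1"] - m["A1B2"] - m["A2B1"] + m["A2B2"], 0.0, 0.0, 0.0]

# Coefficients of the four ESTIMATE statements (intercept, FactorA, FactorB, FactorA*FactorB)
L = {
    "A1B1": [1, 1, 0, 1, 0, 1, 0, 0, 0],
    "A1B2": [1, 1, 0, 0, 1, 0, 1, 0, 0],
    "A2B1": [1, 0, 1, 1, 0, 0, 0, 1, 0],
    "A2B2": [1, 0, 1, 0, 1, 0, 0, 0, 1],
}


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


estimates = {cell: dot(row, b) for cell, row in L.items()}

# The intercept column equals A1 + A2, so this direction is in the null space of X;
# adding it gives another least-squares solution with the same fitted values.
null_dir = [1, -1, -1, 0, 0, 0, 0, 0, 0]
b_alt = [bi + 2.0 * ni for bi, ni in zip(b, null_dir)]
estimates_alt = {cell: dot(row, b_alt) for cell, row in L.items()}

for cell in L:
    print(cell, round(estimates[cell], 4), round(estimates_alt[cell], 4), round(m[cell], 4))
```

All three columns agree for every cell (4.585, 5.375, 5.5, 6.65), which is what makes these rows estimable.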


> Let's take the covariate pattern:
>
> 1 1 0 1 0 1 0 0 0
>
> for which the fitted value is 5.375. One could say that it's decomposed as
>
> 5.375 = 5.5 (Int) - 0.915 (A main effect) + 1.15 (B main effect) - 0.36 (A*B interaction)
>
> but that's not the case.

I'm afraid that is the case. If you want a predicted value of Y for this X condition, you have done the right thing.

> The linear restrictions for Type III hypotheses to test the significance of the two main effects and interactions are created as specified here. In particular, to test
>
> H0: Main effect of A = 0,
>
> we test whether k′b where k is (defined up to a multiple):
>
> 0 0.894427 -0.894427 0 0 0.447214 0.447214 -0.447214 -0.447214

The linear restrictions to test the main effect of A have nothing to do with prediction of a value at a given X condition, as you discussed above. They are two different things. You are mixing apples and gorillas.

--

Paige Miller


Thanks for replying. To put it another way, I want to decompose the fitted value into the two main effects and the interaction term so that I can adjust the observed response. For example, if I want to adjust it for B and A*B,

Y_adj = Y - (Main effect of B) - (AB interaction)

so that when I regress Y_adj on A, B, and A*B again, the Type III p-value for A stays exactly the same and the p-values for B and A*B equal one. I tried it the naive way (e.g., adjusting that observation by (1.15 - 0.36)), but the p-value for A is not preserved.


@JamesLin wrote:

> E.g. if I want to adjust it for B and A*B,
>
> Y_adj = Y - (Main effect of B) - (AB interaction)
>
> so that when I regress Y_adj on A, B, A*B again,
>
> the Type III p-value for A should be exactly the same

I doubt this is possible.

> and the p-values for B and A*B should be equal to one

In combination, these two conditions seem like a very unusual way of doing things, and I doubt it is possible.

--

Paige Miller


To add: the p-value depends on the degrees of freedom of the error term and on the estimate of the root mean square error of the model. As soon as you fix one portion of the model and let other parts vary, you change the estimate of the root mean square error, and so you **cannot** get the same p-values.
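The dependence is visible in the F statistic behind the Type III test for A: F = MS_A / MSE, referred to an F distribution with df_A and df_error degrees of freedom. A minimal Python sketch using the data posted later in the thread; it uses the balanced-design formula for SS_A, which equals the Type III SS only because this design is balanced. The p-value itself would come from the upper tail of that F distribution (for example `scipy.stats.f.sf`, not used here to stay dependency-free).

```python
data = [("A1", "B1", 4.48), ("A1", "B2", 5.53), ("A1", "B1", 4.69), ("A1", "B2", 5.22),
        ("A2", "B1", 5.54), ("A2", "B2", 6.4), ("A2", "B1", 5.46), ("A2", "B2", 6.9)]


def mean(values):
    return sum(values) / len(values)


grand = mean([y for _, _, y in data])

cell_ys = {}
for a, b_level, y in data:
    cell_ys.setdefault((a, b_level), []).append(y)
cell_mean = {cell: mean(ys) for cell, ys in cell_ys.items()}

# Error SS: squared residuals of the full A + B + A*B model (deviations from cell means)
sse = sum((y - cell_mean[(a, b_level)]) ** 2 for a, b_level, y in data)
df_error = len(data) - len(cell_ys)   # 8 observations - 4 cells
mse = sse / df_error                   # RMSE = sqrt(mse)

# SS for A from the A marginal means; in a balanced design Types I-IV coincide
levels_a = sorted({a for a, _, _ in data})
ss_a = 0.0
for level in levels_a:
    ys = [y for a, _, y in data if a == level]
    ss_a += len(ys) * (mean(ys) - grand) ** 2
df_a = len(levels_a) - 1

f_a = (ss_a / df_a) / mse
print(f"SSE={sse:.4f}  df_error={df_error}  MSE={mse:.6f}  F(A)={f_a:.3f}")
```

Any adjustment of Y that changes the residuals (and hence MSE) or the error degrees of freedom changes F(A), and with it the p-value.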

--

Paige Miller


The estimated coefficients look like REF coding, not GLM coding. Some PROCs (e.g., LOGISTIC, GLMSELECT) have options for other CLASS parameterizations, but I think that the GLM and MIXED procedures use only REF by default. If so, your confusion may be due to thinking you have GLM parameterization but you actually have REF.

Posting your code and an example dataset could be quite helpful.

Also, I would think that "the point estimates of main effects and interactions for a given covariate pattern" are just the interaction LSMEANS. It's not clear to me what you are trying to obtain.


No, it's not reference coding. One can call it "Intercept + GLM": e.g., for Y = A, the design has a column of 1's plus k columns, where k is the number of levels of factor A. With reference or effect coding, the design would have fewer than 9 columns. I attached the data and code. Could you elaborate on what you said about LSMEANS?

```
FactorA FactorB Response
A1 B1 4.48
A1 B2 5.53
A1 B1 4.69
A1 B2 5.22
A2 B1 5.54
A2 B2 6.4
A2 B1 5.46
A2 B2 6.9
```

```
proc glm data=input.data;   /* or PROC MIXED */
   class FactorA FactorB;
   model Response = FactorA FactorB FactorA*FactorB / e e1 e2 e3 e4;
run;
```


Oh, yes, you're right, it is GLM parameterization.

These ESTIMATE statements duplicate the hypothesis tests reported in the Type III ANOVA table. The coefficients are extracted from the Type III Estimable Functions table.

```
data test;
   input FactorA $ FactorB $ Response;
   datalines;
A1 B1 4.48
A1 B2 5.53
A1 B1 4.69
A1 B2 5.22
A2 B1 5.54
A2 B2 6.4
A2 B1 5.46
A2 B2 6.9
;
run;

proc glm data=test;
   class FactorA FactorB;
   /* E3 prints the Type III estimable functions; SOLUTION prints the parameter estimates */
   model Response = FactorA FactorB FactorA*FactorB / e3 solution;
   /* LSMEANS for the four A*B cells */
   lsmeans FactorA*FactorB;
   /* Coefficients taken from the Type III Estimable Functions table */
   estimate "Main effect FactorA" intercept 0 FactorA 1 -1 FactorB 0 0 FactorA*FactorB 0.5 0.5 -0.5 -0.5;
   estimate "Main effect FactorB" intercept 0 FactorA 0 0 FactorB 1 -1 FactorA*FactorB 0.5 -0.5 0.5 -0.5;
   estimate "Interaction" intercept 0 FactorA 0 0 FactorB 0 0 FactorA*FactorB 1 -1 -1 1;
run;
```

The interaction LSMEANS give the predicted value for each of the 2 x 2 = 4 covariate patterns (i.e., A1B1, A1B2, A2B1, A2B2).
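For this full-interaction model the A*B LSMEANS are the cell means, and the three Type III ESTIMATE statements above can be rewritten as contrasts of them. A minimal Python sketch using the posted data (a hand computation, not SAS output):

```python
data = [("A1", "B1", 4.48), ("A1", "B2", 5.53), ("A1", "B1", 4.69), ("A1", "B2", 5.22),
        ("A2", "B1", 5.54), ("A2", "B2", 6.4), ("A2", "B1", 5.46), ("A2", "B2", 6.9)]

# Cell means = A*B LSMEANS for the full A + B + A*B model
cell_ys = {}
for a, b_level, y in data:
    cell_ys.setdefault((a, b_level), []).append(y)
lsmean = {cell: sum(ys) / len(ys) for cell, ys in cell_ys.items()}

m11, m12 = lsmean[("A1", "B1")], lsmean[("A1", "B2")]
m21, m22 = lsmean[("A2", "B1")], lsmean[("A2", "B2")]

# Same quantities as the three ESTIMATE statements, written as contrasts of the LSMEANS
est_A = (m11 + m12) / 2 - (m21 + m22) / 2   # FactorA 1 -1, FactorA*FactorB 0.5 0.5 -0.5 -0.5
est_B = (m11 + m21) / 2 - (m12 + m22) / 2   # FactorB 1 -1, FactorA*FactorB 0.5 -0.5 0.5 -0.5
est_AB = m11 - m12 - m21 + m22              # FactorA*FactorB 1 -1 -1 1

print({cell: round(v, 4) for cell, v in lsmean.items()})
print(round(est_A, 4), round(est_B, 4), round(est_AB, 4))
```

The estimates come out at -1.095, -0.97, and 0.36. The "main effect" estimates are differences of averages over the other factor, which is why they cannot be read off as additive pieces of a single cell's fitted value.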




Thanks for clarifying things for me.
