Hi,
I was looking at a coding example in Ramon Littell's book 'SAS for Mixed Models', where he looks at an interaction between a continuous (hour) and a categorical (drug) variable in the CONTRAST statement. I don't understand why, within each contrast, he first specifies the main effect, e.g. "drug 1 -1 0", before specifying the drug*hour interaction. What is the purpose of this?
Here is another example of someone doing the same thing in an ESTIMATE statement (source: SAS Library: How do I handle interactions of continuous and categorical variables?).
Can anyone shed some light on why this is necessary, i.e. why the CONTRAST/ESTIMATE statements don't simply contain the interaction of interest, without the main effect?
Accepted Solutions
This is needed because of the way SAS has chosen to parameterize the model with regard to classification factors (sometimes called the "effects model").
The interaction coefficients have to be accompanied by coefficients for the effects contained in that interaction, and if you left out the main effect for drug (or diet in the second example), you would be told that the effect is non-estimable. You can have PROC GLM display the form of the estimable functions by using the E option in the MODEL statement.
There are other parameterizations of the model where this would not be necessary; one you may have learned elsewhere, and which is used in other software, is called the "means model". But SAS doesn't use this parameterization, so you are pretty much stuck with the parameterization SAS gives you, and you need to include the main effects in the ESTIMATE statement, as the E option in the MODEL statement shows.
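To see what the E option is getting at, here is a minimal numpy sketch (Python, not SAS, and not taken from either source) of the estimability check for the case where hour is treated as a CLASS variable. The layout of 3 drugs and 2 hours, with one design row per cell, is hypothetical. A linear function L'b is estimable exactly when L lies in the row space of the effects-model design matrix X, which the sketch tests with the pseudoinverse.

```python
import itertools
import numpy as np

# Hypothetical layout: drug has 3 levels, hour is a CLASS variable with 2 levels.
N_DRUG, N_HOUR = 3, 2

def design_row(i, j):
    """Effects-model row for cell (drug i, hour j):
    intercept | drug dummies | hour dummies | drug*hour dummies."""
    drug = np.eye(N_DRUG)[i]
    hour = np.eye(N_HOUR)[j]
    inter = np.outer(drug, hour).ravel()  # order: d1h1 d1h2 d2h1 d2h2 d3h1 d3h2
    return np.concatenate(([1.0], drug, hour, inter))

# One row per cell is enough to span the row space of the full design matrix.
X = np.array([design_row(i, j)
              for i, j in itertools.product(range(N_DRUG), range(N_HOUR))])

def is_estimable(L, X, tol=1e-9):
    """L'b is estimable iff L is in the row space of X,
    i.e. projecting L with pinv(X) @ X gives L back."""
    return np.allclose(L @ np.linalg.pinv(X) @ X, L, atol=tol)

def coef(drug=None, hour=None, inter=None):
    """Assemble an L vector from ESTIMATE-style pieces (zeros elsewhere)."""
    L = np.zeros(X.shape[1])
    if drug is not None:
        L[1:1 + N_DRUG] = drug
    if hour is not None:
        L[1 + N_DRUG:1 + N_DRUG + N_HOUR] = hour
    if inter is not None:
        L[1 + N_DRUG + N_HOUR:] = inter
    return L

# Drug 1 vs drug 2 at hour 1.
inter_only = coef(inter=[1, 0, -1, 0, 0, 0])
with_main = coef(drug=[1, -1, 0], inter=[1, 0, -1, 0, 0, 0])

print("interaction only estimable:", is_estimable(inter_only, X))
print("with drug main effect estimable:", is_estimable(with_main, X))
```

The vector with the drug 1 -1 0 part is exactly the difference between two cell means (drug 1 at hour 1 minus drug 2 at hour 1), so it passes. The interaction-only vector is not in the row space of X, which is the situation in which SAS reports the effect as non-estimable.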
This reference goes into all the gory details: "Analysis of Messy Data", Milliken and Johnson, Van Nostrand Reinhold, 1984.
Paige Miller
Try to calculate and/or graph your estimated means by hand using the parameters of your model, with a separate line for each level of your main effect. With an interaction, the difference between levels of the main effect depends on the value of your continuous variable. When your main effect has more than two levels, you need to specify which two you are contrasting, because you are estimating a different slope for the continuous variable at each level of the main effect.
The following pages may be of help to you. It always helps me to write out the function I am estimating based on its parameterization, and then the components of the two means I am trying to compare.
Specification of ESTIMATE Expressions :: SAS/STAT(R) 13.1 User's Guide
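Following the advice to write out the two means first, here is a small numpy sketch (Python rather than SAS) for a separate-slopes model with a continuous covariate. It assumes a hypothetical model written as MODEL y = diet diet*height, with an intercept and three diets; the column order is an assumption for illustration. The coefficient vector for the difference between two diets at a given height is simply one mean vector minus the other.

```python
import numpy as np

# Hypothetical separate-slopes model in the effects parameterization:
#   MODEL y = diet diet*height;   (3 diets, height continuous)
# Assumed column order:
#   intercept | diet1 diet2 diet3 | diet1*height diet2*height diet3*height
N_DIET = 3

def mean_coef(diet, height):
    """Coefficient vector for the predicted mean of one diet (0-based) at a given height."""
    dummy = np.eye(N_DIET)[diet]
    return np.concatenate(([1.0], dummy, height * dummy))

def diff_coef(a, b, height):
    """Write out both means, then subtract: this is the ESTIMATE coefficient vector."""
    return mean_coef(a, height) - mean_coef(b, height)

# Hypothetical height values.
for h in (150.0, 170.0):
    L = diff_coef(0, 1, h)
    print(f"height={h}: intercept {L[0]}, diet {L[1:4]}, diet*height {L[4:7]}")
```

The intercept cancels, but the diet part (1 -1 0) does not, and the diet*height part is the height value times (1 -1 0). Dropping the diet part would leave the height value times the difference in slopes, which is a different quantity from the difference between the two diets at that height.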
Also note that in the first example, the variable hour is being fit as a categorical variable, with multiple levels. It is NOT being treated as a continuous variable. In the second example, height is a continuous variable, and heterogeneous slopes are fit by diet. This results in testing the differences between diets at a low, median, and high value of the continuous covariate.
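To illustrate the low/median/high comparison, here is a numpy sketch on simulated data (everything hypothetical: three diets, 20 subjects each, heights drawn uniformly, made-up intercepts and slopes). It fits the overparameterized effects model by least squares and evaluates diet 1 minus diet 2 at the 10th, 50th and 90th percentiles of height; using those percentiles as "low" and "high" is an assumption. Because the slopes differ by diet, the three differences differ too.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 3 diets, 20 subjects each, continuous height.
N_DIET, n = 3, 20
diet = np.repeat(np.arange(N_DIET), n)
height = rng.uniform(140.0, 190.0, size=diet.size)
true_int = np.array([10.0, 12.0, 9.0])
true_slope = np.array([0.50, 0.40, 0.55])
y = true_int[diet] + true_slope[diet] * height + rng.normal(0.0, 1.0, diet.size)

# Effects-model design: intercept | diet dummies | diet*height columns.
D = np.eye(N_DIET)[diet]
X = np.column_stack([np.ones(diet.size), D, D * height[:, None]])

# X is rank deficient (intercept = sum of the diet dummies); lstsq returns the
# minimum-norm solution, which is fine because only estimable functions are reported.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def diff_coef(a, b, h):
    """ESTIMATE-style vector: diet a minus diet b at height h (0-based diets)."""
    e = np.eye(N_DIET)
    return np.concatenate(([0.0], e[a] - e[b], h * (e[a] - e[b])))

for label, h in zip(("low", "median", "high"), np.quantile(height, [0.1, 0.5, 0.9])):
    est = diff_coef(0, 1, h) @ beta
    print(f"{label:6s} height={h:6.1f}  diet1 - diet2 = {est:6.2f}")
```

The individual entries of beta depend on which solution lstsq picks for the rank-deficient system, but estimable functions such as these differences do not, which is why they match fitting a separate regression line for each diet.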
You may wish to examine the LSMESTIMATE statement. Because it works with the LS-means rather than the individual model parameters, it avoids this sort of thing where coefficients for all levels need to be specified.
Steve Denham
Thanks Steve. Will LSMESTIMATE work for continuous variables? It says "Only class variables allowed in this effect."