Hi, I am trying to run PROC GLM (on SAS 9.4 TS Level 1M5) with a continuous dependent variable (log10biomarker), one continuous independent variable (log10root), and one categorical independent variable (Species: either Kanlow or Summer). I would like to test the separate effects of log10root and Species, and also the interaction between the two independent variables. I tried the code below, declaring Species as a categorical variable with the CLASS statement. No dummy coding was necessary because the dataset lists either "Kanlow" or "Summer" in every cell (rather than, say, "0" for Kanlow and "1" for Summer). However, the output (attached) appears to test the interaction for Kanlow and Summer individually. Instead, I simply want to test the effect of the interaction log10root * Species on log10biomarker (i.e., does the effect of log10root on log10biomarker change depending on whether the species is Kanlow or Summer?).
proc glm data=Rootmicr.Initial;
Class Species;
by Group;
model log10biomarker = log10root | Species / solution;
ods output ParameterEstimates=Rootmicr.IniDepthSpecIntxnGLMparmestim;
run;
This is the correct output when you have an interaction between a class x-variable and a continuous x-variable.
The interaction is statistically significant (Pr > F is less than 0.05), and the area you have highlighted indicates the different slopes of log10root to predict log10biomarker, depending on which species is used.
I see. Any thoughts on why there are no values in the parameter estimates output for when Species=Summer?
That's how SAS has chosen to parameterize the model.
I wrote a short description about this (in a slightly different situation, although the same principle applies here):
Interpreting Multivariate Linear Regression with Categorical Variables
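If the blank Species=Summer rows are the main obstacle, one alternative is a "separate slopes" parameterization, in which every species gets its own intercept and its own slope and no level is set to zero. This is only a sketch reusing the dataset and variable names from the question; it is the same fitted model written differently, not a different analysis.

```sas
proc glm data=Rootmicr.Initial;
  class Species;
  by Group;
  /* NOINT together with Species gives one intercept per species.   */
  /* log10root*Species without a log10root main effect gives one    */
  /* slope per species, so no row is reported as zero.              */
  model log10biomarker = Species log10root*Species / noint solution;
run;
quit;
```

Note that with this form the F test for log10root*Species asks whether both slopes are zero, not whether the slopes differ, so the interaction test itself should still be taken from the original log10root | Species model. R-square is also computed differently when NOINT is used.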
I ran this:
proc glm data=Rootmicr.Initial;
Class Species;
by Group;
model log10biomarker = log10root | depthnum | Species / solution;
lsmeans Species / stderr pdiff cov out=adjmeans;
ods output ParameterEstimates=Rootmicr.IniDepthSpecIntxnGLMparmestim;
run;
How does one interpret this part of the output?
The SAS System
log10biomarker LSMEAN   Standard Error   Pr > |t|   H0:LSMean1=LSMean2 Pr > |t|
1.18374631              0.30934731       0.0010     0.2874
1.58113183              0.19197711       <.0001
Species=Summer is the reference level. Because you've included an intercept term, the GLM can only estimate DIFFERENCES between levels. For the mathematical details, see "Singular parameterizations, generalized inverses, and regression estimates"
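To see that the reference level is only a labeling choice, you can ask for a different one. A minimal sketch, assuming your SAS/STAT release supports the REF= option on the CLASS statement in PROC GLM (I believe 9.4 does, but please check the documentation for your version):

```sas
proc glm data=Rootmicr.Initial;
  /* Make Kanlow the reference level instead of the default (last) level */
  class Species (ref='Kanlow');
  by Group;
  model log10biomarker = log10root | Species / solution;
run;
quit;
```

With this change the zeroed rows should move to Species=Kanlow, and the Summer rows become differences from Kanlow. The fit and the F test for the interaction are unchanged; only the sign and labeling of the difference estimates flip.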
There is only a single interaction estimated, and that is the -4.08 value. What is confusing you is just the fact that your Species variable is a CLASS (categorical) predictor and is represented in the model with two dummy-coded (0,1-coded) variables. However, since Species has two levels, it has only 1 degree of freedom and therefore only 1 estimable parameter. The second parameter is restricted to zero. Similarly, the interaction has only 1 degree of freedom (1 for log10root times 1 for Species equals 1). Again, the second is restricted to zero. The significant -4.08 interaction suggests that the effect of log10root differs between the two Species.
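If you want the slope of log10root for each species reported directly, ESTIMATE statements can combine the parameters for you. This is a sketch that assumes the default ordering of the CLASS levels (alphabetical, so Kanlow first and Summer second); the labels are just illustrative.

```sas
proc glm data=Rootmicr.Initial;
  class Species;
  by Group;
  model log10biomarker = log10root | Species / solution;
  /* Slope within each species = common slope + that species' interaction term */
  estimate 'Slope of log10root, Kanlow' log10root 1 log10root*Species 1 0;
  estimate 'Slope of log10root, Summer' log10root 1 log10root*Species 0 1;
  /* Difference in slopes: this is the single estimable interaction parameter */
  estimate 'Slope difference (Kanlow - Summer)' log10root*Species 1 -1;
run;
quit;
```

The third ESTIMATE should reproduce the one nonzero interaction estimate in the SOLUTION table, and its t test should agree with the 1-degree-of-freedom test of the interaction.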