Hi all,
I'm stuck on a bit of SAS coding. Here's the background: I'm trying to do an ANCOVA (analysis of covariance) of the relationship between tree diameter and sapwood area in aspen and limber pine. The ANCOVA should let me compare the regression models of my two tree species. I began by running individual regressions for each species, which confirmed that linear models were what I wanted (as opposed to, say, quadratic or cubic models). I then ran the following code for the ANCOVA (I've excluded most of the data to save space):
data dbh_sa2;
options linesize = 80;
input x y species $;
if species = 'aspen' then w1 = 1;
else w1 = 0;
if species = 'limber' then w2 = 1;
else w2 = 0;
*creating z variables (species-specific slope terms);
z1 = w1 * x;
z2 = w2 *x;
datalines;
12.7 93.8 aspen
.
.
.
.
9.4 50.3 limber
;
ods html close; ods html;
*performing the multiple linear regressions;
*model 1, separate slopes and intercepts for each species;
proc reg data=dbh_sa2;
model y = w1 w2 z1 z2 / noint;
run;
*model 2, force regressions to have identical slopes;
proc reg data=dbh_sa2;
model y = w1 w2 x / noint;
run;
*model 4, force regressions to have identical slopes AND intercepts;
proc reg data=dbh_sa2;
model y = x;
run;
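As an aside, the residual sums of squares and error degrees of freedom needed for the F test don't have to be copied by hand from the .lst file. The sketch below assumes the dbh_sa2 data set (with w1, w2, z1, z2) from the step above, and relies on PROC REG's OUTEST= option together with the EDF option, which adds the error df (_EDF_) next to the root MSE (_RMSE_). The data set names est_alt, est_null and ftest_auto are hypothetical.

```sas
/* Hypothetical sketch: store root MSE and error df for each model,
   then rebuild RSS = RMSE**2 * error df and run the F test.
   Assumes dbh_sa2 (with w1 w2 z1 z2) was created as above. */

/* Alternate model (model 1): separate slopes and intercepts */
proc reg data=dbh_sa2 outest=est_alt edf noprint;
  model y = w1 w2 z1 z2 / noint;
run;
quit;

/* Null model (model 2): common slope, separate intercepts */
proc reg data=dbh_sa2 outest=est_null edf noprint;
  model y = w1 w2 x / noint;
run;
quit;

data ftest_auto;
  /* Each OUTEST= data set has one row, so a one-to-one merge lines them up */
  merge est_null(keep=_RMSE_ _EDF_ rename=(_RMSE_=RMSEO _EDF_=DFO))
        est_alt (keep=_RMSE_ _EDF_ rename=(_RMSE_=RMSEA _EDF_=DFA));
  RSSO = RMSEO**2 * DFO;          /* residual SS, null model      */
  RSSA = RMSEA**2 * DFA;          /* residual SS, alternate model */
  DIFFRSS = RSSO - RSSA;
  DIFFDF  = DFO - DFA;
  MSA     = RSSA / DFA;           /* denominator mean square      */
  F = (DIFFRSS / DIFFDF) / MSA;
  P = 1 - probf(F, DIFFDF, DFA);  /* upper-tail p-value           */
run;

proc print data=ftest_auto;
  var RSSO RSSA DFO DFA F P;
run;
```

Because the values come straight from the fitted models, this avoids the rounding that creeps in when the RSS is retyped from the printed ANOVA table.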
So, that code worked. The NEXT step is the one I am having problems with. In this step, I am trying to conduct the F test comparing model 1 (separate slopes, the alternate model) against model 2 (common slope, the null model). To do this, I have to look at the output (.lst file) from my previous SAS code and fish out the following values:
RSSO (residual, or error, sum of squares from the null model)
RSSA (residual sum of squares from the alternate model)
DFO (error degrees of freedom from the null model)
DFA (error degrees of freedom from the alternate model)
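For reference, these four quantities combine into the usual extra-sum-of-squares F statistic, which is what the DATA step below computes; under the null model it follows an F distribution with (DFO - DFA, DFA) degrees of freedom. Plugging in the values posted below gives roughly 4.57 on (1, 68) df:

```latex
F = \frac{(\mathrm{RSSO} - \mathrm{RSSA}) / (\mathrm{DFO} - \mathrm{DFA})}{\mathrm{RSSA} / \mathrm{DFA}},
\qquad p = 1 - F_{\mathrm{DFO}-\mathrm{DFA},\,\mathrm{DFA}}(F)

F = \frac{(286806 - 268748) / (69 - 68)}{268748 / 68}
  = \frac{18058}{3952.18} \approx 4.57
```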
Having fished out those values, I went ahead and created my next bit of SAS code:
Data ftest;
Options linesize = 80;
Input RSSO RSSA DFO DFA;
DIFFRSS = (RSSO - RSSA);
*Difference between null and alternate RSS;
DIFFDF = DFO - DFA;
*Difference between null and alternate error df;
MSA = RSSA / DFA;
*Denominator mean square for F test;
F = (DIFFRSS / DIFFDF) / MSA;
*Calculated F value;
P = 1 - probf(F, DIFFDF, DFA);
*Upper-tail p-value for calculated F value;
Datalines;
286806 268748 69 68
;
Proc print;
Run;
Sadly, this last code doesn't run as it should, and I don't understand the error messages. Any suggestions?
Thanks in advance!
SAS has all the tools to simplify the ANCOVA. PROC GLM will generate the dummy variables for you (through the CLASS statement) and do the proper tests. With SAS 9.3, it will even detect that your analysis is an ANCOVA and generate a specialized graph. Try this:
data dbh_sa2;
input x y species $;
datalines;
12.7 93.8 aspen
.
.
.
.
9.4 50.3 limber
;
/* First, check the parallel slopes assumption */
proc glm data=dbh_sa2;
class species;
model y = x species x*species / solution;
run;
/* Then, if the x*species term is not significant (i.e., parallel slopes are plausible), remove it to do the ANCOVA */
proc glm data=dbh_sa2;
class species;
model y = x species / solution;
run;
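Two optional follow-ups, sketched under the assumption that dbh_sa2 is as above. The first PROC GLM fits the same separate-slopes model as model 1 in the question (one intercept and one slope per species) and uses an ESTIMATE statement for the difference in slopes; with only two species this is a 1-df comparison, so its t statistic squared should match the F test for x*species in the first PROC GLM above. The second adds LSMEANS to the common-slope ANCOVA to compare species means adjusted to the overall mean of x. The ESTIMATE label is hypothetical.

```sas
/* Separate intercepts (species, NOINT) and separate slopes x(species);
   class levels sort alphabetically, so aspen comes before limber */
proc glm data=dbh_sa2;
  class species;
  model y = species x(species) / noint solution;
  estimate 'slope: aspen - limber' x(species) 1 -1;
run;
quit;

/* Common-slope ANCOVA: species means adjusted to the overall mean of x */
proc glm data=dbh_sa2;
  class species;
  model y = x species / solution;
  lsmeans species / stderr pdiff;
run;
quit;
```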
PG
Thanks, PG. It worked!
Is it a fully balanced experimental design?
If it is, use PROC ANOVA; otherwise, use PROC GLM as PG did.
Ksharp -- good to know. I don't have the same number of samples for aspen and limber pine (one has something like 38, the other has something like 32), so I think that means my samples are not "balanced"...