I am working on a project with collaborators who use R, and I am trying to recode one of their analyses in SAS. It is a mixed effects model using the attached data (df1.csv). In our experimental design, individual chambers (numbered 1-27) are sited within three levels of topography. An individual chamber only appears in one topography. Each chamber repeatedly measures CO2 at sequential time points (Days). Rainfall is thought to be an important driving variable for CO2.
The R code for the model is
lme.mod <- lme(log(co2)~(Days+Topography+Rainfall)^2,random=~Topography|Chamber,
correlation=corARMA(p=1,q=0),na.action=na.exclude,
control = lmeControl(msMaxIter = 200, msMaxEval = 500,sing.tol=1e-20),data=df1)
anova(lme.mod)
Anova(lme.mod)
# initial model with 2 way interactions between all factors/variables and allowing topography-specific variance across chambers (chamber as random effect); for temporal autocorrelation, start with (1,0) corARMA model
# one ANOVA is F test, one is Type 2 Chi-sq; worth doing both to confirm sig. effect
which produces the output for fixed effects:
numDF denDF F-value p-value
(Intercept) 1 4826 313.06357 <.0001
Days 1 4826 0.66294 0.4156
Topography 2 24 282.28233 <.0001
Rainfall 1 4826 29.08699 <.0001
Days:Topography 2 4826 0.39228 0.6755
Days:Rainfall 1 4826 1.67336 0.1959
Topography:Rainfall 2 4826 8.91292 0.0001
My interpretation of the code is that the model has a random effect of chamber nested within topography, with an autoregressive structure (the measurements are repeated on each chamber at each time point). I want to code the equivalent in SAS; I create the log of the CO2 variable in the data step before I run the model. My code is here:
DATA DF;
INFILE FLUX FIRSTOBS= 2 DLM=',' MISSOVER;
INPUT
OBS DAYS TOPOGRAPHY $ CO2 CHAMBER RAINFALL JUNK;
LOGCO2= LOG(CO2);
RUN;
PROC SORT NODUPKEY;
BY CHAMBER DAYS;
RUN;
PROC MIXED DATA= DF MAXFUNC= 500 MAXITER= 200;
WHERE LOGCO2 NE .;
CLASS CHAMBER DAYS TOPOGRAPHY ;
MODEL LOGCO2= DAYS TOPOGRAPHY RAINFALL
DAYS * TOPOGRAPHY
DAYS * RAINFALL
TOPOGRAPHY * RAINFALL / SOLUTION RESIDUAL DDFM= BW ;
RANDOM CHAMBER(TOPOGRAPHY) ;
REPEATED DAYS / SUBJECT= CHAMBER*DAYS TYPE= AR(1) R;
RUN;
but it produces output very different from the R script:
SAS Output
Type 3 Tests of Fixed Effects
Effect | Num DF | Den DF | F Value | Pr > F |
---|---|---|---|---|
DAYS | 12 | 4489 | 2.95 | 0.0004 |
TOPOGRAPHY | 2 | 4489 | 218.94 | <.0001 |
RAINFALL | 0 | . | . | . |
DAYS*TOPOGRAPHY | 356 | 4489 | 1.54 | <.0001 |
RAINFALL*DAYS | 0 | . | . | . |
RAINFALL*TOPOGRAPHY | 0 | . | . | . |
Can anyone help me understand (i) why the output is so different? I understand that there are various ways to calculate the denominator df, but why are the numerator df so different? (ii) Why am I not getting a result for the rainfall variable in my Type 3 tests?
Thank you in advance
R code:
library(nlme)
library(car)
lme.mod <- lme(log(co2)~(Days+Topography+Rainfall)^2,random=~Topography|Chamber,
correlation=corARMA(p=1,q=0),na.action=na.exclude,
control = lmeControl(msMaxIter = 200, msMaxEval = 500,sing.tol=1e-20),data=df1)
summary(lme.mod)
Anova(lme.mod, type=3)
SAS code
/* remove DAYS from CLASS */
/* make copy of DAYS to use in REPEATED */
/* correct SUBJECT */
/* heterogeneous variances for TOPOGRAPHY */
data df;
set df;
days_x = days;
run;
PROC MIXED DATA= DF MAXFUNC= 500 MAXITER= 200;
WHERE LOGCO2 NE .;
CLASS CHAMBER TOPOGRAPHY(ref=first) days_x;
MODEL LOGCO2= DAYS TOPOGRAPHY RAINFALL
DAYS * TOPOGRAPHY
DAYS * RAINFALL
TOPOGRAPHY * RAINFALL / SOLUTION RESIDUAL DDFM= BW ;
RANDOM CHAMBER(TOPOGRAPHY) / group=topography;
REPEATED days_x / SUBJECT= CHAMBER(topography) TYPE= AR(1);
RUN;
Your two main problems: (1) In your R code, DAYS is numeric; in your original SAS code, DAYS is a factor because it is listed in the CLASS statement. (2) You mis-specified SUBJECT in the REPEATED statement.
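Point (1) is also the likely reason the numerator df differ. A minimal sketch in base R, using made-up toy data rather than df1.csv: a numeric Days contributes one slope column, while a factor Days contributes one column per extra level. The sketch further assumes that rainfall is recorded once per day, which is a guess about the data and not something confirmed in this thread. Under that assumption, rainfall is a function of the day levels, so it adds no rank to the design once Days is a factor, and that would be consistent with the 0 df reported for RAINFALL.

```r
# Toy data (hypothetical, NOT df1.csv): 3 chambers measured on 4 days.
toy <- expand.grid(Chamber = factor(1:3), Days = c(1, 2, 3, 4))

# ASSUMPTION: one rainfall value per day, shared by all chambers.
rain_by_day <- c(0.0, 5.2, 1.1, 3.4)   # made-up values
toy$Rainfall <- rain_by_day[toy$Days]

# Days as a numeric covariate (as in the R model): a single slope column.
X_num <- model.matrix(~ Days, data = toy)
df_days_numeric <- ncol(X_num) - 1      # columns beyond the intercept

# Days as a factor (what CLASS DAYS does in SAS): one column per extra level.
X_fac <- model.matrix(~ factor(Days), data = toy)
df_days_factor <- ncol(X_fac) - 1

# With numeric Days, Rainfall adds an independent column.
X_num_rain <- model.matrix(~ Days + Rainfall, data = toy)
rank_num_rain <- qr(X_num_rain)$rank

# With factor Days, Rainfall lies in the span of the day columns,
# so the rank does not increase and nothing is left to test.
X_fac_rain <- model.matrix(~ factor(Days) + Rainfall, data = toy)
rank_fac <- qr(X_fac)$rank
rank_fac_rain <- qr(X_fac_rain)$rank

print(c(numeric = df_days_numeric, factor = df_days_factor))
print(c(rank_fac = rank_fac, rank_fac_rain = rank_fac_rain))
```

In the toy data, the numeric coding gives 1 df for Days and the factor coding gives 3, and the rank is the same with and without Rainfall once Days is a factor. Whether the real data behave this way depends on how rainfall was measured.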
There are small differences in variance-covariance estimates between R and SAS (the data do not suggest or support such a complicated covariance structure, and many estimates are set to zero in MIXED), but otherwise the results match well.
This is just an answer to your coding question. I do not guarantee that this statistical model is appropriate for this study design.
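On the SUBJECT specification: the original program sorts with NODUPKEY by CHAMBER and DAYS, so each chamber-by-day combination appears at most once. With SUBJECT=CHAMBER*DAYS, every subject would then hold a single observation, leaving AR(1) with no within-subject pairs to correlate. A small sketch in base R with made-up toy data (not df1.csv) counts the observations per subject under both definitions:

```r
# Toy data (hypothetical): 3 chambers, each measured once on each of 4 days.
toy <- expand.grid(Chamber = factor(1:3), Days = 1:4)

# Subject defined as chamber-by-day: each cell is its own "subject".
n_per_cell <- table(interaction(toy$Chamber, toy$Days))

# Subject defined as chamber: each subject carries the whole time series.
n_per_chamber <- table(toy$Chamber)

print(range(n_per_cell))      # observations per chamber-by-day subject
print(range(n_per_chamber))   # observations per chamber subject
```

Only the second definition gives each subject a sequence of repeated measurements, which is what an autoregressive structure needs.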
UPDATE....
I have successfully run the lme model from R in SAS, using PROC IML and the call ExportDataSetToR subroutine (this is very cool). But, I would still like to know why my mixed model is so different in SAS..
@ninemileshigh wrote:
UPDATE....
I have successfully run the lme model from R in SAS, using PROC IML and the call ExportDataSetToR subroutine (this is very cool). But, I would still like to know why my mixed model is so different in SAS..
Different algorithms.
What little I have heard about R is that the "data sets" are structured quite a bit differently than SAS data sets. So the code has to differ. With different starting points, such as data structure, then options are set differently and different options are needed to reflect the strengths and weaknesses of the algorithms based on the structures.
An exercise (not that it reflects R specifically, just a different kind of software):
Create a spreadsheet with about 200 columns and 60,000 rows of data.
Now calculate the mean, max, min, std deviation, range, interquartile range, median, 25th and 75th percentiles, skewness and kurtosis of each of the columns.
Then consider the two lines of Proc Means code that would do the same thing in SAS with a similar data set.
ok, thank you @sld - I always find having something explained so much more helpful than reading the support pages. My frustration was with not being able to get SAS to run the data. I am also not sure whether this is the most appropriate analysis, but at least now I can work with it in SAS.