ninemileshigh
Obsidian | Level 7

I am working on a project with collaborators who use R, and I am trying to recode one of their analyses in SAS. It is a mixed effects model using the attached data (df1.csv). In our experimental design, individual chambers (numbered 1-27) are sited within three levels of topography. An individual chamber only appears in one topography. Each chamber repeatedly measures CO2 at sequential time points (Days). Rainfall is thought to be an important driving variable for CO2.

 

The R code for the model is 

 

lme.mod <- lme(log(co2)~(Days+Topography+Rainfall)^2,random=~Topography|Chamber,
correlation=corARMA(p=1,q=0),na.action=na.exclude,
control = lmeControl(msMaxIter = 200, msMaxEval = 500,sing.tol=1e-20),data=df1)
anova(lme.mod)
Anova(lme.mod)
# initial model with 2 way interactions between all factors/variables and allowing topography-specific variance across chambers (chamber as random effect); for temporal autocorrelation, start with (1,0) corARMA model
# one ANOVA is F test, one is Type 2 Chi-sq; worth doing both to confirm sig. effect

 

which produces the output for fixed effects:

 

                     numDF  denDF    F-value  p-value
(Intercept)              1   4826  313.06357   <.0001
Days                     1   4826    0.66294   0.4156
Topography               2     24  282.28233   <.0001
Rainfall                 1   4826   29.08699   <.0001
Days:Topography          2   4826    0.39228   0.6755
Days:Rainfall            1   4826    1.67336   0.1959
Topography:Rainfall      2   4826    8.91292   0.0001

 

My interpretation of the code is that the model has a random effect of chamber nested within topography, with an autoregressive structure (the measurements are repeated on each chamber at each time point). I want to code the equivalent in SAS; I create the log of the CO2 variable in the data step before I run the model. My code is here:

 

DATA DF;
INFILE FLUX FIRSTOBS= 2 DLM=',' MISSOVER;
INPUT 
OBS DAYS TOPOGRAPHY $ CO2 CHAMBER RAINFALL JUNK;
LOGCO2= LOG(CO2);
RUN;

 

PROC SORT NODUPKEY;
BY CHAMBER DAYS;
RUN;

 

 

PROC MIXED DATA= DF MAXFUNC= 500 MAXITER= 200;
WHERE LOGCO2 NE .;
CLASS CHAMBER DAYS TOPOGRAPHY ;
MODEL LOGCO2= DAYS TOPOGRAPHY RAINFALL
DAYS * TOPOGRAPHY
DAYS * RAINFALL 
TOPOGRAPHY * RAINFALL / SOLUTION RESIDUAL DDFM= BW ;
RANDOM CHAMBER(TOPOGRAPHY) ;
REPEATED DAYS / SUBJECT= CHAMBER*DAYS  TYPE= AR(1) R;
RUN;

 

 

but it produces output very different from the R script:

 

SAS Output

Type 3 Tests of Fixed Effects
Effect               Num DF  Den DF  F Value  Pr > F
DAYS                     12    4489     2.95  0.0004
TOPOGRAPHY                2    4489   218.94  <.0001
RAINFALL                  0       .        .  .
DAYS*TOPOGRAPHY         356    4489     1.54  <.0001
RAINFALL*DAYS             0       .        .  .
RAINFALL*TOPOGRAPHY       0       .        .  .

 

 

Can anyone help me understand (i) why the output is so different? I understand that there are various ways to calculate the denominator df, but why are the numerator df so different? (ii) Why am I not getting a result for the rainfall variable in my Type 3 tests?

 

Thank you in advance

 

 

 

 

1 ACCEPTED SOLUTION

sld
Rhodochrosite | Level 12

R code:

 

 

library(nlme)
library(car)

lme.mod <- lme(log(co2)~(Days+Topography+Rainfall)^2,random=~Topography|Chamber,
               correlation=corARMA(p=1,q=0),na.action=na.exclude,
               control = lmeControl(msMaxIter = 200, msMaxEval = 500,sing.tol=1e-20),data=df1)
summary(lme.mod)
Anova(lme.mod, type=3)

 

SAS code

 

/*  remove DAYS from CLASS */
/*  make copy of DAYS to use in REPEATED */
/*  correct SUBJECT */
/*  heterogeneous variances for TOPOGRAPHY */
data df;
    set df;
    days_x = days;
    run;
PROC MIXED DATA= DF MAXFUNC= 500 MAXITER= 200;
WHERE LOGCO2 NE .;
CLASS CHAMBER TOPOGRAPHY(ref=first) days_x;
MODEL LOGCO2= DAYS TOPOGRAPHY RAINFALL
DAYS * TOPOGRAPHY
DAYS * RAINFALL 
TOPOGRAPHY * RAINFALL / SOLUTION RESIDUAL DDFM= BW ;
RANDOM CHAMBER(TOPOGRAPHY) / group=topography;
REPEATED days_x / SUBJECT= CHAMBER(topography) TYPE= AR(1);
RUN;

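Your two main problems: (1) In your R code, DAYS is numeric; in your original SAS code, DAYS is a factor because it is listed in the CLASS statement, so its effects get one level per distinct day instead of a single slope (hence the large numerator df). (2) You mis-specified SUBJECT in the REPEATED statement: SUBJECT= CHAMBER*DAYS treats every chamber-by-day combination as its own subject, whereas the repeated measurements belong to each chamber.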
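
A quick way to check the first problem before fitting anything is to count the distinct values of each variable, since a variable in CLASS contributes one level per distinct value. This is only a minimal sketch, assuming the DF data set built in the question; the second step simply confirms that each chamber sits in one topography, as the design describes.

```sas
/* Count distinct levels per variable; assumes WORK.DF from the question */
proc freq data=df nlevels;
    tables chamber days topography / noprint;  /* NOPRINT hides the one-way tables, the level counts still print */
run;

/* List the chamber-by-topography combinations that occur in the data */
proc freq data=df;
    tables chamber*topography / list nocum nopercent;
run;
```

If DAYS shows many levels here, leaving it in CLASS asks MIXED for a separate mean per day rather than a linear trend.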

 

There are small differences in variance-covariance estimates between R and SAS (the data do not suggest or support such a complicated covariance structure, and many estimates are set to zero in MIXED), but otherwise the results match well.
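
To see which covariance parameters MIXED sets to zero, one option is to capture the covariance parameter table with ODS OUTPUT and print the zero rows. This is a sketch under assumptions: it reuses the corrected model above, assumes DF already contains days_x, and the output data set name cp is made up for illustration.

```sas
/* Assumes WORK.DF with DAYS_X as created in the DATA step above */
ods output CovParms=cp;   /* cp is a hypothetical name for the captured table */
proc mixed data=df maxfunc=500 maxiter=200;
    where logco2 ne .;
    class chamber topography(ref=first) days_x;
    model logco2 = days topography rainfall
                   days*topography days*rainfall topography*rainfall
                   / ddfm=bw;
    random chamber(topography) / group=topography;
    repeated days_x / subject=chamber(topography) type=ar(1);
run;

/* Show the covariance parameters whose estimate is exactly zero */
proc print data=cp;
    where estimate = 0;
run;
```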

 

This is just an answer to your coding question. I do not guarantee that this statistical model is appropriate for this study design.


4 REPLIES
ninemileshigh
Obsidian | Level 7

UPDATE....

 

I have successfully run the lme model from R in SAS, using PROC IML and the ExportDataSetToR subroutine (this is very cool). But I would still like to know why my mixed model is so different in SAS.
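
For readers who have not used this route, the general shape of the approach is sketched below. It is not the exact code that was run: it assumes SAS/IML is licensed, that the SAS session was started with the RLANG system option, and that R with the nlme package is installed on the same machine.

```sas
/* Requires SAS/IML and a session started with the RLANG option */
proc iml;
    /* Copy WORK.DF into an R data frame named df1 (names taken from this thread) */
    call ExportDataSetToR("work.df", "df1");
    submit / R;
        library(nlme)
        str(df1)   # check how the SAS variable names and types arrived in R
    endsubmit;
quit;
```

The lme() call from the original post can then go inside the SUBMIT block. R is case sensitive, so the variable names in the formula may need adjusting to match what str() reports.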

 

ballardw
Super User

@ninemileshigh wrote:

UPDATE....

 

I have successfully run the lme model from R in SAS, using PROC IML and the call ExportDataSetToR subroutine (this is very cool). But, I would still like to know why my mixed model is so different in SAS..

 


Different algorithms.

What little I have heard about R is that the "data sets" are structured quite a bit differently than SAS data sets, so the code has to differ. With different starting points, such as data structure, options are set differently, and different options are needed to reflect the strengths and weaknesses of the algorithms based on those structures.

 

An exercise (not that it reflects R, just different software):

Create a spreadsheet with about 200 columns and 60,000 rows of data.

Now calculate the mean, max, min, std deviation, range, interquartile range, median, 25th and 75th percentiles, skewness and kurtosis of each of the columns.

Then consider the two lines of Proc Means code that would do the same thing in SAS with a similar data set.
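
For reference, the PROC MEANS step could look roughly like this. SASHELP.CARS is only a stand-in for the wide data set described above; the statistic keywords are standard PROC MEANS keywords.

```sas
/* With no VAR statement, PROC MEANS analyzes every numeric variable */
proc means data=sashelp.cars mean max min std range qrange median p25 p75 skewness kurtosis;
run;
```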


ninemileshigh
Obsidian | Level 7

ok, thank you @sld - I always find having something explained so much more helpful than reading the support pages. My frustration was with not being able to get SAS to run the data. I am also not sure whether this is the most appropriate analysis, but at least now I can work with it in SAS. 

 

 
