Solved: How do I translate lme in R to proc mixed?

ninemileshigh · Posted 10-22-2018 06:06 AM

I am working on a project with collaborators who use R, and I am trying to recode one of their analyses in SAS. It is a mixed effects model using the attached data (df1.csv). In our experimental design, individual chambers (numbered 1-27) are sited within three levels of topography. An individual chamber only appears in one topography. Each chamber repeatedly measures CO2 at sequential time points (Days). Rainfall is thought to be an important driving variable for CO2.

The R code for the model is

lme.mod <- lme(log(co2)~(Days+Topography+Rainfall)^2,random=~Topography|Chamber,
correlation=corARMA(p=1,q=0),na.action=na.exclude,
control = lmeControl(msMaxIter = 200, msMaxEval = 500,sing.tol=1e-20),data=df1)
anova(lme.mod)
Anova(lme.mod)
# initial model with 2 way interactions between all factors/variables and allowing topography-specific variance across chambers (chamber as random effect); for temporal autocorrelation, start with (1,0) corARMA model
# one ANOVA is F test, one is Type 2 Chi-sq; worth doing both to confirm sig. effect

which produces the output for fixed effects:

                    numDF denDF   F-value p-value
(Intercept)             1  4826 313.06357  <.0001
Days                    1  4826   0.66294  0.4156
Topography              2    24 282.28233  <.0001
Rainfall                1  4826  29.08699  <.0001
Days:Topography         2  4826   0.39228  0.6755
Days:Rainfall           1  4826   1.67336  0.1959
Topography:Rainfall     2  4826   8.91292  0.0001

My interpretation of the code is that the model has a random effect of chamber nested within topography, with an autoregressive structure (the measurements are repeated on each chamber at each time point). I want to code the equivalent in SAS; I create the log of the CO2 variable in the data step before I run the model. My code is here:

DATA DF;
INFILE FLUX FIRSTOBS= 2 DLM=',' MISSOVER;
INPUT
OBS DAYS TOPOGRAPHY $ CO2 CHAMBER RAINFALL JUNK;
LOGCO2= LOG(CO2);
RUN;

PROC SORT NODUPKEY;
BY CHAMBER DAYS;
RUN;

PROC MIXED DATA= DF MAXFUNC= 500 MAXITER= 200;
WHERE LOGCO2 NE .;
CLASS CHAMBER DAYS TOPOGRAPHY ;
MODEL LOGCO2= DAYS TOPOGRAPHY RAINFALL
DAYS * TOPOGRAPHY
DAYS * RAINFALL
TOPOGRAPHY * RAINFALL / SOLUTION RESIDUAL DDFM= BW ;
RANDOM CHAMBER(TOPOGRAPHY) ;
REPEATED DAYS / SUBJECT= CHAMBER*DAYS TYPE= AR(1) R;
RUN;

but it produces output very different from the R script:

SAS Output

Type 3 Tests of Fixed Effects
Effect	Num DF	Den DF	F Value	Pr > F
DAYS	12	4489	2.95	0.0004
TOPOGRAPHY	2	4489	218.94	<.0001
RAINFALL	0	.	.	.
DAYS*TOPOGRAPHY	356	4489	1.54	<.0001
RAINFALL*DAYS	0	.	.	.
RAINFALL*TOPOGRAPHY	0	.	.	.

Can anyone help me understand (i) why the output is so different? I understand that there are various ways to calculate the denominator df, but why are the numerator df so different? And (ii) why am I not getting a result for the rainfall variable in my Type 3 tests?

Thank you in advance

sld · Posted 10-24-2018 11:05 AM

R code:

library(nlme)
library(car)

lme.mod <- lme(log(co2)~(Days+Topography+Rainfall)^2,random=~Topography|Chamber,
               correlation=corARMA(p=1,q=0),na.action=na.exclude,
               control = lmeControl(msMaxIter = 200, msMaxEval = 500,sing.tol=1e-20),data=df1)
summary(lme.mod)
Anova(lme.mod, type=3)

SAS code

/*  remove DAYS from CLASS */
/*  make copy of DAYS to use in REPEATED */
/*  correct SUBJECT */
/*  heterogeneous variances for TOPOGRAPHY */
data df;
    set df;
    days_x = days;
    run;
PROC MIXED DATA= DF MAXFUNC= 500 MAXITER= 200;
WHERE LOGCO2 NE .;
CLASS CHAMBER TOPOGRAPHY(ref=first) days_x;
MODEL LOGCO2= DAYS TOPOGRAPHY RAINFALL
DAYS * TOPOGRAPHY
DAYS * RAINFALL 
TOPOGRAPHY * RAINFALL / SOLUTION RESIDUAL DDFM= BW ;
RANDOM CHAMBER(TOPOGRAPHY) / group=topography;
REPEATED days_x / SUBJECT= CHAMBER(topography) TYPE= AR(1);
RUN;

Your two main problems: (1) In your R code, DAYS is numeric; in your original SAS code, DAYS is a factor because it appears in the CLASS statement. (2) You mis-specified SUBJECT in the REPEATED statement.
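Both problems can be illustrated with a short R sketch on made-up toy data (not df1.csv). The data frame `toy`, its values, and in particular the assumption that rainfall takes a single value per sampling day are illustrative only; the thread does not confirm how rainfall varies. Under that assumption, the sketch also suggests why the RAINFALL rows might come out empty once DAYS is a CLASS variable.

```r
# Hypothetical toy data (NOT the poster's df1.csv): 2 chambers x 4 sampling days,
# one row per chamber-day, as PROC SORT NODUPKEY BY CHAMBER DAYS would leave it.
toy <- expand.grid(Days = c(1, 5, 9, 14), Chamber = c(1, 2))

# ASSUMPTION: rainfall is a single value per day, shared by all chambers.
toy$Rainfall <- c(0, 12, 3, 7)[match(toy$Days, c(1, 5, 9, 14))]

# (1) Numeric Days adds one slope column to the design matrix;
#     Days as a factor adds one dummy column per level beyond the first.
X_num <- model.matrix(~ Days, data = toy)
X_fac <- model.matrix(~ factor(Days), data = toy)
ncol(X_num) - 1   # 1 column  -> 1 numerator df for Days
ncol(X_fac) - 1   # 3 columns -> (number of distinct days - 1) numerator df

# Under the assumption above, Rainfall is fully determined by the Days factor,
# so adding it does not raise the rank of the design matrix.
X_both <- model.matrix(~ factor(Days) + Rainfall, data = toy)
qr(X_both)$rank == qr(X_fac)$rank   # TRUE: nothing left for Rainfall to explain

# (2) Size of each REPEATED "subject" block under the two SUBJECT= choices.
rows_per_chamber_day <- table(toy$Chamber, toy$Days)  # like SUBJECT=CHAMBER*DAYS
rows_per_chamber     <- table(toy$Chamber)            # like SUBJECT=CHAMBER(TOPOGRAPHY)
all(rows_per_chamber_day == 1)   # TRUE: one row per block, nothing to correlate
rows_per_chamber                 # 4 rows per chamber: a real time series
```

With one observation per chamber-day block there are no within-subject pairs for AR(1) to act on, whereas grouping by chamber gives the same blocks that lme uses when the correlation structure inherits the Chamber grouping from the random statement.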

There are small differences in variance-covariance estimates between R and SAS (the data do not suggest or support such a complicated covariance structure, and many estimates are set to zero in MIXED), but otherwise the results match well.

This is just an answer to your coding question. I do not guarantee that this statistical model is appropriate for this study design.


ninemileshigh · Posted 10-22-2018 01:46 PM

UPDATE....

I have successfully run the lme model from R in SAS, using PROC IML and the call ExportDataSetToR subroutine (this is very cool). But I would still like to know why my mixed model is so different in SAS.

ballardw · Posted 10-22-2018 06:35 PM

@ninemileshigh wrote:

UPDATE....

I have successfully run the lme model from R in SAS, using PROC IML and the call ExportDataSetToR subroutine (this is very cool). But I would still like to know why my mixed model is so different in SAS.

Different algorithms.

What little I have heard about R is that the "data sets" are structured quite a bit differently than SAS data sets. So the code has to differ. With different starting points, such as data structure, then options are set differently and different options are needed to reflect the strengths and weaknesses of the algorithms based on the structures.

An exercise (it does not reflect R specifically, just different software):

Create a spreadsheet with about 200 columns and 60,000 rows of data.

Now calculate the mean, max, min, std deviation, range, interquartile range, median, 25th and 75th percentiles, skewness and kurtosis of each of the columns.

Then consider the two lines of Proc Means code that would do the same thing in SAS with a similar data set.

ninemileshigh · Posted 10-24-2018 11:15 AM

ok, thank you @sld - I always find having something explained so much more helpful than reading the support pages. My frustration was with not being able to get SAS to run the data. I am also not sure whether this is the most appropriate analysis, but at least now I can work with it in SAS.
