How do I translate lme in R to proc mixed?


Posted 10-22-2018 06:06 AM
(4515 views)

I am working on a project with collaborators who use R, and I am trying to recode one of their analyses in SAS. It is a mixed effects model using the attached data (df1.csv). In our experimental design, individual chambers (numbered 1-27) are sited within three levels of topography. An individual chamber appears in only one topography. Each chamber repeatedly measures CO2 at sequential time points (Days). Rainfall is thought to be an important driving variable for CO2.
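As a minimal sketch of how the design described above could be checked before fitting, the following R snippet confirms that each chamber sits in exactly one topography and that each chamber has at most one record per day. The toy data frame and its topography level names are invented for illustration; the column names follow the R model formula, and the real df1.csv may differ.

```r
# Toy stand-in for df1 (hypothetical values and level names, not the real data)
toy <- data.frame(
  Chamber    = rep(1:4, each = 3),
  Topography = rep(c("ridge", "slope", "valley", "valley"), each = 3),
  Days       = rep(c(1, 2, 3), times = 4)
)

# Number of distinct topographies observed for each chamber
topo_per_chamber <- tapply(toy$Topography, toy$Chamber,
                           function(x) length(unique(x)))

# TRUE when every chamber is nested in a single topography
nested_ok <- all(topo_per_chamber == 1)

# TRUE when no chamber-by-day combination is repeated
unique_ok <- anyDuplicated(toy[, c("Chamber", "Days")]) == 0

print(nested_ok)
print(unique_ok)
```

If either check fails on the real data, the grouping structure assumed by the model would need a second look before comparing software output.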

The R code for the model is

```
lme.mod <- lme(log(co2)~(Days+Topography+Rainfall)^2,random=~Topography|Chamber,
correlation=corARMA(p=1,q=0),na.action=na.exclude,
control = lmeControl(msMaxIter = 200, msMaxEval = 500,sing.tol=1e-20),data=df1)
anova(lme.mod)
Anova(lme.mod)
# initial model with 2 way interactions between all factors/variables and allowing topography-specific variance across chambers (chamber as random effect); for temporal autocorrelation, start with (1,0) corARMA model
# one ANOVA is F test, one is Type 2 Chi-sq; worth doing both to confirm sig. effect
```

which produces the output for fixed effects:

| | numDF | denDF | F-value | p-value |
|---|---|---|---|---|
| (Intercept) | 1 | 4826 | 313.06357 | <.0001 |
| Days | 1 | 4826 | 0.66294 | 0.4156 |
| Topography | 2 | 24 | 282.28233 | <.0001 |
| Rainfall | 1 | 4826 | 29.08699 | <.0001 |
| Days:Topography | 2 | 4826 | 0.39228 | 0.6755 |
| Days:Rainfall | 1 | 4826 | 1.67336 | 0.1959 |
| Topography:Rainfall | 2 | 4826 | 8.91292 | 0.0001 |

My interpretation of the code is that the model has a random factor (chamber nested within topography) alongside the fixed effects, with an autoregressive structure (the measurements are repeated on each chamber at each time point). I want to code the equivalent in SAS; I create the log of the CO2 variable in the data step before I run the model. My code is here:

```
DATA DF;
INFILE FLUX FIRSTOBS= 2 DLM=',' MISSOVER;
INPUT
OBS DAYS TOPOGRAPHY $ CO2 CHAMBER RAINFALL JUNK;
LOGCO2= LOG(CO2);
RUN;
PROC SORT NODUPKEY;
BY CHAMBER DAYS;
RUN;
PROC MIXED DATA= DF MAXFUNC= 500 MAXITER= 200;
WHERE LOGCO2 NE .;
CLASS CHAMBER DAYS TOPOGRAPHY ;
MODEL LOGCO2= DAYS TOPOGRAPHY RAINFALL
DAYS * TOPOGRAPHY
DAYS * RAINFALL
TOPOGRAPHY * RAINFALL / SOLUTION RESIDUAL DDFM= BW ;
RANDOM CHAMBER(TOPOGRAPHY) ;
REPEATED DAYS / SUBJECT= CHAMBER*DAYS TYPE= AR(1) R;
RUN;
```

but it produces output very different from the R script:

SAS Output

Type 3 Tests of Fixed Effects

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| DAYS | 12 | 4489 | 2.95 | 0.0004 |
| TOPOGRAPHY | 2 | 4489 | 218.94 | <.0001 |
| RAINFALL | 0 | . | . | . |
| DAYS*TOPOGRAPHY | 356 | 4489 | 1.54 | <.0001 |
| RAINFALL*DAYS | 0 | . | . | . |
| RAINFALL*TOPOGRAPHY | 0 | . | . | . |

Can anyone help me understand **(i)** why the output is so different? I understand that there are various ways to calculate the denominator df, but why are the numerator df so different? And **(ii)** why am I not getting a result for the rainfall variable in my Type 3 tests?

Thank you in advance

Accepted Solution


R code:

```
library(nlme)
library(car)
lme.mod <- lme(log(co2)~(Days+Topography+Rainfall)^2,random=~Topography|Chamber,
correlation=corARMA(p=1,q=0),na.action=na.exclude,
control = lmeControl(msMaxIter = 200, msMaxEval = 500,sing.tol=1e-20),data=df1)
summary(lme.mod)
Anova(lme.mod, type=3)
```

SAS code

```
/* remove DAYS from CLASS */
/* make copy of DAYS to use in REPEATED */
/* correct SUBJECT */
/* heterogeneous variances for TOPOGRAPHY */
data df;
set df;
days_x = days;
run;
PROC MIXED DATA= DF MAXFUNC= 500 MAXITER= 200;
WHERE LOGCO2 NE .;
CLASS CHAMBER TOPOGRAPHY(ref=first) days_x;
MODEL LOGCO2= DAYS TOPOGRAPHY RAINFALL
DAYS * TOPOGRAPHY
DAYS * RAINFALL
TOPOGRAPHY * RAINFALL / SOLUTION RESIDUAL DDFM= BW ;
RANDOM CHAMBER(TOPOGRAPHY) / group=topography;
REPEATED days_x / SUBJECT= CHAMBER(topography) TYPE= AR(1);
RUN;
```

Your two main problems: (1) In your R code, DAYS is numeric; in your original SAS code, DAYS is a factor (it is listed in CLASS). (2) You mis-specified SUBJECT in the REPEATED statement.
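To illustrate the first point, here is a minimal R sketch on made-up toy data (not df1.csv; the level names and rainfall values are invented). It shows that a numeric `Days` contributes a single slope column to the design matrix, while a factor `Days` contributes one indicator column per level beyond the first, which is why the numerator df change. It also shows one plausible reason for the zero-df RAINFALL rows, under the assumption (not confirmed in this thread) that rainfall takes a single value per day: the rainfall column is then a linear combination of the intercept and day indicators and adds nothing to the rank of the design matrix.

```r
# Toy data: 3 days x 6 chambers, 2 chambers per topography (all values hypothetical)
toy <- expand.grid(Days = c(1, 8, 15), Chamber = 1:6)
toy$Topography <- factor(c("low", "mid", "high")[(toy$Chamber - 1) %/% 2 + 1])

# ASSUMPTION: one rainfall value per day, shared by all chambers
toy$Rainfall <- c(0, 12.5, 3.2)[match(toy$Days, c(1, 8, 15))]

# Days as a numeric covariate (as in the R model): a single slope column
X_num <- model.matrix(~ Days + Topography + Rainfall, data = toy)

# Days as a factor (as with CLASS DAYS in SAS): one column per non-reference level
X_fac <- model.matrix(~ factor(Days) + Topography + Rainfall, data = toy)

# Columns versus rank: equal means every term is estimable
cat("numeric Days: columns =", ncol(X_num), " rank =", qr(X_num)$rank, "\n")

# With factor Days, Rainfall is aliased with the day indicators, so rank < columns
cat("factor Days:  columns =", ncol(X_fac), " rank =", qr(X_fac)$rank, "\n")
```

In the toy example the factor version has one more column than its rank, so the rainfall main effect has nothing left to estimate. Whether the same aliasing holds in the real data depends on how rainfall was recorded.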

There are small differences in variance-covariance estimates between R and SAS (the data do not suggest or support such a complicated covariance structure, and many estimates are set to zero in MIXED), but otherwise the results match well.
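On the SUBJECT= point, a small R sketch may help. It uses the standard AR(1) definition, in which the correlation between the i-th and j-th observations within a subject is rho^|i-j|. The helper name `ar1` and the value of rho are hypothetical, chosen only for illustration. With SUBJECT=CHAMBER(TOPOGRAPHY), each chamber forms one block spanning its time points. With SUBJECT=CHAMBER*DAYS, and the data already reduced to one row per chamber and day by the NODUPKEY sort, every block would hold a single observation, so rho would not appear in the covariance matrix at all.

```r
# AR(1) correlation matrix for n equally spaced observations on one subject
# (ar1 is a hypothetical helper, not part of nlme or SAS)
ar1 <- function(n, rho) {
  idx <- seq_len(n)
  rho^abs(outer(idx, idx, "-"))
}

# One chamber measured at 4 time points: off-diagonals decay as rho^lag
block_chamber <- ar1(4, rho = 0.5)
print(block_chamber)

# One chamber-by-day "subject" with a single observation: a 1 x 1 block
block_single_a <- ar1(1, rho = 0.2)
block_single_b <- ar1(1, rho = 0.9)

# The single-observation block is the same whatever rho is,
# so the data carry no information about rho
print(identical(block_single_a, block_single_b))
```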

This is just an answer to your coding question. I do not guarantee that this statistical model is appropriate for this study design.

Replies


**UPDATE**

I have successfully run the lme model from R in SAS, using PROC IML and the ExportDataSetToR subroutine call (this is very cool). But I would still like to know why my mixed model is so different in SAS.


@ninemileshigh wrote:

> UPDATE
>
> I have successfully run the lme model from R in SAS, using PROC IML and the ExportDataSetToR subroutine call (this is very cool). But I would still like to know why my mixed model is so different in SAS.

Different algorithms.

From what little I have heard about R, its "data sets" are structured quite differently from SAS data sets, so the code has to differ. With different starting points, such as data structure, options are set differently, and different options are needed to reflect the strengths and weaknesses of the algorithms built on those structures.

An exercise (not that it reflects R, just different software):

Create a spreadsheet with about 200 columns and 60,000 rows of data.

Now calculate the mean, max, min, std deviation, range, interquartile range, median, 25th and 75th percentiles, skewness and kurtosis of each of the columns.

Then consider the two lines of Proc Means code that would do the same thing in SAS with a similar data set.

