turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Analytics
- /
- Stat Procs
- /
- PROC MIXED - multivariate response + random effect

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-01-2017 01:12 AM - edited 02-07-2017 01:25 AM

I'm looking for some help specifying a model using PROC MIXED.

My dataset consists of individuals (variable = 'id') from 13 populations (variable = "pop"). I am attempting to model varation in 3 different response variables: y1, y2, and y3. I was able to rearrange my dataset into the 'stacked' format, with a single line for each response trait, coded by a new categorical variable called "var".

An example of my dataset:

data tall; input pop$ id$ var$ y; cards; AF 1 y1 1.11186 AF 1 y2 1.10318 AF 1 y3 1.03751 AF 2 y1 1.05796 AF 2 y2 1.07508 AF 2 y3 1.0011 AF 3 y1 1.16453 AF 3 y2 1.02108 AF 3 y3 0.93194 AF 4 y1 1.09352 AF 4 y2 0.83194 AF 4 y3 0.99382 AF 5 y1 1.12513 AF 5 y2 1.14387 AF 5 y3 1.01929 ...

The goal is to estimate the variance-covariance matrix for the population term ('pop'), so I've included it in the RANDOM statement. But I'm having trouble understanding how to utilize the RANDOM and REPEATED commands in this context. I have no fixed terms/covariates in the model. This is what I've been trying so far...

```
proc mixed data=tall method=reml covtest asycov;
class var pop;
model y= /noint notest;
random pop /type=UN;
repeated var /subject=id type=VC;
```

Thanks in advance for any suggestions.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-31-2017 05:50 PM - edited 02-01-2017 01:13 AM

moved to analytics community...

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-31-2017 10:17 PM

You might want to move this topic to the analytics community.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-01-2017 08:52 AM

Try the following to see if the results make sense:

```
proc mixed data=tall method=reml covtest asycov;
class var pop;
model y=var /noint;
random pop;;
repeated var /subject=id type=unr;
```

Here I would consider the variable to be repeated measures on the subject, with an unstructured latent correlation matrix. I would continue to consider population as a random effect, but not try to model a structure between populations. I would think that the populations represent a random sample from possible populations, and would lead to a single variance component.

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-01-2017 11:48 AM

Thanks for the suggestion Steve. However, when I ran the command you suggested it returned the following error:

NOTE: An infinite likelihood is assumed in iteration 0 because of a nonpositive definite estimated R matrix for Subject 1.

And I think I understand what you're saying about between-population variance. But the main goal of the analysis is to estimate the variance-covariance matrix of the 3 response variables across the populations.

i.e.,

y1 y2 y3 y1 variance(y1) covariance(y1,y2) covariance(y1,y3) y2 covariance(y2,y1) variance(y2) covariance(y2,y3) y3 covariance(y3,y1) covariance(y3,y2) variance(y3)

Any suggestions on how to generate this would be greatly appreciated.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-06-2017 09:44 AM

Well, once we get the infinite likelihood problem solved, the variance-covariance estimates should be included in the output.

Is there any chance that the Id variable is repeated across populations? This is a common cause of non-unique records. If so, replace the subject=id in the REPREATED statement with subject=id*pop.

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-06-2017 12:03 PM - edited 02-06-2017 12:38 PM

@SteveDenham, following your lead (which looks like a good direction to me), what do you think about something like this:

proc mixed data=tall method=reml covtest asycov; class var pop id; model y=var ; random var / subject=pop type=unr; repeated var / subject=id(pop) type=unr; run;

It's a bit manic.

Does the covariance structure of the entire model make sense? I'm not sure.

I added *id* to the class statement which I think will help with the estimation problem.

I dropped noint from the model statement but doubt that matters.

Would there be any hope of convergence with only 13 levels of *pop* (edit: and possibly few levels of *id* within *pop*) and so many covariance parameter estimates?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-06-2017 12:58 PM

@sld, I don't much like that RANDOM statement. You are right about too many parameters (78) to be estimated. Simpler, I think, to view the populations as a random sample from possible populations, and estimate a single variance component (At least to start.)

I think the next logical step might be to look at this with each variable (of y1, y2 and y3) having its own estimate for variability, so how about:

random intercept / subject=pop group=var;

This would help guarantee that the diagonal in @kpboh's requested matrix had different values.

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-06-2017 01:01 PM

Steve,

Thanks for the follow up. I appreciate your help working through this.

And the ids are indeed unique, both within AND across populations.

On a related note--I presume 'id' should be input as a categorical variable, no? I ask because when I submitted the code you suggested above (with "subject=id") it returned:

ERROR: Variable id should be either numeric or in CLASSES list.

When 'id' is added to the class list, it returns the infinite likelihood error...

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-06-2017 01:22 PM

Shoot, I hoped it was as easy as duplicate records.

In the data sample you provided, is Subject 1 the complete set:

```
AF 1 y1 1.11186
AF 1 y2 1.10318
AF 1 y3 1.03751
```

or is there more someplace else? What I am starting to be afraid of is some sort of linear dependency.

If this is typical for all subjects (3 observations, each indexed by a y variable), it really isn't a lot different from a split-plot in time as far as analysis goes. You might consider recoding y1, y2, y3 to integer values, and sorting with a nodupkey option to see if the output dataset has the same number of observations.

**See the SGF paper by Kiernan et al. http://support.sas.com/resources/papers/proceedings12/332-2012.pdf. There is a specific section on page 10 that deals with this NOTE. It only appears when there is some sort of duplicate record, or misspecification of the subject so that it leads to duplicate records.**

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-06-2017 11:44 PM

Hi Steve @SteveDenham and @kpboh,

Thoughts:

(1) Regarding the number of parameters: 78 seems too high. I was thinking it would be 6 for the first random, 6 for the second random, and another 3 for the *var* means. But 15 is still a lot, and maybe I'm not counting correctly. I thought about taking this problem for a spin with simulated data, but I haven't had the time.

(2) kpboh, as Steve suggests, you can check the data structure. I use the TABULATE procedure a lot for this task:

proc tabulate data=tall; class pop id var; table pop*id, var; run;

I think you would hope to see "1" in each cell.

(3) kpboh, in your original message, your example dataset was "pops", but the dataset used in GLIMMIX was "tall". Are these two datasets equivalent?

(4) kpboh, it's not clear whether you ran a model with both

`class var pop id;`

and

`repeated var /subject=id(pop) type=unr;`

and still had lack of convergence. You might have, it's just not clear.

(5) If my model suggestion is not feasible and if you want covariances at the *pop* level, could you average the y1, y2, and y3 values over the *id*s within each *pop*? Do you have the same number of *id*s in each *pop*? (Does it even make sense to average over *id*s, in context or statistically?) How else can we get at covariances at the *pop* level....? (There's the rub.)

proc mixed data=tall_MEANS method=reml covtest asycov; class var pop; model y=var ; repeated var / subject=pop type=unr; run;

Susan

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-07-2017 01:26 AM - edited 02-07-2017 01:30 AM

Dear @sld,

Thanks very much for the suggestions. Following are my responses:

2) The TABULATE procedure is new to me and very handy indeed. Thanks! And yes--it shows a "1" in all cells of the dataset.

(3) Sorry for the confusion. The original dataset was "pops" but I had converted it to the "tall" format (i.e., 3 entries per individual). Looks like I made an error in the copy/paste. I corrected the dataset name in the original post.

(4) Apologies for not reporting back, but yes--I ran the model you had suggested:

```
proc mixed data=tall method=reml covtest asycov;
class var pop id;
model y=var;
random var/subject=pop type=unr;
repeated var /subject=id(pop) type=unr;
run;
quit;
```

Unfortunately, this returned the "infinite likelihood" error mentioned above.

(5) And yes, it had occurred to me that I could simply generate populations means for the three response variable, then simply estimate the variance-covariance matrix from those means (and indeed, this is common practice for this particular type of analysis). Others have shown similar results doing this versus the random effect variance component approach... My hope was to have both to be able to compare.

And to flesh out the dataset a bit more:

1. there are no missing data

2. the dataset is unbalanced--the number of 'ids' per population ranges from n=10 to n=85 (mean is 39.3 individuals per populations)

Finally, I should mention that I had been working off of this document which discusses multivariate analysis using PROC MIXED. Unfortunately, none of the examples therein included both a RANDOM and a REPEATED statement.

Thanks again,

kpboh

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-07-2017 06:25 PM

(2) The documentation for MIXED | Details | Computational Issues | Convergence Problems includes this statement:

"If PROC MIXED stops because of an infinite likelihood, recheck your model to make sure that no observations from the same subject are producing identical rows in or and that you have enough data to estimate the particular covariance structure you have selected."

Your TABULATE results show that the problem is not due to coding (observations from the same subject producing identical rows in or ), which means that it is probably due to not enough data. Adjusting parameters that control the estimation process (covered in the documentation and the Kiernan et al. paper that @SteveDenham recommended) might help, or not.

So....what next? Other than give up. I'd follow Steve's lead and try simpler covariance structures, building up to more complex, and see how close you can get to what you want.

A split-plot-in-time-like model that specifies 3 sources of variance (in addition to residual variance) with default covariance structures

/*RandomStmt1*/ random intercept / subject=pop;

/*RandomStmt2*/ random intercept / subject=id(pop);

/*RandomStmt3*/ random var / subject=pop;

Try 1: RandomStmt1 and RandomStmt2 (i.e., drop RandomStmt3)

Try 2: RandomStmt1 and RandomStmt2 and RandomStmt3

Try 3: Steve's suggestion: RandomStmt1 with group=var

Try 4: RandomStmt1 with type=un or unr or fa0(3) (see http://comp.soft-sys.sas.narkive.com/RqRMBFgu/difference-between-un-vs-unr-type-option-in-random-sta...)

Try 5: RandomStmt2 with default covariance structure, and RandomStmt3 with type=un or unr or fa0(3)

Try other combinations that you think are appropriate, depending on whether simpler Trys converge.

Clearly, I haven't attempted these, and so if one looks wrong, it might be wrong.

None of these Trys are doing anything fancy with residual variance. Should they? You don't appear to be interested in covariance structure for residual variance--your focus is on covariance structure for pop variance--and maybe you can ignore residual variance to some extent, although I'm not sure that's statistically valid.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-08-2017 10:56 AM

The 78 comes from 13*12/2 for an unstructured matrix.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-08-2017 04:07 PM - edited 02-09-2017 02:02 AM

On the number of parameters topic, I disagree. I'm happier when I have a data set to test, so I dummied one up (* very badly*; if I had time, I'd pull down my copy of Rick Wicklin's