Hi
I am analyzing count data which was collected during transects, in different fields, at different time periods. I want to use a mixed model in which I can have the transects within fields included as a random factor, as in some cases more than one transect is associated with a field. I also want to include the sampling rounds as a repeated measure.

First I tried PROC GLIMMIX, including only the random effect of transect within field. Because my response is a count, the models had much better fit and overdispersion measures when I used dist=poisson. My data contains many zeros, which I think is why dist=negbin worked even better. So it makes sense for me to specify a distribution.

proc glimmix data=xxx;
class round treatment field transect;
model y = round treatment round*treatment / dist=poisson;
random transect(field);
run;

The above model works fine, but I come unstuck when I try to add sampling round as a repeated measure.
GLIMMIX seems to have no REPEATED statement, so I would have to include it as a random effect. When I do this (below), the model runs forever and I have to stop it.

proc glimmix data=xxx;
class round treatment field transect;
model y = round treatment round*treatment / dist=poisson;
random round / type=ar(1) subject=subj;
random transect(field);
run;

If I use PROC MIXED I can add both the RANDOM and REPEATED statements, but I can't specify Poisson or negative binomial errors, so my AIC is really large.
Does anyone have any advice or suggestions on how to include repeated measures as a random effect in PROC GLIMMIX? I wonder whether my PROC GLIMMIX model won't run with the repeated measure in a RANDOM statement because my subject is wrong. At the moment it is 'subj', which is the count recorded from each transect in each field during each time period.

Any advice would be greatly appreciated!
Thank you in advance
Claire
Accepted Solutions
SteveDenham
Jade | Level 19
Hi Claire,

To include repeated measures in GLIMMIX, you need to add one more option to the random statement. Try:

random round / type=ar(1) subject=subj residual;

This will specify round as an R-side effect. Make sure that subject is included in the class statement, or that it is numeric and that the dataset is sorted in subject order (a good idea in any case).

Next question: are your timepoints equally spaced (or nearly equally spaced)? If not, AR(1) is probably not a good choice for the covariance structure, as it assumes the same correlation between every pair of adjacent timepoints. Look into SP(POW) as an alternative.

A note about autoregressive error structures: see Littell et al, J. Anim. Sci. 1998. 76:1216–1231, especially the technical section at the end. The subject in the R-side specification should be included as a random effect, so the between subject error can be correctly modeled. So, I would try the following:

proc glimmix data=xxx;
class round treatment field transect subj;
model y = round treatment round*treatment / dist=poisson;
random round / type=ar(1) subject=subj residual;
random subj transect(field);
run;

Good luck!
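
For unequally spaced rounds, the SP(POW) suggestion could look roughly like the sketch below. It assumes a numeric variable, here hypothetically called days (it does not appear in the original post), that holds the time of each sampling round, and it assumes that the unit measured repeatedly is the transect within a field. Neither assumption is confirmed in the thread, so treat this as a starting point rather than a tested model.

```sas
/* Sketch only: "days" is a hypothetical numeric time variable
   (e.g. days since the first round) and must NOT be in CLASS. */
proc glimmix data=xxx;
  class round treatment field transect;
  model y = round treatment round*treatment / dist=poisson;
  /* R-side spatial power structure within each transect:
     correlation between two rounds = rho ** |days_i - days_j| */
  random _residual_ / type=sp(pow)(days) subject=transect(field);
  /* G-side random intercept for the same subject (between-transect variance) */
  random intercept / subject=transect(field);
run;
```

With equally spaced rounds, SP(POW) on the round number gives the same correlation pattern as AR(1), so the two fits can be compared as a sanity check.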


12 REPLIES
keckk
Fluorite | Level 6
Hi Steve!
I found your comment while searching for an answer to a problem quite similar to Claire's, and I wonder if you might have an answer to my question(s):

I am also analyzing count data (behaviour data) from a study of cows of different genotypes on different farms, with repeated measures: each farm was visited on 3 consecutive days within a period, with 3 periods per farm and approx. 4 weeks between visits (= periods).
For my interval data in this study I have built the following mixed model (my dependent variables are means of measurements within a certain time of day, e.g. 8.00-10.00, 10.00-14.00, 14.00-17.00):

proc mixed data=xxxx;
class genotype animal farm day period;
model y = genotype period / solution;
random farm;
repeated day(period) / subj=animal(farm) type=cs;
run;

First question:
Within one period my timepoints are equally spaced, but because 4 weeks pass before the next three-day repetition (= period) on the same animals within a farm, I seem to have a mixture of equally and unequally spaced timepoints. In Littell et al., SAS for Mixed Models, I found that type=cs is a covariance structure that can be used for both equally and unequally spaced timepoints. UN does not run, and AR(1) does not seem to be the right choice (?), as the subjects (in my case the cows) of one genotype were not really chosen at random to account for effects confounded with genotype (such as lactation stage, age, etc.).

Do you have any advice on an appropriate covariance structure?
By the way, PROC MIXED also doesn't seem to understand the nesting of animal(farm) and day(period) in the REPEATED statement, regardless of how the variables day (numeric or date) and period are coded.

Second question:
As I have many zeros in my count data, I think it is best to fit a zero-inflated Poisson distribution. But PROC GLIMMIX seems to have no option such as dist=zip or a zero-inflated negative binomial (?). Do you have any idea what to do in such a case?
Maybe there is another solution for a zero-inflated distribution with a hierarchical data set.
... This seems to be a similar problem to the one Claire mentioned in her post.

I would appreciate any comment and idea, as for this kind of data structure the problems seem to exceed the possible statistical solutions so far...
Karin
SteveDenham
Jade | Level 19
Wow. What a cool problem.

I don't have any answers, but maybe some observations that might help. It looks like a doubly repeated measures design. You might try a Kronecker product as the covariance structure--something like UN@AR(1). Unfortunately, this isn't available in GLIMMIX. If you are looking at SAS for Mixed Models, there is a section on unequally spaced timepoints (3.5 in the first edition, 5.4 in the second), where they talk about using spatial correlations, so maybe sp(pow)(time) would work. This structure is available in GLIMMIX.

As far as distributions go, before you start looking at zero-inflated processes, first check a negative binomial: it may be that the data are just overdispersed for a Poisson distribution. However, if you are going into ZI processes, search the SAS-L listserv archives for this topic, especially articles by Dale McLerran (stringplayer_2@YAHOO.COM). It will lead you to NLMIXED as a possibility for this sort of data.

Finally, if that seems like overkill, consider MIXED and an appropriate transformation. For counts, I seem to remember that a square root transform is variance stabilizing, or you might try log(counts+1).
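
A minimal sketch of the transformation route, reusing the PROC MIXED model posted earlier in the thread. The data set name counts_t and the variables sqrt_y and log_y are made up for illustration; y is assumed to hold the raw counts.

```sas
/* Create transformed responses (names are illustrative only) */
data counts_t;
  set xxxx;
  sqrt_y = sqrt(y);      /* square root: variance stabilizing for Poisson-like counts */
  log_y  = log(y + 1);   /* log(count + 1): defined when the count is zero */
run;

/* Same mixed model as before, fitted to the transformed response */
proc mixed data=counts_t;
  class genotype animal farm day period;
  model sqrt_y = genotype period / solution;
  random farm;
  repeated day(period) / subject=animal(farm) type=cs;
run;
```

Estimates and least squares means from this model are on the square root scale, so they need to be back-transformed with care before they are reported as counts.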

Good luck.

Steve Denham
keckk
Fluorite | Level 6
Hello Steve,
OK, so I will try a Poisson distribution, but what is the indicator for overdispersion in my count data?
At first I thought a preponderance of zero counts would rule out overdispersion in the non-zero counts, but as I don't seem to be in luck I might have to take both zero-inflation and possibly overdispersion into account...
NLMIXED seems too much of a challenge for now, but just in case, where can I find the SAS-L listserv archives you mentioned?
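
One commonly used indicator is the Pearson chi-square statistic divided by its degrees of freedom, which GLIMMIX prints in the conditional fit statistics when METHOD=LAPLACE (or QUAD) is used; values well above 1 point to overdispersion relative to the assumed distribution. The sketch below is an assumption about how this could be set up for this data, not a tested analysis. R-side structures (the RESIDUAL option) are not available with METHOD=LAPLACE, so the repeated measures are represented here by a random intercept per animal instead.

```sas
/* Poisson fit: inspect "Pearson Chi-Square / DF" in the
   conditional fit statistics table */
proc glimmix data=xxxx method=laplace;
  class genotype farm animal period;
  model x = genotype period genotype*period mean_bg1 / dist=poisson solution;
  random intercept / subject=farm;
  random intercept / subject=animal(farm);
run;

/* Same model with a negative binomial response. Because METHOD=LAPLACE
   works with a true (approximated) likelihood, the AIC of the two fits
   can be compared. */
proc glimmix data=xxxx method=laplace;
  class genotype farm animal period;
  model x = genotype period genotype*period mean_bg1 / dist=negbin solution;
  random intercept / subject=farm;
  random intercept / subject=animal(farm);
run;
```

If the ratio stays well above 1 for the Poisson fit and drops towards 1 with the negative binomial, plain overdispersion is a plausible explanation before zero inflation needs to be considered.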

I am not sure if I have properly converted my model from proc mixed
which so far was (without considering cs as the only covariance structure):
proc mixed data=xxx;
class genotype farm animal period date;
model x =genotype period genotype*period mean_bg1 / solution;
random farm;
repeated date(period)/subject=animal(farm) type=cs;
run;

to proc glimmix:
proc glimmix data=xxxx;
class genotype farm animal period date;
model x =genotype period genotype*period mean_bg1 / dist=negbin solution;
random farm;
random date(period) / subject=animal(farm) type=cs residual;
run;
Do I need to specify a random intercept (for animal)?
Do I also need a subject for the first random statement?
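
On the SUBJECT= question, a hedged sketch: writing the farm effect as a random intercept with SUBJECT=farm fits the same G-side model as "random farm;", and the SUBJECT= form is often processed more efficiently. The compound-symmetry R-side structure already implies a common covariance among the observations on one animal, so an additional random intercept for animal would largely duplicate it. This is offered as an illustration under those assumptions, not as a verified answer for this data set.

```sas
proc glimmix data=xxxx;
  class genotype farm animal period date;
  model x = genotype period genotype*period mean_bg1 / dist=negbin solution;
  /* same model as "random farm;", written with SUBJECT= */
  random intercept / subject=farm;
  /* R-side compound symmetry among repeated measures on each animal */
  random date(period) / subject=animal(farm) type=cs residual;
run;
```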

Again, any advice would be very welcome,
thank you,
Karin
SteveDenham
Jade | Level 19
SAS-L archives.

Probably the fastest way is to go to Google Groups and search the group

comp.soft-sys.sas

The listserv is hosted by the University of Georgia, so a Google search on those terms should lead to the actual archives.

good luck,

Steve Denham
keckk
Fluorite | Level 6
Hello Steve and everybody,

thanks for your hint on the archives.
Does anyone have an idea about how I have set up my mixed model in GLIMMIX? Is it correct as written, or do I need to define a subject for my random farm statement as well, and/or a random intercept for the variable animal?

Thanks very much,
Karin
Kui
Calcite | Level 5
I also have similar repeated count data with excess zeros.
If you have 9.2, you may consider PROC GENMOD or PROC COUNTREG.
Unfortunately, 9.1 is my only option, and SAS Technical Support suggested I use NLMIXED to fit a ZIP or ZINB model. For the repeated measures, I need to program a correlation structure explicitly into the covariance matrix of the random effects in the RANDOM statement.
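
A minimal sketch of what a ZIP model with a random effect can look like in NLMIXED. The data set zip_data, the response y, the covariate x and the subject variable animal are hypothetical names, and the starting values are arbitrary. The sketch uses a single random intercept per subject, not the explicit serial correlation structure mentioned above.

```sas
/* NLMIXED expects all records of a subject to be contiguous */
proc sort data=zip_data;
  by animal;
run;

proc nlmixed data=zip_data;
  parms b0=0 b1=0 a0=0 s2u=1;          /* arbitrary starting values */
  /* Poisson part: log link with a random intercept u per subject */
  eta    = b0 + b1*x + u;
  lambda = exp(eta);
  /* zero-inflation part: probability of a structural zero (logit scale) */
  p0 = 1 / (1 + exp(-a0));
  /* ZIP log likelihood */
  if y = 0 then ll = log(p0 + (1 - p0)*exp(-lambda));
  else ll = log(1 - p0) + y*log(lambda) - lambda - lgamma(y + 1);
  model y ~ general(ll);
  random u ~ normal(0, s2u) subject=animal;
run;
```

Covariates can be added to the zero-inflation part by extending the expression inside the logit, in the same way as for eta.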

I am still new to this, so please post an update if you have something to share.
keckk
Fluorite | Level 6
Hi Kui,
as for my data, I was following SteveDenham's advice and started with poisson and negbin distributions in proc glimmix as I wanted to account for the hierarchy in my data AND for the repeated measurements. Fit and residual diagnostics were looking ok.

As far as I know proc genmod works on 9.1 as well, and if you want to try with proc glimmix, you may download glimmix as an add-on for the SAS/Stat product in 9.1. (at support.sas.com). There's a SAS document introducing you to glimmix http://www2.sas.com/proceedings/sugi30/196-30.pdf , which might be helpful.

If you are not specifically interested in the repeated (time) effect (if "repeated measurements" means "repeated over time"), you do not necessarily have to treat them as repetitions and model them as such.

Hm, just a few thoughts ...
Good luck !
Kui
Calcite | Level 5
Good for you and thanks for your suggestions.

I had a reason for using NLMIXED instead of GLIMMIX, but I forgot what it was; probably the repeated measurements.

Recently I switched to the binomial distribution for my data and got stuck at adding the random effects.

Without random effects the model runs through, but with them the optimization cannot be completed.

Kui
Kui
Calcite | Level 5
The reason is that I was told PROC GLIMMIX cannot handle zero-inflated models.
keckk
Fluorite | Level 6
Hm, did you succeed using NLMIXED ?

Concerning your optimization problem with the (negative?) binomial distribution, you might try the METHOD=LAPLACE option in your PROC GLIMMIX statement (if you are talking about a GLIMMIX model; I don't have any experience with NLMIXED).
keckk
Fluorite | Level 6
Hi Claire,
I have quite a similar problem (many zeros in my count data, a hierarchical data structure with random and repeated effects) and I wonder whether you have found a solution for the many zeros in your data. Did you apply a zero-inflated negative binomial or zero-inflated Poisson distribution in PROC GLIMMIX?

Thank you for any advice,
Karin
