BookmarkSubscribeRSS Feed
mgerritsen
Calcite | Level 5
Hi all,
I am trying to model the following data in SAS using proc glimmix and i would like feedback if i have modelled my data correctly.
My data consist of individual chickens (ID) who are grouped by treatment in the enclosure they occupy (chickens receiving different treatments are kept in the same enclosure). I want to know if treatment influences if they appear at popholes present in their enclosure (registered, in which '1' describes an event). Furthermore if two weather variables (Factor1, Factor2 , averages per day) may influence this appearance. Repeated measurements (measured once a day) have been taken of the same individual chickens over a course of days (Date).
In total about 480 chickens subdivided in 6 treatments have been followed over a period of 26 days.
I have attempted the following model but i am unsure if it is correct since I am new to using both proc glimmix and sas.

proc glimmix data=chicken;

class date id treatment;

model registered (event='1') = treatment|factor1 |factor 2 / link=logit dist=binomial solution;

random intercept/ subject=id(treatment);

random date/subject= ID(treatment) residual;

Lsmeans treatment/pdiff;

run ;

I have tried to model the repeated nature of the experiment by using random date/subject= ID(treatment) residual; and this model gives an output which states that only factor2 and factor1*factor2 have a significant effect (type 3 test). The outcome feels right as individual chickens are treated as individual subjects with  the correct number of observations per subject and the model is not overdispersed.
G-side Cov. Parameters1
R-side Cov. Parameters1
Columns in X28
Columns in Z per Subject1
Subjects (Blocks in V)485
Max Obs per Subject26
However I beleive i should include date and id also in the model statement. However, If i do this using the most simple model

model registered (event='1') = treatment factor1  factor 2 id date / link=logit dist=binomial solution; sas gives the following indication:

WARNING: Obtaining minimum variance quadratic unbiased estimates as starting values for the  covariance parameters failed.

If i exclude date from the statement the model will not converge. When excluding id from the model, the model converges however as date takes all the credit for effect per day  factor1 and factor 2 are not provided with further output.

I have also attempted the same models while changing dist= from binomial to binary, this time the model did not give the warning but neither did it converge, not even when pconv=1e-4.   Excluding id had the same effect as in the binomial model,   date takes all the credit for effect per day  factor1 and factor 2 are not provided with further output. While excluding date results in a model which will not converge even if pconv=1e-4.
* Edit: In hindsight i suspect the warning to be due to the order in which the indepedent variables are placed in the model. When i modelled

model registered (event='1') = treatment factor1  factor 2  id date / link=logit dist=binomial solution; I got the warning again but not when

model registered (event='1') = treatment date factor1  factor 2  id / link=logit dist=binomial solution; *

Hopefully someone is willing to shed some light on my model.
Mariëlle

I made an edit * in regards to the warning sas provided Message was edited by: Marielle Gerritsen

21 REPLIES 21
SteveDenham
Jade | Level 19

Hi Marielle,

I would try the following:

proc glimmix data=chicken;

class date id treatment;

model registered (event='1') = treatment|date factor1 factor2 / link=logit dist=binomial solution;

random intercept/ subject=id(treatment);

random date/subject= ID(treatment) residual;

Lsmeans treatment/diff;

run ;

This will give the marginal means of the treatment, averaged over all dates, estimated at the mean values of factor1 and factor2.  This may or may not address all of your scientific issues, and you should realize that marginal means of binomials fit with R side covariance structures will tend to be biased toward the grand mean.

Here are some things to consider:  How many dates do you have, are they equally spaced in time, and are all ID's observed on the same dates?  It is likely that some sort of autoregressive error structure should be considered.  Are their any conceivable interactions between the weather factors and the treatment?  I deleted these, but your analysis points out a significant factor1*factor2 effect.  Are these two factors on a similar scale?  The interaction you find may depend on this.

Steve Denham

mgerritsen
Calcite | Level 5

Hi Steve,
Thank you for your time in looking at my question. I have tried to use the model you proposed, unfortunately the model refuses to converge.  I have tried increasing the number of iterations up to 200 and lowering the convergence criterion to 1e-4, also a combination ot the two does not result in a converged model. Also sorting by date, registered and ID did not result in convergence.

What did work was introducing a new continues numeric variable which represents date (datenum) in the model but not the repated statement, using the following model:

proc glimmix data=chicken;

class date id treatment;

model registered (event='1') = treatment|datenum factor1|factor2 / link=logit dist=binomial solution;

random intercept/ subject=id(treatment);

random date/subject= ID(treatment) residual;

Lsmeans treatment/diff;

run ;

     For modeling the R-side effect a variable has to be identified as a class variable thus is introducing a continues variable into the modelstatement as a representation a correct solution? I doubt it is because i suspect i remove the repeated nature of the data by using datenum instead of date. Regarding modelling the repeated measures using the R-side. For glimmix it was the only method i could clearly find online and i am not aware of any other methods of doing this, do you know of any other methods?

     In regard to you question about the dates and the autoregressive structure: I followed chickens for 26 days continouesly with not all id's being observed on the same days. If chickens were not registred  the dataset automatically defaults to setting an observation of the chicken not being registred.I have attempted to work with an ar(1) autoregressivestructure before however i find that the model will often not converge.

Surprisingly when i set type=ar(1) in both the model you proposed and the one depicted above the models would only converge when i set pconv=1e-4, other combinations of pconv and inititer had no effect. However, in the model you proposed factor1 and factor2 had no output while they had an output using datenum instead of date.

I stumbled upon changing the convergence criterion somewhere but am uncertain what levels are stil viewed as correct in modelling? 

In regard to the interactions between variables, the model was indeed overspecified. I have removed the interaction between treatment and the weatherfactors but did include an interaction between them, as they respectively represent windspeed/direction and rainfall.

Mariëlle

SteveDenham
Jade | Level 19

Marielle,

Have you tried:

proc glimmix data=chicken method=laplace;

class datenum id treatment;

nloptions maxiter=5000 tech=nmsimp;

model registered (event='1') = treatment|datenum factor1|factor2 / link=logit dist=binary solution;

random intercept / subject=id(treatment);

random datenum / subject=id(treatment) type=ar(1);

Lsmeans treatment/diff;

run ;

Shifting to a binary distribution from binomial may help, as I suspect all of your responses are zero or one.  I threw in the change to Nelder-Mead simplex as an optimaiztion technique, which requires a LOT of iterations but is not as touchy as some other techniques.  I also went to datenum because I know the data are correctly sorted on that (sometimes actual dates are messy).  With uneven spacing per individual, but even spacing throughout the study, an AR(1) should work, so I threw that in too.  Probably too many changes at once Smiley Sad but maybe this will help.

Steve Denham

mgerritsen
Calcite | Level 5

Hi Steve,

Thank you for the input. I'm only wondering should random datenum / subject=id(treatment) type=ar(1); not be random datenum / subject=id(treatment) type=ar(1) residual; to model the repeated nature of the data?

Mariëlle

SteveDenham
Jade | Level 19

No.  Since the distribution is binary/binomial, the mean and variance are functionally related.  By changing to quasi-likelihood estimation (rather than pseudo-likelihood), the results are conditional on the random effects, rather than marginals averaged over the random effects.  See Stroup's Generalized Linear Mixed Models for an extensive discussion on this issue.  For very large multi-site clinical trials, this approach has been shown to be not much better than the pseudo-likelihood estimation, but for smaller studies, it provides a more unbiased result.

Steve Denham

mgerritsen
Calcite | Level 5

Hi Steve,

I have attempted to use your model however after running the model for 8 hours sas provided that the 5000 iterations specified were not  sufficient to converge the model. Would you suggest increasing maxiter?

On another hand, a colleque suggested it might better if I use REML to model my data. However, if I am correct in sas REML are performed using the mixed procedure which cannot cope with binary data? He tried this analysis in GENSTAT and in his specified model he only used ID and date as random variables not ID(row). Furthermore he did not use date in the model statement because it would nullify the specification of date as a random factor in the overall model. Is glimmix sensitive to the same limitation of the random statement or is glimmix specifically able to handle random factors included in the modelstatement because it combines the mixed and linear regression models? In most glimmix models i have encountered date or time set as a random variable was included in the model statement,  but it vexes me why it gives so much trouble in my model.

Furthermore, he suggested I might use percentages of chickens registred per day per treatment instead of the actual numbers. Might this simplify the calculations in either a glimmix or REML because you have fewer datapoints per day?

SteveDenham
Jade | Level 19

The suggestion to condense to a proportion should certainly help.  You could do that with the events/trials syntax.

I reiterate my comment above:

See Stroup's Generalized Linear Mixed Models for an extensive discussion on this issue (method of fitting).  Personally, I'm a strong believer in the conditional approach, rather than the marginal REML.

As far as maxiter, for NMSIMP, I have gone up to 10,000.  After that I begin to worry about nonlinearity/sensitive dependence on initial conditions/machine precision swamping the actual convergence.  I really think going to the events/trials syntax should help in this area--it opens up other techniques as possibilities.

So, if expressed as proportions, do you have any possibility of quasi-separation (either all or none of the birds registered on some days on one treatment, and the opposite on a different treatment)?

Steve Denham

mgerritsen
Calcite | Level 5

I have now created a table containing the variables datenum (26 in total)), treatment (1 to 6) count (the total amount of hens registred at the pophole per treatment), total (the total amount of hens per treatment). Also included are Percent (the respective proportions of the countdata), factor1 and factor2.

All of my data is similar to the values depicted in the table below so i have no cases in which the proportion (Percent) is either 0 or 1 for one treatment and the opposite for another, thus both binary values 1 or 0 occur for every treatment on every day measurements were taken, so neither is there complete or quasicomplete-seperation.


datenum


treatment


COUNT


PERCENT


total


Factor1


Factor2


1


1


50


0.626880642


80


-0.504551372


-0.756606836


1


2


48


0.601805416


81


-0.504551372


-0.756606836


1


3


39


0.488966901


81


-0.504551372


-0.756606836


1


4


49


0.614343029


81


-0.504551372


-0.756606836


1


5


43


0.539117352


81


-0.504551372


-0.756606836


1


6


43


0.539117352


81


-0.504551372


-0.756606836


I have attempted to adjust the model you described previously and used it on the data as shown above:

proc glimmix data=hen method=laplace;

  class datenum treatment;

  model count/total = datenum|treatment factor1 |factor2 / link=logit dist=binomial solution;

  random intercept/ subject=treatment;

  random datenum/ subject=treatmenttype=ar(1);

  Lsmeans treatment/pdiff;

  run ;

Though the model converges I get no significance values (a model is made but df's are 0) and the sas log provides the following commentary:

proc
glimmix data=hen method=laplace;

  1105  class datenum treatment;

  1106  model count/total = datenum|treatment factor1|factor2 / link=logit dist=binomial solution;;

  1107  random intercept/ subject=treatment;

  1108  random datenum/ subject=treatment type=ar(1);

  1109  Lsmeans treatment /pdiff;

  1110  run ;

   NOTE:
Writing HTML Body file: sashtml68.htm

NOTE:
Convergence criterion (ABSGCONV=0.00001) satisfied.

NOTE:
Estimated G matrix is not positive definite.

Have i interpreted the events/trials option in the right way?

SteveDenham
Jade | Level 19

It's because the treatment is both random and fixed.  Remove the first random statement (the one with intercept in it).  It is absorbing all of the degrees of freedom, I think.

Steve Denham

mgerritsen
Calcite | Level 5

After adjusting sas still does not provide df's or significance value's and the log says the following

proc glimmix data=hen method=laplace;

  class datenum treatment;

  model count/total = datenum|treatment factor1 |factor2 / link=logit dist=binomial solution;

  random datenum/ subject=treatment type=ar(1);

  Lsmeans treatment/pdiff;

  run ;

NOTE: Writing HTML Body file: sashtml28.htm

NOTE: Convergence criterion (ABSGCONV=0.00001) satisfied.

NOTE: Estimated G matrix is not positive definite.

NOTE: PROCEDURE GLIMMIX used (Total process time):

      real time           9.20 seconds

      cpu time            8.72 seconds

SteveDenham
Jade | Level 19

Mmm, maybe we have the wrong subject for the random statment.  Been a long time since I worked on a poultry study, but I would assume from your description that there are multiple pens per treatment.  If so, then summarize the proportions by pen, and change to:

proc glimmix data=hen method=laplace;

  class datenum treatment pen;

  model count/total = datenum|treatment factor1|factor2 / link=logit dist=binomial solution;

  random datenum/ subject=pen type=ar(1);

  Lsmeans treatment/pdiff;

  run ;

If not (treatment is completely confounded with pen), then the random statement is still exhausting all of the degrees of freedom, and consequently datenum will have to be removed from the model statement (something I really don't like doing, as it is a design factor):

proc glimmix data=hen method=laplace;

  class datenum treatment;

  model count/total = treatment factor1|factor2 / link=logit dist=binomial solution;

  random datenum/ subject=treatment type=ar(1);

  Lsmeans treatment/pdiff;

  run ;

Hope this works.

Steve Denham

mgerritsen
Calcite | Level 5

Hi Steve,

Unfortunately you have now hit on the weakness I feared at the beginning of the study. Unfortunately the current set-up of the study was already applied before I began working with the data. Because it is currently a trial study all the chickens, irregardless of treatment, are kept in the same pen on the same farm. Though i have replicates in time there are no replicates in treatment dispersed over seperate pens, only a multitude of individuals occupying the same pen. This probably also clarifies why my earlier attempts at modelling gave so many problems when i included date (datenum) in the model.

I have tried to use the second model you proposed and indeed I got values for treatment. However, something is still eating away the DF for the co-variates factor1 and factor2. With your explanation regarding confoundment of treatment and pen could it happen because for each treatment there is complete overlap between datenum and weatherfactors, since all chickens are kept in the same pen?

Since my colleque said he got results in genstat using REML i am considering to use this method. Though it cannot use datenum as a variable within the model statement i should be able to get an indication of its effect. My colleque indicated that using data from the table 'estimated variance components' the random variable is estimated to have a significant influence when the componentvalue/ s.e. >1.

Since random variables cannot be used in the model statement and i believe proc mixed cannot handle binary data I expect it to look something like:

proc mixed data=hen;

class datenum treatment;

  model count= treatment Factor1|factor2/ solution;

  repeated datenum/ subject= treatment type=ar(1);

  lsmeans treatment/diff;

  run;

In this case do you assume it would be better to use count data (I could not find the oppertunity to apply dist=poisson in proc mixed so i doubt using count data) or percent (count/total)?

For now I wish you a very good weekend,

Mariëlle

SteveDenham
Jade | Level 19

OK--one pen, all treatments included, and consequently factor1 and factor2 are what are completely confounded with datenum.  Still doable, and I see why the REML approach is appealing.

The sample data puts all of the proportions near 0.5.  If all of the observed proportions are in the 0.2 to 0.8 range, you will get acceptable results analyzing the proportions using the PROC MIXED approach, with the dependent variable as a proportion rather than a count.  As long as the range on the dependent value holds, then the known relation between the mean and variance won't be violated very badly.

The other alternative is to use RSPL--the equivalent of REML for non-normally distributed data.  It provides marginal estunates, but it also allows you to use df adjustment methods that should help handle the confounding of the covariates with datenum.  In this case, it would be:

proc glimmix data=hen;

  class datenum treatment;

  model count/total = treatment factor1|factor2 / link=logit dist=binomial solution ddfm=kr2;

  random datenum/residual subject=treatment type=ar(1);

  Lsmeans treatment/pdiff;

  run ;

Give it a shot.

And if that doesn't work, maybe considering factor1 and factor2 as spatially related random factors rather than as fixed covariates would be an interesting approach.

Steve Denham

mgerritsen
Calcite | Level 5

Hi Steve,

to give you a short updat for now. The mixed model as i proposed it in my last response did not work. When i changed subject=treatment to subject=datenum in the repeated statement it did fortunately. I got nearly identical results to those produced by my collegue in Genstat. The only thing that bothers me a little, though i don't know if it is valid, are the dimensions expected from the model.  I would like to see the subjects and observations per subject in the dimensions table to reflect 6 subjects (treatment) with 26 observations (datenum) instead i obtain 26 subjects with 6 observations per subject. This is to be expected when datenum is set as subject but it feels a bit counter-intuitive.

Furthermore i also attempted to use your RLSP model. My first thought was that itresembles the model i proposed in my first post. Furhtermore, I'm not sure if this is because I use genstat 9.2 but the program does not recognise ddfm=kr2, it assumes a typo  and per default it used ddfm=kr instead. Also, sas provides the following warning

WARNING: Pseudo-likelihood update fails in outer iteration 1.

Literature suggested using laplace or quadrature to circumvent this problem, however due to the residual statement this cannot work. Otherwise implementing the PARM statement was suggested to set the starting values, in order to select the correct covariates it was suggested to use the variables presented in 'Covariance Parameter Estimates” table. However i'm unsure how to decide which initial values to set.

this is the table belonging to your proposed model

AR(1)treatment0.9995.
Residual 193126.

Considering factor1 and factor2 as spatially related random factors would just be achieved by placing them in a random statement (random factor1 factor2) i presume?

Marielle

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 2866 views
  • 0 likes
  • 2 in conversation