GLIMMIX oddity - df and solution dependent on location of term in the ...

SteveDenham · Posted 05-21-2020 02:27 PM

In a repeated measures analysis of covariance, with the whole plot being a latin square and the repeated measurements on the subject per day and treatment (grp_no), we have been using the following code:

proc glimmix data=one(where=(param=700)) hessian IC=q ;
   nloptions maxiter=500;
   by segment param sex;
   class grp_no period day animnum;
   model value=COV grp_no  day animnum period grp_no*period period*animnum  period*day / solution ddfm=bw ;
   random intercept / subject = animnum(day grp_no);
   random period / type = AR(1) subject = animnum(day grp_no) residual;
   run;

Now for some param classes in a new dataset (we do separate analyses by param class), this "works" while for others (param=500 for example) the log has the warningr "Obtaining minimum variance quadratic unbiased estimates as starting values for the covariance parameters failed." To get around this we tried moving the continuous variable COV to the end of the MODEL statement like so

model value= grp_no  day animnum period grp_no*period period*animnum  period*day COV/ solution ddfm=bw ;

Well, that got rid of the warning.

Here is the second oddity, under both MODEL statements, the Type3 table of F tests has 0 df for the numerator, missing df for the denominator and missing F value. However, under the original MODEL statement (COV at beginning) there is an estimate, std error and probability reported in the solution vector for the param's that run. When COV is at the end though, the solution vector has a zero as the estimate. In fact, it looks like all of the other "set to zero" solution values.

So, I come before you to gather any ideas on what might be going on here. Is it a data sort issue? Is it a design issue? The code for this has been validated for this design but with only a single animal per day - treatment combination. The data here has 2 animals per day - treatment combination.

Oh yeah, it works just fine if you remove the covariate COV, but the whole point of the analysis is to adjust for pre-treatment differences between animals. This is all in SAS/STAT 14.1 (TS1M3). Data is attached.

SteveDenham

STAT_Kathleen · Posted 05-21-2020 05:26 PM

Steve,

I tested using the GLIMMIX code shown below using SAS 9.4TS1M6 with SAS/STAT 15.1 and was not able to replicate the warning message with COV as the first variable in the MODEL. However, based upon your description it sounds to me as if there could be a singularity, confounding issue associated with data and model. I suspect the COV variable is confounded either with another variable or levels of another variable. I added the htype=1,3 option to the MODEL statement and examined the NUM DF and noticed that the NUM DF differ between the TYPE I and TYPE 3. If the degrees of freedom for a particular effect differ between the Type I and Type III tests, then there might be a confounding problem in the model. In my experience where the variables where the NUM DF differ tend to be involved in the confounded problem.

proc glimmix data=work.for_stats_segment1(where=(param=700)) ;
nloptions maxiter=500;
by segment param sex;
class grp_no period day animnum;
model value=grp_no day animnum period grp_no*period period*animnum period*day COV /
ddfm=bw htype=1,3;
random intercept / subject = animnum(day grp_no);
random period / type = AR(1) subject = animnum(day grp_no) residual;
run;

SteveDenham · Posted 05-22-2020 11:24 AM

When in doubt about mixed models, there is always a great source of information - SAS for Mixed Models (whichever edition you happen to have). We have a bunch of 2nd editions laying around, and on pages 101 to 102, the design used in this study is discussed (split-plot with whole plot as a Latin square). The key is to get the row and column variables (animnum and day) into the RANDOM statement correctly. Here is what worked:

proc glimmix data=for_stats4(where=(param=500 )) hessian IC=q;
   nloptions maxiter=500;
   by segment param sex;
   class grp_no period day  animnum;
   model value=cov grp_no period grp_no*period  /*period*DAY period*animnum*/   /solution htype=(1 2 3) ddfm=bw;
   *random animnum day animnum*day;
   random intercept day/subject=animnum;
   random period / type = AR(1) subject = animnum*day residual;
   				lsmeans grp_no*period / 	slicediff=(period)
										adjust=simulate(cvadjust)  
										stepdown
										simpledifftype=control
										adjdfe=row;

				lsmeans grp_no / 	adjust=simulate(seed=1 nsamp=1000000) 
									diff=control
									stepdown
									adjdfe=row;
ods output 	lsmeans=_lsmeandat_4 tests1= _effectsdat1_4 tests3=_effectsdat3_4 slicediffs=_slicedat_4 diffs=_diffdat_4;
   run;

Thanks to everyone who helped out on this - @STAT_Kathleen , @lvm and @sld .

SteveDenham

SteveDenham · Posted 05-22-2020 11:33 AM

And to the authors: Littell, Milliken, Stroup, Wolfinger and Schabenberger. I wish there was a way to mark the book as a solution.

SteveDenham

lvm · Posted 05-21-2020 06:25 PM

It is an interesting problem. I did not get your error message with GLIMMIX, but I did duplicate your problem in the output. I tried this with PROC MIXED (changing the last statement to REPEATED) and got the message (with one of the orders for the covariate, can't remember which):

WARNING: Stopped because of infinite likelihood

This typically means with repeated measures that you have more than one observation per time/subject combination, which is not allowed as you know. Quoting from the great Kiernan et al. SASGF (2012) article:

"PROC MIXED requires that you have one observation per repeated effect per subject. This note occurs when your PROC MIXED step includes a REPEATED statement and the input data set has multiple observations at a given time point for that particular subject."

You said you had two observations per subject/time level. But I am surprised that you did not get this error message with GLIMMIX. I think you need to define animal as the subject, or is that what is happening with your animnum(day grp_no) subject? Other things can give this error as well.

I also think that Kathleen identified the other (maybe the first) problem. When I look at frequencies of the levels, using

proc freq data=one(where=(param=700));
table grp_no*day*animnum / nopercents norow nocol;
run;

I see mostly zeroes, suggesting a possible coding issue with grp_no. But I don't really know your data, so I can't say.

I hope this gives you some ideas.

lvm · Posted 05-21-2020 08:47 PM

This sure is an interesting challenge. To follow on to the previous comment by Kathleen, it appears that cov and animnum are confounded in this dataset. Although cov is a continuous variable it only takes on 8 values, a unique one for each of the 8 levels of animnum. Easy to see with:


proc freq data=one(where=(param=700));
	table animnum*cov / nopercents norow nocol;
run;

I became suspicious when I looked at the solution table. The last level of animnum was 'estimated' as 0, as it must be with the matrix sweep of the overparameterized model. However, the next-to-last level is also 0 (with no SE). Suggests that animnum is confounded with something else.

SteveDenham · Posted 05-22-2020 11:11 AM

Thanks for the hint, but I don't know how to get around it. The design is such that each animal has a unique pretreatment value that we wish to use as a covariate. Consequently, there is complete confounding. However, this never seems to be a problem when whole plots are just blocks, or there are no blocks. Off to SAS for Mixed Models, 2nd ed. pages 101-102, where not surprisingly, this exact design (without a covariate) is covered.

SteveDenham

lvm · Posted 05-22-2020 05:35 PM

With these data, the only way to get around it is to drop cov (for this model and data). I did some more playing around, which I'll explain here. You can see the impact of the confounding by focusing on the Type I tests, and putting cov and animnum as the first two variables in the model, that way you can focus better on the implications of the link between these two, ignoring everything else. If you use cov and animnum in that order, you get very reasonable results for Type I tests. cov is significant with F(1,19)=66.4 (slope = 0.6), and animnum is significant with F(6,19)=4.13. The continuous cov variable provides some information on the response, but not all of the information since cov is capturing just the linear effect. Any nonlinear effect is left over, which is then captured by the animnum factor. I think the df_N is 6 and not 7 because some of the effect of animnum factor is already captured by cov (one more level is 0 in the solution).

Now, if you put animnum and cov as the first two variables (in that order), things are very different. animnum is significant with F(7,19)=13.03, but cov has no effect, with F(0,.)=missing (slope = 0). The entire effect of animnum/cov is already captured by the first term in the model.

You can even look at this in a different way, although this is just for demonstration purposes. Define cov as a factor. Then fitting the model with cov factor (and no animnum) gives the exact same fit (same -2LL) as the model with animnum factor (an no cov factor). Trying to put both in the model as factors blows things up real good.

It was interesting to think about all of this.

SteveDenham · Posted 05-26-2020 08:49 AM

Thanks @lvm . The answer is somewhere above this, but I wanted to acknowledge the effort you put into this. As you pointed out, cov is completely confounded with animnum, and as a result, when the model presented was used, things fell apart. So there were two possible outs, either get rid of the pretreatment covariate or change the model. Since we adjust for the pretreatment value with other designs, including a non-replicated Latin square, the optimum path was to change the model to an appropriate form for a whole plot Latin square with a split-plot in time. As I mention, that design is spelled out explicitly in SAS for Mixed Models, and we are in the process of incorporating it in place of the code that was giving us fits.

SteveDenham

lvm · Posted 05-26-2020 05:06 PM

I know you found a different model that will work. I just thought I would share with other interested readers the things you can learn from Type 1 testing, since Type 1 testing is not done often enough.

SteveDenham · Posted 05-27-2020 08:33 AM

That is a great point @lvm . I think it is one of the main reasons Frank Harrell left SAS for other software. I really appreciate you and @STAT_Kathleen going through everything and pointing out what happens with complex designs. And also want to acknowledge Milliken and Johnson's Analysis of Messy Data, vol.3 Analysis of Covariance for additional thoughts.

SteveDenham

GLIMMIX oddity - df and solution dependent on location of term in the MODEL statement

Re: GLIMMIX oddity - df and solution dependent on location of term in the MODEL statement

Re: GLIMMIX oddity - df and solution dependent on location of term in the MODEL statement

Re: GLIMMIX oddity - df and solution dependent on location of term in the MODEL statement

Re: GLIMMIX oddity - df and solution dependent on location of term in the MODEL statement

Re: GLIMMIX oddity - df and solution dependent on location of term in the MODEL statement

Re: GLIMMIX oddity - df and solution dependent on location of term in the MODEL statement

Re: GLIMMIX oddity - df and solution dependent on location of term in the MODEL statement

Re: GLIMMIX oddity - df and solution dependent on location of term in the MODEL statement

Re: GLIMMIX oddity - df and solution dependent on location of term in the MODEL statement

Re: GLIMMIX oddity - df and solution dependent on location of term in the MODEL statement

SAS Innovate 2025: Call for Content