BookmarkSubscribeRSS Feed
ColinGradstudent
Calcite | Level 5

Hello All,

Sorry for the long discussion title. I've found a couple previous discussions on this warning, but none of the solutions proposed seem to fix my problem. So, briefly here is a description of the data I'm trying to analyze, the problem I'm having, and the solutions I've already tried:

Data is from a pollination experiment focusing on 2 species. I had

- 10 plots randomly assigned to 2 main treatments: Unmanipulated, and Removal

- within each plot, 10 randomly selected individuals were monitored

- of the 10 individuals in each plot, 3 were assigned to a pollen supplementation treatment, the other 7 were left unmanipulated.

- I collected seed count data for every flower on every individual as my response variable, and recorded flower position on the flowering stem as an important covariate.

I am treating individuals as subjects for which i have repeated measures (multiple flowers per individual). I have strongly correlated within-individual responses with respect to flower position, so, I have tried to account for those correlated responses with my R-side covariance structure as follows:

proc glimmix;

      class treatment plot poll_supp flowpos_1;

      model nseeds = treatment poll_supp flowpos treatment*poll_supp

treatment*flowpos poll_supp*flowpos treatment*flowpos*poll_supp

/ dist=negbin offset=lntovules

            ddfm=kenwardroger solution;

      random flowpos_1/ subject=ind type=ar(1) residual;

      random plot;

run;

However, no matter how I tweak this code (e.g. dropping the higher order interactions, using different dist=, ddfm=, with or w/o offset=), so long as I include "random flowpos_1/ subject=ind type=ar(1) residual;" it blows up and gives me the Warning: obtaining minimum var. quadratic unbiased estimates as starting values for the covariance parameters failed. At first, I thought it was just a d.f. deficiency, so I dropped interactions, and used a copy variable flowpos_1 to order the observations for the random statement, using flowpos as a continuous variable instead (this is actually a preferable formulation, but I originally tried it to sneak back some d.f.'s). Plus, the dataset has ~407 observations, and fingermath on the number of parameters being estimated checked out ok, so I think I am ok for d.f. My second thought was that this is just a numerical issue because of strong imbalance in the observations at different levels of flowpos (many individuals with 1-10 flowers, very few with 11,12; actually only 2 observations at 12, w/ variance=0)... so I tried subsetting the dataset and dropping the observations at the highest levels of flowpos... but am still having the same issue...

So... My conclusion is that I am writing the code for that random statement incorrectly, but can't find any good examples to follow. If anybody has any advice or suggestions I would appreciate any help I can get. I know there are alternative formulations to account for my within individual correlated responses, but this seems like the formulation that does the best job of staying true to the biology in the experiment, and if I can avoid dropping the R-side covariance structure to account for within-individual correlated responses with respect to flowpos, I'd like to.

thanks again for any help or suggestions.

cheers,

Colin

4 REPLIES 4
SteveDenham
Jade | Level 19

I think this is a data dependent problem, more than a syntax problem.  I will bet that most of your information is at a limited number of flowpos values.  Reparameterizing these to something where the responses are more evenly spaced could help if that is the situation.

I also think you would have better luck not having two versions of the flowpos variable.  Without data to try this on, what about sorting the data set by plot treatment poll_supp ind flowpos?

Then trying to execute:

proc glimmix data=sorteddata initglm scoring=5;

      class treatment plot poll_supp flowpos ind;

      model nseeds = treatment poll_supp flowpos treatment*poll_supp

treatment*flowpos poll_supp*flowpos treatment*flowpos*poll_supp

/ dist=negbin offset=lntovules

            ddfm=kenwardroger solution;

      random flowpos/ subject=ind type=ar(1) residual;

      random plot;

run;

I included ind as a class variable.  The subject=ind assumes that all individuals are uniquely identified, so that an infinite likelihood is avoided. The initglm option should generate starting values based on excluding the random effects, and the scoring=5 will give five iterations of Fisher scoring to get things started in the right direction (hopefully).

Good luck on this one.

Steve Denham

ColinGradstudent
Calcite | Level 5

Hi Steve,

Thanks for taking the time to look this over.  My dataset is ordered by plot-ind-flowpos, and both plots and individuals are uniquely identified. Since treatments were applied to whole plots, and poll_supp applied to whole individuals, sorting by those two variables doesn't change the ordering. Unfortunately, I tried your suggestions (and a few tweaks on them), and still no luck... but now have more data (so to speak) on the problem.

  I tried chopping the data down so I had close to a balanced dataset with respect to flowpos, still the same warning... this, and a conversation with my supervisor led to  thinking that the problem may have to do with my choice of ar(1) for the covariance structure... I did have plants that were missing flowers (e.g. - had flowers 1,2, ,4,5, with flower 3 damaged or missing), and since ar(1) requires equal spacing (as you mentioned), we decided to try using the random statement:

     random _residual_ / subject=ind type=sp(exp)(flowpos_1);

I tried all the previous formulations and options with this new random statement, and still get the same warning message... in fact, there are only two things I can do that will not result in this warning message:

1) do not include flowpos in any way to order observations for estimating covariance parameters. For example, the model will run, and converge if I use:  "random _residual_ / subject=ind type=ar(1) vcorr ;"

However, my vcorr matrix is 405x405 (my total # of observations), and has a strange block-diagonal form, with blocks that correspond to plot rather than ind. This seems strange to me, and I'm not sure what in this random statement is telling SAS to do this. I get the warning if I use:

         "random _residual_ / subject=ind type=sp(exp)(flowpos_1) vcorr ;"

2) If I do not tell SAS to treat this as an R-side effect, it will at least go through optimization iterations (but fail to converge).  The minute I tell SAS to treat this as an R-side effect, I get the same old warning. But this is clearly an R-side covariance issue.  ....

So, I am vexed. Why would incorporating flower position as an index for the covariance structure cause this particular problem? and why does the fact that this is R-side result in failure to obtain minimum variance quadratic unbiased estimates as starting values for the covariance parameters?!?  Hmm...

anyway, thanks again for taking the time to think this through. I appreciate it.  My supervisor will probably take a crack at it on his own time... so while I'm farming out help, if you are interested enough to fiddle with this as well, I'd be happy to provide a copy of the relevant data. At this point I'm committed to figuring this out on principle!

thanks again,

Colin

ColinGradstudent
Calcite | Level 5

EUREKA!  sort of...

I discovered why I was getting 'The Warning':  While MOST of my observations came from flowers on the main flowering stalk, al had a few from axial stems.  Flowers on axial stems were numbered 1,2,3,... just like the flowers on the main stalk.  This led to duplicate flowpos values, which blew up the covariance parameter estimation. As this was my preliminary foray into this analysis, i didn't even include the variable 'axial'...

however, I still have a couple things I don't fully understand.  Even after putting a band-aid over the axial flower issue by restricting my analysis to flowers on the main stalk, I still have a strange vcorr matrix: a  #obs x #obs block diagonal matrix by plot.  Also, the Dimensions output reports "Max Obs per Subject: 403"... I would have thought that this should be the maximum flowpos value: 12.  Is there something I'm not understanding about how the R-side covariance matrix is constructed?

thanks again,

Colin

SteveDenham
Jade | Level 19

Hi Colin,

A couple thoughts--I think GLIMMIX defaults to the G matrix when counting subjects.  Not really sure, but that is what it looks like.  For repeated measures in time, we often use AR+RE models (autoregressive plus random effect), which gives the correct count of subjects.

These look like:

 

proc glimmix data=for_Stats1 ic=pq;

by param ;

class sex grp_no studyday anml_nbr ;

model value = grp_no

sex

studyday

grp_no*sex

sex*studyday

grp_no*studyday

grp_no*sex*studyday

cov/ ddfm=kr(firstorder);

random studyday /residual  type=&covtype subject= anml_nbr ;*group=grp_no;

%if "&covtype" = "AR(1)" OR "&covtype" = "ARH(1)" %then %do;

random intercept/subject=anml_nbr;

%end;

ods output fitstatistics = &outdata._a convergencestatus = &outdata.status ;

run;

You may need something similar to the second random statement to get the subject number where you want it.

My other thought is: Why AR(1) as a structure?  I would really go with the spatial covariances like sp(exp) or sp(pow), as you indicated above, given that the indexing reflects some sort of measurement. These are a lot better at handling missing values (my opinion only).  The other alternative might be a heterogeneous compound symmetry (CSH), as both AR(1) and the spatial covariances really depend on the indexing.  If things are unevenly spaced in reality, but indexed as 1, 2, 3, ..., you are making an assumption about the correlations that might be misleading.

Good luck.

Steve Denham

Message was edited by: Steve Denham

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 5393 views
  • 3 likes
  • 2 in conversation