BookmarkSubscribeRSS Feed
Tarek_Elnaccash
Calcite | Level 5

I am trying to estimate covariance components from PROC MIXED and having trouble with convergence problems in the REML approach when I use the NOBOUNDS option. Let me explain why I am using NOBOUNDS in case this is the problem,

I am trying to estimate genetic covariances among 12 traits in a group of self-fertilizing plants.  There are ~400 plants and 160 genotypes.  The easy approach to estimating covariances is to calculate the covariance among genotype means.  This approach introduces some bias into covariance estimates, so I want to compare estimates of variance and covariance components.  I’ll refer to these approaches as “Among-genotype covariance” and “covariance components” approaches.

To calculate between trait covariance components, I organized my data (for 160 genotypes) like this:

Genotype         Trait                 Value

1                      Biomass           1.28

1                      Biomass           1.07

1                      Biomass           1.97

1                      TLength           1070.7

1                      TLength           875.14

1                      TLength           1547.93

To get the (genetic) covariance component between Biomass and TLength, I used this code:

PROC MIXED DATA=multi56 covtest cl;

      WHERE trait IN ("Biomass","TLength")

      CLASS genotype trait;

      MODEL value=trait;

      RANDOM trait/SUBJECT=genotype TYPE=un;

      /* The RANDOM statement tells SAS to estimate the 2 x 2 among genotype variance component matrix for the two traits listed in the where statement */

      RUN;

The resulting covariance parameter output was:


Covariance Parameter Estimates

Cov Parm         Subject Estimate       Standard Error Z Value   Pr Z   Alpha   Lower  Upper

UN(1,1)           Genotype         0          .           .           .           .           .           .

UN(2,1)           Genotype         50450       19630       2.57     0.0102 0.05     11975  88924    

UN(2,2)           Genotype         316010      90306       3.50     0.0002 0.05     193511   606860

Residual                                55618       3437.23     16.18   <.0001 0.05     49451  63021

The covariance estimate of 50450 is waaay bigger than the estimate I got from the covariance among the genotype means for Biomass and TLength (the Among genotype approach) which equals 257.1.  SO I figured the 0 variance component above could be the problem.  I ran the same PROC MIXED code above only adding the NOBOUND option:

PROC MIXED DATA=multi56 covtest cl NOBOUND ;

Now my covariance component estimates nearly match as I expect them to:

Iteration History

Iteration           Evaluations      -2 Res Log Like  Criterion

0                    1                    10109.62259781                  


WARNING: Stopped because of infinite likelihood.

Covariance Parameter Values

At Last Iteration

Cov Parm         Subject   Estimate

UN(1,1)           genotype  -34470

UN(2,1)           genotype   249.06

UN(2,2)           genotype  188855

Residual                         83814

NOTE: 125 observations are not included because of missing values.

WARNING: Stopped because of infinite likelihood.

NOTE: PROCEDURE MIXED used (Total process time):

real time           0.11 seconds

      cpu time            0.03 seconds

The covariance component estimate of 249 I got by allowing negative variances is much more reasonable than 50450, considering the estimate from the among genotype covariance was 257.  The problem is that the REML approach stopped and gave me warnings.  Does this mean I cannot trust this covariance component output?

I thoguth about running PROC MIXED with METHOD=TYPE3, but I can’t do that because I have SUBJECT=genotype TYPE=un as part of my random statement.  What is the best way to get the correct between trait covariance components?

8 REPLIES 8
SteveDenham
Jade | Level 19

How about shifting over to GLIMMIX, and trying the following:

PROC GLIMMIXED DATA=multi56 tech=quad cholesky;

      WHERE trait IN ("Biomass","TLength")

      CLASS genotype trait;

      MODEL value=trait;

      RANDOM trait/SUBJECT=genotype TYPE=chol;/* I think that using the Cholesky parameterization would probably be useful here, and in the model statement*/

      /* The RANDOM statement tells SAS to estimate the 2 x 2 among genotype variance component matrix for the two traits listed in the where statement */

      RUN;

If this runs into the same problems, then I would suggest trying tech=laplace and adding NOBOUND.

Steve Denham

Tarek_Elnaccash
Calcite | Level 5

Thank you Steve for the helpful responses. My problem is not solved, but I appreciate your help.

STEVE, (first response):I tried using GLIMMIX using the code below. Glimmix converged on an answer but the covariance estimate was 142 and should be closer to 257.  That's a big improvement over the covariance estimate of 50,000 from prox mixed, but not as good as the proc mixed estimate of 249 which I got from including the NOBOUNDS option (which resulted in infinite likelihood). This value of 249 for the biomass TLength covariance is what it should be, but I believe reviewers wont accept the result with an infinite likelihood and only 1 REML iteration.

PROC GLIMMIX DATA=multi56 METHOD=  QUAD CHOLESKY;

  WHERE trait IN ("Biomass","Tlength");  *converges but still bad estimate 142 for covariance instead of 250;

  CLASS genotype trait;

  MODEL value=trait;

  RANDOM trait/SUBJECT=genotype TYPE=CHOL;

  RUN;

When I tried METHOD=Laplace with the NOBOUND option, I got a good estimate of the Biomass, TLength covariance (249) but I got the warning :"The initial estimates did not yield a valid objective function", which puts me in the same place as the initial infinite likelihood.

1zmm
Quartz | Level 8

Read the PROC MIXED documentation section on convergence problems.  This section discusses both the problem of an infinite likelihood and the problem when two of the covariance parameters are several orders of magnitude apart, which seems to be true in your example.  This section also discusses possible solutions to each of these problems.

SteveDenham
Jade | Level 19

Given what 1zmm said, preprocess your data, and divide all of the TLength values by 1000.  Then see how things behave.  Also, in my previous post it is PROC GLIMMIX not GLIMMIXED.

Steve Denham

Tarek_Elnaccash
Calcite | Level 5

Steve2: I can't pre-process my data because I have to compare the covariance components for biomass and Tlength (and actually for a covariane matrix of 13 traits) to another matrix based on covariances among genotype means.  Changing the units would keep me from comparing two different matrices.  One coworker suggested I add a repeated statement to PROC MIXED which would estimate the within genotype covariance components between traits (the RANDOM statement measures the among genotype covariance components). I am only interested in the among genotype covariance components, not the within genotype, but I tired it anyway:

PROC MIXED DATA=multi56 covtest cl;

  WHERE trait IN ("Biomass","TLength");

  CLASS genotype trait plant;

  MODEL value=trait;

  RANDOM trait/SUBJECT=genotype TYPE=un;

  REPEATED trait/SUBJECT=plant(genotype) TYPE=un;

  RUN;

The covariances came out to be ~191 but still had an infinite likelihood problem(see below) I'm not sure how to proceed. That is, I need to figure out why I have an infinite likelihood when I use the NOBOUNDS option and get the "right" covariance estimate of 249. I need reasonable estimates for between trait among genotype covariance components that don't come with a "you can't trust these answers"-type warning

WARNING: Stopped because of infinite likelihood.

     

UN(1,1)genotype0.2871
UN(2,1)genotype191.61
UN(2,2)genotype154388
UN(1,1)plant(genotype)0.1935
UN(2,1)plant(genotype)140.16
UN(2,2)plant(genotype)167192

Thanks again for your help

Tarek


****************************************

Tarek Elnaccash

Post Doctoral Associate

Stephen Tonsor Laboratory

University of Pittsburgh

Tarek.Elnaccash@gmail.com

Tarek_Elnaccash
Calcite | Level 5

1zmm: thank you, I should have looked at the PROC MIXED documentation more closely first.  The documentation did suggest resealing TLength so it has a similar variance to Biomass. Unfortunately, this is not an option for me.  I estimated a genetic covariance matrix by calculating the covariance among traits using genotype means as my observational unit (about 3 plants per genotype).  Reviewers said these estimates are not as good as those using variance components and covariance components. So now I have to produce a covariance component based matrix which can then be compared to a matrix based on the covariance among genotype means.  Since the covariance component based matrix is going to be compared to another, I can't rescale my data and still be able to compare.  I also tried other documentation suggestions: 1. setting the PROC MIXED option SCORING=5 but still got an infinite likelihoood. 2.  Changing the convergence criteria with the ABSOLUTE option didn't help 3. I was going to set the initial likelihood search using PARMS, but I only have good starting estimates for the biomass and TLength variances and their covariance, but I do not have a good starting estimate for the residual variance.  When I use random numbers for the initial value of the  residual variance, I can produce essentially any value I want in the output for the covariance parameters. Whatever approach I use to get a good estimate for the covariance component between these two traits has to work for all pairs of 13 traits that make up my full covariance matrix. 

1zmm
Quartz | Level 8

Try rescaling your original data so that the variables you are trying to estimate have the same order of magnitude to see if the infinite likelihood problem resolves.  If this problem does resolve, then re-rescale your resulting variance and covariance estimates using the standard formulas (see, for example, the entry for "Variance" on Wikipedia):

   Var(aX) = a*a*Var(X),

and

   Cov(aX,bY) = a*b*Cov(X,Y).

For example, if Y is about 1,000 times the magnitude of X, rescale Y by dividing it by 1,000.  Estimate the variance-covariance matrix for X and the rescaled Y using SAS's PROC MIXED or PROC GLIMMIX.  Then the variance of the original Y would equal 1,000*1,000*Var(Rescaled-Y) = 1,000,000*Var(Rescaled-Y).  The covariance between X and the original Y would equal

1*1,000*Cov(X, Rescaled-Y). = 1,000*Cov(X, Rescaled-Y).

Although tedious to perform for 13 traits (169 terms in the variance-covariance matrix), this may help you to solve your problem.

SteveDenham
Jade | Level 19

I have to echo 1zmm regarding rescaling.  You can always back calculate once the model converges.  At this point, I am absolutely convinced it is the only way to avoid the infinite likelihood trap.

Steve Denham

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 9644 views
  • 6 likes
  • 3 in conversation