I am trying to estimate covariance components from PROC MIXED and having trouble with convergence problems in the REML approach when I use the NOBOUNDS option. Let me explain why I am using NOBOUNDS in case this is the problem,
I am trying to estimate genetic covariances among 12 traits in a group of self-fertilizing plants. There are ~400 plants and 160 genotypes. The easy approach to estimating covariances is to calculate the covariance among genotype means. This approach introduces some bias into covariance estimates, so I want to compare estimates of variance and covariance components. I’ll refer to these approaches as “Among-genotype covariance” and “covariance components” approaches.
To calculate between trait covariance components, I organized my data (for 160 genotypes) like this:
Genotype Trait Value
1 Biomass 1.28
1 Biomass 1.07
1 Biomass 1.97
1 TLength 1070.7
1 TLength 875.14
1 TLength 1547.93
To get the (genetic) covariance component between Biomass and TLength, I used this code:
PROC MIXED DATA=multi56 covtest cl;
WHERE trait IN ("Biomass","TLength")
CLASS genotype trait;
MODEL value=trait;
RANDOM trait/SUBJECT=genotype TYPE=un;
/* The RANDOM statement tells SAS to estimate the 2 x 2 among genotype variance component matrix for the two traits listed in the where statement */
RUN;
The resulting covariance parameter output was:
Covariance Parameter Estimates
Cov Parm Subject Estimate Standard Error Z Value Pr Z Alpha Lower Upper
UN(1,1) Genotype 0 . . . . . .
UN(2,1) Genotype 50450 19630 2.57 0.0102 0.05 11975 88924
UN(2,2) Genotype 316010 90306 3.50 0.0002 0.05 193511 606860
Residual 55618 3437.23 16.18 <.0001 0.05 49451 63021
The covariance estimate of 50450 is waaay bigger than the estimate I got from the covariance among the genotype means for Biomass and TLength (the Among genotype approach) which equals 257.1. SO I figured the 0 variance component above could be the problem. I ran the same PROC MIXED code above only adding the NOBOUND option:
PROC MIXED DATA=multi56 covtest cl NOBOUND ;
Now my covariance component estimates nearly match as I expect them to:
Iteration History
Iteration Evaluations -2 Res Log Like Criterion
0 1 10109.62259781
WARNING: Stopped because of infinite likelihood. |
Covariance Parameter Values
At Last Iteration
Cov Parm Subject Estimate
UN(1,1) genotype -34470
UN(2,1) genotype 249.06
UN(2,2) genotype 188855
Residual 83814
NOTE: 125 observations are not included because of missing values.
WARNING: Stopped because of infinite likelihood.
NOTE: PROCEDURE MIXED used (Total process time):
real time 0.11 seconds
cpu time 0.03 seconds
The covariance component estimate of 249 I got by allowing negative variances is much more reasonable than 50450, considering the estimate from the among genotype covariance was 257. The problem is that the REML approach stopped and gave me warnings. Does this mean I cannot trust this covariance component output?
I thoguth about running PROC MIXED with METHOD=TYPE3, but I can’t do that because I have SUBJECT=genotype TYPE=un as part of my random statement. What is the best way to get the correct between trait covariance components?
How about shifting over to GLIMMIX, and trying the following:
PROC GLIMMIXED DATA=multi56 tech=quad cholesky;
WHERE trait IN ("Biomass","TLength")
CLASS genotype trait;
MODEL value=trait;
RANDOM trait/SUBJECT=genotype TYPE=chol;/* I think that using the Cholesky parameterization would probably be useful here, and in the model statement*/
/* The RANDOM statement tells SAS to estimate the 2 x 2 among genotype variance component matrix for the two traits listed in the where statement */
RUN;
If this runs into the same problems, then I would suggest trying tech=laplace and adding NOBOUND.
Steve Denham
Thank you Steve for the helpful responses. My problem is not solved, but I appreciate your help.
STEVE, (first response):I tried using GLIMMIX using the code below. Glimmix converged on an answer but the covariance estimate was 142 and should be closer to 257. That's a big improvement over the covariance estimate of 50,000 from prox mixed, but not as good as the proc mixed estimate of 249 which I got from including the NOBOUNDS option (which resulted in infinite likelihood). This value of 249 for the biomass TLength covariance is what it should be, but I believe reviewers wont accept the result with an infinite likelihood and only 1 REML iteration.
PROC GLIMMIX DATA=multi56 METHOD= QUAD CHOLESKY;
WHERE trait IN ("Biomass","Tlength"); *converges but still bad estimate 142 for covariance instead of 250;
CLASS genotype trait;
MODEL value=trait;
RANDOM trait/SUBJECT=genotype TYPE=CHOL;
RUN;
When I tried METHOD=Laplace with the NOBOUND option, I got a good estimate of the Biomass, TLength covariance (249) but I got the warning :"The initial estimates did not yield a valid objective function", which puts me in the same place as the initial infinite likelihood.
Read the PROC MIXED documentation section on convergence problems. This section discusses both the problem of an infinite likelihood and the problem when two of the covariance parameters are several orders of magnitude apart, which seems to be true in your example. This section also discusses possible solutions to each of these problems.
Given what 1zmm said, preprocess your data, and divide all of the TLength values by 1000. Then see how things behave. Also, in my previous post it is PROC GLIMMIX not GLIMMIXED.
Steve Denham
Steve2: I can't pre-process my data because I have to compare the covariance components for biomass and Tlength (and actually for a covariane matrix of 13 traits) to another matrix based on covariances among genotype means. Changing the units would keep me from comparing two different matrices. One coworker suggested I add a repeated statement to PROC MIXED which would estimate the within genotype covariance components between traits (the RANDOM statement measures the among genotype covariance components). I am only interested in the among genotype covariance components, not the within genotype, but I tired it anyway:
PROC MIXED DATA=multi56 covtest cl;
WHERE trait IN ("Biomass","TLength");
CLASS genotype trait plant;
MODEL value=trait;
RANDOM trait/SUBJECT=genotype TYPE=un;
REPEATED trait/SUBJECT=plant(genotype) TYPE=un;
RUN;
The covariances came out to be ~191 but still had an infinite likelihood problem(see below) I'm not sure how to proceed. That is, I need to figure out why I have an infinite likelihood when I use the NOBOUNDS option and get the "right" covariance estimate of 249. I need reasonable estimates for between trait among genotype covariance components that don't come with a "you can't trust these answers"-type warning
WARNING: Stopped because of infinite likelihood. |
UN(1,1) | genotype | 0.2871 |
---|---|---|
UN(2,1) | genotype | 191.61 |
UN(2,2) | genotype | 154388 |
UN(1,1) | plant(genotype) | 0.1935 |
UN(2,1) | plant(genotype) | 140.16 |
UN(2,2) | plant(genotype) | 167192 |
Thanks again for your help
Tarek
****************************************
Tarek Elnaccash
Post Doctoral Associate
Stephen Tonsor Laboratory
University of Pittsburgh
1zmm: thank you, I should have looked at the PROC MIXED documentation more closely first. The documentation did suggest resealing TLength so it has a similar variance to Biomass. Unfortunately, this is not an option for me. I estimated a genetic covariance matrix by calculating the covariance among traits using genotype means as my observational unit (about 3 plants per genotype). Reviewers said these estimates are not as good as those using variance components and covariance components. So now I have to produce a covariance component based matrix which can then be compared to a matrix based on the covariance among genotype means. Since the covariance component based matrix is going to be compared to another, I can't rescale my data and still be able to compare. I also tried other documentation suggestions: 1. setting the PROC MIXED option SCORING=5 but still got an infinite likelihoood. 2. Changing the convergence criteria with the ABSOLUTE option didn't help 3. I was going to set the initial likelihood search using PARMS, but I only have good starting estimates for the biomass and TLength variances and their covariance, but I do not have a good starting estimate for the residual variance. When I use random numbers for the initial value of the residual variance, I can produce essentially any value I want in the output for the covariance parameters. Whatever approach I use to get a good estimate for the covariance component between these two traits has to work for all pairs of 13 traits that make up my full covariance matrix.
Try rescaling your original data so that the variables you are trying to estimate have the same order of magnitude to see if the infinite likelihood problem resolves. If this problem does resolve, then re-rescale your resulting variance and covariance estimates using the standard formulas (see, for example, the entry for "Variance" on Wikipedia):
Var(aX) = a*a*Var(X),
and
Cov(aX,bY) = a*b*Cov(X,Y).
For example, if Y is about 1,000 times the magnitude of X, rescale Y by dividing it by 1,000. Estimate the variance-covariance matrix for X and the rescaled Y using SAS's PROC MIXED or PROC GLIMMIX. Then the variance of the original Y would equal 1,000*1,000*Var(Rescaled-Y) = 1,000,000*Var(Rescaled-Y). The covariance between X and the original Y would equal
1*1,000*Cov(X, Rescaled-Y). = 1,000*Cov(X, Rescaled-Y).
Although tedious to perform for 13 traits (169 terms in the variance-covariance matrix), this may help you to solve your problem.
I have to echo 1zmm regarding rescaling. You can always back calculate once the model converges. At this point, I am absolutely convinced it is the only way to avoid the infinite likelihood trap.
Steve Denham
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.