BookmarkSubscribeRSS Feed
keckk
Fluorite | Level 6
hello all!

I am trying to model the relationship between the response variable (respiration rate mean_rr) of two genotypes dependent on the thermal environment (expressed as HLI). The animals were observed/measured at different locations (farm), naturally at varying thermal conditions. Animals of both genotypess have been on each farm.
Instead of using repeated measures I tried with a random coefficient regression model:

proc mixed data=x
class animal farm genotype;
model mean_rr = genotype HLI genotype*HLI /solution htype=1;
random farm;
random int HLI / subject=animal solution;
run;

I expect the slope to depend not only on the genotype, but also to vary among the animals of one genotype and between farms (different weather conditions).
So I tried to "translate" this as:

proc mixed data=x
class animal farm genotype;
model mean_rr = genotype HLI genotype*HLI /solution htype=1;
random farm;
random int genotype*HLI / subject=animal solution; run;

or even:

proc mixed data=x
class animal farm genotype;
model mean_rr = genotype HLI genotype*HLI /solution htype=1;
random farm;
random int HLI gentoype*HLI / subject=animal solution; run;

---> with the former I don't geht estimates for the random farm and random intercept and DF = .
and the latter does not converge.

My plots clearly display genotype effects, but the two converging models only reflect this with the htype=1 option. The default (Type 3) does not compute significant effects as one would expect looking at the graphs.
Is it correct to use the Typ I hypothesis tests ?

Would anyone help me to find out how to correctly test the difference between the regression slopes ? Which one of the above models might get me on the right track ?

Thanks very much for any help,
Karen
15 REPLIES 15
SteveDenham
Jade | Level 19
I'm looking at the description of the data, and I wonder if just some minor changes might help. Since obviously each animal is not seen on each farm, the subject effect should be OK, unless for some unknown reason the ID for animal is not unique to the farm.

Second, I think leaving out one term may be affecting what is happening--something involving farm by HLI. The models I see here don't have this effect, and it may be what you are seeing in the plots. Of course, this may be a matter of complete confounding, and is covered by the farm effect Have you tried:

proc mixed data=x
class animal farm genotype;
model mean_rr = genotype HLI genotype*HLI /solution htype=1;
random intercept/subject=farm;
random int HLI gentoype*HLI / subject=animal(farm) solution; run;

I certainly see nothing wrong with the type 1 tests, since most of the work is regression based, but that is just my opinion.

Have you tried this in GLIMMIX, with method=quad? The documentation (at least for 9.2) looks like it has something like this in the example under TECH=.

Good luck.

Steve Denham
lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12
With RC models, one usually needs to fit a model with a nonzero covariance for the slope and intercept among subjects. The model in this post is for variance components, with an implied covariance of 0. Based on SAS for Mixed Models (2nd edition), and other sources, I would add type=UN as option on the second random statement. Trying to fit a model with 0 covariance here, when almost certainly there is nonzero covariance, will make it hard to fit the model. However, from my experience, one could still have convergences issues with so many variance-covariance terms, unless there are a lot of observations.
keckk
Fluorite | Level 6
Thanks very much for your comments.
The animal ID is unique, the animals are NOT numbered within each farm, but across all farms (1 to 26), that's probably why the code you proposed ends up with exactly the same covariance parameter estimates, but no convergence.
Fitting the same model with unstructured covariance, I face convergence problems too.

With the simpler model version I get estimates = 0 for the random farm and random intercept, I may have expressed it unclearly (no estimates)
proc mixed data=x
class animal farm genotype;
model mean_rr = genotype HLI genotype*HLI /solution htype=1;
random farm;
random int gentoype*HLI / subject=animal solution; run;

I tried to look for the example in the 9.2. documentation you mentioned, but was not successful. Did you mean:
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect03... ?

Thanks!
Karen
lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12
Convergence can be a big problem. You have a large covariance matrix (with UN), and unless you have a very large number of observations, convergence could easily be a problem. The example you give form the documentation is good. But you should read chapter 8 in the reference book (which you have to buy or get from your library -- it is not online):
SAS for Mixed Models, 2nd ed. 2006. By R. Littell, G. Milliken, W. Stroup, R. Wolfinger, and O. Schabenberger. Published by SAS Press.

You probably need to simplify the model, at least to start with, and consider genotype (and it's interaction) as fixed only. Then see if you can get a fit with the more typical RC model:
proc mixed data=x
class animal farm genotype;
model mean_rr = genotype HLI genotype*HLI /solution htype=1;
random farm;
random int HLI / subject=animal solution TYPE=UN; run;

If this doesn't work, you probably can't get convergence. You really need a covariance structure with nonzero covariances for the slope and intercept (UN). Convergence can also be related to an inappropriate model (e.g., nonlinear relationship between the response and HLI.
lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12
The more I think about, the more I am convinced that you do not have sufficient number of observations (unique values for each farm, genotype, and HLI) to fit the complex RC models. I can't tell from your original post, but you would need several different HLI (and response) values for each animal and each farm. It also does not make sense to me (the way you explained it) to consider animal to be a subject for genotype (as in your code). A single animal can only be one genotype.
keckk
Fluorite | Level 6
Thanks very much for your interest and helpful comments !

I have read chapter 8 of SAS for mixed models and studied its 3 examples for RC models. I understand that a random intercept and the continuous variable of the fixed effects part is as well a part of the random effects part of the model in order to fit a (random) slope for each subject.
Though, it doesn't answer the problem of having opposing results for the genotype effect dependent on using Type I or Type III tests (only the former reflecting the plots).

Consistent with the principles of RC models, I have multilevel data: measurements/observations have been made on the animal level on several days (replicates), so I consider animal as my experimental unit, the observed animals (of both genotypes) have been part of herds at different farms, animals of the same farm were naturally observed at the same HLI (HLI measured on the farm level): so maybe that's what SteveDenham meant, when mentioning confounding (involving HLI and farm) ?
I am interested in the genotype effect. I expect the slope not only to depend on the genotype (which I thought to express by the random genotype*HLI term), but to vary between farms and between animals of the same genotype which biologically makes sense to me (influence of animal specific effects causing variation between the animals of the same genotype).

I was trying to fit the simpler RC model:
proc mixed data=x
class animal farm genotype;
model mean_rr = genotype HLI genotype*HLI /solution htype=1;
random farm;
random int HLI / subject=animal solution TYPE=UN; run;

but the iteration process stops because of too many likelihood evaluations (whether I use random farm; or random int /subj=farm); there is convergence if I use random int /subj=farm combined with subject=animal(farm) in the second random statement, but the solution for random effects comes up with INFTY in the t-value column for the int and HLI effect.
Disregarding correlation of the random coefficients (default vc model), I have convergence but estimates=0 (DF=., SE=.) for the random intercept and random farm effect as mentioned with the more complex RC in the preceding posts,which is irritating.

Concerning the animal to be the subject, I assumed subject=animal to reflect the model at the animal-level as they are the experimental units and I would like to test the genotype effect on the animal-level.
I agree with you that an animal can only be of one genotype, so I tried to fit the simpler model again with random in HLI / subject=animal(genotype) type=UN or subject=animal(genotype*farm) type=UN, with the same results as above (stopped because of too many likelihood evaluations or INFTY in the random effects table).

Primarily, I was orientating on the following publication (bottom of page 10, RC regression approach): http://www2.sas.com/proceedings/forum2007/178-2007.pdf
where student corresponds to animal, teacher corresponds to farm and ses to genotype; the school level is missing in my data.

I wonder if the problem are the HLI values which are the same for the animals within a farm, or if I do not have sufficient number of observations (3 replicates per animal) ?

Any idea what I can do about that ?
SteveDenham
Jade | Level 19
I am now convinced that it is the confounding of HLI and farm that is leading to the problem. The hard part is separating this out. Have you tried the simpler model:

proc mixed data=x
class animal farm genotype;
model mean_rr = genotype HLI genotype*HLI /solution htype=1;
random int HLI / subject=animal solution TYPE=UN; run;

to at least see what might happen?

Steve Denham
keckk
Fluorite | Level 6
OK, I tried the simpler model without farm, with several response variables.

I do have convergence and estimates for the random intercept and slope (HLI), in one case there's INFTY in the t-value column (solution for random effects) and σa2 variance=0 (UN(1,1)) for the random slope.
With the third response variable, there's no convergence with this model (too many likelihood evaluations).

Using the default variance component model with 0 covariance, I consistently have convergence, but with estimates=0 for the random intercepts.

I considered to use subject=farm as in example 8.3. in chapter 8 of SAS for Mixed Models, and I have convergence, but not with all response variables. Again, estimates=0 for random intercept or slope default using the vc model, but convergence.

I am not entirely sure, but zero estimates for either random intercept or random slope do not make sense with a rc model, do they?
lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12
I really think the problem is insufficient number of observations. It sounds like there is just one HLI value for each subject (animal). If this is correct, then there is no way of estimating this covariance matrix.
keckk
Fluorite | Level 6
No, there are nine HLI values per subject (animal, n=26), but for the animals of the same farm (and because observed on the same day) HLI values are the same.

Due to missing observations, the number of response values varies from 1 (rarely) up to 9 values.
Could it be a problem ?
statsplank
Calcite | Level 5
Hello,

First, lets rephrase the initial problem to capture its main points:

There are N_f farms. For each farm, there is a measure of its thermal environment HLI. HLI is continuous. There is one value of HLI per farm. There are N_a animals at each farm. In total, there are N_f*N_a animals. The individual animal ID is unique. In total, there are N_f*N_a unique IDs in the data set. Animals at each farm are of two genotypes. The respiration rate mean_rr is an outcome. It is measured (observed) once for each animal. In total, there are N_f*N_a observations in the data set.

The HLI is measured at the farm level. ***If HLI is different for all N_f farms (there are N_f distinct values of HLI)***, then HLI and farm are confounded thus no random farm effect should be specified.

Assuming the *** statement is correct, we fit the following model to test if slopes for HLI are the same between the two genotypes:

proc mixed data=x;
class genotype;
model mean_rr = genotype HLI genotype*HLI /solution;
run;

Or, in Proc GLM:

proc glm data=x;
class genotype;
model mean_rr = genotype HLI genotype*HLI;
run;

There is no random coefficient for HLI. HLI is a covariate in your problem, a continuous variable we adjust for (analysis of covariance chapter in "SAS System For Mixed Models"). Moreover, if only main effect for HLI was included in the model it would have no effect on significance testing of genotype as genotype is the animal level effect thus within subject variance (unaffected by having only main effect for HLI) is used in test of significance.

UPD. Now, suppose the observed HLI levels came from the population of the HLI levels (HLI is random). We can fit such model as follows:

proc mixed data=x;
class genotype;
model mean_rr = genotype /solution;
random HLI genotype*HLI;
run;
lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12
I think that this is a good explanation why the model desired by the original poster was overparameterized (in terms of random effects). Random-coefficient mixed models are useful for many purposes, but they probably are not appropriate for this situation.
keckk
Fluorite | Level 6
Thank you very much for your help. I was thinking about your explanations.

> There are N_f farms. For each farm, there is a
> measure of its thermal environment HLI. HLI is
> continuous. There is one value of HLI per farm. There
> are N_a animals at each farm. In total, there are
> N_f*N_a animals. The individual animal ID is unique.
> In total, there are N_f*N_a unique IDs in the data
> set. Animals at each farm are of two genotypes. The
> respiration rate mean_rr is an outcome. It is
> measured (observed) once for each animal. In total,
> there are N_f*N_a observations in the data set.

The outcome (respiration rate) has not been measured once for each animal, but there are 9 replicates at best (missing values) as data arise from 9 observation days. That's why I assumed a random coefficient type of model would be useful as it would fit random regression models for each animal instead of using a repeated measures analysis (since I am not interested in the time effect; a suggestion in: http://www2.sas.com/proceedings/forum2007/178-2007.pdf
The number of animals N_a is not equal across farms N_f, but at each farm there is an equal number of the two genotypes (2x2, 5x5, 6x6..).

Does it make any difference?

> The HLI is measured at the farm level. ***If HLI is
> different for all N_f farms (there are N_f distinct
> values of HLI)***, then HLI and farm are confounded
> thus no random farm effect should be specified.
>
> Assuming the *** statement is correct, we fit the
> following model to test if slopes for HLI are the
> same between the two genotypes:
>
> proc mixed data=x;
> class genotype;
> model mean_rr = genotype HLI genotype*HLI /solution;
> run;
>
> Or, in Proc GLM:
>
> proc glm data=x;
> class genotype;
> model mean_rr = genotype HLI genotype*HLI;
> run;

Your *** statement is correct.
I tried the simple analysis of covariance without using any random effects at all:
proc mixed data=x;
class genotype;
model mean_rr = genotype HLI genotype*HLI /solution;
run;
Again, only type I tests of fixed effects show a distinct genotype effect as it is obvious from the scatter-plots, but default type III tests do not.

> There is no random coefficient for HLI. HLI is a
> covariate in your problem, a continuous variable we
> adjust for (analysis of covariance chapter in "SAS
> System For Mixed Models"). Moreover, if only main
> effect for HLI was included in the model it would
> have no effect on significance testing of genotype as
> genotype is the animal level effect thus within
> subject variance (unaffected by having only main
> effect for HLI) is used in test of significance.
>
> UPD. Now, suppose the observed HLI levels came from
> the population of the HLI levels (HLI is random). We
> can fit such model as follows:
>
> proc mixed data=x;
> class genotype;
> model mean_rr = genotype /solution;
> random HLI genotype*HLI;
> run;

This approach results in estimates=0 for the random effect genotype*HLI (using the solution option).

Your approach assuming HLI to be continuous and to represent the population of HLI levels (which is appropriate) is fine, but it doesn't allow me to show the increase in respiration rate with increasing HLI, depending on the genotype (genotype*HLI in the fixed effects part of the model), the different slopes, respectively. This is what is most interesting when looking at the plots.
SylviaK
Calcite | Level 5

I am working with a similar model and was wondering if you should have a repeated statement if you took samples on the animals over time?  My trial tested animals once weekly so I included not only a random statement but also a repeated week statement.  Just wondering what people think.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 15 replies
  • 3369 views
  • 0 likes
  • 5 in conversation