07-01-2015 05:12 PM
I am using Proc Glimmix to run a glmm. The response is time. There is one random effect and two fixed effects. In testing for model assumptions, a Levene's test indicated there are unequal variances in my comparison of interest. This can be corrected by using an inverse transformation of my data (1/time). I can use this transformed variable and run the model, but interpreting the error estimates for the parameter means (I want std. errors) in the original scale seems messy (impossible?).
Based on another thread I saw here, I used link=reciprocal with the original untransformed time variable, with the intent of using ilink in the lsmeans statement to get the parameter and error estimates on the original scale of the data. However, with link=reciprocal the model won't converge. I've tried the usual convergence fixes (e.g. changing MAXITER= in the nloptions statement), but that doesn't help.
Am I using link= appropriately? Should I also change the distribution (I had defaulted to normal, but also tried gamma)?
07-02-2015 03:37 PM
I have had pretty good luck fitting half-life estimates (T1/2) with the link=reciprocal option. I am going to guess that you have some extremely long values for time, and some pretty short ones. Can you share two things--the range for time (with units), and your GLIMMIX code. If you have an additional source of heterogeneity of variance, that may be causing the failure to converge.
07-03-2015 12:36 PM
The minimum time value is 23 days and the max value is 77 days.
My code is:
proc glimmix data=IMMDur noitprint;
class block sex temp diet;
model time=diet sex temp sex*temp sex*diet temp*diet /solution ddfm=kr2 link=reciprocal;
random block; /* assuming block is the one random effect mentioned above */
lsmeans temp/ ilink diff lines adjust=tukey;
lsmeans sex/ ilink diff lines adjust=tukey;
lsmeans temp*sex/ ilink diff lines adjust=tukey;
lsmeans diet/ ilink diff lines adjust=tukey;
run;
5 categories of diet, 2 for sex, 2 for temp.
This was the Levene's test (the heterogeneity seemed to be within the 'diet' groups):
proc sort data=IMMdur;
by temp; /* a BY statement is required for PROC SORT; temp assumed as the key */
run;
proc glm data=IMMdur;
class block temp;
model time = temp; /* a MODEL statement is needed before MEANS */
means temp / hovtest=levene(type=abs) welch;
run;
If I run the same model without specifying a link function, but using 'invtime' (the inverse-transformed response), the model converges. I had assumed the two syntaxes would be equivalent -- what is the difference?
07-03-2015 06:47 PM
To your last question: the link is definitely not the same as a transformation. The distinction can be subtle but important. The link is a function of the expected value (i.e., the mean), not a function of the response variable; in contrast, a transformation is a function of the response variable (not the mean). Say z = 1/time (the transformation you mention); we can also write f(time) = 1/time. Then in GLIMMIX (or MIXED), one is modeling (without random effects)
E(z) = E(f(time)) = E(1/time) = X.beta
Here, E(.) is the expectation. You are dealing with means of the transformed variable. Back-transforming does NOT give you the mean of time; by Jensen's inequality, it gives a value shifted away from the mean.
A link can be written generically as g(mu), which in your case is g(E(time)), a function of the expected value. So, with link=inverse, you are modeling
g(E(time)) = 1/E(time) = 1/mu = X.beta
Note that the expectation in the first case is of the function of time, but with the link, one has a function of the expectation. The back transformation of the link, known as the "inverse link" (a term unrelated to the use of 1/mu as the link), gives the actual expected value of time (your response variable). In general, use of link functions is preferred because one is working with means of the original response variable.
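A quick numerical illustration of that distinction (a Python/NumPy sketch, not SAS; the gamma shape and scale are made-up values): back-transforming the mean of 1/time does not recover the mean of time, whereas a reciprocal link models E(time) directly.

```python
import numpy as np

# Simulated "times": right skewed, strictly positive (made-up gamma parameters)
rng = np.random.default_rng(42)
time = rng.gamma(shape=4.0, scale=12.0, size=100_000)  # analytic mean k*theta = 48

# What a link model targets: the mean of the response itself
mean_time = time.mean()

# What transform-then-back-transform gives: 1 / E(1/time)
backtransformed = 1.0 / (1.0 / time).mean()  # analytic value 1/E(1/T) = 36, not 48

print(f"E(time)       = {mean_time:.1f}")
print(f"1 / E(1/time) = {backtransformed:.1f}")
```

For a gamma(k, theta) variable, E(T) = k*theta while E(1/T) = 1/((k-1)*theta), so here 1/E(1/time) is 36 but E(time) is 48: the two back-transformed quantities estimate different things.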
With that said, inverses are tricky for modeling purposes, whether as data transformations or links. They can be especially problematic when the response varies greatly, with some points very close to 0. That does not appear to be the situation with your data. There are lots of reasons why one can have convergence problems; I can't tell which applies based on your post. Maybe the algorithm is getting close, or is stuck at a saddle point of the log-likelihood surface. You could try some other optimization methods:
NLOPTIONS TECH = ;
You could try QUANEW or TRUREG or NRRIDG for technique; there are others listed in the user's guide.
If you are willing to move away from normal distributions, you have many possibilities. You mentioned that you have unequal variances. This is an essential property of variables with gamma or exponential distributions, or many other distributions. You could model the data with a gamma distribution, with time as your response variable (not 1/time). In this case the default link is log. "Inverse link" gives you expected time values.
07-05-2015 07:41 PM
Thank you both for your replies.
Ivm: I'm fine not using a normal distribution if justified, but how would you recommend determining which distribution is best for the data (normal with an inverse transformation vs. gamma with an inverse link)? Running each model and then comparing some parameter (e.g. -2 Res Log Pseudo Likelihood) under the fit statistics output?
07-06-2015 09:07 AM
Typically, you base distribution choice on properties of the response variable. Conditional residuals (histograms) can give hints, but keep the variable in mind. Time is a good example: it is bounded by zero, with no upper limit, and it is typically right skewed. This is a property of the gamma distribution. By the way, you don't choose a link function to stabilize variances (with non-normal distributions), you choose a link to provide a linear scale for the predictors. The unequal variances are automatically handled by the chosen (non-normal) distribution. For instance, with gamma, var(Y) = a.mu^2, where mu is the mean. So, one is automatically taking care of the fact that the variance is increasing quadratically with the mean. Once you move over into generalized linear models or generalized linear mixed models, lots of things change.
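That quadratic mean-variance relationship is easy to check by simulation (a Python/NumPy sketch, not SAS; the group means and shape parameter are made-up):

```python
import numpy as np

rng = np.random.default_rng(1)
shape = 5.0                       # common gamma shape parameter "a" across groups
group_means = (20.0, 40.0, 80.0)  # made-up group means

variances = []
for mu in group_means:
    # A gamma variable with mean mu and shape a has scale mu/a and var(Y) = mu^2 / a
    y = rng.gamma(shape=shape, scale=mu / shape, size=200_000)
    variances.append(y.var())
    print(f"mu = {mu:5.1f}   empirical var = {y.var():8.1f}   mu^2/a = {mu**2 / shape:8.1f}")
```

Doubling the mean roughly quadruples the variance, which is exactly the kind of heterogeneity a gamma model absorbs without any extra variance modeling.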
07-06-2015 01:51 PM
To see the differences in a reasonable manner, simulate some right-skewed data. Compare the following analyses: (1) untransformed, (2) log transformed outside of GLIMMIX and fit with a normal distribution, (3) untransformed with a log link, (4) untransformed with a lognormal distribution, (5) untransformed with a gamma distribution.
All of the above may fit the data, some better than others, and a variety of problems or opportunities may arise during the fit. In this case, any of the methods may seem like a reasonable choice, and it is very difficult to select one over the others based on objective measures arising from the likelihood function. The key is that the estimand -- the expected value actually being estimated -- differs in each case (although 2 and 3 look the same to me). The choice of distribution and/or link function should also consider the process by which the dependent variable is generated.
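To make the different estimands concrete (a Python/NumPy sketch, not SAS; the lognormal parameters are made-up): with right-skewed data, back-transforming the mean of log y recovers the geometric mean, which sits below the arithmetic mean that a model of the untransformed response targets.

```python
import numpy as np

rng = np.random.default_rng(7)
# Right-skewed data: lognormal with made-up parameters mu=3, sigma=0.8
y = rng.lognormal(mean=3.0, sigma=0.8, size=200_000)

arithmetic_mean = y.mean()                 # estimand for models of untransformed y
geometric_mean = np.exp(np.log(y).mean())  # back-transformed mean of log y

print(f"arithmetic mean = {arithmetic_mean:.2f}")  # ~ exp(3 + 0.8^2/2) = 27.7
print(f"geometric mean  = {geometric_mean:.2f}")   # ~ exp(3) = 20.1
```

The gap, a factor of exp(sigma^2/2), grows with the skewness, so the back-transformed estimates can be far apart even when every model "fits."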
If it is waiting times being modeled, a gamma distribution is justifiable on theoretical grounds--and that should satisfy folks.
Except it seldom does, especially if there is a history in the field of using untransformed data. (Yes, I am looking at the clinical pathologists now.)