08-04-2015 05:29 AM
I would like some help on my modeling for the following situation.
I have a dataset with 200 subjects. For each subject (indicated by ID), their situation is determined at three moments, being either 'living at home', 'institutionalized', or 'dead' (situation is my dependent outcome). At the first measurement moment, every subject lives at home. I want to see which variables are associated with transition to one of the other two situations.
The variable 'time' indicates whether it is the first, second or third measurement moment.
The dataset also includes a number of variables that I would like to test for an association with the transition. Some are continuous (cqtarief, mzleef), but the most important one is ordinal with 4 categories (vht).
I've tried this code:
proc glimmix data=long ic=q;
class ID index1 vht relation situation;
model situation(ref=first) = relation cqtarief mzleef vht / dist=multinomial link=glogit solution;
run;
I have the following questions:
- I can't include "random index1 / subject=volgnr type=ar(1) residual;" because I have a multinomial distribution. How can I model the dependence among the measurements within subjects? Or does the procedure take care of this?
- How can I compare models with different covariates?
- How do I assess model fit?
- Do I need to fit covariance structures (e.g. un/cs/ar(1))?
Thanks for any help on this matter!
08-04-2015 09:26 AM
This is a case where you are restricted to fitting a GLMM rather than a GEE model. If your data are equally spaced in time, simply remove the option 'residual' from your RANDOM statement. Personally, with 3 measurements over time (index1), I would fit an unstructured covariance matrix.
Now, this results in looking at your fixed effects averaged over the values of index1. To look for associations, you will likely need to model index1 as a fixed effect and include its interactions with each of your other fixed effects. The model therefore becomes more complex, which makes it harder to compare models with different covariates. To get information criteria, you will have to try to fit the model using a likelihood-based method such as METHOD=LAPLACE (an integral approximation to the marginal likelihood) rather than the default pseudo-likelihood estimation. Changes in the information criteria relative to a "null" model would give some measure of how much information is retained using the model at hand (which gets around having to nest the models and look at changes in -2LL). This also gives a measure of model fit.
proc glimmix data=long ic=q method=laplace;
class ID index1 vht relation situation;
model situation(ref=first) = index1 relation relation*index1 cqtarief cqtarief*index1 mzleef mzleef*index1 vht vht*index1 / dist=multinomial link=glogit solution;
random index1 / subject=ID type=un;
run;
I am also a bit confused by the subject in your example (which is subject=volgnr). Is this the same as ID? If not, then does it need to be on the CLASS statement, or will that lead to output with hundreds (or more) levels?
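A minimal sketch of the null-model comparison described above, not a definitive implementation. It assumes a random intercept per subject instead of the unstructured index1 structure (to keep both fits small), and it adds group=situation because GLIMMIX asks for a group effect on the RANDOM statement with a nominal response. The output dataset names fit_null, fit_full and fit_compare are made up for illustration.

```sas
/* Null model: intercepts only, same random structure as the covariate model. */
ods output FitStatistics=fit_null;
proc glimmix data=long ic=q method=laplace;
  class ID situation;
  model situation(ref=first) = / dist=multinomial link=glogit solution;
  random intercept / subject=ID group=situation;  /* assumed simplification */
run;

/* Covariate model: time plus the covariates of interest. */
ods output FitStatistics=fit_full;
proc glimmix data=long ic=q method=laplace;
  class ID index1 vht relation situation;
  model situation(ref=first) = index1 relation cqtarief mzleef vht
        / dist=multinomial link=glogit solution;
  random intercept / subject=ID group=situation;  /* assumed simplification */
run;

/* Put the fit statistics side by side; smaller AIC/BIC is better. */
proc sql;
  create table fit_compare as
  select n.Descr,
         n.Value           as null_value,
         f.Value           as full_value,
         f.Value - n.Value as diff
  from fit_null as n
       inner join fit_full as f
       on n.Descr = f.Descr;
quit;

proc print data=fit_compare noobs;
run;
```

Both models must be fit to exactly the same observations for the criteria to be comparable, so subjects with missing covariate values should be excluded from the null fit as well.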
08-06-2015 10:20 AM
Thank you so much for this reply. Including the interaction terms makes sense indeed.
However, with the code you sent I get an error message saying that I have to include a group. So I added this part:
random index1 / subject=ID group=situation type=un;
(By the way, volgnr is indeed the same as ID; I translated the variable names but forgot that one.)
However, I'm not sure what the consequences of including the group=situation option are.
I'm currently considering treating my outcome as binary (living at home vs. not living at home).
Can I just replace "/dist=multinomial link=glogit" with "/dist=binary" ?
08-11-2015 08:15 AM
Can you copy in the error message? The RANDOM statement that it looks like you are using will fit a separate unstructured matrix for each level of situation, which is the dependent variable. That might work (I don't know, as I've never seen that error and so have never used that syntax), but it also greatly increases the number of parameters to be estimated, and hence requires a LOT more data to get good estimates.
As far as moving to a binary response, I think that will really help. However, to really get things to work you may have to consider "collapsing" the data to something that looks like a successes/trials syntax for each timepoint, moving to dist=binomial, and identifying something other than ID as the subject.
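A rough sketch of what that collapsing could look like, under several assumptions: situation is stored as a character variable with the exact value 'living at home', the data are collapsed over index1 and vht only (so the continuous covariates are dropped), and the names long_bin, nothome, events, trials and collapsed are hypothetical. No RANDOM statement is shown because the choice of subject is still open.

```sas
/* Binary outcome: 1 = not living at home, 0 = living at home.        */
/* Assumes situation is character with this exact value (assumption). */
data long_bin;
  set long;
  nothome = (situation ne 'living at home');
run;

/* Collapse to one row per time point and vht level. */
proc sql;
  create table collapsed as
  select index1,
         vht,
         sum(nothome) as events,   /* number not living at home */
         count(*)     as trials    /* number of subjects observed */
  from long_bin
  group by index1, vht;
quit;

/* Events/trials syntax with a binomial distribution. */
proc glimmix data=collapsed;
  class index1 vht;
  model events/trials = index1 vht / dist=binomial link=logit solution;
run;
```

Because every subject lives at home at the first measurement, events will be 0 for that time point in every vht level, so it may be necessary to leave the first time point out of the collapsed data before fitting.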