Let's go through these by number:
My questions are:
1. Does my action plan for addressing this exploratory question seem correct?
I'm not sure why you collapsed a multinomial response down to a binomial one, but I think your approach is defensible. In any case, it seems like a good start.
2. Am I correct in keeping only those participants who have both a baseline visit and a visit 5 assessment for the final analysis dataset, and excluding the rest? Or is this problematic? If yes, what are some correct alternatives?
This is certainly a case where, if the data are MNAR (missing not at random), you might want to restrict the analysis to the complete records. However, if your data are MAR (missing at random) or MCAR (missing completely at random), a maximum likelihood analysis can handle the missing values, so you can keep every participant and use whatever visits they have.
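Before deciding, it can help to count how many participants the complete-record restriction would actually discard. The following is a minimal sketch in Python, not SAS; the function name `split_participants`, the coding of baseline as visit 0, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict


def split_participants(records, required_visits=(0, 5)):
    """Split participant IDs by whether every required visit was observed.

    records: iterable of (participant_id, visit) pairs, one per observed
             assessment. Baseline is assumed to be coded as visit 0.
    Returns (complete, incomplete) as two sorted lists of IDs.
    """
    seen = defaultdict(set)
    for pid, visit in records:
        seen[pid].add(visit)
    required = set(required_visits)
    complete = sorted(pid for pid, visits in seen.items() if required <= visits)
    incomplete = sorted(pid for pid, visits in seen.items() if not required <= visits)
    return complete, incomplete


# Made-up records, for illustration only.
records = [("A", 0), ("A", 3), ("A", 5), ("B", 0), ("B", 2), ("C", 0), ("C", 5)]
complete, incomplete = split_participants(records)
print("complete records:", complete)
print("would be dropped:", incomplete)
```

If the dropped group is large, the all-available-data analysis becomes more attractive, provided the MAR or MCAR assumption is plausible.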
3. Are there any pre-modeling visualization techniques that I can/should use to further explore my data? Is it OK to use boxplots to look at the distribution of my continuous variable at each level of the binary outcome? Should I maybe use point-biserial correlation first to see if there’s any evidence of a relationship at all between my predictor and dependent variable before fitting the model? If yes, is there such a thing as point-biserial correlation for repeated measures data, or should I just use the baseline values of the variables?
What do you expect to learn from the boxplots? The point-biserial issue can be addressed with a cluster-style plot: plot time against the independent variable, with the two levels of the binary outcome shown in different colors. See the second example in the PROC FASTCLUS documentation for one approach.
4. Is my model setup correct/complete?
Your model assumes that the values at the various visits are not correlated. You may wish to impose some sort of covariance structure on visit.
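To make the contrast concrete, here is a minimal sketch of what a covariance structure on visit implies. It uses first-order autoregressive, AR(1), correlation as one common choice; that choice, the value of `rho`, the number of visits, and the function names are assumptions for illustration, not something taken from your model.

```python
def independence_corr(n_visits):
    """Correlation matrix when visits are treated as uncorrelated."""
    return [[1.0 if i == j else 0.0 for j in range(n_visits)]
            for i in range(n_visits)]


def ar1_corr(n_visits, rho):
    """AR(1) correlation matrix: entry (i, j) is rho ** |i - j|.

    Assumes equally spaced visits, so correlation decays with the
    number of visits separating two measurements.
    """
    return [[rho ** abs(i - j) for j in range(n_visits)]
            for i in range(n_visits)]


# Hypothetical setting: 3 visits, correlation 0.5 between adjacent visits.
print("independence:")
for row in independence_corr(3):
    print(row)
print("AR(1), rho = 0.5:")
for row in ar1_corr(3, 0.5):
    print(row)
```

Under independence every off-diagonal entry is zero; under AR(1), measurements one visit apart are correlated at `rho` and those two visits apart at `rho ** 2`. Other structures (compound symmetry, unstructured) are equally legitimate candidates.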
5. How can I check to see if my model fits the data well? I know that for regular linear models, there are residual plots, QQ plots, checks for outliers and influential points, etc. But I'm not sure what kind of model diagnostics are best for GLMMs.
This is undoubtedly one of the hardest questions in GLMMs. Your outcome has two levels, so you may want to look at cutpoints on the predicted probability and how each one classifies false positives and false negatives (an ROC curve). PROC GLIMMIX doesn't produce this automatically the way PROC LOGISTIC does, but I am sure there are examples on the web of how to do it with DATA steps and PROC SGPLOT.
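The cutpoint calculation itself is simple once you have predicted probabilities saved from the model. Here is a minimal sketch of the logic in Python rather than SAS; the function names and the sample values are made up for illustration, and the same loop could be written in a DATA step.

```python
def roc_points(y_true, p_hat):
    """Return (cutpoint, false_positive_rate, true_positive_rate) triples.

    y_true: observed binary outcomes coded 0/1 (needs at least one of each)
    p_hat:  predicted event probabilities from the fitted model
    A record is classified as an event when p_hat >= cutpoint.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    # Use each distinct predicted probability as a cutpoint, highest first.
    for cut in sorted(set(p_hat), reverse=True):
        tp = sum(1 for y, p in zip(y_true, p_hat) if p >= cut and y == 1)
        fp = sum(1 for y, p in zip(y_true, p_hat) if p >= cut and y == 0)
        points.append((cut, fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve, starting from the origin."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for _, fpr, tpr in points:
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area


# Made-up outcomes and predicted probabilities, for illustration only.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
for cut, fpr, tpr in roc_points(y, p):
    print(f"cutpoint={cut:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
print("AUC =", auc(roc_points(y, p)))
```

Plotting the false positive rate against the true positive rate gives the ROC curve. With repeated measures, keep in mind that the visits from one participant are not independent, so treat the curve as a descriptive summary rather than a formal test.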
6. Any other suggestions/recommendations?
Not at this stage, but eventually you will probably want to work with the multinomial response. At that point, things get much fuzzier (in my opinion).
SteveDenham