Re: Mixed Model Logistic Regression Longitudinal Data Help

ROLuke91 · Posted 06-22-2018 03:44 PM

Hello,

I have a fairly nuanced and lengthy question and greatly appreciate any time taken to read and recommend suggestions or advice given.

I am trying to perform a mixed model logistic regression analysis with auto-correlated data. I have a sample of 702 individuals that were measured 11 times sequentially to see if they are yes (1) or no (0) for an attribute at each time point. I'm therefore trying to regress the dichotomous outcome on the single independent variable of measurement time point - measuring the OR of the individuals having the attribute vs. not having the attribute based on the specific time point (as represented by a categorical/class variable with 10 distinct strata and 1 reference).

My understanding was that this situation is best modeled by mixed regression analysis because of the existence of fixed effects (e.g. sample-wide trajectory of proportion of 0/1s for each time point), as well as the random slopes and intercepts resulting from each of the 702 individual possessing his/her own trajectory of 0/1s across the measurement period. Additionally, I would have to specify the correlation structure so that it is autocorrelative.

My initial research and consultation has lead me to a few possible options:

1.) Run mixed model logistic regression using PROC genmod with the following:

proc genmod data=data descending;
class id timepoint /param=ref ref = first;
model dichoOutcome = timepoint / dist=bin link=logit type3;
repeated subject = id / within = timepoint type=ar(1) corrw;
run;

When I perform this, however, I get the following error:

WARNING: The generalized Hessian matrix is not positive definite. Iteration will be terminated.
ERROR: Error in parameter estimate covariance computation.
ERROR: Error in estimation routine.

Not sure how to address this...

2.) Run a normal logistic regression using PROC GENMOD, and then use those obtained parameters as the starting values for PROC NLMIXED.

proc genmod data=data descending;
class ID timepoint /param=ref ref = first;
model dichoOutcome = timepoint / dist=bin link=logit;
run;

And then I would proceed to use NLMIXED with my values from PROC GENMOD to something like this:

proc nlmixed data=data;
parms b0 = b1 = b2 = b3 = b4 = b5 = b6 = b7 = b8 = b9 = b10
s2u = s2f= cuf=0;
xb = b0 + u + timepoint*(b1+rb1) +.....+ timepoint *(b10+r10);
p = exp(xb)/(1+exp(xb));
model dichoOutcome ~ binary(p);
random u rb1 ~ normal([0,0],[s2u,cuf,s2f]) subject=id;
run;

However, I'm not sure how to generate the starting parameter values for s2u (random intercept) and s2f (random slope). I'm not sure how to use PROC GENMOD to obtain these values.

3.) Using PROC GLIMMIX:

proc glimmix data=data;
class id timepoint /ref = first;
model dichoOutcome = timepoint / dist=bin link=logit;
random intercept / subject = id type=ar(1);
run;

However, this does not converge, and I'm not sure how to troubleshoot or adjust.

Could anybody please provide some guidance in how I should be better addressing my question or specifying models? Happy to provide as much more information as I can to better facilitate understanding....

Best,

Luke

StatDave · Posted 06-22-2018 04:24 PM

Your first model with GENMOD seems appropriate for the data - though it fits a Generalized Estimating Equations (GEE) model, not a random effects model if that is specifically what you want. The error is probably only occurring much for the same reason as separation prevents model fitting in a regular logistic model when the data are too sparse. You might be able to estimate the model if you make the data less sparse by merging together some of your 11 time points. A table of time*response might help to show how to do the merging.

ROLuke91 · Posted 07-02-2018 03:48 PM

Thank you very much for the response, StatDave. I appreciate the guidance.

Unfortunately, I'm not really sure how to account for the sparse data issue in a manner that preserves the intention of the analysis. We are attempting to hone in on, specifically, each of the 10 time points (since the 11th is our reference beginning point) and their proportion of "successes" in the dichotomous outcome, out of total responses. It is pertinent that we ideally preserve the separate time points as much as possible, or at the very least find an effective manner to condense them into blocks (e.g. "beginning", "middle", and "end", that end up with 3/3/4 time points in each respective block). Given that our outcome variable is binary, I'm not sure how to effectively merge time points while preserving integrity of the analysis (e.g. if an observation has 0/1/0 in the "beginning" block, it wouldn't really be appropriate to umbrella it into either values of "0" or "1").

The only solution to our problem and research goals that I can see is to do a Poisson or Negative Binomial Regression Analysis by transforming our outcome variable into count data. Since we are concerned about the proportion of successes at each particular time point, the only way I could see to model this would be to sum all of the total successes at each time point and model them as a proportion of total observations at each time point (successes/total non-missing per time point). However, my initial attempts at this analysis are showing that my specified model is saturated and is a poor fit.

I can/will post some of my coding attempts to paint a better picture of what I'm asking, but in the meantime - do you have any thoughts or advice on my interpretation and attempt to proceed? Any guidance would be very graciously appreciated.

Best,

Luke