Solved: Re: PROC GLIMMIX and Binary Predictors

kevinmc87 · Posted 03-13-2016 12:59 AM

Hi,

I am using a logistic discrete time hazard model to evaluate the 'hazard' of my event occuring at each of my twenty different time points.I am following the "Applied Longitiudinal Data Analysis" book by Singer. However she uses PROC LOGISTIC, and I am using PROC GLIMMIX, as I have a random effect in my model. I can use PROC GLIMMIX to do discrete time hazard modelling, correct?

My code is below.

proc glimmix data=blah pconv=1e-3;
class strata1 time1--time20;
model event(event=last)=time1--time20/
noint solution link=logit dist=binary;
nloptions tech=nrridg;
random intercept/subject=strata1;
covtest 'var(strata1)=0' 0/wald;
run;

kevinmc87 · Posted 03-22-2016 12:09 PM

I figured out my issue; it was in terms of how I had my data structured. Thank you for your time, IVM; I really appreciate it.

View solution in original post

lvm · Posted 03-14-2016 04:09 PM

GLIMMIX and LOGISTIC do not handle censoring. If you have no censoring, then you can use these procedures.

kevinmc87 · Posted 03-19-2016 10:15 PM

Hi,

Thanks for your reply. I am using GLIMMIX, but I am getting strange results. Here is the code that I have been running:

proc glimmix data=blah pconv=1e-3;

class strata1;

model event(event=LAST)=time1--time20/

noint solution link=logit dist=binary;

nloptions tech=nrridg;

covtest 'var(strata1)=0'/WALD;

random intercept/subject=strata1;

run;

Since I'm using a logistic discrete time hazard model (without any censored observations), I have my dataset constructed using the 'person-period' dataset. Here is an example of what a person-period dataset looks like:

id time1 time2 time3 time4 event
100 1 0 0 0 0
100 0 1 0 0 0
100 0 0 1 0 1
101 1 0 0 0 1
102 1 0 0 0 0
102 0 1 0 0 0
102 0 0 1 0 0
102 0 0 0 1 0

Essentially, each 'time' variable represents whether this period is occuring. So, time1=1 during the first period, 0 otherwise. And then time2=1 during the second period, 0 otherwise, and so on. I am modelling the probability that the event occurs during each of these periods. When I use PROC LOGISITIC, I get sensible parameter estimates.

proc logistic data=blah;

model event (event=LAST)=time1--time20 /noint;

run;

This code delivers parameter estimates for time1=-3.0052, which gives me a probability of the event occuring in time period 1 of .047. These estimates slowly get smaller, for each time[i] variable, which is what I would expect. However, when I run my GLIMMIX code and add in this random effect for strata1, it blows up my model - the parameter estimates for time flip their sign. time1=2.84, time2=2.67, time3=2.41, and they consistently get smaller. I'm really confused as to why- this model is telling me that the probability of the event occuring is over 90% in this period, which I know to be untrue. Does anyone have any idea why this is?

kevinmc87 · Posted 03-21-2016 04:19 PM

Hi,

So I know that was somewhat of a difficult to follow and long-winded explanation. Narrowing down my question to something easier to follow would be this: how does adding a random effect in to my model (which appears that it should belong in my model), completely inverse the 'hazard' rate of my predictors? Without the random effect, I have a hazard rate of .047 at time1. With the random effect, I have a hazard rate of .945 at time1. This doesn't make sense to me, and I'm wondering what my error could possibly be.

Thanks for your time.

lvm · Posted 03-21-2016 04:39 PM

You say that results are different between LOGISTIC and GLIMMIX. But do the results different within GLIMMIX with and without the use of the random statement? Not sure if you made this clear.

kevinmc87 · Posted 03-21-2016 05:15 PM

Sorry for not making that clear. But yes, the results of GLIMMIX differ greatly with and without the random effect. When I don't use the random effect in GLIMMIX, my results match the LOGISTIC procedure. When I do use the random statement, the signs of my time[i] predictors flip, and I get results that are basically the opposite of what I would expect, with the inverse hazard ratios I mentioned above.

lvm · Posted 03-21-2016 05:46 PM

Hard to tell what is happening based on what you gave us. What is the estimate of the random effect variance? Are all times represented in each level of strata? Under extreme situations, you could have confounding of the random effect with the fixed effects, with great imbalance.

Once you add a random effect in glimmix, the model fitting becomes restricted-pseudo-likelihood (default). (It is maximum likelihood without a random effect with binary data, same as LOGISTIC). This pseudo-likelihood method is often not a good choice when you have binary data. I suggest you try method=laplace option in the glimmix statement.

kevinmc87 · Posted 03-21-2016 06:12 PM

I wish I could provide output, but I am not able to transfer that data off of the server, unfortunately.

I have worked with this same data, although not in the wide format as I am now (as I currently have one record per time period; in the past I have had one record per ID when working with this data). I never had any issues with the random effect blowing up my model as it is currently doing.

Interestingtly, my estimate for the covariance parameter estimates for strata1 is 27. It seems that this value should be much, much lower, as my other regressions with this data (although not using a time hazard model), have shown this value to be around .05. All times are represented within each strata

I tried the method=laplace option and my model did not converge - I'm not sure what my next step should be here.

If you have any other ideas, I would love to hear them. Thanks for your time.

kevinmc87 · Posted 03-22-2016 12:09 PM

I figured out my issue; it was in terms of how I had my data structured. Thank you for your time, IVM; I really appreciate it.

Catch up on SAS Innovate 2026