lei
Obsidian | Level 7

Hi,

I am trying to model predictors of wait time using negative binomial regression (the Poisson regression showed overdispersion). The wait time is the interval between two events, A and B. In terms of study design, we followed all patients for the same maximum length of time, starting from event A, and stopped counting when event B occurred. The maximum follow-up for each person is 120 days.

I don't believe an offset is necessary in this case. However, a couple of us disagree on this.

I am also wondering whether I should use another model. This isn't censored data: no one is lost to follow-up, and everyone has event B within the follow-up period.

Any thoughts?

Thanks.


15 replies
Ksharp
Super User

I don't think it is a good idea to model wait time with a Poisson or negative binomial distribution. Both are for discrete (count) variables, while wait time is a continuous variable. This is more of a survival analysis problem, or you could just use the Multivariate Regression Procedures.

lei
Obsidian | Level 7

Hi Keshan,

 

Thank you. I tried OLS on this data, and the fit violated essentially every OLS assumption. I was thinking of fitting a generalized linear model with a gamma distribution instead. Would you say this approach is heading in the right direction?

 

Thank you.

JacobSimonsen
Barite | Level 11

Waiting time can be modelled with Poisson regression, but not with negative binomial regression. This is because the likelihood for exponentially distributed data (or even for data with a piecewise constant rate) is, as a function of the parameters, the same as the likelihood for Poisson-distributed data when log(time) is used as an offset.

 

In this example every observation has an event (event=1), and I use Poisson regression with log(t) as offset to estimate the rate, whose true value is 1/5.

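data mydata;
  event=1;                       /* every observation has an observed event */
  do i=1 to 1000;
    t=rand('exponential',5);     /* exponential waiting time with mean 5, i.e. rate 1/5 */
    logt=log(t);                 /* log of follow-up time, used as offset */
    output;
  end;
run;

proc genmod data=mydata;
  model event= / dist=poisson link=log offset=logt;
  estimate 'rate' intercept 1 / exp;   /* exp(intercept) estimates the rate */
run;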
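For this intercept-only model, the value printed by the ESTIMATE statement has a closed form, which shows why it should land near 1/5. A short derivation, writing d_i for the event indicator (all equal to 1 here) and β_0 for the intercept:

```latex
\begin{aligned}
\ell(\beta_0) &= \sum_{i=1}^{n} \bigl[\, d_i(\beta_0 + \log t_i) - t_i e^{\beta_0} \,\bigr] + \text{const},\\
\frac{\partial \ell}{\partial \beta_0} &= \sum_{i=1}^{n} d_i - e^{\beta_0} \sum_{i=1}^{n} t_i = 0
\quad\Longrightarrow\quad
e^{\hat\beta_0} = \frac{\sum_i d_i}{\sum_i t_i} = \frac{n}{\sum_i t_i} = \frac{1}{\bar t}.
\end{aligned}
```

This is the usual maximum likelihood estimate of an exponential rate (number of events divided by total time at risk). With a mean waiting time of 5, it should come out close to 1/5 = 0.2.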

But it is not meaningful to talk about overdispersion here, because that term relies on an assumption about the distribution. The data are clearly not Poisson-distributed, but since the likelihood is the same as for Poisson-distributed data, Poisson regression can be used to estimate the parameters.

 

lei
Obsidian | Level 7

Thank you Jacob,

This is really awesome. Is there a reference for this? I would like to read up on it if possible.

I see you used the offset option. In my case, do you think I would need to use it too?

Thank you.

Shirley 

JacobSimonsen
Barite | Level 11

Hi Lei,

It is described in great detail in my favourite book, "Survival and Event History Analysis" by Odd Aalen and co-authors, page 223.

 

But the quick argument is like this:

 

Let T be exponentially distributed with rate exp(β). Write down the likelihood function for β.

Then let's say X is Poisson-distributed with mean t*exp(β). Write down the likelihood function for β.

Observe that these two likelihood functions are the same (up to a factor that does not depend on β) if you put log(t) as offset in the latter.
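Written out, assuming a single observation with event indicator d (d = 1 when the event is observed, as for every patient in this thread) and follow-up time t:

```latex
\begin{aligned}
\text{Exponential:}\quad & L_E(\beta) = \bigl(e^{\beta}\bigr)^{d} \exp\!\bigl(-t\,e^{\beta}\bigr),\\
\text{Poisson, mean } \mu = t\,e^{\beta} = e^{\beta + \log t}:\quad & L_P(\beta) = \frac{\mu^{d} e^{-\mu}}{d!}
 = \frac{t^{d}}{d!}\,\bigl(e^{\beta}\bigr)^{d} \exp\!\bigl(-t\,e^{\beta}\bigr)
 = \frac{t^{d}}{d!}\,L_E(\beta).
\end{aligned}
```

The factor t^d/d! does not involve β, so both likelihoods give the same estimates, standard errors and likelihood-ratio tests. With a log link, log μ = β + log t, which is why log(t) enters as an offset (a term whose coefficient is fixed at 1). Without it, the Poisson model would no longer correspond to the exponential waiting-time likelihood.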

lei
Obsidian | Level 7

Hi,

 

Thank you. I have this book. This is super helpful.

 

I also have another question, since this is your favorite book. I am also conducting a survival analysis using a Cox proportional hazards model. Here wait time is a continuous covariate, and I am looking at 5-year survival. The wait time is from surgery to chemotherapy. Clinically, it has been shown that giving chemotherapy too soon causes greater mortality and giving it too late has no effect. Therefore, the effect of wait time on survival should be nonlinear. I fitted a cubic spline model and tested for nonlinearity, and the test did not detect any, pointing to a linear relationship. I suspect this is because only about 2% of the whole cohort is in the group with a really short wait time. Is there a way to estimate the effect of wait time only from day 30 to day 120? One approach I have thought of is setting day 30 as time 0 and counting upward from there.
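If "setting day 30 as time 0" means replacing the wait-time covariate w (in days) by w − 30 in the Cox model, the identity below (a sketch assuming wait time enters as a single linear term, with s denoting survival time) suggests that the shift alone does not change the estimated coefficient, because the constant is absorbed into the unspecified baseline hazard:

```latex
h(s \mid w) = h_0(s)\, e^{\beta w}
 = \underbrace{h_0(s)\, e^{30\beta}}_{\tilde h_0(s)}\; e^{\beta (w - 30)}
```

So re-centering only changes the reference point. Restricting the analysis to patients with w ≥ 30 is a different model and can change the estimate.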

 

Does that make sense?

 

Thanks.

lei
Obsidian | Level 7

Hi Jacob,

 

I am still a bit confused. How should I interpret the results from the Poisson model? For example, my wait time is measured in days. If I model it as you described and include a covariate for age, how does that work? Is there any SAS documentation for this type of modeling? I have searched everywhere and can't find anything.

Thanks.

JacobSimonsen
Barite | Level 11

If your age variable is time-dependent (age increases as time increases), then you have to split each waiting time into smaller pieces. All but the last piece end with a censoring (count=0), and only the last piece ends with an event (or a censoring, if the original waiting time was censored). The offset for each piece should be log(length of the piece).
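Why splitting leaves the likelihood unchanged when the rate is constant: suppose one waiting time t is cut into pieces of length l_1, ..., l_K (summing to t), with d_k = 0 for k < K and d_K = d. A sketch of the product of the Poisson contributions, each with offset log l_k, where λ_k is the rate in piece k:

```latex
\prod_{k=1}^{K} \frac{(\lambda_k l_k)^{d_k}\, e^{-\lambda_k l_k}}{d_k!}
\;\propto\; \lambda_K^{\,d}\, \exp\!\Bigl(-\sum_{k=1}^{K} \lambda_k l_k\Bigr)
\;\overset{\lambda_k \equiv \lambda}{=}\; \lambda^{d}\, e^{-\lambda t}
```

Here one would take λ_k = exp(β_0 + β_1·age_k), with age_k (a hypothetical name) the age at the start of piece k, so the rate is treated as constant within each piece. If all λ_k are equal, the split data reproduce the unsplit exponential likelihood; when age changes between pieces, each piece simply uses its own rate.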

If you just want age at the beginning of the interval, simply put it into the model as a covariate, without splitting the interval.

 

On the left side of the MODEL statement you have the count, which is either 0 or 1.

 

If you want to speed up computation, you can summarize your data by covariate pattern, putting the sum of events on the left side and using log(sum of waiting times) as the offset.
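The summarized version works because, within a group g of individuals sharing the same covariate values (and hence the same rate λ_g), the log-likelihood depends on the data only through the total number of events D_g and the total time T_g. Assuming that grouping:

```latex
\sum_{i \in g} \bigl[\, d_i \log\lambda_g - \lambda_g t_i \,\bigr] = D_g \log\lambda_g - \lambda_g T_g,
\qquad D_g = \sum_{i\in g} d_i,\quad T_g = \sum_{i\in g} t_i
```

A Poisson regression with count D_g and offset log T_g has log-likelihood D_g(log λ_g + log T_g) − λ_g T_g − log D_g!, which differs from this only by terms not involving the regression parameters, so the estimates are identical. The grouping has to be by covariate pattern (and by time interval, if the data have been split).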

lei
Obsidian | Level 7

I would like age at the beginning of the interval; I am just very confused about how to interpret the coefficients. For example, I got 0.2334 for age. In addition, I tried a Cox model too. Age came out significant in both models. However, none of the other covariates were significant in the Poisson model, while half of them were significant in the Cox model. I am not sure how to understand these differences.
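With a log link, each coefficient is a log rate ratio. A worked reading of the reported age coefficient, assuming it comes from the Poisson model with log(wait) as offset (as in the GENMOD code later in this thread) and that age enters linearly:

```latex
\begin{aligned}
e^{0.2334} &\approx 1.263 \qquad \text{(rate ratio per one-unit increase in age)},\\
\frac{1}{1.263} &\approx 0.792 \qquad \text{(ratio of expected waits, since } E[T] = 1/\lambda \text{ under a constant rate)}.
\end{aligned}
```

That is, each additional unit of age (years, if that is how age is coded) multiplies the daily event rate by about 1.26, roughly a 26% higher rate, which under the constant-rate model corresponds to an expected wait about 21% shorter. Because wait time is in days, exp(intercept) is the event rate per day for a patient with all covariates at 0 (age 0, FEMALE = 0), so it is usually only interpretable after centering age.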

JacobSimonsen
Barite | Level 11
I agree that Cox regression and Poisson regression usually give almost the same results. That is because the difference between the two models is very small. It's likely that you did something wrong.
lei
Obsidian | Level 7

I ran the following code:

 

PROC GENMOD DATA = CRC.PRED1;
  CLASS FEMALE (REF='0') / PARAM=REFERENCE;
  MODEL CHEMO = AGE FEMALE / DIST=POISSON LINK=LOG OFFSET=LNWAIT1;
  ESTIMATE 'RATE' INTERCEPT 1 / EXP;
RUN;

 

PROC PHREG DATA = CRC.PRED1;
  CLASS FEMALE (REF='0') / PARAM=REFERENCE;
  MODEL WAIT1*CHEMO(0) = AGE FEMALE;
RUN;

 

Here CHEMO=1 means the event occurred and 0 means it did not. Everyone in the dataset had the event. I've attached the results; you can see they're quite different. I'm not really sure what I did wrong here, so any guidance would be really appreciated.

JacobSimonsen
Barite | Level 11

I think you did it correctly.

 

The difference must be that Cox regression allows the baseline rate to depend on the underlying time (the time since the wait started).

In contrast, Poisson regression has a constant baseline hazard, which is estimated by the intercept.
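The two hazard models side by side, with x the covariate vector and t the time since the wait started:

```latex
\begin{aligned}
\text{Cox:}\quad & h(t \mid x) = h_0(t)\, e^{x^\top\beta}, && h_0(t) \text{ left unspecified},\\
\text{Poisson (exponential):}\quad & h(t \mid x) = e^{\beta_0 + x^\top\beta}, && \text{constant in } t.
\end{aligned}
```

Both share the multiplicative covariate effect e^{x'β} and differ only in the baseline. If the true baseline varies strongly over the waiting period, forcing it to be constant can shift the β estimates and their standard errors.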

 

If you want to check whether the difference is due to time, you can divide your observations into pieces and fit a Poisson regression that includes a time-dependent variable, "time", indicating which time interval each piece covers. Remember that only the last interval can have an event.
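The split-data Poisson model described above is usually called the piecewise exponential model. A sketch, assuming cut points 0 = τ_0 < τ_1 < ... < τ_K and treating the interval index k (the "time" variable) as a CLASS variable:

```latex
\begin{aligned}
h(t \mid x) &= e^{\alpha_k + x^\top\beta}, \qquad \tau_{k-1} < t \le \tau_k,\\
\log E[d_{ik}] &= \alpha_k + x_i^\top\beta + \log r_{ik}, \qquad r_{ik} = \text{time person } i \text{ is at risk in interval } k.
\end{aligned}
```

Each interval gets its own baseline level e^{α_k}, so the baseline becomes a step function instead of a constant. With finer intervals the model moves toward the Cox model, and the β estimates typically move toward the Cox estimates.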

 

 

lei
Obsidian | Level 7

Hi, I think that makes a lot of sense.

 

I have a question, though. At the start of my planning, I actually wanted to use Cox regression. However, my dataset had no censored individuals (everyone had chemo=1). I was told by another methodologist that I can't use Cox regression, but the reasoning given to me wasn't clear at all. From your earlier explanation, it seems that I can use Cox regression. Am I correct?

 

Thanks.

JacobSimonsen
Barite | Level 11

Of course you can use Cox regression when all individuals have an event, as in this simulated example, where every individual has an observed event. The Cox regression estimates the rate ratio fairly close to the true hazard ratio.

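data simulation;
  do i=1 to 1000;
    exposure=(i<=500);                          /* first 500 individuals are exposed */
    t=rand('exponential',exp(-0.5*exposure));   /* mean exp(-0.5*exposure), so rate exp(0.5*exposure) */
    output;
  end;
run;
proc phreg data=simulation;
  model t=exposure;   /* no censoring variable given: every t is treated as an event */
run;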
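Why the true value here is 0.5: the second argument of RAND('EXPONENTIAL', θ) is a scale parameter, i.e. the mean (the same convention as rand('exponential',5) having rate 1/5 in the earlier example), so the rate is its reciprocal:

```latex
\text{rate} = \frac{1}{e^{-0.5\cdot\text{exposure}}} = e^{0.5\cdot\text{exposure}},
\qquad \text{HR} = \frac{e^{0.5}}{e^{0}} = e^{0.5} \approx 1.649
```

So the exposure coefficient from PROC PHREG should come out close to 0.5 (hazard ratio close to 1.65), even though no observation is censored.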

 

Estimation problems occur only if all individuals in one level of exposure are censored, or if all individuals in one level have their events before any of the individuals in the other levels.

