Hi everyone,
Apologies if this is not the right place to ask this question.
I am really confused at a higher level in terms of using Cox regression vs Poisson....
It is cleat to me that Cox regression models time to event, and Poisson regression models counts or rates of events.
But the applications of these models are not always as clear...
Take this example:
I am looking at a cohort of cancer patients who had treatment X, all these patients were (initially) cured after treatment. However, some of them developed cancer recurrence after a period of time. I want to look at whether recurrence is higher in the group that has large cancer to start with.
So whether the patient had recurrence or not on follow up is important, and time to recurrence is important too here.
Traditionally I analysed this kind of data/question using Cox regression with tumour size (categorical variable, large vs small) included as an independent variable in the model.
proc phreg data=have ;
Title 'Cox for recurrence';
class Sex(ref='F') Size(ref='Small') ;
model time*Censor(0)= Age Sex Size /rl;
run;
However, one could argue that a Poisson regression could also be used in this this case (assuming the median and variance are equal in the sample), modelling the number of recurrences in the group with large tumour vs small tumour.
proc glimmix data=have;
class Sex(ref='F') Size(ref='Small');
model recurrence=Age Sex Size/ dist=poisson offset=logtime s cl link=log;
run;
I am really confused here, and would appreciate any opinion regarding how to choose the best approach.
Kind regards
Am
If you assume piecewise constant hazard rates, then the likelihood function (as a function of parameteres) has same form as if the number of events had been poisson distributed. Therefore, you can use poisson regression on time to event data where you have counts on left side in the model statement. If you chop the timeaxis into finer and finer pieces, then the model will be equivalent to a cox-regression, and in that case the difference is only that the parameter of the time-effect is non-parametric in the cox-regression while it will be estimated together with other parametes in the Poisson regression model.
Be aware, there is no assumption that the counts actually are poisson distributed - and they are obviously not since the count are limited by the number of subjects in the trial, whereas if it was poisson distributed then there would not be an upper limit. Still, it is called poisson regression because of the similarity of the likelihood functions. Since data are not poisson distributed, it will not give meaning to apply model check that rely on the poison distribution ("variance = mean" does not give sense here).
Thanks @Ksharp
Some of the observations are censored, as some patients reached end of follow up time and did not have an event and some patients died of other causes.
What i have read is that a Poisson regression could actually produce exactly the same results and parameter estimates as Cox model if you split p follow up time to very small intervals ... not sure if this is the case or how to do it in SAS though....
AM
The more I read about this the more convinced I am that you can get the same results with either...
See this paper in R on Cox vs Poisson
http://bendixcarstensen.com/WntCma.pdf
Here is a quick example i made, but needs further fine tuning to split the followup time for the poisson model to smaller intervals (p values not exactly the same for both models). Any further edits are appreciated. As you can see the Risk ratios (HR or RR) are the same)
* Example based on data from https://stats.idre.ucla.edu/sas/seminars/sas-survival/;
* Data whas500 can be downloaded form https://stats.idre.ucla.edu/sas/seminars/sas-survival/;
* I downloaded tge data to X:\Data
libname x 'x:\Data';
proc format ;
value gender
0 = "Male"
1 = "Female";
run;
data whas500;
set x.whas500;
format gender gender.;
logpyrs=log(lenfol);
run;
ods output parameterestimates=Cox;
proc phreg data = whas500 ;
class gender;
model lenfol*fstat(0) = gender age/rl;
run;
Data Cox2 ;
set Cox;
combined_RR=compress(put(exp(estimate),10.4) ||'(' ||put((HRLowerCL),10.4)||'-' || put((HRupperCL),10.4)||')');
Model='Cox';
p_value=ProbChiSq;
keep Model parameter combined_rR p_value;
run;
ods output parameterestimates=Poisson;
proc glimmix data=whas500 ;
model fstat=gender age/ dist=poisson offset=logpyrs s cl link=log;
run;
Data poisson2 (rename=(effect=parameter));
set poisson;
combined_RR=compress(put(exp(estimate),10.4) ||'(' ||put(exp(lower),10.4)||'-' || put(exp(upper),10.4)||')');
Model='Poisson';
p_value=probt;
Keep model effect combined_rr p_value;
where effect ne 'Intercept';
run;
Data comb;
length model $ 8;
set cox2 poisson2;
run;
proc sort data=comb;
by parameter;
run;
proc print data=comb;
var model parameter combined_rr p_value;
run;
Here is the result, not exactly the same but pretty close. IF anyone can help splitting follow up time maybe we could get it even closer.
Obs |
Parameter |
model |
RR (95% CI) |
p_value |
1 |
AGE |
Cox |
1.0691 (1.0562-1.0822) |
0.00000 |
2 |
AGE |
Poisson |
1.0758 (1.0626-1.0891) |
0.00000 |
3 |
GENDER |
Cox |
0.9365 (0.7110-1.2336) |
0.64096 |
4 |
GENDER |
Poisson |
0.9258 (0.7032-1.2187) |
0.58169 |
If you assume piecewise constant hazard rates, then the likelihood function (as a function of parameteres) has same form as if the number of events had been poisson distributed. Therefore, you can use poisson regression on time to event data where you have counts on left side in the model statement. If you chop the timeaxis into finer and finer pieces, then the model will be equivalent to a cox-regression, and in that case the difference is only that the parameter of the time-effect is non-parametric in the cox-regression while it will be estimated together with other parametes in the Poisson regression model.
Be aware, there is no assumption that the counts actually are poisson distributed - and they are obviously not since the count are limited by the number of subjects in the trial, whereas if it was poisson distributed then there would not be an upper limit. Still, it is called poisson regression because of the similarity of the likelihood functions. Since data are not poisson distributed, it will not give meaning to apply model check that rely on the poison distribution ("variance = mean" does not give sense here).
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.