Solved: Re: Cox regression vs Poisson regression for analysis

ammarhm · Posted 05-02-2020 01:37 AM

Hi everyone,

Apologies if this is not the right place to ask this question.

I am really confused at a higher level in terms of using Cox regression vs Poisson....

It is clear to me that Cox regression models time to event, and Poisson regression models counts or rates of events.
But the applications of these models are not always as clear...

Take this example:

I am looking at a cohort of cancer patients who had treatment X. All of these patients were (initially) cured after treatment, but some of them developed a cancer recurrence after a period of time. I want to look at whether recurrence is more common in the group that had a large tumour to start with.

So whether the patient had recurrence or not on follow up is important, and time to recurrence is important too here.

Traditionally I analysed this kind of data/question using Cox regression with tumour size (categorical variable, large vs small) included as an independent variable in the model.

proc phreg data=have ;
Title 'Cox for recurrence';
class Sex(ref='F')   Size(ref='Small') ;
model time*Censor(0)=   Age Sex Size /rl;
run;

However, one could argue that a Poisson regression could also be used in this case (assuming the mean and variance are equal in the sample), modelling the number of recurrences in the group with large tumours vs small tumours.

proc glimmix data=have;
class Sex(ref='F')   Size(ref='Small');
model recurrence=Age Sex Size/ dist=poisson offset=logtime s cl link=log;
run;

I am really confused here, and would appreciate any opinion regarding how to choose the best approach.

Kind regards

Am

Ksharp · Posted 05-02-2020 07:17 AM

Yeah. They have something in common.
If your data are not censored, Poisson regression is also a suitable model; otherwise use survival analysis.

ammarhm · Posted 05-02-2020 07:31 AM

Thanks @Ksharp

Some of the observations are censored, as some patients reached end of follow up time and did not have an event and some patients died of other causes.

What I have read is that a Poisson regression can actually produce exactly the same results and parameter estimates as a Cox model if you split the follow-up time into very small intervals. I am not sure whether this is the case, or how to do it in SAS, though.

AM

Ksharp · Posted 05-02-2020 07:48 AM

Yes, I think so. I don't know whether the parameter estimates are the same as in survival analysis, because I have not tested it.

But you really can use Poisson regression if the data are NOT censored.
Check the PROC GENMOD documentation; it has an example of fitting a Poisson model to survival data. Put the log of the follow-up time in the OFFSET= option, so that COUNT is measured per unit of time.

proc genmod.......;
/* logtime = log(time), computed beforehand in a DATA step */
model count=x y z/ offset=logtime dist=poisson;
run;
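
A minimal sketch of the offset approach, assuming the data set `have` from the first post with one row per patient, a 0/1 event indicator `recurrence` and a follow-up time `time`. The names `have2` and `logtime` are introduced here for illustration only. Because the model uses a log link, the offset has to be on the log scale.

```sas
/* Assumption: have contains recurrence (0/1), time (> 0), Age, Sex, Size */
data have2;
   set have;
   /* rows with non-positive time get a missing offset and are dropped by the procedure */
   if time > 0 then logtime = log(time);
run;

proc genmod data=have2;
   class Sex(ref='F') Size(ref='Small') / param=ref;
   /* log(rate) = intercept + covariates, with log(time) as the offset */
   model recurrence = Age Sex Size / dist=poisson link=log offset=logtime;
run;
```

With a single row per patient this corresponds to a constant hazard over the whole follow-up, which is a stronger assumption than the Cox model makes.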

ammarhm · Posted 05-02-2020 08:40 PM

The more I read about this the more convinced I am that you can get the same results with either...

See this paper in R on Cox vs Poisson

http://bendixcarstensen.com/WntCma.pdf

Here is a quick example I made. It still needs fine-tuning to split the follow-up time for the Poisson model into smaller intervals (the p-values are not exactly the same for the two models). Any further edits are appreciated. As you can see, the risk ratios (HR or RR) are very close.

* Example based on data from https://stats.idre.ucla.edu/sas/seminars/sas-survival/;
* Data whas500 can be downloaded from https://stats.idre.ucla.edu/sas/seminars/sas-survival/;
* I downloaded the data to X:\Data;

libname x 'x:\Data';


proc format ;
value gender
	  0 = "Male"
	  1 = "Female";
run;

data whas500;
set x.whas500;
format gender gender.;
logpyrs=log(lenfol);
run;

ods output parameterestimates=Cox;

proc phreg data = whas500 ;
class gender;
model lenfol*fstat(0) = gender age/rl;
run;

Data Cox2 ;
set Cox;
combined_RR=compress(put(exp(estimate),10.4) ||'(' ||put((HRLowerCL),10.4)||'-' || put((HRupperCL),10.4)||')');
Model='Cox';
p_value=ProbChiSq;
keep Model parameter combined_rR p_value;

run;

ods output parameterestimates=Poisson;
proc glimmix data=whas500 ;
model  fstat=gender age/ dist=poisson offset=logpyrs s cl link=log;
run;

Data poisson2 (rename=(effect=parameter));
set poisson;
combined_RR=compress(put(exp(estimate),10.4) ||'(' ||put(exp(lower),10.4)||'-' || put(exp(upper),10.4)||')');
Model='Poisson';
p_value=probt;
Keep model effect combined_rr p_value;
where effect ne 'Intercept';
run;
Data comb;
length  model $ 8;
set cox2 poisson2;
run;

proc sort data=comb;
by   parameter;
run;
proc print data=comb;
var model parameter combined_rr p_value;
run;

Here is the result: not exactly the same, but pretty close. If anyone can help with splitting the follow-up time, maybe we could get it even closer.

Obs	Parameter	model	RR (95% CI)	p_value
1	AGE	Cox	1.0691 (1.0562-1.0822)	0.00000
2	AGE	Poisson	1.0758 (1.0626-1.0891)	0.00000
3	GENDER	Cox	0.9365 (0.7110-1.2336)	0.64096
4	GENDER	Poisson	0.9258 (0.7032-1.2187)	0.58169

Ksharp · Posted 05-03-2020 07:12 AM

Maybe @StatDave @Rick_SAS @lvm could give you an exact answer.

JacobSimonsen · Posted 05-05-2020 05:42 AM

If you assume piecewise constant hazard rates, the likelihood function (as a function of the parameters) has the same form as if the number of events were Poisson distributed. You can therefore use Poisson regression on time-to-event data, with the counts on the left-hand side of the MODEL statement. If you chop the time axis into finer and finer pieces, the model becomes equivalent to a Cox regression. The only difference is that the time effect (the baseline hazard) is left non-parametric in the Cox regression, whereas in the Poisson regression it is estimated together with the other parameters.
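
The equivalence can be written out as a short derivation. This is a sketch in notation introduced here for illustration, not taken from the post: subject $i$ spends time $t_{ij}$ at risk in interval $j$, $d_{ij}$ is 1 if the event occurs in that interval and 0 otherwise, and the hazard is assumed to be $\lambda_{ij} = \exp(\alpha_j + x_i^\top \beta)$.

```latex
% Likelihood under piecewise constant hazards
L_{\text{surv}}(\alpha, \beta)
  = \prod_{i} \prod_{j} \lambda_{ij}^{\,d_{ij}} \exp(-\lambda_{ij} t_{ij})

% Likelihood if d_{ij} were Poisson with mean \mu_{ij} = \lambda_{ij} t_{ij}
L_{\text{Pois}}(\alpha, \beta)
  = \prod_{i} \prod_{j}
    \frac{(\lambda_{ij} t_{ij})^{d_{ij}} \exp(-\lambda_{ij} t_{ij})}{d_{ij}!}
  = L_{\text{surv}}(\alpha, \beta) \cdot \prod_{i} \prod_{j} \frac{t_{ij}^{\,d_{ij}}}{d_{ij}!}

% The extra factor does not involve (\alpha, \beta), so both likelihoods are
% maximised by the same estimates. On the log scale the Poisson mean is
\log \mu_{ij} = \log t_{ij} + \alpha_j + x_i^\top \beta
% i.e. a log-linear model with offset \log t_{ij}.
```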

Be aware that there is no assumption that the counts are actually Poisson distributed. They obviously are not, since the counts are bounded by the number of subjects in the trial, whereas a Poisson distributed count has no upper limit. It is still called Poisson regression because of the similarity of the likelihood functions. Since the data are not Poisson distributed, it makes no sense to apply model checks that rely on the Poisson distribution (checking "variance = mean" is not meaningful here).
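
One way the splitting of the time axis could be sketched in SAS, assuming the `whas500` data set from the example earlier in the thread (`lenfol` = follow-up time in days, `fstat` = 1 for an event and 0 for censoring). The 100-day interval width and the names `split`, `width`, `last`, `interval`, `event` and `pyrs` are hypothetical choices made for this illustration, not something taken from the thread.

```sas
/* One output row per subject per interval in which the subject is at risk */
data split;
   set whas500;
   width = 100;                          /* hypothetical interval width in days */
   last  = ceil(lenfol / width);         /* interval in which follow-up ends    */
   do interval = 1 to last;
      start   = (interval - 1) * width;
      stop    = min(lenfol, interval * width);
      pyrs    = stop - start;            /* time at risk within this interval   */
      logpyrs = log(pyrs);
      /* an event, if any, is counted only in the final interval */
      event   = (fstat = 1 and interval = last);
      output;
   end;
   keep interval gender age event pyrs logpyrs;
run;

/* Piecewise exponential model: a separate baseline rate for each interval */
proc genmod data=split;
   class interval;
   model event = interval gender age / dist=poisson link=log offset=logpyrs;
run;
```

As the width shrinks, the estimates for `gender` and `age` should move towards the Cox estimates. Intervals that contain no events have no finite rate estimate, so they would probably need to be merged with a neighbouring interval before fitting.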
