Should I use "Offset" or add a "rate" variable in the data set for Poisson regression?

Contributor
Posts: 40

Hi All,

 

I ran into an issue with a Poisson regression analysis. My data contain the number of events ("counts") and the number of days of observation ("days") for each record. I am focusing on the rate, which is counts/days. The usual way to fit rate data with Poisson regression is to use an offset:

 

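model event = treat / dist=poisson offset=logdays;   (where logdays = log(days) is computed beforehand, since OFFSET= takes a variable name)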

 

However, I am wondering if it makes sense to add a "rate" variable in the data set by doing: event_rate = events/days, and then fit the Poisson model:

 

Model event_rate = treat.

 

Thanks for your comments and help.

 


All Replies
Respected Advisor
Posts: 4,747

Re: Should I use "Offset" or add a "rate" variable in the data set for Poisson r

[ Edited ]

Unless your number of days varies very little, you should stick with the offset model to get valid inferences.

 

If you decide to go with event_rate, it should be defined as event/days*meanDays, where meanDays is the mean number of days in your data.

PG
Contributor
Posts: 40

Re: Should I use "Offset" or add a "rate" variable in the data set for Poisson r

Hi PG,

 

Thanks for the comment. That is very helpful. Just for my own education: why does the offset model give better inference? Mathematically, the offset model and the event_rate model are the same, right?

 

log(u/t) = alpha+beta*x --> log(u) - log(t) = alpha+beta*x --> log(u) = log(t) + alpha+beta*x

 

 

Thanks,

Solution
‎01-13-2017 09:19 AM
Respected Advisor
Posts: 4,747

Re: Should I use "Offset" or add a "rate" variable in the data set for Poisson r

[ Edited ]

The models look the same, but their likelihood equations are not. The estimates will be close (for the offset and event_rate = event/days models), but the standard errors of the estimates will differ. Look at this small simulation:

 

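/* Simulate 20 Poisson counts at each of two exposure lengths (10 and 100 days) */
data a;
    call streaminit(7576);
    rate = 10;                                           /* true event rate per day */
    do days = 10, 100;
        logDays = log(days);                             /* offset variable */
        do i = 1 to 20;
            event = rand("Poisson", rate*days);
            event_rate1 = event / days;                  /* rate per day */
            event_rate2 = event / days * mean(10, 100);  /* rate per 55 days (mean exposure) */
            event_rate3 = event / days * 1000;           /* rate per 1000 days */
            output;
        end;
    end;
run;

/* Show only the parameter estimates table for every step that follows */
ods select ParameterEstimates(persist);

/* Count as response, log(days) as offset */
title "Estimate of log number per day with Offset";
proc glimmix data=a;
    model event = / dist=poisson offset=logDays solution;
run;

/* Rate as response, no offset */
title "Estimate of log rate per day";
proc glimmix data=a;
    model event_rate1 = / dist=poisson solution;
run;

/* Rate rescaled by the mean exposure */
title "Estimate of log rate per 55 days";
proc glimmix data=a;
    model event_rate2 = / dist=poisson solution;
run;

/* Rate rescaled by an arbitrary large factor */
title "Estimate of log rate per 1000 days";
proc glimmix data=a;
    model event_rate3 = / dist=poisson solution;
run;
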
                     Estimate of log number per day with Offset       

                               The GLIMMIX Procedure

                                Parameter Estimates

                                  Standard
         Effect       Estimate       Error       DF    t Value    Pr > |t|
         Intercept      2.3091    0.006720       39     343.62      <.0001
		 
                            Estimate of log rate per day                          

                                Parameter Estimates

                                  Standard
         Effect       Estimate       Error       DF    t Value    Pr > |t|
         Intercept      2.2992     0.05008       39      45.91      <.0001
		 
                          Estimate of log rate per 55 days                        

                                Parameter Estimates

                                  Standard
         Effect       Estimate       Error       DF    t Value    Pr > |t|
         Intercept      6.3065    0.006753       39     933.83      <.0001
		 
                         Estimate of log rate per 1000 days                       

                                Parameter Estimates

                                  Standard
         Effect       Estimate       Error       DF    t Value    Pr > |t|
         Intercept      9.2070    0.001584       39    5813.16      <.0001
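
The pattern in these standard errors can be reproduced by hand. What follows is a sketch for the intercept-only case and is not part of the original simulation code. The notation is introduced here for illustration: $y_i$ is the event count, $t_i$ the number of days, $n$ the number of observations, $\lambda$ the true rate per day, and $c$ the factor by which the rate is rescaled (1, 55, or 1000 above). The standard errors are the model-based ones, taken from the Poisson information evaluated near the true rate.

```latex
\begin{align*}
% Offset model: y_i ~ Poisson(t_i e^{\beta})
\hat\beta_{\text{offset}} &= \log\frac{\sum_i y_i}{\sum_i t_i},
  & \operatorname{SE} &\approx \frac{1}{\sqrt{\lambda \sum_i t_i}} \\[4pt]
% Scaled rate model: c y_i / t_i treated as if it were Poisson(e^{\beta_c})
\hat\beta_{c} &= \log c + \log\Bigl(\frac{1}{n}\sum_i \frac{y_i}{t_i}\Bigr),
  & \operatorname{SE} &\approx \frac{1}{\sqrt{n\,c\,\lambda}} \\[8pt]
% Plugging in lambda = 10, n = 40, sum of t_i = 20*10 + 20*100 = 2200
\text{offset:}\quad & \frac{1}{\sqrt{10 \cdot 2200}} \approx 0.0067,
  & c = 1:\quad & \frac{1}{\sqrt{40 \cdot 1 \cdot 10}} = 0.05 \\
c = 55:\quad & \frac{1}{\sqrt{40 \cdot 55 \cdot 10}} \approx 0.0067,
  & c = 1000:\quad & \frac{1}{\sqrt{40 \cdot 1000 \cdot 10}} \approx 0.0016
\end{align*}
```

The two standard errors agree only when $c = \sum_i t_i / n$, the mean number of days, which is consistent with the advice to scale the rate by meanDays. The scale factor also shifts the estimate by $\log c$ ($\log 55 \approx 4.0073$, $\log 1000 \approx 6.9078$), so the per-day log rate is recovered by subtracting it: $6.3065 - 4.0073 = 2.2992$ and $9.2070 - 6.9078 = 2.2992$. The small gap between 2.3091 and 2.2992 reflects the two different point estimators in the first column.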

 

PG
Contributor
Posts: 40

Re: Should I use "Offset" or add a "rate" variable in the data set for Poisson r

Thanks @PG, this is very helpful!
Contributor
Posts: 40

Re: Should I use "Offset" or add a "rate" variable in the data set for Poisson r

Again, thanks @PG, and sorry for having so many questions. Your simulation is very helpful. However, if one wants to pick a single estimate, which one should be used?

It seems that the "Offset" model gives an estimate very similar to that of the "Rate" model, although the SEs differ. The "Rate*Mean(days)" model gives a completely different estimate, but an SE similar to that of the offset model.

I also ran PROC GENMOD on my real data with both the offset model and the rate model. The two give fairly close results in terms of both the estimate and the SE.

 

Here are my code snippets:

 

1. Poisson model with offset

 

proc genmod data=final;
 title "BV-003 Part 2 - Poisson Model";
 class ptno treat;

 model numcr = treat /type3 dist=poi offset=logdayct;
 repeated subject=ptno;

 estimate 'D vs A' treat -1 1 0/exp;
 estimate 'E vs A' treat -1 0 1/exp;
 estimate 'D vs E' treat 0 1 -1/exp;
run;

2. Poisson model with rate data:

 

proc genmod data=final;
 title "BV-003 Part 2 - Poisson Model on the Cramp/Day Ratio";
 class ptno treat;

 model crp_day = treat /type3 dist=poi;
 repeated subject=ptno;

 estimate 'D vs A' treat -1 1 0/exp;
 estimate 'E vs A' treat -1 0 1/exp;
 estimate 'D vs E' treat 0 1 -1/exp;
run;

And here are the outputs:

For 1: [image attachment 1.PNG]

For 2: [image attachment 2.PNG]

 

Super Contributor
Posts: 287

Re: Should I use "Offset" or add a "rate" variable in the data set for Poisson r

I think you can model the rate and get exactly the same estimates and standard errors as when you model the count, but the person-years must then be used in the WEIGHT statement.

The two PROC GENMOD steps below seem to give exactly the same output.

data test;
    do i = 1 to 100;
        pyrs = rand('uniform', 0, 5);
        a = rand('bernoulli', 0.5);
        y = rand('poisson', pyrs*exp(2 + a*1));
        logpyrs = log(pyrs);
        rate = y/pyrs;
        output;
    end;
run;

/* Count as response, log person-years as offset */
proc genmod data=test;
    class a(ref="0");
    model y = a / dist=poisson link=log offset=logpyrs;
run;

/* Rate as response, person-years as weight */
proc genmod data=test;
    class a(ref="0");
    model rate = a / dist=poisson link=log;
    weight pyrs;
run;

 

Contributor
Posts: 40

Re: Should I use "Offset" or add a "rate" variable in the data set for Poisson r

Thanks, @JacobSimonsen. This is very helpful.
Super Contributor
Posts: 287

Re: Should I use "Offset" or add a "rate" variable in the data set for Poisson r

It's because the expression for the likelihood function, $\lambda^{x} e^{-\mathrm{pyrs}\cdot\lambda}$, is the same as $\left(\lambda^{x/\mathrm{pyrs}} e^{-\lambda}\right)^{\mathrm{pyrs}}$.
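
Written out for a single observation, the identity looks as follows. This is a sketch in notation that does not appear in the thread: $x$ is the event count, $p$ the person-years (`pyrs`), $r = x/p$ the observed rate, and $\lambda$ the modeled rate (the exponentiated linear predictor). It assumes that the WEIGHT variable multiplies each observation's Poisson log-likelihood contribution with the dispersion fixed at 1, and it drops terms that do not involve $\lambda$.

```latex
\begin{align*}
% Offset model: the count x is Poisson with mean p * lambda
\ell_{\text{offset}}(\lambda)
  &= x \log(p\lambda) - p\lambda - \log x! \\
  &= x \log\lambda - p\lambda + \text{const} \\[6pt]
% Weighted rate model: the rate r = x / p is given weight p
\ell_{\text{weighted}}(\lambda)
  &= p \,\bigl[\, r \log\lambda - \lambda \,\bigr] \\
  &= x \log\lambda - p\lambda
\end{align*}
```

Both contributions depend on $\lambda$ in the same way, so the score equations and the information matrix coincide. That is consistent with the two PROC GENMOD steps returning the same estimates and standard errors, whereas the unweighted rate model drops the factor $p$ and so counts every observation as if it had one unit of exposure.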
