Re: Parametric bootstrap: Poisson regression

T_Reddy · Posted 04-26-2020 05:27 PM

Hi there. I am trying to use the parametric bootstrap to construct prediction intervals for 5, 10 and 15 day forecasts of the cumulative cases of a disease . I have fitted a three parameter logistic model to cumulative cases and use these parameter estimates to simulate daily new cases (the derivative of cumulative cases which is a function of cumulative cases) for the 1000 datasets. My bootstrap confidence intervals do not contain the true value for the various predictions though and I am not sure what I am doing wrong. I would greatly appreciate any assistance.

/* Simulate multiple samples from a regression model */

data explanatory;set covid_window;keep time i_25 i_30 i_35 i_40;where time le 48;run;

proc print data=parm2_log3pl;var parm_log3pl est_log3pl;run;

%let NumSamples = 1000; /* number of samples */

data RegSim;

call streaminit(123);

set Explanatory; /* implicit loop over obs*/

ObsNum = _N_; /* observation number */

b1=2037.03 ;

b11=2555.72;

b2=0.3301;

b21=-0.2659;

b24=0.07861;

b23=0.001893;

b22=0.003788;

b3=22.8728;

b31=17.8590;

gamma=b2+(b21*i_25)+(b22*i_30)+(b23*i_35)+(b24*i_40);

eta=b3+(b31*i_25);

alpha=b1+(b11*i_25);

mu = alpha/(1+exp(-gamma*(time-eta)));

lambda = (gamma*mu)*(1-(mu/alpha));

do SampleID = 1 to &NumSamples;

Y = rand("Poisson",lambda);

output;

end;

run;

proc sort data=RegSim; /* sort for BY-group processing */

by SampleID ObsNum;

run;

/********* Now generate cumulative************/

data RegSim;set RegSim;by SampleID;

if first.SampleID then

do;

total_cases = 0;

end;

total_cases + Y;

run;

data pred_window;

do SampleID=1 to 1000;

do time=49 to 100;

output;

end;

run;

data covid_window;set RegSim pred_window;run;

data covid_window;set covid_window;

i_25=.;if time le 25 then i_25=0; if time gt 25 then i_25=1;

i_30=.;if time le 30 then i_30=0; if time gt 30 then i_30=1;

i_35=.;if time le 35 then i_35=0; if time gt 35 then i_35=1;

i_40=.;if time le 40 then i_40=0; if time gt 40 then i_40=1;

run;

proc sort data=covid_window;by SampleID time;run;

proc nlmixed data=covid_window gconv=0;

parms b1=2000 b11=2916 b2=0.43 b21=-0.35 b24=0 b23=0 b22=0 b3=20 b31=18;

by SampleID;

gamma=b2+(b21*i_25)+(b22*i_30)+(b23*i_35)+(b24*i_40);

eta=b3+(b31*i_25);

alpha=b1+(b11*i_25);

mu = alpha/(1+exp(-gamma*(time-eta)));

model total_cases ~ poisson(mu);

predict mu out=outputBS_log3pl;

run;

proc sort data=outputBS_log3pl;by time sampleID;run;

proc univariate data=outputBS_log3pl noprint;

var pred;

by time;

output out=Pctls pctlpts = 2.5 97.5

pctlpre = mu

pctlname = lb_2_5 ub_97_5;

run;

Rick_SAS · Posted 04-27-2020 11:00 AM

Can you post a few observations from the EXPLANATORY data?

T_Reddy · Posted 04-27-2020 02:55 PM

time i_25 i_30 i_35 i_40

1	0	0	0	0
2	0	0	0	0
3	0	0	0	0
4	0	0	0	0
5	0	0	0	0
6	0	0	0	0
7	0	0	0	0
8	0	0	0	0
9	0	0	0	0
10	0	0	0	0
11	0	0	0	0
12	0	0	0	0
13	0	0	0	0
14	0	0	0	0
15	0	0	0	0
16	0	0	0	0
17	0	0	0	0
18	0	0	0	0
19	0	0	0	0
20	0	0	0	0
21	0	0	0	0
22	0	0	0	0
23	0	0	0	0
24	0	0	0	0
25	0	0	0	0
26	1	0	0	0
27	1	0	0	0
28	1	0	0	0
29	1	0	0	0
30	1	0	0	0
31	1	1	0	0
32	1	1	0	0
33	1	1	0	0
34	1	1	0	0
35	1	1	0	0
36	1	1	1	0
37	1	1	1	0
38	1	1	1	0
39	1	1	1	0
40	1	1	1	0
41	1	1	1	1
42	1	1	1	1
43	1	1	1	1
44	1	1	1	1
45	1	1	1	1
46	1	1	1	1
47	1	1	1	1
48	1	1	1	1

Rick_SAS · Posted 04-27-2020 03:10 PM

According to my computations, the simulation is working fine. The parameter estimates are close to the parameter values and the confidence intervals encompass the parameter values. If you are seeing something strange, please describe it in detail for NUmSamples=3.

/* Simulate multiple samples from a regression model */
data explanatory;
input time i_25 i_30 i_35 i_40;
datalines;
1	0	0	0	0
2	0	0	0	0
3	0	0	0	0
4	0	0	0	0
5	0	0	0	0
6	0	0	0	0
7	0	0	0	0
8	0	0	0	0
9	0	0	0	0
10	0	0	0	0
11	0	0	0	0
12	0	0	0	0
13	0	0	0	0
14	0	0	0	0
15	0	0	0	0
16	0	0	0	0
17	0	0	0	0
18	0	0	0	0
19	0	0	0	0
20	0	0	0	0
21	0	0	0	0
22	0	0	0	0
23	0	0	0	0
24	0	0	0	0
25	0	0	0	0
26	1	0	0	0
27	1	0	0	0
28	1	0	0	0
29	1	0	0	0
30	1	0	0	0
31	1	1	0	0
32	1	1	0	0
33	1	1	0	0
34	1	1	0	0
35	1	1	0	0
36	1	1	1	0
37	1	1	1	0
38	1	1	1	0
39	1	1	1	0
40	1	1	1	0
41	1	1	1	1
42	1	1	1	1
43	1	1	1	1
44	1	1	1	1
45	1	1	1	1
46	1	1	1	1
47	1	1	1	1
48	1	1	1	1
;

%let NumSamples = 3; /* number of samples */

data RegSim;
call streaminit(123);
set Explanatory; /* implicit loop over obs*/
ObsNum = _N_; /* observation number */
b1=2037.03 ;
b11=2555.72;
b2=0.3301;
b21=-0.2659;
b24=0.07861;
b23=0.001893;
b22=0.003788;
b3=22.8728;
b31=17.8590;
gamma=b2+(b21*i_25)+(b22*i_30)+(b23*i_35)+(b24*i_40);
eta=b3+(b31*i_25);
alpha=b1+(b11*i_25);
mu = alpha/(1+exp(-gamma*(time-eta)));
lambda = (gamma*mu)*(1-(mu/alpha));
do SampleID = 1 to &NumSamples;
   Y = rand("Poisson",lambda);
   output;
end;
drop b: gamma eta alpha mu lambda;
run;

proc sort data=RegSim; /* sort for BY-group processing */
by SampleID ObsNum;
run;

proc sgplot data=Regsim;
series x=Time y=Y / group=SampleID;
run;
title;

/********* Now generate cumulative************/
data RegSimCum;set RegSim;by SampleID;
if first.SampleID then do;
   total_cases = 0;
end;
total_cases + Y;
run;

proc nlmixed data=RegSimCum gconv=0;
by SampleID;
   parms b1=2000 b11=2916 b2=0.43 b21=-0.35 b24=0 b23=0 b22=0 b3=20 b31=18;
   gamma=b2+(b21*i_25)+(b22*i_30)+(b23*i_35)+(b24*i_40);
   eta=b3+(b31*i_25);
   alpha=b1+(b11*i_25);
   mu = alpha/(1+exp(-gamma*(time-eta)));
   model total_cases ~ poisson(mu);
   predict mu out=outputBS_log3pl;
   ODS SELECT  ParameterEstimates;
run;

T_Reddy · Posted 04-28-2020 07:01 AM

Thank you for the speedy response. The issue is that the confidence interval estimated does not contain the value of the actual point estimate (for example the prediction for day 53 is close to 3900 but both the lower and upper bound of the CI are above 4000). It could signal an issue with the algorithm possibly

Rick_SAS · Posted 04-28-2020 07:11 AM

There are MANY estimates in this problem, and I do not know whether you are talking about regression estimates, predicted values (in some data set?), or something else. Please provide a complete example that uses SAS code we can run.

SteveDenham · Posted 04-28-2020 09:33 AM

I think the problem may lie in the area of the data with which the model is being fit. The model is fit to data up to day 48, and then is used in forecasting mode from day 49 to day 100. To have one data point outside the 95% bounds calculated by bootstrapping doesn't sound so bad (assuming the true value at day 53 is what is being compared to the forecast interval at day 53). I would like to think that the parameter estimates would differ considerably if you fit out to day 60 for instance. The data point that seems incorrectly forecast would likely have a large leverage, and cause a shift in the estimates, with a resulting difference in the forecast values and the confidence bounds compared to the original model.

Like @Rick_SAS says, please post data and code.

SteveDenham

T_Reddy · Posted 05-07-2020 03:27 PM

Thank you for your response. I have been examining the predictions and it looks like the issue is how I have programmed the piecewise model (splines). I use a separate dummy variable for each period (i_25, i_30, I_35 etc) and this results in a discontinuity, I would need to reparameterize, Is there any simple way (like the effect statement) in Proc NLIN or NLMIXED?

Rick_SAS · Posted 05-07-2020 03:57 PM

There is no need to generate the spline bases yourself. You can get SAS to generate it for you. See the article "Visualize a regression with splines," which shows how to use the OUTDESIGN= option in PROC GLMSELECT to generate the spline basis.

T_Reddy · Posted 05-08-2020 06:38 AM

This is indeed extremely useful! My model is nonlinear (mu = alpha/(1+exp(-gamma*(time-eta))). It does not appear that any of the procedures which support the effect statement will be able to accomodate our modeling approach.

Rick_SAS · Posted 05-08-2020 06:42 AM

I'm not sure what you are saying. After you generate the spline basis, you can use those variables in any procedure.

1	0	0	0	0
2	0	0	0	0
3	0	0	0	0
4	0	0	0	0
5	0	0	0	0
6	0	0	0	0
7	0	0	0	0
8	0	0	0	0
9	0	0	0	0
10	0	0	0	0
11	0	0	0	0
12	0	0	0	0
13	0	0	0	0
14	0	0	0	0
15	0	0	0	0
16	0	0	0	0
17	0	0	0	0
18	0	0	0	0
19	0	0	0	0
20	0	0	0	0
21	0	0	0	0
22	0	0	0	0
23	0	0	0	0
24	0	0	0	0
25	0	0	0	0
26	1	0	0	0
27	1	0	0	0
28	1	0	0	0
29	1	0	0	0
30	1	0	0	0
31	1	1	0	0
32	1	1	0	0
33	1	1	0	0
34	1	1	0	0
35	1	1	0	0
36	1	1	1	0
37	1	1	1	0
38	1	1	1	0
39	1	1	1	0
40	1	1	1	0
41	1	1	1	1
42	1	1	1	1
43	1	1	1	1
44	1	1	1	1
45	1	1	1	1
46	1	1	1	1
47	1	1	1	1
48	1	1	1	1

1	0	0	0	0
2	0	0	0	0
3	0	0	0	0
4	0	0	0	0
5	0	0	0	0
6	0	0	0	0
7	0	0	0	0
8	0	0	0	0
9	0	0	0	0
10	0	0	0	0
11	0	0	0	0
12	0	0	0	0
13	0	0	0	0
14	0	0	0	0
15	0	0	0	0
16	0	0	0	0
17	0	0	0	0
18	0	0	0	0
19	0	0	0	0
20	0	0	0	0
21	0	0	0	0
22	0	0	0	0
23	0	0	0	0
24	0	0	0	0
25	0	0	0	0
26	1	0	0	0
27	1	0	0	0
28	1	0	0	0
29	1	0	0	0
30	1	0	0	0
31	1	1	0	0
32	1	1	0	0
33	1	1	0	0
34	1	1	0	0
35	1	1	0	0
36	1	1	1	0
37	1	1	1	0
38	1	1	1	0
39	1	1	1	0
40	1	1	1	0
41	1	1	1	1
42	1	1	1	1
43	1	1	1	1
44	1	1	1	1
45	1	1	1	1
46	1	1	1	1
47	1	1	1	1
48	1	1	1	1

SAS Innovate 2025: Call for Content

1	0	0	0	0
2	0	0	0	0
3	0	0	0	0
4	0	0	0	0
5	0	0	0	0
6	0	0	0	0
7	0	0	0	0
8	0	0	0	0
9	0	0	0	0
10	0	0	0	0
11	0	0	0	0
12	0	0	0	0
13	0	0	0	0
14	0	0	0	0
15	0	0	0	0
16	0	0	0	0
17	0	0	0	0
18	0	0	0	0
19	0	0	0	0
20	0	0	0	0
21	0	0	0	0
22	0	0	0	0
23	0	0	0	0
24	0	0	0	0
25	0	0	0	0
26	1	0	0	0
27	1	0	0	0
28	1	0	0	0
29	1	0	0	0
30	1	0	0	0
31	1	1	0	0
32	1	1	0	0
33	1	1	0	0
34	1	1	0	0
35	1	1	0	0
36	1	1	1	0
37	1	1	1	0
38	1	1	1	0
39	1	1	1	0
40	1	1	1	0
41	1	1	1	1
42	1	1	1	1
43	1	1	1	1
44	1	1	1	1
45	1	1	1	1
46	1	1	1	1
47	1	1	1	1
48	1	1	1	1