Hello @john1111,
You received good advice from @RW9 as to how you could simplify your code. It turns out that dataset PRED (the new version without redundant duplicate observations) can be created with this much shorter code:
data pred;
set xxxx2(obs=1 keep=estimate);
do count_y=0 to 80;
if count_y in (5, 12, 20, 30, 45, 50, 60) then do;
probability_FP=0;
do i=-4 to 4;
probability_FP+(5-abs(i))/5*pdf("poisson",count_y+i,exp(estimate));
end;
end;
else probability_FP=pdf("poisson",count_y,exp(estimate));
output;
end;
keep count_y probability_FP;
run;
Note that I do use the SET statement from your first approach (in order to avoid hardcoding the value of ESTIMATE), but enhanced with the appropriate dataset options so that only variable ESTIMATE from one observation of dataset XXXX2 is used.
Now it's fairly obvious why the probabilities are not summing up to 1: For the "exceptional" COUNT_Y values (5, 12, etc.) the probabilities that would have been used otherwise (see the case i=0 in my code!) are substantially increased by eight additional terms (see cases i=-4, ..., -1, 1, ..., 4). For example, for COUNT_Y=50 the calculation amounts to:
data _null_;
estimate=3.99793;
x+0.2*pdf("poisson",46,exp(estimate))
+0.4*pdf("poisson",47,exp(estimate))
+0.6*pdf("poisson",48,exp(estimate))
+0.8*pdf("poisson",49,exp(estimate))
+1.0*pdf("poisson",50,exp(estimate))
+0.8*pdf("poisson",51,exp(estimate))
+0.6*pdf("poisson",52,exp(estimate))
+0.4*pdf("poisson",53,exp(estimate))
+0.2*pdf("poisson",54,exp(estimate));
put x=;
run;
The result is approximately 0.2266, which replaces pdf("poisson",50,exp(3.99793))=0.0465... in the overall sum, which would be cdf('poisson',80,exp(3.99793))=0.999532... without the exceptions.
Look at the bar plot created by the code below to see what happened:
proc sgplot data=pred;
vbar count_y / response=probability_FP;
run;
... View more