BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
john1111
Obsidian | Level 7

I have written a procedure that uses a specific estimate to compute Poisson probabilities for values between 0 to 80.

But at specific values (5, 12, 20, 30, 45, 50, 60), the procedure for computing the probability is a little different as in the code. 

 

DATA xxxx2;
  SET final_sim_data;
  estimate=3.99793;
  IF x1~=1 or x2~=1 or x3~=1 then DELETE;
run;

data pred; set xxxx2; array v[11]; array n[11]; array sf[11]; array t[11]; array f[11]; array a[11]; array w[11]; do i = 1 to 11; n[i] = i + 14; end; do i = 1 to 11; sf[i] = i + 44; end; do i = 1 to 11; v[i] = i + 54; end; do i = 1 to 11; t[i] = i + 24; end; do i = 1 to 11; f[i] = i + 39; end; do i = 1 to 11; a[i] = i - 1; end; do i = 1 to 11; w[i] = i + 6; end; do count_y= 0 to 80; if count_y ~= 5 and count_y ~=12 and count_y ~=20 and count_y ~=30 and count_y ~=45 and count_y ~=50 and count_y ~=60 then probability_FP= pdf("poisson",count_y,exp(estimate)); else if count_y= 20 then do; p = 0; do i = 1 to 6; p = p + ((n[i]-15)/5)*(pdf("poisson",n[i],exp(estimate))); end; do i = 7 to 11; p = p + ((25-n[i])/5)*(pdf("poisson",n[i],exp(estimate))); end; probability_FP=p; end; else if count_y= 60 then do; sum = 0; do i = 1 to 6; sum = sum + ((v[i]-55)/5)*(pdf("poisson",v[i],exp(estimate))); end; do i = 7 to 11; sum = sum + ((65-v[i])/5)*(pdf("poisson",v[i],exp(estimate))); end; probability_FP=sum; end; else if count_y= 30 then do; th = 0; do i = 1 to 6; th = th + ((t[i]-25)/5)*(pdf("poisson",t[i],exp(estimate))); end; do i = 7 to 11; th = th + ((35-t[i])/5)*(pdf("poisson",t[i],exp(estimate))); end; probability_FP=th; end; else if count_y= 50 then do; sd = 0; do i = 1 to 6; sd = sd + ((sf[i]-45)/5)*(pdf("poisson",sf[i],exp(estimate))); end; do i = 7 to 11; sd = sd + ((55-sf[i])/5)*(pdf("poisson",sf[i],exp(estimate))); end; probability_FP=sd; end; else if count_y= 12 then do; bb = 0; do i = 1 to 6; bb = bb + ((w[i]-7)/5)*(pdf("poisson",w[i],exp(estimate))); end; do i = 7 to 11; bb = bb + ((17-w[i])/5)*(pdf("poisson",w[i],exp(estimate))); end; probability_FP=bb; end; else if count_y= 5 then do; aa = 0; do i = 1 to 6; aa = aa + ((a[i]-0)/5)*(pdf("poisson",a[i],exp(estimate))); end; do i = 7 to 11; aa = aa + ((10-a[i])/5)*(pdf("poisson",a[i],exp(estimate))); end; probability_FP=aa; end; else if count_y= 45 then do; fr = 0; do i = 1 to 6; fr = fr + ((f[i]-40)/5)*(pdf("poisson",f[i],exp(estimate))); end; do i = 7 to 11; fr = fr + ((50-f[i])/5)*(pdf("poisson",f[i],exp(estimate))); end; probability_FP=fr; end; output; end; keep count_y probability_FP; run; /********************/ proc means data=pred; class count_y; output out=means mean(probability_FP)=; run;

 

 

1. I am sure the code is so manual (I mean it could be shortened) especially because I am averaging the same value in the last proc code. Is there a way I can shorten everything so that I avoid the proc means code?

 

2. After getting the probabilities for count_y = 0,1,2...80. I notice that the probabilities are not summing to 1. What could be the problem, and Is there a way out.

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @john1111,

 

You received good advice from @RW9 as to how you could simplify your code. It turns out that dataset PRED (the new version without redundant duplicate observations) can be created with this much shorter code:

data pred; 
set xxxx2(obs=1 keep=estimate);
do count_y=0 to 80;
  if count_y in (5, 12, 20, 30, 45, 50, 60) then do;
    probability_FP=0;
    do i=-4 to 4;
      probability_FP+(5-abs(i))/5*pdf("poisson",count_y+i,exp(estimate));
    end;
  end;
  else probability_FP=pdf("poisson",count_y,exp(estimate));
  output; 
end;
keep count_y probability_FP;
run;

Note that I do use the SET statement from your first approach (in order to avoid hardcoding the value of ESTIMATE), but enhanced with the appropriate dataset options so that only variable ESTIMATE from one observation of dataset XXXX2 is used.

 

Now it's fairly obvious why the probabilities are not summing up to 1: For the "exceptional" COUNT_Y values (5, 12, etc.) the probabilities that would have been used otherwise (see the case i=0 in my code!) are substantially increased by eight additional terms (see cases i=-4, ..., -1, 1, ..., 4). For example, for COUNT_Y=50 the calculation amounts to:

data _null_;
estimate=3.99793;
x+0.2*pdf("poisson",46,exp(estimate))
 +0.4*pdf("poisson",47,exp(estimate))
 +0.6*pdf("poisson",48,exp(estimate))
 +0.8*pdf("poisson",49,exp(estimate))
 +1.0*pdf("poisson",50,exp(estimate))
 +0.8*pdf("poisson",51,exp(estimate))
 +0.6*pdf("poisson",52,exp(estimate))
 +0.4*pdf("poisson",53,exp(estimate))
 +0.2*pdf("poisson",54,exp(estimate));
put x=;
run;

The result is approximately 0.2266, which replaces pdf("poisson",50,exp(3.99793))=0.0465... in the overall sum, which would be cdf('poisson',80,exp(3.99793))=0.999532... without the exceptions.

 

Look at the bar plot created by the code below to see what happened:

proc sgplot data=pred;
vbar count_y / response=probability_FP;
run;

View solution in original post

3 REPLIES 3
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, a brief look at the code, and I can see a lot which can be narrowed down.  Then first set of loops for instance, can be replaced with;

do i = 1 to 11;
  n[i] = i + 14;
  sf[i] = i + 44;
  v[i] = i + 54;
  t[i] = i + 24;
  f[i] = i + 39;
  a[i] = i - 1;
  w[i] = i + 6;
end;

The if could be resolved to:

select(county);
  when (20) do;
    ...;
    end;
  when (60) do;
    ...;
    end;
  otherwise probability_fp=pdf(...);
end;

The do loop blocks within the if statements are repeating.  Put your values in another array, then do the loop once, e.g:

array vals{5,2} (15,25,
                          55,65,
                           ...);        /* First column is lower, second upper */
array p_res{5,2} 8.;         /* Array of results for p based

select(county);
  when (20) do;
    p=0;
    do i=1 to 6;
      p{i,1}=p{i,1}+((n{i}-vals{i,1})/5)*pdf...;
    end;
    do i=1 to 7;
      p{i,1}=p{i,1}+((vals{i,2}-n{i})/5)*pdf...;
    end;
   end;
...

Do note however that a simple change to your data structure will simplify your coding a lot.  Have one row per set of tests, so all v is on one row, all n on another etc.  This will simplify down to be one array, for each observation.  Also note I have nothing to work with in terms of test data in the form of a datastep, or what its supposed to do, so just generaralising.  

I don't know about he stat question.

john1111
Obsidian | Level 7

I just realized that there is no need for setting any data and therefore there is no need for the data;

I have also set a specific estimate value as below;

 

 

data pred; 
  array v[11];
  array n[11];
  array sf[11];
  array t[11];
  array f[11];
  array a[11];
  array w[11];
  do i = 1 to 11;
	n[i] = i + 14;
   end;
  do i = 1 to 11;
	sf[i] = i + 44;
   end;
  do i = 1 to 11;
     v[i] = i + 54;
   end;
  do i = 1 to 11;
     t[i] = i + 24;
   end;
  do i = 1 to 11;
     f[i] = i + 39;
   end;
  do i = 1 to 11;
     a[i] = i - 1;
   end;
  do i = 1 to 11;
     w[i] = i + 6;
   end;
do count_y= 0 to 80;
if  count_y ~= 5 and count_y ~=12 and count_y ~=20 and count_y ~=30 and count_y ~=45 and count_y ~=50 and count_y ~=60 then 
probability_FP= pdf("poisson",count_y,exp(3.99793));
else if count_y= 20 then do;
    p = 0;
     do i = 1 to 6;
      p = p + ((n[i]-15)/5)*(pdf("poisson",n[i],exp(3.99793)));
     end;
     do i = 7 to 11;
      p = p + ((25-n[i])/5)*(pdf("poisson",n[i],exp(3.99793)));
     end;
     probability_FP=p;
    end;
else if count_y= 60 then do;
     sum = 0;
     do i = 1 to 6;
      sum = sum + ((v[i]-55)/5)*(pdf("poisson",v[i],exp(3.99793)));
     end;
     do i = 7 to 11;
      sum = sum + ((65-v[i])/5)*(pdf("poisson",v[i],exp(3.99793)));
     end;
     probability_FP=sum;
  end;
else if count_y= 30 then do;
    th = 0;
     do i = 1 to 6;
      th = th + ((t[i]-25)/5)*(pdf("poisson",t[i],exp(3.99793)));
     end;
     do i = 7 to 11;
      th = th + ((35-t[i])/5)*(pdf("poisson",t[i],exp(3.99793)));
     end;
     probability_FP=th;
    end;
else if count_y= 50 then do;
    sd = 0;
     do i = 1 to 6;
      sd = sd + ((sf[i]-45)/5)*(pdf("poisson",sf[i],exp(3.99793)));
     end;
     do i = 7 to 11;
      sd = sd + ((55-sf[i])/5)*(pdf("poisson",sf[i],exp(3.99793)));
     end;
     probability_FP=sd;
    end;
else if count_y= 12 then do;
    bb = 0;
     do i = 1 to 6;
      bb = bb + ((w[i]-7)/5)*(pdf("poisson",w[i],exp(3.99793)));
     end;
     do i = 7 to 11;
      bb = bb + ((17-w[i])/5)*(pdf("poisson",w[i],exp(3.99793)));
     end;
     probability_FP=bb;
    end;
else if count_y= 5 then do;
    aa = 0;
     do i = 1 to 6;
      aa = aa + ((a[i]-0)/5)*(pdf("poisson",a[i],exp(3.99793)));
     end;
     do i = 7 to 11;
      aa = aa + ((10-a[i])/5)*(pdf("poisson",a[i],exp(3.99793)));
     end;
     probability_FP=aa;
    end;
else if count_y= 45 then do;
    fr = 0;
     do i = 1 to 6;
      fr = fr + ((f[i]-40)/5)*(pdf("poisson",f[i],exp(3.99793)));
     end;
     do i = 7 to 11;
      fr = fr + ((50-f[i])/5)*(pdf("poisson",f[i],exp(3.99793)));
     end;
     probability_FP=fr;
    end;
output; 
end;
keep count_y probability_FP;
run;

So the idea is that I should get probabilities from 0 to 80 but at points (5,12,20,30,45,50,60) the probability is computed as below;

 

 

As an example for 20 (estimate in the code is the theta below);

tdsfggesrt.PNG

 

I applied the same idea to 5,12,30,45,50,60 as shown in the code

 

FreelanceReinh
Jade | Level 19

Hello @john1111,

 

You received good advice from @RW9 as to how you could simplify your code. It turns out that dataset PRED (the new version without redundant duplicate observations) can be created with this much shorter code:

data pred; 
set xxxx2(obs=1 keep=estimate);
do count_y=0 to 80;
  if count_y in (5, 12, 20, 30, 45, 50, 60) then do;
    probability_FP=0;
    do i=-4 to 4;
      probability_FP+(5-abs(i))/5*pdf("poisson",count_y+i,exp(estimate));
    end;
  end;
  else probability_FP=pdf("poisson",count_y,exp(estimate));
  output; 
end;
keep count_y probability_FP;
run;

Note that I do use the SET statement from your first approach (in order to avoid hardcoding the value of ESTIMATE), but enhanced with the appropriate dataset options so that only variable ESTIMATE from one observation of dataset XXXX2 is used.

 

Now it's fairly obvious why the probabilities are not summing up to 1: For the "exceptional" COUNT_Y values (5, 12, etc.) the probabilities that would have been used otherwise (see the case i=0 in my code!) are substantially increased by eight additional terms (see cases i=-4, ..., -1, 1, ..., 4). For example, for COUNT_Y=50 the calculation amounts to:

data _null_;
estimate=3.99793;
x+0.2*pdf("poisson",46,exp(estimate))
 +0.4*pdf("poisson",47,exp(estimate))
 +0.6*pdf("poisson",48,exp(estimate))
 +0.8*pdf("poisson",49,exp(estimate))
 +1.0*pdf("poisson",50,exp(estimate))
 +0.8*pdf("poisson",51,exp(estimate))
 +0.6*pdf("poisson",52,exp(estimate))
 +0.4*pdf("poisson",53,exp(estimate))
 +0.2*pdf("poisson",54,exp(estimate));
put x=;
run;

The result is approximately 0.2266, which replaces pdf("poisson",50,exp(3.99793))=0.0465... in the overall sum, which would be cdf('poisson',80,exp(3.99793))=0.999532... without the exceptions.

 

Look at the bar plot created by the code below to see what happened:

proc sgplot data=pred;
vbar count_y / response=probability_FP;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 736 views
  • 1 like
  • 3 in conversation