Solved: Re: Use a funnel plot to visualize rates

zimcom · Posted 08-01-2021 09:46 AM

Greeting!

I am learning the program code by Rick Wicklin and trying to use my own data to draw the funnel plot, which I think would be very useful to monitor the Site performance both by safety and efficacy.

https://blogs.sas.com/content/iml/2020/04/20/funnel-plot-covid-nc.html

I would like to thank Rick one more time for posting so many insightful and useful SAS graphics example, from which I benefit a lot already.

below is the program I changed based on Rick's example, which I ran successfully, but for some reason, my slightly updated code is not working:

Can someone take copy and paste the code in below and run, and let me know where is the issue.

THANK YOU SO MUCH!

/**************************************/

data Mortality;
length Sitename Label $27 ;
input Sitename Cases Deaths Proportion Label;
datalines;
Site1 20 2 0.1 Site1
Site2 12 3 0.25 Site2
Site3 2 . . Site3
Site4 7 1 0.14286 Site4
Site5 4 1 0.25 Site5
Site6 2 . . Site6
Site7 2 1 0.5 Site7
Site8 15 4 0.26667 Site8
Site9 6 4 0.66667 Site9
;
proc means data=Mortality P10 P25 P50 P75 P90 P95 P99 Max ndec=0;
var Cases;
run;
/*******************************/
%let DSName = Mortality;
%let Events = Deaths;
%let Trials = Cases;

/* compute overal rate in ESCAPE NEXT */
proc sql ;
select sum(&Events) / sum(&Trials) format=best6. into :AvgProp
from &DSName;
quit;
%put &=&AvgProp;

proc print data=Mortality ;
where Cases >= 2;
run;
/* Isolate the computation of the the control limits for the funnel plot
so it can be reused in multiple programs.
*/
proc iml;
/* Funnel plot for proportion:
https://blogs.sas.com/content/iml/2011/11/23/funnel-plots-for-proportions.html
Compute binomial limits as a function of the nunber of trials.
Adjust the binomial quantiles according to method in Appendix A.1.1
of Spiegelhalter (2005) p. 1197.
These limits do not depend on the data, although we assume that theta
(often the average proportion from the data) is the probability parameter.
Parameters:
theta (scalar): empirical probability of event = sum(Events) / sum(Trials)
nTrials (column vector): sequence of trials at which to evaluate the binomial limits.
For example, nTrials = T(1:100)
prob (row vector): probabilities at which to compute quantiles.
For example, prob = {0.025 0.975} for 95% (~ 2 StdDev) limits
prob = {0.001 0.025 0.975 0.999} for 2 and 3 StdDev limits
*/
start BinomialFunnel(theta, nTrials, prob);
n = round(colvec(nTrials));
n = n <> 1; /* must be integer > 0 */
p = rowvec(prob);
/* compute columns of matrix, one for each CL */
limits = j(nrow(n), ncol(p), .);
do i = 1 to ncol(p);
r = quantile("binom", p[i], theta, n);
numer = cdf("binom", r , theta, n) - p[i];
denom = cdf("binom", r , theta, n) -
cdf("binom", r-1, theta, n);
alpha = numer / denom;
limits[,i] = (r-alpha)/n;
end;
/* we can't have negative limits */
idx = loc(limits < 0);
if ncol(idx)>0 then limits[idx]=0;

/* return these control limits */
return ( n || limits );
finish;
store module=BinomialFunnel;
QUIT;

/* Compute the funnels for the COVID-19 data */
proc iml;
use Mortality;
read all var "&Events" into Events;
read all var "&Trials" into Trials;
close;

/* compute overall proportion */
theta = sum(Events) / sum(Trials);

/* compute confidence limits for range of sample size */
load module=BinomialFunnel;
/*nTrials = T(do(4,200,10)) // T( do(200, round(max(Trials)+20, 50), 50) );*/

nTrials = T(do(4,200,10)) // T( do(200, round(max(Trials)+20, 50), 50) );
prob = {0.025 0.975};
results = BinomialFunnel(theta, nTrials, prob);
labels = {"N" "L2sd" "U2sd"};
create Limits from results[colname=labels];
append from results;
close;
QUIT;

/* concatenate the data and the limits for the funnel plot */
data FunnelMortality;
set Mortality(keep=SITENAME Cases Deaths Proportion Label)
Limits;
run;

/* create funnel plot with data tips. See
https://blogs.sas.com/content/iml/2011/12/09/creating-tooltips-for-scatter-plots-with-proc-sgplot.ht...
*/
ods graphics / imagemap=ON TIPMAX=5000;
title "ESCAPE NEXT";
title "Fatility Rate: Deaths per Enrolled Subjects";
title2 "Sites with 2 or More Subjects";
footnote J=L "data cut off 15Apr2020";

proc sgplot data=FunnelMortality noautolegend;
where Cases=. OR Cases >= 2;
format Cases comma7.;
band x=N lower=L2sd upper=U2sd /
fill fillattrs=GraphConfidence outline lineattrs=GraphPrediction
transparency=0.5 legendlabel="95% limits" name="band95"
curvelabelloc=outside curvelabelupper="Upper Limit" curvelabellower="Lower Limit";
scatter x=Cases y=Proportion / legendlabel="County Proportion"
datalabel=Label tip=(SITENAME Cases Deaths);
refline &AvgProp / axis=y label="Study Proportion = &AvgProp"
SplitChar="/" name="ref"
lineattrs=GraphData2 label;
xaxis grid ;
yaxis grid Label="Fatality Rate";
run;

ods graphics / imagemap=OFF;

/**********************************************************************************************/

Rick_SAS · Posted 08-01-2021 11:19 AM

The example in the blog post had some counties for which the Trials was small and others for which Trials was as large as 1000. That is the reason for the convoluted definition for the nTrials vector. When the number of trials for each entity (county, school, hospital,....) are similar, you can use a regularly spaced vector of values beyween the minimum number of trials and the maximum number of trials. For your data, use

nTrials = T(2:20);

since Site 7 has N=2 cases and Site 1 has N=20 cases.

View solution in original post

Rick_SAS · Posted 08-01-2021 11:19 AM

The example in the blog post had some counties for which the Trials was small and others for which Trials was as large as 1000. That is the reason for the convoluted definition for the nTrials vector. When the number of trials for each entity (county, school, hospital,....) are similar, you can use a regularly spaced vector of values beyween the minimum number of trials and the maximum number of trials. For your data, use

nTrials = T(2:20);

since Site 7 has N=2 cases and Site 1 has N=20 cases.

zimcom · Posted 08-01-2021 04:01 PM

Hi Rick,

Thank you!

The code for study overall mortality works now. Now I am using the code of mortality to tweak and to make plot for Adverse Events and Serious Adverse Events.

I encountered the error again. The code example pasted in below.

Would you please have a look for me again, please!

Thank you!

/*************************/

data all_ae;
length Sitename Label $27 ;
input Sitename Cases AES Proportion Label;
datalines;
Site1 20 43 2.15 Site1
Site2 12 10 0.83333 Site2
Site3 2 3 1.5 Site3
Site4 7 5 0.71429 Site4
Site5 4 4 1 Site5
Site6 2 3 1.5 Site6
Site7 2 1 0.5 Site7
;

proc means data=all_ae P10 P25 P50 P75 P90 P95 P99 Max ndec=0;
var Cases;
run;
/*******************************/
%let DSName = all_ae;
%let Events = AES;
%let Trials = Cases;

proc sql ;
select sum(&Events) / sum(&Trials) format=best6. into :AvgProp
from &DSName;
quit;
%put &=&AvgProp;

proc print data=all_ae ;
where Cases >= 2;
run;
/* Isolate the computation of the the control limits for the funnel plot
so it can be reused in multiple programs.
*/
proc iml;
/* Funnel plot for proportion:
https://blogs.sas.com/content/iml/2011/11/23/funnel-plots-for-proportions.html
Compute binomial limits as a function of the nunber of trials.
Adjust the binomial quantiles according to method in Appendix A.1.1
of Spiegelhalter (2005) p. 1197.
These limits do not depend on the data, although we assume that theta
(often the average proportion from the data) is the probability parameter.
Parameters:
theta (scalar): empirical probability of event = sum(Events) / sum(Trials)
nTrials (column vector): sequence of trials at which to evaluate the binomial limits.
For example, nTrials = T(1:100)
prob (row vector): probabilities at which to compute quantiles.
For example, prob = {0.025 0.975} for 95% (~ 2 StdDev) limits
prob = {0.001 0.025 0.975 0.999} for 2 and 3 StdDev limits
*/
start BinomialFunnel(theta, nTrials, prob);
n = round(colvec(nTrials));
n = n <> 1; /* must be integer > 0 */
p = rowvec(prob);
/* compute columns of matrix, one for each CL */
limits = j(nrow(n), ncol(p), .);
do i = 1 to ncol(p);
r = quantile("binom", p[i], theta, n);
numer = cdf("binom", r , theta, n) - p[i];
denom = cdf("binom", r , theta, n) -
cdf("binom", r-1, theta, n);
alpha = numer / denom;
limits[,i] = (r-alpha)/n;
end;
/* we can't have negative limits */
idx = loc(limits < 0);
if ncol(idx)>0 then limits[idx]=0;

/* return these control limits */
return ( n || limits );
finish;
store module=BinomialFunnel;
QUIT;

/* Compute the funnels for the COVID-19 data */
proc iml;
use all_ae;
read all var "&Events" into Events;
read all var "&Trials" into Trials;
close;

/* compute overall proportion */
theta = sum(Events) / sum(Trials);

/* compute confidence limits for range of sample size */
load module=BinomialFunnel;
*nTrials = T(do(2,20,10)) // T( do(200, round(max(Trials)+20, 50), 50) );
nTrials = T(2:20);
prob = {0.025 0.975};
results = BinomialFunnel(theta, nTrials, prob);
labels = {"N" "L2sd" "U2sd"};
create Limits from results[colname=labels];
append from results;
close;
QUIT;

/* concatenate the data and the limits for the funnel plot */
data FunnelAE;
set all_ae(keep=SITENAME Cases AES Proportion Label)
Limits;
run;

ods graphics / imagemap=ON TIPMAX=5000;
title "ESCAPE NEXT";
title "Adverse Event Rate: Adverse Event per Enrolled Subjects";
footnote J=L "Data Cut off Date:" " &sysdate9..";

proc sgplot data=FunnelAE noautolegend;
where Cases=. OR Cases >= 2;
format Cases comma7.;
band x=N lower=L2sd upper=U2sd /
fill fillattrs=GraphConfidence outline lineattrs=GraphPrediction
transparency=0.5 legendlabel="95% limits" name="band95"
curvelabelloc=outside curvelabelupper="Upper Limit" curvelabellower="Lower Limit";
scatter x=Cases y=Proportion / legendlabel="Site Proportion"
datalabel=Label tip=(SITENAME Cases AES);
refline &AvgProp / axis=y label="Study Proportion = &AvgProp"
SplitChar="/" name="ref"
lineattrs=GraphData2 label;
xaxis grid ;
yaxis grid Label="Adverse Event Rate";
run;
ods graphics / imagemap=OFF;

Rick_SAS · Posted 08-02-2021 06:18 AM

The funnel plot is for binomial proportions, which are numbers between 0 and 1. In a binomial proportion, the number of events cannot be greater than the number of trials.

Your data has proportions that are greater than 1. For example, your first observation says there are 43 'events' out of 20 'trials'. This data cannot be modeled by a binomial proportion.

zimcom · Posted 08-02-2021 07:28 AM

Thanks for the confirmation!

I understand now and that is also what I wondered, since it is the only difference between the motality and AE, mortality is at subject level and AE is at event level.

But in clinical trial, we need constantly monitoring the study sites over/under reporting AEs and SAEs, to monitor if certain site(s) have more protocol deviations than others, etc., funnel plot would be a great tool to quickly identify the outliers if it can be used.

Thank you again for all your quick responses around this topic.

Have a great day!

Yichuan

Rick_SAS · Posted 08-02-2021 01:36 PM

You can use a funnel plot for the MEAN number of AEs. The confidence intervals are computed differently, but the main ideas are the same:

https://blogs.sas.com/content/iml/2011/04/15/funnel-plots-an-alternative-to-ranking.html

zimcom · Posted 08-03-2021 09:31 AM

Thanks again, I will definitely have a look at this recommended example.
I tweaked the funnel plot exampe to plot AE and SAE. To avoid the situation where the number of events are greater than the number of trials, instead of using the number of adverse events, I used the subject who has at least one AE or SAE as "events" and the overall randomized subject as "trials", the plotted results turn out to be a good visuals as well.

zimcom · Posted 08-03-2021 11:33 AM

Hi Rick,

I work as Data Manager in a biotech start up specilized in Stroke thrombectory related clincal trials.

We collect a lot of timing variables in our clinical trial database, e.g., Stroke onset, Hospital arrival, CT Scan, Informed Consent, Randomization, Drug Infusion Strat, Arterial Access, etc. Some of them could be a few minutes or a few hours apart. Our trial enrolls over a thousand subjects. As ongoing effort, we use those timing intervals to measure the Site performance (time is brain, that is waht we always say), and to monitor how fast the Site moves the subject.

It would be nice to see through a graphic view that how each individual Study Site's timing data, to visulize the how site moving subject from one milestone to another. I have kept looking for SAS graphic ideas. If you happen to have or think of any such example, please let me know.

Thank you very much for all your helps!

Yichuan

Ready to join fellow brilliant minds for the SAS Hackathon?