Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Graphics
- /
- Funnel Plots - labelling outliers

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 07-27-2016 12:40 PM
(2276 views)

Hi all,
I've used Rick Wicklin's post on creating funnel plots (http://blogs.sas.com/content/iml/2011/11/23/funnel-plots-for-proportions.html) successfully on my data. The challenge I'm facing now is figuring out how to label the outliers (above and below) based on either the 95% or 99.5% levels. I can apply labels when the threshold is a set number on the X or Y axis, but I can't seem to find anything on how to do it based on the proportion.
Thanks for your time
Chris
Here's an example of what I'm looking to do, taken from https://blog.ouseful.info/2011/10/31/power-tools-for-aspiring-data-journalists-r/. The article does not provide the code or math to accomplish this, rather it's how to build a simplistic version in R (which doesn't help me).

Has my article or post helped? Please mark as Solution or Like the article!

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You need to merge the data and the limits. To help do that, use all integer values for the limits. You need to make changes in three places:

1) In the IML program, there is a definition of 'n' as

n = T( do(minN, maxN, (maxN-minN)/20) );

Replace that line with all possible trials:

n = T(minN:maxN);

2) Merge the data and the computed limits:

```
/* merge data with proportions */
data Adoptions; merge Adoptions Stats; run;
/* merge control limits */
proc sort data=Adoptions; by Trials; run;
data Funnel;
merge Adoptions(rename=(Trials=N)) Limits;
by N;
Label = Authority;
if L3sd <= Proportion <= U3sd then Label=""; /* delete if w/in +/- 3*sigma */
run;
```

3) Change the SCATTER statement to

scatter x=N y=Proportion / datalabel=Label;

6 REPLIES 6

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @Rick_SAS

Thanks for the reply - I actually spent quite a bit of time yesterday working thourhg that blog, as I figured it was the easiest thing to do. Where I'm stumped is in the "Funnel" dataset, we have "Authority", "Events", Trials", "Proportion", and "Expected"; then, rows 144 - 164 (in the same dataset) have "N", "L3sd", "L2sd", "U2sd", and "U3sd". If I understand correctly, what I need to do is match the number of events for the individual authority (for example, Halton has n=20) with the closest N (in my case, probably N=26) so that i can get the lower/upper CI of 0.435 and 0.956. Where I'm getting stuck is figuring out how to do this; I tried splitting the tables into 2 and then doing a join in SQL but couldn't get it to work.

Based on the article you linked to, here's my result based on Proportion <=0.5 and Proportion >=0.95. This works fine for the N<=60 or so, but because the Proportions drop (and therefore the CIs get narrower), this does not work for the right side of the graph. What I'd like is all the points above the upper red line and below the lower red line to be labelled.

Thanks again for your help. I'm hoping I have explained this clearly - I'm multi-tasking (trying to do work and this at the same time :-)).

Chris

Has my article or post helped? Please mark as Solution or Like the article!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You need to merge the data and the limits. To help do that, use all integer values for the limits. You need to make changes in three places:

1) In the IML program, there is a definition of 'n' as

n = T( do(minN, maxN, (maxN-minN)/20) );

Replace that line with all possible trials:

n = T(minN:maxN);

2) Merge the data and the computed limits:

```
/* merge data with proportions */
data Adoptions; merge Adoptions Stats; run;
/* merge control limits */
proc sort data=Adoptions; by Trials; run;
data Funnel;
merge Adoptions(rename=(Trials=N)) Limits;
by N;
Label = Authority;
if L3sd <= Proportion <= U3sd then Label=""; /* delete if w/in +/- 3*sigma */
run;
```

3) Change the SCATTER statement to

scatter x=N y=Proportion / datalabel=Label;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

thanks Rick - that was perfect and I actually think I understand what you did 🙂

Have a great evening

Chris

Has my article or post helped? Please mark as Solution or Like the article!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

thanks for the further explanation. definitely one of the cooler analyses I've worked on in a while!

Chris

Has my article or post helped? Please mark as Solution or Like the article!

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.