Quentin
PROC Star

Just getting my hands on 9.4M3, and I'm anxious to play with the broken axis feature.  This is mainly trolling for advice.

 

Say I have a stored process that makes a box-and-whisker plot.  And the plot looks fine for months.  Then one day, there is a group with an extreme outlier, and the y axis is now scaled to fit the outlier and all the boxes are now shrunk to being unreadable.  So the users call me and say "the chart broke."  This is a setting where I think I'd like to employ a broken axis, so that I can have the boxes look reasonable, and still show the extreme outlier.  And I want to dynamically determine the location of the break.  Curious if people have approaches they like for choosing when and where to break an axis.

 

I'm imagining something like:

  1. Compute max(upper fence) and max(inter-quartile range) for the chart (i.e. max across groups).  I never want an axis break within any box or whisker.
  2. Compute a provisional break point as max(upper fence) + k * max(inter-quartile range), where k is maybe 1 or 2 or....
  3. If there are any outlier values greater than the provisional break point, have the main part of the axis run from the min to the largest value at or below the provisional break point, and have the part after the break run from the smallest value above the provisional break point to the max.
  4. Repeat 1-3 for a possible lower break point as well.  A chart could end up with no break points, one break point, or two break points.
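To pin down the logic, here is a rough sketch of steps 1-4 in Python rather than SAS. It assumes the fences are the usual Tukey fences (Q1 - 1.5*IQR and Q3 + 1.5*IQR) and that quartiles are computed by linear interpolation, which may not match the definition the plotting procedure uses. The function names and the toy numbers are made up for illustration.

```python
from statistics import quantiles

def tukey_summary(values, whisker=1.5):
    """Return (lower_fence, upper_fence, iqr) for one group (assumed Tukey fences)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr, iqr

def axis_segments(groups, k=2.0):
    """Return a list of (low, high) axis segments; gaps between segments are breaks."""
    summaries = [tukey_summary(v) for v in groups.values()]
    all_values = sorted(x for v in groups.values() for x in v)

    # Steps 1-2: provisional break points, always outside every box and whisker.
    max_iqr = max(s[2] for s in summaries)
    upper_cut = max(s[1] for s in summaries) + k * max_iqr
    lower_cut = min(s[0] for s in summaries) - k * max_iqr

    # Steps 3-4: split the data into the main body and extreme outliers on each side.
    low_out = [x for x in all_values if x < lower_cut]
    high_out = [x for x in all_values if x > upper_cut]
    core = [x for x in all_values if lower_cut <= x <= upper_cut]

    segments = []
    if low_out:
        segments.append((low_out[0], low_out[-1]))
    segments.append((core[0], core[-1]))
    if high_out:
        segments.append((high_out[0], high_out[-1]))
    return segments

# Toy data: group B has one extreme high value.
groups = {"A": [10, 11, 12, 13, 14], "B": [11, 12, 13, 14, 15, 200]}
print(axis_segments(groups, k=2.0))  # -> [(10, 15), (200, 200)]
```

Each returned segment would become one range on the broken axis. A segment holding a single outlier has zero width, so in practice it would need some padding on both sides before being handed to the plot.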

This is mostly a thought exercise at this point. Since I'm not providing sample data or working SAS code, I'm not expecting anybody to code something up for me.

 

But just looking for thoughts on how people have approached the idea of dynamically determining where to break an axis, particularly in the box plot setting.

 

Thanks,

--Q.

Check out the Boston Area SAS Users Group (BASUG) video archives: https://www.basug.org/videos.
Rick_SAS
SAS Super FREQ

Your ideas sound reasonable. I think max(median +/- k*IQR) would be an effective range, and I'd try k=10 for starters.  My intuition is that a graph should have 0 or 1 breaks. I wouldn't be fond of multiple breaks, although if your data can have upper AND lower outliers, 1 break in the positive and 1 break in the negative direction would probably be fine.
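One possible reading of that rule, sketched in Python rather than SAS: for each group take median - k*IQR and median + k*IQR, then use the lowest lower bound and the highest upper bound across groups as the main axis range. Anything outside that range would sit past a break. The interpretation, the function name, and the toy numbers are assumptions made for illustration only.

```python
from statistics import quantiles

def effective_range(groups, k=10):
    """Return (low, high) as the widest median +/- k*IQR envelope across groups."""
    lows, highs = [], []
    for values in groups.values():
        # Quartiles by linear interpolation; other definitions give slightly different cuts.
        q1, med, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        lows.append(med - k * iqr)
        highs.append(med + k * iqr)
    return min(lows), max(highs)

# Toy data: group B has one extreme high value.
groups = {"A": [10, 11, 12, 13, 14], "B": [11, 12, 13, 14, 15, 200]}
low, high = effective_range(groups, k=10)
print(low, high)  # -> -11.5 38.5

# Values outside the envelope are the ones that would call for a break.
beyond = [x for v in groups.values() for x in v if x < low or x > high]
print(beyond)  # -> [200]
```

With k=10 the envelope is deliberately generous, so a break appears only when a value is far from every box.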

DanH_sas
SAS Super FREQ

Sanjay told me he attended a paper at PharmaSug in 2016 that dealt with finding optimum locations for axis breaks. We were able to find the paper online:

 

http://www.pharmasug.org/proceedings/2016/QT/PharmaSUG-2016-QT12.pdf

 

Hope this helps!

Dan
