Quentin
Super User

Just getting my hands on 9.4M3, and I'm anxious to play with the broken axis feature.  This is mainly trolling for advice.

 

Say I have a stored process that makes a box-and-whisker plot, and the plot looks fine for months.  Then one day there is a group with an extreme outlier, so the y axis is rescaled to fit the outlier and all the boxes shrink until they are unreadable.  So the users call me and say "the chart broke."  This is a setting where I think I'd like to employ a broken axis, so that the boxes look reasonable and the extreme outlier is still shown.  And I want to dynamically determine the location of the break.  I'm curious whether people have approaches they like for choosing when and where to break an axis.

 

I'm imagining something like:

  1. Compute max(upper fence) and max(interquartile range) for the chart (i.e. the max across groups).  I never want an axis break within any box or whisker.
  2. Compute a provisional break point as max(upper fence) + k * max(interquartile range), where k is maybe 1 or 2 or....
  3. If any outlier values are greater than the provisional break point, have the main part of the axis run from the min to the largest value at or below the provisional break point, and have the segment after the break run from the smallest value above the provisional break point to the max.
  4. Repeat 1-3 for a possible lower break point as well.  A chart could end up with no break points, one break point, or two break points.
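The four steps above can be sketched in a few lines. This is only an illustration in Python, not SAS, and it rests on several assumptions that are not in the post: the fences use Tukey's 1.5 * IQR rule, the quartiles come from Python's "inclusive" interpolation (which may not match the quartile definition your SAS procedure uses), and the names `box_stats` and `axis_segments` are made up for the example.

```python
from statistics import quantiles


def box_stats(values):
    """Return (iqr, lower_fence, upper_fence) for one group.

    Assumption: Tukey fences at 1.5 * IQR beyond the quartiles.
    Each group needs at least two values for quantiles() to work.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr


def axis_segments(groups, k=2.0):
    """groups: dict mapping group name -> list of numbers.

    Returns a list of (low, high) axis ranges. One tuple means no break,
    two tuples mean one break, three tuples mean a break on each side.
    """
    stats = [box_stats(v) for v in groups.values()]
    all_values = sorted(x for v in groups.values() for x in v)

    # Steps 1-2: provisional break points, pushed k IQRs past the fences
    # so that a break never lands inside a box or whisker.
    max_iqr = max(s[0] for s in stats)
    lower_cut = min(s[1] for s in stats) - k * max_iqr
    upper_cut = max(s[2] for s in stats) + k * max_iqr

    # Steps 3-4: split the data at the provisional break points.
    low = [x for x in all_values if x < lower_cut]
    mid = [x for x in all_values if lower_cut <= x <= upper_cut]
    high = [x for x in all_values if x > upper_cut]

    segments = []
    if low:
        segments.append((low[0], low[-1]))
    segments.append((mid[0], mid[-1]))
    if high:
        segments.append((high[0], high[-1]))
    return segments


groups = {"A": [1, 2, 3, 4, 5], "B": [2, 3, 4, 5, 6, 100]}
print(axis_segments(groups))         # two ranges: main body, then the outlier
print(axis_segments({"A": [1, 2, 3, 4, 5]}))  # one range: no break needed
```

In practice each range would probably need some padding (a lone outlier gives a zero-width segment here), and the resulting ranges are the kind of values one would then hand to the axis statement that draws the break.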

This is mostly a thought exercise at this point. Since I'm not providing sample data/code, I'm not expecting anybody to code up something for me.

 

I'm just looking for thoughts on how people have approached the idea of dynamically determining where to break an axis, particularly in the box plot setting.

 

Thanks,

--Q.

2 REPLIES
Rick_SAS
SAS Super FREQ

Your ideas sound reasonable. I think max(median +/- k*IQR) would be an effective range, and I'd try k=10 for starters.  My intuition is that a graph should have 0 or 1 breaks. I wouldn't be fond of multiple breaks, although if your data can have upper AND lower outliers, 1 break in the positive and 1 break in the negative direction would probably be fine.
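One possible reading of the median +/- k*IQR suggestion, sketched in Python rather than SAS: take the widest interval across groups and treat anything outside it as belonging past a break. The function name `effective_range`, the "inclusive" quartile method, and the min/max-across-groups interpretation are assumptions made for the illustration.

```python
from statistics import median, quantiles


def effective_range(groups, k=10.0):
    """groups: dict mapping group name -> list of numbers (two or more each).

    Returns (low, high): the smallest median - k*IQR and the largest
    median + k*IQR across all groups.
    """
    lows, highs = [], []
    for values in groups.values():
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        med = median(values)
        lows.append(med - k * iqr)
        highs.append(med + k * iqr)
    return min(lows), max(highs)


groups = {"A": [1, 2, 3, 4, 5], "B": [2, 3, 4, 5, 6, 100]}
low, high = effective_range(groups)
# Values outside [low, high] are the candidates for the far side of a break.
beyond = [x for v in groups.values() for x in v if x < low or x > high]
print(low, high, beyond)
```

With k=10 the interval is deliberately generous, so a break appears only when an outlier is truly extreme relative to the spread of the boxes.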

DanH_sas
SAS Super FREQ

Sanjay told me he attended a paper presentation at PharmaSUG in 2016 that dealt with finding optimal locations for axis breaks. We were able to find the paper online:

 

http://www.pharmasug.org/proceedings/2016/QT/PharmaSUG-2016-QT12.pdf

 

Hope this helps!

Dan
