Box Plots and Broken Axes- dynamically determining where to break

Quentin · Posted 03-03-2017 11:05 AM

Just getting my hands on 9.4M3, and I'm anxious to play with the broken axis feature. This is mainly trolling for advice.

Say I have a stored process that makes a box-and-whisker plot. And the plot looks fine for months. Then one day, there is a group with an extreme outlier, and the y axis is now scaled to fit the outlier and all the boxes are now shrunk to being unreadable. So the users call me and say "the chart broke." This is a setting where I think I'd like to employ a broken axis, so that I can have the boxes look reasonable, and still show the extreme outlier. And I want to dynamically determine the location of the break. Curious if people have approaches they like for choosing when and where to break an axis.

I'm imagining something like:

1. Compute max(upper fence) and max(inter-quartile range) for the chart (i.e., the max across groups). I never want an axis break within any box or whisker.
2. Compute a provisional break point as max(upper fence) + k * max(inter-quartile range), where k is maybe 1 or 2 or....
3. If any outlier values are greater than the provisional break point, have the main part of the axis run from the overall min to the largest value at or below the provisional break point, and have the part after the break run from the smallest value above the provisional break point to the overall max.
4. Repeat 1-3 for a possible lower break point as well. A chart could end up with no break points, one break point, or two break points.
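
To make the steps above concrete, here is a minimal sketch in Python rather than SAS, just to pin down the logic. It rests on several assumptions that are not part of the original post: the upper fence is taken as Q3 + 1.5 * IQR, quartiles use Python's "inclusive" method (SAS's default percentile definition may give slightly different values), and the function names `upper_break` and `lower_break` are hypothetical. Each function returns the gap to leave out of the axis as (gap_start, gap_end), or None when no break is needed.

```python
from statistics import quantiles


def upper_break(groups, k=2.0, fence_mult=1.5):
    """Return (gap_start, gap_end) for an upper axis break, or None.

    groups: dict mapping group name -> list of numeric values.
    k: how many max-IQRs above the highest fence to place the
       provisional break point (assumed default, tune to taste).
    fence_mult: fence multiplier; 1.5 is the usual Tukey convention.
    """
    fences, iqrs = [], []
    for values in groups.values():
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqrs.append(q3 - q1)
        fences.append(q3 + fence_mult * (q3 - q1))

    # Steps 1-2: provisional break point above every box and whisker.
    provisional = max(fences) + k * max(iqrs)

    # Step 3: break only if some value lies beyond the provisional point.
    all_values = [x for values in groups.values() for x in values]
    above = [x for x in all_values if x > provisional]
    if not above:
        return None
    below = [x for x in all_values if x <= provisional]
    return max(below), min(above)


def lower_break(groups, k=2.0, fence_mult=1.5):
    """Step 4: mirror the upper-break logic by negating the data."""
    negated = {g: [-x for x in values] for g, values in groups.items()}
    gap = upper_break(negated, k, fence_mult)
    if gap is None:
        return None
    start, end = gap
    return -end, -start


groups = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "B": [2, 3, 4, 5, 6, 7, 8, 9, 100],
}
print(upper_break(groups))  # gap between the main data and the outlier
print(lower_break(groups))  # None: no extreme low values here
```

The two returned gap endpoints are what one would hand to whatever draws the axis break. Whether to pad them a little, so that the outlier marker does not sit right on the break, is a separate design choice.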

This is mostly a thought exercise at this point. Since I'm not providing sample data/code, not expecting anybody to code up something for me.

But just looking for thoughts on how people have approached the idea of dynamically determining where to break an axis, particularly in the box plot setting.

Thanks,

--Q.

Rick_SAS · Posted 03-03-2017 11:19 AM

Your ideas sound reasonable. I think max(median +/- k*IQR) would be an effective range, and I'd try k=10 for starters. My intuition is that a graph should have 0 or 1 breaks. I wouldn't be fond of multiple breaks, although if your data can have upper AND lower outliers, 1 break in the positive and 1 break in the negative direction would probably be fine.
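
One possible reading of that range, sketched in Python only to illustrate the arithmetic: take median - k*IQR and median + k*IQR per group, then use the smallest lower limit and the largest upper limit across groups. That interpretation, the "inclusive" quartile method, and the function name `effective_range` are all assumptions made for this example, not something stated in the reply.

```python
from statistics import quantiles


def effective_range(groups, k=10.0):
    """Return (low, high) spanning median -/+ k*IQR across all groups.

    groups: dict mapping group name -> list of numeric values.
    Values outside (low, high) are candidates for the far side of a break.
    """
    lows, highs = [], []
    for values in groups.values():
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        spread = k * (q3 - q1)
        lows.append(median - spread)
        highs.append(median + spread)
    return min(lows), max(highs)


groups = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "B": [2, 3, 4, 5, 6, 7, 8, 9, 100],
}
low, high = effective_range(groups)
outliers = [x for v in groups.values() for x in v if x < low or x > high]
print(low, high, outliers)
```

With k=10 the range is deliberately wide, so only values far from every box end up past a break.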

DanH_sas · Posted 03-03-2017 11:25 AM

Sanjay told me he attended a paper at PharmaSUG in 2016 that dealt with finding optimum locations for axis breaks. We were able to find the paper online:

http://www.pharmasug.org/proceedings/2016/QT/PharmaSUG-2016-QT12.pdf

Hope this helps!

Dan
