Interpreting boxplots

04-06-2017 11:23 AM

The issue is with Figure 24.3.4, part of Example 24.3, "Creating Various Styles of Box-and-Whisker Plots", in the SAS/STAT user's manual.

The problem is with the whiskers in the figure.

The text states that the interquartile range is the difference between the 25th and 75th percentiles (the height of the box), and that the whiskers extend 1.5 times the interquartile range beyond the edges of the box. Yet in Figure 24.3.4 there is no way that the lower whisker for "16Dec88" reaches 1.5 times the box height below the 25th percentile.

Something doesn't add up. The problem is that using the SAS code from the example on my data gives me a figure that seems to have this same problem, and I don't know how to explain the output.

Accepted Solutions

Solution

04-06-2017 04:56 PM

Posted in reply to tebert

04-06-2017 01:11 PM

Regardless of the options, the requested whisker will **stop at an actual value in the dataset**. Depending on the range requested for the whiskers, the limit is the fence (1.5*IQR above the 75th or below the 25th percentile, where IQR is the 75th minus the 25th percentile, not the median minus the 25th), the far fence (3*IQR), or the extreme values.

Here's an example that does not draw an upper whisker at all:

data example;
   do i = 1 to 100;
      group = 1;
      x = round(i, 50);
      output;
   end;
run;

proc boxplot data=example;
   plot x*group;
run;

Why not? The median is 50, Q1 = 50, and Q3 = 100. No values are greater than Q3, so the upper whisker is suppressed. Adding any of the options to extend the whiskers will not draw one, because there are no values above the "box" to display.
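
To check these numbers rather than read them off the plot, here is a minimal sketch (not part of the original reply) that recomputes the quartiles, the 1.5*IQR fences, and the whisker end points for the same data. The data set names `quart` and `fences` and the variable names `iqr`, `lower_fence`, `upper_fence`, `lower_whisker`, and `upper_whisker` are made up for illustration, and the sketch assumes the default percentile definition in both PROC MEANS and PROC BOXPLOT.

```sas
/* Same data as in the example above: x is 0, 50, or 100 */
data example;
   do i = 1 to 100;
      group = 1;
      x = round(i, 50);
      output;
   end;
run;

/* Quartiles of x, written to a one-row data set (illustrative name) */
proc means data=example noprint;
   var x;
   output out=quart q1=q1 q3=q3;
run;

/* Fences at 1.5 * IQR beyond the box edges */
data fences;
   set quart;
   iqr         = q3 - q1;
   lower_fence = q1 - 1.5 * iqr;
   upper_fence = q3 + 1.5 * iqr;
run;

/* Whisker ends: the most extreme observed values inside the fences */
proc sql;
   select f.q1, f.q3, f.lower_fence, f.upper_fence,
          min(e.x) as lower_whisker,
          max(e.x) as upper_whisker
   from example as e, fences as f
   where e.x between f.lower_fence and f.upper_fence
   group by f.q1, f.q3, f.lower_fence, f.upper_fence;
quit;
```

With Q1 = 50 and Q3 = 100 as stated above, the fences are -25 and 175, and the largest value inside them is 100, which is Q3 itself. The upper whisker therefore has zero length, while the lower whisker runs from 50 down to 0.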

All Replies

Posted in reply to tebert

04-06-2017 11:52 AM - edited 04-06-2017 11:53 AM

You should mention which document you are referencing. Box plots have been around a long time, and different versions of the SAS documentation may use the same figure number for different content.

Note that in the BOXPLOT procedure section of the 9.2 help, Figure 24.6 shows that the whiskers extend to the largest/smallest OBSERVATION value that actually occurs within that range.

If your data has enough observations in the right ranges, you can construct a box-and-whisker plot that has no apparent whisker because the maximum value within the 1.5 times range also happens to be within the IQR. That tells you a lot about the distribution of values, i.e., they don't spread much in that direction.

Posted in reply to ballardw

04-06-2017 12:16 PM

This is from the SAS website:

(last accessed 04-06-2017)

The whiskers are not always the range. It depends on the options used in BoxPlot.

Here is the SAS code from the example:

title 'Analysis of Airline Departure Delays';
title2 'BOXSTYLE=SCHEMATICIDFAR';
proc boxplot data=Times;
   plot Delay*Day /
      boxstyle = schematicidfar
      nohlabel;
   id Reason;
   label Delay = 'Delay in Minutes';
run;

I attached a copy of the figure. I added a pair of red lines that are 1.5 times the distance between the 25th and 75th percentiles away from the box. I also added a pair of black lines on a neighboring box that are 1.5 times the distance from the median to the 25th percentile: one line is this distance from the 25th percentile, the other is this distance from the median. None of the lines match up with the whiskers. It is obvious that these whiskers are not the range, because outliers are plotted that lie outside the whiskers.

Posted in reply to ballardw

04-06-2017 01:27 PM

So in my case the correct description for the whisker would be as follows.

The upper whisker is the largest data value that does not exceed 1.5 times the interquartile range above the 75th percentile. The lower whisker is the smallest data value that is not less than 1.5 times the interquartile range below the 25th percentile.

Is this correct?

Posted in reply to tebert

04-06-2017 04:43 PM

tebert wrote:

So in my case the correct description for the whisker would be as follows.

The upper whisker is the largest data value that does not exceed 1.5 times the interquartile range above the 75th percentile. The lower whisker is the smallest data value that is not less than 1.5 times the interquartile range below the 25th percentile.

Is this correct?

By George, I think he's got it!

Pardon a little showtune quote. Yes, for the default whiskers.
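
For anyone landing here later, the description confirmed above can be written compactly. This is only a restatement of the rule agreed on in this thread for fence-based whiskers, with x_i the observations in one group and Q_1, Q_3 the 25th and 75th percentiles. The symbol names are assumptions made here, not notation from the SAS documentation.

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 \\
\text{upper whisker end} &= \max\{\, x_i : x_i \le Q_3 + 1.5\,\mathrm{IQR} \,\} \\
\text{lower whisker end} &= \min\{\, x_i : x_i \ge Q_1 - 1.5\,\mathrm{IQR} \,\}
\end{aligned}
```

Because both ends are observed values, a whisker can be much shorter than 1.5 times the IQR, and it has zero length when the most extreme value inside the fence is the quartile itself.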

Posted in reply to ballardw

04-06-2017 04:55 PM

The rain in Spain is on average deposited on the plain subject to some standard deviation and possible outliers. Sigh, it is just not catchy. Maybe needs a slight rewrite. That is probably why my day job is not writing lyrics. Thank you for your help.