Programming the statistical procedures from SAS

Interpreting boxplots

Occasional Contributor
Posts: 19

The issue is with Figure 24.3.4, part of Example 24.3, "Creating Various Styles of Box-and-Whisker Plots," in the SAS/STAT User's Guide.

 

The problem is with the whiskers in the figure.

The text states that the interquartile range is the difference between the 25th and 75th percentiles (the height of the box), and that the whiskers extend 1.5 times the interquartile range beyond the edges of the box. Yet in Figure 24.3.4 there is no way that the lower whisker for "16Dec88" reaches 1.5 times the box height below the 25th percentile.

 

Something doesn't add up. The problem is that using the SAS code from the example on my data gives me a figure that seems to have this same problem, and I don't know how to explain the output.


Grand Advisor
Posts: 10,075

Re: Interpreting boxplots

You should mention which document you are referencing. Box plots have been around a long time and many different versions of SAS may have the same figure number but without the same content.

Note that in the 9.2 help for the BOXPLOT procedure, Figure 24.6 shows that the whiskers extend to the largest and smallest OBSERVATION values that actually occur within that range.

If your data have the right distribution, you can construct a box-and-whisker plot with no apparent whisker, because the maximum value within the 1.5 × IQR range also happens to lie within the box. That tells you a lot about the distribution of values, i.e., they don't spread much in that direction.

Occasional Contributor
Posts: 19

Re: Interpreting boxplots

This is from the SAS website:

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_boxplot_sect...

(last accessed 04-06-2017)

 

The whiskers are not always the range. It depends on the options used in PROC BOXPLOT.

 

Here is the SAS code from the example:

   title 'Analysis of Airline Departure Delays';
   title2 'BOXSTYLE=SCHEMATICIDFAR';
   proc boxplot data=Times;
      plot Delay*Day /
         boxstyle = schematicidfar
         nohlabel;
      id Reason;
      label Delay = 'Delay in Minutes';
   run;

 

I attached a copy of the figure. I added a pair of red lines that are 1.5 times the distance between the 25th and 75th percentile. I also added a pair of black lines on a neighboring box that are 1.5 times the distance from the median to the 25th percentile. One line is this distance from the 25th percentile, the other line is this distance from the median. None of the lines match up with the whiskers. It is obvious that these whiskers are not the range because there are outliers that are plotted that lie outside the whiskers.

Solution
‎04-06-2017 04:56 PM
Grand Advisor
Posts: 10,075

Re: Interpreting boxplots

Regardless of the options, the requested whisker stops at an actual value in the data set. Depending on the range requested for the whiskers, that value is the most extreme observation inside the fence (1.5 × IQR above the 75th percentile or below the 25th percentile, where IQR is the 75th minus the 25th percentile, not the median minus the 25th), the most extreme observation inside the far fence (3 × IQR), or the extreme value of the data.

 

Here's an example that does not draw an upper whisker at all:

data example;
   do i=1 to 100;
      group=1;
      /* round i to the nearest multiple of 50, so x takes only a few distinct values */
      x = round(i,50);
      output;
   end;
run;

proc boxplot data=example;
   plot  x*group;
run;

Why not? The median is 50, Q1=50, and Q3=100. Q3 equals the maximum of the data, so there are no values greater than Q3 and the upper whisker is suppressed. Adding any of the options to extend the whiskers will not draw one, because there are no values above the "box" to display.
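One quick way to check those quartiles is to ask PROC MEANS for them directly. This is only a sketch: it assumes the EXAMPLE data set created above, and it assumes PROC MEANS and PROC BOXPLOT are both left at their default percentile definitions, so that the two procedures agree on Q1 and Q3.

```sas
/* Print the five-number summary of x for the EXAMPLE data set.  */
/* If Q3 equals the maximum, no observation lies above the box,  */
/* so there is nothing for an upper whisker to reach.            */
proc means data=example n min q1 median q3 max;
   var x;
run;
```

If the Q3 and Maximum columns show the same value, the missing upper whisker is expected behavior rather than a plotting problem.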

 

Occasional Contributor
Posts: 19

Re: Interpreting boxplots

So in my case the correct description for the whisker would be as follows.

 

The upper whisker is the largest data value that does not exceed 1.5 times the interquartile range above the 75th percentile. The lower whisker is the smallest data value that is not less than 1.5 times the interquartile range below the 25th percentile.

 

Is this correct?

Grand Advisor
Posts: 10,075

Re: Interpreting boxplots


tebert wrote:

So in my case the correct description for the whisker would be as follows.

 

The upper whisker is the largest data value that does not exceed 1.5 times the interquartile range above the 75th percentile. The lower whisker is the smallest data value that is not less than 1.5 times the interquartile range below the 25th percentile.

 

Is this correct?


By George, I think he's got it!

 

Pardon a little showtune quote. Yes for the default whiskers.
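For anyone who wants to check the rule numerically, here is a minimal sketch that computes the fences and the whisker ends by hand. It assumes the schematic-style definition discussed in this thread (fences at 1.5 times the IQR beyond the box), a single group, and the EXAMPLE data set and variable X from the earlier post. The data set names Q and WHISKERS are made up for illustration.

```sas
/* Quartiles of x for the whole data set (one group assumed) */
proc means data=example noprint;
   var x;
   output out=q(keep=q1 q3) q1=q1 q3=q3;
run;

data whiskers;
   if _n_ = 1 then set q;           /* q1 and q3 stay available on every row */
   set example end=last;
   retain lower_whisker upper_whisker;
   lower_fence = q1 - 1.5*(q3 - q1);
   upper_fence = q3 + 1.5*(q3 - q1);
   /* MIN and MAX ignore missing arguments, so the first      */
   /* qualifying value seeds each running whisker end         */
   if x >= lower_fence then lower_whisker = min(lower_whisker, x);
   if x <= upper_fence then upper_whisker = max(upper_whisker, x);
   if last then output;             /* keep only the final running values */
   keep q1 q3 lower_fence upper_fence lower_whisker upper_whisker;
run;

proc print data=whiskers noobs;
run;
```

The fences are only limits. The whisker ends are the actual data values closest to those limits on the inside, which is why a whisker can be much shorter than 1.5 times the box height, or absent when its end coincides with the edge of the box.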

 

Occasional Contributor
Posts: 19

Re: Interpreting boxplots

The rain in Spain is on average deposited on the plain subject to some standard deviation and possible outliers. Sigh, it is just not catchy. Maybe needs a slight rewrite. That is probably why my day job is not writing lyrics.  Thank you for your help.

☑ This topic is SOLVED.
