tebert
Obsidian | Level 7

The issue is with the results in Figure 24.3.4, part of "Example 24.3 Creating Various Styles of Box-and-Whiskers Plots" in the SAS/STAT User's Guide.

 

The problem is with the whiskers in the figure.

The text states that the interquartile range is the difference between the 25th and 75th percentiles (the height of the box). The whiskers are supposed to extend 1.5 times the interquartile range beyond the edges of the box. Yet in Figure 24.3.4 there is no way that the lower whisker for "16Dec88" extends 1.5 times the box height below the 25th percentile.

 

Something doesn't add up. When I run the SAS code from the example on my own data, I get a figure that seems to have the same problem, and I don't know how to explain the output.

6 REPLIES
ballardw
Super User

You should mention which document you are referencing. Box plots have been around a long time, and different versions of the SAS documentation may use the same figure number for different content.

Note that in the 9.2 help for the BOXPLOT procedure, Figure 24.6 shows that the whiskers extend to the largest/smallest OBSERVATION value that actually occurs within that range.

If your data have enough observations in the right ranges, you can construct a box-and-whisker plot with no apparent whisker, because the maximum value within the 1.5 times range also happens to lie within the box. That tells you a lot about the distribution of values, i.e., they don't spread much in that direction.

tebert
Obsidian | Level 7

This is from the SAS website:

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_boxplot_sect...

(last accessed 04-06-2017)

 

The whiskers do not always span the full range of the data. It depends on the options used in PROC BOXPLOT.

 

Here is the SAS code from the example:

   title 'Analysis of Airline Departure Delays';
   title2 'BOXSTYLE=SCHEMATICIDFAR';
   proc boxplot data=Times;
      plot Delay*Day /
         boxstyle = schematicidfar
         nohlabel;
      id Reason;
      label Delay = 'Delay in Minutes';
   run;

 

I attached a copy of the figure. I added a pair of red lines that are 1.5 times the distance between the 25th and 75th percentiles. On a neighboring box I also added a pair of black lines that are 1.5 times the distance from the median to the 25th percentile: one line is this distance from the 25th percentile, and the other is this distance from the median. None of the lines match up with the whiskers. The whiskers are clearly not the range, because outliers are plotted outside them.

ballardw
Super User

Regardless of the options, the requested whisker stops at an actual value in the data set. That holds whether the limit is the fence (1.5*IQR above the 75th percentile or below the 25th percentile, where the IQR is the 75th minus the 25th percentile, not the median minus the 25th), the far fence (3*IQR), or the extreme values, depending on the range requested for the whiskers.

 

Here's an example that does not draw an upper whisker at all:

data example;
   do i=1 to 100;
      group=1;
      x = round(i,50);
      output;
   end;
run;

proc boxplot data=example;
   plot  x*group;
run;

Why not? The median is 50, Q1 = 50, and Q3 = 100. No values are greater than Q3, so the upper whisker is suppressed. Adding any of the options that extend the whiskers will not draw one, because there are no values beyond the top of the box to display.
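
To check those quartiles yourself, here is a minimal sketch (an illustration, not part of the documentation example). It tabulates x and asks PROC MEANS for the quartiles, assuming the default percentile definition in PROC MEANS matches the one PROC BOXPLOT uses by default.

```sas
/* Rebuild the example data: ROUND(i, 50) rounds i to the nearest multiple of 50 */
data example;
   do i = 1 to 100;
      group = 1;
      x = round(i, 50);
      output;
   end;
run;

/* How many observations land on each distinct value of x */
proc freq data=example;
   tables x;
run;

/* Minimum, quartiles, and maximum of x */
proc means data=example min q1 median q3 max;
   var x;
run;
```

If Q3 and the maximum come out equal, the upper whisker has zero length, which is why none is drawn.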

 

tebert
Obsidian | Level 7

So in my case the correct description for the whisker would be as follows.

 

The upper whisker is the largest data value that does not exceed 1.5 times the interquartile range above the 75th percentile. The lower whisker is the smallest data value that is not less than 1.5 times the interquartile range below the 25th percentile.

 

Is this correct?

ballardw
Super User

@tebert wrote:

So in my case the correct description for the whisker would be as follows.

 

The upper whisker is the largest data value that does not exceed 1.5 times the interquartile range above the 75th percentile. The lower whisker is the smallest data value that is not less than 1.5 times the interquartile range below the 25th percentile.

 

Is this correct?


By George, I think he's got it!

 

Pardon a little showtune quote. Yes for the default whiskers.
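
That description can be checked numerically. The sketch below is only an illustration, not the procedure's own code. It assumes the Times data set from the documentation example, with group variable Day and analysis variable Delay. The data set names TimesSorted, Quartiles, Flagged, and Whiskers are made up here, and it also assumes PROC MEANS and PROC BOXPLOT use the same default percentile definition.

```sas
/* Sort by the group variable so BY-group processing works */
proc sort data=Times out=TimesSorted;
   by Day;
run;

/* Q1 and Q3 of Delay for each Day */
proc means data=TimesSorted noprint;
   by Day;
   var Delay;
   output out=Quartiles(drop=_type_ _freq_) q1=Q1 q3=Q3;
run;

/* Attach the quartiles to every observation and compute the fences */
data Flagged;
   merge TimesSorted Quartiles;
   by Day;
   IQR        = Q3 - Q1;
   LowerFence = Q1 - 1.5*IQR;
   UpperFence = Q3 + 1.5*IQR;
   /* 1 if the delay lies on or inside the fences; missing delays count as outside */
   Inside = (not missing(Delay)) and (LowerFence <= Delay <= UpperFence);
run;

/* Whisker ends: smallest and largest observed delays inside the fences */
proc means data=Flagged noprint nway;
   where Inside = 1;
   class Day;
   var Delay;
   output out=Whiskers(drop=_type_ _freq_) min=LowerWhisker max=UpperWhisker;
run;

proc print data=Whiskers noobs;
run;
```

Comparing LowerWhisker and UpperWhisker with the plotted whisker ends for each day shows whether the whiskers stop at observed values rather than at the fences themselves.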

 

tebert
Obsidian | Level 7

The rain in Spain is on average deposited on the plain subject to some standard deviation and possible outliers. Sigh, it is just not catchy. Maybe needs a slight rewrite. That is probably why my day job is not writing lyrics.  Thank you for your help.
