Re: Box and whisker chart, how SAS computes box ends.

bmcohen36 · Posted 08-07-2025 10:04 AM

I am creating a box and whisker graph using 4 data points: -53.8, -41.2, -27.0, and -26.5. I expected the box to extend from the 2nd value to the 3rd value, the lower whisker to extend to the minimum value, and the upper whisker to extend to the maximum value. Below is the graph I created in SAS, where the box extends from the midpoint between the 1st and 2nd values (-47.5) to the midpoint between the 3rd and 4th values (-26.75), while the whiskers do extend to the minimum and maximum. Any ideas why SAS computes the 1st and 3rd quartiles this way? With only 4 values, I would expect the median to divide the box evenly.

ballardw · Posted 08-07-2025 12:24 PM

First, a caveat: I don't use or have access to Visual Analytics.

SAS has different approaches for calculating percentiles depending on usage. I do note that Proc Univariate, Proc SGPlot, Proc Boxplot and others use either a PCTLDEF= or PERCENTILE= option, with values from 1 to 5, to specify which approach is used. The boxes are drawn from the 25th to the 75th percentile, so which definition is used does affect the appearance of the graphs. Similarly, the median (the 50th percentile) depends on the definition.
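To see how much the choice matters for this particular data, here is a minimal Python sketch of the five definitions following the formulas in the PROC UNIVARIATE documentation. It is not SAS's own code: the helper name `sas_pctl` is hypothetical, weights and missing values are ignored, and clamping any order statistic outside x_1..x_n to the minimum or maximum is an assumption of the sketch.

```python
import math
from fractions import Fraction


def sas_pctl(values, p, pctldef=5):
    """Hypothetical helper: percentile of `values` under PCTLDEF=1..5,
    following the formulas in the PROC UNIVARIATE documentation. Unweighted,
    no missing values; order statistics outside x_1..x_n are clamped."""
    x = sorted(float(v) for v in values)
    n = len(x)
    p = Fraction(str(p))  # exact arithmetic so the "g == 0" tests are reliable
    if n == 0 or not 0 < p < 1:
        raise ValueError("need at least one value and 0 < p < 1")

    def obs(i):
        # 1-based order statistic x_i, clamped to x_1..x_n
        return x[min(max(i, 1), n) - 1]

    if pctldef == 4:
        # Weighted average aimed at x_((n+1)p), with (n+1)p = j + g
        j = math.floor((n + 1) * p)
        g = (n + 1) * p - j
        return (1 - g) * obs(j) + g * obs(j + 1)

    # Definitions 1, 2, 3 and 5 start from np = j + g
    j = math.floor(n * p)
    g = n * p - j
    if pctldef == 1:
        # Weighted average at x_(np)
        return (1 - g) * obs(j) + g * obs(j + 1)
    if pctldef == 2:
        # Observation numbered closest to np; if g == 1/2, take the even-numbered one
        if g != Fraction(1, 2):
            return obs(math.floor(n * p + Fraction(1, 2)))
        return obs(j) if j % 2 == 0 else obs(j + 1)
    if pctldef == 3:
        # Empirical distribution function
        return obs(j) if g == 0 else obs(j + 1)
    if pctldef == 5:
        # Empirical distribution function with averaging (the default)
        return (obs(j) + obs(j + 1)) / 2 if g == 0 else obs(j + 1)
    raise ValueError("pctldef must be 1, 2, 3, 4 or 5")


data = [-53.8, -41.2, -27.0, -26.5]
for d in range(1, 6):
    q1, q2, q3 = (sas_pctl(data, p, d) for p in (0.25, 0.5, 0.75))
    print(f"PCTLDEF={d}: Q1={q1:g}  median={q2:g}  Q3={q3:g}")
```

For the four values in this thread, definitions 1 to 3 all give Q1=-53.8, median=-41.2 and Q3=-27.0; definition 4 gives -50.65, -34.1 and -26.625; and definition 5, the default, gives -47.5, -34.1 and -26.75, which are the box ends in the original graph.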

PaigeMiller · Posted 08-07-2025 02:44 PM

For what it's worth, PROC UNIVARIATE produces the same chart. I think that with just 4 data points, the quartile limits are not going to behave in any intuitive way. And I don't think your statement "With only 4 values, I would expect the median would divide the box evenly" is correct.
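A short derivation shows why. With the default definition (PCTLDEF=5) and four sorted values x_1 ≤ x_2 ≤ x_3 ≤ x_4, np equals 1, 2 and 3 for p = 0.25, 0.5 and 0.75, so each quartile is the average of two neighbouring order statistics:

```latex
\begin{aligned}
Q_1 &= \tfrac{x_1 + x_2}{2}, \quad Q_2 = \tfrac{x_2 + x_3}{2}, \quad Q_3 = \tfrac{x_3 + x_4}{2} \\
Q_2 - Q_1 &= \tfrac{x_3 - x_1}{2}, \qquad Q_3 - Q_2 = \tfrac{x_4 - x_2}{2}
\end{aligned}
```

So the median sits in the middle of the box only when x_3 - x_1 = x_4 - x_2. For the data in this thread, x_3 - x_1 = 26.8 and x_4 - x_2 = 14.7, so the part of the box below the median is 13.4 units long and the part above it is 7.35.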

Anyway, the PROC UNIVARIATE documentation explains exactly how the percentiles are computed. https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/procstat/procstat_univariate_details14.htm

I'm guessing your real data has a lot more than 4 data points. Am I correct? If so, does the boxplot on the real data look more intuitively correct?

--
Paige Miller

FreelanceReinh · Posted 08-08-2025 12:19 PM

Hello @bmcohen36,

@bmcohen36 wrote:

... using 4 data points: -53.8, -41.2, -27.0, and -26.5. I would have expected the box would extend from the 2nd value to the 3rd value (...) With only 4 values, I would expect the median would divide the box evenly.

In addition to the five quantile definitions offered by SAS, there are (at least) four more available in other common statistical software packages. Of course, they can be implemented in SAS by programming: see Rick Wicklin's blog article "Sample quantiles: A comparison of 9 definitions" and the accompanying PROC IML code.
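One quick way to try a definition outside SAS's five is Python's standard-library `statistics.quantiles` (Python 3.8 or later). Per the Python documentation, `method='inclusive'` treats the minimum and maximum as the 0th and 100th percentiles, which is the linear interpolation R uses by default (type 7 in Hyndman and Fan's numbering), while the default `method='exclusive'` places sample points at positions (n+1)p. This is only an illustration with another package's methods, not a reproduction of Wicklin's PROC IML code.

```python
import statistics

data = [-53.8, -41.2, -27.0, -26.5]

# Quartiles (n=4 cut points -> 3 values) under both methods that
# statistics.quantiles supports.
for method in ("inclusive", "exclusive"):
    q1, median, q3 = statistics.quantiles(data, n=4, method=method)
    print(f"{method:>9}: Q1={q1:g}  median={median:g}  Q3={q3:g}")
```

Neither method reproduces the expected box from -41.2 to -27.0: `'inclusive'` gives Q1=-44.35 and Q3=-26.875, and `'exclusive'` gives -50.65 and -26.625 (the same values as PCTLDEF=4 for this data), with a median of -34.1 in both cases.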

However, none of those nine definitions matches your expectations, even though defining the first, second and third quartiles of your example data as x_0.25=-41.2, x_0.5=(-41.2-27.0)/2 and x_0.75=-27.0, respectively, would satisfy the criterion

at least 100p percent of the sample values are less than or equal to x_p and

at least 100(1-p) percent of the sample values are greater than or equal to x_p

which is sometimes used to characterize sample p-quantiles x_p (0<p<1). Note that, by this characterization, all values in the interval [-53.8, -41.2] qualify as a first quartile and similarly all values in [-41.2, -27.0] as a median and all values in [-27.0, -26.5] as a third quartile. Hence, your definition would pick the upper interval endpoint for the first quartile, the midpoint of the interval for the median and the lower interval endpoint for the third quartile to make the definition unique. The default quantile definition in SAS, however, consistently uses the interval midpoints in these cases.
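The characterization above can be checked mechanically. The following is a minimal Python sketch; the helper names `satisfies_quantile_criterion` and `quantile_interval` are hypothetical. It tests the two "at least" conditions for a candidate value and computes the endpoints of the interval of qualifying values, using the fact that when np is an integer k the interval is [x_k, x_(k+1)], and otherwise it collapses to the single value x_ceil(np).

```python
import math
from fractions import Fraction


def satisfies_quantile_criterion(values, p, q):
    """True if q qualifies as a sample p-quantile under the criterion above:
    at least 100p% of the values are <= q and at least 100(1-p)% are >= q."""
    p = Fraction(str(p))  # exact comparison of counts with n*p
    n = len(values)
    at_or_below = sum(v <= q for v in values)
    at_or_above = sum(v >= q for v in values)
    return at_or_below >= p * n and at_or_above >= (1 - p) * n


def quantile_interval(values, p):
    """Endpoints (lo, hi) of the set of all q that meet the criterion, 0 < p < 1.
    If n*p is an integer k the set is [x_k, x_(k+1)]; otherwise lo == hi."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    x = sorted(values)
    np_ = Fraction(str(p)) * len(x)
    lo = x[math.ceil(np_) - 1]  # x_ceil(np) in 1-based order-statistic notation
    hi = x[math.floor(np_)]     # x_(floor(np)+1)
    return lo, hi


data = [-53.8, -41.2, -27.0, -26.5]
for p in (0.25, 0.5, 0.75):
    lo, hi = quantile_interval(data, p)
    # Check that both endpoints satisfy the criterion
    assert satisfies_quantile_criterion(data, p, lo)
    assert satisfies_quantile_criterion(data, p, hi)
    print(f"p={p}: [{lo:g}, {hi:g}]  midpoint={(lo + hi) / 2:g}")
```

The printed midpoints are -47.5, -34.1 and -26.75, the same first quartile, median and third quartile that the default SAS definition produced for this data.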
