GTL Boxplot axis scaled ignoring outliers

10-28-2015 11:24 AM

Hi,

I'm using 9.3 GTL to make a box plot that does NOT display outliers. Is there a way to have the axis scaling algorithm also ignore outliers? I want to include outliers in calculation of the mean and percentiles, I just don't want to display them, and I don't want outliers to cause an extremely long axis.

The docs note that the DISPLAY option I used below does not affect the axis at all. The only approaches I've come up with are to compute the summary statistics myself and then use BOXPLOTPARM, or to write my own algorithm for scaling the y-axis.

Sample code showing y-axis scaled including one outlier, smushing the boxes:

    ods path (prepend) work.mytpl;

    proc template;
      define statgraph MyBoxPlot;
        begingraph;
          layout overlay;
            boxplot x=country y=actual / display=(caps fill mean median /*outliers*/);
          endlayout;
        endgraph;
      end;
    run;

    data prdsale;
      set sashelp.prdsale;
      if _n_=1 then actual=3000;
    run;

    proc sgrender data=prdsale template="MyBoxPlot";
    run;

Thanks,

--Q.

Accepted Solutions

Solution

10-29-2015 01:54 PM

Posted in reply to djrisks

10-29-2015 01:03 PM - edited 10-29-2015 01:32 PM

Here is one way, though it will need a bit of coding.

Run SGPLOT to create the regular box plot of your data with categories. Use the ODS OUTPUT SGPLOT=box; statement to get the box plot data in the output data set "Box". This data is in the form suitable for drawing the box plot. It has a variable called "box_mpg_city_x_origin__st" that holds the name of the statistic and a variable called "box_mpg_city_x_origin___y" that holds the Y value of the statistic. The MIN and MAX statistics provide the values for the whiskers.

Extract this data, and place the global min and max values in macro variables. Then rerun the same SGPLOT again, and now specify the NOOUTLIERS option and set YAXIS MIN=&min and MAX=&Max.

    /* First pass: draw the regular box plot and capture its statistics */
    ods output sgplot=box(rename=(box_mpg_city_x_origin__st=stat
                                  box_mpg_city_x_origin___y=value));
    ods graphics / reset width=6in height=4in imagename='Box_With_Outliers';

    proc sgplot data=sashelp.cars;
      vbox mpg_city / category=origin;
    run;

    /* Find the lowest and highest whisker ends across all categories */
    data box2;
      retain min 1e6 max -1e6;
      keep value stat;
      set box (where=(stat in ('MIN', 'MAX'))) end=last;
      if stat = 'MIN' then min=min(min, value);
      if stat = 'MAX' then max=max(max, value);
      if last then do;
        /* SYMPUTX converts the number to text and strips blanks */
        call symputx("MIN", min);
        call symputx("MAX", max);
      end;
    run;

    /* Second pass: hide the outliers and fix the axis to the whisker range */
    ods graphics / reset width=6in height=4in imagename='Box_Without_Outliers';

    proc sgplot data=sashelp.cars;
      vbox mpg_city / category=origin nooutliers;
      yaxis min=&min max=&max;
    run;

**I do have some concern with this.** Not being a Statistician, I don't know if this will provide an incorrect presentation to a reader. At least the Y axis indicates presence of outliers, even if they are suppressed. Maybe you can just make them more transparent.

All Replies

Posted in reply to Quentin

10-28-2015 12:27 PM

Hi Quentin,

Have you tried the outlierattrs option? You could try outlierattrs=(size=0) and see if that gets rid of the outliers. I usually use a similar option to not display markers.

Thanks.

Posted in reply to djrisks

10-28-2015 12:40 PM

Thanks @djrisks.

OUTLIERATTRS=(size=0) works to hide the outliers, achieving the same as the DISPLAY option in my posted code. But the axis is still scaled to include outliers. The outlier value of 3000 forces the y-axis to go up to 3000, instead of the desired ~1000.

-Q.

Posted in reply to Quentin

10-28-2015 12:52 PM

Oh, I understand now @Quentin

The solution I can think of at the moment is not elegant: calculate the maximum and minimum values of the dataset without the outliers, store them as dynamic variables or numeric macro variables, and then use those min and max values in the y-axis options.

Hopefully, there is a simpler solution out there though.

Thanks.

Posted in reply to Quentin

10-28-2015 02:25 PM

If you know the maximum value you want as the upper bound for the box plot, you could make it a dynamic parameter of your GTL code:

    define statgraph MyBoxPlot;
      dynamic ymax;
      begingraph;

and use VIEWMAX=YMAX in the LINEAROPTS of the YAXISOPTS= option on the LAYOUT OVERLAY statement. Then set the value with a DYNAMIC statement in SGRENDER:

    proc sgrender data=prdsale template="MyBoxPlot";
      dynamic ymax=1000;
    run;
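
Put together with the template from the original post, the suggestion above might look like the following sketch. It assumes the PRDSALE sample data from the question and a hand-picked upper bound of 1000; the DYNAMIC statement in PROC SGRENDER is what passes the value to the template.

```sas
/* Sample data from the original post: one extreme value of 3000 */
data prdsale;
  set sashelp.prdsale;
  if _n_ = 1 then actual = 3000;
run;

proc template;
  define statgraph MyBoxPlot;
    dynamic ymax;                      /* upper axis bound, set at render time */
    begingraph;
      /* VIEWMAX= limits the visible axis range without touching the data */
      layout overlay / yaxisopts=(linearopts=(viewmax=ymax));
        boxplot x=country y=actual / display=(caps fill mean median);
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=prdsale template="MyBoxPlot";
  dynamic ymax=1000;                   /* assumed bound; choose one that fits your data */
run;
```

If the dynamic is not set at render time, the VIEWMAX= option is ignored and the axis falls back to its default scaling.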

Posted in reply to ballardw

10-29-2015 12:10 PM

Thanks @ballardw. The bummer is I don't know the maximum value in advance. (Writing a box plot stored process that will be used for various stuff). So if I need to monkey with viewmax or the axis, I would need to compute the upper/lower whisker for each group, and then find the max/min whisker in the chart, and set viewmax/viewmin after that. Which is doable, but was hoping there would be something automagical.

I haven't used GPLOT in years, but my memory is you had to do something extra to ask for outliers to be displayed (which I didn't like), but if you didn't ask for outliers, the axis was scaled to fit the box-and-whiskers (which I did like).

Posted in reply to Quentin

10-29-2015 12:21 PM

You may have to go down the route of calculating the values yourself. If you did decide to do that, then once you've found the min/max whisker in the chart, you can set those up as macro variables and then base viewmin and viewmax on those values. That way the code will be reusable.
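
A minimal sketch of that route, under some assumptions: the whiskers end at the most extreme observations within 1.5 times the interquartile range of the box, and PROC MEANS computes the quartiles the same way the box plot does (both using their default percentile definition). The template name MyBoxPlotView, the data set QUARTILES, and the macro variables YMIN and YMAX are made up for illustration. The TRIMMED keyword needs SAS 9.3 or later.

```sas
/* Sample data from the original post: one extreme value of 3000 */
data prdsale;
  set sashelp.prdsale;
  if _n_ = 1 then actual = 3000;
run;

/* Quartiles per category, computed from all observations */
proc means data=prdsale noprint nway;
  class country;
  var actual;
  output out=quartiles q1=q1 q3=q3;
run;

/* Lowest and highest whisker ends across all categories:
   the extreme observations that still lie inside the 1.5*IQR fences */
proc sql noprint;
  select min(d.actual) format=best12.,
         max(d.actual) format=best12.
    into :ymin trimmed, :ymax trimmed
  from prdsale as d inner join quartiles as q
    on d.country = q.country
  where d.actual between q.q1 - 1.5*(q.q3 - q.q1)
                     and q.q3 + 1.5*(q.q3 - q.q1);
quit;

proc template;
  define statgraph MyBoxPlotView;      /* hypothetical template name */
    dynamic ymin ymax;
    begingraph;
      layout overlay / yaxisopts=(linearopts=(viewmin=ymin viewmax=ymax));
        boxplot x=country y=actual / display=(caps fill mean median);
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=prdsale template="MyBoxPlotView";
  dynamic ymin=&ymin ymax=&ymax;
run;
```

Because the limits are computed from the data on every run, the same code can be reused for other data sets. A mean that falls outside the whisker range would be clipped, so you may want to widen the limits slightly before passing them in.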

Posted in reply to Sanjay_SAS

10-29-2015 01:51 PM - edited 10-29-2015 01:53 PM

Thanks @Sanjay_SAS, that's a helpful approach. Have to admit, I had never thought to use ODS OUTPUT with SGPLOT. Good to know that it makes data tables available. I can imagine quite a few settings that will come in handy.

That said, my next question (tomorrow or next week), will be about trying to find ways to make SGRENDER run more quickly when generating a boxplot with thousands of data points behind it. As I have it coded currently, the time costs of running an extra SGPLOT step are probably too high (in stored process setting).

The other option I'm considering is to just go ahead and convert to BOXPLOTPARM, using something like %BoxPlotParm (http://support.sas.com/documentation/cdl/en/grstatgraph/65377/HTML/default/viewer.htm#p14r3dprwc36p7...). I think if I do that, I could then choose to include/exclude outliers in the data, and then use the default axes. And I think BoxPlotParm might run faster than BOXPLOT, since it is starting with a much-smaller precalculated dataset.

That said, I would vote for changing this in the future if feasible. If I'm making a chart that does not display outliers, I don't see a benefit to including non-displayed outlier values in the axis scaling algorithm.
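
For reference, a hand-rolled BOXPLOTPARM version could look roughly like this. It is only a sketch and not the %BoxPlotParm macro from the documentation. It assumes whiskers at the most extreme observations within 1.5 times the interquartile range and PROC MEANS default percentiles; the names STATS, WHISK, BOXPARM and MyBoxPlotParm are made up for illustration. Outliers still contribute to the mean and quartiles, but since no OUTLIER rows are passed to the plot, the default axis only has to fit the boxes and whiskers.

```sas
/* Sample data from the original post: one extreme value of 3000 */
data prdsale;
  set sashelp.prdsale;
  if _n_ = 1 then actual = 3000;
run;

/* Summary statistics per category, computed from ALL observations */
proc means data=prdsale noprint nway;
  class country;
  var actual;
  output out=stats(drop=_type_ _freq_) mean=avg q1=q1 median=q2 q3=q3;
run;

/* Whisker ends: extreme observations inside the 1.5*IQR fences */
proc sql;
  create table whisk as
  select s.country, s.avg, s.q1, s.q2, s.q3,
         min(d.actual) as lower,
         max(d.actual) as upper
  from prdsale as d inner join stats as s
    on d.country = s.country
  where d.actual between s.q1 - 1.5*(s.q3 - s.q1)
                     and s.q3 + 1.5*(s.q3 - s.q1)
  group by s.country, s.avg, s.q1, s.q2, s.q3;
quit;

/* One row per statistic, the layout that BOXPLOTPARM expects */
data boxparm;
  set whisk;
  length stat $8;
  stat = 'MIN';    value = lower; output;   /* lower whisker */
  stat = 'Q1';     value = q1;    output;
  stat = 'MEDIAN'; value = q2;    output;
  stat = 'Q3';     value = q3;    output;
  stat = 'MAX';    value = upper; output;   /* upper whisker */
  stat = 'MEAN';   value = avg;   output;
  keep country stat value;
run;

proc template;
  define statgraph MyBoxPlotParm;      /* hypothetical template name */
    begingraph;
      layout overlay;
        boxplotparm x=country y=value stat=stat / display=(caps fill mean median);
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=boxparm template="MyBoxPlotParm";
run;
```

The render step reads only six rows per category, which is where any speed-up over BOXPLOT on thousands of raw observations would come from.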

Posted in reply to Quentin

10-29-2015 02:21 PM - edited 10-29-2015 02:37 PM

One variation of the solution, suggested by Prashant, removes the OUTLIER stat values from the data after the first pass. Then use this modified data directly with BoxPlotParm. However, VBOXPARM is not available in SGPLOT, so you would have to use the GTL version with BoxPlotParm in the second pass. That could address your concern about the performance for large data. You might as well use the GTL program for the first pass too.

I just tried this method. Make sure to remove all observations with STAT of OUTLIER, FAROUTLIER, DATAMIN, DATAMAX and blank (missing).
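
The filtering step might be sketched as follows, reusing the first pass and the renamed STAT and VALUE variables from the accepted solution. The generated column names in the ODS output data set are taken from that post and may differ in your SAS version, so the sketch ends with PROC CONTENTS to check them (including the name of the category column) before you write the BOXPLOTPARM template. The data set name BOXPARM is made up for illustration.

```sas
/* First pass, as in the accepted solution: capture the box plot data */
ods output sgplot=box(rename=(box_mpg_city_x_origin__st=stat
                              box_mpg_city_x_origin___y=value));

proc sgplot data=sashelp.cars;
  vbox mpg_city / category=origin;
run;

/* Keep only the statistics that describe the box and whiskers */
data boxparm;
  set box;
  where stat not in ('OUTLIER', 'FAROUTLIER', 'DATAMIN', 'DATAMAX')
        and not missing(stat);
run;

/* Inspect the column names before writing the BOXPLOTPARM template */
proc contents data=boxparm;
run;
```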

We will certainly entertain the possibility of adding an option to the box plot to retain the data extents of only the items being displayed. This should be relatively simple, and will avoid a second pass. If this is of interest to you, you could pass this on to Tech Support as a request for new functionality.

Posted in reply to Sanjay_SAS

05-04-2017 10:17 AM

Although it is 18 months after the original question, I would like this feature added to the options for the vbox statement in SGPLOT. I do appreciate the workaround, though.