n6
Quartz | Level 8

I realize that Proc Boxplot isn't too statistic-y, but it is in the SAS/STAT package, so I'll post the question here.  There are two things I can't figure out how to do.

1. The person I'm working with just wants the box with the 25th and 75th percentiles at bottom and top.  No lines are wanted and no dots for outlying points are wanted.  I figured out how to get rid of the lines with the WHISKERPERCENTILE option, but although the lines go away, points show up in place of the lines.  I know that there is the CLIPFACTOR option, but it uses an algorithm rather than just getting rid of points outside the box entirely.  I've looked pretty intensively and I'm afraid there is simply no way to get rid of the points and have only the box.

2. The other thing is this.  I have five side-by-side boxplots for, say, Groups A, B, C, D and E.  Okay, fine, but then the person wants the same thing except each of those by sex.  But a BY statement puts them on separate plots.  What the user wants is Group A Males next to Group A Females, then Group B Males next to Group B Females, etc.  Again, I'm afraid that this simply isn't possible in SAS.

Any help on either of these two questions is greatly appreciated.

7 REPLIES
Reeza
Super User

What version of SAS are you on?

Can you post some code you have as well?

n6
Quartz | Level 8

Here is the general format.  The "(group)" part is just an attempt to have it put the Male and Female results side by side (which I'm discovering may be easier done in SGPANEL) but it has no effect in terms of the outliers.  If I did not have that WHISKERPERCENTILE bit I'd get lines out the top and bottom of the box going to the max and min.  I put in the WHISKERPERCENTILE option and the lines coming out the top and bottom of the box go away, but there are still little circles indicating the observations outside the box.

proc boxplot data=dataset;
   /* plot options such as WHISKERPERCENTILE= go after the slash on the PLOT statement */
   plot y * x (group) / whiskerpercentile=25;
run;

Reeza
Super User

Looks like you're using a later version than I am. Is there a way to change the colour of the outliers to match the background?

ballardw
Super User

Have you tried the option boxstyle=schematic?

The grouping issue is easy: create a new variable in a data step that combines your group A to E and Sex. So you have "Group A Male" "Group A Female" etc. If you need all 3, "Group A" and the two with gender then you make another group:

Data want;
     set have;
     GroupVar = Catx(' ',"Group", group, sex); output;
     GroupVar = Catx(' ',"Group", group); output;  /* this creates a separate value but you get 2 levels of grouping*/
run;

Rick_SAS
SAS Super FREQ

Use SGPLOT.  Depending on your SAS version you can control the WHISKERPCT.  The side-by-side is easy. See this article: http://blogs.sas.com/content/iml/2012/08/22/categories-vs-groups-in-proc-sgplot.html

data have;
do Group = 'A','B','C','D','E';
   do sex = 'm','f';
      do i = 1 to 10;
         x = rand("Normal");
         output;
      end;
   end;
end;
run;

proc sgplot data=have;
   vbox x / category=Group group=sex whiskerpct=25 nooutliers;
run;

If you don't have the WHISKERPCT option in PROC SGPLOT, it is in the GTL:

BoxPlot X=Group Y=x / Group=sex whiskerpercentile=25 groupdisplay=cluster;
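For anyone who wants to try the GTL route, here is a minimal sketch of how that BOXPLOT statement might be wrapped in a template and rendered. It assumes the HAVE data set from the DATA step above already exists. The template name BoxOnly is made up for illustration, and the DISPLAY= list (which leaves out OUTLIERS and CAPS so that only the box, median and mean are drawn) is my assumption about how to hide the points; check that both WHISKERPERCENTILE= and DISPLAY= are available in your release before relying on it.

```sas
/* Assumes WORK.HAVE was created by the DATA step earlier in this reply. */
proc template;
   define statgraph BoxOnly;            /* hypothetical template name */
      begingraph;
         layout overlay;
            /* Cluster the sexes side by side within each Group.          */
            /* DISPLAY= omits OUTLIERS and CAPS, so no points are drawn.  */
            boxplot x=Group y=x / group=sex
                                  groupdisplay=cluster
                                  whiskerpercentile=25
                                  display=(fill median mean);
         endlayout;
      endgraph;
   end;
run;

/* Render the template against the sample data. */
proc sgrender data=have template=BoxOnly;
run;
```

If the whiskers still appear in your release, the template is also a convenient place to experiment with attribute options, since every part of the box is controlled from that one statement.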

n6
Quartz | Level 8

I am using Version 9.4, which I meant to mention earlier.

I just tried the code with the WHISKERPCT option in Proc SGPlot and I get an error.

In Proc Boxplot the WHISKERPCT option works (although I call it WHISKERPERCENTILE, but I assume WHISKERPCT would work just as well).  So that gets rid of the whiskers, but the problem is the points outside the box still show up.

In Proc SGPanel there is a NOOUTLIERS option I can use to get rid of outliers, but the WHISKERPERCENTILE option doesn't work, so I can get rid of the dots but not the lines.

If there somehow was the WHISKERPERCENTILE and the NOOUTLIERS options in the same graphing plot then I'd be all set, but I haven't been able to find it as of yet.

The point about just changing the whiskers to the color of the background is pretty clever but I haven't investigated it to see if I can do it yet.

As far as the groups go, yes, instead of having Groups A, B, C, D and E and Sexes Male and Females I could just use a Data Step to make groups Male A, Female A, Male B, Female B, etc.  I should have thought of that.  That is one easy answer.  But it will make 10 groups and treat them all alike if I do that.  OTOH I'm fooling with Proc SGPanel now and using

proc sgpanel data=dataset;
   panelby group;
   vbox y / category=sex;
run;

gives me the 10 groups in 5 pairs, with Male and Female for Group A, then a vertical separator line, then Male and Female for Group B, etc.  So putting it into five Male/Female pairs works nicely too.  But I still have the issue of not being able to get only a box with the 25th and 75th percentiles as the top and bottom.  It is after Category=Sex that I put Nooutliers, so that's nice, but I still have the darned whiskers that I can't get rid of.

n6
Quartz | Level 8

Okay, I finally got it, thanks partly to the help of you all here, especially the part about making the whisker be the same color as the background, which worked and I've come to believe is the only way to do it.  Here is my code, made generic.

proc sort data=dataset;   by day2;   run;

proc sgpanel data=dataset;
   panelby day / rows=1 columns=5 novarname sort=data;
   rowaxis label = 'Label'  values = (0 to 20 by 5);
   colaxis label = 'Sex';
   /* nooutliers hides the points; white whiskers vanish against the white background */
   vbox y / category=sex nooutliers whiskerattrs=(color=white);
run;

Here is some explanation.  Pretend the group variable is Day and takes values Mon Tue Wed Thu and Fri.    I made another variable named Day2 like this:

if Day = 'Mon' then Day2 = '1-Mon';
if Day = 'Tue' then Day2 = '2-Tue';
if Day = 'Wed' then Day2 = '3-Wed';
etc.

That is why I sorted by Day2 at the start.  And then in the PANELBY line I have Sort=Data.  That tells it to list the items in the order they are in the dataset, which I just sorted by Day2.  So it lists them in the order of 1-Mon, 2-Tue, 3-Wed, etc.  Earlier in that line I have PANELBY Day.  So I'm using Mon, Tue, etc., but instead of ordering by the default of alphabetical order, which would put Fri first, it orders them by 1-Mon, 2-Tue, etc.
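Written out in full, the Day2 recoding described above might look something like the sketch below. It assumes Day is a character variable holding exactly Mon, Tue, Wed, Thu and Fri, and that it is fine to overwrite the data set in place; the LENGTH statement and the ELSE chain are my additions for tidiness, not something the original code had.

```sas
/* Build a sortable copy of Day so that SORT=DATA gives weekday order. */
/* Assumes Day is character with values Mon, Tue, Wed, Thu, Fri.       */
data dataset;
   set dataset;
   length Day2 $5;                      /* long enough for '1-Mon' etc. */
   if      Day = 'Mon' then Day2 = '1-Mon';
   else if Day = 'Tue' then Day2 = '2-Tue';
   else if Day = 'Wed' then Day2 = '3-Wed';
   else if Day = 'Thu' then Day2 = '4-Thu';
   else if Day = 'Fri' then Day2 = '5-Fri';
run;

/* Sort by the prefixed variable so the panels come out Mon through Fri. */
proc sort data=dataset;
   by Day2;
run;
```

Any other value of Day is left with a blank Day2, which sorts first, so it is worth checking for unexpected values before plotting.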

At the top of each column of boxplots it would have Day=Mon or Day=Tue, etc, except that I use the novarname option and that tells it to just put Mon, Tue, etc at the top of each column.

I use rows=1 columns=5 to get them all on the same line.  When I didn't do that it made two rows of three, with the last one on the bottom line empty.  So I tricked it into putting them all on the same line, which made them a little narrow, but it's okay.

I used values = (0 to 20 by 5) because without that the y-axis will go way up to the highest value and all the boxplots will be squished at the bottom since there are some high outliers.

You can omit the label= parts in ROWAXIS and COLAXIS and it will just use the variable name as the label instead.

The nooutliers option is needed because even though the whiskers are white, the outliers will still be there in black unless you tell them not to be via the nooutliers option.

And WHISKERATTRS = (color=white) is the coup de grace that makes the whiskers white and thus invisible against the white background.

Thanks for the help.  Hopefully you can use some of the tricks in the code above to make your life easier.
