08-25-2015 01:51 PM
I realize that PROC BOXPLOT isn't too statistic-y, but it is in the SAS/STAT package, so I'll post the question here. There are two things I can't figure out how to do.
1. The person I'm working with just wants the box, with the 25th and 75th percentiles at the bottom and top: no whisker lines and no dots for outlying points. I figured out how to get rid of the lines with the WHISKERPERCENTILE option, but although the lines go away, the points beyond the box show up in their place. I know there is the CLIPFACTOR option, but it uses an algorithm rather than simply dropping every point outside the box. I've looked pretty intensively, and I'm afraid there is simply no way to get rid of the points and have only the box.
2. The other thing is this. I have five side-by-side boxplots for, say, Groups A, B, C, D, and E. Okay, fine, but then the person wants the same thing split by sex. A BY statement puts each sex on a separate plot, but what the user wants is Group A Males next to Group A Females, then Group B Males next to Group B Females, and so on. Again, I'm afraid this simply isn't possible in SAS.
Any help on either of these two questions is greatly appreciated.
08-25-2015 03:25 PM
Here is the general format. The "(group)" part is just an attempt to put the Male and Female results side by side (which I'm discovering may be easier to do in SGPANEL), but it has no effect on the outliers. Without the WHISKERPERCENTILE option I'd get lines out of the top and bottom of the box going to the max and min. With WHISKERPERCENTILE, those lines go away, but there are still little circles marking the observations outside the box.
proc boxplot data=dataset;
   plot y * x (group) / whiskerpercentile=25;   /* plot options go after the slash */
run;
08-25-2015 03:58 PM
Have you tried the option boxstyle=schematic?
The grouping issue is easy: in a data step, create a new variable that combines your groups A to E with Sex, so you have "Group A Male", "Group A Female", etc. If you need all three boxes ("Group A" plus the two by sex), output each observation twice, once with and once without the sex:
data combined; set dataset; length GroupVar $ 20;
GroupVar = catx(' ', "Group", group, sex); output;
GroupVar = catx(' ', "Group", group); output; /* a separate "Group A" value, so you get 2 levels of grouping */
run;
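To see the combined variable in an actual plot, here is a self-contained sketch. The simulated data set HAVE, its Y variable, the data set name COMBINED, and the seed passed to CALL STREAMINIT are all made up for illustration; with real data you would start from your own data set. The values are generated in group-then-sex order, so DISCRETEORDER=DATA keeps each group's Female and Male boxes next to each other.

```sas
/* Hypothetical simulated data: 5 groups x 2 sexes x 20 obs */
data have;
   call streaminit(2015);
   length Sex $ 6;
   do Group = 'A', 'B', 'C', 'D', 'E';
      do Sex = 'Female', 'Male';
         do i = 1 to 20;
            y = rand('Normal');
            output;
         end;
      end;
   end;
run;

/* One category per group/sex combination, e.g. "Group A Female" */
data combined;
   set have;
   length GroupVar $ 20;
   GroupVar = catx(' ', 'Group', Group, Sex);
run;

proc sgplot data=combined;
   vbox y / category=GroupVar nooutliers;
   xaxis discreteorder=data;   /* keep A Female, A Male, B Female, ... order */
run;
```

Each combination is just another category here, so the ten boxes are evenly spaced; the PANELBY approach in PROC SGPANEL is the alternative when you want a visible break between groups.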
08-25-2015 03:58 PM
Use SGPLOT. Depending on your SAS version, you can set the whisker percentile with the WHISKERPCT= option. The side-by-side layout is easy. See this article: http://blogs.sas.com/content/iml/2012/08/22/categories-vs-groups-in-proc-sgplot.html
data have;
   do Group = 'A','B','C','D','E';
      do sex = 'm','f';
         do i = 1 to 10;
            x = rand("Normal"); output;
   end; end; end;
run;
proc sgplot data=have;
   vbox x / category=Group group=sex whiskerpct=25 nooutliers;
run;
If your release of PROC SGPLOT doesn't have the WHISKERPCT= option, the Graph Template Language (GTL) BOXPLOT statement has a WHISKERPERCENTILE= option:
BoxPlot X=Group Y=x / Group=sex whiskerpercentile=25 groupdisplay=cluster;
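A GTL statement only runs inside a graph template rendered by PROC SGRENDER. Here is a minimal sketch of the wrapping, assuming the simulated HAVE data from the SGPLOT example (recreated here so the block stands alone) and a template name, BOXCLUSTER, that is made up for illustration. With WHISKERPERCENTILE=25 the whiskers end at the box edges; this sketch does not try to suppress the outlier markers.

```sas
/* Same simulated data as the SGPLOT example */
data have;
   call streaminit(2015);
   do Group = 'A','B','C','D','E';
      do sex = 'm','f';
         do i = 1 to 10;
            x = rand("Normal");
            output;
         end;
      end;
   end;
run;

proc template;
   define statgraph boxcluster;   /* hypothetical template name */
      begingraph;
         layout overlay;
            /* whiskers stop at the 25th/75th percentiles, i.e. at the box edges */
            boxplot x=Group y=x / group=sex whiskerpercentile=25
                                  groupdisplay=cluster;
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=have template=boxcluster;
run;
```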
08-25-2015 07:11 PM
I am using Version 9.4, which I meant to mention earlier.
I just tried the code with the WHISKERPCT option in Proc SGPlot and I get an error.
In PROC BOXPLOT the whisker option works (I call it WHISKERPERCENTILE, but I assume WHISKERPCT would work just as well). That gets rid of the whiskers, but the points outside the box still show up.
In PROC SGPANEL there is a NOOUTLIERS option I can use to get rid of the outliers, but the WHISKERPERCENTILE option doesn't work there, so I can get rid of the dots but not the lines.
If one procedure accepted both the WHISKERPERCENTILE and NOOUTLIERS options in the same plot, I'd be all set, but I haven't found one yet.
The point about just changing the whiskers to the color of the background is pretty clever but I haven't investigated it to see if I can do it yet.
As far as the groups go, yes, instead of having Groups A, B, C, D and E and Sexes Male and Females I could just use a Data Step to make groups Male A, Female A, Male B, Female B, etc. I should have thought of that. That is one easy answer. But it will make 10 groups and treat them all alike if I do that. OTOH I'm fooling with Proc SGPanel now and using
proc sgpanel data=dataset;
   panelby group;            /* SGPANEL requires PANELBY: one cell per group */
   vbox y / category=sex;
run;
That gives me the 10 groups in 5 pairs: Male and Female for Group A, then a vertical separator line, then Male and Female for Group B, etc. So putting them into five Male/Female pairs works nicely too. But I still can't get only a box with the 25th and 75th percentiles as its top and bottom. I put NOOUTLIERS after CATEGORY=SEX, so that's nice, but I still have the darned whiskers that I can't get rid of.
08-25-2015 09:02 PM
Okay, I finally got it, thanks partly to the help of you all here, especially the part about making the whisker be the same color as the background, which worked and I've come to believe is the only way to do it. Here is my code, made generic.
proc sort data=dataset; by day2; run;
proc sgpanel data=dataset;
   panelby day / rows=1 columns=5 novarname sort=data;
   rowaxis label='Label' values=(0 to 20 by 5);
   colaxis label='Sex';
   vbox y / category=sex nooutliers
            whiskerattrs=(color=white);   /* whiskers drawn in the background color */
run;
Here is some explanation. Pretend the group variable is Day and takes the values Mon, Tue, Wed, Thu, and Fri. I made another variable named Day2 like this:
data dataset;
   set dataset;
   if Day = 'Mon' then Day2 = '1-Mon';
   else if Day = 'Tue' then Day2 = '2-Tue';
   else if Day = 'Wed' then Day2 = '3-Wed';   /* and likewise 4-Thu, 5-Fri */
run;
That is why I sorted by Day2 at the start. Then, in the PANELBY statement, SORT=DATA tells SGPANEL to order the panels as the values appear in the data set, which I just sorted by Day2. The panels are still defined by Day (Mon, Tue, etc.), but instead of the default alphabetical order, which would put Fri first, they follow the 1-Mon, 2-Tue, 3-Wed, ... order of Day2.
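The chain of IF statements that builds Day2 can also be written as a single assignment with the WHICHC function, which returns the position of Day in a list of values. Here is a minimal sketch, assuming Day holds exactly the five three-letter abbreviations; the data set name DAYS is hypothetical, and a value not in the list would get position 0 (for example "0-Sat").

```sas
/* Hypothetical data: one row per weekday */
data days;
   length Day $ 3 Day2 $ 5;
   do Day = 'Mon', 'Tue', 'Wed', 'Thu', 'Fri';
      /* WHICHC gives 1 for Mon, ..., 5 for Fri; CATS glues "1", "-", "Mon" */
      Day2 = cats(whichc(Day, 'Mon', 'Tue', 'Wed', 'Thu', 'Fri'), '-', Day);
      output;
   end;
run;
```

With real data you would replace the DO loop with a SET statement reading your own data set and keep the single Day2 assignment.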
At the top of each column of boxplots SGPANEL would normally print Day=Mon, Day=Tue, etc., but the NOVARNAME option tells it to print just Mon, Tue, etc.
I use ROWS=1 COLUMNS=5 to get them all on the same line. Without it, SGPANEL made two rows of three, with the last cell on the bottom row empty. Forcing them onto one row made the panels a little narrow, but it's okay.
I used VALUES=(0 to 20 by 5) because without it the y-axis goes all the way up to the highest value, and since there are some high outliers, all the boxplots get squished at the bottom.
You can omit the label= parts in ROWAXIS and COLAXIS and it will just use the variable name as the label instead.
The nooutliers option is needed because even though the whiskers are white, the outliers will still be there in black unless you tell them not to be via the nooutliers option.
And WHISKERATTRS=(color=white) is the coup de grâce that makes the whiskers white and thus invisible against the white background.
Thanks for the help. Hopefully you can use some of the tricks in the code above to make your life easier.