Hello,
I would like to create a panel of plots in which a vertical line spans the minimum and maximum values for every category, with a marker at the mean value. I can get most of the way there with the VBOX statement in PROC SGPANEL, but I cannot get rid of the boxes themselves. I would like to keep the full whiskers with their caps, plus the mean marker, for every category.
This is the last version of the code that I used to create the plots:
proc sgpanel data=dataset;
  panelby label / layout=rowlattice onepanel noheaderborder sort=data novarname
                  headerbackcolor=white headerattrs=(size=12 weight=bold) proportional;
  vbox ROC / category=tuners nofill whiskerpct=0 nomedian
             whiskerattrs=(color=darkgray)
             meanattrs=(color=darkgray size=5 symbol=x);
  rowaxis label="ROC Scores By Model" grid labelattrs=(size=12 weight=bold);
  colaxis display=(nolabel) discreteorder=data;
run;
Please provide example data if you want a working solution.
You don't mention anything about outliers. Are you just wanting to connect the high and low value within each category? Plus a marker for the Mean?
I might summarize the data and then superimpose a HIGHLOW plot and a SCATTER plot.
Here is a small example with SGPLOT. Your summary step would include the Panelby variable(s).
data example;
  do category = 'A', 'B', 'C';
    do i = 1 to 10;
      value = rand('integer', 100);
      output;
    end;
  end;
run;

proc summary data=example nway;
  class category;
  var value;
  output out=summary min=lowval max=highval mean=meanval;
run;

proc sgplot data=summary;
  highlow x=category low=lowval high=highval /
          highcap=serif lowcap=serif lineattrs=(color=darkgray);
  scatter x=category y=meanval /
          markerattrs=(color=darkgray size=5mm symbol=x);
run;
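For the paneled case, the note above about including the PANELBY variable in the summary step could be sketched roughly as follows. This is only an outline, not tested against the real data: the data set name dataset and the variables label, tuners, and ROC are taken from the original post, while panel_summary, lowval, highval, and meanval are illustrative names.

```sas
/* Sketch only. dataset, label, tuners, and ROC come from the original post;
   panel_summary, lowval, highval, and meanval are illustrative names. */
proc summary data=dataset nway;
  class label tuners;   /* panel variable plus the category variable */
  var ROC;
  output out=panel_summary min=lowval max=highval mean=meanval;
run;

proc sgpanel data=panel_summary noautolegend;
  panelby label / layout=rowlattice onepanel novarname;
  /* vertical line from the minimum to the maximum, with caps */
  highlow x=tuners low=lowval high=highval /
          highcap=serif lowcap=serif lineattrs=(color=darkgray);
  /* marker at the mean */
  scatter x=tuners y=meanval /
          markerattrs=(color=darkgray size=5 symbol=x);
run;
```

PROC SUMMARY writes the class levels in sorted order by default, so options such as sort=data and discreteorder=data will follow the order of the summary data set rather than the order of the raw data.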
This suggestion was very helpful, and I was able to create the charts that I had envisioned. However, for categories where the high and low values are either the same, or very close to each other, none of the lines show in the chart (see attachment). Is there a way to make sure the lines show (even if it looks like just one line in the chart) when the min and max values are either the same or nearly the same?
proc sgpanel data=data noautolegend;
  panelby label / layout=rowlattice onepanel noheaderborder sort=data novarname
                  headerbackcolor=white headerattrs=(size=12 weight=bold) proportional;
  highlow x=category low=min high=max /
          highcap=serif lowcap=serif lineattrs=(color=darkgray thickness=1);
  rowaxis label="ROC Scores By Model" grid labelattrs=(size=12 weight=bold);
  scatter x=category y=mean /
          markerattrs=(color=darkblue size=4 symbol=diamondfilled);
  colaxis display=(nolabel) discreteorder=data;
run;
It looks like the highlow plot won't draw caps for short lines. This restriction is stated in the docs:
Restriction
Caps are not displayed for very short bars. Bar height must be at least twice the size of the cap in order for the cap to appear.
Going back to your original boxplot idea, what about adding
boxwidth=0 lineattrs=(thickness=0)
to the options on your VBOX statement? For a simple SGPLOT, I think that gets rid of the box.
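A minimal sketch of that idea with SGPLOT, reusing the small example data set from earlier in the thread, might look like this. It has not been verified across SAS releases, and whether boxwidth=0 and a zero-thickness outline fully hide the box is the assumption being tried here, not a documented guarantee.

```sas
/* Sketch only: same toy data as the earlier example in this thread. */
data example;
  do category = 'A', 'B', 'C';
    do i = 1 to 10;
      value = rand('integer', 100);
      output;
    end;
  end;
run;

proc sgplot data=example;
  /* whiskerpct=0 extends the whiskers to the minimum and maximum.
     boxwidth=0 and lineattrs=(thickness=0) are the suggested additions
     intended to suppress the box itself (assumption, check the output). */
  vbox value / category=category nofill nomedian whiskerpct=0
               boxwidth=0 lineattrs=(thickness=0)
               whiskerattrs=(color=darkgray)
               meanattrs=(color=darkgray size=5 symbol=x);
run;
```

If the result looks right in SGPLOT, the same two options can be added to the VBOX statement in the original PROC SGPANEL step.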
@GuyTreepwood wrote:
This suggestion was very helpful, and I was able to create the charts that I had envisioned. However, for categories where the high and low values are either the same, or very close to each other, none of the lines show in the chart (see attachment). Is there a way to make sure the lines show (even if it looks like just one line in the chart) when the min and max values are either the same or nearly the same?
If that represents your data then there is so little range of values I doubt a box plot makes any sense to display on that scale.
Or did you extract one panel? Perhaps you don't actually want all of the rowaxis to be the same. Try adding UNISCALE=Column to the Panelby statement.
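As a rough sketch of that change, using the data set and variable names from the PROC SGPANEL step posted above (data, label, category, min, max, mean), the PANELBY statement would gain one option. The other options are trimmed here for brevity, and the effect on the real data is untested.

```sas
/* Sketch only: names are taken from the poster's PROC SGPANEL step. */
proc sgpanel data=data noautolegend;
  /* uniscale=column keeps a shared scale for the column axis only,
     so each row of the lattice gets its own row-axis range */
  panelby label / layout=rowlattice onepanel novarname uniscale=column;
  highlow x=category low=min high=max /
          highcap=serif lowcap=serif lineattrs=(color=darkgray thickness=1);
  scatter x=category y=mean /
          markerattrs=(color=darkblue size=4 symbol=diamondfilled);
  rowaxis grid;
  colaxis display=(nolabel);
run;
```

With independent row axes, a narrow min-to-max range fills more of its own panel, which may be enough for the caps to appear, at the cost of panels no longer being directly comparable.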
I do not want to change the y-axis scales in the panel, so I will keep the chart as it is, without the caps, and make a note of the restriction regarding the close caps found in the documentation.