I have 5 treatment groups of 10 subjects each.
I want to show two pieces of information on one graph:
- the mean of each group (as a bar chart)
- a scatter plot of the 10 subjects in each group.
I've tried to do this with PROC GBARLINE, but I cannot add the scatter plot.
Here's an idea you might like to try. It adds statistical richness that you won't get from a bar showing the mean. In fact, a bar showing the mean masks all the data below the mean and adds a lot of ink to the graph for a single number (the mean). The proposal here, a box plot, does not show the individual data points, but it does show the distribution along with some quantile information. I imagine the individual data points could be added with another statement or two.
I'm not too keen on a box plot for 10 data points; it tries to convey too much information. I'd suggest a scatter plot in which you plot the mean with a different symbol from the data points.
You can do this by creating a new data set that contains the original data plus another column holding the mean of each group. The easiest way is the MEAN aggregate function in PROC SQL, but you could also use PROC MEANS, output the 5 group records, and MERGE them back onto the original data by group. Then you would use PROC GPLOT, and the PLOT statement would include the /OVERLAY option, something like
PLOT raw*group mean*group/OVERLAY;
You will need to adjust your SYMBOL statements so that the means and the raw values get different symbols.
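A minimal sketch of this overlay approach, assuming the data are in work.test with the group variable trt and the response variable value (names borrowed from the sample code later in this thread; work.plotdata and y_mean are hypothetical). SYMBOL1 and SYMBOL2 are applied in the order of the plot requests on the PLOT statement.

```sas
/* Sketch only: assumes work.test with trt (group) and value (response). */
/* work.plotdata and y_mean are hypothetical names.                      */
PROC SQL ;
  /* Remerge: every subject row also gets its own group's mean */
  CREATE TABLE work.plotdata AS
    SELECT *, MEAN(value) AS y_mean
      FROM work.test
      GROUP BY trt ;
QUIT ;

/* SYMBOL1 -> first plot request (raw values), SYMBOL2 -> second (means) */
SYMBOL1 VALUE = CIRCLE INTERPOL = NONE COLOR = BLACK ;
SYMBOL2 VALUE = STAR HEIGHT = 2 INTERPOL = NONE COLOR = RED ;

PROC GPLOT DATA = work.plotdata ;
  PLOT value*trt y_mean*trt / OVERLAY ;
RUN ; QUIT ;
```

If you prefer PROC MEANS, write the group means to an output data set and MERGE it back BY trt after sorting both data sets; the plotting step stays the same.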
The downside of this method is that subjects in the same group with duplicate values of 'raw' are plotted on top of each other, so some points disappear. "Jittering" adds a small random offset to the data so that all the points remain visible. I wrote a macro for that some years ago (before SUGI proceedings were online). You can find the Mayo implementation by googling .
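This is not the Mayo macro, only a sketch of the jittering idea: each point's horizontal position gets a small uniform random offset. It assumes trt is numeric (1 to 5) in work.test; work.jitter, x_jit, the seed, and the 0.15 half-width are hypothetical choices.

```sas
/* Jitter sketch (not the Mayo macro). Assumes numeric trt in work.test; */
/* x_jit, the seed 12345 and the 0.15 half-width are hypothetical.       */
DATA work.jitter ;
  SET work.test ;
  /* RANUNI is in (0,1), so the offset lies in (-0.15, 0.15) */
  x_jit = trt + 0.15 * (2 * RANUNI(12345) - 1) ;
RUN ;

SYMBOL1 VALUE = CIRCLE INTERPOL = NONE COLOR = BLACK ;

PROC GPLOT DATA = work.jitter ;
  PLOT value*x_jit ;
RUN ; QUIT ;
```

A narrow half-width keeps each group's points visually clustered around its integer position while separating tied values.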
I've had a similar question before, and there are two ways to solve it:
1) using proc GCHART for the bar chart and adding an ANNOTATE dataset for the scatter plot
2) using a graph template written in GTL, the Graph Template Language (experimental in 9.1.3, production with a slightly different syntax in 9.2).
Sample code for both is below.
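The samples read a data set named work.test with a treatment variable trt and a response variable value, but the thread never creates it. A minimal sketch of hypothetical test data matching the question (5 groups of 10 subjects), with made-up normally distributed values:

```sas
/* Hypothetical test data: 5 treatment groups x 10 subjects.           */
/* work.test, trt and value match the %LET statements in the samples;  */
/* subject and the generated values are made up for illustration.      */
DATA work.test ;
  DO trt = 1 TO 5 ;
    DO subject = 1 TO 10 ;
      /* group effect of 2 units per level plus standard normal noise */
      value = 10 + 2 * trt + RANNOR(20090101) ;
      OUTPUT ;
    END ;
  END ;
RUN ;
```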
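%LET table = work.test ;   /* input data set    */
%LET group = trt ;         /* grouping variable */
%LET y = value ;           /* response variable */

/* Annotate data set: one PLUS symbol per subject, in data coordinates */
DATA work.anno (DROP = &group &y) ;
  SET &table ;
  xsys = "2" ; ysys = "2" ;   /* "2" = data coordinate system      */
  when = "A" ;                /* draw after the chart              */
  function = "SYMBOL" ; text = "PLUS" ;
  xc = STRIP(&group) ;        /* group value on the horizontal axis */
  y = &y ;                    /* raw value on the response axis     */
RUN ;

/* Bars show the group means; the annotate data adds the raw values */
PROC GCHART DATA = &table ;
  VBAR &group / DISCRETE TYPE = MEAN SUMVAR = &y ANNOTATE = work.anno ;
RUN ; QUIT ;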
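Note that you may have to define an AXIS statement so that all values are shown: the response axis is scaled to the bar heights (the means), so individual values above the largest mean can fall outside it. My suggestion is to collect the max and min of your response variable in macro variables and use them in an AXIS definition.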
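A sketch of that suggestion, assuming the macro variables &table, &group and &y and the work.anno data set from the code above; ymin, ymax, ylow and ystep are hypothetical names. Keeping 0 on the axis and splitting the range into 5 intervals are design choices, not requirements.

```sas
/* Assumes &table, &group, &y and work.anno from the GCHART example. */
/* ymin, ymax, ylow and ystep are hypothetical macro variable names.  */
PROC SQL NOPRINT ;
  SELECT FLOOR(MIN(&y)), CEIL(MAX(&y))
    INTO :ymin, :ymax
    FROM &table ;
QUIT ;

/* Design choice: keep 0 on the axis, then split the range into 5 steps */
%LET ylow  = %SYSFUNC(MIN(0, &ymin)) ;
%LET ystep = %SYSEVALF((&ymax - &ylow) / 5) ;

/* Response axis covering every raw value, not just the bar heights */
AXIS1 ORDER = (&ylow TO &ymax BY &ystep) ;

PROC GCHART DATA = &table ;
  VBAR &group / DISCRETE TYPE = MEAN SUMVAR = &y
                ANNOTATE = work.anno RAXIS = AXIS1 ;
RUN ; QUIT ;
```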
2nd method, using GTL
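PROC TEMPLATE ;
  DEFINE STATGRAPH barScatter ;
    DYNAMIC group y mean ;     /* variable names supplied at run time */
    BEGINGRAPH ;               /* BEGINGRAPH/ENDGRAPH: 9.2+ syntax    */
      LAYOUT OVERLAY ;
        BARCHARTPARM X = group Y = mean ;   /* one bar per group at its mean */
        SCATTERPLOT X = group Y = y ;       /* raw values (SCATTER in the 9.1.3 version) */
      ENDLAYOUT ;
    ENDGRAPH ;
  END ;
RUN ;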
PROC SQL ;
  /* Add the group mean to every row (SAS remerges the summary statistic) */
  CREATE TABLE work.data AS
    SELECT *, MEAN(&y) AS y_mean
      FROM &table
      GROUP BY &group ;
QUIT ;
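ODS HTML GPATH = "c:\temp" ;   /* folder for the image files */
DATA _NULL_ ;
  SET work.data ;
  /* Bind the template and resolve its dynamics to the actual variables */
  FILE PRINT ODS = (TEMPLATE = "barScatter"
                    DYNAMIC = (group = "&group" y = "&y" mean = "y_mean")) ;
  PUT _ODS_ ;
RUN ;
ODS HTML CLOSE ;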
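In SAS 9.2 and later, PROC SGRENDER can render a STATGRAPH template directly, which avoids the DATA _NULL_ step. This sketch assumes the barScatter template has been compiled with 9.2 syntax and that work.data and the macro variables from the steps above exist.

```sas
/* SAS 9.2+ sketch: render barScatter with PROC SGRENDER.          */
/* Assumes the template, work.data, &group and &y from above; the  */
/* DYNAMIC names must match the template's DYNAMIC statement.      */
ODS HTML GPATH = "c:\temp" ;
PROC SGRENDER DATA = work.data TEMPLATE = barScatter ;
  DYNAMIC group = "&group" y = "&y" mean = "y_mean" ;
RUN ;
ODS HTML CLOSE ;
```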
Good luck!