Hello,
I have three variables (columns), and I was asked to create box plots for all three in the same figure. I used the code below, but because the ranges of the three variables differ so much, the box plots for Var1 and Var3 are not clear.
data long;
set have;
Variable = "Var1"; Value = Var1; output;
Variable = "Var2"; Value = Var2; output;
Variable = "Var3"; Value = Var3; output;
drop Var1 Var2 Var3;
run;
proc sgplot data=long noborder;
vbox Value / category=Variable;
run;
Could you please help me fix that?
Thank you,
You could use PROC SGPANEL and have the box plots as rows. See this example:
data long;
set sashelp.cars;
length variable $ 32;
Variable = "msrp"; Value = msrp; output;
Variable = "horsepower"; Value = horsepower; output;
Variable = "mpg_city"; Value = mpg_city; output;
keep variable value;
run;
proc sgpanel data=long ;
panelby variable / uniscale=column columns=1 ;
vbox value;
run;
This produces one box plot per row, each with its own value-axis scale.
In general, the solution to disparate scales is to change the scale of the Y axis. One way to do this is to use a logarithmic scale. This will fail if some of the data values are actually zero, and I also don't know whether it will work for VBOX because I have never tried it.
There are many options in PROC SGPLOT for creating logarithmic scales. For example:
proc sgplot data=long noborder;
vbox Value / category=Variable;
yaxis type=log logbase=10 logstyle=logexpand;
run;
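If zeros (or negative values) are what stops the log axis from working, one possible workaround is to filter them out before plotting. This is only a sketch: it assumes the LONG data set built in the question, and dropping observations changes what each box summarizes, so check whether that is acceptable for your data.

```sas
/* Sketch only: plot on a log axis after excluding non-positive values. */
/* Assumes the LONG data set (Variable, Value) created in the question. */
/* Excluded rows no longer contribute to the box statistics.            */
proc sgplot data=long(where=(Value > 0)) noborder;
vbox Value / category=Variable;
yaxis type=log logbase=10;
run;
```

If many values are excluded this way, the SGPANEL approach with separate axes is probably the safer choice.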
You can also suppress the outliers to make your graph easier to read:
proc sgplot data=sashelp.heart;
vbox weight / category=bp_status nooutliers;
run;