I have two data sets with two columns each; in each data set only one of the columns contains data. One data set has 1250 rows and the other has 1580 rows. I would like to create a box-and-whisker plot like the one in the screenshot below. How can I do this with values that come from two different data sets? Which procedure should I use, for example PROC GPLOT or PROC BOXPLOT? Could I get detailed information here?
Thank you.
It sounds like you want to know how to combine variables from two data sets into a single data set that you can then analyze with PROC SGPLOT or PROC BOXPLOT. The following example shows how to combine two variables. The SET statement combines the data sets. The KEEP= option keeps only the variable names that you specify. The RENAME= option renames the variables so that they have the same name. Therefore, they will be vertically concatenated into a single column. The IN= option creates a binary indicator variable that you can use to distinguish observations from the first and second data sets.
data Combine;
set sashelp.class(keep=height rename=(height=y))
sashelp.iris(keep=SepalLength rename=(SepalLength=y) in=inData2);
dataset = inData2; /* binary categorical variable */
run;
proc sgplot data=Combine;
vbox y / category=dataset;
yaxis label="Length";
run;
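If you would rather see descriptive labels than 0 and 1 on the category axis, one option is to attach a user-defined format to the indicator variable. The following is a minimal sketch, not the only way to do it. The format name dsfmt and the label text are assumptions chosen for illustration.

```sas
/* Hypothetical format that maps the 0/1 indicator to readable labels */
proc format;
  value dsfmt 0 = "Class heights"
              1 = "Iris sepal lengths";
run;

data Combine;
  set sashelp.class(keep=height rename=(height=y))
      sashelp.iris(keep=SepalLength rename=(SepalLength=y) in=inData2);
  dataset = inData2;        /* 0 = first data set, 1 = second data set */
  format dataset dsfmt.;    /* the plot shows the formatted values */
run;

proc sgplot data=Combine;
  vbox y / category=dataset;
  xaxis label="Source data set";
  yaxis label="Length";
run;
```

Because the format only changes how the values are displayed, the underlying variable is still 0/1 and can be used unchanged in CLASS or WHERE statements.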
I would suggest you use SGPLOT or the Graph Template Language (GTL), since from 9.x onward these replace the older graphics procedures. In GTL you can create overlay graphs, that is, create one graph and a second one and overlay the second on the first, giving each a negative or positive offset as appropriate. You can also overlay line plots for medians, etc.
This blog has examples for anything you want to do with graphs:
http://blogs.sas.com/content/graphicallyspeaking/
And specifically for your problem:
http://blogs.sas.com/content/graphicallyspeaking/?s=box+and+whisker
Hello RW9,
Your answer was helpful. But my main question is how to merge the values of two data sets for one box-and-whisker procedure. I can create one box-and-whisker plot with code similar to the following. I want to take two data sets and generate the two box-and-whisker plots that I posted in my first message.
proc boxplot data=Times;
plot Delay*Day / boxstyle = schematicid nohlabel;
id Reason;
label Delay = 'Delay in Minutes';
run;
The SAS blog helped me with the code below. I would like to see a two-column summary that includes values like the ones below. Is this possible?
proc means data=sashelp.cars(where=(type ne 'Hybrid')) noprint;
class type origin;
var mpg_city;
output out=CarsMeanMileage mean=Mean median=Median q1=Q1 q3=Q3 p10=P10 p90=P90;
run;
Thank you.
proc means data=Combine;
class dataset; /* group by the source data set indicator */
var y; /* the combined analysis variable */
output out=CarsMeanMileage
Mean=Mean
Median=Median
Q1=Q1
Q3=Q3
Min=Min
Max=Max;
run;
Thank you Reeza. It calculates correctly. As you can see in the picture, I also want to calculate the Q1-Min, Q3-Med, Q3-Max, and Average values. Also, there are some High values in the data set; what do these values mean? Lastly, as I asked in my previous message: can I change the 0 and 1 labels in the graph? I would also like to use both the mean and the median with the connect= option. Do you know how to do that?
Hello everyone,
Is it possible to create the image below, which I posted before?
Thank you.
Okay, I will check it and try again, then I'll post whether I succeeded or not.
Thank you Reeza.
Hello again,
Actually, I have created the report that I wanted to generate. As @Reeza said, I was not able to do it in one step; I needed several DATA steps. I would like to ask: could I use something other than PCTLPTS to calculate Min, Q1, Median, Q3, and Max?
PROC UNIVARIATE DATA=Data1;
VAR LGDTahminiUretimVerisi;
output out=Data1 PCTLPTS=100 75 50 25 0 pctlpre=Q_;
RUN;
data Data1 ;
set Data1 (rename=(Q_0=Min Q_25=Q1 Q_50=Median Q_75=Q3 Q_100=Max ));
Q1EksiMin=Q1-Min;
Q3EksiMedian=Q3-Median;
MaxEksiQ3=Max-Q3;
MedianEksiQ1=Median-Q1;
run;
PROC TRANSPOSE DATA=Data1
OUT=Data1
(drop=_label_)
PREFIX=G1THK
NAME=BoxWhisker;
VAR Min Q1 Median Q3 Max Q1EksiMin Q3EksiMedian MaxEksiQ3 MedianEksiQ1;
RUN;
QUIT;
PROC UNIVARIATE DATA=Data2;
VAR LGDBackTestVerisi;
output out=Data2 PCTLPTS=100 75 50 25 0 pctlpre=Q_;
RUN;
data Data2 ;
set Data2 (rename=(Q_0=Min Q_25=Q1 Q_50=Median Q_75=Q3 Q_100=Max ));
Q1EksiMin=Q1-Min;
Q3EksiMedian=Q3-Median;
MaxEksiQ3=Max-Q3;
MedianEksiQ1=Median-Q1;
run;
PROC TRANSPOSE DATA=Data2
OUT=Data2
(drop=_label_)
PREFIX=G2THK
NAME=BoxWhisker;
VAR Min Q1 Median Q3 Max Q1EksiMin Q3EksiMedian MaxEksiQ3 MedianEksiQ1;
RUN;
QUIT;
PROC SQL;
CREATE TABLE BoxWhisker AS
SELECT T.BoxWhisker,
T.G1THK1,
B.G2THK1
FROM Data1 T
INNER JOIN Data2 B
ON (B.BoxWhisker = T.BoxWhisker);
QUIT;
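One possible alternative to PCTLPTS is to request the statistics by keyword in PROC MEANS, which names the output columns directly and handles both groups in one pass through a CLASS variable. This is only a sketch under stated assumptions: it uses the Combine data set and the variables y and dataset from the earlier example in this thread rather than Data1 and Data2, and the output name BoxStats and the difference variable names are hypothetical.

```sas
/* One row of statistics per value of the indicator variable */
proc means data=Combine noprint nway;
  class dataset;
  var y;
  output out=BoxStats(drop=_type_ _freq_)
         min=Min q1=Q1 median=Median q3=Q3 max=Max mean=Mean;
run;

/* Differences between adjacent box plot statistics */
data BoxStats;
  set BoxStats;
  Q1MinusMin    = Q1 - Min;
  MedianMinusQ1 = Median - Q1;
  Q3MinusMedian = Q3 - Median;
  MaxMinusQ3    = Max - Q3;
run;
```

With this layout there is no need to rename the Q_ variables afterwards, and a single PROC TRANSPOSE with an ID statement on dataset could then turn the rows into side-by-side columns if that report shape is required.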
Thank you @Reeza. You told me about this before, and I used it when I created the descriptive statistics reports. I'll check again.
Thanks again.
Can.
My final question in this discussion: how can I use both the mean and the median with the connect= option in PROC SGPLOT? Does anybody know how to do this?
Thank you.
All you have to do is overlay the same box plot twice.
The first time use CONNECT=MEDIAN and the second time use CONNECT=MEAN.
To prevent the second plot from obscuring the first, use NOFILL.
To prevent the second plot from having different colors, use NOCYCLEATTRS on the PROC SGPLOT statement.
Here's an example:
proc sgplot data=sashelp.cars nocycleattrs noautolegend;
vbox mpg_city / category=origin connect=median;
vbox mpg_city / category=origin connect=mean NOFILL NOOUTLIERS;
run;