turcay
Lapis Lazuli | Level 10

I have two data sets with two columns each, and in each data set only one of the columns contains data. One data set has 1250 rows and the other has 1580 rows. I would like to create a box-and-whisker plot like the one in the screenshot below. How can I do that with values that come from two different data sets? Which procedure should I use, for example PROC GPLOT or PROC BOXPLOT? Could I get some detailed guidance here?

 

Thank you.


15 REPLIES
RW9
Diamond | Level 26

I would suggest using PROC SGPLOT or the Graph Template Language (GTL), since from 9.x onward these supersede the older graphics procedures. In GTL you can create overlay graphs: create one graph and a second one, then overlay the second on the first, giving each a negative or positive offset as appropriate. You can also overlay line plots for medians and so on.

This blog has examples for anything you want to do with graphs:

http://blogs.sas.com/content/graphicallyspeaking/

 

And specifically for your problem:

http://blogs.sas.com/content/graphicallyspeaking/?s=box+and+whisker

turcay
Lapis Lazuli | Level 10

Hello RW9,

Your answer is helpful. But my main question is how to combine the values of two data sets for one box-and-whisker plot. I can create a single box-and-whisker plot with code similar to the following. I want to take two data sets and generate the two box-and-whisker plots that I posted in my first message.

 

proc boxplot data=Times;
   plot Delay*Day /
      boxstyle = schematicid
      nohlabel;
   id Reason;
   label Delay = 'Delay in Minutes';
run;

 

 

 

The blogs.sas.com site helped me with the code below. I would like to see a two-column summary that includes values like the ones it produces. Is it possible to do this?

 

proc means data=sashelp.cars(where=(type ne 'Hybrid')) noprint;
  class type origin;
  var mpg_city;
  output out=CarsMeanMileage
         mean=Mean
         median=Median
         q1=Q1
         q3=Q3
         p10=P10
         p90=P90;
run;

 

 

Thank you.

Rick_SAS
SAS Super FREQ

It sounds like you want to know how to combine variables from two data sets into a single data set that you can then analyze with PROC SGPLOT or PROC BOXPLOT. The following example shows how to combine two variables. The SET statement combines the data sets. The KEEP= option keeps only the variable names that you specify. The RENAME= option renames the variables so that they have the same name. Therefore, they will be vertically concatenated into a single column. The IN= option creates a binary indicator variable that you can use to distinguish observations from the first and second data sets.

 

 

data Combine;
set sashelp.class(keep=height rename=(height=y))
    sashelp.iris(keep=SepalLength rename=(SepalLength=y) in=inData2);
dataset = inData2;  /* binary categorical variable */
run;

proc sgplot data=Combine;
vbox y / category=dataset;
yaxis label="Length";
run;
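
One possible way to replace the 0 and 1 tick labels is to attach a user-defined format to the indicator variable. This is only a sketch: it assumes the Combine data set from the example above, and the format name dsfmt and the two label strings are made up for illustration.

```sas
/* Hypothetical format: maps the 0/1 indicator to readable labels */
proc format;
   value dsfmt 0 = "Class heights"
               1 = "Iris sepal lengths";
run;

proc sgplot data=Combine;
   format dataset dsfmt.;          /* category axis shows the formatted values */
   vbox y / category=dataset;
   xaxis display=(nolabel);        /* hide the axis label "dataset" */
   yaxis label="Length";
run;
```

Because the format is applied only inside PROC SGPLOT, the stored values in Combine stay 0 and 1.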
turcay
Lapis Lazuli | Level 10
Hello Rick,
 
I'm impressed, thank you for the detailed information. May I change the 0 and 1 labels in the graphic? I would also like to use both mean and median with the CONNECT= option. As I asked in my previous message, how can I create the table below by using PROC MEANS? (I also need Max-Q3, Q3-Median, Q1-Min, and Average.)
 
Thank you for your interest.
 
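proc means data=Combine;
  class dataset;   /* grouping variable created in the Combine data step */
  var y;           /* analysis variable in Combine */
  output out=CarsMeanMileage
         Mean=Mean
         Median=Median
         Q1=Q1
         Q3=Q3
         Min=Min
         Max=Max;
run;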
 
data.png
Reeza
Super User
In PROC BOXPLOT, look at the OUTBOX= data set to get a summary data set that's pretty close to what you want.

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_boxplot_sect...
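
As a rough sketch of that suggestion, assuming the Combine data set built earlier in the thread (BoxSummary is a made-up output name), the OUTBOX= option goes on the PLOT statement:

```sas
/* PROC BOXPLOT expects the data ordered by the group variable;
   Combine already holds all 0 rows followed by all 1 rows. */
proc boxplot data=Combine;
   plot y*dataset / boxstyle=schematic
                    outbox=BoxSummary;   /* hypothetical data set name */
run;

/* One row per group and statistic: _TYPE_ names it, _VALUE_ holds it */
proc print data=BoxSummary;
run;
```

In the OUTBOX= data set, _TYPE_ identifies each row (N, MIN, Q1, MEDIAN, MEAN, Q3, MAX, and so on). As far as I know, with a schematic box style the individual observations beyond the upper fence are written as rows with _TYPE_ equal to HIGH, so those rows are outliers and not summary statistics.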
turcay
Lapis Lazuli | Level 10

Thank you Reeza. It calculates correctly. As you can see in the picture, I also want to calculate the Q1-Min, Q3-Med, and Q3-Max differences and the Average. Also, there are some High values in the data set; what do these values mean? Lastly, as I asked in my previous message: "May I change the 0 and 1 labels in the graphic? I would also like to use both mean and median with the CONNECT= option." Do you know anything about that?

Ture.png

turcay
Lapis Lazuli | Level 10

Hello everyone,

 

Is it possible to create the image below, which I posted before?

 

 

 

Thank you.

Reeza
Super User
Yes, but you may not be able to do it all in one step. You can do the remaining metrics via a data step manual calculation.
Not ideal, but easily doable.
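
A minimal sketch of that two-step approach, assuming the Combine data set from earlier in the thread. The output data set name Stats and the difference variable names are made up for illustration.

```sas
/* Step 1: one row of summary statistics per data set */
proc means data=Combine noprint nway;
   class dataset;
   var y;
   output out=Stats(drop=_type_ _freq_)
          min=Min q1=Q1 median=Median q3=Q3 max=Max mean=Average;
run;

/* Step 2: compute the remaining metrics by hand */
data Stats;
   set Stats;
   Q1MinusMin    = Q1 - Min;
   MedianMinusQ1 = Median - Q1;
   Q3MinusMedian = Q3 - Median;
   MaxMinusQ3    = Max - Q3;
run;
```

The NWAY option keeps only the rows for the individual CLASS levels, so Stats has one row for each source data set and no overall row.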
turcay
Lapis Lazuli | Level 10

Okay, I will check it and try again, then I'll post whether I succeeded or not.

 

Thank you Reeza.

turcay
Lapis Lazuli | Level 10

 

Hello again,

 

I have now created the report I wanted to generate. As @Reeza said, I was not able to do it in one step; I tried many data steps. I would like to ask: could I use something other than PCTLPTS= to calculate Min, Q1, Median, Q3, and Max?

 

/* Percentiles for the first data set: 0=Min, 25=Q1, 50=Median, 75=Q3, 100=Max */
PROC UNIVARIATE DATA=Data1;
  VAR LGDTahminiUretimVerisi;
  OUTPUT OUT=Data1 PCTLPTS=100 75 50 25 0 PCTLPRE=Q_;
RUN;

/* Rename the percentiles and compute the differences ("Eksi" means "minus") */
DATA Data1;
  SET Data1 (RENAME=(Q_0=Min Q_25=Q1 Q_50=Median Q_75=Q3 Q_100=Max));
  Q1EksiMin=Q1-Min;
  Q3EksiMedian=Q3-Median;
  MaxEksiQ3=Max-Q3;
  MedianEksiQ1=Median-Q1;
RUN;

/* Turn the single row of statistics into one row per statistic */
PROC TRANSPOSE DATA=Data1
  OUT=Data1 (DROP=_label_)
  PREFIX=G1THK
  NAME=BoxWhisker;
  VAR Min Q1 Median Q3 Max Q1EksiMin Q3EksiMedian MaxEksiQ3 MedianEksiQ1;
RUN;

/* Same three steps for the second data set */
PROC UNIVARIATE DATA=Data2;
  VAR LGDBackTestVerisi;
  OUTPUT OUT=Data2 PCTLPTS=100 75 50 25 0 PCTLPRE=Q_;
RUN;

DATA Data2;
  SET Data2 (RENAME=(Q_0=Min Q_25=Q1 Q_50=Median Q_75=Q3 Q_100=Max));
  Q1EksiMin=Q1-Min;
  Q3EksiMedian=Q3-Median;
  MaxEksiQ3=Max-Q3;
  MedianEksiQ1=Median-Q1;
RUN;

PROC TRANSPOSE DATA=Data2
  OUT=Data2 (DROP=_label_)
  PREFIX=G2THK
  NAME=BoxWhisker;
  VAR Min Q1 Median Q3 Max Q1EksiMin Q3EksiMedian MaxEksiQ3 MedianEksiQ1;
RUN;

/* Put the two columns of statistics side by side, matched on the statistic name */
PROC SQL;
  CREATE TABLE BoxWhisker AS
  SELECT T.BoxWhisker,
         T.G1THK1,
         B.G2THK1
    FROM Data1 T
    INNER JOIN Data2 B
      ON (B.BoxWhisker = T.BoxWhisker);
QUIT;
 

 

Reeza
Super User
PROC MEANS with STACKODS option is useful...
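
A sketch of what that could look like, assuming the two source data sets have first been stacked into one data set such as the Combine example earlier in the thread. StackedStats is a made-up name for the captured table.

```sas
/* Capture the PROC MEANS summary table as a data set */
ods output Summary=StackedStats;   /* hypothetical data set name */

proc means data=Combine stackodsoutput min q1 median q3 max mean;
   class dataset;
   var y;
run;
```

STACKODSOUTPUT (alias STACKODS) makes the captured table resemble the printed one, with one row per CLASS level and one column per statistic, so Min, Q1, Median, Q3, and Max come out of a single step without PCTLPTS=.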
turcay
Lapis Lazuli | Level 10

Thank you @Reeza, you told me about it before, and I used it when I created the Descriptive Statistics reports. I'll check again.

 

Thanks again.

 

Can.

turcay
Lapis Lazuli | Level 10

My final question in this discussion: how can I use both mean and median with the CONNECT= option in PROC SGPLOT? Does anybody know how to do this?

 

Thank you.

 

Rick_SAS
SAS Super FREQ

All you have to do is overlay the same box plot twice.

The first time use CONNECT=MEDIAN and the second time use CONNECT=MEAN.

 

To prevent the second plot from obscuring the first, use NOFILL.

To prevent the second plot from having different colors, use NOCYCLEATTRS on the PROC SGPLOT statement.

Here's an example:

 

proc sgplot data=sashelp.cars nocycleattrs noautolegend;
vbox mpg_city / category=origin connect=median;
vbox mpg_city / category=origin connect=mean NOFILL NOOUTLIERS;
run;
