Programming the statistical procedures from SAS

Box and Whisker Calculation with Two Data Set

Accepted Solution Solved
Reply
Super Contributor
Posts: 381
Accepted Solution

Box and Whisker Calculation with Two Data Set

[ Edited ]

I have two data set which include two columns and only one of the columns contains data on one data set. One of the data set includes 1250 rows and the other one includes 1580 rows. I would like to create Box Whisker graphic like in the screenshot below. How can i do it with the value which comes from two different data sets.Which Procedure should i use it such as proc gplot or proc boxplot. Is it possible for me to take detailed information here?

 

Thank you.


Accepted Solutions
Solution
‎11-11-2015 06:14 PM
SAS Super FREQ
Posts: 3,306

Re: Box and Whisker Calculation with Two Data Set

It sounds like you want to know how to combine variables from two data sets into a single data set that you can then analyze with PROC SGPLOT or PROC BOXPLOT. The following example shows how to combine two variables. The SET statement combines the data sets. The KEEP= option keeps only the variable names that you specify. The RENAME= option renames the variables so that they have the same name. Therefor they will be vertically concatenated into a single column.  The IN= option creates a binary indicator variable that you can use to distinguish observations from the first and second data sets.

 

 

data Combine;
set sashelp.class(keep=height rename=(height=y))
    sashelp.iris(keep=SepalLength rename=(SepalLength=y) in=inData2);
dataset = inData2;  /* binary categorical variable */
run;

proc sgplot data=Combine;
vbox y / category=dataset;
yaxis label="Length";
run;

View solution in original post


All Replies
Esteemed Advisor
Esteemed Advisor
Posts: 6,977

Re: Box and Whisker Calculation with Two Data Set

I would suggest you use sgplot/Graph Template Language as since 9.x this replaces all the older procedures.  In graph template you can create overlay graphs, i.e. create on graph and a second one and overlay the second on the first.  If you give each a - or + offset in respective cases.  You can overlay line plots for medians etc.

This blog has examples for anything you want to do with graphs:

http://blogs.sas.com/content/graphicallyspeaking/

 

And specifically for your problem:

http://blogs.sas.com/content/graphicallyspeaking/?s=box+and+whisker

Super Contributor
Posts: 381

Re: Box and Whisker Calculation with Two Data Set

[ Edited ]

Hello RW9,

Your answer helpful for me. But my main question is how to merge two data sets values for one Box-Whisker procedure.I can create one Box-Whisker graphic by using the similar code as below. I want to take two data set and generate two Box Whisker graphic which i posted my first message.

 

proc boxplot data=Times;
   plot Delay*Day /
      boxstyle = schematicid
      nohlabel;
   id Reason;
   label Delay = 'Delay in Minutes';
run;

 

 

 

The blog.sas website helped me about the code as below. I would like to see two columns summary which include values like as below. Is this possible to do it ?

 

proc means data=sashelp.cars(where=(type ne 'Hybrid')) noprint;
  class type origin;
  var mpg_city;
  output out=CarsMeanMileage
         mean=Mean
         median=Median
         q1=Q1
         q3=Q3
         p10=P10
         p90=P90;
run;

 

 

Thank you.

Solution
‎11-11-2015 06:14 PM
SAS Super FREQ
Posts: 3,306

Re: Box and Whisker Calculation with Two Data Set

It sounds like you want to know how to combine variables from two data sets into a single data set that you can then analyze with PROC SGPLOT or PROC BOXPLOT. The following example shows how to combine two variables. The SET statement combines the data sets. The KEEP= option keeps only the variable names that you specify. The RENAME= option renames the variables so that they have the same name. Therefor they will be vertically concatenated into a single column.  The IN= option creates a binary indicator variable that you can use to distinguish observations from the first and second data sets.

 

 

data Combine;
set sashelp.class(keep=height rename=(height=y))
    sashelp.iris(keep=SepalLength rename=(SepalLength=y) in=inData2);
dataset = inData2;  /* binary categorical variable */
run;

proc sgplot data=Combine;
vbox y / category=dataset;
yaxis label="Length";
run;
Super Contributor
Posts: 381

Re: Box and Whisker Calculation with Two Data Set

[ Edited ]
Hello Rick,
 
I'm impressed, thank you for the detailed information.I want to ask you may i change the 0 and 1 label in the graphic. Also i would like to add both mean and median function after connect= statement. As i posted my previous message How it is possible to create following table as below by using proc means statement.(Also -> Max-Q3,Q3-Median,Q1-Min,Average)
 
Thank you for your interest.
 
proc means data=Combine;
class ;
var ;
 output out=CarsMeanMileage
         Mean=Mean
         Median=Median
         Q1=Q1
         Q3=Q3
         Min=Min;
  	 Max=Max;
  run
 
data.png
Grand Advisor
Posts: 16,875

Re: Box and Whisker Calculation with Two Data Set

In proc boxplot look at the OUTBOX= datasaet to get a summary dataset that's pretty close to what you want.

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_boxplot_sect...
Super Contributor
Posts: 381

Re: Box and Whisker Calculation with Two Data Set

Thank you Reeza. It calculates correctly. As you see in the picture I also want to calculate  Q1-Min, Q3-Med and Q3-Max and Average values. Also there are some High values on the data set what does these values mean ? Lastly, i asked my previous message "I want to ask you may i change the 0 and 1 as label in the graphic. Also i would like to add both mean and median function after connect= statement.". Have you got some knowledge about that ?

Ture.png

Super Contributor
Posts: 381

Re: Box and Whisker Calculation with Two Data Set

[ Edited ]

Hello everyone,

 

Is it possible to create the below image which i posted before ?

 

 

 

Thank you.

Grand Advisor
Posts: 16,875

Re: Box and Whisker Calculation with Two Data Set

Yes, but you may not be able to do it all in one step. You can do the remaining metrics via a data step manual calculation.
Not ideal, but easily doable.
Super Contributor
Posts: 381

Re: Box and Whisker Calculation with Two Data Set

Okay, i will check it and try it again then i'll post whether i succeed or not.

 

Thank you Reeza.

Super Contributor
Posts: 381

Re: Box and Whisker Calculation with Two Data Set

 

Hello again,

 

Actually i created my the report which i wanted to generate. As @Reeza said that i could not be able to do it one step. I tried many data steps. I would like to ask Could i use something else instead of PCTLPTS to calculate Min ,Q1 ,Median ,Q3 and Max ?

 

PROC UNIVARIATE DATA=Data1;
  VAR LGDTahminiUretimVerisi;
  output out=Data1 PCTLPTS=100 75 50 25 0 pctlpre=Q_;
RUN;
data Data1 ;
set Data1 (rename=(Q_0=Min Q_25=Q1 Q_50=Median Q_75=Q3 Q_100=Max ));
Q1EksiMin=Q1-Min;
Q3EksiMedian=Q3-Median;
MaxEksiQ3=Max-Q3;
MedianEksiQ1=Median-Q1;
run;

PROC TRANSPOSE DATA=Data1 
	OUT=Data1 
	(drop=_label_)
	PREFIX=G1THK
	NAME=BoxWhisker;
	VAR Min Q1 Median Q3 Max Q1EksiMin Q3EksiMedian MaxEksiQ3 MedianEksiQ1;
RUN; 
QUIT;

PROC UNIVARIATE DATA=Data2;
  VAR LGDBackTestVerisi;
  output out=Data2 PCTLPTS=100 75 50 25 0 pctlpre=Q_;
RUN;
data Data2 ;
set Data2 (rename=(Q_0=Min Q_25=Q1 Q_50=Median Q_75=Q3 Q_100=Max ));
Q1EksiMin=Q1-Min;
Q3EksiMedian=Q3-Median;
MaxEksiQ3=Max-Q3;
MedianEksiQ1=Median-Q1;
run;

PROC TRANSPOSE DATA=Data2
	OUT=Data2
	(drop=_label_)
	PREFIX=G2THK
	NAME=BoxWhisker;
	VAR Min Q1 Median Q3 Max Q1EksiMin Q3EksiMedian MaxEksiQ3 MedianEksiQ1;
RUN; 
QUIT;
PROC SQL;
   CREATE TABLE BoxWhisker AS 
   SELECT T.BoxWhisker, 
          T.G1THK1, 
          B.G2THK1
      FROM Data1 T
      INNER JOIN Data2 B 
	  ON (B.BoxWhisker = T.BoxWhisker);
QUIT;
 

 

Grand Advisor
Posts: 16,875

Re: Box and Whisker Calculation with Two Data Set

PROC MEANS with STACKODS option is useful...
Super Contributor
Posts: 381

Re: Box and Whisker Calculation with Two Data Set

Thank you @Reeza , you told me before and used it when i created to Descriptive Statistics reports. I'll check again.

 

Thanks again.

 

Can.

Super Contributor
Posts: 381

Re: Box and Whisker Calculation with Two Data Set

My final question on this discussion is How to add both mean and median function after connect= statement when we use PROC SGPLOT ? Are there anybody who have knowledge about it ?

 

Thank you.

 

SAS Super FREQ
Posts: 3,306

Re: Box and Whisker Calculation with Two Data Set

All you have to do is overlay the same box plot twice.

The first time use CONNECT=MEDIAN and the second time use CONNECT=MEAN.

 

To prevent the second plot from obscuring the first, use NOFILL.

To prevent the second plot from having different colors, use NOCYCLEATTRS on the PROC SGPLOT statement.

Here's an example:

 

proc sgplot data=sashelp.cars nocycleattrs noautolegend;
vbox mpg_city / category=origin connect=median;
vbox mpg_city / category=origin connect=mean NOFILL NOOUTLIERS;
run;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 539 views
  • 5 likes
  • 4 in conversation