turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Analytics
- /
- Stat Procs
- /
- Box and Whisker Calculation with Two Data Set

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-30-2015 06:49 AM - edited 12-03-2015 06:04 AM

I have two data set which include two columns and only one of the columns contains data on one data set. One of the data set includes 1250 rows and the other one includes 1580 rows. I would like to create Box Whisker graphic like in the screenshot below. How can i do it with the value which comes from two different data sets.Which Procedure should i use it such as proc gplot or proc boxplot. Is it possible for me to take detailed information here?

Thank you.

Accepted Solutions

Solution

11-11-2015
06:14 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to turcay

10-30-2015 08:38 AM

It sounds like you want to know how to combine variables from two data sets into a single data set that you can then analyze with PROC SGPLOT or PROC BOXPLOT. The following example shows how to combine two variables. The SET statement combines the data sets. The KEEP= option keeps only the variable names that you specify. The RENAME= option renames the variables so that they have the same name. Therefor they will be vertically concatenated into a single column. The IN= option creates a binary indicator variable that you can use to distinguish observations from the first and second data sets.

```
data Combine;
set sashelp.class(keep=height rename=(height=y))
sashelp.iris(keep=SepalLength rename=(SepalLength=y) in=inData2);
dataset = inData2; /* binary categorical variable */
run;
proc sgplot data=Combine;
vbox y / category=dataset;
yaxis label="Length";
run;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to turcay

10-30-2015 07:15 AM

I would suggest you use sgplot/Graph Template Language as since 9.x this replaces all the older procedures. In graph template you can create overlay graphs, i.e. create on graph and a second one and overlay the second on the first. If you give each a - or + offset in respective cases. You can overlay line plots for medians etc.

This blog has examples for anything you want to do with graphs:

http://blogs.sas.com/content/graphicallyspeaking/

And specifically for your problem:

http://blogs.sas.com/content/graphicallyspeaking/?s=box+and+whisker

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-30-2015 08:02 AM - edited 11-11-2015 06:16 PM

Hello RW9,

Your answer helpful for me. But my main question is how to merge two data sets values for one Box-Whisker procedure.I can create one Box-Whisker graphic by using the similar code as below. I want to take two data set and generate two Box Whisker graphic which i posted my first message.

proc boxplot data=Times; plot Delay*Day / boxstyle = schematicid nohlabel; id Reason; label Delay = 'Delay in Minutes'; run;

The blog.sas website helped me about the code as below. I would like to see two columns summary which include values like as below. Is this possible to do it ?

proc means data=sashelp.cars(where=(type ne 'Hybrid')) noprint; class type origin; var mpg_city; output out=CarsMeanMileage mean=Mean median=Median q1=Q1 q3=Q3 p10=P10 p90=P90; run;

Thank you.

Solution

11-11-2015
06:14 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to turcay

10-30-2015 08:38 AM

It sounds like you want to know how to combine variables from two data sets into a single data set that you can then analyze with PROC SGPLOT or PROC BOXPLOT. The following example shows how to combine two variables. The SET statement combines the data sets. The KEEP= option keeps only the variable names that you specify. The RENAME= option renames the variables so that they have the same name. Therefor they will be vertically concatenated into a single column. The IN= option creates a binary indicator variable that you can use to distinguish observations from the first and second data sets.

```
data Combine;
set sashelp.class(keep=height rename=(height=y))
sashelp.iris(keep=SepalLength rename=(SepalLength=y) in=inData2);
dataset = inData2; /* binary categorical variable */
run;
proc sgplot data=Combine;
vbox y / category=dataset;
yaxis label="Length";
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Rick_SAS

10-30-2015 09:28 AM - edited 11-11-2015 06:16 PM

Hello Rick,

I'm impressed, thank you for the detailed information.I want to ask you may i change the 0 and 1 label in the graphic. Also i would like to add both mean and median function after connect= statement. As i posted my previous message How it is possible to create following table as below by using proc means statement.(Also -> Max-Q3,Q3-Median,Q1-Min,Average)

Thank you for your interest.

```
proc means data=Combine;
class ;
var ;
output out=CarsMeanMileage
Mean=Mean
Median=Median
Q1=Q1
Q3=Q3
Min=Min;
Max=Max;
run
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to turcay

10-30-2015 10:41 AM

In proc boxplot look at the OUTBOX= datasaet to get a summary dataset that's pretty close to what you want.

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_boxplot_sect...

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_boxplot_sect...

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

10-30-2015 07:49 PM

Thank you Reeza. It calculates correctly. As you see in the picture I also want to calculate Q1-Min, Q3-Med and Q3-Max and Average values. Also there are some High values on the data set what does these values mean ? Lastly, i asked my previous message "I want to ask you may i change the 0 and 1 as label in the graphic. Also i would like to add both mean and median function after connect= statement.". Have you got some knowledge about that ?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to turcay

11-03-2015 03:05 AM - edited 12-03-2015 06:04 AM

Hello everyone,

Is it possible to create the below image which i posted before ?

Thank you.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to turcay

11-03-2015 10:23 AM

Yes, but you may not be able to do it all in one step. You can do the remaining metrics via a data step manual calculation.

Not ideal, but easily doable.

Not ideal, but easily doable.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

11-03-2015 10:49 AM

Okay, i will check it and try it again then i'll post whether i succeed or not.

Thank you Reeza.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to turcay

11-05-2015 05:24 PM

Hello again,

Actually i created my the report which i wanted to generate. As @Reeza said that i could not be able to do it one step. I tried many data steps. I would like to ask Could i use something else instead of PCTLPTS to calculate Min ,Q1 ,Median ,Q3 and Max ?

```
PROC UNIVARIATE DATA=Data1;
VAR LGDTahminiUretimVerisi;
output out=Data1 PCTLPTS=100 75 50 25 0 pctlpre=Q_;
RUN;
data Data1 ;
set Data1 (rename=(Q_0=Min Q_25=Q1 Q_50=Median Q_75=Q3 Q_100=Max ));
Q1EksiMin=Q1-Min;
Q3EksiMedian=Q3-Median;
MaxEksiQ3=Max-Q3;
MedianEksiQ1=Median-Q1;
run;
PROC TRANSPOSE DATA=Data1
OUT=Data1
(drop=_label_)
PREFIX=G1THK
NAME=BoxWhisker;
VAR Min Q1 Median Q3 Max Q1EksiMin Q3EksiMedian MaxEksiQ3 MedianEksiQ1;
RUN;
QUIT;
PROC UNIVARIATE DATA=Data2;
VAR LGDBackTestVerisi;
output out=Data2 PCTLPTS=100 75 50 25 0 pctlpre=Q_;
RUN;
data Data2 ;
set Data2 (rename=(Q_0=Min Q_25=Q1 Q_50=Median Q_75=Q3 Q_100=Max ));
Q1EksiMin=Q1-Min;
Q3EksiMedian=Q3-Median;
MaxEksiQ3=Max-Q3;
MedianEksiQ1=Median-Q1;
run;
PROC TRANSPOSE DATA=Data2
OUT=Data2
(drop=_label_)
PREFIX=G2THK
NAME=BoxWhisker;
VAR Min Q1 Median Q3 Max Q1EksiMin Q3EksiMedian MaxEksiQ3 MedianEksiQ1;
RUN;
QUIT;
PROC SQL;
CREATE TABLE BoxWhisker AS
SELECT T.BoxWhisker,
T.G1THK1,
B.G2THK1
FROM Data1 T
INNER JOIN Data2 B
ON (B.BoxWhisker = T.BoxWhisker);
QUIT;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to turcay

11-05-2015 05:33 PM

PROC MEANS with STACKODS option is useful...

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

11-05-2015 05:36 PM

Thank you @Reeza , you told me before and used it when i created to Descriptive Statistics reports. I'll check again.

Thanks again.

Can.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to turcay

11-10-2015 11:12 AM

My final question on this discussion is How to add both mean and median function after connect= statement when we use PROC SGPLOT ? Are there anybody who have knowledge about it ?

Thank you.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to turcay

11-10-2015 11:37 AM

All you have to do is overlay the same box plot twice.

The first time use CONNECT=MEDIAN and the second time use CONNECT=MEAN.

To prevent the second plot from obscuring the first, use NOFILL.

To prevent the second plot from having different colors, use NOCYCLEATTRS on the PROC SGPLOT statement.

Here's an example:

```
proc sgplot data=sashelp.cars nocycleattrs noautolegend;
vbox mpg_city / category=origin connect=median;
vbox mpg_city / category=origin connect=mean NOFILL NOOUTLIERS;
run;
```