
How to create a butterfly line chart in SAS Visual Analytics

Started 09-10-2018
Modified 09-10-2018

Butterfly charts are one of my favorite report objects in SAS Visual Analytics.  I really like how they can compare a pair of measures side by side.  Quite often, I try to use butterfly plots in new ways. 

 

Problem: We want to compare more than two pairs of values on a butterfly plot.

 

Classic butterfly charts are essentially two bar charts side by side.  Because of this, you can really only fit (at best) two pairs of measures on the plot for comparison.  One example is a previous post of mine, where I used the SAS Visual Analytics Graph Builder to create a butterfly chart that compared two pairs of measures:

 

  • New Cases (Male) vs. New Cases (Female)
  • Deaths (Male) vs. Deaths (Female)

However, if you want to compare MORE than two pairs of measures, the classic bar-based butterfly chart will not work because the graph runs out of space.  You can only fit so many bars at each position on the y axis.  The solution?  Create a butterfly chart, but use lines instead of bars.

 

To demonstrate this, I'm going to use data I retrieved from the Bureau of Transportation Statistics' Airline On-Time Performance Data.  It covers four years (2014-2017) of departure and arrival delays of at least 15 minutes in the month of September.  I would like to compare the total number of departure and arrival delays by hour of day.

 

Hence, I want to compare the following pairs of measures:

 

  • 2014 Arrivals vs. 2014 Departures
  • 2015 Arrivals vs. 2015 Departures
  • 2016 Arrivals vs. 2016 Departures
  • 2017 Arrivals vs. 2017 Departures

Wow!  A total of four pairs of measures.  How can we create a butterfly chart to visualize this?

 

Solution: Create a butterfly chart using lines!

 

 07.png

 

Pretty cool!  We now have the benefit of using a butterfly chart to compare pairs of measures side by side, and we can compare as many of these pairs as we want!  We can easily see how the number of delays has decreased over the four years.

 

So the question is, how did I do this?  I'll show you!

 

Step 1: Build the graph

 

SAS Visual Analytics 8.3 brings with it the return of the SAS® Graph Builder.  This is great news for report developers who used the SAS® Graph Builder in the Visual Analytics 7.x series.  To create the graph above, I built a custom graph in Visual Analytics 8.3.  The graph consists of two vector plots side by side.  These are pictured below, one in orange and one in purple.

 

02.png

 

For the Delayed Arrivals (Orange) Vector Plot, the roles are named:

 

  • X = Delayed Arrivals
  • X Origin = Delayed Arrivals Origin

For the Delayed Departures (Purple) Vector Plot the roles are named:

 

  • X = Delayed Departures
  • X Origin = Delayed Departures Origin

There are three Shared Data Roles:

 

  • Y = Hour of Day
  • Y Origin = Hour of Day Origin
  • Color = Year

These roles are shown after being constructed in the Graph Builder.

 

03.png

 

Some modifications have also been made to the formatting of both the x and y axes.  You can explore these by importing the attached JSON file and looking in the "Options panel" in the Graph Builder.

 

Step 2: Prepare the data

 

Now that our graph is built, we need to make sure our data will work with the vector plot.  Vector plots can 'draw' pretty much anything because they let the report designer define where each vector line starts and ends.  Because of this, the report designer needs to create the origin values for the vector plots from the source data.  Let's take a look at a few lines of the source data:

 

04.png

 

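Above we can see the original data we have for our plot.  From this we could create a simple line plot.  However, a vector plot draws each line segment from an origin point to an end point, so every row also needs the values from the previous hour as its origin.  When it is called on every observation, the lag function returns the value from the previous observation, which is exactly what we need.  So we submit the SAS code below to create the origin values for Hour_of_Day, Departure_Delays and Arrival_Delays.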

 

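data create_origins;
  set source_data;
  by year;   /* source_data must already be sorted by year, and by hour within year */

  /* lag() returns the value from the previous observation */
  Hour_of_Day_Origin=lag(Hour_of_Day);
  /* the first row of a year has no previous point, so clear its origin */
  if first.year then Hour_of_Day_Origin=.;
 
  Arrival_Delays_Origin=lag(Arrival_Delays);
  if first.year then Arrival_Delays_Origin=.;
 
  Departure_Delays_Origin=lag(Departure_Delays);
  if first.year then Departure_Delays_Origin=.;

run;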
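
The BY statement in the step above assumes that source_data is already sorted by year, and the lag values are only meaningful if the hours are in order within each year.  If the table is not already in that order, a sort along these lines would need to run before that step.  This is a minimal sketch using the variable names from the code above:

```sas
/* Sort by year, then by hour within each year, so that           */
/* BY year is valid and lag() picks up the previous hour's value. */
proc sort data=source_data;
  by year hour_of_day;
run;
```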

 

The above code creates three new variables:

 

  • Hour_of_Day_Origin
  • Arrival_Delays_Origin
  • Departure_Delays_Origin

Unless it is the first observation of a year, each of these variables holds the value from the previous observation of its source variable.  Now all we have to do is format our data and remove the rows with the missing origin values we just created by submitting the following code:

 

proc sql;
  create table flight_delays as
  select
    Year,
    Hour_of_Day_Origin label="Hour of Day Origin",
    Hour_of_Day label="Hour of Day",
    Arrival_Delays_Origin label="Delayed Arrivals Origin" format=comma8.0,
    Arrival_Delays label="Delayed Arrivals" format=comma8.0,
    Departure_Delays_Origin label="Delayed Departures Origin" format=comma8.0,
    Departure_Delays label="Delayed Departures" format=comma8.0
  from create_origins
  /* drop the first row of each year, which has no origin */
  where Hour_of_Day_Origin ne .;
quit;

 

After submitting this code we can see our resulting table:

05.png

 

Now all the pieces for our vector plot are in place!  We have both the origin (or starting) values and the ending values on the same row for Hour_of_Day, Departure_Delays and Arrival_Delays.
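
As a quick sanity check on the flight_delays table, the number of line segments kept for each year can be counted.  This is only a sketch: the column aliases Segments, First_Origin and Last_Hour are illustrative names and are not part of the original code.

```sas
/* Count the segments kept per year and show where they start and end. */
proc sql;
  select Year,
         count(*)                as Segments,
         min(Hour_of_Day_Origin) as First_Origin,
         max(Hour_of_Day)        as Last_Hour
  from flight_delays
  group by Year;
quit;
```

If the source data has one row per hour for each year, every year should show one fewer segment than it has hours, because the first row of each year was removed.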

 

Step 3: Build the report

 

All that's left is for us to import the custom graph into a Visual Analytics report and apply our data to the appropriate roles.  This is shown below:

 

08.png

 

 

Since our "Year" variable is assigned to the Color role, a legend is not generated by default with the vector plot.  So to create a legend, I added a heat map to the bottom of the report, using "Year" for the Tile role and Frequency for the Size role.  I also created a report-level display rule so that the colors in the legend (heat map) and the vector plots match.

 

How to make this example work for you

This example was created in SAS Visual Analytics 8.3.  The data from the images above can be obtained from the Bureau of Transportation Statistics' Airline On-Time Performance Data.  However, if you do not wish to extract the data from the Airline On-Time Performance Data, there is some simulated data attached to this post which can also work in this example.
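
If the attachment is not at hand, a rough stand-in for source_data can be generated with a short DATA step.  This is a hypothetical sketch and not the attached program: the hour range of 0 to 23, the seed, and the uniform random counts are all assumptions made for illustration, and the numbers are not real delay figures.

```sas
/* Hypothetical stand-in for source_data: one row per year and hour. */
data source_data;
  call streaminit(2018);            /* fixed seed so the values repeat */
  do Year = 2014 to 2017;
    do Hour_of_Day = 0 to 23;       /* assumed hour range */
      /* made-up counts, not real delay figures */
      Arrival_Delays   = round(rand('uniform') * 5000);
      Departure_Delays = round(rand('uniform') * 5000);
      output;
    end;
  end;
run;
```

Because the nested loops write the rows in year and hour order, the resulting table is already in the order the lag step expects.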

 

The attachments for this post are:

  • The code to create the sample data and the dataset containing the origins - Create_sample_data_and_Origins.sas
  • The completed sample dataset containing the origins - flight_delays.sas7bdat
  • A JSON file containing the custom graph - Vertical_Butterfly_Line.json
  • A JSON file containing the completed report - Flight_Delays_By_Hour.json

Import the data into your SAS Visual Analytics instance.  Import the report via the "Import via GUI" section of these instructions.

 

Comments

Why is 2014 so different? Was there some difference in volume, or was there a regulation change?

Version history
Last update:
‎09-10-2018 09:48 AM