Butterfly charts are one of my favorite report objects in SAS Visual Analytics. I really like how they can compare a pair of measures side by side. Quite often, I try to use butterfly plots in new ways.
Classic butterfly charts are essentially two bar charts side by side. Because of this, you can really only fit (at best) two pairs of measures on the plot for comparison. One example is a previous post of mine, where I used the SAS Visual Analytics Graph Builder to create a butterfly chart that compared two pairs of measures:
However, if you want to compare MORE than two pairs of measures, the classic bar-based butterfly chart will not work because of limited space on the graph. You can only fit so many bars at each position on the y axis. The solution? Create a butterfly chart, but use lines instead of bars.
To demonstrate this, I'm going to use data I retrieved from the Bureau of Transportation Statistics' Airline On-Time Performance Data. It covers four years (2014-2017) of departures and arrivals delayed by at least 15 minutes in the month of September. I would like to compare the total number of departure and arrival delays by hour of day.
Hence, I want to compare the following pairs of measures:
Wow! A total of four pairs of measures. How can we create a butterfly chart to visualize this?
Pretty cool! We now have the benefit of using a butterfly chart to compare pairs of measures side by side, and we can compare as many of these pairs as we want! We can easily see how the number of delays has decreased over the past four years.
So the question is, how did I do this? I'll show you!
SAS Visual Analytics 8.3 brings with it the return of the SAS® Graph Builder. This is great news for report developers who used the SAS® Graph Builder in the Visual Analytics 7.x series. To create the graph above, I built a custom graph in Visual Analytics 8.3. The graph consists of two vector plots side by side. These are pictured below, one in orange and one in purple.
For the Delayed Arrivals (Orange) Vector Plot, the roles are named:
For the Delayed Departures (Purple) Vector Plot, the roles are named:
There are three Shared Data Roles:
These roles are shown after being constructed in the Graph Builder.
Some modifications have also been made to the formatting of both the x and y axes. You can explore these by importing the attached JSON file and looking in the "Options" panel in the Graph Builder.
Now that our graph is built, we need to make sure our data will work with the vector plot. Vector plots can pretty much 'draw' anything because they let the report designer define where each vector line starts and ends. This means the report designer needs to add origin (starting) values to the source data for every line segment. Let's take a look at a few lines of the source data:
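Above we can see the original data we have for our plot. From this we could create a simple line plot. However, since vector plots need origin values, we are going to need to use the LAG function so that each row also carries the values from the previous row. So we submit the SAS code below to create the origin values for Hour_of_Day, Departure_Delays and Arrival_Delays. Because the code uses BY-group processing, the source data must already be sorted by Year, with the hours in order within each year.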
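data create_origins;
    set source_data;
    by year;  /* source_data must be sorted by year */
    /* LAG is called on every row so its queue never skips a value; */
    /* the origin is then set to missing on the first row of each year. */
    Hour_of_Day_Origin=lag(Hour_of_Day);
    if first.year then Hour_of_Day_Origin=.;
    Arrival_Delays_Origin=lag(Arrival_Delays);
    if first.year then Arrival_Delays_Origin=.;
    Departure_Delays_Origin=lag(Departure_Delays);
    if first.year then Departure_Delays_Origin=.;
run;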
The above code creates three new variables: Hour_of_Day_Origin, Arrival_Delays_Origin and Departure_Delays_Origin.
Unless it is the first observation of the year, each of these variables holds the value of its source variable from the previous observation. Now all we have to do is add labels and formats and remove the rows with the missing origin values we created, by submitting the following code:
proc sql;
    create table flight_delays as select
        Year,
        Hour_of_Day_Origin label="Hour of Day Origin",
        Hour_of_Day label="Hour of Day",
        Arrival_Delays_Origin label="Delayed Arrivals Origin" format=comma8.0,
        Arrival_Delays label="Delayed Arrivals" format=comma8.0,
        Departure_Delays_Origin label="Delayed Departures Origin" format=comma8.0,
        Departure_Delays label="Delayed Departures" format=comma8.0
    from create_origins
    /* drop the first row of each year, which has no origin */
    where Hour_of_Day_Origin ne .;
quit;
After submitting this code we can see our resulting table:
Now all the pieces for our vector plot are in place! We have both the origin (or starting) values and the ending values on the same line for Hour_of_Day, Departure_Delays and Arrival_Delays.
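If you want to see the origin/end pairing on a small scale, here is a minimal sketch with made-up numbers (they are NOT the BTS figures). The table names toy_source and toy_segments are hypothetical, and the sketch condenses the two steps above into one DATA step by deleting the first row of each year instead of filtering it out later. Treat it as an illustration of the idea rather than the exact code behind the report.

```sas
/* Toy data: made-up values for illustration only */
data toy_source;
    input Year Hour_of_Day Arrival_Delays;
    datalines;
2014 0 10
2014 1 7
2014 2 4
2015 0 8
2015 1 5
2015 2 3
;
run;

/* BY-group processing needs the rows ordered by Year, then hour */
proc sort data=toy_source;
    by Year Hour_of_Day;
run;

data toy_segments;
    set toy_source;
    by Year;
    /* Call LAG on every row (before any DELETE) so its queue stays in step */
    Hour_of_Day_Origin    = lag(Hour_of_Day);
    Arrival_Delays_Origin = lag(Arrival_Delays);
    /* The first row of each year has no previous point, so it starts no segment */
    if first.Year then delete;
run;

proc print data=toy_segments noobs;
run;
```

Each remaining row describes one line segment: it starts at (Arrival_Delays_Origin, Hour_of_Day_Origin) and ends at (Arrival_Delays, Hour_of_Day). Three points per year give two segments per year, so toy_segments has four rows.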
All that's left is for us to import the custom graph into a Visual Analytics report and apply our data to the appropriate roles. This is shown below:
Since our "Year" variable is assigned to a 'color' role, the vector plot does not generate a legend by default. To create one, I added a heat map to the bottom of the report, using "Year" for the Tile role and Frequency for the Size role. I also created a report-level display rule so that the colors in the legend (heat map) and the vector plots match.
This example was created in SAS Visual Analytics 8.3. The data from the images above can be obtained from the Bureau of Transportation Statistics' Airline On-Time Performance Data. However, if you do not wish to extract the data from the Airline On-Time Performance Data, there is some simulated data attached to this post which also works in this example.
The attachments for this post are:
Import the data into your SAS Visual Analytics instance. Import the report via the "Import via GUI" section of these instructions.
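If you built the flight_delays table yourself in a SAS session rather than using the attached file, one way to make it available to Visual Analytics is to load it into CAS with PROC CASUTIL. This is only a sketch under assumptions: the session name mysess, the "Public" caslib and the output table name are placeholders, and your site may use different caslibs or permissions.

```sas
/* Start a CAS session (the session name is an arbitrary placeholder) */
cas mysess;

proc casutil;
    /* Assumption: a caslib named Public exists and you can write to it. */
    /* PROMOTE gives the table global scope so reports can see it.       */
    load data=work.flight_delays outcaslib="Public"
         casout="FLIGHT_DELAYS" promote;
quit;
```

After this runs, the table should appear in the data pane of Visual Analytics under the caslib you chose.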
Why is 2014 so different? Was there some difference in volume, or was there a regulation change?