
Create a Bar Chart Race in SAS using SGPLOT Procedure

Started ‎05-05-2020 by
Modified ‎05-06-2020 by

Racing bar charts are mesmerizing to watch, so it is no wonder they have become a popular way to display data that changes over time. I saw a few online discussions about whether a racing bar chart could be created in SAS, so I decided to build one. I used the SAS macro language, PROC SGPLOT, and a few DATA step tricks to create this racing bar chart. The steps are listed below.

 

The following racing bar chart (full video: https://youtu.be/OhSL7dxLAlI) of worldwide COVID-19 positive cases was created in SAS:
output.gif

 

Part 1: Prepare the Data

To create a racing bar chart, the data must have a defined structure: one row per location, with one column per date holding the cumulative value for that date.

 

My data source has the following structure, which I converted to the desired layout using PROC TRANSPOSE:

 

prepare_data.png

 

 

Here is the code that will transpose the data:

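/* BY-group processing expects the input to be sorted by location and iso_code */
proc transpose data=data.worldwide_covid19_cases out=worldwide_covid19_cases prefix=D;
	by location iso_code;   /* one output row per location */
	id date;                /* each date becomes a column whose name starts with D */
	idlabel date;           /* the date is also used as the column label */
	var total_cases;        /* cumulative case count to spread across the date columns */
run;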
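
To make the target layout concrete, here is a minimal sketch of the same transpose on a toy table. The data set names work.toy_cases and work.toy_wide and all of the values are made up for illustration; only the PROC TRANSPOSE options mirror the code above.

```sas
/* Toy long-format data: one row per location and date (values are invented) */
data work.toy_cases;
    input location :$20. iso_code :$3. date :yymmdd10. total_cases;
    format date yymmdd10.;
    datalines;
India IND 2020-03-01 10
India IND 2020-03-02 25
Italy ITA 2020-03-01 100
Italy ITA 2020-03-02 180
;
run;

/* PROC TRANSPOSE with a BY statement expects sorted input */
proc sort data=work.toy_cases;
    by location iso_code;
run;

/* Same options as the article's step, applied to the toy data */
proc transpose data=work.toy_cases out=work.toy_wide(drop=_name_) prefix=D;
    by location iso_code;
    id date;
    idlabel date;
    var total_cases;
run;

/* Expect one row per location and one D-prefixed column per date */
proc print data=work.toy_wide;
run;
```

With the default VALIDVARNAME=V7 setting, the date columns should come out with names along the lines of D2020_03_01, because characters that are not valid in a variable name are replaced with underscores.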

 

Part 2: Include every country's flag in the graph using a scatter plot trick

As you can see in the graph, country flags are displayed next to the country names on the y-axis. There is no standard way to display images next to y-axis values (please share any other ideas for achieving this), so I used the SYMBOLIMAGE statement along with the DATTRMAP= option in PROC SGPLOT. One challenge with SYMBOLIMAGE was tying each country name to its flag image path. In addition, a SYMBOLIMAGE statement must be generated for every country's flag, so that any country that enters the top list at any point has its flag displayed along with its name.


To solve this, I used the following code on the country_codes table, which contains the country name and the ISO 3-letter and 2-letter codes:

data country_codes;
	set data.country_codes;
	length cnt_nospace $50 imageloc $250;
	/* Many country names contain spaces or special characters, which must be removed before the name is used in SGPLOT */
	cnt_nospace = compress(name, "`~!@#$%^&*()-_=+\|[]{};:',.<>?/ ");
	/* imageloc holds a complete SYMBOLIMAGE statement that points to the country's flag image; STRIP removes padding so the statement does not depend on trailing blanks */
	imageloc = cat("symbolimage name=", strip(cnt_nospace), " image='D:\My Research\Covid-19\Data\free-flag-icons-96px\96px\", strip(alpha_2), ".png'", "  / HOFFSET=-0.5  ;");
run;
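
The article does not show how the generated statements reach the &&image&s references used in the hbar step in Part 3. One plausible way, sketched here as an assumption rather than the author's actual code, is to load each imageloc value into a numbered macro variable with CALL SYMPUTX. The data set work.top_locations and the C:\flags paths are hypothetical.

```sas
/* Hypothetical stand-in for the table of locations currently in the top list */
data work.top_locations;
    length cnt_nospace $50 imageloc $250;
    cnt_nospace = "India";
    imageloc = "symbolimage name=India image='C:\flags\IN.png' / hoffset=-0.5;";
    output;
    cnt_nospace = "Italy";
    imageloc = "symbolimage name=Italy image='C:\flags\IT.png' / hoffset=-0.5;";
    output;
run;

/* Create global macro variables image1, image2, ... with one statement each */
data _null_;
    set work.top_locations;
    call symputx(cats('image', _n_), imageloc, 'G');
run;

/* %SUPERQ prints the value without letting its semicolon end the %PUT */
%put NOTE: image1 = %superq(image1);
```

Because each value already ends with a semicolon, a bare reference such as &&image&s inside PROC SGPLOT expands to a complete SYMBOLIMAGE statement.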

Part 3: Create the graph with PROC SGPLOT

Two PROC SGPLOT steps are used to create the graph:

  1. To create a band graph
  2. To create a hbar graph

1. Band Graph

The following SGPLOT step creates the band graph from the data:

Proc sgplot data=data.worldwide_covid19_cases noautolegend noborder;
band x=date lower=0 upper=total_cases  ;
where date<="&currentdate"d and location="World" ;
xaxis label=" "   values=("&seriesdate1"d to "&lastdate"d ) display=none fitpolicy=none;
yaxis  label=" " values=(0 to 3500000 by 100000) display=none  fitpolicy=none;
run;

As the code above shows, the &currentdate macro variable is used to generate one image of this graph per date inside the macro loop. Each of these images is then embedded in the hbar graph image for the same date. Here are examples of the images this SGPLOT step generates in the macro loop:

 

series_plots.png
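
The per-date loop around the band graph is not shown in the article. A minimal sketch of how such a loop could look is given below; the macro name band_frames, its parameters, and the image naming scheme are assumptions, and the code presumes that the data libref from the article is already assigned.

```sas
/* Hypothetical macro: one band-graph image per day between first and last */
%macro band_frames(first=, last=);
    %local d currentdate;
    /* Convert the DATE9 strings to SAS date values and step one day at a time */
    %do d = %sysfunc(inputn(&first, date9.)) %to %sysfunc(inputn(&last, date9.));
        %let currentdate = %sysfunc(putn(&d, date9.));
        /* Give each frame its own image name, for example band_01MAR2020 */
        ods graphics / reset imagename="band_&currentdate" imagefmt=png;
        proc sgplot data=data.worldwide_covid19_cases noautolegend noborder;
            where date <= "&currentdate"d and location = "World";
            band x=date lower=0 upper=total_cases;
            xaxis display=none;
            yaxis display=none;
        run;
    %end;
%mend band_frames;

%band_frames(first=01MAR2020, last=05MAR2020)
```

The images are written to whatever image location the active ODS destination uses, so in practice a destination with a suitable GPATH= would be opened before the macro is called.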

 

2. Hbar Graph

This is the main SGPLOT step. It generates thousands of frames so that the racing bar chart transitions smoothly from the first date to the last. The following code is used inside the macro definition to generate the hbar graph:

proc sgplot data=hbar_data_worldwide noautolegend noborder sganno=anno dattrmap=hbarattrmap; /* the "anno" data set adds the band chart image to the hbar chart; "hbarattrmap" contains the flag symbol and bar color for each country */
	format &&DATES&i comma14.0;
	hbarparm category=location response=&&DATES&i / attrid=myid barwidth=0.85 group=location groupdisplay=cluster dataskin=matte datalabel datalabelattrs=(color="#343d46" size=10pt); /* draws the horizontal bars */
	inset (
	"Total Cases:" = "  &TOTALCNT."
	"Date:" = "   &&datevalue&i")
	/ textattrs=(color=black weight=normal size=15) position=bottomright;
	xaxis label=" " valueattrs=(color="#343d46" size=11pt) fitpolicy=none values=(0 to %sysevalf(&highest_cases.*1.3) by 2000);
	yaxis label=" " valueattrs=(color="#343d46" size=11pt) display=(noline) fitpolicy=none discreteorder=data;
	%do s=1 %to 15; /* adds the flag image reference for each of the top 15 records */
		&&image&s
	%end;
	symbolimage name=International image='D:\My Research\Covid-19\Data\free-flag-icons-96px\96px\ZZ.png' / hoffset=-0.5; /* adds the image reference for the "International" location */
	scatter x=id y=location / attrid=myid markerattrs=(size=60) group=location dataskin=matte; /* draws the country flags as scatter markers */
	where top_count<15 and location~="World";
run;

Here is the detailed explanation of each option used and its effect on the graph:

sgplot_hbar_explanation.png
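
To see the core of a single frame without the flags, annotation, and macro variables, here is a stripped-down sketch that uses the SASHELP.CLASS sample data. The data set work.ranked is hypothetical, and the step only illustrates HBARPARM with DISCRETEORDER=DATA; it is not the article's code.

```sas
/* Rank the rows first; the bars are drawn in data order */
proc sort data=sashelp.class out=work.ranked;
    by descending height;
run;

/* One "frame": the five largest values as horizontal bars */
proc sgplot data=work.ranked(obs=5) noautolegend noborder;
    hbarparm category=name response=height /
        group=name datalabel barwidth=0.85;
    yaxis label=" " display=(noline) discreteorder=data;
    xaxis label=" ";
run;
```

HBARPARM plots the response values as they are, without summarizing them, so each category value should appear only once in the input. That is the case for the names in SASHELP.CLASS and for the one-row-per-location table used in the article.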

 

Part 4: Create intermediate graph frames for a smoother transition

A few DATA steps are used to create the intermediate frames. They are part of the macro definition that generates the hbar graphs. The number of intermediate frames is controlled by the "intermediate_frames" macro value passed when the macro is executed. The following DATA step code generates the intermediate frame data in the macro loop:

/* Compute the per-frame increment between the current date and the next date */
data get_difference;
	set hbar_data_worldwide (keep=location &next_date &&DATES&i cnt_nospace imageloc fillcolor);
	diff = &next_date - &&DATES&i;              /* change from the current date to the next date */
	cumulative = diff / &intermediate_frames;   /* amount added per intermediate frame */
run;

/* This table is used by SGPLOT and holds the data for the intermediate frame */
data intermediate_data;
	set get_difference;
	INTPL_&&DATES&i = &&DATES&i + cumulative;
run;

The first DATA step computes the increment (cumulative) that is added to the data values for each intermediate frame, and the second DATA step adds that increment to the data set. The resulting intermediate_data table is used by the hbar SGPLOT step to generate the intermediate frames. The higher the "intermediate_frames" value, the more intermediate frames are generated, which makes the animation in the video or GIF smoother.
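
The idea behind the intermediate frames is linear interpolation between two consecutive dates. The sketch below shows that idea in a single DATA step on a toy table; the data set names, the d1 and d2 columns, and the values are made up, and the long one-row-per-frame layout is a simplification, not the layout used by the article's macro.

```sas
%let intermediate_frames = 4;

/* Toy data: cumulative values for two consecutive dates (invented numbers) */
data work.two_days;
    input location :$20. d1 d2;
    datalines;
A 100 140
B 80 160
;
run;

/* Frame 0 is the first date; each later frame adds one equal step toward d2 */
data work.frames;
    set work.two_days;
    step = (d2 - d1) / &intermediate_frames;
    do frame = 0 to &intermediate_frames - 1;
        value = d1 + frame * step;
        output;
    end;
run;

/* Within each frame, rank the locations from largest to smallest value */
proc sort data=work.frames;
    by frame descending value;
run;

proc print data=work.frames;
run;
```

In this toy case, location B starts below A and overtakes it partway through the frames, which is the kind of gradual rank change that makes the bars appear to race.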

 

Part 5: Convert the PNG image sequence to video/GIF

In the end, we have a large number of sequentially numbered images. A GIF animation can be created using SAS itself; an article by @Jay54 explains how to create animations in SAS using ODS Graphics and the PRINTERPATH option: https://blogs.sas.com/content/graphicallyspeaking/2013/05/23/animation-using-sgplot/
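
As a rough illustration of the SAS-only route, the sketch below writes an animated GIF with the PRINTERPATH=GIF and ANIMATION system options. The SASHELP.CLASS data, the demo_frames macro, the file name race_demo.gif, and the timing values are assumptions chosen for the example; this is not the code used for the chart in this article.

```sas
/* Start collecting frames into an animated GIF */
options printerpath=gif animation=start animduration=0.5 animloop=yes noanimoverlay;
ods printer file="race_demo.gif";
ods graphics / width=640px height=480px imagefmt=gif;

/* Hypothetical macro: each SGPLOT run becomes one frame of the animation */
%macro demo_frames;
    %local age;
    %do age = 11 %to 16;
        proc sgplot data=sashelp.class;
            where age <= &age;
            hbar name / response=height;
        run;
    %end;
%mend demo_frames;

%demo_frames

/* Stop the animation and close the destination to finish the file */
options animation=stop;
ods printer close;
```

ANIMDURATION= sets how long each frame is shown, in seconds, so a smaller value gives a faster animation.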

 

In my case, I used FFmpeg, an open-source video tool, which converts the PNG sequence to MP4 or another video format with a good compression ratio and the desired frame rate. The FFmpeg code is included in the attachment, along with all code, data sets, and flag images.

 

Source references used in this work:

Data source: https://github.com/owid/covid-19-data

Country flag images: https://blogging.com/free-flag-icons/

Comments

Very cool. Saw your post on LinkedIn earlier today, and was curious how you did it.  These animated graphs are a great tool for engaging data story-telling. Thanks for this comprehensive article!

Wonderful Blog... Nicely explained the entire process .. animated graphs with the images of countries is unique,

Good research. During this COVID-19 situation, these types of visualisation are very helpful for the government. Really nice work, keep it up.

Very beautiful animation! Well done and well explained. Thank you for sharing it.

 

One feature I miss in these animations is a horizontal timeline slider where I could advance time at my own speed, moving it back and forth and stopping at the time of interest. Is it doable?

@Quentin & @Jatin_Jim  Thank you for your feedback, glad you like it 🙂

@LeonidBatkhan Thanks for your feedback,
I liked the idea of making this graph more interactive by adding Horizontal sliders for time control,
For GIF format it's not possible due to the GIF's own limitation, but if we talk about video format files, we can achieve the same via using Video Players as well as on online platforms like Youtube where you can adjust the speed of the video as well as move/pause the video with their sliders.
You can view the video format of this graph here: https://youtu.be/OhSL7dxLAlI

COOL !

And Calling @tc 
