Being a sports fan, I love looking at data about rankings. Almost every week, I am looking through the The Top 25 AP Poll (a poll which provides weekly rankings of the top 25 NCAA teams in one of three Division I college sports) to see how my favorite teams are doing.
I'm particularly a fan of this type of graph which visually displays college football teams rankings on a week-by-week basis. As a matter of fact, I like these ranking visualizations so much, I decided to build my own version using the SAS® Graph Builder in Visual Analytics!
First I needed some data for my graph, and since Thanksgiving is right around the corner I decided to use data from this Curriculum Pathways Data Depot resource which examines the number, pounds, and value of turkey in selected states in the U.S. from 1998 to 2014. Using this data source and a SAS custom graph, I was able to create the same type of 'rankings visualization' that my favorite sports websites use:
The yearly rankings graph above was actually quite easy to build. In this post, I'll show you how I constructed it!
Start by creating a new custom graph using the graph builder. First, drag a Series Plot from the elements menu onto the graphics pane. Then drag a schedule chart on top of the series plot you just added (*note, these two steps must be done in this order).
Next go to the "Roles" menu on the right side of your screen. Add the following Roles:
Next change the schedule chart's "Start" role to be a Datetime:
Next, click the three dots next to the "Schedule Chart 1 Group" role and select "Create shared role" -> "Series Plot 1 Group".
Name this new shared role "Group".
Next, under the "Shared Roles" section click the three dots next to "Shared Role 1".
Rename this role "Rank".
Next, from the "Options" menu select the drop down and choose the "A1" section:
From the "Overlaid Graph Colors:" section choose "Start each graph with its first default color"
Now from the top drop down under the options menu, select "Schedule Chart 1" and change the spacing to be 50%.
Finally, from the top drop down under the options menu, select "X Axis". UNCHECK the "Axis Label" checkbox.
And you're done building your custom graph! Your completed roles should look like this:
Save your custom graph and give it a name.
Again, for this post's data I used this Curriculum Pathways Data Depot resource which examines the number, pounds, and value of turkey in selected states in the U.S. from 1998 to 2014. Even though all this information is included in the data source, we will only be needing the variables:
In order to get the data to repeat in our schedule plot, we need to first flip the data using some data manipulation techniques (the code to do this is included in this post). After flipping the data set, the data looks like this:
Next we need to rank our data based on the "value" column. Fortunately we can use PROC RANK to do so!
proc rank data=flipped_data out=ranked ties=low descending; by Year; var Value; ranks Rank_n; run;
Now that our data has been ranked, we can do the final additions to make our graph look the way we want. Since we are using a schedule chart, we have to define a 'start' and 'end' for the time periods in our data. First, we will create a start by converting the 'year' column into a datetime, then we will use the intnx function to calculate the end of the year. Also we will create a character version of the 'rank_n' variable proc rank gave us. This will be used as the 'task' on our schedule chart. Also, for this post, I'm only interested in the values with rank of 6 or less, so a where clause is used to get this subset.
length Rank $ 4;
Rank = put(Rank_n,z2.);
Period_start = input(compress('01jan' || year),date9.);
Period_finish = intnx('Year',Period_start,1,'same');
format Period_start year4.;
format Period_finish year4.;
length rank $ 25;
where rank_n le 6;
The other plot we have in our custom graph is a series plot. Similar to the schedule chart, the series plot has to have two points from which to draw the series line. However the series plot's values cannot be on the same observation line. Hence we will use a do loop to create a variable called seriesx. This will have beginning and end values for the series plot.
data series_plot_prep; set schedule_chart_prep; do i=1 to 2; if i = 1 then seriesx = Period_start; else if i = 2 then seriesx = Period_finish; format seriesx year4.; output; end; run;
Finally there is the question of adding the labels to the graph. Schedule charts place labels at the end of their 'periods'. However we want our labels to be inside the schedule's blocks. So to do this, we will add a second series of schedule data with the period_start being the same value as the period_finish (this is similar to using an attribute dataset in SAS Graph). This will place the label inside the block. Then we will construct the label variable itself. We do this by concatenating the observation's rank, state's FULL name (borrowed from the maps.us2 dataset), and the value its rank was calculated on (in this case the amount of turkey produced). The code to do this is below:
proc sql; create table add_labels as select t1.Rank, t1.Period_start, t1.Period_start as Period_finish, catx(': ',catx(' - ',t1.Rank,t2.statename),put(t1.value,comma9.0)) as label, t2.statename from series_plot_prep as t1 left join maps.us2 as t2 on (t1.state=t2.statecode) where t1.i = 1; quit; data ranks_ready; set series_plot_prep add_labels; run;
And that's it for the data preparation stage. The complete end-to-end code to create the 'ranks_ready' data set is attached to this post.
Now the fun part: building the report! All that's left for us is to import the custom graph to a Visual Analytics report.
Add the data set 'ranks_ready' to your report and fill out the roles as shown:
After these assignments your report will display as below! However you might notice that the ranks are not in the correct order:
Right click on the "Ranks" label on the y-axis and choose Sort -> Rank: Descending:
And now your report is displaying correctly! For my final output, I abbreviated some of the labels to shorten their length. I also gave my report's tab an appropriate title. Notice how it not only graphically displays the rankings of the U.S. states, but the labels we added include supplemental information on the their rank, full name and (most importantly) the VALUE that the states are ranked on.
This example was created in SAS Visual Analytics 8.3. Again, for this post's data, I used the Curriculum Pathways Data Depot resource.
The attachments for this post are:
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.