Comparing trends between two different time periods is one of my favorite types of data visualizations. One of the best examples of such a visualization can be found in the Strava Mobile app's "Monthly Stats" line chart. An example of what this chart looks like can be found here.
I love this type of trend comparison chart. It contains several elements which allow the user to easily see how their workouts statistics are trending in comparison to the previous month. The chart contains two filled line plots: a light grey line for last month and a dark black line for the current month. There is a reference line (with a label) which displays the total number of activities for the previous month and a red dot which represents the total number of activities in the current month (so far).
As typical, my next question was: can I build something similar to this in SAS Visual Analytics? And the answer is yes! In just a short time I was able to create a custom graph and build the following report which compares trends in total precipitation between two different time periods:
The report above uses data from the USGS Water Services website (Credit: U.S. Geological Survey - Department of the Interior/USGS). Specifically, I am using data from Site:02087182 (also known as Falls Lake, NC).
The lighter colored line represents the cumulative precipitation total for the first quarter (Q1) of last year while the darker colored line represents the total for the first quarter (so far) for this year. I've placed labels on the graph using the same method I referenced in my previous post. However notice how I was able to get the reference line to be a dashed line. This helps the report consumer know that this is only a reference line and not a line representing actual data.
Using this report, it's very easy to see how this year's precipitation totals are trending in comparison to last years. Pretty cool! And the bonus is, in this post I'll show you how to build this type of graph in SAS Visual Analytics!
Start by creating a new custom graph using the SAS® Graph Builder. Drag the following objects onto the canvas in the following order:
Next select the options menu on the left and from the drop down select "A1" and from the "Overlaid Graph Colors" select "Start Each Graph with its first default color".
Still within the options menu, select "Line Chart 2" and select the "Overlay filled" option from the "Grouping Style" drop down menu:
While still within the options menu, select "Line Chart 1" and select the "Overlay filled" option from the "Grouping Style" drop down menu. However also move the Transparency slider to 60%:
The last setting that we need to change within the options menu is to select "Left Y Axis" and deselect the "Axis label" options:
The next thing to do is to edit our roles. From the left sided menu switch over to the "Roles" menu and for the following graph elements add these roles:
We will also need to make an edit to the "Series Plot Y" role for the "Series Plot 1" graph element. Under the "Series Plot 1" section click the three dots next to the role "Series Plot Y" to edit it. After which, click the "Allow multiple data assignments" checkbox:
And you're done! You've successfully built the custom graph. Save your graph and give it a name.
As I mentioned before, the source data for the report shown at the top of this post can be obtained from the USGS Water Services' website (Credit: U.S. Geological Survey - Department of the Interior/USGS). However if you do not wish to download data from the USGS Water Services' website there is some simulated data attached to this post which can also work in this example.
After importing the downloaded data into SAS it looks like this:
The first step is to subset our data by the first quarter and create variables for the 'year' and 'day'. After this we will need to create a running total variable which holds the cumulative precipitation within each year. The code to do this is below:
/*Make Year, date and season columns*/ data create_vars; set src.source_precip_data; year = put(year(date),best4.); day = put(date-intnx('year',date,0),z2.); if year = year(today()) then Season = "This Year"; else Season = "Last Year";
/*Get just the first quarter*/ where date-intnx('year',date,0) le 89; run; /*Create running total*/ data running_total; set create_vars; by year; if first.year then Running_total=0; Running_total + Precip; run;
The next step is for us to format the data into a report ready format. You may remember that we used two different line plots when building our custom graph. So hence we need to split "Last Year" and "This Year"s values into two data sets. After we've done that we will get the max for last year's cumulative total and merge it to be along all of last year's data. We will also create labels for: last year's total. We will also make a new variable using the value for the last observation for the current year and create an associated label with it. Finally we will add variables to indicate the display rules needed to get the proper colors on our graph. After doing this, we will join our two data sets together and also includes a 'dummy' variable named series_plot_value. The code to do all this is below:
/* Get Max from last year */ proc sql; create table Last_year as select *, max(Running_total) as Last_Years_total from running_total where Season = "Last Year"; quit; /* create label for Max Value from last year */ data Last_year; set Last_year; by season; if first.Season then Last_Year_Total_label = "Total Last Year: " || compress(put(Last_Years_total,comma8.2)); run; /* Create values for the "Current Bubble" */ data This_year; set running_total; by Season; if last.season then Bubble_Value = Running_total; if last.season then Bubble_Label = "Current Total: " || Compress(put(Running_total,comma8.2)); if last.season then Bubble_Color = "Current Value"; where Season = "This Year"; run; /* Join together Last_Year and This_Year data */ proc sql; create table report_ready as select t1.Day label="Day of Year", t1.Running_total as Last_Year format=comma8.2, t1.Last_Years_total format=comma8.2, t1.Last_Year_Total_label, t2.Running_total as This_year format=comma8.2, t2.Bubble_Value as Bubble_Value, t2.Bubble_Label, t2.Bubble_Color, . as series_plot_value from Last_Year as t1 full join This_year as t2 on (t1.Day = t2.Day); quit;
And that's it! When you're done, the outputted data will look like this:
Now the fun part: building the report! All that's left for us is to import the custom graph to a Visual Analytics report.
Add the data set 'report_ready' to your report and apply the roles as shown:
You will also need to adjust a property at the report level. To do this click the white space at the very top of the page (above the tab for the report) and select the options menu. This will allow you to change report-level properties. From here find the "Data elements style rotation" drop down and select "Rotate All Attributes".
The reason we are doing this is so when we drag a second variable into the "Series Plot 1 Y" role, it will appear as a dashed line rather than an alternate color. However, SAS Visual Analytics will always place a solid line as the representation of the first variable in the role. Therefore we need our first role to be 'missing' (which is why we created the dummy column 'series_plot_value' in our data).
The last thing we need to do is add a display rule so "current" bubble's label is red and the total from last year is black. Also, you will want to give your report tab a title. The display rules and titles I used are included in the report json file that is attached to this post.
Congrats! You've rebuilt the Yearly Trend Comparison Report! Below is an image of what the custom graph will look like using the simulated data attached to this post:
This example was created in SAS Visual Analytics 8.3. The data from the report above can be obtained from the USGS Water Services' website (Credit: U.S. Geological Survey - Department of the Interior/USGS). However, if you do not wish to extract the data from the USGS Water Services' website, there is some simulated data attached to this post which can also work in this example.
The attachments for this post are:
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.