Now that spring is upon us, one thought comes to my mind. When will it be warm enough to go swimming?! As I've mentioned before, when the weather gets warm, I love being outdoors enjoying activities that have to do with being on the water.
I recently ran across this article, which contains a graph showing air temperature distributions by month. I like this graph quite a bit! Not only do the filled-in line plots show the differences in the monthly temperature distributions, but the addition of the median lines really makes those differences stand out. What's more, I'm a big fan of the median labels in white-against-black text. This really makes that one specific data point jump out at the report consumer. So how do we create a similar graph that can be used in SAS Visual Analytics? Turns out the answer is quite simple! Using the SAS Graph Builder, I was able to create the following graph:
The report above uses data from the USGS Water Services website (Credit: U.S. Geological Survey - Department of the Interior/USGS). Specifically, I downloaded data from a site located right outside of Tampa, FL (site number 0230602). You can see from the plot that the median temperature in March is a chilly 71 degrees and that the range of temperatures in March is quite wide.
Using this report, it's very easy to see which months of the year are best for spending time on the water. Pretty cool! In this post I'll show you how to build this type of graph in SAS Visual Analytics.
Start by creating a new custom graph using the SAS® Graph Builder. Drag the following objects onto the canvas in the following order:
Next select the options menu on the left and from the drop down select "Vector Plot 1" and clear the "Show arrowheads" checkbox:
Still within the options menu, select "Series Plot 1" and select the "Break on missing values" option:
We will also need to make some axis edits. Select "X Axis" from the options menu drop down and clear all the check boxes EXCEPT for the "Tick Values" option. Also set the "Grid lines" option to be "Off":
Next, select "Y Axis" from the options menu drop down and select "Off" for the Grid lines option. Also clear all the check boxes in this menu.
Now that our graph options are set, we need to make some edits in the "Roles" menu. Select this menu and start by adding a new "Data Driven Lattice Role":
At the next window, keep all default values and press "OK".
Next, under the roles for "Vector Plot 1" click the three dots next to the "Vector Plot 1 X Origin" and choose "Use Shared Role" -> "Shared Role 1":
Still under the roles for "Vector Plot 1", click the three dots next to the "Vector Plot 1 Y Origin" and choose "Create shared role" -> "Band Plot 1 Lower Limit":
At the next window name this new role "Baseline" and press "OK":
Under the "Series Plot 1" roles select "Add Role":
In the next window choose the role type: "Data Label" and name the role "Label":
And you're done! You've successfully built the custom graph. Save your graph and give it a name.
As I mentioned before, the source data for the report shown at the top of this post can be obtained from the USGS Water Services' website (Credit: U.S. Geological Survey - Department of the Interior/USGS). However if you do not wish to download data from the USGS Water Services' website there is some simulated data attached to this post which can also work in this example.
After importing the downloaded data into SAS it looks like this:
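If you want to follow along without downloading anything, a stand-in data set can be simulated. The sketch below is only an assumption about the shape of the imported data: it uses the data set name water_data_imp and the variables date and Water_Temp that appear in the code in this post, but the seasonal curve, the noise level, the date range, and the seed are all invented for illustration and are not USGS values.

```sas
/* Hypothetical stand-in for the imported USGS data: one reading per day */
/* over four years. All temperature values below are simulated.         */
data water_data_imp;
  call streaminit(2718);              /* fixed seed so the run is repeatable */
  format date date9.;
  do date = '01JAN2015'd to '31DEC2018'd;
    /* Invented seasonal curve (coolest in mid-January) plus random noise */
    Water_Temp = round(
        75 - 13*cos(2*constant('pi')*(date - '15JAN2015'd)/365.25)
           + rand('normal', 0, 2),
        0.1);
    output;
  end;
run;
```

With this in place, the remaining steps can be run unchanged, although the resulting medians will of course differ from the ones in the report above.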
The first step is to create a variable that has the month name in it. This is completed using the following code:
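data add_month;
  set water_data_imp;
  /* Store the three-letter month name as a character value so that it */
  /* matches the month column built later in PROC SQL                  */
  month = put(date, monname3.);
run;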
You might have noticed when you first saw the graph that the line plots look similar to histograms; that is, each plot shows how often each temperature value occurs in the data. To get this information from our source data, we can use PROC UNIVARIATE to create a histogram of temperature for each month. We can then borrow the histogram's output data set and calculate the highest count across all of these distributions, saving that value to the macro variable maxcount. The code to do this is below:
proc univariate data=add_month noprint;
  class month;
  /* One histogram per month; OUTHIST= saves the bin midpoints and counts */
  histogram Water_Temp / nrows=12 outhist=MidPtOut;
run;

/* Save the highest bin count across all months in the macro variable maxcount */
proc sql noprint;
  select max(_COUNT_) into :maxcount
  from MidPtOut;
quit;
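Because the maxcount macro variable drives the height of every median line later on, it can be worth checking it before moving on. A minimal sketch, assuming maxcount was created from the MidPtOut data set as described above:

```sas
/* Re-assigning with %LET trims any leading or trailing blanks left by PROC SQL */
%let maxcount = &maxcount;

/* Write the value to the log so it can be checked before it is used */
%put NOTE: maxcount=&maxcount;
```

The value in the log should equal the largest _COUNT_ value in MidPtOut.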
The output data set "MidPtOut" contains the distribution of temperature for each month, which is the same data that PROC UNIVARIATE uses to draw its histograms. This is all that is needed to create the line plot section of our custom graph. The next piece of the puzzle is to calculate the median for each month. These median values determine where on the x axis each 'median line' is plotted. We also want the median lines to stretch from the bottom to the top of the y axis, which is why we use the 'maxcount' macro variable we calculated earlier. Finally, we need a label that prints the actual median value on the graph. This is done by creating a variable named 'label' that holds the character representation of the median value. We want the label to sit next to the x axis, so we create a variable called 'series_plot_y' and give it a value of zero. The code to do this is below:
/* Median water temperature for each month */
proc sql;
  create table get_medians as
  select put(date, monname3.) as month,
         median(water_temp) as median_water_Temp
  from add_month
  group by calculated month;
quit;

data add_label_to_medians;
  set get_medians;
  Vector_Y = &maxcount;                       /* top of the median line */
  series_plot_y = 0;                          /* label sits next to the x axis */
  label = put(median_water_Temp, comma2.0);   /* median value as text */
  rename median_water_Temp = _MIDPT_;         /* share the x axis with the histogram midpoints */
run;
The final step of the data preparation is to append the add_label_to_medians data set to the MidPtOut data set. Since we want the x axis of our graph to be the base for plotting both the midpoints from PROC UNIVARIATE and the median lines themselves, we renamed the median_water_temp variable to _MIDPT_ in the previous step. Additionally, since we are using a band plot for the background of our graph, I've added a 'zero' variable to serve as the 'lower bound' for the plot. Now we can do the append and apply report-appropriate variable labels to the output data set:
data water_temps_graph;
  set MidPtOut add_label_to_medians;
  zero = 0;   /* lower bound for the band plot */
run;

proc sql;
  create table water_temps_report as
  select month   label="Month",
         _MIDPT_ label="Temperatures",
         _count_ label="Temperatures Frequency",
         Vector_Y,
         series_plot_y,
         Label,
         zero
  from water_temps_graph;
quit;
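Before moving over to SAS Visual Analytics, it can help to preview the prepared data directly in SAS. The following is a rough sketch only, assuming SAS 9.4 with ODS Graphics available. It uses PROC SGPANEL to approximate the custom graph from the water_temps_report variables and is not the graph built in the SAS Graph Builder. The image size is an arbitrary choice.

```sas
/* Rough preview only: approximates the custom graph with PROC SGPANEL */
ods graphics / width=400px height=900px;

proc sgpanel data=water_temps_report noautolegend;
  /* One row per month, similar in spirit to the data-driven lattice role */
  panelby month / layout=rowlattice onepanel novarname;
  /* Filled distribution: frequency counts above the zero baseline */
  band x=_MIDPT_ lower=zero upper=_COUNT_;
  /* Median line from the baseline up to the tallest bin */
  vector x=_MIDPT_ y=Vector_Y / xorigin=_MIDPT_ yorigin=zero noarrowheads;
  /* Median label anchored at the x axis */
  series x=_MIDPT_ y=series_plot_y / datalabel=Label;
run;
```

The three plot statements line up with the band, vector, and series plots of the custom graph, so a problem in the data (for example, a missing median row for a month) tends to show up here first.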
Ready to build the water temperature graph report? Great! All that's left for us is to import the custom graph to a new SAS Visual Analytics report.
Add the data set 'water_temps_report' to your report and apply the roles as shown:
Now our report is starting to come together! However, we need to set some global color options to get the report to look right. To do this, click the white space at the very top of the page (above the tab for the report) and select the options menu. This will allow you to change report-level properties. From there, change the first three colors of the "Fill" and "Line/Marker" color palettes. Make the first color black and the second and third colors white:
In custom graphs, series plot data labels are typically placed to the upper right of the point they represent. Depending on the distribution of your data, your label values might extend beyond the top arc of the line plot. If this happens, you can adjust the overall height of your report using the "Set fixed report size" menu in the global report options section. In the screenshot below, I've set my report height to 900, but your report might need a different value.
The last step is to make our graph thinner so it looks more like the example. This is easily done by selecting the graph and looking in its options menu. Click the "Specify width" checkbox and set it to 35%. Also uncheck the "Extend width if available" and "Shrink width if necessary" options.
Now our report is complete! Give your tab an appropriate name and save your report!
This example was created in SAS Visual Analytics 8.3. The data for the report above can be obtained from the USGS Water Services' website (Credit: U.S. Geological Survey - Department of the Interior/USGS). However, if you do not wish to extract the data from the USGS Water Services' website, there is some simulated data in our Visual Analytics Custom Graphs GitHub repository which can also work in this example.
On GitHub, you will find the following support files for this article: