
3 steps to building a monthly temperature comparison chart

Started 04-03-2019
Modified 04-01-2019

Now that spring is upon us, one thought comes to my mind.  When will it be warm enough to go swimming?!  As I've mentioned before, when the weather gets warm, I love being outdoors enjoying activities that have to do with being on the water. 

 

I recently ran across this article which contains a graph showing various air temperature distributions by month.  I like this graph quite a bit!  Not only do the filled-in line plots show the differences in the monthly temperature distributions, but the addition of the median lines really makes the differences in temperatures stand out.  Moreover, I'm a big fan of the median labels set in white text against the black fill.  This really makes that one specific data point jump out at the report consumer.  So how do we create a similar graph that can be used in SAS Visual Analytics?  Turns out the answer is quite simple!  Using the SAS Graph Builder, I was able to create the following graph:

 

 

01.png

 

The report above uses data from the USGS Water Services website (Credit: U.S. Geological Survey - Department of the Interior/USGS).  Specifically, I downloaded data from a site located right outside of Tampa, FL (site number 0230602).  You can see from the plot that the median temperature in March is a chilly 71 degrees and that the range of temperatures in March is quite wide.

 

Using this report, it's very easy to see which months of the year are best for spending time on the water.  Pretty cool!  In this post I'll show you how to build this type of graph in SAS Visual Analytics.

 

Step 1: Build the graph

 

Start by creating a new custom graph using the SAS® Graph Builder.  Drag the following objects onto the canvas in the following order:

  1. Band Plot
  2. Series Plot
  3. Vector Plot

 

Next, select the options menu on the left.  From the drop-down, select "Vector Plot 1" and clear the "Show arrowheads" checkbox:

 

B01.png

 

Still within the options menu, select "Series Plot 1" and select the "Break on missing values" option:

 

B02.png

 

We will also need to make some axis edits.  Select "X Axis" from the options menu drop down and clear all the check boxes EXCEPT for the "Tick Values" option.  Also set the "Grid lines" option to be "Off":

 

B08.png

 

Next, select "Y Axis" from the options menu drop down and select "Off" for the Grid lines option.  Also clear all the check boxes in this menu. 

 

Now that our graph options are set, we need to make some edits in the "Roles" menu.  Select this menu and start by adding a new "Data Driven Lattice Role":

 

B07.png

 

At the next window, keep all default values and press "OK".

 

Next, under the roles for "Vector Plot 1", click the three dots next to "Vector Plot 1 X Origin" and choose "Use Shared Role" -> "Shared Role 1":

 

B04.png

 

Still under the roles for "Vector Plot 1", click the three dots next to "Vector Plot 1 Y Origin" and choose "Create shared role" -> "Band Plot 1 Lower Limit":

 

 

B09.png

 

At the next window, name this new role "Baseline" and press "OK":

 

B10.png

 

Under the "Series Plot 1" roles select "Add Role":

B05.png

 

In the next window, choose the role type "Data Label" and name the role "Label":

 

B06.png

 

And you're done!  You've successfully built the custom graph.  Save your graph and give it a name.

 

Step 2: Prepare your data

 

As I mentioned before, the source data for the report shown at the top of this post can be obtained from the USGS Water Services' website (Credit: U.S. Geological Survey - Department of the Interior/USGS).  However, if you do not wish to download data from the USGS Water Services' website, there is some simulated data attached to this post which also works in this example.

 

After importing the downloaded data into SAS it looks like this:

 

D01.png

The first step is to create a variable that has the month name in it.  This is completed using the following code:

 

/* Copy the date and display it as a three-letter month name */
data add_month;
   set water_data_imp;
   month = date;
   format month monname3.;
run;

You might have noticed when you first saw the graph that the line plots look similar to histograms; that is, each plot shows how often each temperature value occurs in the data set.  To get this information from our source data, we can use PROC UNIVARIATE to create a histogram of temperature for each month.  We can then borrow the histograms' output data set and find the highest count across all of these distributions, saving that value to the macro variable maxcount.  The code to do this is below:

 

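/* Build one histogram of Water_Temp per month and save the bins to MidPtOut */
proc univariate data=add_month noprint;
   class month;
   histogram Water_Temp / nrows=12 outhist=MidPtOut;
run;

/* Store the largest bin count across all months in the macro variable maxcount */
proc sql;
   select max(_COUNT_) into :maxcount
   from MidPtOut;
quit;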
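
If you want to confirm that the macro variable resolved before moving on, a minimal sketch like the one below can help.  It assumes the MidPtOut data set from the step above already exists; the NOPRINT and TRIMMED options and the %PUT statement are my additions and are not required for the rest of the example.

```sas
/* Same query as above: NOPRINT suppresses the listing output and */
/* TRIMMED removes leading and trailing blanks from the stored value */
proc sql noprint;
   select max(_COUNT_) into :maxcount trimmed
   from MidPtOut;
quit;

/* Write the resolved value to the SAS log as a quick check */
%put NOTE: maxcount = &maxcount;
```

The value is only used in a numeric assignment later on, so the untrimmed version works too; trimming just makes the log message easier to read.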

 

The output data set "MidPtOut" contains the distributions of temperature for each month, which is the same data that PROC UNIVARIATE uses to create its histograms.  This is all that is needed to create the line plot section of our custom graph.  The next piece of the puzzle is to calculate the median for each month.  These median values determine where on the x axis each 'median line' is plotted.  We also need the median lines to stretch from the bottom to the top of the y axis, so we set their height using the 'maxcount' macro variable we calculated earlier.  Finally, we need a label to print the actual median value on the graph.  This is done by creating a variable named 'label' and placing the character representation of the median value in it.  As for the location of the label, we want it to sit next to the x axis, so we create a variable called 'series_plot_y' and give it a value of zero.  The code to do this is below:

 

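/* Median water temperature for each month */
proc sql;
   create table get_medians as select
      put(date,monname3.) as month,
      median(water_temp) as median_water_Temp
   from add_month
   group by calculated month;
quit;

/* Add the median line height, the label position and the label text */
data add_label_to_medians;
   set get_medians;
   Vector_Y = &maxcount;        /* top of the median line */
   series_plot_y = 0;           /* label sits on the x axis */
   label = put(median_water_Temp,comma2.0);
   rename median_water_Temp = _MIDPT_;
run;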
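
For readers who prefer a procedure-based approach, the monthly medians could also be sketched with PROC MEANS.  This is an alternative I am suggesting, not the method used for the report above.  The output data set name get_medians_alt is hypothetical, and the step assumes the add_month data set and its MONNAME3. format from earlier in this post.

```sas
/* Alternative sketch: one median per month using PROC MEANS.      */
/* CLASS groups by the formatted value of month, so each month     */
/* name becomes one group. get_medians_alt is a hypothetical name. */
proc means data=add_month noprint nway;
   class month;
   var Water_Temp;
   output out=get_medians_alt (drop=_TYPE_ _FREQ_)
          median=median_water_Temp;
run;
```

One difference to keep in mind: here month stays a numeric variable carrying the MONNAME3. format, whereas the PROC SQL version above creates month as a character string.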

 

The final step of the data preparation is to append the add_label_to_medians data set to the MidPtOut data set.  Since we want the x axis of our graph to be the base for plotting both the midpoints from PROC UNIVARIATE and the median lines themselves, we renamed the median_water_temp variable to _MIDPT_ in the previous step.  Additionally, since we are using a band plot for the background of our graph, I've added a 'zero' variable to be the 'lower bound' for the plot.  Now we can do the append and apply report-appropriate variable labels to the output data set:

/* Stack the histogram bins and the median rows; add the band plot baseline */
data water_temps_graph;
   set MidPtOut add_label_to_medians;
   zero = 0;
run;

/* Keep only the report variables and apply display labels */
proc sql;
   create table water_temps_report as select
      month label="Month",
      _MIDPT_ label="Temperatures",
      _count_ label="Temperatures Frequency",
      Vector_Y,
      series_plot_y,
      Label,
      zero
   from water_temps_graph;
quit;
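
Before moving to SAS Visual Analytics, it can be handy to preview the data in Base SAS.  The sketch below layers the same three plot types (band, vector and series) with PROC SGPANEL.  It is only a rough sanity check under my own assumptions, not a replica of the custom graph: the image size is an arbitrary choice, and the colors and layout will differ from the finished report.

```sas
/* Rough preview of the layered graph, one row per month.        */
/* The width and height values are arbitrary; adjust as needed.  */
ods graphics / width=400px height=900px;

proc sgpanel data=water_temps_report noautolegend;
   panelby month / layout=rowlattice onepanel novarname;
   /* Filled distribution: from the zero baseline up to the bin count */
   band x=_MIDPT_ lower=zero upper=_COUNT_;
   /* Median line: from the baseline up to the maximum count */
   vector x=_MIDPT_ y=Vector_Y / xorigin=_MIDPT_ yorigin=zero noarrowheads;
   /* Median label: a single point per month on the x axis */
   series x=_MIDPT_ y=series_plot_y / datalabel=Label break;
run;

ods graphics / reset;
```

If the months appear as separate rows with a filled shape, a vertical line and a label in each, the data set is shaped the way the custom graph expects.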

 

Step 3: Build the report!

 

Ready to build the water temperature graph report?  Great!  All that's left for us is to import the custom graph to a new SAS Visual Analytics report. 

 

Add the data set 'water_temps_report' to your report and apply the roles as shown:

 

R1.png

 

Now our report is starting to come together!  However, we need to add some global color options to get the report to look right.  To do this, click the white space at the very top of the page (above the tab for the report) and select the options menu.  This will allow you to change report-level properties.  From there, change the first three colors of the "Fill" and "Line/Marker" color palettes.  Make the first color black and the second and third colors white:

 

R3.png

 

Data labels in series plots within custom graphs are typically placed to the upper right of the point they represent.  Depending on the distribution of your data, your label values might extend beyond the top arc of the line plot.  If this happens, you can adjust the overall height of your report using the "Set fixed report size" menu within the global report options section.  In the screenshot below I've set my report height to 900, but your report's height might need a different value.

 

r6.png

 

The last step is to make our graph thinner so it looks more like the example.  This is easily done by selecting the graph and looking in its options menu.  Click the "Specify width" checkbox and set it to 35%.  Also uncheck the "Extend width if available" and "Shrink width if necessary" options.

 

R4.png

 

 

 

 

Now our report is complete!  Give your tab an appropriate name and save your report!

 

01.png

 

How to make this example work for you

This example was created in SAS Visual Analytics 8.3.  The data from the report above can be obtained from the USGS Water Services' website (Credit: U.S. Geological Survey - Department of the Interior/USGS). However, if you do not wish to extract the data from the USGS Water Services' website, there is some simulated data in our Visual Analytics Custom Graphs GitHub which can also work in this example.

 

On GitHub, you will find the following support files for this article:

  • A simulated data set of water temperatures - simulated_water_temps.sas7bdat
  • The code which creates the final water_temps_report data set - water_temps_report_ETL.sas
  • The completed output data set (sourced from the simulated data set) - water_temps_report.sas7bdat
  • A JSON file containing the completed custom graph - Water_Temp_CG.json
  • A JSON file containing the completed report - water_temps_report.json

 Take Me to GitHub!

 

Import the data into your SAS Visual Analytics instance.  Import the report via the "Import via GUI" section of these instructions.

Comments

What a shapely graph. I like the stark contrast of the black and white too. Simple and effective! Great graph builder instructions too... 

 

As the weather starts to get cooler here in Australia I was thinking how different the overall shape would be for here. 

 

Cheers,

Michelle

Thanks Michelle!

 

You're right!  I'm willing to bet if we plotted the water temperatures in Australia alongside the data from this post, the data values would mirror each other.  Inverse data flows like that would work perfectly in this custom graph.  I might try that in a future post! 😉

 

- Mike

Ahhhh the joys of being data curious... 😉

 

Version history
Last update:
‎04-01-2019 03:02 PM