I was recently revisiting the data I used to build a monthly temperature comparison graph and wondered if there might be another way to display this data's distribution. After looking around a bit, I ran across a type of graph I had never seen before, known as a Strip Plot. Here is an example of what this graph looks like.
I think this graph is really cool! Even though all the data points fall within one of the four categorical values on the X axis, adding some random noise (or jitter) to the data lets you see the distribution of the data points along the graph's Y axis. So how do we create a similar graph that can be used in SAS Visual Analytics? Using the SAS Graph Builder, of course! In just a short time, I was able to create the following graph:
The report above uses data from the USGS Water Services website (Credit: U.S. Geological Survey - Department of the Interior/USGS). Specifically, I downloaded data from a site located within San Francisco Bay (site number: 375607122264701). You can see from the plot that the temperature distributions in March and November are quite wide and uniquely shaped.
Pretty cool! In this post I'll show you how to build this type of graph in SAS Visual Analytics.
Start by creating a new custom graph using the SAS® Graph Builder. Drag the following objects onto the canvas in the following order:
Next select the options menu on the left and from the drop down select "Vector Plot 1" and clear the "Show arrowheads" checkbox. Also set the line width to be "2":
Still within the options menu, select "Scatter Plot 1" and set the transparency to 90%. Also change the Marker style to an unfilled circle and set its size to 9:
We will also need to make some axis edits. Select "X Axis" from the options menu drop down and clear all the check boxes EXCEPT for the "Axis line" option. Also set the "Grid lines" option to be "Off":
Next, select "Y Axis" from the options menu drop down and select "Off" for the Grid lines option.
The last edit we need to make in the options menu is to remove all legends. Select "Discrete Legend" from the options menu drop down and deselect the check boxes next to Scatter Plot 2, Vector Plot 1 and Scatter Plot 1.
Now that our graph options are set, we need to make some edits in the "Roles" menu. Select this menu and start by adding a new "Data Driven Lattice Column":
At the next window, select Lattice Column and press "OK".
Under the "Shared Roles" section we will need to rename the shared role which was automatically created when we added our graph elements. Click the three dots next to "Shared Role 1" and click "Edit Role":
At the next screen, rename the role to "Range_Line_Location" and press "OK".
We will also need to rename some of the other roles in our graph. As before, for each role below click the three dots next to the role name and change their respective names to:
We will next need to make use of our shared role. Click the three dots next to the "Vector Plot 1 X Origin" role and select "Use Shared Role" -> "Range_Line_Location":
Conversely, we are going to prevent one role from using our shared role. Under the "SCATTER PLOT 1" section, click the three dots next to the "Range_Line_Location" role and choose "Unshare":
Once the role has been unshared, rename it from "Scatter Plot 1 X" to "Jitter".
And you're done! You've successfully built the custom graph. Save your graph and give it a name.
As I mentioned before, the source data for the report shown at the top of this post can be obtained from the USGS Water Services website (Credit: U.S. Geological Survey - Department of the Interior/USGS). However, if you do not wish to download data from the USGS Water Services website, there is some simulated data attached to this post which also works in this example.
After importing the downloaded data into SAS, I converted the water temperature readings to Fahrenheit. After this conversion the data looks like this:
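The conversion step itself is not shown in this post, so here is a minimal sketch of what it could look like. It assumes the downloaded readings are in degrees Celsius; the input table name (usgs_raw), the column name (water_temp_c), and the three sample rows are made up for illustration. Only the output names (src, date, water_Temp) match the code used later in this post.

```sas
/* Hypothetical stand-in for the imported USGS data (values are invented) */
data usgs_raw;
  input date :yymmdd10. water_temp_c;
  format date yymmdd10.;
  datalines;
2019-03-01 12.4
2019-03-02 12.9
2019-11-15 14.2
;
run;

/* Convert Celsius to Fahrenheit: F = C * 9/5 + 32 */
data src;
  set usgs_raw;
  water_Temp = water_temp_c * 9 / 5 + 32;
  drop water_temp_c;   /* keep only the converted reading */
run;
```

If your download already reports Fahrenheit, this step can be skipped and the column simply renamed to water_Temp.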
Next we need to create two month columns: one that displays the month name and one that holds the two-digit month number as a character value. To keep all the data values from being plotted on top of each other along the X axis, we will also add some random noise to the data. This technique, known as jittering, prevents overplotting in data visualizations. You can read more about adding random jitter in this fantastic paper. We will also need a constant column equal to zero, which is easily added using an assignment statement.
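One thing to keep in mind about jitter: when the random number function is seeded from the system clock, the points land in slightly different places every time the code runs. If you would like the plot to look the same on every run, a fixed seed is one option. The sketch below is an assumption on my part rather than the code used for the report; the demo table, its values, and the seed 12345 are invented for illustration. It uses RAND('NORMAL', 0, 0.01), which gives normally distributed noise with a standard deviation of 0.01.

```sas
/* Hypothetical demo data: three readings in each of two months */
data demo;
  input month_num $ water_Temp;
  datalines;
03 54.1
03 55.7
03 53.2
11 57.9
11 58.4
11 56.8
;
run;

data demo_jitter;
  set demo;
  /* Seed the random stream once so the jitter is repeatable (seed is arbitrary) */
  if _n_ = 1 then call streaminit(12345);
  /* Normal noise with mean 0 and standard deviation 0.01 */
  Jitter = rand('normal', 0, 0.01);
run;
```

Keeping the noise small relative to the spacing between categories matters more than the exact distribution; the goal is only to spread the markers enough to see their density.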
Finally, we will need to calculate the minimum, maximum and median values for each month. We will use PROC SQL to do this.
The SAS code to complete these steps is below:
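data addcols;
  set src;
  month_num = put(month(date), z2.);  /* two-digit month number (character) */
  month = date;
  format month monname3.;             /* display the date as a month name */
  Jitter = rannor(0) / 100;           /* small random noise for the X axis */
  zero = 0;                           /* constant used to place the range line */
run;

proc sql;
  create table waterTempJitterStripPlot as
  select Month label="Month",
         water_Temp label="Water Temp",
         Jitter,
         zero,
         min(water_Temp) as min,
         max(water_Temp) as max,
         median(water_Temp) as Median label="Median"
  from addcols
  group by month_num;
quit;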
A subset of the final output data is below:
Great! Our data is now ready for SAS Visual Analytics!
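If you would like to sanity check the data before opening Visual Analytics, a rough preview can be drawn with PROC SGPANEL. This is only a sketch that mimics the custom graph's layout (a lattice column per month, a vertical min-to-max line at zero, jittered markers, and a median marker); it is not how the custom graph itself is built. So that it runs on its own, it first creates a small simulated table; the table names sim and preview, the seed, and the simulated temperatures are all made up.

```sas
/* Hypothetical simulated readings for two months (values are invented) */
data sim;
  call streaminit(2024);
  do m = 3, 11;
    Month = mdy(m, 1, 2019);
    do i = 1 to 200;
      water_Temp = rand('normal', 55 + (m = 11) * 3, 1.5);
      Jitter = rand('normal', 0, 0.01);
      zero = 0;
      output;
    end;
  end;
  format Month monname3.;
  drop m i;
run;

/* Same shape as the article's table: per-month stats remerged onto each row */
proc sql;
  create table preview as
  select Month, water_Temp, Jitter, zero,
         min(water_Temp) as min,
         max(water_Temp) as max,
         median(water_Temp) as Median
  from sim
  group by Month;
quit;

proc sgpanel data=preview;
  panelby Month / layout=columnlattice novarname;
  /* Range line from (zero, min) to (zero, max), no arrowheads */
  vector x=zero y=max / xorigin=zero yorigin=min noarrowheads
                        lineattrs=(thickness=2);
  /* Jittered readings as faint unfilled circles */
  scatter x=Jitter y=water_Temp / transparency=0.9
                                  markerattrs=(symbol=circle size=9);
  /* Median marker on the range line */
  scatter x=zero y=Median / markerattrs=(symbol=circlefilled);
  colaxis display=(nolabel noticks novalues);
  rowaxis label="Water Temp";
run;
```

To preview the real data instead, point the PROC SGPANEL step at the table created earlier in this post and skip the two simulation steps.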
All that's left for us is to import the custom graph to a Visual Analytics report.
Add the data set 'waterTempJitterStripPlot' to your report and apply the roles as shown:
Now we will need to change the data limit for our custom graph. With the graph selected, go to the options menu and select the "Override system data limit:" option and add a value of your choice. I decided to make my limit 40,000 (which is the default limit for scatter plots).
We can now see our plot!
All that's left for us to do is add some window dressing. For this, I created a custom legend from a text object and placed it under the graph. I also adjusted the object's style options to get the colors shown in my final report. Both of these final report edits are included in the report's JSON file that is in this GitHub Repository. After these edits, our final report looks like this:
Congratulations! You have successfully built a Strip Plot custom graph you can use in SAS Visual Analytics!
This example was created in SAS Visual Analytics 8.4. The data from the report above can be obtained from the USGS Water Services' website (Credit: U.S. Geological Survey - Department of the Interior/USGS). However, if you do not wish to extract the data from the USGS Water Services' website, there is some simulated data in our Visual Analytics Custom Graphs GitHub which can also work in this example.
On GitHub, you will find the following support files for this article: