I recently was revisiting the data I used to build a monthly temperature comparison graph and was wondering if there might be another way to display this data's distribution. After looking around a bit I ran across a type of graph I had never seen before, known as a Strip Plot. Here is an example of what this graph looks like.
I think this graph is really cool! Even though all the data points fall within one of the four categorical values on the X axis, adding some random noise (or jitter) to the data lets you see the distribution of the data points along the graph's Y axis. So how do we create a similar graph that can be used in SAS Visual Analytics? Using the SAS Graph Builder, of course! In just a short time, I was able to create the following graph:
The report above uses data from the USGS Water Services website (Credit: U.S. Geological Survey - Department of the Interior/USGS). Specifically, I downloaded data from a site located within San Francisco Bay (site number: 375607122264701). You can see from the plot that the temperature distributions in March and November are quite wide and uniquely shaped.
Pretty cool! In this post I'll show you how to build this type of graph in SAS Visual Analytics.
Start by creating a new custom graph using the SAS® Graph Builder. Drag the following objects onto the canvas in the following order:
Next select the options menu on the left and from the drop down select "Vector Plot 1" and clear the "Show arrowheads" checkbox. Also set the line width to be "2":
Still within the options menu, select "Scatter Plot 1" and set the transparency to 90%. Also change the marker style to an unfilled circle and set its size to 9:
We will also need to make some axis edits. Select "X Axis" from the options menu drop down and clear all the check boxes EXCEPT for the "Axis line" option. Also set the "Grid lines" option to be "Off":
Next, select "Y Axis" from the options menu drop down and select "Off" for the Grid lines option.
The last edit we need to make in the options menu is to remove all legends. Select "Discrete Legend" from the options menu drop down and deselect the check boxes next to Scatter Plot 2, Vector Plot 1 and Scatter Plot 1.
Now that our graph options are set, we need to make some edits in the "Roles" menu. Select this menu and start by adding a new "Data Driven Lattice Column":
At the next window, select Lattice Column and press "OK".
Under the "Shared Roles" section we will need to rename the shared role which was automatically created when we added our graph elements. Click the three dots next to "Shared Role 1" and click "Edit Role":
At the next screen, rename the role to "Range_Line_Location". Press "OK".
We will also need to rename some of the other roles in our graph. As before, for each role below click the three dots next to the role name and change their respective names to:
We will next need to make use of our shared role. Click the three dots next to the "Vector Plot 1 X Origin" role and select "Use Shared Role" -> "Range_Line_Location":
Conversely, we are going to prevent a role from using our shared role. Under the "SCATTER PLOT 1" section click the three dots next to the "Range_Line_Location" role and choose "Unshare":
Once the role has been unshared, rename it from "Scatter Plot 1 X" to "Jitter".
And you're done! You've successfully built the custom graph. Save your graph and give it a name.
As I mentioned before, the source data for the report shown at the top of this post can be obtained from the USGS Water Services' website (Credit: U.S. Geological Survey - Department of the Interior/USGS). However if you do not wish to download data from the USGS Water Services' website there is some simulated data attached to this post which can also work in this example.
After importing the downloaded data into SAS, I converted the water temperature readings to Fahrenheit. After this conversion the data looks like this:
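For reference, the conversion step itself might look something like the sketch below. It assumes the readings arrive in degrees Celsius. The table name raw, the column name water_Temp_C, and the sample values are all made up for illustration; only src, date and water_Temp match the names used later in this post.

```sas
/* Hypothetical raw table: names and values are invented for illustration */
data raw;
    input date :yymmdd10. water_Temp_C;
    format date yymmdd10.;
    datalines;
2019-03-01 12.5
2019-03-02 13.1
2019-11-01 14.8
;
run;

/* Convert Celsius to Fahrenheit: F = C * 9/5 + 32 */
data src;
    set raw;
    water_Temp = water_Temp_C * 9 / 5 + 32;
    drop water_Temp_C;
run;
```

With these sample values, the first reading of 12.5 degrees Celsius becomes 54.5 degrees Fahrenheit.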
Next we need to create two month columns: a character column with the month number and a column that displays the month name. To avoid having all the data values plotted on top of each other along the X axis, we will also add some random noise to the data. This is known as jittering: adding random noise to data in order to prevent overplotting in data visualizations. You can read more about adding random jitter in this fantastic paper. We will also need a constant column with a value of zero, which is easily added using an assignment statement.
Finally, we will need to calculate the minimum, maximum and median values for each month. We will use PROC SQL to do this.
The SAS code to complete these steps is below:
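/* Add the month columns, the jitter value and the zero constant */
data addcols;
    set src;
    month_num = put(month(date), z2.);  /* character month number, e.g. "03" */
    month = date;                       /* copy of the date, shown as a month name */
    format month monname3.;
    Jitter = rannor(0) / 100;           /* normal random noise, standard deviation 0.01 */
    zero = 0;                           /* zero constant */
run;

/* Compute the monthly minimum, maximum and median. Because detail columns are
   selected alongside the summary functions, PROC SQL remerges the monthly
   statistics back onto every row. */
proc sql;
    create table waterTempJitterStripPlot as
    select Month label="Month",
           water_Temp label="Water Temp",
           Jitter,
           zero,
           min(water_Temp) as min,
           max(water_Temp) as max,
           median(water_Temp) as Median label="Median"
    from addcols
    group by month_num;
quit;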
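If you would like the jitter to be identical every time the code is rerun, one possible variation on the jitter step is sketched below. This is not the code used for the report above: it swaps in the RAND function with CALL STREAMINIT, and both the seed value and the output table name addcols_seeded are arbitrary choices made for this illustration.

```sas
/* Variation on the jitter step: seeded so reruns produce the same noise */
data addcols_seeded;
    set src;
    if _n_ = 1 then call streaminit(20190301);  /* arbitrary seed, an assumption */
    Jitter = rand('normal', 0, 0.01);           /* mean 0, standard deviation 0.01 */
run;
```

The standard deviation of 0.01 matches the rannor(0)/100 scaling used above, so the points should spread about as widely around each range line.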
A subset of the final output data is below:
Great! Our data is now ready for SAS Visual Analytics!
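Before moving on, an optional sanity check is to recompute the monthly statistics from the detail rows and compare them with the min, max and Median columns created above. The sketch below assumes the output table is named waterTempJitterStripPlot, as in the PROC SQL step, and that the Month column kept its MONNAME3. format.

```sas
/* Recompute the monthly statistics directly from the detail rows */
proc means data=waterTempJitterStripPlot n min max median;
    class Month;      /* grouped by the formatted value, that is, the month name */
    var water_Temp;
run;
```

Each month's row in the PROC MEANS output should agree with the values repeated on that month's rows in the table.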
All that's left for us is to import the custom graph to a Visual Analytics report.
Add the data set 'waterTempJitterStripPlot' to your report and apply the roles as shown:
Now we will need to change the data limit for our custom graph. With the graph selected, go to the options menu and select the "Override system data limit:" option and add a value of your choice. I decided to make my limit 40,000 (which is the default limit for scatter plots).
We can now see our plot!
All that's left for us to do is add some window dressing. For this, I created a custom legend from a text object and placed it under the graph. I also adjusted the object's style options to get the colors in my final report. Both of these final report edits are included in the report's JSON file that is in this GitHub Repository. After these edits, our final report looks like this:
Congratulations! You have successfully built a Strip Plot custom graph you can use in SAS Visual Analytics!
This example was created in SAS Visual Analytics 8.4. The data from the report above can be obtained from the USGS Water Services' website (Credit: U.S. Geological Survey - Department of the Interior/USGS). However, if you do not wish to extract the data from the USGS Water Services' website, there is some simulated data in our Visual Analytics Custom Graphs GitHub which can also work in this example.
On GitHub, you will find the following support files for this article:
Import the data on your SAS Visual Analytics instance. Import the report via the "Import via GUI" section of these instructions.
Really nice work!!!!
Thanks Fredrik!
Excellent article, shows the strength of Graph builder and what you can do to create custom charts.
Thanks Himesh!
If you're looking for more SAS Graph Builder visualizations, there are lots of examples in the sassoftware/va-custom-graphs GitHub Repository.
- Mike
Awesome gallery, very inspiring!
Hi Mike,
This is a tremendously helpful resource for someone new to Visual Analytics. I had one question--when I tried to replicate this using my own data, I thought I could compute the max, min and median as aggregated measures from within Visual Analytics using the [_bygroup_] option. As you probably guessed, this didn't work, e.g., it created one median at each temperature point! For my learning, I'm trying to understand why it didn't know that the by-group was the data-driven lattice column (month, in your example)? Would appreciate any guidance if you have a minute.
Thanks,
Andy
Hi Andy!
Glad you found this to be useful!
The reason that the aggregated measures are not working the way you're expecting is because the foundation of this custom graph is a scatter plot. The SAS Visual Analytics documentation states “Scatter plots do not use aggregated data.” Because of this, using an aggregated measure in this custom graph will not create the output you’re looking for.
However, if you would still like to perform your aggregations within Visual Analytics and use them in this custom graph, you could try this method:
Create an aggregated data source that contains only the "Month" and calculated item: "Aggregated Median" variables. Then create a new data source join and left join the new aggregated data source to the main data source. This would create the needed median column that has all the values repeated within each month.
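Outside Visual Analytics, the same aggregate-then-join idea can be sketched in PROC SQL. This is only an analogy for the steps above, not the Visual Analytics workflow itself. It reuses the addcols table and the month_num and water_Temp columns from the article; the table names monthly_median and joined are hypothetical.

```sas
proc sql;
    /* Aggregated source: one median per month */
    create table monthly_median as
    select month_num,
           median(water_Temp) as Median
    from addcols
    group by month_num;

    /* Left join the aggregate back onto the detail rows */
    create table joined as
    select a.*, b.Median
    from addcols as a
    left join monthly_median as b
        on a.month_num = b.month_num;
quit;
```

Every detail row keeps its own temperature reading and picks up the median for its month, which is the repeated column the custom graph needs.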
Hope this helps!
- Mike
Thanks so much, Mike! I really appreciate it.
-Andy