01-17-2016 04:34 PM
Soft Skills – Graphing II
This is the first post in this series where we get more technical – but I only provide examples of graphs, not the code to create them, as I don’t want to lose sight of my purpose. My last post talked about the “ethics of graphics”, and I want to spend this article on some examples of what I was referring to.
SAS is fantastic at providing multiple avenues for creating your graphs (ODS Graphics Designer, Graph-N-Go, Enterprise Guide, SAS Studio Tasks or Snippets). But other than limiting which variables you can use (categorical versus response variables, for example), SAS is not “intelligent” enough to know what graph(s) you should use (nor should it, in my opinion – although a certain spreadsheet program has recently started to try, without success). Because SAS gives you different paths to your graphic enlightenment, you need to understand not only the strengths / weaknesses of each tool but, even more importantly, the appropriate graph type for your data. This post covers an extremely basic example, using the SHOES dataset from the SASHELP library.
You’ve been asked to prepare a brief report for the Board on how each region has been performing this fiscal year. You have the data in SAS already. Because you’re pressed for time and assume no one is really going to pay attention, you decide to just put the information into a pie graph, drop it into PowerPoint, and still make it home in time for pizza.
Here’s the graph that you assume will be sufficient:
Right off, one significant issue is that the labels for Asia and Africa overlap. Another is that there doesn’t appear to be any order to the positioning of the slices – it’s not by increasing amounts, alphabetical, or anything else I can tell. The other challenge, as mentioned in my previous article, is that the human eye struggles to judge differences in size between irregular shapes such as pie slices. Although I can see the difference between the Pacific and the United States, does it make sense that the US and Western Europe, even though side-by-side, look almost the same size?
OK, going back to our scenario – but this time, it’s mid-afternoon, and rather than rushing, you have some time to think about it. You decide a bar graph will be more appropriate, so you pull one together and the result looks like this:
Well, this is better! You can clearly see the differences between the regions, particularly between Africa and Asia. The difference between the US and Western Europe is also much clearer. The nice thing about bar graphs like this is that you can easily add a horizontal reference line to show benchmarks like Expected Sales, Average from Previous Period, etc.
OK, so in the next scenario you remember something about graphs from a course you took online last year, and figure that a scatterplot is the way to go. Here’s what the graph looks like:
So this has the different stores as the individual dots, grouped by region. Well, this is fantastic – right away, you can clearly see some pretty significant outliers. Looking back at the bar graph, South America has higher overall total sales than the Pacific region. Looking at the scatterplot, something much more interesting appears – one store in the Pacific region has much higher sales than any store in South America, but because South America has more stores, it pulls in more money overall. As an analyst, this may be something you want to highlight on the graph so the Board can make a decision about opening more stores in the Pacific region.
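To make the total-versus-individual point concrete without straying into graphing code, here is a minimal sketch of the arithmetic, written in Python purely for illustration. The figures are made up for the example and are not taken from SASHELP.SHOES, and the `summarise` helper is a hypothetical name of my own.

```python
from collections import defaultdict

# Hypothetical (region, store sales) pairs -- NOT values from SASHELP.SHOES.
sales = [
    ("Pacific", 90_000),
    ("Pacific", 20_000),
    ("South America", 40_000),
    ("South America", 35_000),
    ("South America", 45_000),
]

def summarise(rows):
    """Return {region: (total sales, largest single store, store count)}."""
    by_region = defaultdict(list)
    for region, amount in rows:
        by_region[region].append(amount)
    return {r: (sum(v), max(v), len(v)) for r, v in by_region.items()}

summary = summarise(sales)
for region, (total, biggest, n) in summary.items():
    print(f"{region}: total={total}, largest store={biggest}, stores={n}")
```

With these invented numbers, South America has the larger total while the Pacific has the larger single store. A bar graph of totals would show only the first fact; the individual dots show both.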
Looking at the Middle East region, there appear to be three distinct clusters – low, medium and high – and it may be worth suggesting that the lower-performing stores be closed down and additional stores be opened close to the higher-performing ones. Staff performance, location, etc. would have to be examined in detail, but this would never have been found if you had used one of the previous graphs.
Finally, my favourite type of graph for this type of data – boxplots. You know that this type of graph tells a complete story, and because you are an analyst who works with your team, the Board also knows the basic interpretation of a boxplot. You can see the variation within the regions much more comprehensively than in any previous graph. Clusters of outliers are more apparent than on the scatterplot, and the spread of the interquartile ranges may allow marketing or others in your organisation to start planning for changes in the structure of the business.
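For anyone who wants a reminder of what a boxplot actually summarises, the sketch below computes the quartiles, the interquartile range, and the 1.5 × IQR fences that are commonly used to flag outliers. It is in Python for illustration only, the sales figures are invented rather than drawn from SASHELP.SHOES, and `box_stats` is a hypothetical helper. Different tools use slightly different quantile definitions, so the exact quartiles a given package draws may not match these.

```python
import statistics

def box_stats(values):
    """Quartiles, IQR and outliers using the common 1.5 * IQR fence rule."""
    # "inclusive" treats the data as the whole population of interest;
    # this is one of several quantile definitions in use.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}

# Hypothetical sales figures (in thousands) for one region -- illustration only.
region_sales = [10, 12, 14, 15, 16, 18, 20, 22, 60]
stats = box_stats(region_sales)
print(stats)
```

In this invented example the box would run from 14 to 20 with the median at 16, and the value of 60 falls outside the upper fence, so it would be drawn as a separate outlier point.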
I realise, of course, that this is an ultra-basic example, but I feel that showing the typical types of graphs alongside two less frequently used ones highlights the need for a solid understanding of the data, the question, and the possible analyses. I would like to emphasise as well that a thorough data exploration is key – even if it’s a dataset you’re familiar with – because new data may have been entered, there may be new tables / fields (in the case of a database), or there may be something that requires further investigation (NULLs, for example, in a field that should not have them).
As always I look forward to hearing your thoughts and experiences with graphing; until the next time, happy analysing!