Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:
Welcome back to the Free Data Friday series! I hope everyone had a great summer and looks forward to getting back into analytics and free data.
Today’s post is relatively simple and answers questions I’ve had for a while on transportation costs. We’re going to be looking at gas prices, specifically for the US and how they’ve changed over the past 22 years or so.
I got the data from the US Energy Information Administration, and you can download it here.
There were a lot of columns in the data set – many of them I didn’t understand – so I went with the Date and the first column called “Weekly U.S. Regular Conventional Retail Gasoline Prices (Dollars per Gallon).” I used this because it had the most data available and I really wanted “big data." I changed the name of the variable to “All_Types_Dollar_Per_Gallon” when I imported the file.
The first thing I wanted to visualize was a simple line graph using all the data so I could see when things really peaked. I used the SAS University Edition Task to create the graph, but when I ran it I got the following error message:
“Too many discrete values.” OK, so basically there’s too much data and SAS UE can’t handle it for the line graph. Not to worry, we have alternatives! Our second option is a scatterplot; again, I’m going to use the Task and select the Category and Group variables, leaving everything else blank.
Now this is better – and very apparent that there has been a dramatic and significant increase in the price of gas throughout the US over the past 12 years. (And all my American friends are now going “No kidding!”) Although this graph is very clear and provides some really good information, I wanted to drill down a little more and see what the average per quarter was.
Proc sql; Create table work.quarter as Select distinct qtr(Date), avg(All_Types_Dollar_Per_Gallon) From work.import Group by qtr(Date) Order by qtr(Date); Quit;
When I created the scatterplot for this, it looks like 2005 was the year that average price jumped above $2 a gallon and pretty much has stayed there since.
So now I wanted to see if there was any particular quarter that was consistently lower/higher than any other. I use this type of analysis when I’m starting any sort of forecasting or longitudinal analysis to see if there are any clear seasonal trends that I need to be aware of. For this I’m using the Box Plot task, and using the data in the Work.Quarter table.
For the data I’m looking at, it doesn’t appear that there are any significant differences between quarters – Q2 and Q3 tend to be a little higher, but if I was continuing this analysis I wouldn’t be too terribly concerned. I should note however that a more in-depth look at the year-by-year variation would be needed before I went too much further with any forecasts or projections.
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.