Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:
Every day at 5 pm the UK government holds a televised press conference on the situation with regard to the current coronavirus pandemic. They always show three time series graphs - the first being the daily amount of traffic across various modes of transport, in order to demonstrate the effectiveness of the lockdown.
In this edition of Free Data Friday we will be looking at data from the Census Transportation Planning Products Program (CTPP) to explore the popularity of commuting methods in the US.
The data can be downloaded from this page on the CTPP web site in a choice of formats. I chose to download it as a comma separated file (CSV) and used Proc Import to bring the data into a SAS data set like this
filename reffile '/folders/myshortcuts/Dropbox/Means of Transportation.csv'; proc import datafile=reffile dbms=csv out=means replace; getnames=yes; guessingrows=2000; datarow=4; run;
You will notice that as usual I used a large value for the guessingrows statement and that the data starts at row 4.
There were a few issues with the file after the import, in particular:
I decided to fix these with a data step
data means_clean(drop=var3 var4); length commute_num 8.; set means(rename=(A102106___Means_of_transportati=state var2=mode)); commute_num=input(var4,comma32.); if var3="Estimate" and mode ne "Total, means of transportation" then output; run;
This gave me a file looking like this
I decided to use the data to discover which state had the largest number of commuters for each mode of travel. This was quite straightforward using Proc Sort, Proc Rank and Proc SQL
proc sort data=means_clean out=sorted; by mode; run; proc rank data=sorted out=ranked_mode descending; by mode; var commute_num; ranks ranking; run; proc sql; select state, mode,commute_num from ranked_mode where ranking=1; quit;
Here is the output from the Proc SQL call
Not surprisingly the list is dominated by the two huge states in terms of population, California and New York. Interestingly by far the most popular mode of transport is driving alone in a vehicle.It would seem that this method is a prime target for authorities to target to try to encourage car pooling and thereby reduce congestion, global warming and pollution.
I then decided to see what else I could discover with this data set by ranking the states according to the position of bicycle riding in their own "league table" of methods
proc sort data=means_clean out=sorted; by state mode; run; proc rank data=sorted out=ranked_state descending; by state; var commute_num; ranks ranking; run; proc sql; select state, mode,ranking, commute_num from ranked_state where mode="Bicycle" order by ranking; quit;
We can see that the winner here is Montana where bicycle riding is the fifth most popular method of commuting. I confess that I have no idea why this would be the case so if you know please leave a message in the comments below.
There is a lot more you could do with the data in this and other tables from from the CTPP so why not download SAS University Edition and some data and give it a try yourself? If you have any problems just leave a question in the appropriate SAS Communities Forum.
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Visit [[this link]] to see all the Free Data Friday articles.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.