Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:
Every day at 5 pm the UK government holds a televised press conference on the situation with regard to the current coronavirus pandemic. They always show three time series graphs - the first being the daily amount of traffic across various modes of transport, in order to demonstrate the effectiveness of the lockdown.
In this edition of Free Data Friday we will be looking at data from the Census Transportation Planning Products Program (CTPP) to explore the popularity of commuting methods in the US.
The data can be downloaded from this page on the CTPP web site in a choice of formats. I chose to download it as a comma separated file (CSV) and used Proc Import to bring the data into a SAS data set like this
filename reffile '/folders/myshortcuts/Dropbox/Means of Transportation.csv';
proc import datafile=reffile
dbms=csv
out=means
replace;
getnames=yes;
guessingrows=2000;
datarow=4;
run;
You will notice that as usual I used a large value for the guessingrows statement and that the data starts at row 4.
There were a few issues with the file after the import, in particular:
I decided to fix these with a data step
data means_clean(drop=var3 var4);
length commute_num 8.;
set means(rename=(A102106___Means_of_transportati=state
var2=mode));
commute_num=input(var4,comma32.);
if var3="Estimate" and mode ne "Total, means of transportation" then output;
run;
This gave me a file looking like this
I decided to use the data to discover which state had the largest number of commuters for each mode of travel. This was quite straightforward using Proc Sort, Proc Rank and Proc SQL
proc sort data=means_clean out=sorted;
by mode;
run;
proc rank data=sorted out=ranked_mode descending;
by mode;
var commute_num;
ranks ranking;
run;
proc sql;
select state, mode,commute_num
from ranked_mode
where ranking=1;
quit;
Here is the output from the Proc SQL call
Not surprisingly the list is dominated by the two huge states in terms of population, California and New York. Interestingly by far the most popular mode of transport is driving alone in a vehicle.It would seem that this method is a prime target for authorities to target to try to encourage car pooling and thereby reduce congestion, global warming and pollution.
I then decided to see what else I could discover with this data set by ranking the states according to the position of bicycle riding in their own "league table" of methods
proc sort data=means_clean out=sorted;
by state mode;
run;
proc rank data=sorted out=ranked_state descending;
by state;
var commute_num;
ranks ranking;
run;
proc sql;
select state, mode,ranking, commute_num
from ranked_state
where mode="Bicycle"
order by ranking;
quit;
This gives
We can see that the winner here is Montana where bicycle riding is the fifth most popular method of commuting. I confess that I have no idea why this would be the case so if you know please leave a message in the comments below.
There is a lot more you could do with the data in this and other tables from from the CTPP so why not download SAS University Edition and some data and give it a try yourself? If you have any problems just leave a question in the appropriate SAS Communities Forum.
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Visit [[this link]] to see all the Free Data Friday articles.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.