Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option.
Muir Woods, Part of Golden Gate NRA
In 1872 President Ulysses S. Grant signed the bill that created Yellowstone, the first of many such parks, and since then millions of Americans have enjoyed these national parks and historic monuments. In 1916 President Woodrow Wilson signed the act that created the National Park Service to manage these places and conserve them for future generations.
This edition of Free Data Friday uses decades of park service data to discover which site has attracted the most recreational visitors.
The data can be downloaded in a number of formats from the National Park Service Statistics viewer. I chose National Reports from this page and then Annual Summary Report 1904-Last Calendar Year. This opens a viewer with options for the level of detail to view and download. The file contains data for many types of visitor (recreational and non-recreational, campers, visitors in RVs, etc.). I chose to download an Excel file, but CSV is also available.
From the Excel file I deleted some header rows at the top and summary rows at the bottom, and renamed both the workbook and the worksheet for convenience. I saved the resulting file in XLSX format so that I could use the XLSX engine to read it.
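With the XLSX engine, a libref points at a whole workbook and each worksheet behaves like a SAS data set named after the sheet. The round trip below is a minimal sketch of that behaviour. It assumes SAS/ACCESS Interface to PC Files is licensed (the XLSX engine needs it), writes to a hypothetical workbook `demo_visitors.xlsx` in the WORK folder, and uses a few made-up rows rather than real park figures.

```sas
/* Assumption: SAS/ACCESS Interface to PC Files is available for the XLSX engine. */
/* Hypothetical workbook location inside the WORK library's folder.               */
%let demofile = %sysfunc(pathname(work))/demo_visitors.xlsx;

/* Writing through an XLSX libref creates a worksheet named VISITORSTATS. */
libname demo xlsx "&demofile";
data demo.visitorstats;
  length parkname $20;
  input parkname $ year recreationvisitors;
  /* Made-up illustration values, not National Park Service data */
  datalines;
ParkA 2017 100
ParkA 2018 150
ParkB 2018 200
;
run;
libname demo clear;

/* Re-assign the libref and read the worksheet back as an ordinary data set. */
libname demo xlsx "&demofile";
data work.roundtrip;
  set demo.visitorstats;
run;
libname demo clear;
```

Clearing the libref after writing releases the workbook so that it can be reopened (or opened in Excel) cleanly.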
I decided to concentrate on recreational visitors and to look first at which park had received the most recreational visitors of all time. This was a simple task: first I sorted the file by park name, then ran PROC MEANS to total the recreational visitors for each park, and then re-sorted by that total. Finally, I ran PROC SGPLOT to create a horizontal bar chart of the top ten parks. Here is the code I used, followed by the chart it generated:
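/* Allow variable names that are not standard SAS names (e.g. spreadsheet column headers) */
options validvarname=any;
/* Read the workbook through the XLSX engine: each worksheet becomes a SAS data set */
libname visitors xlsx "/folders/myshortcuts/Dropbox/NationalParksVisitors.xlsx";
/* Sort by park so that PROC MEANS can process each park as a BY group */
proc sort data=visitors.visitorstats out=visitorsorted;
  by parkname;
run;
/* Total the recreational visitors for each park across all years */
proc means data=visitorsorted noprint sum;
  by parkname;
  var recreationvisitors;
  output out=recvisitors sum=totrecvisitors;
run;
/* Re-sort so that the busiest parks come first */
proc sort data=recvisitors;
  by descending totrecvisitors;
run;
/* IMAGEMAP enables the data tips defined by TIP= in HTML output */
ods graphics / imagemap;
title1 'US National Parks & Monuments';
title2 'All-Time Most Popular for Recreational Visitors';
footnote1 j=l 'Data from US National Park Service';
footnote2 j=l 'https://irma.nps.gov/Stats/';
/* Horizontal bar chart of the ten parks with the most visitors */
proc sgplot data=recvisitors(obs=10);
  hbar parkname / response=totrecvisitors
    categoryorder=respdesc
    dataskin=pressed fillattrs=(color=vpab)
    tip=(parkname totrecvisitors)
    tiplabel=("Name:" "Total Visitors:")
    tipformat=(auto comma12.0);
  xaxis label="Number of Recreational Visitors" fitpolicy=none;
  yaxis label="Park/Monument Name";
run;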
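The sort, PROC MEANS, re-sort sequence can also be collapsed into a single PROC SQL query, because GROUP BY does not need pre-sorted input and ORDER BY ranks the totals. This is an alternative sketch, not the method used above. It runs against a small hypothetical table, `work.toystats`, with made-up park names and values standing in for `visitors.visitorstats`.

```sas
/* Hypothetical toy data (not NPS figures): one row per park per year */
data work.toystats;
  length parkname $20;
  input parkname $ year recreationvisitors;
  datalines;
ParkA 2017 100
ParkA 2018 150
ParkB 2017 300
ParkB 2018 200
ParkC 2018 50
;
run;

/* SUM() totals each park's visitors; ORDER BY puts the largest total first. */
/* Like PROC MEANS, SUM() ignores missing values.                             */
proc sql;
  create table work.toptotals as
    select parkname,
           sum(recreationvisitors) as totrecvisitors format=comma12.
    from work.toystats
    group by parkname
    order by totrecvisitors desc;
quit;

proc print data=work.toptotals noobs;
run;
```

One practical difference: the SQL output contains only the park name and total, whereas the PROC MEANS output data set also carries the automatic `_TYPE_` and `_FREQ_` variables.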
We can see from this chart that Blue Ridge Parkway is by far the most visited park, but this isn't the whole story: parks were established at different times, so naturally the longer-established parks have an advantage. To level the playing field a little, I filtered the data to keep only the most recent year available, 2018. Here is the code and output:
/* Keep only the 2018 rows and order them from busiest to quietest */
proc sort data=visitors.visitorstats(where=(year=2018)) out=visitors2018;
  by descending recreationvisitors;
run;
ods graphics / imagemap;
title1 'US National Parks & Monuments';
title2 'Most Popular for Recreational Visitors in 2018';
footnote1 j=l 'Data from US National Park Service';
footnote2 j=l 'https://irma.nps.gov/Stats/';
/* OBS=10 keeps the ten busiest parks from the sorted data */
proc sgplot data=visitors2018(obs=10);
  hbar parkname / response=recreationvisitors
    categoryorder=respdesc
    dataskin=pressed fillattrs=(color=vpab)
    tip=(parkname recreationvisitors)
    tiplabel=("Name:" "Total Visitors:")
    tipformat=(auto comma10.0);
  xaxis label="Number of Recreational Visitors" fitpolicy=none;
  yaxis label="Park/Monument Name";
run;
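The head start enjoyed by older parks can be checked directly by counting how many years of data each park contributes. The sketch below does that with PROC SQL on a hypothetical table, `work.toystats`, filled with made-up values; with the real data you would point it at `visitors.visitorstats` instead. The average annual visitors column is included only as another possible way to compare parks of different ages; it is not the approach taken in this article.

```sas
/* Hypothetical toy data (not NPS figures) */
data work.toystats;
  length parkname $20;
  input parkname $ year recreationvisitors;
  datalines;
ParkA 2017 100
ParkA 2018 150
ParkB 2017 300
ParkB 2018 200
ParkC 2018 50
;
run;

/* First and last year of data, number of years, and mean visitors per year */
proc sql;
  create table work.parkspan as
    select parkname,
           min(year) as firstyear,
           max(year) as lastyear,
           count(distinct year) as nyears,
           avg(recreationvisitors) as avgvisitors format=comma12.
    from work.toystats
    group by parkname
    order by nyears desc, parkname;
quit;

proc print data=work.parkspan noobs;
run;
```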
Here we can see that in 2018 Golden Gate was the most popular park, closely followed by Blue Ridge. The file contains data for Blue Ridge from 1941 (five years after its formation in 1936), whereas Golden Gate (formed in 1972) has data from 1973. Reaching second place in the all-time list and first place in 2018 struck me as a remarkable achievement for Golden Gate, considering it was established only fairly recently in historical terms. We can, however, get a clearer picture by counting how often each park took the top spot in the annual rankings.
/* For each year, keep the park(s) whose visitor count equals that year's maximum. */
/* PROC SQL remerges the yearly maximum with the detail rows, so ties keep every park. */
proc sql;
  create table topbyyear
    as select year, parkname, max(recreationvisitors) as visitornum format=comma10.
    from visitorsorted
    group by year
    having recreationvisitors=max(recreationvisitors);
quit;
/* Count how many years each park was in top place */
proc sql;
  create table toppark
    as select parkname, count(parkname) as topcount
    from topbyyear
    group by parkname
    order by topcount desc;
quit;
ods graphics / imagemap;
title1 'US National Parks & Monuments';
title2 'Number of Years in Top Place for Recreational Visitors';
footnote1 j=l 'Data from US National Park Service';
footnote2 j=l 'https://irma.nps.gov/Stats/';
proc sgplot data=toppark;
  hbar parkname / response=topcount
    categoryorder=respdesc
    dataskin=pressed fillattrs=(color=vpab)
    tip=(parkname topcount)
    tiplabel=("Name:" "Times Top Park:")
    tipformat=(auto comma10.0);
  xaxis label="Number of Times Top Park" fitpolicy=none;
  yaxis label="Park/Monument Name";
run;
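The HAVING query above relies on PROC SQL remerging each year's maximum back onto the detail rows. A DATA step alternative sorts by year and descending visitors and keeps the first row of each year with FIRST.YEAR. This is a sketch of a different technique, not the author's code: it runs on a hypothetical `work.toystats` table with made-up values, and unlike the HAVING version it keeps only one park per year if two parks tie.

```sas
/* Hypothetical toy data (not NPS figures) */
data work.toystats;
  length parkname $20;
  input parkname $ year recreationvisitors;
  datalines;
ParkA 2017 100
ParkB 2017 300
ParkA 2018 250
ParkB 2018 200
ParkC 2018 50
ParkA 2019 400
ParkB 2019 350
;
run;

/* Within each year, put the busiest park first */
proc sort data=work.toystats out=work.byyear;
  by year descending recreationvisitors;
run;

/* FIRST.YEAR is true on the first (busiest) row of each year. */
/* If two parks tie, only the one sorted first is kept.        */
data work.topbyyear2;
  set work.byyear;
  by year;
  if first.year;
run;

/* Count the years in which each park finished top */
proc sql;
  create table work.toppark2 as
    select parkname, count(*) as topcount
    from work.topbyyear2
    group by parkname
    order by topcount desc;
quit;
```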
Again, Blue Ridge is far ahead of the pack, making it consistently America's most visited park. Golden Gate is a long way behind, but it is still ahead of many parks that were established earlier. Finally, let's chart the attendance records of Blue Ridge and Golden Gate against each other.
ods graphics / imagemap;
title1 'US National Parks & Monuments';
title2 'Blue Ridge v Golden Gate Visitors Timeline';
footnote1 j=l 'Data from US National Park Service';
footnote2 j=l 'https://irma.nps.gov/Stats/';
/* Restrict to the two parks and draw one line per park */
proc sgplot data=visitorsorted(where=(parkname in ("Blue Ridge PKWY" "Golden Gate NRA")));
  series x=year y=recreationvisitors /
    group=parkname name="pname"
    markers
    tip=(parkname year recreationvisitors)
    tiplabel=("Name:" "Year:" "Total Visitors:")
    tipformat=(auto auto comma10.0);
  /* Two legends for the same plot: one keyed by marker symbol, one by line colour */
  keylegend "pname" / type=markersymbol;
  keylegend "pname" / type=linecolor;
run;
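To compare the two parks year by year as numbers rather than lines, one option is to transpose the data so that each year is a row and each park a column, then take the difference. The sketch below assumes a hypothetical `work.toystats` table with made-up parks `ParkA` and `ParkB`. With the real data the park names become the column names; depending on the VALIDVARNAME= setting they may contain underscores or need name literals such as `'Blue Ridge PKWY'n`.

```sas
/* Hypothetical toy data (not NPS figures) */
data work.toystats;
  length parkname $20;
  input parkname $ year recreationvisitors;
  datalines;
ParkA 2017 100
ParkB 2017 300
ParkA 2018 250
ParkB 2018 200
;
run;

/* PROC TRANSPOSE with BY needs the data sorted by the BY variable */
proc sort data=work.toystats out=work.byyear;
  by year parkname;
run;

/* One row per year, one column per park (ID values become column names) */
proc transpose data=work.byyear out=work.wide(drop=_name_);
  by year;
  id parkname;
  var recreationvisitors;
run;

/* Difference between the parks and which one was ahead in each year */
data work.compare;
  set work.wide;
  length ahead $5;
  diff = ParkA - ParkB;
  /* Check missing first: in SAS a missing value compares as less than 0 */
  if missing(diff) then ahead = ' ';
  else if diff > 0 then ahead = 'ParkA';
  else if diff < 0 then ahead = 'ParkB';
  else ahead = 'Tie';
run;
```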
There are several points to note here:
So, having completed my analysis, I took a look at the chart showing the top parks for 2018 and realised I have visited five of the top ten:
Considering I'm British and have lived in the UK all my life, and that all of these visits took place during holidays and trips to the US for SAS Global Forum, I think five out of ten is pretty good. Tell us in the comments below how many you have visited.
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Visit [[this link]] to see all the Free Data Friday articles.