Discovering America's National Parks with SAS

Started 10-25-2019
Modified 08-03-2021

Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option.

Muir Woods, Part of Golden Gate NRA

 

Thanks to a bill signed in 1872 by President Ulysses S. Grant, which created Yellowstone, the first of many such parks, millions of Americans have enjoyed the country's national parks and historic monuments. In 1916, President Woodrow Wilson created the National Park Service to manage these places and ensure their conservation for future generations.

 

This edition of Free Data Friday uses decades of park service data to discover which site has attracted the most recreational visitors.

 

Get the Data

 

The data can be downloaded in a number of formats from the National Park Service Statistics viewer. From this page I chose National Reports and then Annual Summary Report 1904-Last Calendar Year. This presents you with a viewer and options for the level of detail to view and download. The file contains data for many types of visitors (recreational and non-recreational, campers, visitors in RVs, etc.). I chose to download an Excel file, but CSV is also available.

 


 

Getting the Data Ready

 

I deleted some header rows at the top of the Excel file and summary rows at the bottom, and renamed both the workbook and the worksheet for convenience. I saved the resulting file in XLSX format so that I could use the XLSX engine to read it.

 

The Results

 

I decided to concentrate on recreational visitors and to look first at which park had received the most all-time recreational visitors. This was a simple task: first sorting the file by park name, then running PROC MEANS to aggregate recreational visitors for each park, and then re-sorting by the total number of such visitors. Finally, I ran PROC SGPLOT to create a horizontal bar chart of the top ten parks. Here is the code I used, followed by the chart generated:

 

 

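/* Allow variable names that are not standard SAS names (e.g. containing spaces) */
options validvarname=any;
/* Read the workbook through the XLSX engine; each worksheet becomes a table */
libname visitors xlsx "/folders/myshortcuts/Dropbox/NationalParksVisitors.xlsx";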

proc sort data=visitors.visitorstats out=visitorsorted;
	by parkname;
run;

proc means data=visitorsorted noprint sum;
	by parkname;
	var recreationvisitors;
	output out=recvisitors sum=totrecvisitors;
run;

proc sort data=recvisitors;
	by descending totrecvisitors ;
run;

ods graphics / imagemap;
title1 'US National Parks & Monuments';
title2 'All-Time Most Popular for Recreational Visitors';
footnote1 j=l 'Data from US National Parks Service';
footnote2 j=l 'https://irma.nps.gov/Stats/';
proc sgplot data=recvisitors(obs=10);
	hbar parkname /  response=totrecvisitors
		categoryorder=respdesc
		dataskin=pressed fillattrs=(color=vpab)
		tip=(parkname totrecvisitors)
		tiplabel=("Name:" "Total Visitors:")
		tipformat=(auto comma12.0);
	xaxis label="Number of Recreational Visitors" fitpolicy=none;
	yaxis label="Park/Monument Name";
run;

 

All-Time Chart.png
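
As an aside, the first sort is only needed because PROC MEANS is run with a BY statement. A minimal sketch of an alternative, assuming the same library and variable names as the code above, uses a CLASS statement with the NWAY option instead. The output data set name recvisitors_alt is hypothetical, chosen here so that the original recvisitors table is not overwritten.

```sas
/* Sketch only: CLASS does not require the input to be pre-sorted.      */
/* NWAY keeps just the per-park rows and drops the overall total row.   */
proc means data=visitors.visitorstats noprint nway;
	class parkname;
	var recreationvisitors;
	output out=recvisitors_alt sum=totrecvisitors;
run;

/* Rank parks by total recreational visitors, as in the original code */
proc sort data=recvisitors_alt;
	by descending totrecvisitors;
run;
```

The sorted copy visitorsorted is still used by later steps in this article, so this variation would only replace the aggregation step, not the first sort itself.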

 

We can see from this chart that Blue Ridge Parkway is by far the most visited park, but this isn't the whole story: parks were established at different times, so naturally the longer established parks have an advantage. In order to level the playing field a little, I restricted the data to the most recent year available, 2018. Here is the code and output.

 

 

proc sort data=visitors.visitorstats(where=(year=2018)) out=visitors2018;
	by descending recreationvisitors;
run;

ods graphics / imagemap;
title1 'US National Parks & Monuments';
title2 'Most Popular for Recreational Visitors in 2018';
footnote1 j=l 'Data from US National Parks Service';
footnote2 j=l 'https://irma.nps.gov/Stats/';
proc sgplot data=visitors2018(obs=10);
	hbar parkname /  response=recreationvisitors
		categoryorder=respdesc
		dataskin=pressed fillattrs=(color=vpab)
		tip=(parkname recreationvisitors)
		tiplabel=("Name:" "Total Visitors:")
		tipformat=(auto comma10.0);
	xaxis label="Number of Recreational Visitors" fitpolicy=none;
	yaxis label="Park/Monument Name";
run;

 

2018 Chart.png

 

Here we can see that in 2018 Golden Gate was the most popular park closely followed by Blue Ridge. The file contains data for Blue Ridge from 1941 (shortly after its formation in 1936) whereas Golden Gate (formed in 1972) has data from 1973. It seemed to me that for Golden Gate to reach second in the all-time list and top in 2018 was a remarkable achievement considering it was established only fairly recently in historical terms. We can, however, get a clearer picture by seeing how often each park achieved top spot in the rankings.
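
The first-year figures quoted above can be checked directly against the data before moving on. The query below is a minimal sketch, assuming the visitorsorted data set created earlier and the park name values "Blue Ridge PKWY" and "Golden Gate NRA" as they appear in the file. The column aliases firstyear and lastyear are illustrative names only.

```sas
/* Sketch: earliest and latest year of data for the two parks discussed */
proc sql;
	select parkname,
	       min(year) as firstyear,
	       max(year) as lastyear
	from visitorsorted
	where parkname in ("Blue Ridge PKWY", "Golden Gate NRA")
	group by parkname;
quit;
```

The code for counting how often each park topped the yearly rankings follows.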

 

 

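/* For each year, keep the park(s) with the highest recreational visitor count. */
/* Selecting parkname alongside a summary function makes PROC SQL remerge the   */
/* yearly maximum back onto the detail rows, which the HAVING clause filters.   */
proc sql;
	create table topbyyear
	as select year, parkname, max(recreationvisitors) as visitornum format=comma10.
	from visitorsorted
	group by year
	having recreationvisitors=max(recreationvisitors);
quit;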

proc sql;
	create table toppark
	as select  parkname, count(parkname) as topcount
	from topbyyear
	group by parkname
	order by  topcount desc;
quit;


ods graphics / imagemap;
title1 'US National Parks & Monuments';
title2 'Number of Years in Top Place for Recreational Visitors';
footnote1 j=l 'Data from US National Parks Service';
footnote2 j=l 'https://irma.nps.gov/Stats/';
proc sgplot data=toppark;
	hbar parkname /  response=topcount
		categoryorder=respdesc
		dataskin=pressed fillattrs=(color=vpab)
		tip=(parkname topcount)
		tiplabel=("Name:" "Times Top Park:")
		tipformat=(auto comma10.0);
	xaxis label="Number of Times Top Park" fitpolicy=none;
	yaxis label="Park/Monument Name";
run;

 

Top Place.png

 

Again, Blue Ridge is far ahead of the pack making it consistently America's most visited park. Golden Gate is a long way behind but again still ahead of many parks which were established earlier. Finally, let's chart the attendance records of Blue Ridge and Golden Gate against each other.

 

 

ods graphics / imagemap;
title1 'US National Parks & Monuments';
title2 'Blue Ridge v Golden Gate Visitors Timeline';
footnote1 j=l 'Data from US National Parks Service';
footnote2 j=l 'https://irma.nps.gov/Stats/';	
proc sgplot data=visitorsorted(where=(parkname in ("Blue Ridge PKWY" "Golden Gate NRA")));
	series x=year y=recreationvisitors /
		group=parkname name="pname"
		markers
		tip=(parkname year recreationvisitors)
		tiplabel=("Name:" "Year:" "Total Visitors:")
		tipformat=(auto auto comma10.0);
	keylegend "pname" / type=markersymbol;
	keylegend "pname" / type=linecolor;
run;

 

Golden Gate v Blue Ridge.png

 

There are several points to note here:

 

  1. Blue Ridge's visitor numbers grew steadily from its establishment until 2002, after which they declined before stabilising relatively recently.
  2. Golden Gate saw explosive growth in visitor numbers during its early years, followed by a sharp decline after 1988, and has been relatively stable since.
  3. The last few years have seen a very close battle between Blue Ridge and Golden Gate, with neither park achieving supremacy. In fact, the topbyyear data set tells us that one of these two parks has occupied top spot in every year since 1965, when National Capital Parks Combined took the number one place!
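
The claim in point 3 can be checked by listing the top park for each year from 1965 onwards. This is a minimal sketch, assuming the topbyyear data set created earlier with its year, parkname and visitornum columns:

```sas
/* Sketch: show which park held top spot in each year from 1965 onwards */
proc sql;
	select year, parkname, visitornum
	from topbyyear
	where year >= 1965
	order by year;
quit;
```

If two parks ever tied for the highest count in a year, both rows would appear for that year, because the HAVING clause that built topbyyear keeps every park matching the yearly maximum.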

So, having completed my analysis I took a look at the chart showing the top parks for 2018 and realised I have visited five of the top ten:

 

  1. Golden Gate
  2. The Lincoln Memorial
  3. The George Washington Memorial
  4. The Grand Canyon
  5. The Vietnam Veterans Memorial

I'm British and have lived in the UK all my life, so all of these visits took place during holidays and trips to the US for SAS Global Forum. With that in mind, I think five out of ten is pretty good. Tell us in the comments below how many you have visited.

 

Now It's Your Turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

Visit [[this link]] to see all the Free Data Friday articles.

 
