BookmarkSubscribeRSS Feed

Finding the Most Golf Crazy State with SAS

Started ‎04-19-2019 by
Modified ‎08-04-2021 by
Views 1,763

Introduction

 

 

FreeDataFriday_graphic.jpgI always enjoyed the Free Data Friday articles contributed by a number of SAS Community users (with a special mention for @DarthPathos who wrote most of them). These were short, snappy articles which showed how you could easily use open data  with SAS to answer interesting questions and practice your SAS skills. We haven’t had one of these articles for a while and so, with the massive increase in the quantity of open data available over the past few years, I thought I’d revive the tradition.

 

Last weekend sports fans were thrilled by the amazing comeback story of Tiger Woods in the US Masters Golf Tournament at Augusta. I decided to mark this event by trying to find out which US state is the most “golf crazy” and the answer turned out to be more unexpected and interesting than I could possibly have imagined!

 

The Data

 

There is no objective measurement of “golf craziness” so I decided to use the number of courses per state as a reasonable proxy. After all you would think that the state with most courses is the one most interested in the game. A Google search pointed me towards the Bureau of Labor Statistics where I found a downloadable CSV file containing the number of golf and country clubs by state (while I have no direct experience of US country clubs I’ll assume most, if not all of them have a golf course). I could download this with Proc HTTP.

 

 

%let url=https://data.bls.gov/cew/data/api/2018/3/industry/713910.csv;
filename out "/folders/myshortcuts/Dropbox/golfcourses.csv";
proc http url="&url" method="get" out=out;
run;

 

The file was more complex than most CSVs as it contained data at different levels of aggregation, used five-digit FIPS codes and contained data for some locations which were not states (Puerto Rico, Washington DC and the US Virgin Islands). By examining the metadata for the file, I was able to determine which aggregate level code to use, how to split up the area FIPS code and which state codes to exclude. I then imported it into SAS using the following data step.

 

 

data golfcourses;
	length area_fips $5 own_code $1 industry_code $3 agglvl_code $2
		size_code $1 year $4 qtr $1 disclosure_code $1
		qtrly_estabs 8 state_code $2 state_name $15;
	infile "/folders/myshortcuts/Dropbox/golfcourses.csv" dlm="," dsd
		missover truncover firstobs=2;
	input area_fips own_code industry_code agglvl_code size_code
		year qtr disclosure_code qtrly_estabs;
	if agglvl_code="58" then do;
		state_code=substr(area_fips,1,2);
		if state_code not in("11" "72" "78") then do;
			state_name=fipnamel(input(state_code,8.));
			output;
		end;
	end;	
run;

 

 

I also wanted a more objective measure of golf interest; after all you would expect the larger more populous states such as California, Florida and Texas to have the most courses so I decided to calculate the number of courses per 100,000 people in order to remove this “big state bias”. In order to do this, I needed data showing state populations for 2018, the same year as the golf course data. Naturally I found that at the US Census Bureau web site, downloaded it and imported it into SAS (again removing geographies which aren't states).

 

 

data population;
	length sumlev 8 region 8 division 8 state 8 name $30
		state_name $15 census2010pop 8 estimatebase2010
		popestimate2010-popestimate2018 8;
	infile "/folders/myshortcuts/Dropbox/nst-est2018-alldata.csv" dlm="," dsd
		missover truncover firstobs=2;
	input sumlev region division state name
		census2010pop estimatebase2010 
		popestimate2010-popestimate2018;
	state_name=name;
	if name not in ('United States' 'Northeast Region' 'Midwest Region'
		'South Region' 'West Region' 'District of Columbia'
		'Puerto Rico') then output;
run;

 

Processing

 

The next step was to create an aggregated data table of courses by state and order it by the total number of courses by state

(the CSV has them split by ownership category).

.

 

 

proc sql;
	create table course_summary
	as select distinct state_name, sum(qtrly_estabs) as tot_courses
	from golfcourses
	group by state_name
	order by tot_courses descending;
quit;

 

I could then list the top ten states by number of golf courses.

 

 

title "Number of Golf Courses per State - 2018";
proc print data=course_summary(obs=10);
run;

 

 

Proc Print Golf Courses per State.png

 

So far there are no real surprises here – as expected the big states of Florida, California, New York and Texas are all in the top ten. I then merged the population data with the total courses data, calculated the number of courses per 100,000 people, sorted the output by number of courses for every 100,000 people and again printed the top ten courses.

 

 

proc sort data=course_summary;
	by state_name;
run;

proc sort data=population;
	by state_name;
run;

data courses_per100k;
	merge course_summary population(keep=state_name popestimate2018);
	by state_name;
	courses_per100k=tot_courses/(popestimate2018/100000);
run;

proc sort data=courses_per100k;
	by descending courses_per100k;
run;

title "Number of Golf Courses per State per 100K People- 2018";
proc print data=courses_per100k(obs=10);
	var state_name courses_per100k;
run;

 

 

Proc Print Golf Courses per State per 100K people.png

 

 

This is where for me the surprise comes - at the top of this list with over 10 courses per 100,000 people is North Dakota, followed by Iowa with South Dakota weighing in at number three. These are all smaller states in terms of population and also more northerly states where I would expect year-round usage of golf courses to be less likely than in the sun belt states of the south. Florida, which tops the previous list for the number of courses, appears in a lowly 22nd position on this list!

 

Conclusion

 

So, do you know why North Dakota should appear by this measure to be the most golf crazy state? If you do, please leave a message in the comments below. In addition, why not consider using this, or any other open data to write your own article – it’s a great way to become more involved in the community.

 

Visit [[this link]to see all the Free Data Friday articles.

Comments

Hi @ChrisBrooks - This is amazing and I have been wanting to get back into the Free Data Friday blogs, so thanks for picking up this work!  I'm thrilled you enjoyed the Blog, and I'll reach out to @BevBrown - if you're interested maybe we could do a Tag Team sort of set up?  I'll email her now, and look forward to chatting with you soon!

Chris

Hi @DarthPathos and many thanks - that sounds a good idea, perhaps we can make arrangements via Private Messages and copy Bev in.

Ah, the power of Chris2 ... Can't wait to see what you two cook up!

 

Version history
Last update:
‎08-04-2021 01:37 PM
Updated by:

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags