Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:
Geospatial analysis is one of my favourite types of analysis -- being able to take potentially complex, messy, and big data and see how it fits onto a map is just so cool. Whether it’s looking at people who have come down with the flu, broken watermains, or traffic accidents, seeing it on a map can be much more effective that seeing it on a “regular” graph.
Unfortunately, SAS University Edition does not come with the powerful geographic analytical tools that are standard in Base SAS, nor does University Edition have the ability to create the maps that can be done. Having said that, there are some cool things you can do with some creativity, patience, and a bunch of SQL.
Get the Data
I’m using a crowdsourced dataset on campgrounds – I used to love camping as a kid, and was really excited to see this data. The original dataset is multiple CSV files, which I merged, cleaned (I didn’t need all the variables, and there was 1 duplicate record) and imported into SAS University Edition. The data can be downloaded here, and I recommend you copy / paste the list of short forms from the site into your code for easy reference.
Getting the data ready
Other than merging the data into a single Excel file, there wasn’t much I had to do. Before I imported it, I used the Excel “delete duplicates” tool (which removed the 1 record I mentioned above) and then imported it into the University Edition environment.
The Results
So one of the key questions you can answer using geospatial analysis is the entity (campground, library, school, etc.) that is closest to another entity (your location, all the other campgrounds / libraries / schools, etc.).
There are a few different methods to determining distance between two places; refer to Calculating Geographic Distance: Concepts and Methods by Frank Ivis to see detailed examples of the more common ones. I have typically used “straight line” distance, as it’s the least computer intensive and does provide a decent approximation. I have always had the most success
using the Haversine formula, as the distances are always closest to what I get using mapping software. This is the method I use below.
Now that we have our campground data, I want to see what the closest campgrounds are that meet specific parameters. Let’s take a look at the code and break it down.
proc sql;
select
/* -43.670, -79.386 */
/* 1 */
/* (note – the (constant(‘pi’)
is the way you use Pi in SAS, and works for e, i and other constants you may need */
6371*(2 * arsin(min(1,sqrt(SIN( ((LAT-43.670)*(constant('pi')/180))/2)**2 +
cos(43.670*(constant('pi')/180)) * cos(lat*(constant('pi')/180))
* sin ( ((lon-(-79.386))*(constant('pi')/180)/2)**2)))))
as Dist_KM,
/* 2 */
3959*(2 * arsin(min(1,sqrt(SIN( ((LAT-43.670)*(constant('pi')/180))/2)**2 +
cos(43.670*(constant('pi')/180)) * cos(lat*(constant('pi')/180))
* sin ( ((lon-(-79.386))*(constant('pi')/180)/2)**2))))) as Dist_Miles,
/* 3 */
name, prov_state, town
from work.IMPORT
/* 4 */
where amenities like '%FT%'
and amenities like '%NP%';
quit;
So I am Canadian, born, raised and living in Toronto for most of my life. So in my example, I’m using the main intersection in Toronto (Yonge, pronounced young, and Bloor) as my starting point. I add the latitude and longitude in my comments at the very top so I can copy / paste them if needed.
In the first section, I am using the Haversine formula to calculate the distance in kilometers. You’ll see I have my latitude and longitude as part of the calculation; if you’re running this against two datasets, just replace the numbers with the names of your variables. I took the original code from Ivis’ paper and modified it. In his paper, he uses the data step, and runs each part separately. I prefer SQL and validating each step separately, but find it to be more efficient to run the whole thing at once.
The second section is basically the same as the first, with the exception that it’s calculating the distance in miles. You’ll notice that the only difference is the first number – this is the radius of the Earth (in km or miles, depending on what you’re calculating).
The third section is just a couple of variables from my data that I want included in my output, and the final section are my parameters. If you refer to the legend on the site, you’ll see I’m looking for all campgrounds with flushing toilets and no pets allowed.
When I run the code, I get exactly two results. Because the data is crowdsourced, it is not 100% complete so there may be plenty more campgrounds that meet our parameters, but these are the ones in our data.
Depending on your requirements, you can tweak the SQL (where amenities like '%FT%'
OR amenities like '%NP%'; where amenities NOT like '%FT%' and amenities NOT like '%NP%'; etc.) – play around and see what you can find!
Now it’s your turn!
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Thanks for sharing this, @DarthPathos! One clarification: the geo capabilities (such as the map data sets and PROC GEOCODE, GMAP, GPROJECT, etc) are part of SAS/Graph, not Base SAS. Many customers have SAS/Graph and so might take it for granted, but the SAS University Edition doesn't include it for two reasons:
Base SAS does include the geodist() function, and that works in SAS University Edition.
Example:
data _null_;
distance=geodist(30.68, -88.25, 35.43, -82.55, 'M');
put 'Distance = ' distance 'miles';
distance=geodist(30.68, -88.25, 35.43, -82.55, 'K');
put 'Distance = ' distance 'kilometers';
run;
Output:
Distance = 465.29081088 miles Distance = 748.6529147 kilometers
Oops - good catch about the SAS/Graph! As for the GEODIST, I'm embarassed to admit I did not know that, and should have tried it first. I'll update it for next week. Glad I put it to Revision rather than direct to post!
Have a great (long) weekend - we're off on Monday as well, so have two SAS books in my bag and lots of data I want to play with 😄
Chris
It's all good @DarthPathos! I think it's valuable to see the "old school" methods for calculating geodesic distance. The ARSIN, COS and PI references are a nice reminder that the Earth is round, and that measuring from "here" to "there" involves more than just a ruler on a map.
(BTW, this did make it to a public post -- maybe @BeverlyBrown is to thank for that...but now you have fodder for "the easy way" post.)
I think next week's article just wrote itself, thanks @ChrisHemedinger 😄
Have a good long weekend
Chris
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.