We’re smarter together. Learn from this collection of community knowledge and add your expertise.

Basic Geospatial Analysis with SAS University Edition

by Regular Contributor on ‎10-07-2016 01:53 PM (748 Views)

Geospatial analysis is one of my favourite types of analysis -- being able to take potentially complex, messy, and big data and see how it fits onto a map is just so cool.  Whether it’s looking at people who have come down with the flu, broken watermains, or traffic accidents, seeing it on a map can be much more effective that seeing it on a “regular” graph.Geospatial analysis with University Edition.jpg

 

Unfortunately, SAS University Edition does not come with the powerful geographic analytical tools that are standard in Base SAS, nor does University Edition have the ability to create the maps that can be done.  Having said that, there are some cool things you can do with some creativity, patience, and a bunch of SQL.

 

Get the Data

I’m using a crowdsourced dataset on campgrounds – I used to love camping as a kid, and was really excited to see this data.  The original dataset is multiple CSV files, which I merged, cleaned (I didn’t need all the variables, and there was 1 duplicate FreeDataFriday_graphic.jpgrecord) and imported into SAS University Edition.  The data can be downloaded here, and I recommend you copy / paste the list of short forms from the site into your code for easy reference. 

 

How to go about getting SAS University Edition

If you don’t already have University Edition, get it here and follow the instructions from the pdf carefully. If you need help with almost any aspect of using University Edition, check out these video tutorials. Additional resources are available in this article.

 

Getting the data ready

Other than merging the data into a single Excel file, there wasn’t much I had to do.  Before I imported it, I used the Excel “delete duplicates” tool (which removed the 1 record I mentioned above) and then imported it into the University Edition environment.

 

The Results

So one of the key questions you can answer using geospatial analysis is the entity (campground, library, school, etc.) that is closest to another entity (your location, all the other campgrounds / libraries / schools, etc.). 

 

There are a few different methods to determining distance between two places; refer to Calculating Geographic Distance: Concepts and Methods by Frank Ivis to see detailed examples of the more common ones.  I have typically used “straight line” distance, as it’s the least computer intensive and does provide a decent approximation.  I have always had the most success
using the Haversine formula, as the distances are always closest to what I get using mapping software.  This is the method I use below.

 

Now that we have our campground data, I want to see what the closest campgrounds are that meet specific parameters.  Let’s take a look at the code and break it down.

 

 

proc sql;
 
select
 
/* -43.670, -79.386  */
 
/*    1   */
/* (note – the (constant(‘pi’) 
is the way you use Pi in SAS, and works for e, i and other constants you may need */   6371*(2 * arsin(min(1,sqrt(SIN( ((LAT-43.670)*(constant('pi')/180))/2)**2 +      cos(43.670*(constant('pi')/180)) * cos(lat*(constant('pi')/180))      * sin ( ((lon-(-79.386))*(constant('pi')/180)/2)**2))))) as Dist_KM,   /*    2   */   3959*(2 * arsin(min(1,sqrt(SIN( ((LAT-43.670)*(constant('pi')/180))/2)**2 +      cos(43.670*(constant('pi')/180)) * cos(lat*(constant('pi')/180))      * sin ( ((lon-(-79.386))*(constant('pi')/180)/2)**2))))) as Dist_Miles,   /*    3   */   name, prov_state, town           from work.IMPORT   /*   4   */   where amenities like '%FT%' and amenities like '%NP%'; quit;

 

 

So I am Canadian, born, raised and living in Toronto for most of my life.  So in my example, I’m using the main intersection in Toronto (Yonge, pronounced young, and Bloor) as my starting point.  I add the latitude and longitude in my comments at the very top so I can copy / paste them if needed. 

 

In the first section, I am using the Haversine formula to calculate the distance in kilometers.  You’ll see I have my latitude and longitude as part of the calculation; if you’re running this against two datasets, just replace the numbers with the names of your variables.  I took the original code from Ivis’ paper and modified it.  In his paper, he uses the data step, and runs each part separately.  I prefer SQL and validating each step separately, but find it to be more efficient to run the whole thing at once. 

 

The second section is basically the same as the first, with the exception that it’s calculating the distance in miles.  You’ll notice that the only difference is the first number – this is the radius of the Earth (in km or miles, depending on what you’re calculating). 

 

The third section is just a couple of variables from my data that I want included in my output, and the final section are my parameters.  If you refer to the legend on the site, you’ll see I’m looking for all campgrounds with flushing toilets and no pets allowed. 

 

When I run the code, I get exactly two results.  Because the data is crowdsourced, it is not 100% complete so there may be plenty more campgrounds that meet our parameters, but these are the ones in our data. 

 

image1.png

 

Depending on your requirements, you can tweak the SQL (where amenities like '%FT%'

OR amenities like '%NP%'; where amenities NOT like '%FT%' and amenities NOT like '%NP%'; etc.) – play around and see what you can find!

 

Now it’s your turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

Need data for learning?

 

The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:

 

Image11.png

 

We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U:

 

IMAGE12.png

 

Click Analytics U, then select "Subscribe" from the Options menu.

 

Happy Learning!

Comments
by Community Manager
on ‎10-07-2016 02:50 PM

Thanks for sharing this, @DarthPathos! One clarification: the geo capabilities (such as the map data sets and PROC GEOCODE, GMAP, GPROJECT, etc) are part of SAS/Graph, not Base SAS.  Many customers have SAS/Graph and so might take it for granted, but the SAS University Edition doesn't include it for two reasons:

  1. Size -- those data sets are BIG
  2. ODS Graphics - aside from the mapping features, PROC SGPLOT and other SG* techniques cover just about everything you need from a #DataViz perspective -- and that's part of Base SAS and thus part of SAS University Edition.

Base SAS does include the geodist() function, and that works in SAS University Edition. 

 

Example:

 

data _null_;
   distance=geodist(30.68, -88.25, 35.43, -82.55, 'M');
   put 'Distance = ' distance 'miles';
   distance=geodist(30.68, -88.25, 35.43, -82.55, 'K');
   put 'Distance = ' distance 'kilometers';   
run;

 

Output:

 Distance = 465.29081088 miles
 Distance = 748.6529147 kilometers
by Regular Contributor
on ‎10-07-2016 07:21 PM

Oops - good catch about the SAS/Graph!  As for the GEODIST, I'm embarassed to admit I did not know that, and should have tried it first.  I'll update it for next week.  Glad I put it to Revision rather than direct to post!  

 

Have a great (long) weekend - we're off on Monday as well, so have two SAS books in my bag and lots of data I want to play with :D

Chris

 

by Community Manager
on ‎10-08-2016 12:33 PM

It's all good @DarthPathos! I think it's valuable to see the "old school" methods for calculating geodesic distance.  The ARSIN, COS and PI references are a nice reminder that the Earth is round, and that measuring from "here" to "there" involves more than just a ruler on a map.  

 

(BTW, this did make it to a public post -- maybe @BeverlyBrown is to thank for that...but now you have fodder for "the easy way" post.)

by Regular Contributor
on ‎10-08-2016 02:14 PM

I think next week's article just wrote itself, thanks @ChrisHemedinger :D

 

Have a good long weekend

Chris

Your turn
Sign In!

Want to write an article? Sign in with your profile.