Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Analytics
- /
- Stat Procs
- /
- Looking for an efficient method of comparison for distance

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 03-19-2016 05:59 PM
(1751 views)

I'm trying to find a method of comparison for distance between particular locations that outputs only those observations that meet specific criteria.

Take, for example, the following data set,

```
data HAVE;
input Store Lat Lon;
datalines;
1164 42.30861 -83.44031
1251 42.32282 -83.45289
1927 41.67517 -87.7969
6523 43.81187 -97.72712
;
run;
```

where * Store *assigns an identifier code to a given store,

What I'd like to do is compare each store against every other store, calculate the distance between them using the GEODIST function, and then output only those stores which are within three miles of each other and not equal to zero.

So, Store 1164 gets compared to Store 1164 and the GEODIST is calculated as 0.

Store 1164 gets compared to 1251 and the GEODIST is calculated as 1.17 miles.

Store 1164 gets compared to 1927 and the GEODIST is calculated as 228.55 miles.

Store 1164 gets compared to 6523 and the GEODIST is calculated as 729.79 miles.

Therefore, the only comparison which is output is the 1164 to 1251 comparison.

Then the process repeats and Store 1251 is compared to all other stores.

What I * don't* want to do is use multiple steps, create a Cartesian product using, another data set, and apply my logic qualifiers.

I was hoping someone far more clever than I was out who had a more efficient method.

I was wondering if it might be possible to put the HAVE data set in a SET statement, then use a DO loop, set the HAVE data set again, and calculate all distances then combine this with an OUTPUT statement to do everything in one go.

I'd appreciate any suggestions!

Thanks

P.S. - The GEODIST function is as follows for anyone who is not familiar:

=GEODIST(*latitude1*,*longitude1*,*latitude2*,*longitude2*, 'M')

5 REPLIES 5

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

By definition what you're doing is a Cartesian product.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I'd like to find a method to accomplish this task without outputting a new data set and instead, hopefully, evaluate these conditions in the PDV.

Do you have any suggestions, or were you just looking for an opportunity to increase your post count?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@KyleM_Corrie wrote:

Do you have any suggestions, or were you just looking for an opportunity to increase your post count?

Is there a reward for post count that I missed?!?

if you can define a problem your closer to solving it - ie can google efficient Cartesian product.

The last time I dealt with this issue Using a double set with key was the most efficient solution.

A hash solution may also be very efficient. Both of these would output a new data set though.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks for the suggestions. I only mentioned the post count because I didn't see what other purpose your original post could have served.

The method I described was only mentioned to describe what it was I hoped to achieve. It has been the only way I can figure out how to make it work, but I'd like to imagine there are much better methods.

Currently, it requires me to import data that is stored in an XLSX file, create a new data set using PROC SQL which results in a Cartesian product, and then evaluate my conditions on the new data set in a different data step.

It's not pretty, but it gets the job done.

I was just curious if one of you guys had any way to make it pretty as well!

Again, thanks much for the suggestions. It could be that what I want to do just can't be done in the manner I'd like it to be done.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Use The previous SQL query I provided with a WHERE clause (where dist<3). No extra data step, but not super efficient.

I used to do this in SAS but have since moved towards using GIS tools for these types of analysis. Theyre optimized, and its usually available via GUI. QGIS is free and open source. It used to be okay to approximate with geodist but nowadays driving distance and travel time are more important.

I used to do this in SAS but have since moved towards using GIS tools for these types of analysis. Theyre optimized, and its usually available via GUI. QGIS is free and open source. It used to be okay to approximate with geodist but nowadays driving distance and travel time are more important.

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.