Calculating GEODIST for various combinations, cross looping

Regular Contributor
Posts: 196

I have a dataset A that contains lat and long. I have another dataset B that also contains lat and long.

Dataset A has close to 1 million points and dataset B has close to 20 million points. I want to iterate over all the points in dataset A, calculate the distance to the points in dataset B, and flag the A point as 1 if any B point is less than 500 m away, else 0.

So I would have a variable, within_500m, that is 1/0 for each point in dataset A.

Is there an efficient way to do this, given that the number of combinations is huge? I was thinking about creating buckets on both datasets using the rank function and processing them as blocks, but I am not sure that will give the expected results.

Any ideas I can try?

I saw a post on GEODIST for a cross join, but I am not sure that's an efficient method:

https://communities.sas.com/t5/SAS-Statistical-Procedures/Calculating-All-GEODIST-Combos/m-p/255163#...

Super User
Posts: 13,521

Re: Calculating GEODIST for various combinations, cross looping

Posted in reply to munitech4u

Pretty much as soon as you say "every record in set a with every record in set b" efficiency goes out the window, especially with the specific requirement to mark every result.

Your idea of blocks or ranks will add complexity and time to the "every record" requirement.

SAS Super FREQ
Posts: 4,240

Re: Calculating GEODIST for various combinations, cross looping

Posted in reply to munitech4u

I hope you realize that you are talking about making (1M x 20M) = 2 x 10^13 = 20 TRILLION comparisons. If you store the (long,lat) for both points, the indicator variable, and the distance, that will occupy 900 terabytes of storage.
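
As a rough check on those figures, the arithmetic can be sketched in a few lines of Python, used here only as a calculator. The 8 bytes per value is an assumption (the default length of a SAS numeric variable), and the six stored values per pair are the two (long, lat) pairs, the indicator, and the distance.

```python
# Back-of-the-envelope size of the full cross join.
n_a = 1_000_000        # approximate number of points in dataset A
n_b = 20_000_000       # approximate number of points in dataset B
pairs = n_a * n_b      # every A point compared with every B point

values_per_pair = 6    # long/lat for A, long/lat for B, indicator, distance
bytes_per_value = 8    # assumption: default SAS numeric length

total_bytes = pairs * values_per_pair * bytes_per_value
terabytes = total_bytes / 10**12   # decimal terabytes
tebibytes = total_bytes / 2**40    # binary terabytes

print(f"{pairs:.1e} comparisons")
print(f"{terabytes:.0f} TB (decimal), {tebibytes:.0f} TiB (binary)")
```

Under these assumptions the total lands between roughly 870 and 960 terabytes depending on how a terabyte is counted, which is consistent with the figure of about 900.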

500 meters is really close. What are the ranges of (long, lat) for these data sets? Are they all in a small area (maybe a state or county) or are the locations global (some in Europe, others in Asia, others in N. America...)? Are any points near the poles?

You can look at the raw (long, lat) values to exclude a bunch of comparisons. A degree of latitude is more than 100 km, so you never have to call GEODIST unless the difference between latitudes is a fraction of a degree.  If you can bound the latitude away from the poles, we can say more.
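
One way to act on this idea is to bucket the B points into a grid whose cells are at least 500 m wide, then compare each A point only with the B points in its own cell and the eight neighbouring cells. The following is a minimal sketch in Python rather than SAS, under several assumptions: the function names `haversine_m` and `flag_within` are hypothetical, the haversine formula on a sphere stands in for GEODIST (so distances right at the 500 m threshold may differ slightly), the points are well away from the poles, and longitude wrap-around at ±180° is ignored.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))

def flag_within(a_points, b_points, radius_m=500.0):
    """Return one 0/1 flag per A point: 1 if any B point is closer than radius_m."""
    a_points, b_points = list(a_points), list(b_points)
    if not a_points:
        return []
    # A degree of latitude is more than 110 km, so this cell height
    # (in degrees) is slightly taller than radius_m.
    lat_cell = radius_m / 110_000.0
    # A degree of longitude shrinks with cos(latitude); size the cell
    # width for the highest latitude present so it is never too narrow.
    max_abs_lat = max(abs(lat) for lat, _ in a_points + b_points)
    lon_cell = lat_cell / math.cos(math.radians(max_abs_lat))

    def cell(lat, lon):
        return (math.floor(lat / lat_cell), math.floor(lon / lon_cell))

    # Index every B point by its grid cell.
    grid = defaultdict(list)
    for lat, lon in b_points:
        grid[cell(lat, lon)].append((lat, lon))

    flags = []
    for lat, lon in a_points:
        ci, cj = cell(lat, lon)
        # Only the 3 x 3 block of cells around the A point can hold a match.
        hit = any(
            haversine_m(lat, lon, blat, blon) < radius_m
            for di in (-1, 0, 1)
            for dj in (-1, 0, 1)
            for blat, blon in grid.get((ci + di, cj + dj), ())
        )
        flags.append(int(hit))
    return flags

# Made-up coordinates: the first A point has a B point about 300 m away,
# the second has no B point anywhere near it.
a = [(40.7484, -73.9857), (34.0522, -118.2437)]
b = [(40.7510, -73.9870), (41.0, -74.0)]
print(flag_within(a, b))
```

The distance function is called only for pairs that share a neighbourhood, so the number of calls depends on how densely the points cluster rather than on the full 1M x 20M product.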

Regular Contributor
Posts: 196

Re: Calculating GEODIST for various combinations, cross looping

The latitudes and longitudes are spread across the United States. But, as you pointed out, 1 degree of lat or long is more than 100 km, which defeats my purpose. I would rather go with a zip variable.

SAS Super FREQ
Posts: 4,240

Re: Calculating GEODIST for various combinations, cross looping

Posted in reply to munitech4u

I wouldn't recommend using ZIP codes. Two addresses can be next door to each other and have different zip codes. In fact, two addresses can be in different STATES and still be within 500 meters. 
