munitech4u
Quartz | Level 8

I have a dataset A that contains latitude and longitude. I have another dataset B that also contains latitude and longitude.

Dataset A has close to 1M points and dataset B has close to 20M points. I want to iterate over all the points in dataset A, calculate the distance to the points in dataset B, and flag the point as 1 if any point in B is less than 500 m away, else 0.

So I would have a variable within_500m that is 1/0 for each point in dataset A.

Is there an efficient way to do this, given that the number of combinations is really huge? I was thinking about making buckets on both datasets using a rank function and processing them as blocks, but I am not sure that will give the expected results.

Any ideas I can try?

I saw a post on GEODIST for a cross join, but I am not sure that's an efficient method:

https://communities.sas.com/t5/SAS-Statistical-Procedures/Calculating-All-GEODIST-Combos/m-p/255163#...

4 REPLIES
ballardw
Super User

Pretty much as soon as you say "every record in set A with every record in set B", efficiency goes out the window, especially with the specific requirement to mark every result.

Your idea of blocks or ranks will add complexity and time to the "every record" requirement.

Rick_SAS
SAS Super FREQ

I hope you realize that you are talking about making (1M x 20M) = 2 x 10^13 = 20 TRILLION comparisons. If you store the (long,lat) for both points, the indicator variable, and the distance, that will occupy 900 terabytes of storage.
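As a rough check on those numbers, here is a back-of-the-envelope sketch in Python. The variable names are illustrative, and the row layout is an assumption: six numeric variables per pair (two coordinates for each point, the indicator, and the distance) stored at 8 bytes each. Under that assumption the total comes out in the same range as the figure quoted above.

```python
# Back-of-the-envelope size of a full cross join (all names are illustrative).
n_a = 1_000_000          # points in dataset A
n_b = 20_000_000         # points in dataset B
pairs = n_a * n_b        # every A point compared with every B point

# Assumed row layout: lat/long for both points, the 1/0 flag, and the distance,
# each stored as an 8-byte numeric value.
vars_per_row = 6
bytes_per_var = 8
total_bytes = pairs * vars_per_row * bytes_per_var

print(f"{pairs:.1e} pairs")
print(f"{total_bytes / 1e12:.0f} TB if every pair is stored")
```

The takeaway is that the full set of pairs should never be materialized. Only the 1/0 flag per point in A needs to be kept.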

 

500 meters is really close. What are the ranges of (long, lat) for these data sets? Are they all in a small area (maybe a state or county) or are the locations global (some in Europe, others in Asia, others in N. America...)? Are any points near the poles?

 

You can look at the raw (long, lat) values to exclude a bunch of comparisons. A degree of latitude is more than 100 km, so you never have to call GEODIST unless the difference between latitudes is a fraction of a degree.  If you can bound the latitude away from the poles, we can say more.
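The latitude filter described above can be sketched as follows. This is a minimal Python illustration, not SAS code and not a tuned implementation. It assumes points are (lat, long) tuples in degrees, and it uses a spherical haversine distance as a stand-in for GEODIST, so results can differ slightly from an ellipsoidal calculation. The names `haversine_m` and `flag_within` are hypothetical.

```python
import bisect
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def flag_within(a_points, b_points, radius_m=500.0):
    """Return one 1/0 flag per A point: 1 if any B point is closer than radius_m."""
    b_sorted = sorted(b_points)              # sort B by latitude once
    b_lats = [lat for lat, _ in b_sorted]
    # Two points whose latitudes differ by more than this cannot be within
    # radius_m of each other on the sphere; padded slightly for rounding.
    band_deg = math.degrees(radius_m / EARTH_RADIUS_M) * 1.001
    flags = []
    for lat, lon in a_points:
        lo = bisect.bisect_left(b_lats, lat - band_deg)
        hi = bisect.bisect_right(b_lats, lat + band_deg)
        # Only B points inside the latitude band reach the distance function.
        hit = any(haversine_m(lat, lon, blat, blon) < radius_m
                  for blat, blon in b_sorted[lo:hi])
        flags.append(1 if hit else 0)
    return flags


a = [(40.0, -75.0), (35.0, -100.0)]
b = [(40.003, -75.0), (41.0, -75.0), (35.0, -100.01)]
print(flag_within(a, b))
```

The band only narrows by latitude, so every B point in the same thin east-west strip is still checked. A similar longitude test inside the band, with a width that depends on latitude, would cut the candidates further.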

 

munitech4u
Quartz | Level 8
The latitudes and longitudes are spread across the United States. But as you pointed out, 1 degree of latitude or longitude is more than 100 km, which defeats my purpose. I would rather go with the ZIP code variable.
Rick_SAS
SAS Super FREQ

I wouldn't recommend using ZIP codes. Two addresses can be next door to each other and have different ZIP codes. In fact, two addresses can be in different STATES and still be within 500 meters.
