11-03-2017 06:00 AM
I have a dataset A that contains lat and long. I have another dataset B that also contains lat and long.
Dataset A has close to 1m points and dataset B has close to 20m points. For every point in dataset A, I want to calculate the distance to the points in dataset B and, if any of them is less than 500 m away, flag the point as 1, else 0.
So dataset A would get a variable within_500m, set to 1 or 0, based on the distances calculated from each of its points.
Is there an efficient way to do this? The number of combinations is really huge. I was thinking about making some sort of buckets on both datasets using a rank function and processing them as blocks, but I'm not sure that will give the expected results.
Any ideas I can try?
I saw a post on GEODIST for a cross join, but I'm not sure that's an efficient method:
11-03-2017 10:10 AM
Pretty much as soon as you say "every record in set a with every record in set b" efficiency goes out the window, especially with the specific requirement to mark every result.
Your idea of blocks or ranks will add complexity and time to the "every record" requirement.
11-03-2017 10:31 AM
I hope you realize that you are talking about making (1M x 20M) = 2 x 10^13 = 20 TRILLION comparisons. If you store the (long,lat) for both points, the indicator variable, and the distance, that will occupy 900 terabytes of storage.
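As a rough check of those figures, here is a small Python sketch (Python rather than SAS, purely for the arithmetic). It assumes six numeric variables per output row (two coordinates for each point, the indicator, and the distance), each stored in 8 bytes, the default length of a SAS numeric variable. The variable names are illustrative only.

```python
# Back-of-the-envelope size of the full cross join.
# Assumption: 6 numeric variables per row, 8 bytes each.
n_a = 1_000_000        # points in dataset A
n_b = 20_000_000       # points in dataset B

pairs = n_a * n_b      # number of (A, B) comparisons

vars_per_row = 6       # lat/long for A, lat/long for B, indicator, distance
bytes_per_var = 8
total_bytes = pairs * vars_per_row * bytes_per_var

terabytes = total_bytes / 1e12      # decimal terabytes
tebibytes = total_bytes / 2**40     # binary tebibytes

print(f"{pairs:,} comparisons")
print(f"{terabytes:,.0f} TB ({tebibytes:,.0f} TiB)")
```

Under these assumptions the full result comes to about 960 TB (roughly 873 TiB), the same order of magnitude as the figure quoted above.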
500 meters is really close. What are the ranges of (long, lat) for these data sets? Are they all in a small area (maybe a state or county) or are the locations global (some in Europe, others in Asia, others in N. America, ...)? Are any points near the poles?
You can look at the raw (long, lat) values to exclude a bunch of comparisons. A degree of latitude is more than 100 km, so you never have to call GEODIST unless the difference between latitudes is a fraction of a degree. If you can bound the latitude away from the poles, we can say more.
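The latitude pre-filter described above can be sketched as follows. This is a minimal illustration in Python, not SAS, and it makes several assumptions: the Earth is treated as a sphere with a mean radius of 6,371 km, the haversine formula stands in for GEODIST (so distances may differ slightly from what GEODIST returns), and the function names `haversine_m` and `flag_within` are hypothetical. Dataset B is sorted by latitude once, and for each point in A only the B points within a narrow latitude band are passed to the distance function.

```python
import math
from bisect import bisect_left, bisect_right

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius (spherical model)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flag_within(points_a, points_b, radius_m=500.0):
    """Return a 1/0 flag per point in A: 1 if any point in B is closer than radius_m."""
    # Two points closer than radius_m cannot differ in latitude by more than this.
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)

    # Sort B by latitude once so each lookup is a binary search.
    b_sorted = sorted(points_b)
    b_lats = [lat for lat, _ in b_sorted]

    flags = []
    for lat, lon in points_a:
        lo = bisect_left(b_lats, lat - dlat)
        hi = bisect_right(b_lats, lat + dlat)
        hit = 0
        # Only B points inside the latitude band reach the distance call.
        for i in range(lo, hi):
            blat, blon = b_sorted[i]
            if haversine_m(lat, lon, blat, blon) < radius_m:
                hit = 1
                break  # one match is enough to set the flag
        flags.append(hit)
    return flags
```

For example, `flag_within([(40.0, -75.0)], [(40.003, -75.0), (40.0, -74.0)])` would only compute a distance for the first B point, because the second lies outside the latitude band only if its latitude differs, which it does not here; a longitude check would be needed to discard it cheaply. Adding that check is a natural next step, but the width of the longitude band depends on latitude and needs care near the poles and the antimeridian, so it is left out of this sketch.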
11-06-2017 02:55 AM
11-06-2017 09:19 AM
I wouldn't recommend using ZIP codes. Two addresses can be next door to each other and have different ZIP codes. In fact, two addresses can be in different STATES and still be within 500 meters.