Calculating a dynamic cluster average based on geo-coordinates

Accepted Solution Solved
Reply
Contributor
Posts: 23
Accepted Solution

Calculating a dynamic cluster average based on geo-coordinates

Hi all

 

Taking the data below, I have a challenge calculating the cluster average for variable x,. where the cluster average should be based on all observations no further than 250 meters away. In other words, we could person A living at

36.0980903	-109.0386987

having an income of x = 1000 $ a week. My goal would be to compare his income to the average income y of citizens in his neighbourhood living within a radius of 250 meters.

 

This would mean a dynamic and possibly different cluster group for every observation.

 

How would this be done?

 

Thanks Smiley Happy

 

data have1;		
infile cards;		
input Latitude     Longitude     x;		
cards;		
36.0980903	-109.0386987	1000
36.0984013	-109.0385617	1500
36.0989923	-109.0300557	2000
36.0976343	-109.0301467	750
36.0969993	-109.0333207	850
36.0993153	-109.0354437	900
36.0997053	-109.0357997	347
36.0976233	-109.0287737	650
36.1013493	-109.0335057	1100
36.1023103	-109.0330507	1200
36.1018293	-109.0341797	952
36.0861703	-109.0128447	652
36.1056053	-109.0212337	311
36.1065093	-109.0217417	1000
36.1065093	-109.0217417	1500
36.0983103	-109.0084197	2000
36.1019023	-109.0176657	750
36.0975443	-109.0091287	850
36.0928063	-109.0243767	900
36.0990893	-109.0535547	347
36.0980763	-109.0530487	650
36.0962203	-109.0111137	1100
36.1026333	-109.0195427	1200
36.1030713	-109.0249407	952
36.0900623	-109.0227547	652
36.1044493	-109.0112987	311
36.1093803	-109.0201137	1000
36.1085193	-109.0194697	1500
36.1093893	-109.0194947	2000
36.0871543	-109.0235497	750
36.0880483	-109.0244707	850
36.1016473	-109.0191277	900
36.0939833	-109.0551347	347
36.0936903	-109.0535577	650
36.0988083	-109.0595137	1100
36.0920383	-109.0488857	1200
36.0916593	-109.0474277	952
36.0921533	-109.0462577	652
36.1099053	-109.0181187	311
36.0988563	-109.0462907	5000;		
run;		

 


Accepted Solutions
Solution
‎12-08-2016 03:52 AM
Super User
Posts: 9,856

Re: Calculating a dynamic cluster average based on geo-coordinates

This could give you a start.



data have;		
infile cards truncover expandtabs;
id+1;			
input Latitude     Longitude     x;	
cards;		
36.0980903	-109.0386987	1000
36.0984013	-109.0385617	1500
36.0989923	-109.0300557	2000
36.0976343	-109.0301467	750
36.0969993	-109.0333207	850
36.0993153	-109.0354437	900
36.0997053	-109.0357997	347
36.0976233	-109.0287737	650
36.1013493	-109.0335057	1100
36.1023103	-109.0330507	1200
36.1018293	-109.0341797	952
36.0861703	-109.0128447	652
36.1056053	-109.0212337	311
36.1065093	-109.0217417	1000
36.1065093	-109.0217417	1500
36.0983103	-109.0084197	2000
36.1019023	-109.0176657	750
36.0975443	-109.0091287	850
36.0928063	-109.0243767	900
36.0990893	-109.0535547	347
36.0980763	-109.0530487	650
36.0962203	-109.0111137	1100
36.1026333	-109.0195427	1200
36.1030713	-109.0249407	952
36.0900623	-109.0227547	652
36.1044493	-109.0112987	311
36.1093803	-109.0201137	1000
36.1085193	-109.0194697	1500
36.1093893	-109.0194947	2000
36.0871543	-109.0235497	750
36.0880483	-109.0244707	850
36.1016473	-109.0191277	900
36.0939833	-109.0551347	347
36.0936903	-109.0535577	650
36.0988083	-109.0595137	1100
36.0920383	-109.0488857	1200
36.0916593	-109.0474277	952
36.0921533	-109.0462577	652
36.1099053	-109.0181187	311
36.0988563	-109.0462907	5000
;		
run;	
proc sql;
create table want as
select a.*,geodist(a.Latitude,a.Longitude,b.Latitude,b.Longitude)*1000 as distance
 from have as a, have as b 
  where a.id ne b.id and calculated distance le 250
   order by a.id;
quit;
  	

View solution in original post


All Replies
Solution
‎12-08-2016 03:52 AM
Super User
Posts: 9,856

Re: Calculating a dynamic cluster average based on geo-coordinates

This could give you a start.



data have;		
infile cards truncover expandtabs;
id+1;			
input Latitude     Longitude     x;	
cards;		
36.0980903	-109.0386987	1000
36.0984013	-109.0385617	1500
36.0989923	-109.0300557	2000
36.0976343	-109.0301467	750
36.0969993	-109.0333207	850
36.0993153	-109.0354437	900
36.0997053	-109.0357997	347
36.0976233	-109.0287737	650
36.1013493	-109.0335057	1100
36.1023103	-109.0330507	1200
36.1018293	-109.0341797	952
36.0861703	-109.0128447	652
36.1056053	-109.0212337	311
36.1065093	-109.0217417	1000
36.1065093	-109.0217417	1500
36.0983103	-109.0084197	2000
36.1019023	-109.0176657	750
36.0975443	-109.0091287	850
36.0928063	-109.0243767	900
36.0990893	-109.0535547	347
36.0980763	-109.0530487	650
36.0962203	-109.0111137	1100
36.1026333	-109.0195427	1200
36.1030713	-109.0249407	952
36.0900623	-109.0227547	652
36.1044493	-109.0112987	311
36.1093803	-109.0201137	1000
36.1085193	-109.0194697	1500
36.1093893	-109.0194947	2000
36.0871543	-109.0235497	750
36.0880483	-109.0244707	850
36.1016473	-109.0191277	900
36.0939833	-109.0551347	347
36.0936903	-109.0535577	650
36.0988083	-109.0595137	1100
36.0920383	-109.0488857	1200
36.0916593	-109.0474277	952
36.0921533	-109.0462577	652
36.1099053	-109.0181187	311
36.0988563	-109.0462907	5000
;		
run;	
proc sql;
create table want as
select a.*,geodist(a.Latitude,a.Longitude,b.Latitude,b.Longitude)*1000 as distance
 from have as a, have as b 
  where a.id ne b.id and calculated distance le 250
   order by a.id;
quit;
  	

Super User
Posts: 19,038

Re: Calculating a dynamic cluster average based on geo-coordinates

How big is your actual data since you have to make a lot of comparisons, efficiency may be very important. 

 

You can can look into a SQL cross join and filter based on

 

WHERE geodist(....., 'M') < 250;

Contributor
Posts: 23

Re: Calculating a dynamic cluster average based on geo-coordinates

@ReezaThe data is about 100k obs.

But will I be able to make a dynamic / different cluster for every observation with SQL?

It will surely be the most efficiebt, but Im not sure how to generate the code for that?

 

 

Contributor
Posts: 23

Re: Calculating a dynamic cluster average based on geo-coordinates

Thanks for you contributions.

 

I believe it can be solved with a proper SQL join and the geodist function.Anyone knows if geodist works with different latitude/longitude variables, i.e. degrees vs. GPS coordinates / decimal degrees ?

 

/ MikA

Super User
Posts: 9,856

Re: Calculating a dynamic cluster average based on geo-coordinates

Yes. I think so. Check the documentation and see how to use GEODIST().



☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 301 views
  • 1 like
  • 3 in conversation