I want to find avg latitude and longitude of as set of coordinates based on the distance they are from each other. Consider the following example set:
data test;
input lat long lat1 long1 date dt hdist;
format date dt date9.;
datalines;
55.173874 -118.86073 55.173763 -118.85976 20222 20230 0.062882
55.173786 -118.85968 55.173763 -118.85979 20223 20234 0.007439
55.173702 -118.85984 55.173775 -118.85979 20225 20235 0.008716
55.173775 -118.85976 54.685486 -118.77554 20300 20314 54.561229
55.173763 -118.85976 54.685551 -118.77554 20301 20315 54.552708
;
run;
I haven't figured out much SAS cdoe to achieve the results. Here is some psuedo code I have thought up.
if abs(hdist_i - hdist_{i+1}) < 5 then sum(lat) and sum(lat1) and sum(long) and sum(long1) and cnt + 1;
Therefore, I would sum lat, long, lat1, long1 for lines 1-3 and have a cnt of 3 and sum lat, long, lat1, long1 for lines 4-5 and have a cnt of 2 so I can take the sums and divide by their respective counts to obtain the average.
From that data, I want to return:
avg(lat) avg(long) avg(lat1) avg(long1) 20222 20235 for rows 1-3 avg(lat) avg(long) avg(lat1) avg(long1) 20300 20315 for rows 4-5
It sounds like you want to have a classification variable (distance) and then use Proc means or proc summary on the remaining variables to get a mean.
Easiest might be to create a custom format to group your distance variable into your desired groups. You don't provide a lot of example but something like:
proc format;
value DistGroup
0.005 - 0.07 = 'Group 1'
54.5 -54.6 = 'Group 2'
;
proc summary data=have;
class hdist;
format hdist DistGroup.;
var lat long lat1 long1 dt date;
output out= want mean( lat long lat1 long1 )= min(date)= max(dt)= ;
run;
guessing on the min and max for the date variables.
Setting groups would be impossible for this set of data. I just pulled out a quick example but we are talking about 100million rows. Also, I would rather have this in a table then a proc means output.
So, what are the actual rules for which records are used to calculate a mean? You did not provide any rules just an abitrary first 3 rows and second 2 rows.
@dwsmith Please recall that you're asking from free help for volunteers. If you review the code suggested, you'll notice that it also produces a dataset called, ironically, WANT.
Change the proc format to categorize the distance into less than 5 or higher than 5.
proc format;
value DistGroup
0 - 5 = 'Group 1'
5 - high = 'Group 2'
other = 'CHECKME'
;
proc means data=test nway noprint;
class hdist;
format hdist DistGroup.;
var lat long lat1 long1 dt date;
output out= want mean( lat long lat1 long1 )= min(date)= max(dt)= ;
run;
If this doesn't work please post your code, log and explain what is not working.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.