Contributor
Posts: 40

# Summing and counting based on a parameter difference

I want to find avg latitude and longitude of as set of coordinates based on the distance they are from each other. Consider the following example set:

``````data test;
input lat long lat1 long1 date dt hdist;
format date dt date9.;
datalines;
55.173874 -118.86073 55.173763 -118.85976 20222 20230 0.062882
55.173786 -118.85968 55.173763 -118.85979 20223 20234 0.007439
55.173702 -118.85984 55.173775 -118.85979 20225 20235 0.008716
55.173775 -118.85976 54.685486 -118.77554 20300 20314 54.561229
55.173763 -118.85976 54.685551 -118.77554 20301 20315 54.552708
;
run;``````

I haven't figured out much SAS cdoe to achieve the results. Here is some psuedo code I have thought up.

if abs(hdist_i - hdist_{i+1}) < 5 then sum(lat) and sum(lat1) and sum(long) and sum(long1) and cnt + 1;

Therefore, I would sum lat, long, lat1, long1 for lines 1-3 and have a cnt of 3 and sum lat, long, lat1, long1 for lines 4-5 and have a cnt of 2 so I can take the sums and divide by their respective counts to obtain the average.

From that data, I want to return:

```avg(lat) avg(long) avg(lat1) avg(long1) 20222 20235 for rows 1-3
avg(lat) avg(long) avg(lat1) avg(long1) 20300 20315 for rows 4-5```

Super User
Posts: 13,528

## Re: Summing and counting based on a parameter difference

It sounds like you want to have a classification variable (distance) and then use Proc means or proc summary on the remaining variables to get a mean.

Easiest might be to create a custom format to group your distance variable into your desired groups. You don't provide a lot of example but something like:

proc format;

value DistGroup

0.005 - 0.07 = 'Group 1'

54.5 -54.6   = 'Group 2'

;

proc summary data=have;

class hdist;

format hdist DistGroup.;

var lat long lat1 long1 dt date;

output out= want mean( lat long lat1 long1 )=   min(date)=   max(dt)= ;

run;

guessing on the min and max for the date variables.

Contributor
Posts: 40

## Re: Summing and counting based on a parameter difference

[ Edited ]

Setting groups would be impossible for this set of data. I just pulled out a quick example but we are talking about 100million rows. Also, I would rather have this in a table then a proc means output.

Super User
Posts: 13,528

## Re: Summing and counting based on a parameter difference

So, what are the actual rules for which records are used to calculate a mean? You did not provide any rules just an abitrary first 3 rows and second 2 rows.

Contributor
Posts: 40

## Re: Summing and counting based on a parameter difference

I did. I said if the absolute value of ith hdist - ith +1 hdist is less than 5
Super User
Posts: 23,703

## Re: Summing and counting based on a parameter difference

[ Edited ]

@dwsmith Please recall that you're asking from free help for volunteers. If you review the code suggested, you'll notice that it also produces a dataset called, ironically, WANT.

Change the proc format to categorize the distance into less than 5 or higher than 5.

``````proc format;
value DistGroup
0 - 5 = 'Group 1'
5 - high  = 'Group 2'
other = 'CHECKME'
;

proc means data=test nway noprint;
class hdist;
format hdist DistGroup.;
var lat long lat1 long1 dt date;
output out= want mean( lat long lat1 long1 )=   min(date)=   max(dt)= ;
run;``````

If this doesn't work please post your code, log and explain what is not working.

Super User
Posts: 13,528

## Re: Summing and counting based on a parameter difference

Unfortunately I am using Internet Explorer and code posted with braces as the start { does not render properly from this forum. And since your example did not have a variable hdist_i I couldn't tell what impact that had on the problem.
Discussion stats
• 6 replies
• 279 views
• 0 likes
• 3 in conversation