DATA Step, Macro, Functions and more

Summing and counting based on a parameter difference

Reply
Contributor
Posts: 40

Summing and counting based on a parameter difference

I want to find avg latitude and longitude of as set of coordinates based on the distance they are from each other. Consider the following example set:

 

data test;
     input lat long lat1 long1 date dt hdist;
     format date dt date9.;
     datalines;
     55.173874 -118.86073 55.173763 -118.85976 20222 20230 0.062882
     55.173786 -118.85968 55.173763 -118.85979 20223 20234 0.007439
     55.173702 -118.85984 55.173775 -118.85979 20225 20235 0.008716
     55.173775 -118.85976 54.685486 -118.77554 20300 20314 54.561229
     55.173763 -118.85976 54.685551 -118.77554 20301 20315 54.552708
     ;
run;

I haven't figured out much SAS cdoe to achieve the results. Here is some psuedo code I have thought up.

 

 

if abs(hdist_i - hdist_{i+1}) < 5 then sum(lat) and sum(lat1) and sum(long) and sum(long1) and cnt + 1;

 

Therefore, I would sum lat, long, lat1, long1 for lines 1-3 and have a cnt of 3 and sum lat, long, lat1, long1 for lines 4-5 and have a cnt of 2 so I can take the sums and divide by their respective counts to obtain the average.

 

From that data, I want to return:

 

avg(lat) avg(long) avg(lat1) avg(long1) 20222 20235 for rows 1-3
avg(lat) avg(long) avg(lat1) avg(long1) 20300 20315 for rows 4-5

 

Super User
Posts: 10,500

Re: Summing and counting based on a parameter difference

It sounds like you want to have a classification variable (distance) and then use Proc means or proc summary on the remaining variables to get a mean.

Easiest might be to create a custom format to group your distance variable into your desired groups. You don't provide a lot of example but something like:

 

proc format;

value DistGroup

0.005 - 0.07 = 'Group 1'

54.5 -54.6   = 'Group 2'

;

 

proc summary data=have;

   class hdist;

   format hdist DistGroup.;

   var lat long lat1 long1 dt date;

   output out= want mean( lat long lat1 long1 )=   min(date)=   max(dt)= ;

run;

 

guessing on the min and max for the date variables.

 

Contributor
Posts: 40

Re: Summing and counting based on a parameter difference

[ Edited ]

Setting groups would be impossible for this set of data. I just pulled out a quick example but we are talking about 100million rows. Also, I would rather have this in a table then a proc means output.

Super User
Posts: 10,500

Re: Summing and counting based on a parameter difference

So, what are the actual rules for which records are used to calculate a mean? You did not provide any rules just an abitrary first 3 rows and second 2 rows.

Contributor
Posts: 40

Re: Summing and counting based on a parameter difference

I did. I said if the absolute value of ith hdist - ith +1 hdist is less than 5
Super User
Posts: 17,837

Re: Summing and counting based on a parameter difference

[ Edited ]

@dwsmith Please recall that you're asking from free help for volunteers. If you review the code suggested, you'll notice that it also produces a dataset called, ironically, WANT. 

 

Change the proc format to categorize the distance into less than 5 or higher than 5. 

 

 

proc format;
value DistGroup
0 - 5 = 'Group 1'
5 - high  = 'Group 2'
other = 'CHECKME'
;
 
proc means data=test nway noprint;
   class hdist;
   format hdist DistGroup.;
   var lat long lat1 long1 dt date;
   output out= want mean( lat long lat1 long1 )=   min(date)=   max(dt)= ;
run;

If this doesn't work please post your code, log and explain what is not working. 

 

Super User
Posts: 10,500

Re: Summing and counting based on a parameter difference

Unfortunately I am using Internet Explorer and code posted with braces as the start { does not render properly from this forum. And since your example did not have a variable hdist_i I couldn't tell what impact that had on the problem.
Ask a Question
Discussion stats
  • 6 replies
  • 265 views
  • 0 likes
  • 3 in conversation