BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
RandoDando
Pyrite | Level 9

I have data which has groups and geographic coordinates.  It looks like this:

 

Group Location Date Lat Long
A 13 1/4/2024 32.58497 -95.5545
A 16 4/2/2024 34.07736 -99.9309
A 25 1/15/2023 35.6381 -104.508
A 256 8/1/2024 37.27033 -109.294
A 22 1/1/2025 . .
A 125 3/4/2023 40.76247 -119.535
A 126 1/8/2023 42.62939 -125.009
A 214 10/3/2023 44.58182 -130.735
B 532 11/4/2023 . .
B 6543 12/1/2024 48.75903 -142.984
B 235 1/5/2025 50.99219 -149.533
B 241 6/3/2023 53.32764 -156.382
B 524 5/12/2024 55.77004 -163.544

 

I have this code which sorts by Group and Date, and computes the cumulative distance between points for each group at each location, as well as the distance between locations (from prior) for each.  The wrench in this is the missing data.  I don't want to compute the distance between two locations if they are not successive.  For instance, if one location is missing coordinates, then the distance values for that AND the next location should be null, and cumulative held constant until 2 locations AFTER the one with the missing data.  How can I modify this code to do that?  I also do the same with the dates between locations but there is no missing data to deal with there.

 

Proc sort data=my_data;
by Group Date;
run;

data my_data_new; set my_data;
by group;

cum_dist+geodist(lat,long, lag(lat), lag(long));
if first.group then cum_dist = 0;
run;

Data my_data_new1; set My_data_new;

dist_prev = sum(cum_dist, -lag1(cum_dist));
if first.group then dist_prev = .;

days_prev = abs(dif(Date));
if first.group then days_prev = .;

run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @RandoDando,

 

I think you're almost there. Just store the value returned by the GEODIST function in variable dist_prev rather than computing dist_prev from cumulative distances. You can also omit the second DATA step (which is missing a BY statement!) and do all calculations in the first as shown below:

/* Create sample data for demonstration */

data have;
call streaminit(27182818);
do g=1 to 5;
  date=rand('integer','01JAN2022'd,'31DEC2022'd);
  do location=1 to 30;
    group=byte(64+g);
    date+rand('integer',50);
    if rand('bern',0.7) then do;
      lat=rand('uniform',30,60);
      long=rand('uniform',-165,-95);
    end;
    else call missing(lat, long);
    output;
  end;
end;
format date yymmdd10.;
run;

/* Compute distances, cumulative distances and time differences */

data want;
set have;
by group;
dist_prev=geodist(lat, long, lag(lat), lag(long));
cum_dist+dist_prev;
days_prev=dif(date);
if first.group then do;
  cum_dist = 0;
  call missing(dist_prev, days_prev);
end;
run;

View solution in original post

3 REPLIES 3
RandoDando
Pyrite | Level 9
I should note that the above code gives a prev_dist value of 0 for the location with missing data and the location after it. However, since some locations are the same in my actual data, I want to differentiate between 0 actual distance and 0 due to missing data. I hope that makes sense!
FreelanceReinh
Jade | Level 19

Hello @RandoDando,

 

I think you're almost there. Just store the value returned by the GEODIST function in variable dist_prev rather than computing dist_prev from cumulative distances. You can also omit the second DATA step (which is missing a BY statement!) and do all calculations in the first as shown below:

/* Create sample data for demonstration */

data have;
call streaminit(27182818);
do g=1 to 5;
  date=rand('integer','01JAN2022'd,'31DEC2022'd);
  do location=1 to 30;
    group=byte(64+g);
    date+rand('integer',50);
    if rand('bern',0.7) then do;
      lat=rand('uniform',30,60);
      long=rand('uniform',-165,-95);
    end;
    else call missing(lat, long);
    output;
  end;
end;
format date yymmdd10.;
run;

/* Compute distances, cumulative distances and time differences */

data want;
set have;
by group;
dist_prev=geodist(lat, long, lag(lat), lag(long));
cum_dist+dist_prev;
days_prev=dif(date);
if first.group then do;
  cum_dist = 0;
  call missing(dist_prev, days_prev);
end;
run;
RandoDando
Pyrite | Level 9

OK, was just handed another wrench. How about if I have TWO dates (a start_date and End_date) and I want the days_prev to be the difference between the start date and the end_date of the previous record?

 

EDIT:  got it. 

days_prev = start_date - lag(end_date);
if first.group then days_prev = .;

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 634 views
  • 1 like
  • 2 in conversation