I have data which has groups and geographic coordinates. It looks like this:
Group
Location
Date
Lat
Long
A
13
1/4/2024
32.58497
-95.5545
A
16
4/2/2024
34.07736
-99.9309
A
25
1/15/2023
35.6381
-104.508
A
256
8/1/2024
37.27033
-109.294
A
22
1/1/2025
.
.
A
125
3/4/2023
40.76247
-119.535
A
126
1/8/2023
42.62939
-125.009
A
214
10/3/2023
44.58182
-130.735
B
532
11/4/2023
.
.
B
6543
12/1/2024
48.75903
-142.984
B
235
1/5/2025
50.99219
-149.533
B
241
6/3/2023
53.32764
-156.382
B
524
5/12/2024
55.77004
-163.544
I have this code which sorts by Group and Date, and computes the cumulative distance between points for each group at each location, as well as the distance between locations (from prior) for each. The wrench in this is the missing data. I don't want to compute the distance between two locations if they are not successive. For instance, if one location is missing coordinates, then the distance values for that AND the next location should be null, and cumulative held constant until 2 locations AFTER the one with the missing data. How can I modify this code to do that? I also do the same with the dates between locations but there is no missing data to deal with there.
Proc sort data=my_data;
by Group Date;
run;
data my_data_new; set my_data;
by group;
cum_dist+geodist(lat,long, lag(lat), lag(long));
if first.group then cum_dist = 0;
run;
Data my_data_new1; set My_data_new;
dist_prev = sum(cum_dist, -lag1(cum_dist));
if first.group then dist_prev = .;
days_prev = abs(dif(Date));
if first.group then days_prev = .;
run;
... View more