I have data with groups and geographic coordinates. It looks like this:
Group | Location | Date | Lat | Long |
A | 13 | 1/4/2024 | 32.58497 | -95.5545 |
A | 16 | 4/2/2024 | 34.07736 | -99.9309 |
A | 25 | 1/15/2023 | 35.6381 | -104.508 |
A | 256 | 8/1/2024 | 37.27033 | -109.294 |
A | 22 | 1/1/2025 | . | . |
A | 125 | 3/4/2023 | 40.76247 | -119.535 |
A | 126 | 1/8/2023 | 42.62939 | -125.009 |
A | 214 | 10/3/2023 | 44.58182 | -130.735 |
B | 532 | 11/4/2023 | . | . |
B | 6543 | 12/1/2024 | 48.75903 | -142.984 |
B | 235 | 1/5/2025 | 50.99219 | -149.533 |
B | 241 | 6/3/2023 | 53.32764 | -156.382 |
B | 524 | 5/12/2024 | 55.77004 | -163.544 |
I have this code, which sorts by Group and Date and then, for each group, computes the cumulative distance at each location as well as the distance from the prior location. The wrench in this is the missing data: I don't want to compute the distance between two locations if they are not successive. For instance, if a location is missing its coordinates, then the distance values for that location AND the next one should be null, and the cumulative distance should be held constant until two locations after the one with the missing data. How can I modify this code to do that? (I've sketched the result I expect for group B below the code.) I also do the same thing with the days between locations, but there is no missing data to deal with there.
proc sort data=my_data;
    by Group Date;
run;

data my_data_new;
    set my_data;
    by group;
    cum_dist+geodist(lat, long, lag(lat), lag(long));
    if first.group then cum_dist = 0;
run;

data my_data_new1;
    set my_data_new;
    dist_prev = sum(cum_dist, -lag1(cum_dist));
    if first.group then dist_prev = .;
    days_prev = abs(dif(Date));
    if first.group then days_prev = .;
run;
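To make the rule concrete, here is what I would expect for group B once it is sorted by date (the actual distances are shown symbolically as d1 and d2 since I haven't computed them):
Group | Location | Date | dist_prev | cum_dist |
B | 241 | 6/3/2023 | . | 0 |
B | 532 | 11/4/2023 | . | 0 (held) |
B | 524 | 5/12/2024 | . | 0 (held) |
B | 6543 | 12/1/2024 | d1 | d1 |
B | 235 | 1/5/2025 | d2 | d1 + d2 |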
Hello @RandoDando,
I think you're almost there. Just store the value returned by the GEODIST function in variable dist_prev rather than deriving dist_prev from the cumulative distances. Then the missing coordinates take care of themselves: GEODIST returns a missing value when any argument is missing, so dist_prev is missing both for the record without coordinates and for the next record (whose LAG values are missing), and the sum statement ignores missing values, which holds cum_dist constant until two locations after the gap. You can also omit the second DATA step (which is missing a BY statement!) and do all calculations in the first, as shown below:
/* Create sample data for demonstration */
data have;
    call streaminit(27182818);
    do g=1 to 5;
        date=rand('integer','01JAN2022'd,'31DEC2022'd);
        do location=1 to 30;
            group=byte(64+g);
            date+rand('integer',50);
            if rand('bern',0.7) then do;
                lat=rand('uniform',30,60);
                long=rand('uniform',-165,-95);
            end;
            else call missing(lat, long);
            output;
        end;
    end;
    format date yymmdd10.;
run;

/* Compute distances, cumulative distances and time differences */
data want;
    set have;
    by group;
    dist_prev=geodist(lat, long, lag(lat), lag(long));
    cum_dist+dist_prev;
    days_prev=dif(date);
    if first.group then do;
        cum_dist = 0;
        call missing(dist_prev, days_prev);
    end;
run;
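If you want to see the missing-value handling with your own rows, here is a quick check using the group B records from your post (just a sketch on top of the step above; nothing in it is required for the solution):
/* Read the group B sample rows, sort them, and apply the same logic */
data sample;
    infile datalines dlm='|';
    input group :$1. location date :mmddyy10. lat long;
    format date mmddyy10.;
datalines;
B|532|11/4/2023|.|.
B|6543|12/1/2024|48.75903|-142.984
B|235|1/5/2025|50.99219|-149.533
B|241|6/3/2023|53.32764|-156.382
B|524|5/12/2024|55.77004|-163.544
;

proc sort data=sample;
    by group date;
run;

data check;
    set sample;
    by group;
    /* missing coordinates => missing dist_prev for this and the next record */
    dist_prev=geodist(lat, long, lag(lat), lag(long));
    /* sum statement ignores missing values => cum_dist is held constant */
    cum_dist+dist_prev;
    days_prev=dif(date);
    if first.group then do;
        cum_dist=0;
        call missing(dist_prev, days_prev);
    end;
run;

proc print data=check noobs;
run;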
OK, I was just handed another wrench. How about if I have TWO dates (a start_date and an end_date) and I want days_prev to be the difference between the start date and the end_date of the previous record?
EDIT: got it.
days_prev = start_date - lag(end_date);
if first.group then days_prev = .;
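For anyone who finds this later, here is how that line slots into the combined DATA step (a sketch, assuming the input data set now carries start_date and end_date instead of the single date variable):
data want;
    set have;
    by group;
    dist_prev=geodist(lat, long, lag(lat), lag(long));
    cum_dist+dist_prev;
    /* days from the previous record's end_date to this record's start_date */
    days_prev=start_date - lag(end_date);
    if first.group then do;
        cum_dist=0;
        call missing(dist_prev, days_prev);
    end;
run;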