I have a travel dataset as follows:
travelerid  orig  dest  traveltime  time_spent_at_dest  orig_stop  dest_stop
1           A     B     5           0                   Y          N
1           B     C     2           2                   N          N
1           C     D     3           1                   N          Y
1           D     E     2           1                   Y          Y
...
I want to sum up the time spent between each pair of consecutive stopping stations.
The total should include time spent at the intermediate stations, but not the time spent at the stopping station where the segment ends.
For example, the output dataset should look like:
travelerid  orig  dest  traveltime
1           A     D     12
1           D     E     2
...
Can anyone help me with this question?
Thanks!
Make sure you stash the original data somewhere safe ... if you ever get this data out of order it will be a monstrous task to reassemble it. You would be better off creating new fields (TRIP_ID, LEG) that would let you put the data back in order if a problem ever arose.
All that being said, here's an approach you can try for your problem:
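data want;
   set have;
   /* A stopping origin starts a new segment: reset the running total. */
   if orig_stop='Y' then do;
      total_time=0;
      starting_point = orig;
   end;
   retain total_time starting_point;
   /* Passing through: add the leg plus the time spent at the destination. */
   if dest_stop = 'N' then total_time = total_time + traveltime + time_spent_at_dest;
   /* Reaching a stop: add the leg only, then write out the segment. */
   else do;
      total_time = total_time + traveltime;
      ending_point = dest;
      output;
   end;
   keep travelerid starting_point ending_point total_time;
run;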
It's untested code, but should be OK.
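As a rough illustration of the ordering fields suggested above, here is one way TRIP_ID and LEG could be added while the data is still in its original order. This is only a sketch: the dataset name have_keyed and the rule that a new trip begins whenever orig_stop='Y' are assumptions, not something given in the question.

```sas
/* Sketch: add ordering keys while HAVE is still in its original row order. */
/* HAVE_KEYED is a hypothetical output name.                                */
data have_keyed;
   set have;
   by travelerid notsorted;     /* ids are grouped, not necessarily sorted */
   if first.travelerid then do;
      trip_id = 0;
      leg = 0;
   end;
   /* Assumption: a stopping origin marks the start of a new trip. */
   if orig_stop = 'Y' then do;
      trip_id + 1;
      leg = 0;
   end;
   leg + 1;                     /* sum statement, so LEG is retained across rows */
run;
```

If the rows ever get shuffled, sorting by travelerid, trip_id and leg should restore the original sequence.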
Here's one way:
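data want (keep=travelerid orig dest traveltime);
   set have;
   /* Carry the running total and the segment's first station across rows. */
   /* The " " initial value gives RealOrig a length of 1, which fits the   */
   /* single-letter stations in the sample; add a LENGTH statement before  */
   /* this line if station codes are longer.                               */
   retain cumtime 0 RealOrig " ";
   /* A stopping origin starts a new segment. */
   if orig_stop='Y' then do;
      cumtime=0;
      RealOrig=Orig;
   end;
   /* Passing through: add the leg plus the time spent at the destination. */
   if dest_stop = 'N' then cumtime= sum(cumtime,traveltime,time_spent_at_dest);
   /* Reaching a stop: add the leg only, then write out the segment. */
   if dest_stop='Y' then do;
      cumtime= sum(cumtime,traveltime);
      orig= RealOrig;
      TravelTime=CumTime;
      output;
   end;
run;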
This assumes well-formed data: every travelerid starts with a row where orig_stop='Y', and every travelerid's last row has dest_stop='Y'.
If your travelerid data doesn't behave that way, then you'll need to make some decisions about how to handle the exceptions. They might be amenable to BY travelerid processing with FIRST. and LAST., but no promises.
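For what it's worth, the BY-group idea could look something like the sketch below. It is untested and makes several assumptions: the rows are already in travel order within each travelerid, a traveler's first row always starts a segment even when orig_stop is not 'Y', station codes fit in 8 characters, and an unfinished final segment is only reported in the log rather than written out.

```sas
/* Sketch only: assumes HAVE is in travel order within each travelerid. */
data want (keep=travelerid orig dest traveltime);
   set have;
   by travelerid notsorted;     /* ids are grouped, not necessarily sorted */
   length RealOrig $ 8;         /* assumed maximum length of a station code */
   retain cumtime 0 RealOrig;

   /* Start a new segment at a stopping origin or at a traveler's first row. */
   if first.travelerid or orig_stop = 'Y' then do;
      cumtime = 0;
      RealOrig = orig;
   end;

   if dest_stop = 'Y' then do;
      /* Segment ends here: add the last leg only and write the row. */
      orig = RealOrig;
      traveltime = sum(cumtime, traveltime);
      output;
   end;
   else cumtime = sum(cumtime, traveltime, time_spent_at_dest);

   /* One possible exception rule: report a traveler whose last leg never reaches a stop. */
   if last.travelerid and dest_stop ne 'Y' then
      put 'NOTE: unfinished segment for ' travelerid= RealOrig= dest=;
run;
```

On the four sample rows this should give the same two output rows as the code above (A to D with 12, D to E with 2). Whether an unfinished segment should be dropped, written out, or flagged is a decision for whoever owns the data.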