I would like to link sas time period together for the same series of ids by certain conditions,
for example:
Ids start_time end_time start-previous_end
1 14280 16820 .
1 17000 19000 280
1 19100 19200 100
2 18000 18500 .
2 18900 19000 400
3 15000 15500 .
3 15900 16000 400
3 16200 16300 200
3 16400 16500 100
4 18000 19000 .
5 15000 16000 .
5 16300 17000 300
"start-previous_end" is the difference between start_time and lag(end_time), so first.id will have missing value.
My condition is, if "start-previous_end" is <= 300 then, previous and current id in the same series will be linked and return the whole range of the start_time and end_time.
eg,
for id 1, 280 and 100 are <= 300, so three records will be combined and the new start_time will be 14280 and new_end_time will be19200, the record can be still three and contains all the same new start and end time or, it can be one record for id 1.
for id 2, 400 is >= 300, so two records of id 2 will be remain the same. no link.
for id 3, first record will remain the same and the rest three records will be linked.
any help will be appreciated!
As long as you separate the tasks into separate steps, the programming isn't that difficult. First, calculate your new column, along with a GROUP variable that identifies when a new grouping begins:
data have2;
set have;
by ID;
difference = start_time - lag(end_time);
if first.id then difference = .;
if first.id or difference > 300 then group + 1;
run;
That may be enough for your needs. But if you want to combine multiple observations for the same ID into one observation, that can be done:
data want;
set have2;
by group;
if first.group then new_start_time = start_time;
retain new_start_time;
if last.group;
new_end_time = end_time;
drop start_time end_time;
run;
It seems the new_start_time and new_end_time are the same as the originals in the output.
I suppose there should be some handling for "end_time" right after the "new_start_time" has been assigned.
That sort of result (same values coming out as went in) depends on your data. You may have all large values for DIFFERENCE. You'll have to actually inspect the data for HAVE2 for two questions. (1) is GROUP being created properly? (2) What should the results look like once the data is combined by GROUP?
Without being able to see your data, there's no way to tell on this end.
First, you second record should have start_time-end_time=180, not 280.
The "trick" here is to read the next start_time in tandom with the current end_time to determine whether the record-in-hand is the last one within 300:
data have;
input Ids start_time end_time diff;
datalines;
1 14280 16820 .
1 17000 19000 180 /* was 280 */
1 19100 19200 100
2 18000 18500 .
2 18900 19000 400
3 15000 15500 .
3 15900 16000 400
3 16200 16300 200
3 16400 16500 100
4 18000 19000 .
5 15000 16000 .
5 16300 17000 300
;
data want (drop=_:);
do until (last.ids=1 or _next_strt>end_time+300);
set have (keep=ids);
by ids;
merge have
have (firstobs=2 keep=start_time rename=(start_time=_next_strt));
if _link_strt=. then _link_strt=start_time;
end;
start_time=_link_strt;
run;
Editted additional note: Note you don't need to establish a variable to contain start_time-lag(end_time).
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.