BookmarkSubscribeRSS Feed
Calcite | Level 5

I would like to link sas time period together for the same series of ids by certain conditions,

for example:


Ids                                             start_time                                   end_time                            start-previous_end

1                                                  14280                                          16820                                    .              

1                                                  17000                                          19000                                    280

1                                                  19100                                          19200                                    100

2                                                  18000                                          18500                                    .

2                                                  18900                                          19000                                    400

3                                                  15000                                          15500                                    .

3                                                  15900                                          16000                                    400

3                                                  16200                                          16300                                    200

3                                                  16400                                          16500                                    100

4                                                  18000                                          19000                                    .

5                                                  15000                                          16000                                    .

5                                                  16300                                          17000                                    300



"start-previous_end" is the difference between start_time and lag(end_time), so will have missing value.

My condition is, if "start-previous_end" is <= 300 then, previous and current id in the same series will be linked and return the whole range of the start_time and end_time.


for id 1, 280 and 100 are <= 300, so three records will be combined and the new start_time will be 14280 and new_end_time will be19200, the record can be still three and contains all the same new start and end time or, it can be one record for id 1.



for id 2, 400 is >= 300, so two records of id 2 will be remain the same. no link.


for id 3, first record will remain the same and the rest three records will be linked.



any help will be appreciated!


As long as you separate the tasks into separate steps, the programming isn't that difficult.  First, calculate your new column, along with a GROUP variable that identifies when a new grouping begins:


data have2;

set have;

by ID;

difference = start_time - lag(end_time);

if then difference = .;

if or difference > 300 then group + 1;



That may be enough for your needs.  But if you want to combine multiple observations for the same ID into one observation, that can be done:


data want;

set have2;

by group;

if then new_start_time = start_time;

retain new_start_time;


new_end_time = end_time;

drop start_time end_time;


Calcite | Level 5

It seems the new_start_time and new_end_time are the same as the originals in the output.


I suppose there should be some handling for "end_time" right after the "new_start_time" has been assigned. 



That sort of result (same values coming out as went in) depends on your data.  You may have all large values for DIFFERENCE.  You'll have to actually inspect the data for HAVE2 for two questions.  (1) is GROUP being created properly?  (2) What should the results look like once the data is combined by GROUP? 


Without being able to see your data, there's no way to tell on this end.


First, you second record should have  start_time-end_time=180, not 280.


The "trick" here is to read the next start_time in tandom with the current end_time to determine whether the record-in-hand is the last one within 300:


data have;
  input Ids start_time end_time  diff;
1 14280 16820 .              
1 17000 19000 180    /* was 280 */
1 19100 19200 100
2 18000 18500 .
2 18900 19000 400
3 15000 15500 .
3 15900 16000 400
3 16200 16300 200
3 16400 16500 100
4 18000 19000 .
5 15000 16000 .
5 16300 17000 300

data want (drop=_:);
  do until (last.ids=1 or _next_strt>end_time+300);
    set have (keep=ids);
    by ids;
	merge have 
          have (firstobs=2 keep=start_time rename=(start_time=_next_strt));
	if _link_strt=. then _link_strt=start_time;


Editted additional note:  Note you don't need to establish a variable to contain start_time-lag(end_time).

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets


Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 3 in conversation