BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Alice_SAS
Calcite | Level 5

I have the table like:

sourcetargetstartend
AX2005010620070506
BY2003060120060704
BY2006070420080905
BY2008090520091211
BP2004070420080905
BP2006100120100806
CL2007091120081120

I hope to merge the time range of the target corresponding to each source, and get this table:

Can you help me? Thank you!

sourcetargetstartend
AX2005010620070506
BY2003060120091211
BP2004070420100806
CL2007091120081120
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User
data have;
  input source:$1. target:$1. start: YYMMDD8. end: YYMMDD8.;
  format start YYMMDDn8. end YYMMDDn8.;
  datalines;
A X 20050106 20070506
B Y 20030601 20060704
B Y 20060704 20080905
B Y 20080905 20091211
B P 20040704 20080905
B P 20061001 20100806
C L 20070911 20081120
D M 20030507 20040506
D M 20050809 20071001
;

data want;
set have end=done;
set
  have (
    firstobs=2
    keep=source target start
    rename=(source=_source target=_target start=_start)
  )
  have (
    obs=1
    keep=source target start
    rename=(source=_source target=_target start=_start)
  )
;
*by source target;
retain __start;
if _n_ = 1 then __start = start;
*if first.source then __start = start;
if
  source ne _source or target ne _target 
  or _start > end + 1
  or done
then do;
  start = __start;
  output;
  __start = _start;
end;
drop _:;
run;

proc print data=want noobs;
run;

The output:

source	target	start	end
A	X	20050106	20070506
B	Y	20030601	20091211
B	P	20040704	20100806
C	L	20070911	20081120
D	M	20030507	20040506
D	M	20050809	20071001

matches your "want" example.

View solution in original post

12 REPLIES 12
Alice_SAS
Calcite | Level 5

Thanks for your answer. But I forgot one situation. I don't want to merge non-overlapping times. For example, what should I do in the following situation?

sourcetargetstartend
DM2003050720040506
DM2005080920071001

In this situation, I don't want to do any merge, what should I do? Thank you!

andreas_lds
Jade | Level 19

Could be solved by a data step, too, but i don't think, that it is better.

Just to clarify: you don want one obs for each source/target combination with the earliest start date and the latest end date?

Alice_SAS
Calcite | Level 5

For time ranges that are connected or overlap, I want to save the start and end times. For non-overlapping times, I don't want to merge them.

Astounding
PROC Star
Is it possible that a SOURCE/TARGET might contain two separate date ranges that do not overlap? If so, what should the result be?
Alice_SAS
Calcite | Level 5

Yes, that situation exists. If the two separate date ranges don't overlap, I prefer not to merge them and leave them as they are. 

andreas_lds
Jade | Level 19

@Alice_SAS wrote:

Yes, that situation exists. If the two separate date ranges don't overlap, I prefer not to merge them and leave them as they are. 


Please post data containing this case and post the data in usable form.

Alice_SAS
Calcite | Level 5

I have the data:


data have;
input source:$1. target:$1. start: YYMMDD8. end: YYMMDD8.;
format start YYMMDDn8. end YYMMDDn8.;
datalines;
A X 20050106 20070506
B Y 20030601 20060704
B Y 20060704 20080905
B Y 20080905 20091211
B P 20040704 20080905
B P 20061001 20100806
C L 20070911 20081120
D M 20030507 20040506
D M 20050809 20071001
;
run;

 

I want to obtain the form:

data want;
input source:$1. target:$1. start: YYMMDD8. end: YYMMDD8.;
format start YYMMDDn8. end YYMMDDn8.;
datalines;
A X 20050106 20070506
B Y 20030601 20091211
B P 20040704 20100806
C L 20070911 20081120
D M 20030507 20040506
D M 20050809 20071001
;
run;

 

Thank you!

Patrick
Opal | Level 21

For the sample data you shared something like below should work.

data have;
  input source:$1. target:$1. start: YYMMDD8. end: YYMMDD8.;
  format start YYMMDDn8. end YYMMDDn8.;
  datalines;
A X 20050106 20070506
B Y 20030601 20060704
B Y 20060704 20080905
B Y 20080905 20091211
B P 20040704 20080905
B P 20061001 20100806
C L 20070911 20081120
D M 20030507 20040506
D M 20050809 20071001
;
run;

data want;
  input source:$1. target:$1. start: YYMMDD8. end: YYMMDD8.;
  format start YYMMDDn8. end YYMMDDn8.;
  datalines;
A X 20050106 20070506
B Y 20030601 20091211
B P 20040704 20100806
C L 20070911 20081120
D M 20030507 20040506
D M 20050809 20071001
;
run;

proc sort data=have out=have_sorted;
  by source target start end;
run;

data want2;
  set have_sorted;
  by source target start end;

  _next_obs=min(_n_+1,_nobs);
  set have_sorted(keep=start rename=(start=_next_start)) point=_next_obs nobs=_nobs;
  if last.target or end<_next_start then output;
  drop _next_start;
run;

proc print data=want2;
run;

 

Alice_SAS
Calcite | Level 5

Thank you for your help. However, I found an error in the output of your code regarding the results of B Y 20030601  20091211 and B P 20040704 20100806 in the Want table I provided. How can I modify the code?

mkeintz
PROC Star
data want (drop=_:);
  set have (keep=source target);
  by source target  notsorted;

  merge have (rename=(start=_current_start))
        have (firstobs=2 keep=start rename=(start=_nxt_start));

  retain start ;  
  format start yymmddn8. ;

  if first.target=1 or _current_start-1>lag(end) then start=_current_start;
  if last.target=1  or end+1<_nxt_start;
run;

The SET and MERGE statement set up two streams of data, read in parallel. 

The first SET is accompanied by the BY statement, allowing use of first.target and last.target dummies.  The data are assumed to be grouped by source*target, and within each source*target group, to be sorted by start.  (and also sorted by END within each group).

 

The MERGE statement has two "substreams" of data, also read in parallel, but because of the FIRSTOBS=2 option provides a way to look ahead at the upcoming START value (in variable _nxt_start), so upcoming date gaps can easily be found.

 

The first IF identifies the beginning of a continuous sequence of observations.

The second if identified the end of same.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User
data have;
  input source:$1. target:$1. start: YYMMDD8. end: YYMMDD8.;
  format start YYMMDDn8. end YYMMDDn8.;
  datalines;
A X 20050106 20070506
B Y 20030601 20060704
B Y 20060704 20080905
B Y 20080905 20091211
B P 20040704 20080905
B P 20061001 20100806
C L 20070911 20081120
D M 20030507 20040506
D M 20050809 20071001
;

data want;
set have end=done;
set
  have (
    firstobs=2
    keep=source target start
    rename=(source=_source target=_target start=_start)
  )
  have (
    obs=1
    keep=source target start
    rename=(source=_source target=_target start=_start)
  )
;
*by source target;
retain __start;
if _n_ = 1 then __start = start;
*if first.source then __start = start;
if
  source ne _source or target ne _target 
  or _start > end + 1
  or done
then do;
  start = __start;
  output;
  __start = _start;
end;
drop _:;
run;

proc print data=want noobs;
run;

The output:

source	target	start	end
A	X	20050106	20070506
B	Y	20030601	20091211
B	P	20040704	20100806
C	L	20070911	20081120
D	M	20030507	20040506
D	M	20050809	20071001

matches your "want" example.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2166 views
  • 1 like
  • 6 in conversation