BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
soomx
Fluorite | Level 6

Hi all,

 

I have a dataset where there are multiple lines per ID, with dates indicating exposure during a period of time. Some of the dates overlap, some do not. Some dates are completely within the time period of earlier or later datalines, while some only overlap certain months. Also, for some people there may be gaps before becoming exposed again.

 

What I would like to do is combine the overlapping periods so that I keep the earliest start and latest end dates for intervals that are continuously overlapping. However, if there is a gap (even if just one day) between the intervals, I would like to keep them separate. Therefore, people can still have multiple lines if they have multiple gaps. I just want to make sure I don't attribute non-exposure periods as being exposed.

 

I've searched on this board and I can't seem to keep the gaps. 

 

Thanks!

 

 

HAVE:

 

ID                    Start                End

ID_A
10/24/131/21/15
ID_A
9/25/1511/19/16
ID_A
1/22/161/21/17
ID_A
12/12/1612/31/17
ID_A
1/22/171/21/18
ID_A
3/30/172/6/18
ID_B6/30/179/27/18
ID_B9/20/179/19/18
ID_B9/24/179/23/18
ID_B10/13/1712/31/18
ID_B10/23/1710/23/18

 

WANT:

ID_A10/24/131/21/15
ID_A9/25/152/6/18
   
ID_B6/30/1712/31/18
1 ACCEPTED SOLUTION

Accepted Solutions
yabwon
Onyx | Level 15

HI @soomx ,

 

try this step by step approach:

data have;
input ID $ Start : mmddyy8. End : mmddyy8.;
format Start mmddyy10. End mmddyy10.;
cards;
ID_A 10/24/13 1/21/15
ID_A 9/25/15 11/19/16
ID_A 1/22/16 1/21/17
ID_A 12/12/16 12/31/17
ID_A 1/22/17 1/21/18
ID_A 3/30/17 2/6/18
ID_B 6/30/17 9/27/18
ID_B 9/20/17 9/19/18
ID_B 9/24/17 9/23/18
ID_B 10/13/17 12/31/18
ID_B 10/23/17 10/23/18
; 
run;

data have2;
set
  have(keep = ID Start in=inStart rename=(Start=date))
  have(keep = ID End   in=inEnd   rename=(End  =date))
;
if inStart then marker = 1;
           else marker =-1;

run;

proc sort data = have2;
  by ID Date;
run;

data have3;
  set have2;
  by ID;
  if first.ID then cumulator = 0;
  cumulator + marker;
run;

data have4;
  set have3;
  by ID;
 
  if first.ID or lag(cumulator) = 0 then
    do;
      retain Start End;
      format Start mmddyy10. End mmddyy10.;
      Start = Date;
      End   = Date;
    end;

  Start = Start >< Date;
  End   = End   <> Date;

  if cumulator = 0 then output;
  keep ID Start End;
run;
proc print;
run;

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



View solution in original post

11 REPLIES 11
yabwon
Onyx | Level 15

HI @soomx ,

 

try this step by step approach:

data have;
input ID $ Start : mmddyy8. End : mmddyy8.;
format Start mmddyy10. End mmddyy10.;
cards;
ID_A 10/24/13 1/21/15
ID_A 9/25/15 11/19/16
ID_A 1/22/16 1/21/17
ID_A 12/12/16 12/31/17
ID_A 1/22/17 1/21/18
ID_A 3/30/17 2/6/18
ID_B 6/30/17 9/27/18
ID_B 9/20/17 9/19/18
ID_B 9/24/17 9/23/18
ID_B 10/13/17 12/31/18
ID_B 10/23/17 10/23/18
; 
run;

data have2;
set
  have(keep = ID Start in=inStart rename=(Start=date))
  have(keep = ID End   in=inEnd   rename=(End  =date))
;
if inStart then marker = 1;
           else marker =-1;

run;

proc sort data = have2;
  by ID Date;
run;

data have3;
  set have2;
  by ID;
  if first.ID then cumulator = 0;
  cumulator + marker;
run;

data have4;
  set have3;
  by ID;
 
  if first.ID or lag(cumulator) = 0 then
    do;
      retain Start End;
      format Start mmddyy10. End mmddyy10.;
      Start = Date;
      End   = Date;
    end;

  Start = Start >< Date;
  End   = End   <> Date;

  if cumulator = 0 then output;
  keep ID Start End;
run;
proc print;
run;

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



soomx
Fluorite | Level 6

@yabwon 

 

Thank you for checking the code. Your code works perfectly.

 

I can't seem to respond to your message, but I had a quick follow-up question. 

 

One question I had was, what to do if I allow for a gap of 1 day between two intervals and count that as one continuous one. 

For example:

 
 ID Start End
ID_A30JUN201727SEP2018
ID_A20SEP201719SEP2017
ID_A20SEP201719SEP2018
ID_A24SEP201723SEP2018
ID_A13OCT201731DEC2018
ID_A23OCT201723OCT2018

 

Using your code gives me:

 ID_A30JUN201719SEP2017
 ID_A20SEP201731DEC2018

 

 

If the end date is <=1 day of the next start date, I would like to combine into this:

ID_A30JUN201731DEC2018

 

Is this possible to do within the last data step?

 

Thanks!

PhilC
Rhodochrosite | Level 12

Thanks for the inspiration!  This helps me too.  But I'm pretty sure, so math works in every case, that we should sort by your marker variable.  Incase we have Start and End dates that coincide and are given in HAVE in the wrong order, we do want the accumulator to be added to before being subtracted.

proc sort data = have2;
  by ID Date /**/ descending marker /**/;
run;
Ksharp
Super User
data have;
input ID $ Start : mmddyy8. End : mmddyy8.;
format Start mmddyy10. End mmddyy10.;
cards;
ID_A 10/24/13 1/21/15
ID_A 9/25/15 11/19/16
ID_A 1/22/16 1/21/17
ID_A 12/12/16 12/31/17
ID_A 1/22/17 1/21/18
ID_A 3/30/17 2/6/18
ID_B 6/30/17 9/27/18
ID_B 9/20/17 9/19/18
ID_B 9/24/17 9/23/18
ID_B 10/13/17 12/31/18
ID_B 10/23/17 10/23/18
; 
run;
data temp;
 set have;
 by id;
 dif=start-lag(end);
 if first.id then dif=.;
 if first.id or dif>1 then group+1;
run;
data want;
do until(last.group);
  set temp;
  by group;
  if first.group then s=start;
end;
e=end;
format s e mmddyy10.;
drop start end dif;
run;
yabwon
Onyx | Level 15

Hi @soomx ,

 

You already have an answer for your follow up question. It was @Ksharp who already gave you the code, which requires to replace only one one to two 😉

 

data temp;
 set have;
 by id;
 dif=start-lag(end);
 if first.id then dif=.;
 if first.id or dif>2 then group+1; /* here is the line to change */
run;
proc summary data=temp nway;
class id group;
var start end;
output out=want min(start)= max(end)=;
run;

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Ksharp
Super User

OK. No problem.

 

 

data have;
input ID $ Start : mmddyy8. End : mmddyy8.;
format Start mmddyy10. End mmddyy10.;
cards;
ID_A 10/24/13 1/21/15
ID_A 11/24/13 1/21/14
ID_A 9/25/15 11/19/16
ID_A 1/22/16 1/21/17
ID_A 12/12/16 12/31/17
ID_A 1/22/17 1/21/18
ID_A 3/30/17 2/6/18
ID_B 6/30/17 9/27/18
ID_B 9/20/17 9/19/18
ID_B 9/24/17 9/23/18
ID_B 10/13/17 12/31/18
ID_B 10/23/17 10/23/18
; 
run;
data temp;
 set have;
 by id;
 dif=start-lag(end);
 if first.id then dif=.;
 if first.id or dif>1 then group+1;
run;
proc summary data=temp nway;
class id group;
var start end;
output out=want min(start)= max(end)=;
run;
soomx
Fluorite | Level 6

Thank you so much! It worked perfectly.

 

EDIT: Revised. The code by Ksharp didn't quite work when dates were entirely within other intervals.

Ksharp
Super User

Hi. @yabwon  thanks keep finding inside/data error .

I think this could include everything , but not efficient.

 

 

data have;
input ID $ Start : mmddyy8. End : mmddyy8.;
format Start mmddyy10. End mmddyy10.;
cards;
ID_A 10/24/13 1/21/18
ID_A 11/24/13 1/21/14
ID_A 11/24/15 1/21/17
ID_A 9/25/15 11/19/16
ID_A 1/22/16 1/21/17
ID_A 12/12/16 12/31/17
ID_A 1/22/17 1/21/18
ID_A 1/30/18 2/6/18
ID_B 6/30/17 9/27/18
ID_B 9/20/17 9/19/18
ID_B 9/24/17 9/23/18
ID_B 10/13/17 12/31/18
ID_B 10/23/17 10/23/18
; 
run;
data temp;
 set have;
 do date=start to end;
  output;
 end;
keep id date;
format date mmddyy10.;
run;
proc sort data=temp nodupkey sortsize=max noequals;
by id date;
run;
data temp1;
 set temp;
 by id;
 if first.id or dif(date)>1 then group+1;
run;

proc summary data=temp1 ;
by id group;
var date;
output out=want min=start max=end;
run;
yabwon
Onyx | Level 15

Hi @Ksharp ,

 

Few years ago I was exposed to a similar problem on >1MLN customer's base, back then it punched me all possible cases. Learning is pain 😜

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Ksharp
Super User

Hi @yabwon  . That is awful task , but I like challenge . Hope I have some time to keep researching more BETTER code .

Ksharp
Super User

Hi. @yabwon  ,

I figure out a Hash solution.

What do you think? Or you could test some data ?

 

data have;
input ID $ Start : mmddyy8. End : mmddyy8.;
format Start mmddyy10. End mmddyy10.;
cards;
ID_A 10/24/13 1/21/18
ID_A 11/24/13 1/21/14
ID_A 11/24/15 1/21/17
ID_A 9/25/15 11/19/16
ID_A 1/22/16 1/21/17
ID_A 12/12/16 12/31/17
ID_A 1/22/17 1/21/18
ID_A 1/30/18 2/6/18
ID_B 6/30/17 9/27/18
ID_B 9/20/17 9/19/18
ID_B 9/24/17 9/23/18
ID_B 10/13/17 12/31/18
ID_B 10/23/17 10/23/18
; 

data want;
 if _n_=1 then do;
  declare hash h(ordered:'y');
  declare hiter hi('h');
  h.definekey('date');
  h.definedata('date');
  h.definedone();
 end;
do until(last.id);
 set have;
 by id;
 do date=start to end;
   h.ref();
 end;
end;

rc=hi.first();
do i=1 by 1 while(rc=0);
 if i=1 then do;_start=date;_date=date;end;
 else do;
   if _date+1 ne date then do;
    _end=_date;output;
    _start=date;
   end;
    _date=date;
  end;
  rc=hi.next();
end;
 _end=_date;output;

h.clear();

keep id _start _end;
format _start _end mmddyy10.;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 5246 views
  • 4 likes
  • 4 in conversation