DATA Step, Macro, Functions and more

Align overlapping time lines across observations by filling gaps.

Accepted Solution Solved
Reply
Contributor ven
Contributor
Posts: 37
Accepted Solution

Align overlapping time lines across observations by filling gaps.

I am having a large dataset where I need to avoid overlapping time lines for each data group. My input dataset is as follows for data group A.

DATA iddates;

INFORMAT gid $1. pid $1. startdt enddt Date11.;

INPUT gid pid startdt enddt;

FORMAT startdt enddt DDMMYY10.;

DATALINES;

A 1 26-Feb-15 26-Apr-2015

A 2 26-Feb-15 30-Apr-2015

A 3 13-Apr-15 23-Apr-2015

A 4 24-Apr-15 01-Jan-9999

A 5 24-Apr-15 01-Jan-9999

run;


I need to avoid any intersections of timelines and arrange all time s in the ascending order having inserting new records to fill the gaps as follows. Any insights into a solution appreciated. Thank you all for any help.

Once done for Group A there can be one row with sequently aligned start times and end times.

DATA fillgaps;

INFORMAT pid $1. startdt enddt Date11.;

INPUT pid startdt enddt;

FORMAT startdt enddt DDMMYY10.;

DATALINES;

1 26-Feb-15 12-Apr-2015

1 13-Apr-15 23-Apr-2015

1 24-Apr-15 26-Apr-2015

2 26-Feb-15 12-Apr-2015

2 13-Apr-15 23-Apr-2015

2 24-Apr-15 26-Apr-2015

2 27-Apr-15 30-Apr-2015

3 13-Apr-15 23-Apr-2015

4 24-Apr-15 26-Apr-2015

4 27-Apr-15 30-Apr-2015

4 01-May-15 01-Jan-9999

5 24-Apr-15 26-Apr-2015

5 27-Apr-15 30-Apr-2015

5 01-May-15 01-Jan-9999

run;


Accepted Solutions
Solution
‎05-29-2015 08:20 AM
Super User
Posts: 9,681

Re: Align overlapping time lines across observations by filling gaps.

The simplest way is to change '01-Jan-9999'd  into a quit large value like  '01-Jan-3000'd , then run my code .

Another way is assuming there are no date greater than current day - TODAY() , except '01-Jan-3000'd  . then try this one :

data key;

set iddates(keep=startdt rename=(startdt=date))

     iddates(keep=enddt rename=(enddt=date) in=inb);

if inb then date=date+1;

run;

data temp;

if _n_ eq 1 then do;

  declare hash h(dataset:'key');

  h.definekey('date');

  h.definedone();

end;

set iddates;

do date = startdt to enddt;

if h.check()=0 then group+1;

if date le today() or year(date)=9999 then output;

end;

keep pid group date;

run;

data want;

set temp;

by group;

retain start;

if first.group then start=date;

if last.group then do;end=date;output;end;

format start end date9.;

drop group date;

run;

Xia Keshan

View solution in original post


All Replies
Super User
Super User
Posts: 7,401

Re: Align overlapping time lines across observations by filling gaps.

Hi,

I am not sure about your logic.  Why does ID 3 overlap with one, but ID 2 does not?  Here is a program which will give you a transposed table, one column per id, each row being one day.  It is then a simple matter of going through the dataset to find start stop dates.

DATA iddates;

INFORMAT pid $1. startdt enddt Date9.;

INPUT pid startdt enddt;

FORMAT startdt enddt DDMMYY8.;

DATALINES;

1 26-Feb-15 26-Apr-15

2 26-Feb-15 30-Apr-15

3 13-Apr-15 23-Apr-15

4 24-Apr-15 01-Jan-99

5 24-Apr-15 01-Jan-99

run;

data inter;

  set iddates;

  do i=startdt to enddt;

    output;

  end;

  format i date9.;

run;

proc sort data=inter;

  by i pid;

run;

proc transpose data=inter out=i_inter;

  by i;

  var pid;

run;

Contributor ven
Contributor
Posts: 37

Re: Align overlapping time lines across observations by filling gaps.

Hi,

Actually any date range can overlap another, I just update the question to specify the point this is sample data for one group A. once dates are aligned for group A there are only sequently aligned start times and end times without overlapping each other.

in your solution when we do startdt to enddt , for observations where enddt=01-Jan-9999 gives extremely large number of observations to work out.

Super User
Super User
Posts: 7,401

Re: Align overlapping time lines across observations by filling gaps.

Why are you giving an end date of 01jan9999 then?  Why not just replace that:

data inter;

  set iddates;

  if enddt="01jan9999"d then enddt=today();

  do i=startdt to enddt;

    output;

  end;

  format i date9.;

run;

I didn't understand your other point.

Contributor ven
Contributor
Posts: 37

Re: Align overlapping time lines across observations by filling gaps.

Unable to change the dates, since it will affect the validity of other ids, I need to find out valid ids on a given date by creating time lines as below

id
126-Feb-1512-Apr-15
113-Apr-1523-Apr-15
124-Apr-1526-Apr-15
226-Feb-1512-Apr-15
213-Apr-1523-Apr-15
224-Apr-1526-Apr-15
227-Apr-1530-Apr-15
313-Apr-1523-Apr-15
424-Apr-1526-Apr-15
427-Apr-1530-Apr-15
41-May-151-Jan-99
524-Apr-1526-Apr-15
527-Apr-1530-Apr-151-May-151-Jan-99
5
validityids 1,2 are validids 1,2,3 are validids 1,2,3,4 are validids 2,4,5 are validids 4, 5 are vaid
Super User
Posts: 9,681

Re: Align overlapping time lines across observations by filling gaps.

En, Interesting question. 01-Jan-99 is 01-Jan-1999  ?

DATA iddates;
INFORMAT pid $1. startdt enddt Date9.;
INPUT pid startdt enddt;
FORMAT startdt enddt DDMMYY8.;
DATALINES;
1 26-Feb-15 26-Apr-15
2 26-Feb-15 30-Apr-15
3 13-Apr-15 23-Apr-15
4 24-Apr-15 01-Jan-99
5 24-Apr-15 01-Jan-99
;
run;
data key;
 set iddates(keep=startdt rename=(startdt=date))
     iddates(keep=enddt rename=(enddt=date) in=inb);
 if inb then date=date+1;
run;
data temp;
 if _n_ eq 1 then do;
  declare hash h(dataset:'key');
  h.definekey('date');
  h.definedone();
 end;
set iddates;
do date = startdt to enddt;
 if h.check()=0 then group+1;output;
end;
keep pid group date;
run;
data want;
 set temp;
 by group;
 retain start;
 if first.group then start=date;
 if last.group then do;end=date;output;end;
format start end date9.;
drop group date;
run;








Xia Keshan

Contributor ven
Contributor
Posts: 37

Re: Align overlapping time lines across observations by filling gaps.

Hi Keshan,

99 is actually 9999, I've updated the question.

Your answer looks very valid, will check more and update.

Thanks a lot.

Super User
Posts: 9,681

Re: Align overlapping time lines across observations by filling gaps.

En. I think my code could work. Did you run my code on your new data ?

Contributor ven
Contributor
Posts: 37

Re: Align overlapping time lines across observations by filling gaps.

It does, but the problem is when it does startdt to enddt where enddt bcomes 9999-01-01 it creates huge temp dataset which makes the solution unnecessarily data intensive. Trying to find a way to resolve the issue.

Solution
‎05-29-2015 08:20 AM
Super User
Posts: 9,681

Re: Align overlapping time lines across observations by filling gaps.

The simplest way is to change '01-Jan-9999'd  into a quit large value like  '01-Jan-3000'd , then run my code .

Another way is assuming there are no date greater than current day - TODAY() , except '01-Jan-3000'd  . then try this one :

data key;

set iddates(keep=startdt rename=(startdt=date))

     iddates(keep=enddt rename=(enddt=date) in=inb);

if inb then date=date+1;

run;

data temp;

if _n_ eq 1 then do;

  declare hash h(dataset:'key');

  h.definekey('date');

  h.definedone();

end;

set iddates;

do date = startdt to enddt;

if h.check()=0 then group+1;

if date le today() or year(date)=9999 then output;

end;

keep pid group date;

run;

data want;

set temp;

by group;

retain start;

if first.group then start=date;

if last.group then do;end=date;output;end;

format start end date9.;

drop group date;

run;

Xia Keshan

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 400 views
  • 0 likes
  • 3 in conversation