BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ven
Calcite | Level 5 ven
Calcite | Level 5

I am having a large dataset where I need to avoid overlapping time lines for each data group. My input dataset is as follows for data group A.

DATA iddates;

INFORMAT gid $1. pid $1. startdt enddt Date11.;

INPUT gid pid startdt enddt;

FORMAT startdt enddt DDMMYY10.;

DATALINES;

A 1 26-Feb-15 26-Apr-2015

A 2 26-Feb-15 30-Apr-2015

A 3 13-Apr-15 23-Apr-2015

A 4 24-Apr-15 01-Jan-9999

A 5 24-Apr-15 01-Jan-9999

run;


I need to avoid any intersections of timelines and arrange all time s in the ascending order having inserting new records to fill the gaps as follows. Any insights into a solution appreciated. Thank you all for any help.

Once done for Group A there can be one row with sequently aligned start times and end times.

DATA fillgaps;

INFORMAT pid $1. startdt enddt Date11.;

INPUT pid startdt enddt;

FORMAT startdt enddt DDMMYY10.;

DATALINES;

1 26-Feb-15 12-Apr-2015

1 13-Apr-15 23-Apr-2015

1 24-Apr-15 26-Apr-2015

2 26-Feb-15 12-Apr-2015

2 13-Apr-15 23-Apr-2015

2 24-Apr-15 26-Apr-2015

2 27-Apr-15 30-Apr-2015

3 13-Apr-15 23-Apr-2015

4 24-Apr-15 26-Apr-2015

4 27-Apr-15 30-Apr-2015

4 01-May-15 01-Jan-9999

5 24-Apr-15 26-Apr-2015

5 27-Apr-15 30-Apr-2015

5 01-May-15 01-Jan-9999

run;

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

The simplest way is to change '01-Jan-9999'd  into a quit large value like  '01-Jan-3000'd , then run my code .

Another way is assuming there are no date greater than current day - TODAY() , except '01-Jan-3000'd  . then try this one :

data key;

set iddates(keep=startdt rename=(startdt=date))

     iddates(keep=enddt rename=(enddt=date) in=inb);

if inb then date=date+1;

run;

data temp;

if _n_ eq 1 then do;

  declare hash h(dataset:'key');

  h.definekey('date');

  h.definedone();

end;

set iddates;

do date = startdt to enddt;

if h.check()=0 then group+1;

if date le today() or year(date)=9999 then output;

end;

keep pid group date;

run;

data want;

set temp;

by group;

retain start;

if first.group then start=date;

if last.group then do;end=date;output;end;

format start end date9.;

drop group date;

run;

Xia Keshan

View solution in original post

9 REPLIES 9
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

I am not sure about your logic.  Why does ID 3 overlap with one, but ID 2 does not?  Here is a program which will give you a transposed table, one column per id, each row being one day.  It is then a simple matter of going through the dataset to find start stop dates.

DATA iddates;

INFORMAT pid $1. startdt enddt Date9.;

INPUT pid startdt enddt;

FORMAT startdt enddt DDMMYY8.;

DATALINES;

1 26-Feb-15 26-Apr-15

2 26-Feb-15 30-Apr-15

3 13-Apr-15 23-Apr-15

4 24-Apr-15 01-Jan-99

5 24-Apr-15 01-Jan-99

run;

data inter;

  set iddates;

  do i=startdt to enddt;

    output;

  end;

  format i date9.;

run;

proc sort data=inter;

  by i pid;

run;

proc transpose data=inter out=i_inter;

  by i;

  var pid;

run;

ven
Calcite | Level 5 ven
Calcite | Level 5

Hi,

Actually any date range can overlap another, I just update the question to specify the point this is sample data for one group A. once dates are aligned for group A there are only sequently aligned start times and end times without overlapping each other.

in your solution when we do startdt to enddt , for observations where enddt=01-Jan-9999 gives extremely large number of observations to work out.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Why are you giving an end date of 01jan9999 then?  Why not just replace that:

data inter;

  set iddates;

  if enddt="01jan9999"d then enddt=today();

  do i=startdt to enddt;

    output;

  end;

  format i date9.;

run;

I didn't understand your other point.

ven
Calcite | Level 5 ven
Calcite | Level 5

Unable to change the dates, since it will affect the validity of other ids, I need to find out valid ids on a given date by creating time lines as below

id
126-Feb-1512-Apr-15
113-Apr-1523-Apr-15
124-Apr-1526-Apr-15
226-Feb-1512-Apr-15
213-Apr-1523-Apr-15
224-Apr-1526-Apr-15
227-Apr-1530-Apr-15
313-Apr-1523-Apr-15
424-Apr-1526-Apr-15
427-Apr-1530-Apr-15
41-May-151-Jan-99
524-Apr-1526-Apr-15
527-Apr-1530-Apr-151-May-151-Jan-99
5
validityids 1,2 are validids 1,2,3 are validids 1,2,3,4 are validids 2,4,5 are validids 4, 5 are vaid
Ksharp
Super User

En, Interesting question. 01-Jan-99 is 01-Jan-1999  ?

DATA iddates;
INFORMAT pid $1. startdt enddt Date9.;
INPUT pid startdt enddt;
FORMAT startdt enddt DDMMYY8.;
DATALINES;
1 26-Feb-15 26-Apr-15
2 26-Feb-15 30-Apr-15
3 13-Apr-15 23-Apr-15
4 24-Apr-15 01-Jan-99
5 24-Apr-15 01-Jan-99
;
run;
data key;
 set iddates(keep=startdt rename=(startdt=date))
     iddates(keep=enddt rename=(enddt=date) in=inb);
 if inb then date=date+1;
run;
data temp;
 if _n_ eq 1 then do;
  declare hash h(dataset:'key');
  h.definekey('date');
  h.definedone();
 end;
set iddates;
do date = startdt to enddt;
 if h.check()=0 then group+1;output;
end;
keep pid group date;
run;
data want;
 set temp;
 by group;
 retain start;
 if first.group then start=date;
 if last.group then do;end=date;output;end;
format start end date9.;
drop group date;
run;








Xia Keshan

ven
Calcite | Level 5 ven
Calcite | Level 5

Hi Keshan,

99 is actually 9999, I've updated the question.

Your answer looks very valid, will check more and update.

Thanks a lot.

Ksharp
Super User

En. I think my code could work. Did you run my code on your new data ?

ven
Calcite | Level 5 ven
Calcite | Level 5

It does, but the problem is when it does startdt to enddt where enddt bcomes 9999-01-01 it creates huge temp dataset which makes the solution unnecessarily data intensive. Trying to find a way to resolve the issue.

Ksharp
Super User

The simplest way is to change '01-Jan-9999'd  into a quit large value like  '01-Jan-3000'd , then run my code .

Another way is assuming there are no date greater than current day - TODAY() , except '01-Jan-3000'd  . then try this one :

data key;

set iddates(keep=startdt rename=(startdt=date))

     iddates(keep=enddt rename=(enddt=date) in=inb);

if inb then date=date+1;

run;

data temp;

if _n_ eq 1 then do;

  declare hash h(dataset:'key');

  h.definekey('date');

  h.definedone();

end;

set iddates;

do date = startdt to enddt;

if h.check()=0 then group+1;

if date le today() or year(date)=9999 then output;

end;

keep pid group date;

run;

data want;

set temp;

by group;

retain start;

if first.group then start=date;

if last.group then do;end=date;output;end;

format start end date9.;

drop group date;

run;

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1328 views
  • 0 likes
  • 3 in conversation