I am having a large dataset where I need to avoid overlapping time lines for each data group. My input dataset is as follows for data group A.
DATA iddates;
INFORMAT gid $1. pid $1. startdt enddt Date11.;
INPUT gid pid startdt enddt;
FORMAT startdt enddt DDMMYY10.;
DATALINES;
A 1 26-Feb-15 26-Apr-2015
A 2 26-Feb-15 30-Apr-2015
A 3 13-Apr-15 23-Apr-2015
A 4 24-Apr-15 01-Jan-9999
A 5 24-Apr-15 01-Jan-9999
run;
I need to avoid any intersections of timelines and arrange all time s in the ascending order having inserting new records to fill the gaps as follows. Any insights into a solution appreciated. Thank you all for any help.
Once done for Group A there can be one row with sequently aligned start times and end times.
DATA fillgaps;
INFORMAT pid $1. startdt enddt Date11.;
INPUT pid startdt enddt;
FORMAT startdt enddt DDMMYY10.;
DATALINES;
1 26-Feb-15 12-Apr-2015
1 13-Apr-15 23-Apr-2015
1 24-Apr-15 26-Apr-2015
2 26-Feb-15 12-Apr-2015
2 13-Apr-15 23-Apr-2015
2 24-Apr-15 26-Apr-2015
2 27-Apr-15 30-Apr-2015
3 13-Apr-15 23-Apr-2015
4 24-Apr-15 26-Apr-2015
4 27-Apr-15 30-Apr-2015
4 01-May-15 01-Jan-9999
5 24-Apr-15 26-Apr-2015
5 27-Apr-15 30-Apr-2015
5 01-May-15 01-Jan-9999
run;
The simplest way is to change '01-Jan-9999'd into a quit large value like '01-Jan-3000'd , then run my code .
Another way is assuming there are no date greater than current day - TODAY() , except '01-Jan-3000'd . then try this one :
data key;
set iddates(keep=startdt rename=(startdt=date))
iddates(keep=enddt rename=(enddt=date) in=inb);
if inb then date=date+1;
run;
data temp;
if _n_ eq 1 then do;
declare hash h(dataset:'key');
h.definekey('date');
h.definedone();
end;
set iddates;
do date = startdt to enddt;
if h.check()=0 then group+1;
if date le today() or year(date)=9999 then output;
end;
keep pid group date;
run;
data want;
set temp;
by group;
retain start;
if first.group then start=date;
if last.group then do;end=date;output;end;
format start end date9.;
drop group date;
run;
Xia Keshan
Hi,
I am not sure about your logic. Why does ID 3 overlap with one, but ID 2 does not? Here is a program which will give you a transposed table, one column per id, each row being one day. It is then a simple matter of going through the dataset to find start stop dates.
DATA iddates;
INFORMAT pid $1. startdt enddt Date9.;
INPUT pid startdt enddt;
FORMAT startdt enddt DDMMYY8.;
DATALINES;
1 26-Feb-15 26-Apr-15
2 26-Feb-15 30-Apr-15
3 13-Apr-15 23-Apr-15
4 24-Apr-15 01-Jan-99
5 24-Apr-15 01-Jan-99
run;
data inter;
set iddates;
do i=startdt to enddt;
output;
end;
format i date9.;
run;
proc sort data=inter;
by i pid;
run;
proc transpose data=inter out=i_inter;
by i;
var pid;
run;
Hi,
Actually any date range can overlap another, I just update the question to specify the point this is sample data for one group A. once dates are aligned for group A there are only sequently aligned start times and end times without overlapping each other.
in your solution when we do startdt to enddt , for observations where enddt=01-Jan-9999 gives extremely large number of observations to work out.
Why are you giving an end date of 01jan9999 then? Why not just replace that:
data inter;
set iddates;
if enddt="01jan9999"d then enddt=today();
do i=startdt to enddt;
output;
end;
format i date9.;
run;
I didn't understand your other point.
Unable to change the dates, since it will affect the validity of other ids, I need to find out valid ids on a given date by creating time lines as below
id | ||||||||||
1 | 26-Feb-15 | 12-Apr-15 | ||||||||
1 | 13-Apr-15 | 23-Apr-15 | ||||||||
1 | 24-Apr-15 | 26-Apr-15 | ||||||||
2 | 26-Feb-15 | 12-Apr-15 | ||||||||
2 | 13-Apr-15 | 23-Apr-15 | ||||||||
2 | 24-Apr-15 | 26-Apr-15 | ||||||||
2 | 27-Apr-15 | 30-Apr-15 | ||||||||
3 | 13-Apr-15 | 23-Apr-15 | ||||||||
4 | 24-Apr-15 | 26-Apr-15 | ||||||||
4 | 27-Apr-15 | 30-Apr-15 | ||||||||
4 | 1-May-15 | 1-Jan-99 | ||||||||
5 | 24-Apr-15 | 26-Apr-15 | ||||||||
5 | 27-Apr-15 | 30-Apr-15 | 1-May-15 | 1-Jan-99 | ||||||
5 | ||||||||||
validity | ids 1,2 are valid | ids 1,2,3 are valid | ids 1,2,3,4 are valid | ids 2,4,5 are valid | ids 4, 5 are vaid |
En, Interesting question. 01-Jan-99 is 01-Jan-1999 ?
DATA iddates; INFORMAT pid $1. startdt enddt Date9.; INPUT pid startdt enddt; FORMAT startdt enddt DDMMYY8.; DATALINES; 1 26-Feb-15 26-Apr-15 2 26-Feb-15 30-Apr-15 3 13-Apr-15 23-Apr-15 4 24-Apr-15 01-Jan-99 5 24-Apr-15 01-Jan-99 ; run; data key; set iddates(keep=startdt rename=(startdt=date)) iddates(keep=enddt rename=(enddt=date) in=inb); if inb then date=date+1; run; data temp; if _n_ eq 1 then do; declare hash h(dataset:'key'); h.definekey('date'); h.definedone(); end; set iddates; do date = startdt to enddt; if h.check()=0 then group+1;output; end; keep pid group date; run; data want; set temp; by group; retain start; if first.group then start=date; if last.group then do;end=date;output;end; format start end date9.; drop group date; run;
Xia Keshan
Hi Keshan,
99 is actually 9999, I've updated the question.
Your answer looks very valid, will check more and update.
Thanks a lot.
En. I think my code could work. Did you run my code on your new data ?
It does, but the problem is when it does startdt to enddt where enddt bcomes 9999-01-01 it creates huge temp dataset which makes the solution unnecessarily data intensive. Trying to find a way to resolve the issue.
The simplest way is to change '01-Jan-9999'd into a quit large value like '01-Jan-3000'd , then run my code .
Another way is assuming there are no date greater than current day - TODAY() , except '01-Jan-3000'd . then try this one :
data key;
set iddates(keep=startdt rename=(startdt=date))
iddates(keep=enddt rename=(enddt=date) in=inb);
if inb then date=date+1;
run;
data temp;
if _n_ eq 1 then do;
declare hash h(dataset:'key');
h.definekey('date');
h.definedone();
end;
set iddates;
do date = startdt to enddt;
if h.check()=0 then group+1;
if date le today() or year(date)=9999 then output;
end;
keep pid group date;
run;
data want;
set temp;
by group;
retain start;
if first.group then start=date;
if last.group then do;end=date;output;end;
format start end date9.;
drop group date;
run;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.