I am having a large dataset where I need to avoid overlapping time lines for each data group. My input dataset is as follows for data group A.
DATA iddates;
INFORMAT gid $1. pid $1. startdt enddt Date11.;
INPUT gid pid startdt enddt;
FORMAT startdt enddt DDMMYY10.;
DATALINES;
A 1 26-Feb-15 26-Apr-2015
A 2 26-Feb-15 30-Apr-2015
A 3 13-Apr-15 23-Apr-2015
A 4 24-Apr-15 01-Jan-9999
A 5 24-Apr-15 01-Jan-9999
run;
I need to avoid any intersections of timelines and arrange all time s in the ascending order having inserting new records to fill the gaps as follows. Any insights into a solution appreciated. Thank you all for any help.
Once done for Group A there can be one row with sequently aligned start times and end times.
DATA fillgaps;
INFORMAT pid $1. startdt enddt Date11.;
INPUT pid startdt enddt;
FORMAT startdt enddt DDMMYY10.;
DATALINES;
1 26-Feb-15 12-Apr-2015
1 13-Apr-15 23-Apr-2015
1 24-Apr-15 26-Apr-2015
2 26-Feb-15 12-Apr-2015
2 13-Apr-15 23-Apr-2015
2 24-Apr-15 26-Apr-2015
2 27-Apr-15 30-Apr-2015
3 13-Apr-15 23-Apr-2015
4 24-Apr-15 26-Apr-2015
4 27-Apr-15 30-Apr-2015
4 01-May-15 01-Jan-9999
5 24-Apr-15 26-Apr-2015
5 27-Apr-15 30-Apr-2015
5 01-May-15 01-Jan-9999
run;
The simplest way is to change '01-Jan-9999'd into a quit large value like '01-Jan-3000'd , then run my code .
Another way is assuming there are no date greater than current day - TODAY() , except '01-Jan-3000'd . then try this one :
data key;
set iddates(keep=startdt rename=(startdt=date))
iddates(keep=enddt rename=(enddt=date) in=inb);
if inb then date=date+1;
run;
data temp;
if _n_ eq 1 then do;
declare hash h(dataset:'key');
h.definekey('date');
h.definedone();
end;
set iddates;
do date = startdt to enddt;
if h.check()=0 then group+1;
if date le today() or year(date)=9999 then output;
end;
keep pid group date;
run;
data want;
set temp;
by group;
retain start;
if first.group then start=date;
if last.group then do;end=date;output;end;
format start end date9.;
drop group date;
run;
Xia Keshan
Hi,
I am not sure about your logic. Why does ID 3 overlap with one, but ID 2 does not? Here is a program which will give you a transposed table, one column per id, each row being one day. It is then a simple matter of going through the dataset to find start stop dates.
DATA iddates;
INFORMAT pid $1. startdt enddt Date9.;
INPUT pid startdt enddt;
FORMAT startdt enddt DDMMYY8.;
DATALINES;
1 26-Feb-15 26-Apr-15
2 26-Feb-15 30-Apr-15
3 13-Apr-15 23-Apr-15
4 24-Apr-15 01-Jan-99
5 24-Apr-15 01-Jan-99
run;
data inter;
set iddates;
do i=startdt to enddt;
output;
end;
format i date9.;
run;
proc sort data=inter;
by i pid;
run;
proc transpose data=inter out=i_inter;
by i;
var pid;
run;
Hi,
Actually any date range can overlap another, I just update the question to specify the point this is sample data for one group A. once dates are aligned for group A there are only sequently aligned start times and end times without overlapping each other.
in your solution when we do startdt to enddt , for observations where enddt=01-Jan-9999 gives extremely large number of observations to work out.
Why are you giving an end date of 01jan9999 then? Why not just replace that:
data inter;
set iddates;
if enddt="01jan9999"d then enddt=today();
do i=startdt to enddt;
output;
end;
format i date9.;
run;
I didn't understand your other point.
Unable to change the dates, since it will affect the validity of other ids, I need to find out valid ids on a given date by creating time lines as below
id | ||||||||||
1 | 26-Feb-15 | 12-Apr-15 | ||||||||
1 | 13-Apr-15 | 23-Apr-15 | ||||||||
1 | 24-Apr-15 | 26-Apr-15 | ||||||||
2 | 26-Feb-15 | 12-Apr-15 | ||||||||
2 | 13-Apr-15 | 23-Apr-15 | ||||||||
2 | 24-Apr-15 | 26-Apr-15 | ||||||||
2 | 27-Apr-15 | 30-Apr-15 | ||||||||
3 | 13-Apr-15 | 23-Apr-15 | ||||||||
4 | 24-Apr-15 | 26-Apr-15 | ||||||||
4 | 27-Apr-15 | 30-Apr-15 | ||||||||
4 | 1-May-15 | 1-Jan-99 | ||||||||
5 | 24-Apr-15 | 26-Apr-15 | ||||||||
5 | 27-Apr-15 | 30-Apr-15 | 1-May-15 | 1-Jan-99 | ||||||
5 | ||||||||||
validity | ids 1,2 are valid | ids 1,2,3 are valid | ids 1,2,3,4 are valid | ids 2,4,5 are valid | ids 4, 5 are vaid |
En, Interesting question. 01-Jan-99 is 01-Jan-1999 ?
DATA iddates; INFORMAT pid $1. startdt enddt Date9.; INPUT pid startdt enddt; FORMAT startdt enddt DDMMYY8.; DATALINES; 1 26-Feb-15 26-Apr-15 2 26-Feb-15 30-Apr-15 3 13-Apr-15 23-Apr-15 4 24-Apr-15 01-Jan-99 5 24-Apr-15 01-Jan-99 ; run; data key; set iddates(keep=startdt rename=(startdt=date)) iddates(keep=enddt rename=(enddt=date) in=inb); if inb then date=date+1; run; data temp; if _n_ eq 1 then do; declare hash h(dataset:'key'); h.definekey('date'); h.definedone(); end; set iddates; do date = startdt to enddt; if h.check()=0 then group+1;output; end; keep pid group date; run; data want; set temp; by group; retain start; if first.group then start=date; if last.group then do;end=date;output;end; format start end date9.; drop group date; run;
Xia Keshan
Hi Keshan,
99 is actually 9999, I've updated the question.
Your answer looks very valid, will check more and update.
Thanks a lot.
En. I think my code could work. Did you run my code on your new data ?
It does, but the problem is when it does startdt to enddt where enddt bcomes 9999-01-01 it creates huge temp dataset which makes the solution unnecessarily data intensive. Trying to find a way to resolve the issue.
The simplest way is to change '01-Jan-9999'd into a quit large value like '01-Jan-3000'd , then run my code .
Another way is assuming there are no date greater than current day - TODAY() , except '01-Jan-3000'd . then try this one :
data key;
set iddates(keep=startdt rename=(startdt=date))
iddates(keep=enddt rename=(enddt=date) in=inb);
if inb then date=date+1;
run;
data temp;
if _n_ eq 1 then do;
declare hash h(dataset:'key');
h.definekey('date');
h.definedone();
end;
set iddates;
do date = startdt to enddt;
if h.check()=0 then group+1;
if date le today() or year(date)=9999 then output;
end;
keep pid group date;
run;
data want;
set temp;
by group;
retain start;
if first.group then start=date;
if last.group then do;end=date;output;end;
format start end date9.;
drop group date;
run;
Xia Keshan
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.