## DATA Step, Macro, Functions and more

Solved
Occasional Contributor
Posts: 7

Hi all,

If i have a table with the following records,

ID     Start Date     End Date

1     1-July-2011     5-july-2011

1     6-july-2011      10-july-2011

1     11-july-2011    16-july-2011

2     1-july-2011      15-july-2011

2     17-july-2011     20-july-2011

How best should i code SAS to return only 3 rows with min and max date for the 2 ids above under the condition that only where dates are connected. I am expecting the following results.

1     1-july-2011     16-july-2011

2     1-july-2011     15-july-2011

2     17-july-2011     20-july-2011

Stuck!

Accepted Solutions
Solution
‎09-11-2011 11:40 PM
PROC Star
Posts: 8,164

It's probably too late at night for me to be looking at your problem, and I've probably made it far too complex, and I may not have included all possible conditions, but the following appears to do what you want.

I'd REALLY suggest testing it with your real data and looking over the results quite carefully:

data have;

informat Start_Date End_Date anydtdte21.;

format Start_Date End_Date date9.;

input id Start_Date End_Date;

cards;

1     1-Jul-2011     5-jul-2011

1     6-jul-2011      10-jul-2011

1     11-jul-2011    16-jul-2011

2     1-jul-2011      15-jul-2011

2     17-jul-2011     20-jul-2011

;

data want (keep=id Start_Date End_Date);

set have (rename=(Start_Date=S_Date

End_Date=E_Date));

format Start_Date End_Date date9.;

by id;

retain Start_Date End_Date;

last_E_Date=lag(E_Date);

if first.id then do;

Start_Date=S_Date;

End_Date=E_Date;

end;

else if last_E_Date+1 eq S_Date then do;

End_Date=E_Date;

if last.id then output;

end;

else if last_E_Date+1 ne S_date then do;

output;

Start_Date=S_Date;

End_Date=E_Date;

if last.id then output;

end;

run;

All Replies
Solution
‎09-11-2011 11:40 PM
PROC Star
Posts: 8,164

It's probably too late at night for me to be looking at your problem, and I've probably made it far too complex, and I may not have included all possible conditions, but the following appears to do what you want.

I'd REALLY suggest testing it with your real data and looking over the results quite carefully:

data have;

informat Start_Date End_Date anydtdte21.;

format Start_Date End_Date date9.;

input id Start_Date End_Date;

cards;

1     1-Jul-2011     5-jul-2011

1     6-jul-2011      10-jul-2011

1     11-jul-2011    16-jul-2011

2     1-jul-2011      15-jul-2011

2     17-jul-2011     20-jul-2011

;

data want (keep=id Start_Date End_Date);

set have (rename=(Start_Date=S_Date

End_Date=E_Date));

format Start_Date End_Date date9.;

by id;

retain Start_Date End_Date;

last_E_Date=lag(E_Date);

if first.id then do;

Start_Date=S_Date;

End_Date=E_Date;

end;

else if last_E_Date+1 eq S_Date then do;

End_Date=E_Date;

if last.id then output;

end;

else if last_E_Date+1 ne S_date then do;

output;

Start_Date=S_Date;

End_Date=E_Date;

if last.id then output;

end;

run;

Occasional Contributor
Posts: 7

Thanks Art!,

that would work if i have only 1 broken date series (i.e. dates 1-5 and 7-10). Would that still work if i multiple broken date series? (i.e. 1-5, 7-10, 11-16, 18-20, 21-25).

I suspect I may need to throw in a do-loop counter in there... but where?

Art may already be asleep, anyone else has any ideas?

thanks heaps!,

Almost there...

PROC Star
Posts: 8,164

It worked for all of the cases in your new example, but you could have tested that yourself.

Frequent Contributor
Posts: 94

Something you might want to watch for is missing or overlapping date ranges.  I often use similarly structured data to your example and this is something I have to keep an eye out for.

Also Art's code would required your input data be correctly sorted.

Posts: 3,852

I usually like to take a data intensive approach.  If you expand the data to one obs per day you can easily handle overlaps, by discarding duplicate days.  You can also easily adjust for a GAP size other than 1 day.

`data have;   input ID  (Start End) (:date.);   format S: E: date11.;   cards;1  1-jul-2011  5-jul-20111  6-jul-2011 10-jul-20111 11-jul-2011 16-jul-20112  1-jul-2011 15-jul-20112 17-jul-2011 20-jul-2011;;;;   run;proc print;   run;data expandV(keep=id date) / view=expandV;   set have;   do date = start to end;      output;      end;   format date date11.;   run;proc sort data=expandV out=expand nodupkey;   by id date;   run;data expandV(keep=id date run) / view=expandV;   set expand;   by id date;   dif = dif(date);   if first.id then do;      dif=1;      run=1;      end;   if dif gt 1 then run + 1;   run;proc summary nway data=expandV;   by id run;   output out=flat(drop=_type_) min(date)=Start max(date)=End;   run;proc print;   run;`
Contributor
Posts: 70

data test;

input id startdt enddt;

informat startdt enddt date9.;

format startdt enddt date9.;

datalines;

1     01Jul2011     05jul2011

1     06jul2011     10jul2011

1     11jul2011     16jul2011

2     01jul2011     15jul2011

2     017jul2011    20jul2011

;

run;

data test1;

set test;

by id;

flag = (startdt - lag(enddt)=1);

run;

proc sql;

create table test3 as

select id, min(startdt) as startdt format date9., max(enddt) as enddt format date9.

from test1

group by id

having count(id)-sum(flag)=1

union

select id, startdt, enddt from test1

where id not in (select distinct id from test1 group by id having count(id)-sum(flag)=1);

quit;

Super User
Posts: 10,778

```data have;
input ID  (Start End) (:date12.);
format S: E: date9.;
cards;
1  1-jul-2011  5-jul-2011
1  6-jul-2011 10-jul-2011
1 11-jul-2011 16-jul-2011
2  1-jul-2011 15-jul-2011
2 17-jul-2011 20-jul-2011
;;;;
run;
data want;
set have;
if (id eq lag(id) and start-lag(end) gt 1 ) or (id ne lag(id)) then flag+1;
run;
data want(keep=id _start end);
set want;
by flag;
retain _start;
if first.flag then _start=start;
if last.flag then output;
format _start date9.;
run;
```

Ksharp

Sorry Find a problem in my origin code.

Occasional Contributor
Posts: 7

Thank you all,

appreciate your responses. Will try these codes when I return back to my desk!

🔒 This topic is solved and locked.