Solved: Re: Please Help! - Min Max dates only when Records have connecting da...

tsengj · Posted 09-11-2011 10:54 PM

Hi all,

been thinking about this the whole week. your thoughts or help is much appreciated.

If i have a table with the following records,

ID Start Date End Date

1 1-July-2011 5-july-2011

1 6-july-2011 10-july-2011

1 11-july-2011 16-july-2011

2 1-july-2011 15-july-2011

2 17-july-2011 20-july-2011

How best should i code SAS to return only 3 rows with min and max date for the 2 ids above under the condition that only where dates are connected. I am expecting the following results.

1 1-july-2011 16-july-2011

2 1-july-2011 15-july-2011

2 17-july-2011 20-july-2011

Please help!,

Stuck!

art297 · Posted 09-11-2011 11:40 PM

It's probably too late at night for me to be looking at your problem, and I've probably made it far too complex, and I may not have included all possible conditions, but the following appears to do what you want.

I'd REALLY suggest testing it with your real data and looking over the results quite carefully:

data have;

informat Start_Date End_Date anydtdte21.;

format Start_Date End_Date date9.;

input id Start_Date End_Date;

cards;

1 1-Jul-2011 5-jul-2011

1 6-jul-2011 10-jul-2011

1 11-jul-2011 16-jul-2011

2 1-jul-2011 15-jul-2011

2 17-jul-2011 20-jul-2011

;

data want (keep=id Start_Date End_Date);

set have (rename=(Start_Date=S_Date

End_Date=E_Date));

format Start_Date End_Date date9.;

by id;

retain Start_Date End_Date;

last_E_Date=lag(E_Date);

if first.id then do;

Start_Date=S_Date;

End_Date=E_Date;

end;

else if last_E_Date+1 eq S_Date then do;

End_Date=E_Date;

if last.id then output;

end;

else if last_E_Date+1 ne S_date then do;

output;

Start_Date=S_Date;

End_Date=E_Date;

if last.id then output;

end;

run;

View solution in original post

art297 · Posted 09-11-2011 11:40 PM

It's probably too late at night for me to be looking at your problem, and I've probably made it far too complex, and I may not have included all possible conditions, but the following appears to do what you want.

I'd REALLY suggest testing it with your real data and looking over the results quite carefully:

data have;

informat Start_Date End_Date anydtdte21.;

format Start_Date End_Date date9.;

input id Start_Date End_Date;

cards;

1 1-Jul-2011 5-jul-2011

1 6-jul-2011 10-jul-2011

1 11-jul-2011 16-jul-2011

2 1-jul-2011 15-jul-2011

2 17-jul-2011 20-jul-2011

;

data want (keep=id Start_Date End_Date);

set have (rename=(Start_Date=S_Date

End_Date=E_Date));

format Start_Date End_Date date9.;

by id;

retain Start_Date End_Date;

last_E_Date=lag(E_Date);

if first.id then do;

Start_Date=S_Date;

End_Date=E_Date;

end;

else if last_E_Date+1 eq S_Date then do;

End_Date=E_Date;

if last.id then output;

end;

else if last_E_Date+1 ne S_date then do;

output;

Start_Date=S_Date;

End_Date=E_Date;

if last.id then output;

end;

run;

tsengj · Posted 09-12-2011 12:20 AM

Thanks Art!,

that would work if i have only 1 broken date series (i.e. dates 1-5 and 7-10). Would that still work if i multiple broken date series? (i.e. 1-5, 7-10, 11-16, 18-20, 21-25).

I suspect I may need to throw in a do-loop counter in there... but where?

Art may already be asleep, anyone else has any ideas?

thanks heaps!,

Almost there...

art297 · Posted 09-12-2011 05:29 AM

It worked for all of the cases in your new example, but you could have tested that yourself.

DF · Posted 09-13-2011 09:33 AM

Something you might want to watch for is missing or overlapping date ranges. I often use similarly structured data to your example and this is something I have to keep an eye out for.

Also Art's code would required your input data be correctly sorted.

data_null__ · Posted 09-13-2011 10:53 AM

I usually like to take a data intensive approach. If you expand the data to one obs per day you can easily handle overlaps, by discarding duplicate days. You can also easily adjust for a GAP size other than 1 day.

data have;
   input ID  (Start End) (:date.);
   format S: E: date11.;
   cards;
1  1-jul-2011  5-jul-2011
1  6-jul-2011 10-jul-2011
1 11-jul-2011 16-jul-2011
2  1-jul-2011 15-jul-2011
2 17-jul-2011 20-jul-2011
;;;;
   run;
proc print;
   run;
data expandV(keep=id date) / view=expandV;
   set have;
   do date = start to end;
      output;
      end;
   format date date11.;
   run;
proc sort data=expandV out=expand nodupkey;
   by id date;
   run;
data expandV(keep=id date run) / view=expandV;
   set expand;
   by id date;
   dif = dif(date);
   if first.id then do;
      dif=1;
      run=1;
      end;
   if dif gt 1 then run + 1;
   run;
proc summary nway data=expandV;
   by id run;
   output out=flat(drop=_type_) min(date)=Start max(date)=End;
   run;
proc print;
   run;

Fisher · Posted 09-13-2011 01:59 PM

data test;

input id startdt enddt;

informat startdt enddt date9.;

format startdt enddt date9.;

datalines;

1 01Jul2011 05jul2011

1 06jul2011 10jul2011

1 11jul2011 16jul2011

2 01jul2011 15jul2011

2 017jul2011 20jul2011

;

run;

data test1;

set test;

by id;

flag = (startdt - lag(enddt)=1);

run;

proc sql;

create table test3 as

select id, min(startdt) as startdt format date9., max(enddt) as enddt format date9.

from test1

group by id

having count(id)-sum(flag)=1

union

select id, startdt, enddt from test1

where id not in (select distinct id from test1 group by id having count(id)-sum(flag)=1);

quit;

Ksharp · Posted 09-14-2011 07:25 AM

How about:

data have;
   input ID  (Start End) (:date12.);
   format S: E: date9.;
   cards;
1  1-jul-2011  5-jul-2011
1  6-jul-2011 10-jul-2011
1 11-jul-2011 16-jul-2011
2  1-jul-2011 15-jul-2011
2 17-jul-2011 20-jul-2011
;;;;
   run;
data want;
 set have;
 if (id eq lag(id) and start-lag(end) gt 1 ) or (id ne lag(id)) then flag+1;
run;
data want(keep=id _start end);
 set want;
 by flag;
 retain _start;
 if first.flag then _start=start;
 if last.flag then output;
 format _start date9.;
run;

Ksharp

Sorry Find a problem in my origin code.

消息编辑者为：xia keshan

消息编辑者为：xia keshan still a problem

tsengj · Posted 09-16-2011 12:41 AM

Thank you all,

appreciate your responses. Will try these codes when I return back to my desk!

Please Help! - Min Max dates only when Records have connecting dates

Please Help! - Min Max dates only when Records have connecting dates

Please Help! - Min Max dates only when Records have connecting dates

Re: Please Help! - Min Max dates only when Records have connecting dates

Re: Please Help! - Min Max dates only when Records have connecting dates

Please Help! - Min Max dates only when Records have connecting dates

Re: Please Help! - Min Max dates only when Records have connecting dates

Re: Please Help! - Min Max dates only when Records have connecting dates

Re: Please Help! - Min Max dates only when Records have connecting dates

Re: Please Help! - Min Max dates only when Records have connecting dates

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away