DATA Step, Macro, Functions and more

Please Help! - Min Max dates only when Records have connecting dates

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Please Help! - Min Max dates only when Records have connecting dates

Hi all,

been thinking about this the whole week. your thoughts or help is much appreciated.

If i have a table with the following records,

ID     Start Date     End Date

1     1-July-2011     5-july-2011

1     6-july-2011      10-july-2011

1     11-july-2011    16-july-2011

2     1-july-2011      15-july-2011

2     17-july-2011     20-july-2011

How best should i code SAS to return only 3 rows with min and max date for the 2 ids above under the condition that only where dates are connected. I am expecting the following results.

1     1-july-2011     16-july-2011

2     1-july-2011     15-july-2011

2     17-july-2011     20-july-2011

Please help!,

Stuck!


Accepted Solutions
Solution
‎09-11-2011 11:40 PM
PROC Star
Posts: 7,363

Please Help! - Min Max dates only when Records have connecting dates

It's probably too late at night for me to be looking at your problem, and I've probably made it far too complex, and I may not have included all possible conditions, but the following appears to do what you want.

I'd REALLY suggest testing it with your real data and looking over the results quite carefully:

data have;

  informat Start_Date End_Date anydtdte21.;

  format Start_Date End_Date date9.;

  input id Start_Date End_Date;

  cards;

1     1-Jul-2011     5-jul-2011

1     6-jul-2011      10-jul-2011

1     11-jul-2011    16-jul-2011

2     1-jul-2011      15-jul-2011

2     17-jul-2011     20-jul-2011

;

data want (keep=id Start_Date End_Date);

  set have (rename=(Start_Date=S_Date

                    End_Date=E_Date));

  format Start_Date End_Date date9.;

  by id;

  retain Start_Date End_Date;

  last_E_Date=lag(E_Date);

  if first.id then do;

    Start_Date=S_Date;

    End_Date=E_Date;

  end;

  else if last_E_Date+1 eq S_Date then do;

    End_Date=E_Date;

    if last.id then output;

  end;

  else if last_E_Date+1 ne S_date then do;

    output;

    Start_Date=S_Date;

    End_Date=E_Date;

    if last.id then output;

  end;

run;

View solution in original post


All Replies
Solution
‎09-11-2011 11:40 PM
PROC Star
Posts: 7,363

Please Help! - Min Max dates only when Records have connecting dates

It's probably too late at night for me to be looking at your problem, and I've probably made it far too complex, and I may not have included all possible conditions, but the following appears to do what you want.

I'd REALLY suggest testing it with your real data and looking over the results quite carefully:

data have;

  informat Start_Date End_Date anydtdte21.;

  format Start_Date End_Date date9.;

  input id Start_Date End_Date;

  cards;

1     1-Jul-2011     5-jul-2011

1     6-jul-2011      10-jul-2011

1     11-jul-2011    16-jul-2011

2     1-jul-2011      15-jul-2011

2     17-jul-2011     20-jul-2011

;

data want (keep=id Start_Date End_Date);

  set have (rename=(Start_Date=S_Date

                    End_Date=E_Date));

  format Start_Date End_Date date9.;

  by id;

  retain Start_Date End_Date;

  last_E_Date=lag(E_Date);

  if first.id then do;

    Start_Date=S_Date;

    End_Date=E_Date;

  end;

  else if last_E_Date+1 eq S_Date then do;

    End_Date=E_Date;

    if last.id then output;

  end;

  else if last_E_Date+1 ne S_date then do;

    output;

    Start_Date=S_Date;

    End_Date=E_Date;

    if last.id then output;

  end;

run;

Occasional Contributor
Posts: 7

Re: Please Help! - Min Max dates only when Records have connecting dates

Thanks Art!,

that would work if i have only 1 broken date series (i.e. dates 1-5 and 7-10). Would that still work if i multiple broken date series? (i.e. 1-5, 7-10, 11-16, 18-20, 21-25).

I suspect I may need to throw in a do-loop counter in there... but where?

Art may already be asleep, anyone else has any ideas?

thanks heaps!,

Almost there...

PROC Star
Posts: 7,363

Re: Please Help! - Min Max dates only when Records have connecting dates

It worked for all of the cases in your new example, but you could have tested that yourself.

Frequent Contributor
Frequent Contributor
Posts: 94

Please Help! - Min Max dates only when Records have connecting dates

Something you might want to watch for is missing or overlapping date ranges.  I often use similarly structured data to your example and this is something I have to keep an eye out for.

Also Art's code would required your input data be correctly sorted.

Respected Advisor
Posts: 3,777

Re: Please Help! - Min Max dates only when Records have connecting dates

I usually like to take a data intensive approach.  If you expand the data to one obs per day you can easily handle overlaps, by discarding duplicate days.  You can also easily adjust for a GAP size other than 1 day. 

data have;

   input ID  (Start End) (:date.);

   format S: E: date11.;

   cards;

1  1-jul-2011  5-jul-2011

1  6-jul-2011 10-jul-2011

1 11-jul-2011 16-jul-2011

2  1-jul-2011 15-jul-2011

2 17-jul-2011 20-jul-2011

;;;;

   run;

proc print;

   run;

data expandV(keep=id date) / view=expandV;

   set have;

   do date = start to end;

      output;

      end;

   format date date11.;

   run;

proc sort data=expandV out=expand nodupkey;

   by id date;

   run;

data expandV(keep=id date run) / view=expandV;

   set expand;

   by id date;

   dif = dif(date);

   if first.id then do;

      dif=1;

      run=1;

      end;

   if dif gt 1 then run + 1;

   run;

proc summary nway data=expandV;

   by id run;

   output out=flat(drop=_type_) min(date)=Start max(date)=End;

   run;

proc print;

   run;

Contributor
Posts: 70

Re: Please Help! - Min Max dates only when Records have connecting dates

data test;

input id startdt enddt;

informat startdt enddt date9.;

format startdt enddt date9.;

datalines;

  1     01Jul2011     05jul2011

  1     06jul2011     10jul2011

  1     11jul2011     16jul2011

  2     01jul2011     15jul2011

  2     017jul2011    20jul2011

;

run;

data test1;

  set test;

  by id;

  flag = (startdt - lag(enddt)=1);

 

run;

proc sql;

  create table test3 as

  select id, min(startdt) as startdt format date9., max(enddt) as enddt format date9.

  from test1 

  group by id

  having count(id)-sum(flag)=1

  union

  select id, startdt, enddt from test1

  where id not in (select distinct id from test1 group by id having count(id)-sum(flag)=1); 

quit;

Super User
Posts: 9,681

Re: Please Help! - Min Max dates only when Records have connecting dates

How about:

data have;
   input ID  (Start End) (:date12.);
   format S: E: date9.;
   cards;
1  1-jul-2011  5-jul-2011
1  6-jul-2011 10-jul-2011
1 11-jul-2011 16-jul-2011
2  1-jul-2011 15-jul-2011
2 17-jul-2011 20-jul-2011
;;;;
   run;
data want;
 set have;
 if (id eq lag(id) and start-lag(end) gt 1 ) or (id ne lag(id)) then flag+1;
run;
data want(keep=id _start end);
 set want;
 by flag;
 retain _start;
 if first.flag then _start=start;
 if last.flag then output;
 format _start date9.;
run;

Ksharp

Sorry Find a problem in my origin code.

消息编辑者为:xia keshan

消息编辑者为:xia keshan still a problem

Occasional Contributor
Posts: 7

Re: Please Help! - Min Max dates only when Records have connecting dates

Thank you all,

appreciate your responses. Will try these codes when I return back to my desk!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 843 views
  • 1 like
  • 6 in conversation