BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
tsengj
Calcite | Level 5

Hi all,

been thinking about this the whole week. your thoughts or help is much appreciated.

If i have a table with the following records,

ID     Start Date     End Date

1     1-July-2011     5-july-2011

1     6-july-2011      10-july-2011

1     11-july-2011    16-july-2011

2     1-july-2011      15-july-2011

2     17-july-2011     20-july-2011

How best should i code SAS to return only 3 rows with min and max date for the 2 ids above under the condition that only where dates are connected. I am expecting the following results.

1     1-july-2011     16-july-2011

2     1-july-2011     15-july-2011

2     17-july-2011     20-july-2011

Please help!,

Stuck!

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

It's probably too late at night for me to be looking at your problem, and I've probably made it far too complex, and I may not have included all possible conditions, but the following appears to do what you want.

I'd REALLY suggest testing it with your real data and looking over the results quite carefully:

data have;

  informat Start_Date End_Date anydtdte21.;

  format Start_Date End_Date date9.;

  input id Start_Date End_Date;

  cards;

1     1-Jul-2011     5-jul-2011

1     6-jul-2011      10-jul-2011

1     11-jul-2011    16-jul-2011

2     1-jul-2011      15-jul-2011

2     17-jul-2011     20-jul-2011

;

data want (keep=id Start_Date End_Date);

  set have (rename=(Start_Date=S_Date

                    End_Date=E_Date));

  format Start_Date End_Date date9.;

  by id;

  retain Start_Date End_Date;

  last_E_Date=lag(E_Date);

  if first.id then do;

    Start_Date=S_Date;

    End_Date=E_Date;

  end;

  else if last_E_Date+1 eq S_Date then do;

    End_Date=E_Date;

    if last.id then output;

  end;

  else if last_E_Date+1 ne S_date then do;

    output;

    Start_Date=S_Date;

    End_Date=E_Date;

    if last.id then output;

  end;

run;

View solution in original post

8 REPLIES 8
art297
Opal | Level 21

It's probably too late at night for me to be looking at your problem, and I've probably made it far too complex, and I may not have included all possible conditions, but the following appears to do what you want.

I'd REALLY suggest testing it with your real data and looking over the results quite carefully:

data have;

  informat Start_Date End_Date anydtdte21.;

  format Start_Date End_Date date9.;

  input id Start_Date End_Date;

  cards;

1     1-Jul-2011     5-jul-2011

1     6-jul-2011      10-jul-2011

1     11-jul-2011    16-jul-2011

2     1-jul-2011      15-jul-2011

2     17-jul-2011     20-jul-2011

;

data want (keep=id Start_Date End_Date);

  set have (rename=(Start_Date=S_Date

                    End_Date=E_Date));

  format Start_Date End_Date date9.;

  by id;

  retain Start_Date End_Date;

  last_E_Date=lag(E_Date);

  if first.id then do;

    Start_Date=S_Date;

    End_Date=E_Date;

  end;

  else if last_E_Date+1 eq S_Date then do;

    End_Date=E_Date;

    if last.id then output;

  end;

  else if last_E_Date+1 ne S_date then do;

    output;

    Start_Date=S_Date;

    End_Date=E_Date;

    if last.id then output;

  end;

run;

tsengj
Calcite | Level 5

Thanks Art!,

that would work if i have only 1 broken date series (i.e. dates 1-5 and 7-10). Would that still work if i multiple broken date series? (i.e. 1-5, 7-10, 11-16, 18-20, 21-25).

I suspect I may need to throw in a do-loop counter in there... but where?

Art may already be asleep, anyone else has any ideas?

thanks heaps!,

Almost there...

art297
Opal | Level 21

It worked for all of the cases in your new example, but you could have tested that yourself.

DF
Fluorite | Level 6 DF
Fluorite | Level 6

Something you might want to watch for is missing or overlapping date ranges.  I often use similarly structured data to your example and this is something I have to keep an eye out for.

Also Art's code would required your input data be correctly sorted.

data_null__
Jade | Level 19

I usually like to take a data intensive approach.  If you expand the data to one obs per day you can easily handle overlaps, by discarding duplicate days.  You can also easily adjust for a GAP size other than 1 day. 

data have;

   input ID  (Start End) (:date.);

   format S: E: date11.;

   cards;

1  1-jul-2011  5-jul-2011

1  6-jul-2011 10-jul-2011

1 11-jul-2011 16-jul-2011

2  1-jul-2011 15-jul-2011

2 17-jul-2011 20-jul-2011

;;;;

   run;

proc print;

   run;

data expandV(keep=id date) / view=expandV;

   set have;

   do date = start to end;

      output;

      end;

   format date date11.;

   run;

proc sort data=expandV out=expand nodupkey;

   by id date;

   run;

data expandV(keep=id date run) / view=expandV;

   set expand;

   by id date;

   dif = dif(date);

   if first.id then do;

      dif=1;

      run=1;

      end;

   if dif gt 1 then run + 1;

   run;

proc summary nway data=expandV;

   by id run;

   output out=flat(drop=_type_) min(date)=Start max(date)=End;

   run;

proc print;

   run;

Fisher
Quartz | Level 8

data test;

input id startdt enddt;

informat startdt enddt date9.;

format startdt enddt date9.;

datalines;

  1     01Jul2011     05jul2011

  1     06jul2011     10jul2011

  1     11jul2011     16jul2011

  2     01jul2011     15jul2011

  2     017jul2011    20jul2011

;

run;

data test1;

  set test;

  by id;

  flag = (startdt - lag(enddt)=1);

 

run;

proc sql;

  create table test3 as

  select id, min(startdt) as startdt format date9., max(enddt) as enddt format date9.

  from test1 

  group by id

  having count(id)-sum(flag)=1

  union

  select id, startdt, enddt from test1

  where id not in (select distinct id from test1 group by id having count(id)-sum(flag)=1); 

quit;

Ksharp
Super User

How about:

data have;
   input ID  (Start End) (:date12.);
   format S: E: date9.;
   cards;
1  1-jul-2011  5-jul-2011
1  6-jul-2011 10-jul-2011
1 11-jul-2011 16-jul-2011
2  1-jul-2011 15-jul-2011
2 17-jul-2011 20-jul-2011
;;;;
   run;
data want;
 set have;
 if (id eq lag(id) and start-lag(end) gt 1 ) or (id ne lag(id)) then flag+1;
run;
data want(keep=id _start end);
 set want;
 by flag;
 retain _start;
 if first.flag then _start=start;
 if last.flag then output;
 format _start date9.;
run;

Ksharp

Sorry Find a problem in my origin code.

消息编辑者为:xia keshan

消息编辑者为:xia keshan still a problem

tsengj
Calcite | Level 5

Thank you all,

appreciate your responses. Will try these codes when I return back to my desk!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1623 views
  • 1 like
  • 6 in conversation