DATA Step, Macro, Functions and more

Getting Consecutive Days

Reply
Super Contributor
Posts: 318

Getting Consecutive Days

I have two tables which I need to combine as sometimes some dates are found in table A and not in table B and vice versa. My desired result is that for those overlaps on consecutive days be combined.

How can I achieve this using proc SQL only?

Table A

id  startdate   enddate--------------------------
101 12/28/2013  12/31/2013

Table B

id  startdate   enddate--------------------------
101 12/15/2013  12/15/2013
101 12/16/2013  12/16/2013
101 12/28/2013  12/28/2013
101 12/29/2013  12/31/2013

Desired Result

id  startdate   enddate-------------------------
101 12/15/2013  12/16/2013
101 12/28/2013  12/31/2013

 

Super User
Posts: 10,500

Re: Getting Consecutive Days

It appears that you are collapsing intervals from data set B. Are you doing the same from dataset A? You should include an example for the data set A that clearly isn't in B so we have some more idea what is going on as I can see the final result coming only from B.

 

And why a restriction for proc sql only? Since order is important that restriction may add complexity to code that could be very straight forward in data step code.

Respected Advisor
Posts: 3,124

Re: Getting Consecutive Days

[ Edited ]

Sometimes solving problems are all about choosing the right tools. Like @ballardw mentioned, data step is the right tool for this kind of task. That said, it still can be done using Proc SQL, but you may not like the complexity of the code:

proc sql;
	create table have as
		select * from table_a
			union corr
		select * from table_b
	;
quit;

proc sql;
	create table want_sql as    
		select distinct id, min(startdate, (select min(startdate) from have where id=a.id and (startdate between a.startdate-1 and a.enddate+1 
			or enddate between a.startdate and a.enddate or a.startdate between startdate-1 and enddate+1 or a.enddate between startdate and enddate))) as startdate format=date9.,
			max(enddate, (select max(enddate) from have where id=a.id and (startdate between a.startdate-1 and a.enddate+1 
			or enddate between a.startdate and a.enddate or a.startdate between startdate-1 and enddate+1 or a.enddate between startdate and enddate))) as enddate format=date9.
		from have a
	;
quit;

 

 The idea is simple: try to group the interval; while the implementaion is strenous. Good Luck!

Contributor hbi
Contributor
Posts: 66

Re: Getting Consecutive Days

Hi,

 

Below is another pure PROC SQL approach. It uses a self-join twice to (detect earlier and later rows) and enables two rows being collapsed into one. However, it does not deal with collapsing three or more rows into one; not sure if that kind of scenario would happen in your data, based on the example you provided. 

 

PROC SQL;
  CREATE TABLE have (
    id        num, 
    startdate num format=date11., 
    enddate   num format=date11.);
QUIT;


PROC SQL;
INSERT INTO have
values (101, '15-Dec-2013'd, '15-Dec-2013'd)
values (101, '16-Dec-2013'd, '16-Dec-2013'd)
values (101, '28-Dec-2013'd, '28-Dec-2013'd)
values (101, '29-Dec-2013'd, '31-Dec-2013'd)
/* add some additional data for good measure ... */
/* note that there are 3 consecutive rows (1/15 to 1/28/2014) */
/* values (201, '15-Jan-2014'd, '17-Jan-2014'd) */
/* values (201, '18-Jan-2014'd, '26-Jan-2014'd) */
/* values (201, '27-Jan-2014'd, '28-Jan-2014'd) */
/* values (201, '01-Feb-2014'd, '01-Feb-2014'd) */
;
QUIT;


PROC SQL;
  CREATE TABLE want AS 
  SELECT id
       , startdate
       , coalesce(nextRow_enddate, enddate) format=date11. AS wanted_enddate
  /* subquery joins current row to earlier row and future row (if they exist) */
  FROM (SELECT aa.id
             , aa.startdate
             , aa.enddate
             , monotonic()   as seq_num
             , next.enddate  as nextRow_enddate
             , prior.enddate as priorRow_enddate
        FROM have as aa
        LEFT JOIN have as next
               ON next.id = aa.id
              AND next.startdate = aa.enddate+1
        LEFT JOIN have as prior
               ON prior.id = aa.id
              AND prior.enddate = aa.startdate-1
        WHERE priorRow_enddate IS NULL) as qry;
QUIT;

 

Ask a Question
Discussion stats
  • 3 replies
  • 217 views
  • 1 like
  • 4 in conversation