I have two tables which I need to combine as sometimes some dates are found in table A and not in table B and vice versa. My desired result is that for those overlaps on consecutive days be combined.
How can I achieve this using proc SQL only?
Table A
id startdate enddate--------------------------
101 12/28/2013 12/31/2013
Table B
id startdate enddate--------------------------
101 12/15/2013 12/15/2013
101 12/16/2013 12/16/2013
101 12/28/2013 12/28/2013
101 12/29/2013 12/31/2013
Desired Result
id startdate enddate-------------------------
101 12/15/2013 12/16/2013
101 12/28/2013 12/31/2013
It appears that you are collapsing intervals from data set B. Are you doing the same from dataset A? You should include an example for the data set A that clearly isn't in B so we have some more idea what is going on as I can see the final result coming only from B.
And why a restriction for proc sql only? Since order is important that restriction may add complexity to code that could be very straight forward in data step code.
Sometimes solving problems are all about choosing the right tools. Like @ballardw mentioned, data step is the right tool for this kind of task. That said, it still can be done using Proc SQL, but you may not like the complexity of the code:
proc sql; create table have as select * from table_a union corr select * from table_b ; quit; proc sql; create table want_sql as select distinct id, min(startdate, (select min(startdate) from have where id=a.id and (startdate between a.startdate-1 and a.enddate+1 or enddate between a.startdate and a.enddate or a.startdate between startdate-1 and enddate+1 or a.enddate between startdate and enddate))) as startdate format=date9., max(enddate, (select max(enddate) from have where id=a.id and (startdate between a.startdate-1 and a.enddate+1 or enddate between a.startdate and a.enddate or a.startdate between startdate-1 and enddate+1 or a.enddate between startdate and enddate))) as enddate format=date9. from have a ; quit;
The idea is simple: try to group the interval; while the implementaion is strenous. Good Luck!
Hi,
Below is another pure PROC SQL approach. It uses a self-join twice to (detect earlier and later rows) and enables two rows being collapsed into one. However, it does not deal with collapsing three or more rows into one; not sure if that kind of scenario would happen in your data, based on the example you provided.
PROC SQL;
CREATE TABLE have (
id num,
startdate num format=date11.,
enddate num format=date11.);
QUIT;
PROC SQL;
INSERT INTO have
values (101, '15-Dec-2013'd, '15-Dec-2013'd)
values (101, '16-Dec-2013'd, '16-Dec-2013'd)
values (101, '28-Dec-2013'd, '28-Dec-2013'd)
values (101, '29-Dec-2013'd, '31-Dec-2013'd)
/* add some additional data for good measure ... */
/* note that there are 3 consecutive rows (1/15 to 1/28/2014) */
/* values (201, '15-Jan-2014'd, '17-Jan-2014'd) */
/* values (201, '18-Jan-2014'd, '26-Jan-2014'd) */
/* values (201, '27-Jan-2014'd, '28-Jan-2014'd) */
/* values (201, '01-Feb-2014'd, '01-Feb-2014'd) */
;
QUIT;
PROC SQL;
CREATE TABLE want AS
SELECT id
, startdate
, coalesce(nextRow_enddate, enddate) format=date11. AS wanted_enddate
/* subquery joins current row to earlier row and future row (if they exist) */
FROM (SELECT aa.id
, aa.startdate
, aa.enddate
, monotonic() as seq_num
, next.enddate as nextRow_enddate
, prior.enddate as priorRow_enddate
FROM have as aa
LEFT JOIN have as next
ON next.id = aa.id
AND next.startdate = aa.enddate+1
LEFT JOIN have as prior
ON prior.id = aa.id
AND prior.enddate = aa.startdate-1
WHERE priorRow_enddate IS NULL) as qry;
QUIT;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.