cancel
Showing results for 
Search instead for 
Did you mean: 

Merging two tables by choosing the CLOSEST dates

SOLVED
ilikesas
Barite | Level 11
Solved!

Merging two tables by choosing the CLOSEST dates

Hi, I have one file with company tickers and dates of different events, and each company can have several different events.

I have another file with each company's ticker and end of year report, say december 1 of a given year.

The first file looks like this:

dateticker
12/12/2008ABC
4/6/2008ABC
25/3/2010DEF
6/2/2002DEF

The Second file looks like this:

End of year reportticker              
1/12/2010ABC
1/12/2009ABC
1/12/2008ABC
1/12/2007ABC
1/12/2006ABC

What I want to do is for company ABC on 12/12/2008 in the first file to do a merge with the data from the second file for the same company but the row corresponding to 1/12/2008 because its the closest recent date.

For ABC on 4/6/2008 from the first file I would like to merge with ABC from the second file but row corresponding to 1/12/2007 because its the closest recent date here.

And so on for all the companies and dates

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21
Solution

Re: Merging two tables by choosing the CLOSEST dates

Message contains a hyperlink

Here is a slight modification of 's code:

data h1;

     infile cards dlm=',';

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

12/12/2008,ABC

4/6/2008,ABC

25/3/2010,DEF

6/2/2002,DEF

;

data h2;

     infile cards dlm=',';

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

1/12/2010,ABC

1/12/2009,ABC

1/12/2008,ABC

1/12/2007,ABC

1/12/2006,ABC

1/12/2010,DEF

1/12/2009,DEF

1/12/2008,DEF

1/12/2007,DEF

1/12/2001,DEF

;

proc sql;

  create table want as

    select a.date,a.ticker, b.date as rpt_dt format=ddmmyy10.

      from h1 a

        left join

          h2 b

            on a.ticker=b.ticker

              where b.date <= a.date

                group by a.date,a.ticker

                  having abs(a.date-b.date)=min(abs(a.date-b.date))

                    order by a.ticker, b.date descending

  ;

quit;

View solution in original post

14 REPLIES 14
Haikuo
Onyx | Level 15

Re: Merging two tables by choosing the CLOSEST dates

The following is NOT the most efficient way, so ask for Hash to speed it up:

data h1;

     infile cards dlm='09'x;

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

12/12/2008 ABC

4/6/2008   ABC

25/3/2010  DEF

6/2/2002   DEF

;

data h2;

     infile cards dlm='09'x;

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

1/12/2010  ABC

1/12/2009  ABC

1/12/2008  ABC

1/12/2007  ABC

1/12/2006  ABC

;

proc sql;

     create table want as

           select *, (select date from h2 where ticker=a.ticker and date <= a.date having a.date-date=min(a.date-date)) as rpt_dt format=ddmmyy10.

                from h1 a

     ;

quit;

ilikesas
Barite | Level 11

Re: Merging two tables by choosing the CLOSEST dates

Hi Hai and thank you for replying,

I ran the code that you posted but the result that I got is the following:

1.4/6/2008.
2.6/2/2002.
obsdatetickerrpt_dt

what I wish that I could have is for example the row (with all columns) from h1 of say ABC 12/12/2008 merged with the row (with all columns) of ABC 1/12/2008 from h2

Thank you!


Haikuo
Onyx | Level 15

Re: Merging two tables by choosing the CLOSEST dates

when doing "copy and paste" from HTML to text (or vise versa) , sometimes those unprinted symbols are converted (eg. blank vs tab). So in this case, it is Tab converting to Blanks. So go ahead remove "infile" statement.

data h1;

  input date :ddmmyy10. ticker $;

  format date ddmmyy10.;

  cards;

12/12/2008 ABC

4/6/2008 ABC

25/3/2010 DEF

6/2/2002 DEF

;

data h2;

  input date :ddmmyy10. ticker $;

  format date ddmmyy10.;

  cards;

1/12/2010 ABC

1/12/2009 ABC

1/12/2008 ABC

1/12/2007 ABC

1/12/2006 ABC

;

proc sql;

  create table want as

  select *, (select date from h2 where ticker=a.ticker and date <= a.date having a.date-date=min(a.date-date)) as rpt_dt format=ddmmyy10.

  from h1 a

  ;

quit;

ilikesas
Barite | Level 11

Re: Merging two tables by choosing the CLOSEST dates

I did the code and it works thanks a lot !!!

Now suppose I have in each file an additional column:

h1

112/12/2008ABC a
206/04/2008ABC s
303/25/2010DEF d
402/06/2002DEF

h

h2

112/01/2010ABC ef
212/01/2009ABC dfgg
312/01/2008ABC rgrgrg
412/01/2007ABC thr
512/01/2005ABC thgt

When I tried to do the merge in sql I got the following want:

112/12/2008ABC a ef 01/12/2008
212/12/2008ABC a dfgg 01/12/2008
312/12/2008ABC a rgrgrg 01/12/2008
412/12/2008ABC a thr 01/12/2008
512/12/2008ABC a thgt 01/12/2008
606/04/2008ABC s ef 01/12/2007
706/04/2008ABC s dfgg 01/12/2007
806/04/2008ABC s rgrgrg 01/12/2007
906/04/2008ABC s thr 01/12/2007
1006/04/2008ABC s thgt 01/12/2007
1103/25/2010DEF d ef .
1203/25/2010DEF d dfgg .
1303/25/2010DEF d rgrgrg .
1403/25/2010DEF d thr .
1503/25/2010DEF d thgt .
1602/06/2002DEF h ef .
1702/06/2002DEF h dfgg .
1802/06/2002DEF h rgrgrg .
1902/06/2002DEF h thr .
2002/06/2002DEF h thgt .

How do I get rid of all the unnecessary entries and get what you did (4 rows) but with the two additional columns (1 from h1 and 1 from h2) ?

Thank you

chris2377
Quartz | Level 8

Re: Merging two tables by choosing the CLOSEST dates

I'm not sure if it's going to work, but maybe try to do something like this

data dataset1;

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

12/12/2008 ABC

4/6/2008   ABC

25/3/2010  DEF

6/2/2002   DEF

;

run;

data dataset1;

  set h1;

  year=year(date);

  eoy=mdy(12,1,year(date)); /*end of year date*/

  format eoy ddmmyy10.;

  dif=abs(intck('day', date, eoy)); /*calculates the number of days between the two dates*/

run;

proc sort data=dataset1;

  by ticker year dif;

run;

data dataset1;

  set dataset1;

  by ticker year;

  if first.year;

run;

In this way you get one observation for each ticker and year and you can merge it with the second dataset by date

art297
Opal | Level 21
Solution

Re: Merging two tables by choosing the CLOSEST dates

Message contains a hyperlink

Here is a slight modification of 's code:

data h1;

     infile cards dlm=',';

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

12/12/2008,ABC

4/6/2008,ABC

25/3/2010,DEF

6/2/2002,DEF

;

data h2;

     infile cards dlm=',';

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

1/12/2010,ABC

1/12/2009,ABC

1/12/2008,ABC

1/12/2007,ABC

1/12/2006,ABC

1/12/2010,DEF

1/12/2009,DEF

1/12/2008,DEF

1/12/2007,DEF

1/12/2001,DEF

;

proc sql;

  create table want as

    select a.date,a.ticker, b.date as rpt_dt format=ddmmyy10.

      from h1 a

        left join

          h2 b

            on a.ticker=b.ticker

              where b.date <= a.date

                group by a.date,a.ticker

                  having abs(a.date-b.date)=min(abs(a.date-b.date))

                    order by a.ticker, b.date descending

  ;

quit;

ilikesas
Barite | Level 11

Re: Merging two tables by choosing the CLOSEST dates

Hi Arthur,

I did your modification and now got rid of the of the "multiplicative effect" that I was having before, is it just because you made a left join?

thanks, and here is what my example file looks:

112/12/2008ABC a rgrgrg 01/12/2008
206/04/2008ABC s thr 01/12/2007
302/06/2002DEF h .
403/25/2010DEF d .

my next step will be to try this on my real data!

art297
Opal | Level 21

Re: Merging two tables by choosing the CLOSEST dates

Message contains a hyperlink

Yes, the left join should have removed the multiple record problem. However, the approach was different than 's, thus there were other differences as well. Plus, I see that you already figured out how to add the extra variable.

ilikesas
Barite | Level 11

Re: Merging two tables by choosing the CLOSEST dates

Hi,

I applied Dr Tabachneck's and Hai.kuo's  code to my real data which is 2 tables one having 76 000 rows and the other having 22 000 rows and my computer has been thinking for 3 hours already, is it normal?

Thank you

Reeza
Super User

Re: Merging two tables by choosing the CLOSEST dates

You should include your actual code Otherwise impossible to say. Maybe forget to include quit or a join in your code, who knows :smileyconfused:

ilikesas
Barite | Level 11

Re: Merging two tables by choosing the CLOSEST dates

Hi,

here included is my actual code:

PROC IMPORT OUT= bs8

            DATAFILE= " K:\FilesFormatted\bs8.xlsx"

            DBMS= EXCEL REPLACE;

    sheet = "Sheet1";

     GETNAMES=YES;

     MIXED=NO;

     SCANTEXT=YES;

     USEDATE=YES;

     SCANTIME=YES;

     

RUN;

data bs8;

set bs8;

format date2 ddmmyy10. ;

run;

PROC IMPORT OUT= sdc

            DATAFILE= " K:\FilesFormatted\new sdc.xlsx"

            DBMS= EXCEL REPLACE;

    sheet = "Sheet1";

     GETNAMES=YES;

     MIXED=NO;

     SCANTEXT=YES;

     USEDATE=YES;

     SCANTIME=YES;

      

RUN;

data sdc;

set sdc;

format date_an ddmmyy10. ;

format date_ef ddmmyy10. ;

run;

/*the merging by closest date code*/

proc sql;

  create table sdc2 as

    select *

      from sdc a

        left join

          bs8 b

            on a.acq_ticker=b.ticker1

              where b.date2 <= a.date_an

                group by a.date_an,a.acq_ticker

                  having abs(a.date_an-b.date2)=min(abs(a.date_an-b.date2))

                    order by a.acq_ticker, b.date2 descending

  ;

quit;

art297
Opal | Level 21

Re: Merging two tables by choosing the CLOSEST dates

The whole process should only take a couple of seconds. Have you tried to run the code step-by-step to see if there is a problem in one of the steps?

Can you post your log?

ilikesas
Barite | Level 11

Re: Merging two tables by choosing the CLOSEST dates

Hi,

I did an inner join and after 10 minutes got 1.2 million rows.

I had many duplicates from the missing entries of both tables, because when I dropped these duplicates I was left with only 8k rows!

I guess that when I was doing a left join my computer was calculating even more rows and at some point probably got stuck...

Thank you for your help!!!

Astounding
PROC Star

Re: Merging two tables by choosing the CLOSEST dates

The DATA step way is pretty easy until you start adding additional variables:

proc sort data=events;

   by ticker date;

run;

proc sort data=yearly_reports;

   by ticker end_of_year_report;

run;

data want;

   set yearly_reports (in=in1 rename=(end_of_year_report=date))  events (in=in2);

   by ticker date;

   if in1 then end_of_year_report=date;

   retain end_of_year_report;

   if in2;

run;