SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Merging two tables by choosing the CLOSEST dates

Accepted Solution Solved
Reply
Super Contributor
Posts: 441
Accepted Solution

Merging two tables by choosing the CLOSEST dates

Hi, I have one file with company tickers and dates of different events, and each company can have several different events.

I have another file with each company's ticker and end of year report, say december 1 of a given year.

The first file looks like this:

dateticker
12/12/2008ABC
4/6/2008ABC
25/3/2010DEF
6/2/2002DEF

The Second file looks like this:

End of year reportticker              
1/12/2010ABC
1/12/2009ABC
1/12/2008ABC
1/12/2007ABC
1/12/2006ABC

What I want to do is for company ABC on 12/12/2008 in the first file to do a merge with the data from the second file for the same company but the row corresponding to 1/12/2008 because its the closest recent date.

For ABC on 4/6/2008 from the first file I would like to merge with ABC from the second file but row corresponding to 1/12/2007 because its the closest recent date here.

And so on for all the companies and dates

Thank you


Accepted Solutions
Solution
‎12-02-2014 11:26 AM
PROC Star
Posts: 7,468

Re: Merging two tables by choosing the CLOSEST dates

Here is a slight modification of 's code:

data h1;

     infile cards dlm=',';

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

12/12/2008,ABC

4/6/2008,ABC

25/3/2010,DEF

6/2/2002,DEF

;

data h2;

     infile cards dlm=',';

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

1/12/2010,ABC

1/12/2009,ABC

1/12/2008,ABC

1/12/2007,ABC

1/12/2006,ABC

1/12/2010,DEF

1/12/2009,DEF

1/12/2008,DEF

1/12/2007,DEF

1/12/2001,DEF

;

proc sql;

  create table want as

    select a.date,a.ticker, b.date as rpt_dt format=ddmmyy10.

      from h1 a

        left join

          h2 b

            on a.ticker=b.ticker

              where b.date <= a.date

                group by a.date,a.ticker

                  having abs(a.date-b.date)=min(abs(a.date-b.date))

                    order by a.ticker, b.date descending

  ;

quit;

View solution in original post


All Replies
Respected Advisor
Posts: 3,156

Re: Merging two tables by choosing the CLOSEST dates

The following is NOT the most efficient way, so ask for Hash to speed it up:

data h1;

     infile cards dlm='09'x;

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

12/12/2008 ABC

4/6/2008   ABC

25/3/2010  DEF

6/2/2002   DEF

;

data h2;

     infile cards dlm='09'x;

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

1/12/2010  ABC

1/12/2009  ABC

1/12/2008  ABC

1/12/2007  ABC

1/12/2006  ABC

;

proc sql;

     create table want as

           select *, (select date from h2 where ticker=a.ticker and date <= a.date having a.date-date=min(a.date-date)) as rpt_dt format=ddmmyy10.

                from h1 a

     ;

quit;

Super Contributor
Posts: 441

Re: Merging two tables by choosing the CLOSEST dates

Hi Hai and thank you for replying,

I ran the code that you posted but the result that I got is the following:

1.4/6/2008.
2.6/2/2002.
obsdatetickerrpt_dt

what I wish that I could have is for example the row (with all columns) from h1 of say ABC 12/12/2008 merged with the row (with all columns) of ABC 1/12/2008 from h2

Thank you!


Respected Advisor
Posts: 3,156

Re: Merging two tables by choosing the CLOSEST dates

when doing "copy and paste" from HTML to text (or vise versa) , sometimes those unprinted symbols are converted (eg. blank vs tab). So in this case, it is Tab converting to Blanks. So go ahead remove "infile" statement.

data h1;

  input date :ddmmyy10. ticker $;

  format date ddmmyy10.;

  cards;

12/12/2008 ABC

4/6/2008 ABC

25/3/2010 DEF

6/2/2002 DEF

;

data h2;

  input date :ddmmyy10. ticker $;

  format date ddmmyy10.;

  cards;

1/12/2010 ABC

1/12/2009 ABC

1/12/2008 ABC

1/12/2007 ABC

1/12/2006 ABC

;

proc sql;

  create table want as

  select *, (select date from h2 where ticker=a.ticker and date <= a.date having a.date-date=min(a.date-date)) as rpt_dt format=ddmmyy10.

  from h1 a

  ;

quit;

Super Contributor
Posts: 441

Re: Merging two tables by choosing the CLOSEST dates

I did the code and it works thanks a lot !!!

Now suppose I have in each file an additional column:

h1

112/12/2008ABC a
206/04/2008ABC s
303/25/2010DEF d
402/06/2002DEF

h

h2

112/01/2010ABC ef
212/01/2009ABC dfgg
312/01/2008ABC rgrgrg
412/01/2007ABC thr
512/01/2005ABC thgt

When I tried to do the merge in sql I got the following want:

112/12/2008ABC a ef 01/12/2008
212/12/2008ABC a dfgg 01/12/2008
312/12/2008ABC a rgrgrg 01/12/2008
412/12/2008ABC a thr 01/12/2008
512/12/2008ABC a thgt 01/12/2008
606/04/2008ABC s ef 01/12/2007
706/04/2008ABC s dfgg 01/12/2007
806/04/2008ABC s rgrgrg 01/12/2007
906/04/2008ABC s thr 01/12/2007
1006/04/2008ABC s thgt 01/12/2007
1103/25/2010DEF d ef .
1203/25/2010DEF d dfgg .
1303/25/2010DEF d rgrgrg .
1403/25/2010DEF d thr .
1503/25/2010DEF d thgt .
1602/06/2002DEF h ef .
1702/06/2002DEF h dfgg .
1802/06/2002DEF h rgrgrg .
1902/06/2002DEF h thr .
2002/06/2002DEF h thgt .

How do I get rid of all the unnecessary entries and get what you did (4 rows) but with the two additional columns (1 from h1 and 1 from h2) ?

Thank you

Contributor
Posts: 60

Re: Merging two tables by choosing the CLOSEST dates

I'm not sure if it's going to work, but maybe try to do something like this

data dataset1;

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

12/12/2008 ABC

4/6/2008   ABC

25/3/2010  DEF

6/2/2002   DEF

;

run;

data dataset1;

  set h1;

  year=year(date);

  eoy=mdy(12,1,year(date)); /*end of year date*/

  format eoy ddmmyy10.;

  dif=abs(intck('day', date, eoy)); /*calculates the number of days between the two dates*/

run;

proc sort data=dataset1;

  by ticker year dif;

run;

data dataset1;

  set dataset1;

  by ticker year;

  if first.year;

run;

In this way you get one observation for each ticker and year and you can merge it with the second dataset by date

Solution
‎12-02-2014 11:26 AM
PROC Star
Posts: 7,468

Re: Merging two tables by choosing the CLOSEST dates

Here is a slight modification of 's code:

data h1;

     infile cards dlm=',';

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

12/12/2008,ABC

4/6/2008,ABC

25/3/2010,DEF

6/2/2002,DEF

;

data h2;

     infile cards dlm=',';

     input date :ddmmyy10. ticker $;

     format date ddmmyy10.;

     cards;

1/12/2010,ABC

1/12/2009,ABC

1/12/2008,ABC

1/12/2007,ABC

1/12/2006,ABC

1/12/2010,DEF

1/12/2009,DEF

1/12/2008,DEF

1/12/2007,DEF

1/12/2001,DEF

;

proc sql;

  create table want as

    select a.date,a.ticker, b.date as rpt_dt format=ddmmyy10.

      from h1 a

        left join

          h2 b

            on a.ticker=b.ticker

              where b.date <= a.date

                group by a.date,a.ticker

                  having abs(a.date-b.date)=min(abs(a.date-b.date))

                    order by a.ticker, b.date descending

  ;

quit;

Super Contributor
Posts: 441

Re: Merging two tables by choosing the CLOSEST dates

Hi Arthur,

I did your modification and now got rid of the of the "multiplicative effect" that I was having before, is it just because you made a left join?

thanks, and here is what my example file looks:

112/12/2008ABC a rgrgrg 01/12/2008
206/04/2008ABC s thr 01/12/2007
302/06/2002DEF h .
403/25/2010DEF d .

my next step will be to try this on my real data!

PROC Star
Posts: 7,468

Re: Merging two tables by choosing the CLOSEST dates

Yes, the left join should have removed the multiple record problem. However, the approach was different than 's, thus there were other differences as well. Plus, I see that you already figured out how to add the extra variable.

Super Contributor
Posts: 441

Re: Merging two tables by choosing the CLOSEST dates

Hi,

I applied Dr Tabachneck's and Hai.kuo's  code to my real data which is 2 tables one having 76 000 rows and the other having 22 000 rows and my computer has been thinking for 3 hours already, is it normal?

Thank you

Super User
Posts: 19,772

Re: Merging two tables by choosing the CLOSEST dates

You should include your actual code Otherwise impossible to say. Maybe forget to include quit or a join in your code, who knows :smileyconfused:

Super Contributor
Posts: 441

Re: Merging two tables by choosing the CLOSEST dates

Hi,

here included is my actual code:

PROC IMPORT OUT= bs8

            DATAFILE= " K:\FilesFormatted\bs8.xlsx"

            DBMS= EXCEL REPLACE;

    sheet = "Sheet1";

     GETNAMES=YES;

     MIXED=NO;

     SCANTEXT=YES;

     USEDATE=YES;

     SCANTIME=YES;

     

RUN;

data bs8;

set bs8;

format date2 ddmmyy10. ;

run;

PROC IMPORT OUT= sdc

            DATAFILE= " K:\FilesFormatted\new sdc.xlsx"

            DBMS= EXCEL REPLACE;

    sheet = "Sheet1";

     GETNAMES=YES;

     MIXED=NO;

     SCANTEXT=YES;

     USEDATE=YES;

     SCANTIME=YES;

      

RUN;

data sdc;

set sdc;

format date_an ddmmyy10. ;

format date_ef ddmmyy10. ;

run;

/*the merging by closest date code*/

proc sql;

  create table sdc2 as

    select *

      from sdc a

        left join

          bs8 b

            on a.acq_ticker=b.ticker1

              where b.date2 <= a.date_an

                group by a.date_an,a.acq_ticker

                  having abs(a.date_an-b.date2)=min(abs(a.date_an-b.date2))

                    order by a.acq_ticker, b.date2 descending

  ;

quit;

PROC Star
Posts: 7,468

Re: Merging two tables by choosing the CLOSEST dates

The whole process should only take a couple of seconds. Have you tried to run the code step-by-step to see if there is a problem in one of the steps?

Can you post your log?

Super Contributor
Posts: 441

Re: Merging two tables by choosing the CLOSEST dates

Hi,

I did an inner join and after 10 minutes got 1.2 million rows.

I had many duplicates from the missing entries of both tables, because when I dropped these duplicates I was left with only 8k rows!

I guess that when I was doing a left join my computer was calculating even more rows and at some point probably got stuck...

Thank you for your help!!!

Super User
Posts: 5,498

Re: Merging two tables by choosing the CLOSEST dates

The DATA step way is pretty easy until you start adding additional variables:

proc sort data=events;

   by ticker date;

run;

proc sort data=yearly_reports;

   by ticker end_of_year_report;

run;

data want;

   set yearly_reports (in=in1 rename=(end_of_year_report=date))  events (in=in2);

   by ticker date;

   if in1 then end_of_year_report=date;

   retain end_of_year_report;

   if in2;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 3801 views
  • 3 likes
  • 6 in conversation