SAS Data Management

ilikesas · Posted 11-30-2014 04:36 PM

Hi, I have one file with company tickers and dates of different events, and each company can have several different events.

I have another file with each company's ticker and end of year report, say december 1 of a given year.

The first file looks like this:

date	ticker
12/12/2008	ABC
4/6/2008	ABC
25/3/2010	DEF
6/2/2002	DEF

The Second file looks like this:

End of year report	ticker
1/12/2010	ABC
1/12/2009	ABC
1/12/2008	ABC
1/12/2007	ABC
1/12/2006	ABC

What I want to do is for company ABC on 12/12/2008 in the first file to do a merge with the data from the second file for the same company but the row corresponding to 1/12/2008 because its the closest recent date.

For ABC on 4/6/2008 from the first file I would like to merge with ABC from the second file but row corresponding to 1/12/2007 because its the closest recent date here.

And so on for all the companies and dates

Thank you

art297 · Posted 12-02-2014 11:26 AM

Here is a slight modification of 's code:

data h1;

infile cards dlm=',';

input date :ddmmyy10. ticker $;

format date ddmmyy10.;

cards;

12/12/2008,ABC

4/6/2008,ABC

25/3/2010,DEF

6/2/2002,DEF

;

data h2;

infile cards dlm=',';

input date :ddmmyy10. ticker $;

format date ddmmyy10.;

cards;

1/12/2010,ABC

1/12/2009,ABC

1/12/2008,ABC

1/12/2007,ABC

1/12/2006,ABC

1/12/2010,DEF

1/12/2009,DEF

1/12/2008,DEF

1/12/2007,DEF

1/12/2001,DEF

;

proc sql;

create table want as

select a.date,a.ticker, b.date as rpt_dt format=ddmmyy10.

from h1 a

left join

h2 b

on a.ticker=b.ticker

where b.date <= a.date

group by a.date,a.ticker

having abs(a.date-b.date)=min(abs(a.date-b.date))

order by a.ticker, b.date descending

;

quit;

View solution in original post

Haikuo · Posted 11-30-2014 06:51 PM

The following is NOT the most efficient way, so ask for Hash to speed it up:

data h1;

infile cards dlm='09'x;

input date :ddmmyy10. ticker $;

format date ddmmyy10.;

cards;

12/12/2008 ABC

4/6/2008 ABC

25/3/2010 DEF

6/2/2002 DEF

;

data h2;

infile cards dlm='09'x;

input date :ddmmyy10. ticker $;

format date ddmmyy10.;

cards;

1/12/2010 ABC

1/12/2009 ABC

1/12/2008 ABC

1/12/2007 ABC

1/12/2006 ABC

;

proc sql;

create table want as

select *, (select date from h2 where ticker=a.ticker and date <= a.date having a.date-date=min(a.date-date)) as rpt_dt format=ddmmyy10.

from h1 a

;

quit;

ilikesas · Posted 12-02-2014 09:18 AM

Hi Hai and thank you for replying,

I ran the code that you posted but the result that I got is the following:

1	.	4/6/2008	.
2	.	6/2/2002	.
obs	date	ticker	rpt_dt

what I wish that I could have is for example the row (with all columns) from h1 of say ABC 12/12/2008 merged with the row (with all columns) of ABC 1/12/2008 from h2

Thank you!

Haikuo · Posted 12-02-2014 09:56 AM

when doing "copy and paste" from HTML to text (or vise versa) , sometimes those unprinted symbols are converted (eg. blank vs tab). So in this case, it is Tab converting to Blanks. So go ahead remove "infile" statement.

data h1;

input date :ddmmyy10. ticker $;

format date ddmmyy10.;

cards;

12/12/2008 ABC

4/6/2008 ABC

25/3/2010 DEF

6/2/2002 DEF

;

data h2;

input date :ddmmyy10. ticker $;

format date ddmmyy10.;

cards;

1/12/2010 ABC

1/12/2009 ABC

1/12/2008 ABC

1/12/2007 ABC

1/12/2006 ABC

;

proc sql;

create table want as

select *, (select date from h2 where ticker=a.ticker and date <= a.date having a.date-date=min(a.date-date)) as rpt_dt format=ddmmyy10.

from h1 a

;

quit;

ilikesas · Posted 12-02-2014 11:54 AM

I did the code and it works thanks a lot !!!

Now suppose I have in each file an additional column:

h1

1	12/12/2008	ABC	a
2	06/04/2008	ABC	s
3	03/25/2010	DEF	d
4	02/06/2002	DEF	h

h2

1	12/01/2010	ABC	ef
2	12/01/2009	ABC	dfgg
3	12/01/2008	ABC	rgrgrg
4	12/01/2007	ABC	thr
5	12/01/2005	ABC	thgt

When I tried to do the merge in sql I got the following want:

1	12/12/2008	ABC	a	ef	01/12/2008
2	12/12/2008	ABC	a	dfgg	01/12/2008
3	12/12/2008	ABC	a	rgrgrg	01/12/2008
4	12/12/2008	ABC	a	thr	01/12/2008
5	12/12/2008	ABC	a	thgt	01/12/2008
6	06/04/2008	ABC	s	ef	01/12/2007
7	06/04/2008	ABC	s	dfgg	01/12/2007
8	06/04/2008	ABC	s	rgrgrg	01/12/2007
9	06/04/2008	ABC	s	thr	01/12/2007
10	06/04/2008	ABC	s	thgt	01/12/2007
11	03/25/2010	DEF	d	ef	.
12	03/25/2010	DEF	d	dfgg	.
13	03/25/2010	DEF	d	rgrgrg	.
14	03/25/2010	DEF	d	thr	.
15	03/25/2010	DEF	d	thgt	.
16	02/06/2002	DEF	h	ef	.
17	02/06/2002	DEF	h	dfgg	.
18	02/06/2002	DEF	h	rgrgrg	.
19	02/06/2002	DEF	h	thr	.
20	02/06/2002	DEF	h	thgt	.

How do I get rid of all the unnecessary entries and get what you did (4 rows) but with the two additional columns (1 from h1 and 1 from h2) ?

Thank you

chris2377 · Posted 12-02-2014 09:47 AM

I'm not sure if it's going to work, but maybe try to do something like this

data dataset1;

input date :ddmmyy10. ticker $;

format date ddmmyy10.;

cards;

12/12/2008 ABC

4/6/2008 ABC

25/3/2010 DEF

6/2/2002 DEF

;

run;

data dataset1;

set h1;

year=year(date);

eoy=mdy(12,1,year(date)); /*end of year date*/

format eoy ddmmyy10.;

dif=abs(intck('day', date, eoy)); /*calculates the number of days between the two dates*/

run;

proc sort data=dataset1;

by ticker year dif;

run;

data dataset1;

set dataset1;

by ticker year;

if first.year;

run;

In this way you get one observation for each ticker and year and you can merge it with the second dataset by date

art297 · Posted 12-02-2014 11:26 AM

Here is a slight modification of 's code:

data h1;

infile cards dlm=',';

input date :ddmmyy10. ticker $;

format date ddmmyy10.;

cards;

12/12/2008,ABC

4/6/2008,ABC

25/3/2010,DEF

6/2/2002,DEF

;

data h2;

infile cards dlm=',';

input date :ddmmyy10. ticker $;

format date ddmmyy10.;

cards;

1/12/2010,ABC

1/12/2009,ABC

1/12/2008,ABC

1/12/2007,ABC

1/12/2006,ABC

1/12/2010,DEF

1/12/2009,DEF

1/12/2008,DEF

1/12/2007,DEF

1/12/2001,DEF

;

proc sql;

create table want as

select a.date,a.ticker, b.date as rpt_dt format=ddmmyy10.

from h1 a

left join

h2 b

on a.ticker=b.ticker

where b.date <= a.date

group by a.date,a.ticker

having abs(a.date-b.date)=min(abs(a.date-b.date))

order by a.ticker, b.date descending

;

quit;

ilikesas · Posted 12-02-2014 12:31 PM

Hi Arthur,

I did your modification and now got rid of the of the "multiplicative effect" that I was having before, is it just because you made a left join?

thanks, and here is what my example file looks:

1	12/12/2008	ABC	a	rgrgrg	01/12/2008
2	06/04/2008	ABC	s	thr	01/12/2007
3	02/06/2002	DEF	h		.
4	03/25/2010	DEF	d		.

my next step will be to try this on my real data!

art297 · Posted 12-02-2014 04:15 PM

Yes, the left join should have removed the multiple record problem. However, the approach was different than 's, thus there were other differences as well. Plus, I see that you already figured out how to add the extra variable.

ilikesas · Posted 12-21-2014 02:55 PM

Hi,

I applied Dr Tabachneck's and Hai.kuo's code to my real data which is 2 tables one having 76 000 rows and the other having 22 000 rows and my computer has been thinking for 3 hours already, is it normal?

Thank you

Reeza · Posted 12-21-2014 04:36 PM

You should include your actual code Otherwise impossible to say. Maybe forget to include quit or a join in your code, who knows :smileyconfused:

ilikesas · Posted 12-21-2014 04:55 PM

Hi,

here included is my actual code:

PROC IMPORT OUT= bs8

DATAFILE= " K:\FilesFormatted\bs8.xlsx"

DBMS= EXCEL REPLACE;

sheet = "Sheet1";

GETNAMES=YES;

MIXED=NO;

SCANTEXT=YES;

USEDATE=YES;

SCANTIME=YES;

RUN;

data bs8;

set bs8;

format date2 ddmmyy10. ;

run;

PROC IMPORT OUT= sdc

DATAFILE= " K:\FilesFormatted\new sdc.xlsx"

DBMS= EXCEL REPLACE;

sheet = "Sheet1";

GETNAMES=YES;

MIXED=NO;

SCANTEXT=YES;

USEDATE=YES;

SCANTIME=YES;

RUN;

data sdc;

set sdc;

format date_an ddmmyy10. ;

format date_ef ddmmyy10. ;

run;

/*the merging by closest date code*/

proc sql;

create table sdc2 as

select *

from sdc a

left join

bs8 b

on a.acq_ticker=b.ticker1

where b.date2 <= a.date_an

group by a.date_an,a.acq_ticker

having abs(a.date_an-b.date2)=min(abs(a.date_an-b.date2))

order by a.acq_ticker, b.date2 descending

;

quit;

art297 · Posted 12-21-2014 05:22 PM

The whole process should only take a couple of seconds. Have you tried to run the code step-by-step to see if there is a problem in one of the steps?

Can you post your log?

ilikesas · Posted 12-21-2014 07:18 PM

Hi,

I did an inner join and after 10 minutes got 1.2 million rows.

I had many duplicates from the missing entries of both tables, because when I dropped these duplicates I was left with only 8k rows!

I guess that when I was doing a left join my computer was calculating even more rows and at some point probably got stuck...

Thank you for your help!!!

Astounding · Posted 12-02-2014 11:50 AM

The DATA step way is pretty easy until you start adding additional variables:

proc sort data=events;

by ticker date;

run;

proc sort data=yearly_reports;

by ticker end_of_year_report;

run;

data want;

set yearly_reports (in=in1 rename=(end_of_year_report=date)) events (in=in2);

by ticker date;

if in1 then end_of_year_report=date;

retain end_of_year_report;

if in2;

run;

SAS Data Management

Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Re: Merging two tables by choosing the CLOSEST dates

Follow Us

What is...

SAS Data Management

Our biggest data and AI event of the year.

Follow Us

What is...