Hi, I have one file with company tickers and dates of different events, and each company can have several different events.
I have another file with each company's ticker and end of year report, say december 1 of a given year.
The first file looks like this:
date | ticker |
---|---|
12/12/2008 | ABC |
4/6/2008 | ABC |
25/3/2010 | DEF |
6/2/2002 | DEF |
The Second file looks like this:
End of year report | ticker |
---|---|
1/12/2010 | ABC |
1/12/2009 | ABC |
1/12/2008 | ABC |
1/12/2007 | ABC |
1/12/2006 | ABC |
What I want to do is for company ABC on 12/12/2008 in the first file to do a merge with the data from the second file for the same company but the row corresponding to 1/12/2008 because its the closest recent date.
For ABC on 4/6/2008 from the first file I would like to merge with ABC from the second file but row corresponding to 1/12/2007 because its the closest recent date here.
And so on for all the companies and dates
Thank you
Here is a slight modification of 's code:
data h1;
infile cards dlm=',';
input date :ddmmyy10. ticker $;
format date ddmmyy10.;
cards;
12/12/2008,ABC
4/6/2008,ABC
25/3/2010,DEF
6/2/2002,DEF
;
data h2;
infile cards dlm=',';
input date :ddmmyy10. ticker $;
format date ddmmyy10.;
cards;
1/12/2010,ABC
1/12/2009,ABC
1/12/2008,ABC
1/12/2007,ABC
1/12/2006,ABC
1/12/2010,DEF
1/12/2009,DEF
1/12/2008,DEF
1/12/2007,DEF
1/12/2001,DEF
;
proc sql;
create table want as
select a.date,a.ticker, b.date as rpt_dt format=ddmmyy10.
from h1 a
left join
h2 b
on a.ticker=b.ticker
where b.date <= a.date
group by a.date,a.ticker
having abs(a.date-b.date)=min(abs(a.date-b.date))
order by a.ticker, b.date descending
;
quit;
The following is NOT the most efficient way, so ask for Hash to speed it up:
data h1;
infile cards dlm='09'x;
input date :ddmmyy10. ticker $;
format date ddmmyy10.;
cards;
12/12/2008 ABC
4/6/2008 ABC
25/3/2010 DEF
6/2/2002 DEF
;
data h2;
infile cards dlm='09'x;
input date :ddmmyy10. ticker $;
format date ddmmyy10.;
cards;
1/12/2010 ABC
1/12/2009 ABC
1/12/2008 ABC
1/12/2007 ABC
1/12/2006 ABC
;
proc sql;
create table want as
select *, (select date from h2 where ticker=a.ticker and date <= a.date having a.date-date=min(a.date-date)) as rpt_dt format=ddmmyy10.
from h1 a
;
quit;
Hi Hai and thank you for replying,
I ran the code that you posted but the result that I got is the following:
1 | . | 4/6/2008 | . |
---|---|---|---|
2 | . | 6/2/2002 | . |
obs | date | ticker | rpt_dt |
what I wish that I could have is for example the row (with all columns) from h1 of say ABC 12/12/2008 merged with the row (with all columns) of ABC 1/12/2008 from h2
Thank you!
when doing "copy and paste" from HTML to text (or vise versa) , sometimes those unprinted symbols are converted (eg. blank vs tab). So in this case, it is Tab converting to Blanks. So go ahead remove "infile" statement.
data h1;
input date :ddmmyy10. ticker $;
format date ddmmyy10.;
cards;
12/12/2008 ABC
4/6/2008 ABC
25/3/2010 DEF
6/2/2002 DEF
;
data h2;
input date :ddmmyy10. ticker $;
format date ddmmyy10.;
cards;
1/12/2010 ABC
1/12/2009 ABC
1/12/2008 ABC
1/12/2007 ABC
1/12/2006 ABC
;
proc sql;
create table want as
select *, (select date from h2 where ticker=a.ticker and date <= a.date having a.date-date=min(a.date-date)) as rpt_dt format=ddmmyy10.
from h1 a
;
quit;
I did the code and it works thanks a lot !!!
Now suppose I have in each file an additional column:
h1
1 | 12/12/2008 | ABC | a |
---|---|---|---|
2 | 06/04/2008 | ABC | s |
3 | 03/25/2010 | DEF | d |
4 | 02/06/2002 | DEF | h |
h2
1 | 12/01/2010 | ABC | ef |
---|---|---|---|
2 | 12/01/2009 | ABC | dfgg |
3 | 12/01/2008 | ABC | rgrgrg |
4 | 12/01/2007 | ABC | thr |
5 | 12/01/2005 | ABC | thgt |
When I tried to do the merge in sql I got the following want:
1 | 12/12/2008 | ABC | a | ef | 01/12/2008 |
---|---|---|---|---|---|
2 | 12/12/2008 | ABC | a | dfgg | 01/12/2008 |
3 | 12/12/2008 | ABC | a | rgrgrg | 01/12/2008 |
4 | 12/12/2008 | ABC | a | thr | 01/12/2008 |
5 | 12/12/2008 | ABC | a | thgt | 01/12/2008 |
6 | 06/04/2008 | ABC | s | ef | 01/12/2007 |
7 | 06/04/2008 | ABC | s | dfgg | 01/12/2007 |
8 | 06/04/2008 | ABC | s | rgrgrg | 01/12/2007 |
9 | 06/04/2008 | ABC | s | thr | 01/12/2007 |
10 | 06/04/2008 | ABC | s | thgt | 01/12/2007 |
11 | 03/25/2010 | DEF | d | ef | . |
12 | 03/25/2010 | DEF | d | dfgg | . |
13 | 03/25/2010 | DEF | d | rgrgrg | . |
14 | 03/25/2010 | DEF | d | thr | . |
15 | 03/25/2010 | DEF | d | thgt | . |
16 | 02/06/2002 | DEF | h | ef | . |
17 | 02/06/2002 | DEF | h | dfgg | . |
18 | 02/06/2002 | DEF | h | rgrgrg | . |
19 | 02/06/2002 | DEF | h | thr | . |
20 | 02/06/2002 | DEF | h | thgt | . |
How do I get rid of all the unnecessary entries and get what you did (4 rows) but with the two additional columns (1 from h1 and 1 from h2) ?
Thank you
I'm not sure if it's going to work, but maybe try to do something like this
data dataset1;
input date :ddmmyy10. ticker $;
format date ddmmyy10.;
cards;
12/12/2008 ABC
4/6/2008 ABC
25/3/2010 DEF
6/2/2002 DEF
;
run;
data dataset1;
set h1;
year=year(date);
eoy=mdy(12,1,year(date)); /*end of year date*/
format eoy ddmmyy10.;
dif=abs(intck('day', date, eoy)); /*calculates the number of days between the two dates*/
run;
proc sort data=dataset1;
by ticker year dif;
run;
data dataset1;
set dataset1;
by ticker year;
if first.year;
run;
In this way you get one observation for each ticker and year and you can merge it with the second dataset by date
Here is a slight modification of 's code:
data h1;
infile cards dlm=',';
input date :ddmmyy10. ticker $;
format date ddmmyy10.;
cards;
12/12/2008,ABC
4/6/2008,ABC
25/3/2010,DEF
6/2/2002,DEF
;
data h2;
infile cards dlm=',';
input date :ddmmyy10. ticker $;
format date ddmmyy10.;
cards;
1/12/2010,ABC
1/12/2009,ABC
1/12/2008,ABC
1/12/2007,ABC
1/12/2006,ABC
1/12/2010,DEF
1/12/2009,DEF
1/12/2008,DEF
1/12/2007,DEF
1/12/2001,DEF
;
proc sql;
create table want as
select a.date,a.ticker, b.date as rpt_dt format=ddmmyy10.
from h1 a
left join
h2 b
on a.ticker=b.ticker
where b.date <= a.date
group by a.date,a.ticker
having abs(a.date-b.date)=min(abs(a.date-b.date))
order by a.ticker, b.date descending
;
quit;
Hi Arthur,
I did your modification and now got rid of the of the "multiplicative effect" that I was having before, is it just because you made a left join?
thanks, and here is what my example file looks:
1 | 12/12/2008 | ABC | a | rgrgrg | 01/12/2008 |
---|---|---|---|---|---|
2 | 06/04/2008 | ABC | s | thr | 01/12/2007 |
3 | 02/06/2002 | DEF | h | . | |
4 | 03/25/2010 | DEF | d | . |
my next step will be to try this on my real data!
Hi,
I applied Dr Tabachneck's and Hai.kuo's code to my real data which is 2 tables one having 76 000 rows and the other having 22 000 rows and my computer has been thinking for 3 hours already, is it normal?
Thank you
You should include your actual code Otherwise impossible to say. Maybe forget to include quit or a join in your code, who knows :smileyconfused:
Hi,
here included is my actual code:
PROC IMPORT OUT= bs8
DATAFILE= " K:\FilesFormatted\bs8.xlsx"
DBMS= EXCEL REPLACE;
sheet = "Sheet1";
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
data bs8;
set bs8;
format date2 ddmmyy10. ;
run;
PROC IMPORT OUT= sdc
DATAFILE= " K:\FilesFormatted\new sdc.xlsx"
DBMS= EXCEL REPLACE;
sheet = "Sheet1";
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
data sdc;
set sdc;
format date_an ddmmyy10. ;
format date_ef ddmmyy10. ;
run;
/*the merging by closest date code*/
proc sql;
create table sdc2 as
select *
from sdc a
left join
bs8 b
on a.acq_ticker=b.ticker1
where b.date2 <= a.date_an
group by a.date_an,a.acq_ticker
having abs(a.date_an-b.date2)=min(abs(a.date_an-b.date2))
order by a.acq_ticker, b.date2 descending
;
quit;
The whole process should only take a couple of seconds. Have you tried to run the code step-by-step to see if there is a problem in one of the steps?
Can you post your log?
Hi,
I did an inner join and after 10 minutes got 1.2 million rows.
I had many duplicates from the missing entries of both tables, because when I dropped these duplicates I was left with only 8k rows!
I guess that when I was doing a left join my computer was calculating even more rows and at some point probably got stuck...
Thank you for your help!!!
The DATA step way is pretty easy until you start adding additional variables:
proc sort data=events;
by ticker date;
run;
proc sort data=yearly_reports;
by ticker end_of_year_report;
run;
data want;
set yearly_reports (in=in1 rename=(end_of_year_report=date)) events (in=in2);
by ticker date;
if in1 then end_of_year_report=date;
retain end_of_year_report;
if in2;
run;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.