Hi, all,
Greetings, and nice to meet you again.
I am trying to merge two datasets. One is a very large mega data of US daily stock returns. The other is a much smaller data containing only important event days for firms.
What I am testing is how a firm's single-day events impact her post-event stock returns on a daily basis. So, by this merging, I am attaching a dummy variable indicating event days to the mega data of daily stock returns.
I have two following datasets:
(1) US daily stock returns
CUSIP | DATE |
00002130 | 8/17/1990 |
00002130 | 8/20/1990 |
00002130 | 8/21/1990 |
00002130 | 8/22/1990 |
00002130 | 8/23/1990 |
00002130 | 8/24/1990 |
00002130 | 8/27/1990 |
…......... | |
00020910 | 3/1/1988 |
00020910 | 3/2/1988 |
00020910 | 3/5/1988 |
00020910 | 3/6/1988 |
…......... | |
00020910 | 4/3/1989 |
00020910 | 4/4/1989 |
00020910 | 4/5/1989 |
00020910 | 4/6/1989 |
…......... |
(2) important event days
CUSIP | date | Event |
00002130 | 8/23/1990 | 1 |
00002130 | 4/17/1992 | 1 |
00002130 | 7/18/1992 | 1 |
00002130 | 6/22/1994 | 1 |
00002130 | 9/21/1994 | 1 |
00002130 | 6/20/1995 | 1 |
00002130 | 12/19/1996 | 1 |
00020910 | 3/3/1988 | 1 |
00020910 | 1/17/1989 | 1 |
00020910 | 4/5/1989 | 1 |
00020910 | 7/11/1990 | 1 |
00020910 | 10/11/1995 | 1 |
…......... |
(3) And, what I want to obtain is as follows:
CUSIP | DATE | Event |
00002130 | 8/17/1990 | |
00002130 | 8/20/1990 | |
00002130 | 8/21/1990 | |
00002130 | 8/22/1990 | |
00002130 | 8/23/1990 | 1 |
00002130 | 8/24/1990 | |
00002130 | 8/27/1990 | |
…......... | ||
00020910 | 3/1/1988 | |
00020910 | 3/2/1988 | |
00020910 | 3/5/1988 | 1 |
00020910 | 3/6/1988 | |
…......... | ||
00020910 | 4/3/1989 | |
00020910 | 4/4/1989 | |
00020910 | 4/5/1989 | 1 |
00020910 | 4/6/1989 | |
…......... |
From the two base datasets (1) and (2), most event days correspond to daily stock returns.
However, there are cases in which, like Cusip 00020910's event on 3/3/1988, the event day does not have its corresponding day of stock returns.
In this case, I want to attach the dummy's value = 1 to the very next observation from the daily stock returns, that is, 3/5/1988. Once again, since I am testing the post-event impact on stock returns, the event dummy should be attached to the NEXT closest observation (i.e., stock returns).
I looked up SAS community posts to figure out how to do this. The best I could find was to use proc sql, by applying On a.cusip=b.cusip where b.date <= a.date group by a.cusip
having abs(a.date-b.date)=min(abs(a.date-b.date)) order by a.cusip, b.date;
However, it didn't work. Please help!
Sincerely,
KS -,
A very small correction:
data want(drop = c d rc);
if _N_ = 1 then do;
dcl hash h(dataset : 'have2(rename = (cusip = c date = d)', ordered : 'Y');
h.definekey('c', 'd');
h.definedone();
dcl hiter i('h');
end;
set have1;
c = ' ';
d = .;
if h.check(key : cusip, key : date) = 0 then do;
pd = date;
h.remove(key : cusip, key : date);
end;
else do;
h.add(key : cusip, key : date, data : cusip, data : date);
if i.setcur(key : cusip, key : date) = 0 then
if i.prev() = 0 & cusip = c & date - d <= 5 then do;
pd = d;
rc = i.last();
rc = i.next();
h.remove(key : cusip, key : pd);
end;
h.remove(key : cusip, key : date);
end;
event = pd > .;
format pd mmddyy10.;
run;
Result:
CUSIP DATE pd event 00002130 08/17/1990 . 0 00002130 08/20/1990 . 0 00002130 08/21/1990 . 0 00002130 08/22/1990 . 0 00002130 08/23/1990 08/23/1990 1 00002130 08/24/1990 . 0 00002130 08/27/1990 . 0 00020910 03/01/1988 . 0 00020910 03/02/1988 . 0 00020910 03/05/1988 03/03/1988 1 00020910 03/06/1988 . 0 00020910 04/03/1989 . 0 00020910 04/04/1989 . 0 00020910 04/05/1989 04/05/1989 1 00020910 04/06/1989 . 0
@KS99 How large is your 'very large' data?
It is daily stock returns of NYSE and other markets over 1986 to 2020.
Has more than 40,000,000 obs. So, I divided them into four subsets to do the task in SAS.
Thank you!
Do not divide your data into smaller portions.
Try something like this. Should scale well.
Feel free to ask 🙂
data have1;
input CUSIP $ DATE : mmddyy10.;
format DATE mmddyy10.;
datalines;
00002130 8/17/1990
00002130 8/20/1990
00002130 8/21/1990
00002130 8/22/1990
00002130 8/23/1990
00002130 8/24/1990
00002130 8/27/1990
00020910 3/1/1988
00020910 3/2/1988
00020910 3/5/1988
00020910 3/6/1988
00020910 4/3/1989
00020910 4/4/1989
00020910 4/5/1989
00020910 4/6/1989
;
data have2;
input CUSIP $ date :mmddyy10. Event;
format DATE mmddyy10.;
datalines;
00002130 8/23/1990 1
00002130 4/17/1992 1
00002130 7/18/1992 1
00002130 6/22/1994 1
00002130 9/21/1994 1
00002130 6/20/1995 1
00002130 12/19/1996 1
00020910 3/3/1988 1
00020910 1/17/1989 1
00020910 4/5/1989 1
00020910 7/11/1990 1
00020910 10/11/1995 1
;
data want(drop = c d rc);
if _N_ = 1 then do;
dcl hash h(dataset : 'have2(rename = (cusip = c date = d)', ordered : 'Y');
h.definekey('c', 'd');
h.definedone();
dcl hiter i('h');
end;
set have1;
c = ' ';
d = .;
if h.check(key : cusip, key : date) = 0 then do;
pd = date;
h.remove(key : cusip, key : date);
end;
else do;
h.add(key : cusip, key : date, data : cusip, data : date);
if i.setcur(key : cusip, key : date) = 0 then
if i.prev() = 0 & cusip = c then do;
pd = d;
rc = i.last();
rc = i.next();
h.remove(key : cusip, key : pd);
end;
h.remove(key : cusip, key : date);
end;
event = pd > .;
format pd mmddyy10.;
run;
Result:
CUSIP DATE pd event 00002130 08/17/1990 . 0 00002130 08/20/1990 . 0 00002130 08/21/1990 . 0 00002130 08/22/1990 . 0 00002130 08/23/1990 08/23/1990 1 00002130 08/24/1990 . 0 00002130 08/27/1990 . 0 00020910 03/01/1988 . 0 00020910 03/02/1988 . 0 00020910 03/05/1988 03/03/1988 1 00020910 03/06/1988 . 0 00020910 04/03/1989 01/17/1989 1 00020910 04/04/1989 . 0 00020910 04/05/1989 04/05/1989 1 00020910 04/06/1989 . 0
Dear Peter,
Wow! Thank you! It works wonderfully.
By the way, Peter, I found my own fault in explaining my desired results.
00020910 04/03/1989 01/17/1989 1
The results your codes produced have such a row. But I want the stock date and event date to be no more than, say, 5 days. What should I add to your codes?
Many many thanks in advance!!
Sincerely,
KS Choi -,
A very small correction:
data want(drop = c d rc);
if _N_ = 1 then do;
dcl hash h(dataset : 'have2(rename = (cusip = c date = d)', ordered : 'Y');
h.definekey('c', 'd');
h.definedone();
dcl hiter i('h');
end;
set have1;
c = ' ';
d = .;
if h.check(key : cusip, key : date) = 0 then do;
pd = date;
h.remove(key : cusip, key : date);
end;
else do;
h.add(key : cusip, key : date, data : cusip, data : date);
if i.setcur(key : cusip, key : date) = 0 then
if i.prev() = 0 & cusip = c & date - d <= 5 then do;
pd = d;
rc = i.last();
rc = i.next();
h.remove(key : cusip, key : pd);
end;
h.remove(key : cusip, key : date);
end;
event = pd > .;
format pd mmddyy10.;
run;
Result:
CUSIP DATE pd event 00002130 08/17/1990 . 0 00002130 08/20/1990 . 0 00002130 08/21/1990 . 0 00002130 08/22/1990 . 0 00002130 08/23/1990 08/23/1990 1 00002130 08/24/1990 . 0 00002130 08/27/1990 . 0 00020910 03/01/1988 . 0 00020910 03/02/1988 . 0 00020910 03/05/1988 03/03/1988 1 00020910 03/06/1988 . 0 00020910 04/03/1989 . 0 00020910 04/04/1989 . 0 00020910 04/05/1989 04/05/1989 1 00020910 04/06/1989 . 0
Oh, no worries, Peter.
I figured it out.
DATE - pd > 5 then delete or something something,
Thank you!
Just posted a corrected version 🙂
Hi, Peter,
Thank you so much for your revised codes.
It works perfectly not only for this case, but also for my subsequent creations.
But soon again, I bumped into another problem.
(I posted it on Community board.) I'd appreciate your help greatly.
Sincerely,
KS -,
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.