Hi, SAS community!
Urgently I need your help.
I have stock data with firm identifier (=CUSIP), stock date (=Date), and daily returns (=return).
d indicates an event day.
data have;
input CUSIP $ DATE :mmddyy10. return d;
format date mmddyy10.;
datalines;
0001241 6/28/1988 0.003 .
0001241 6/29/1988 0 .
0001241 6/30/1988 0 .
0001241 7/1/1988 0.018 .
0001241 7/4/1988 0.007 .
0001241 7/5/1988 0.003 .
0001241 7/6/1988 0.010 .
0001241 7/7/1988 0.031 .
0001241 7/8/1988 0.003 .
0001241 7/11/1988 0 .
0001241 7/12/1988 -0.017 .
0001241 7/13/1988 -0.006 .
0001241 7/14/1988 -0.027 .
0001241 7/15/1988 0.003 .
0001241 7/18/1988 -0.028 .
0001241 7/19/1988 0.011 .
0001241 7/20/1988 0 .
0001241 7/21/1988 0 .
0001241 7/22/1988 -0.010 .
0001241 7/25/1988 0.011 .
0001241 7/26/1988 0.010 .
0001241 7/27/1988 0 .
0001241 7/28/1988 0.010 .
0001241 7/29/1988 0.017 .
0001241 8/1/1988 0 .
0001241 8/2/1988 0 .
0001241 8/3/1988 -0.010 .
0001241 8/4/1988 -0.007 .
0001241 8/5/1988 0 .
0001241 8/8/1988 0 .
0001241 8/9/1988 -0.003 .
0001241 8/10/1988 -0.014 .
0001241 8/11/1988 0 .
0001241 8/12/1988 0 .
0001241 8/15/1988 -0.003 .
0001241 8/16/1988 -0.007 .
0001241 8/17/1988 0 .
0001241 8/18/1988 0 .
0001241 8/19/1988 0 .
0001241 8/22/1988 0 .
0001241 8/23/1988 -0.003 .
0001241 8/24/1988 -0.003 .
0001241 8/25/1988 -0.011 .
0001241 8/26/1988 -0.026 .
0001241 8/29/1988 0 .
0001241 8/30/1988 -0.015 .
0001241 8/31/1988 0.015 .
0001241 9/1/1988 -0.003 .
0001241 9/2/1988 0.007 .
0001241 9/5/1988 0.003 .
0001241 9/6/1988 0.003 .
0001241 9/7/1988 0 .
0001241 9/8/1988 -0.015 .
0001241 9/9/1988 -0.007 .
0001241 9/12/1988 0.003 .
0001241 9/13/1988 0.003 .
0001241 9/14/1988 0 .
0001241 9/15/1988 0.023 .
0001241 9/16/1988 -0.018 .
0001241 9/19/1988 -0.015 .
0001241 9/20/1988 -0.011 .
0001241 9/21/1988 0.003 .
0001241 9/22/1988 -0.007 .
0001241 9/23/1988 0.003 .
0001241 9/26/1988 -0.003 .
0001241 9/27/1988 0 .
0001241 9/28/1988 0 .
0001241 9/29/1988 0.003 .
0001241 9/30/1988 0.003 .
0001241 10/3/1988 -0.003 .
0001241 10/4/1988 -0.007 .
0001241 10/5/1988 0.003 .
0001241 10/6/1988 0 .
0001241 10/7/1988 0.003 .
0001241 10/10/1988 0 .
0001241 10/11/1988 0 .
0001241 10/12/1988 -0.003 .
0001241 10/13/1988 0 .
0001241 10/14/1988 0 .
0001241 10/17/1988 0 .
0001241 10/18/1988 0 .
0001241 10/19/1988 0.015 .
0001241 10/20/1988 -0.003 .
0001241 10/21/1988 0.003 .
0001241 10/24/1988 0.011 .
0001241 10/25/1988 0.015 .
0001241 10/26/1988 0.019 .
0001241 10/27/1988 -0.013 .
0001241 10/28/1988 0.005 .
0001241 10/31/1988 0.011 .
0001241 11/1/1988 0.011 .
0001241 11/2/1988 -0.007 .
0001241 11/3/1988 -0.003 .
0001241 11/4/1988 -0.003 .
0001241 11/7/1988 0 .
0001241 11/8/1988 0.003 .
0001241 11/9/1988 -0.003 .
0001241 11/10/1988 0 .
0001241 11/11/1988 0 .
0001241 11/14/1988 0.003 .
0001241 11/15/1988 0.022 .
0001241 11/16/1988 0.043 .
0001241 11/17/1988 0.010 .
0001241 11/18/1988 -0.013 .
0001241 11/21/1988 0 .
0001241 11/22/1988 0 .
0001241 11/23/1988 0 .
0001241 11/24/1988 -0.006 .
0001241 11/25/1988 -0.014 .
0001241 11/28/1988 0 .
0001241 11/29/1988 0 .
0001241 11/30/1988 0.017 .
0001241 12/1/1988 -0.010 .
0001241 12/2/1988 -0.003 .
0001241 12/5/1988 0 .
0001241 12/6/1988 -0.007 .
0001241 12/7/1988 -0.021 .
0001241 12/8/1988 -0.029 .
0001241 12/9/1988 -0.011 .
0001241 12/12/1988 -0.003 .
0001241 12/13/1988 0.003 .
0001241 12/14/1988 0.003 .
0001241 12/15/1988 0 .
0001241 12/16/1988 0 .
0001241 12/19/1988 0 .
0001241 12/20/1988 0.019 .
0001241 12/21/1988 0 .
0001241 12/22/1988 0.003 .
0001241 12/23/1988 0 .
0001241 12/26/1988 0 .
0001241 12/27/1988 0 .
0001241 12/28/1988 0.003 .
0001241 12/29/1988 0.018 .
0001241 12/30/1988 0 .
0001241 1/2/1989 0 .
0001241 1/3/1989 0.029 1
0001241 1/4/1989 0 .
0001241 1/5/1989 0 .
0001241 1/6/1989 0 .
0001241 1/9/1989 0.007 .
0001241 1/10/1989 0.014 .
0001241 1/11/1989 0 .
0001241 1/12/1989 0.006 .
0001241 1/13/1989 0.010 .
0001241 1/16/1989 -0.02 .
0001241 1/17/1989 0 .
0001241 1/18/1989 0.013 .
0001241 1/19/1989 0.013 .
0001241 1/20/1989 0.013 .
0001241 1/23/1989 0.013 .
0001241 1/24/1989 0.016 .
0001241 1/25/1989 0.009 .
0001241 1/26/1989 0.003 .
0001241 1/27/1989 0.025 .
0001241 1/30/1989 0.006 1
0001241 1/31/1989 0 .
0001241 2/1/1989 0 .
;
What I want to achieve is, to extract observations from -120th day up until -21st day from the date designated by d=1.
(Then, I intend calculate means and standard deviations of returns BY CUSIP and 100 days over each [-120,-21]).
There is one condition for what I want to achieve:
In the exemplary dataset above, we have two event days (d=1) on 1/3/1989
and 1/30/1989
.
These two dates are close. Thus, in a dataset of extractions of (-120,-21), many duplicate dates can exist because (-120,-21) from 1/3/1989 and (-120,-21) from 1/30/1989 can share many overlapping days.
For this, I used my SAS codes that I obtained from a kind member of SAS community.
It runs like this:
data want_120_21;
start=stop+1;
retain stop 0;
do stop=stop+1 by 1 until(last.CUSIP);
set have end=eof nobs=nobs; by CUSIP; end;
put (_n_ nobs sedol start stop ) (=);
tradeno=0;
do obs=start to stop;
set have;
if d=1 then do; tradeno+1;
tradedate=date; format tradedate date9.;
do p=max(obs-120,start) to min(obs-21,stop); offset=p-obs;
set have point=p;
output;
end;
end;
end;
run;
I used to use this code to create the windows like (+2,+17), (+2,+126), etc.
But when I apply this code to create the windows (-120,-21), it doesn't work. Keeps producing "Disk is full."
I really appreciate your help,
thanks in advance!!
Sincerely,
KS -
It really helps to show what you expect the output data set needs to look like.
You should have your example data in the form of data step code. I believe that has been mentioned more than once. Barring that use text pasted into a text box opened on the forum with the </> icon. I had to do a lot of editing of strange characters to make a data step that worked from that formatted table.
Example:
data have; input CUSIP $ DATE :mmddyy10. return d; format date mmddyy10.; datalines; 001855 7/16/1991 0.5 . 001855 7/19/1991 0.025 . 001855 7/22/1991 0.006 . 001855 7/23/1991 0.076 . 001855 7/24/1991 0.001 . 001855 7/25/1991 0.008 . 001855 7/29/1991 0.0015 . 001855 7/30/1991 0.00016 1 001855 8/1/1991 0.0015 . 001855 8/2/1991 0.0016 . 001855 8/5/1991 0.019 . ;
You do not very clearly define what (-120, -3) actually means. Is it days or is it number of records? Your existing code seems to be using number of records and not days.
You also do not say exactly what memory is "full". I suspect that because of the following line that your LOG window gets full of text. You have potentially a a very large number of loops even with a moderate to small data set to start with
put (_n_ nobs CUSIP start stop ) (=);
Since your example data does not have anywhere near enough records to do -120 of anything here is a way to get -5 to -3 RECORDS. I am assuming because the definitions of -120 and such are not provided that you mean within the same cusip group.
data helper; set have; where d=1; firstrec = lineno -5; if firstrec <0 then firstrec=1; lastrec = lineno -3; run; proc sql; create table want_5_3 as select a.cusip, a.date as eventdate ,b.date as tradedate, b.return from helper as a left join have as b on a.cusip=b.cusip and a.firstrec le b.lineno le a.lastrec order by a.cusip,a.date,b.date ; quit;
This approach would require you to add the lineno, or other variable name to indicate which row of the data set each record belongs on AFTER sorting by Cusip and date.
One of the very likely problems with your code is that if the range of values you request could actually be looking for number of records that do not exist at all because the -120 records, as in your example data would be before the first record.
The SQL uses a range of records calculated for each event date. Note, this should be have with negative or positive numbers just use a +3 or whatever.
IF however -120 is supposed to a date interval then this should work. This again uses -5 and -3 as bounds for Date range:
proc sql; create table want as select a.cusip, a.date as eventdate, b.date as tradedate, b.return from (select * from have where d=1) as a left join have as b on a.cusip=b.cusip and (a.date - 5) le b.date le (a.date - 3) ; quit;
If neither of these approaches yield what you would expect for -5 and -3 then provide what you expect. Please do not use ranges of values that cannot be exercised with your example data.
If you actually write and keep code looking like your example then you are up for ugliest code I have seen on this forum this calendar year, and possibly several before that. Indenting in your friend. I had to indent your code to have a chance of even guessing what was done.
data want_120_21; start=stop+1; retain stop 0; do stop=stop+1 by 1 until(last.CUSIP); set have end=eof nobs=nobs; by CUSIP; end; put (_n_ nobs CUSIP start stop ) (=); tradeno=0; do obs=start to stop; set have; if d=1 then do; tradeno+1; tradedate=DATE; format tradedate date9.; do p=max(obs-120,start) to min(obs-21,stop); offset=p-obs; set have point=p; output; end; end; end; run;
It really helps to show what you expect the output data set needs to look like.
You should have your example data in the form of data step code. I believe that has been mentioned more than once. Barring that use text pasted into a text box opened on the forum with the </> icon. I had to do a lot of editing of strange characters to make a data step that worked from that formatted table.
Example:
data have; input CUSIP $ DATE :mmddyy10. return d; format date mmddyy10.; datalines; 001855 7/16/1991 0.5 . 001855 7/19/1991 0.025 . 001855 7/22/1991 0.006 . 001855 7/23/1991 0.076 . 001855 7/24/1991 0.001 . 001855 7/25/1991 0.008 . 001855 7/29/1991 0.0015 . 001855 7/30/1991 0.00016 1 001855 8/1/1991 0.0015 . 001855 8/2/1991 0.0016 . 001855 8/5/1991 0.019 . ;
You do not very clearly define what (-120, -3) actually means. Is it days or is it number of records? Your existing code seems to be using number of records and not days.
You also do not say exactly what memory is "full". I suspect that because of the following line that your LOG window gets full of text. You have potentially a a very large number of loops even with a moderate to small data set to start with
put (_n_ nobs CUSIP start stop ) (=);
Since your example data does not have anywhere near enough records to do -120 of anything here is a way to get -5 to -3 RECORDS. I am assuming because the definitions of -120 and such are not provided that you mean within the same cusip group.
data helper; set have; where d=1; firstrec = lineno -5; if firstrec <0 then firstrec=1; lastrec = lineno -3; run; proc sql; create table want_5_3 as select a.cusip, a.date as eventdate ,b.date as tradedate, b.return from helper as a left join have as b on a.cusip=b.cusip and a.firstrec le b.lineno le a.lastrec order by a.cusip,a.date,b.date ; quit;
This approach would require you to add the lineno, or other variable name to indicate which row of the data set each record belongs on AFTER sorting by Cusip and date.
One of the very likely problems with your code is that if the range of values you request could actually be looking for number of records that do not exist at all because the -120 records, as in your example data would be before the first record.
The SQL uses a range of records calculated for each event date. Note, this should be have with negative or positive numbers just use a +3 or whatever.
IF however -120 is supposed to a date interval then this should work. This again uses -5 and -3 as bounds for Date range:
proc sql; create table want as select a.cusip, a.date as eventdate, b.date as tradedate, b.return from (select * from have where d=1) as a left join have as b on a.cusip=b.cusip and (a.date - 5) le b.date le (a.date - 3) ; quit;
If neither of these approaches yield what you would expect for -5 and -3 then provide what you expect. Please do not use ranges of values that cannot be exercised with your example data.
If you actually write and keep code looking like your example then you are up for ugliest code I have seen on this forum this calendar year, and possibly several before that. Indenting in your friend. I had to indent your code to have a chance of even guessing what was done.
data want_120_21; start=stop+1; retain stop 0; do stop=stop+1 by 1 until(last.CUSIP); set have end=eof nobs=nobs; by CUSIP; end; put (_n_ nobs CUSIP start stop ) (=); tradeno=0; do obs=start to stop; set have; if d=1 then do; tradeno+1; tradedate=DATE; format tradedate date9.; do p=max(obs-120,start) to min(obs-21,stop); offset=p-obs; set have point=p; output; end; end; end; run;
As @ballardw remarked, your code really needs some formatting. And it may also be right that the actual problem is that the log window runs full, so you may want to get rid of the PUT statement.
Apart from that, I think your code should work, if you are looking at number of records, and not number of days. What it basically does is to find the first and last observation for a CUSIP, and then loop through it with POINT=.
If you want to speed things up, reading the table twice is not necessary when collecting data purely in the past, and not in the future - I think this will work as well:
data want_120_21;
set have(rename=(date=tradedate));
by CUSIP;
if first.CUSIP then do;
start=_n_;
tradeno=0;
end;
retain start tradeno;
if d=1;
if _n_>=start-21;
tradeno+1;
do p=max(_n_-120,start) to _n_-21;
offset=p-_n_;
set have point=p;
output;
end;
run;
But of course this means that you will have different code for looking back and looking forward. It may be better to use your original code - without the PUT statement. And you can probably speed things up a little by using KEEP= and RENAME= in your SET statements:
data want_120_21;
start=stop+1;
retain stop 0;
do stop=stop+1 by 1 until(last.CUSIP);
set have(keep=CUSIP);
by CUSIP;
end;
tradeno=0;
do obs=start to stop;
set have(rename=(date=tradedate));
if d=1 then do;
tradeno+1;
do p=max(obs-120,start) to min(obs-21,stop);
offset=p-obs;
set have point=p;
output;
end;
end;
end;
run;
Thank you s_lassen,
I tried your code and it perfectly works,
Thank you!
KS -,
data have; infile cards expandtabs truncover; input CUSIP DATE :mmddyy12. return d; format date mmddyy10.; cards; 001855 7/16/1991 0.5 001855 7/19/1991 0.025 001855 7/22/1991 0.006 001855 7/23/1991 0.076 001855 7/24/1991 0.001 001855 7/25/1991 0.008 001855 7/29/1991 0.0015 001855 7/30/1991 0.00016 1 001855 8/1/1991 0.0015 001855 8/2/1991 0.0016 001855 8/5/1991 0.019 ; data key; set have(rename=(date=_date)); if d=1 then do; do date=_date-120 to _date-21; output; end; end; keep CUSIP DATE; format date mmddyy10.; run; data want; if _n_=1 then do; declare hash h(dataset:'key'); h.definekey('CUSIP','DATE'); h.definedone(); end; set have; if h.check()=0; run;
Fine answers, absolutely, ... but remarkable that everyone solves it with data step and / or PROC SQL.
I would immediately think of PROC TIMEDATA (SAS/ETS).
PROC TIMEDATA to me is the data step for time series data, because it allows you to look backwards and forwards in the time series.
Koen
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.