BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
KS99
Obsidian | Level 7

Hi, SAS community! 

 

Urgently I need your help. 

I have stock data with firm identifier (=CUSIP), stock date (=Date), and daily returns (=return). 

d indicates an event day.

data have;
 input CUSIP $ DATE :mmddyy10. return d;
 format date mmddyy10.;
datalines;
0001241	6/28/1988	0.003 .
0001241	6/29/1988	0	.
0001241	6/30/1988	0	.
0001241	7/1/1988	0.018 .
0001241	7/4/1988	0.007 .
0001241	7/5/1988	0.003 .	
0001241	7/6/1988	0.010 .	
0001241	7/7/1988	0.031 .	
0001241	7/8/1988	0.003 .	
0001241	7/11/1988	0 .	
0001241	7/12/1988	-0.017 .	
0001241	7/13/1988	-0.006 .	
0001241	7/14/1988	-0.027 .	
0001241	7/15/1988	0.003 .	
0001241	7/18/1988	-0.028 .	
0001241	7/19/1988	0.011 .	
0001241	7/20/1988	0 .	
0001241	7/21/1988	0 .	
0001241	7/22/1988	-0.010 .	
0001241	7/25/1988	0.011 .	
0001241	7/26/1988	0.010 .	
0001241	7/27/1988	0 .	
0001241	7/28/1988	0.010 .
0001241	7/29/1988	0.017 .
0001241	8/1/1988	0 .
0001241	8/2/1988	0 .
0001241	8/3/1988	-0.010 .
0001241	8/4/1988	-0.007 .	
0001241	8/5/1988	0 .	
0001241	8/8/1988	0 .	
0001241	8/9/1988	-0.003 .	
0001241	8/10/1988	-0.014 .	
0001241	8/11/1988	0 .	
0001241	8/12/1988	0 .	
0001241	8/15/1988	-0.003 .	
0001241	8/16/1988	-0.007 .	
0001241	8/17/1988	0 .	
0001241	8/18/1988	0 .	
0001241	8/19/1988	0 .	
0001241	8/22/1988	0 .	
0001241	8/23/1988	-0.003 .	
0001241	8/24/1988	-0.003 .	
0001241	8/25/1988	-0.011 .	
0001241	8/26/1988	-0.026 .	
0001241	8/29/1988	0 .	
0001241	8/30/1988	-0.015 .
0001241	8/31/1988	0.015 .	
0001241	9/1/1988	-0.003 .	
0001241	9/2/1988	0.007 .	
0001241	9/5/1988	0.003 .	
0001241	9/6/1988	0.003 .	
0001241	9/7/1988	0 .	
0001241	9/8/1988	-0.015 .
0001241	9/9/1988	-0.007 .	
0001241	9/12/1988	0.003 .	
0001241	9/13/1988	0.003 .	
0001241	9/14/1988	0 .	
0001241	9/15/1988	0.023 .	
0001241	9/16/1988	-0.018 .	
0001241	9/19/1988	-0.015 .	
0001241	9/20/1988	-0.011 .	
0001241	9/21/1988	0.003 .	
0001241	9/22/1988	-0.007 .	
0001241	9/23/1988	0.003 .	
0001241	9/26/1988	-0.003 .	
0001241	9/27/1988	0 .	
0001241	9/28/1988	0 .	
0001241	9/29/1988	0.003 .	
0001241	9/30/1988	0.003 .	
0001241	10/3/1988	-0.003 .	
0001241	10/4/1988	-0.007 .	
0001241	10/5/1988	0.003 .	
0001241	10/6/1988	0 .	
0001241	10/7/1988	0.003 .	
0001241	10/10/1988	0 .	
0001241	10/11/1988	0 .	
0001241	10/12/1988	-0.003 .	
0001241	10/13/1988	0 .	
0001241	10/14/1988	0 .	
0001241	10/17/1988	0 .	
0001241	10/18/1988	0 .	
0001241	10/19/1988	0.015 .	
0001241	10/20/1988	-0.003 .	
0001241	10/21/1988	0.003 .	
0001241	10/24/1988	0.011 .	
0001241	10/25/1988	0.015 .	
0001241	10/26/1988	0.019 .	
0001241	10/27/1988	-0.013 .	
0001241	10/28/1988	0.005 .	
0001241	10/31/1988	0.011 .	
0001241	11/1/1988	0.011 .	
0001241	11/2/1988	-0.007 .	
0001241	11/3/1988	-0.003 .	
0001241	11/4/1988	-0.003 .	
0001241	11/7/1988	0 .	
0001241	11/8/1988	0.003 .	
0001241	11/9/1988	-0.003 .	
0001241	11/10/1988	0 .	
0001241	11/11/1988	0 .	
0001241	11/14/1988	0.003 .	
0001241	11/15/1988	0.022 .	
0001241	11/16/1988	0.043 .	
0001241	11/17/1988	0.010 .	
0001241	11/18/1988	-0.013 .	
0001241	11/21/1988	0 .	
0001241	11/22/1988	0 .	
0001241	11/23/1988	0 .	
0001241	11/24/1988	-0.006 .	
0001241	11/25/1988	-0.014 .	
0001241	11/28/1988	0 .	
0001241	11/29/1988	0 .	
0001241	11/30/1988	0.017 .	
0001241	12/1/1988	-0.010 .	
0001241	12/2/1988	-0.003 .	
0001241	12/5/1988	0 .	
0001241	12/6/1988	-0.007 .	
0001241	12/7/1988	-0.021 .	
0001241	12/8/1988	-0.029 .	
0001241	12/9/1988	-0.011 .	
0001241	12/12/1988	-0.003 .	
0001241	12/13/1988	0.003 .	
0001241	12/14/1988	0.003 .	
0001241	12/15/1988	0 .	
0001241	12/16/1988	0 .	
0001241	12/19/1988	0 .	
0001241	12/20/1988	0.019 .	
0001241	12/21/1988	0 .	
0001241	12/22/1988	0.003 .	
0001241	12/23/1988	0 .	
0001241	12/26/1988	0 .	
0001241	12/27/1988	0 .	
0001241	12/28/1988	0.003 .	
0001241	12/29/1988	0.018 .	
0001241	12/30/1988	0 .	
0001241	1/2/1989	0 .	
0001241	1/3/1989	0.029	1
0001241	1/4/1989	0 .	
0001241	1/5/1989	0 .	
0001241	1/6/1989	0 .	
0001241	1/9/1989	0.007 .	
0001241	1/10/1989	0.014 .	
0001241	1/11/1989	0 .	
0001241	1/12/1989	0.006 .	
0001241	1/13/1989	0.010 .	
0001241	1/16/1989	-0.02 .	
0001241	1/17/1989	0 .	
0001241	1/18/1989	0.013 .	
0001241	1/19/1989	0.013 .	
0001241	1/20/1989	0.013 .	
0001241	1/23/1989	0.013 .	
0001241	1/24/1989	0.016 .	
0001241	1/25/1989	0.009 .	
0001241	1/26/1989	0.003 .	
0001241	1/27/1989	0.025 .	
0001241	1/30/1989	0.006 1
0001241	1/31/1989	0 .	
0001241	2/1/1989	0 .	
; 

What I want to achieve is, to extract observations from -120th day up until -21st day from the date designated by d=1. 

(Then, I intend calculate means and standard deviations of returns BY CUSIP and 100 days over each [-120,-21]). 

There is one condition for what I want to achieve: 

In the exemplary dataset above, we have two event days (d=1) on 1/3/1989 and 1/30/1989

These two dates are close. Thus, in a dataset of extractions of (-120,-21), many duplicate dates can exist because (-120,-21) from 1/3/1989 and (-120,-21) from 1/30/1989 can share many overlapping days. 

 

For this, I used my SAS codes that I obtained from a kind member of SAS community. 

It runs like this: 

data want_120_21; 
start=stop+1;
retain stop 0;
do stop=stop+1 by 1 until(last.CUSIP);
set have end=eof nobs=nobs; by CUSIP; end;
put (_n_ nobs sedol start stop ) (=);
tradeno=0;
do obs=start to stop;
set have;
if d=1 then do; tradeno+1;
tradedate=date; format tradedate date9.;
do p=max(obs-120,start) to min(obs-21,stop); offset=p-obs;
set have point=p;
output;
end;
end;
end;
run;

I used to use this code to create the windows like (+2,+17), (+2,+126), etc.  

But when I apply this code to create the windows (-120,-21), it doesn't work. Keeps producing "Disk is full." 

 

I really appreciate your help, 

thanks in advance!! 

 

Sincerely, 

 

KS - 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

It really helps to show what you expect the output data set needs to look like.

You should have your example data in the form of data step code. I believe that has been mentioned more than once. Barring that use text pasted into a text box opened on the forum with the </> icon. I had to do a lot of editing of strange characters to make a data step that worked from that formatted table.

Example:

data have;
 input CUSIP $ DATE :mmddyy10. return d;
 format date mmddyy10.;
datalines;
001855 7/16/1991 0.5 .
001855 7/19/1991 0.025 .
001855 7/22/1991 0.006 .
001855 7/23/1991 0.076 .
001855 7/24/1991 0.001 .
001855 7/25/1991 0.008 .
001855 7/29/1991 0.0015 .
001855 7/30/1991 0.00016 1
001855 8/1/1991 0.0015 .
001855 8/2/1991 0.0016 .
001855 8/5/1991 0.019 .
;

You do not very clearly define what (-120, -3) actually means. Is it days or is it number of records? Your existing code seems to be using number of records and not days.

 

You also do not say exactly what memory is "full".  I suspect that because of the following line that your LOG window gets full of text. You have potentially a a very large number of loops even with a moderate to small data set to start with

   put (_n_ nobs CUSIP start stop ) (=); 

Since your example data does not have anywhere near enough records to do -120 of anything here is a way to get -5 to -3 RECORDS. I am assuming because the definitions of -120 and such are not provided that you mean within the same cusip group.

 

data helper;
  set have;
  where d=1;
  firstrec = lineno -5;
  if firstrec <0 then firstrec=1;
  lastrec = lineno -3;
run;

proc sql;
   create table want_5_3 as
   select a.cusip, a.date as eventdate
          ,b.date as tradedate, b.return
   from helper as a
        left join 
        have as b
        on a.cusip=b.cusip
          and a.firstrec le b.lineno le a.lastrec
   order by a.cusip,a.date,b.date
   ;
 quit;

This approach would require you to add the lineno, or other variable name to indicate which row of the data set each record belongs on AFTER sorting by Cusip and date.

One of the very likely problems with your code is that if the range of values you request could actually be looking for number of records that do not exist at all because the -120 records, as in your example data would be before the first record.

The SQL uses a range of records calculated for each event date. Note, this should be have with negative or positive numbers just use a +3 or whatever.

 

IF however -120 is supposed to a date interval then this should work. This again uses -5 and -3 as bounds for Date range:

proc sql;
   create table want as
   select a.cusip, a.date as eventdate,
          b.date as tradedate, b.return
   from (select * from have where d=1) as a
        left join
        have as b
        on a.cusip=b.cusip
          and (a.date - 5) le b.date le (a.date - 3)
   ;
quit;

If neither of these approaches yield what you would expect for -5 and -3 then provide what you expect. Please do not use ranges of values that cannot be exercised with your example data.

 

 

 

If you actually write and keep code looking like your example then you are up for ugliest code I have seen on this forum this calendar year, and possibly several before that. Indenting in your friend. I had to indent your code to have a chance of even guessing what was done.

data want_120_21;
   start=stop+1; 
   retain stop 0; 
   do stop=stop+1 by 1 until(last.CUSIP);
      set have  end=eof  nobs=nobs; 
      by CUSIP; 
   end;
   put (_n_ nobs CUSIP start stop ) (=); 
   tradeno=0; 
   do obs=start to stop; 
      set have; 
      if d=1 then do; 
         tradeno+1;
         tradedate=DATE; 
         format tradedate date9.;
         do p=max(obs-120,start) to min(obs-21,stop); 
            offset=p-obs;
            set have point=p; 
            output; 
         end; 
      end; 
   end; 
run; 

View solution in original post

8 REPLIES 8
ballardw
Super User

It really helps to show what you expect the output data set needs to look like.

You should have your example data in the form of data step code. I believe that has been mentioned more than once. Barring that use text pasted into a text box opened on the forum with the </> icon. I had to do a lot of editing of strange characters to make a data step that worked from that formatted table.

Example:

data have;
 input CUSIP $ DATE :mmddyy10. return d;
 format date mmddyy10.;
datalines;
001855 7/16/1991 0.5 .
001855 7/19/1991 0.025 .
001855 7/22/1991 0.006 .
001855 7/23/1991 0.076 .
001855 7/24/1991 0.001 .
001855 7/25/1991 0.008 .
001855 7/29/1991 0.0015 .
001855 7/30/1991 0.00016 1
001855 8/1/1991 0.0015 .
001855 8/2/1991 0.0016 .
001855 8/5/1991 0.019 .
;

You do not very clearly define what (-120, -3) actually means. Is it days or is it number of records? Your existing code seems to be using number of records and not days.

 

You also do not say exactly what memory is "full".  I suspect that because of the following line that your LOG window gets full of text. You have potentially a a very large number of loops even with a moderate to small data set to start with

   put (_n_ nobs CUSIP start stop ) (=); 

Since your example data does not have anywhere near enough records to do -120 of anything here is a way to get -5 to -3 RECORDS. I am assuming because the definitions of -120 and such are not provided that you mean within the same cusip group.

 

data helper;
  set have;
  where d=1;
  firstrec = lineno -5;
  if firstrec <0 then firstrec=1;
  lastrec = lineno -3;
run;

proc sql;
   create table want_5_3 as
   select a.cusip, a.date as eventdate
          ,b.date as tradedate, b.return
   from helper as a
        left join 
        have as b
        on a.cusip=b.cusip
          and a.firstrec le b.lineno le a.lastrec
   order by a.cusip,a.date,b.date
   ;
 quit;

This approach would require you to add the lineno, or other variable name to indicate which row of the data set each record belongs on AFTER sorting by Cusip and date.

One of the very likely problems with your code is that if the range of values you request could actually be looking for number of records that do not exist at all because the -120 records, as in your example data would be before the first record.

The SQL uses a range of records calculated for each event date. Note, this should be have with negative or positive numbers just use a +3 or whatever.

 

IF however -120 is supposed to a date interval then this should work. This again uses -5 and -3 as bounds for Date range:

proc sql;
   create table want as
   select a.cusip, a.date as eventdate,
          b.date as tradedate, b.return
   from (select * from have where d=1) as a
        left join
        have as b
        on a.cusip=b.cusip
          and (a.date - 5) le b.date le (a.date - 3)
   ;
quit;

If neither of these approaches yield what you would expect for -5 and -3 then provide what you expect. Please do not use ranges of values that cannot be exercised with your example data.

 

 

 

If you actually write and keep code looking like your example then you are up for ugliest code I have seen on this forum this calendar year, and possibly several before that. Indenting in your friend. I had to indent your code to have a chance of even guessing what was done.

data want_120_21;
   start=stop+1; 
   retain stop 0; 
   do stop=stop+1 by 1 until(last.CUSIP);
      set have  end=eof  nobs=nobs; 
      by CUSIP; 
   end;
   put (_n_ nobs CUSIP start stop ) (=); 
   tradeno=0; 
   do obs=start to stop; 
      set have; 
      if d=1 then do; 
         tradeno+1;
         tradedate=DATE; 
         format tradedate date9.;
         do p=max(obs-120,start) to min(obs-21,stop); 
            offset=p-obs;
            set have point=p; 
            output; 
         end; 
      end; 
   end; 
run; 
KS99
Obsidian | Level 7
Dear Ballardw,
Thank you so much for your exhaustive, detailed explanations.
I remember you helped me a few times in the past.
I really appreciate it.
Next time when I upload a question, I will keep the guidelines and formatting.

KS -,
s_lassen
Meteorite | Level 14

As @ballardw remarked, your code really needs some formatting. And it may also be right that the actual problem is that the log window runs full, so you may want to get rid of the PUT statement.

 

Apart from that, I think your code should work, if you are looking at number of records, and not number of days. What it basically does is to find the first and last observation for a CUSIP, and then loop through it with POINT=.

 

If you want to speed things up, reading the table twice is not necessary when collecting data purely in the past, and not in the future - I think this will work as well:

data want_120_21;
  set have(rename=(date=tradedate)); 
  by CUSIP;
  if first.CUSIP then do;
    start=_n_;
	tradeno=0;
	end;
  retain start tradeno;
  if d=1;
  if _n_>=start-21;
  tradeno+1;
  do p=max(_n_-120,start) to _n_-21;
    offset=p-_n_;
    set have point=p; 
    output; 
    end;  
run;

But of course this means that you will have different code for looking back and looking forward. It may be better to use your original code - without the PUT statement. And you can probably speed things up a little by using KEEP= and RENAME= in your SET statements:

data want_120_21;
  start=stop+1; 
  retain stop 0; 
  do stop=stop+1 by 1 until(last.CUSIP);
    set have(keep=CUSIP); 
    by CUSIP; 
    end;
  tradeno=0; 
  do obs=start to stop; 
    set have(rename=(date=tradedate)); 
	if d=1 then do; 
	  tradeno+1;
      do p=max(obs-120,start) to min(obs-21,stop);
        offset=p-obs;
        set have point=p; 
	    output; 
	    end; 
	  end; 
    end; 
run;  

 

KS99
Obsidian | Level 7

Thank you s_lassen, 

I tried your code and it perfectly works, 

Thank you! 

 

KS -, 

 

 

Ksharp
Super User
data have;
infile cards expandtabs truncover;
input CUSIP DATE :mmddyy12. return d;
format date mmddyy10.;
cards;
001855 7/16/1991 0.5  
001855 7/19/1991 0.025  
001855 7/22/1991 0.006  
001855 7/23/1991 0.076  
001855 7/24/1991 0.001  
001855 7/25/1991 0.008  
001855 7/29/1991 0.0015  
001855 7/30/1991 0.00016 1
001855 8/1/1991 0.0015  
001855 8/2/1991 0.0016  
001855 8/5/1991 0.019  
;
data key;
 set have(rename=(date=_date));
 if d=1 then do;
   do date=_date-120 to _date-21;
     output;
   end;
 end;
 keep  CUSIP DATE;
 format date mmddyy10.;
run;
data want;
 if _n_=1 then do;
  declare hash h(dataset:'key');
  h.definekey('CUSIP','DATE');
  h.definedone();
 end;
set have;
if h.check()=0;
run;
KS99
Obsidian | Level 7
Thank you, KSharp,
I do appreciate your codes.
I will keep them and use them for a later context.

Have a good weekend!

KS -,
sbxkoenk
SAS Super FREQ

Fine answers, absolutely, ... but remarkable that everyone solves it with data step and / or PROC SQL.
I would immediately think of PROC TIMEDATA (SAS/ETS).
PROC TIMEDATA to me is the data step for time series data, because it allows you to look backwards and forwards in the time series.

 

Koen

 
KS99
Obsidian | Level 7
Oh, is it?
If PROC TIMEDATA is faster than PROC SQL, then I will study it to use it.

Many thanks!

KS -,

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 894 views
  • 4 likes
  • 5 in conversation