BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
KS99
Obsidian | Level 7

Hi, SAS Community, 

 

I am sorry if I am bothering you guys with many questions today. 

But your help is much more effective than my own creation of hours and hours. 

 

I am dealing with a large dataset of US daily stock returns. A variable "Trading Day" indicates specific days in which firm insiders traded their stocks. 

DATE CUSIP Trading Day
9/21/1990 00002130  
9/24/1990 00002130  
9/25/1990 00002130  
9/26/1990 00002130 1
9/27/1990 00002130  
9/28/1990 00002130 1
10/1/1990 00002130  
10/2/1990 00002130  
10/3/1990 00002130  
10/4/1990 00002130  
10/5/1990 00002130  
10/8/1990 00002130  
10/9/1990 00002130  
10/10/1990 00002130  
10/11/1990 00002130  
10/12/1990 00002130  
10/15/1990 00002130  
10/16/1990 00002130  
10/17/1990 00002130  
10/18/1990 00002130 1
10/19/1990 00002130  
10/22/1990 00002130 1
10/23/1990 00002130 1
10/24/1990 00002130  
10/25/1990 00002130  
10/26/1990 00002130 1
10/29/1990 00002130  
10/30/1990 00002130 1
10/31/1990 00002130 1
…......

 

Now, I want to extract days +2 to +22 from the specific date Trading Day=1 (when insiders traded). For this, I used the following codes: 

 

data Temp; 
set Us_stock; 
BY CUSIP; 
if TradingDay>. then do p=max(_n_+2,1,p+1) to min(_n_+22,nobs); 
        set Us_stock point=p nobs=nobs; 
        output; 
end; 
run; 

 

But what I obtained is the following: 

DATE CUSIP Cluster
9/28/1990 00002130 1
10/1/1990 00002130 1
10/2/1990 00002130 1
10/3/1990 00002130 1
10/4/1990 00002130 1
10/5/1990 00002130 1
10/8/1990 00002130 1
10/9/1990 00002130 1
10/10/1990 00002130 1
10/11/1990 00002130 1
10/12/1990 00002130 1
10/15/1990 00002130 1
10/16/1990 00002130 1
10/17/1990 00002130 1
10/18/1990 00002130 1
10/19/1990 00002130 1
10/22/1990 00002130 1
10/23/1990 00002130 1
10/24/1990 00002130 1
10/25/1990 00002130 1
10/26/1990 00002130 1
10/30/1990 00002130 2
11/1/1990 00002130 2
11/2/1990 00002130 2
11/5/1990 00002130 2
11/6/1990 00002130 2
11/7/1990 00002130 2
11/8/1990 00002130 2
11/9/1990 00002130 2
11/12/1990 00002130 2
…......

 

Never mind the variable "Cluster." I created it to do proc means by group. 

For the trading day 09/26/1990 (Cusip: 00002130), I have successfully extracted the days from 09/28/1990 to 10/26/1990. And, for the next trade on 09/28/1990, I am supposed to have 10/02/1990 to 10/30/1990. However, my codes is giving me something else, from 10/30/1990 ~. How should I get out of this? Should I abandon "p=max() to min()"? 

 

Many thanks in advance!! 

 

Sincerely, 

KS -, 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So I would recommend first finding the first/last point= value for each CUSIP.

Then you can use those in your calculations of the range of observations to replicated.

You probably also want to keep something to indicate which particular "trade day" you results are for.

First let's adjust your data to have more than one CUSIP value.  I also shortened you "TRADE DAY" column header into use DAY for the variable name.

data have;
  infile cards truncover;
  input DATE :mmddyy. CUSIP $ Day ;
  format date date9.;
cards;
9/21/1990 00002130  
9/24/1990 00002130  
9/25/1990 00002130  
9/26/1990 00002130 1
9/27/1990 00002130  
9/28/1990 00002130 1
10/1/1990 00002130  
10/2/1990 00002130  
10/3/1990 00002130  
10/4/1990 00002130  
10/5/1990 00002130  
10/8/1990 00002130  
10/9/1990 00002130  
10/10/1990 00002130  
10/11/1990 00002130  
10/12/1990 00002130  
10/15/1990 00002130  
10/16/1990 00002130  
10/17/1990 00002130  
10/18/1990 00002130 1
10/19/1990 00002130  
10/22/1990 00002130 1
10/23/1990 00002130 1
10/24/1990 AAAAAAAA
10/25/1990 AAAAAAAA  
10/26/1990 AAAAAAAA 1
10/29/1990 AAAAAAAA  
10/30/1990 AAAAAAAA 1
10/31/1990 AAAAAAAA 1
;

Now we can use a double DOW loop.  The first will find the start and stop record number (point= values) for each CUSIP.  Then the second will look for the DAY=1 records and then output multiple observations relative to that date.  I used a range of 5 observations from -2 to +2 just for illustration.


data want;
* find first/last obs for CUSIP ;
  start=stop+1;
  retain stop 0;
do stop=stop+1 by 1 until(last.cusip);
  set have end=eof nobs=nobs;
  by cusip;
end;
put (_n_ nobs cusip start stop ) (=);
* Look for trading days ;
tradeno=0;
do obs=start to stop;
  set have;
  if day=1 then do;
    tradeno+1;
    tradedate=date;
    format tradedate date9.;
    do p=max(obs-2,start) to min(obs+2,stop);
      offset=p-obs;
      set have point=p;
      output;
    end;
  end;
end;
run; 

Results:

Obs    start    stop         DATE     CUSIP      Day    tradeno    obs    tradedate    offset

  1       1      23     24SEP1990    00002130     .        1         4    26SEP1990      -2
  2       1      23     25SEP1990    00002130     .        1         4    26SEP1990      -1
  3       1      23     26SEP1990    00002130     1        1         4    26SEP1990       0
  4       1      23     27SEP1990    00002130     .        1         4    26SEP1990       1
  5       1      23     28SEP1990    00002130     1        1         4    26SEP1990       2
  6       1      23     26SEP1990    00002130     1        2         6    28SEP1990      -2
  7       1      23     27SEP1990    00002130     .        2         6    28SEP1990      -1
  8       1      23     28SEP1990    00002130     1        2         6    28SEP1990       0
  9       1      23     01OCT1990    00002130     .        2         6    28SEP1990       1
 10       1      23     02OCT1990    00002130     .        2         6    28SEP1990       2
 11       1      23     16OCT1990    00002130     .        3        20    18OCT1990      -2
 12       1      23     17OCT1990    00002130     .        3        20    18OCT1990      -1
 13       1      23     18OCT1990    00002130     1        3        20    18OCT1990       0
 14       1      23     19OCT1990    00002130     .        3        20    18OCT1990       1
 15       1      23     22OCT1990    00002130     1        3        20    18OCT1990       2
 16       1      23     18OCT1990    00002130     1        4        22    22OCT1990      -2
 17       1      23     19OCT1990    00002130     .        4        22    22OCT1990      -1
 18       1      23     22OCT1990    00002130     1        4        22    22OCT1990       0
 19       1      23     23OCT1990    00002130     1        4        22    22OCT1990       1
 20       1      23     19OCT1990    00002130     .        5        23    23OCT1990      -2
 21       1      23     22OCT1990    00002130     1        5        23    23OCT1990      -1
 22       1      23     23OCT1990    00002130     1        5        23    23OCT1990       0
 23      24      29     24OCT1990    AAAAAAAA     .        1        26    26OCT1990      -2
 24      24      29     25OCT1990    AAAAAAAA     .        1        26    26OCT1990      -1
 25      24      29     26OCT1990    AAAAAAAA     1        1        26    26OCT1990       0
 26      24      29     29OCT1990    AAAAAAAA     .        1        26    26OCT1990       1
 27      24      29     30OCT1990    AAAAAAAA     1        1        26    26OCT1990       2
 28      24      29     26OCT1990    AAAAAAAA     1        2        28    30OCT1990      -2
 29      24      29     29OCT1990    AAAAAAAA     .        2        28    30OCT1990      -1
 30      24      29     30OCT1990    AAAAAAAA     1        2        28    30OCT1990       0
 31      24      29     31OCT1990    AAAAAAAA     1        2        28    30OCT1990       1
 32      24      29     29OCT1990    AAAAAAAA     .        3        29    31OCT1990      -2
 33      24      29     30OCT1990    AAAAAAAA     1        3        29    31OCT1990      -1
 34      24      29     31OCT1990    AAAAAAAA     1        3        29    31OCT1990       0

 

View solution in original post

8 REPLIES 8
ballardw
Super User

You have not provided any example of Us_stock_1, no example data step code for the one you do show (here's one)

data us_stock;
   infile datalines missover;
   input DATE:mmddyy10. 	CUSIP $ 	TradingDay;
format date mmddyy10.; datalines; 9/21/1990 00002130 9/24/1990 00002130 9/25/1990 00002130 9/26/1990 00002130 1 9/27/1990 00002130 9/28/1990 00002130 1 10/1/1990 00002130 10/2/1990 00002130 10/3/1990 00002130 10/4/1990 00002130 10/5/1990 00002130 10/8/1990 00002130 10/9/1990 00002130 10/10/1990 00002130 10/11/1990 00002130 10/12/1990 00002130 10/15/1990 00002130 10/16/1990 00002130 10/17/1990 00002130 10/18/1990 00002130 1 10/19/1990 00002130 10/22/1990 00002130 1 10/23/1990 00002130 1 10/24/1990 00002130 10/25/1990 00002130 10/26/1990 00002130 1 10/29/1990 00002130 10/30/1990 00002130 1 10/31/1990 00002130 1 ;

so without the Us_stock_1 there isn't really any way to test this code.

 

I would expect some problem if you expect sequential records (from the _n_) to match dates of trading activity as I understand there are many days without trading. The way you show your TradingDay variable it appears that you are asking for many overlaps and that might be an issue. I also don't see any actual attempt to use the BY Cuspid (has to be a dental joke in there somewhere). So perhaps that is an issue with resetting P at some point but I haven't a clue what as I'm not sure I understand this process to begin with.

 

 

KS99
Obsidian | Level 7

Thank you, Ballardw, 

It was my mistake. 

Us_stock_1 is actually Us_stock. 

I revised it. 

 

 

ballardw
Super User

Maybe something like this? Which identifies the "groups" and creates separate output groups for each including overlaps. Start data set is small and has no output data for some of the groups because no dates exist for given cusip.

data have;
  infile cards truncover;
  input DATE :mmddyy. CUSIP $ Day ;
  format date date9.;
cards;
9/21/1990 00002130  
9/24/1990 00002130  
9/25/1990 00002130  
9/26/1990 00002130 1
9/27/1990 00002130  
9/28/1990 00002130 1
10/1/1990 00002130  
10/2/1990 00002130  
10/3/1990 00002130  
10/4/1990 00002130  
10/5/1990 00002130  
10/8/1990 00002130  
10/9/1990 00002130  
10/10/1990 00002130  
10/11/1990 00002130  
10/12/1990 00002130  
10/15/1990 00002130  
10/16/1990 00002130  
10/17/1990 00002130  
10/18/1990 00002130 1
10/19/1990 00002130  
10/22/1990 00002130 1
10/23/1990 00002130 1
10/24/1990 AAAAAAAA
10/25/1990 AAAAAAAA  
10/26/1990 AAAAAAAA 1
10/29/1990 AAAAAAAA  
10/30/1990 AAAAAAAA 1
10/31/1990 AAAAAAAA 1
;

data helper;
   set have;
   by cusip;
   if first.cusip then call missing(group);
   if day then do;
      startdate = date+2;
      enddate   = date+22;
      group+1;
      output;
   end;
   format startdate enddate date9.;
run;


proc sql;
   create table want as 
   select b.date, b.cusip, a.group
   from helper as a
        left join
        have as b
        on a.cusip=b.cusip
  where a.startdate le b.date le a.enddate
  order by b.cusip,a.group,b.date
  ;
quit;
KS99
Obsidian | Level 7
Oh, hi, Ballardw,

Thank you for your codes. I will keep them and use them for a future research.
I wish you a very good week!

KS -,
Tom
Super User Tom
Super User

So you want to convert 1 observation into 21 observations?

You do realize that means some of the observations will be output more than once. Unless that is a large gap between records with TRADING_DAY=1.

 

What is the variable that will indicate which block/period/group those 21 observations belong to?  You don't seem to have created one and you cannot use any of the existing variables since they will get overwritten when you run the SET POINT= statement in the loop.

 

What are you going to do with these groups of 21 observations once you have them?

KS99
Obsidian | Level 7
Hi, Tom,
Sorry for my late reply.
Once I get 21 observations, I was going to calculate the means of them for each Cusip-Cluster.

KS -,
Tom
Super User Tom
Super User

So I would recommend first finding the first/last point= value for each CUSIP.

Then you can use those in your calculations of the range of observations to replicated.

You probably also want to keep something to indicate which particular "trade day" you results are for.

First let's adjust your data to have more than one CUSIP value.  I also shortened you "TRADE DAY" column header into use DAY for the variable name.

data have;
  infile cards truncover;
  input DATE :mmddyy. CUSIP $ Day ;
  format date date9.;
cards;
9/21/1990 00002130  
9/24/1990 00002130  
9/25/1990 00002130  
9/26/1990 00002130 1
9/27/1990 00002130  
9/28/1990 00002130 1
10/1/1990 00002130  
10/2/1990 00002130  
10/3/1990 00002130  
10/4/1990 00002130  
10/5/1990 00002130  
10/8/1990 00002130  
10/9/1990 00002130  
10/10/1990 00002130  
10/11/1990 00002130  
10/12/1990 00002130  
10/15/1990 00002130  
10/16/1990 00002130  
10/17/1990 00002130  
10/18/1990 00002130 1
10/19/1990 00002130  
10/22/1990 00002130 1
10/23/1990 00002130 1
10/24/1990 AAAAAAAA
10/25/1990 AAAAAAAA  
10/26/1990 AAAAAAAA 1
10/29/1990 AAAAAAAA  
10/30/1990 AAAAAAAA 1
10/31/1990 AAAAAAAA 1
;

Now we can use a double DOW loop.  The first will find the start and stop record number (point= values) for each CUSIP.  Then the second will look for the DAY=1 records and then output multiple observations relative to that date.  I used a range of 5 observations from -2 to +2 just for illustration.


data want;
* find first/last obs for CUSIP ;
  start=stop+1;
  retain stop 0;
do stop=stop+1 by 1 until(last.cusip);
  set have end=eof nobs=nobs;
  by cusip;
end;
put (_n_ nobs cusip start stop ) (=);
* Look for trading days ;
tradeno=0;
do obs=start to stop;
  set have;
  if day=1 then do;
    tradeno+1;
    tradedate=date;
    format tradedate date9.;
    do p=max(obs-2,start) to min(obs+2,stop);
      offset=p-obs;
      set have point=p;
      output;
    end;
  end;
end;
run; 

Results:

Obs    start    stop         DATE     CUSIP      Day    tradeno    obs    tradedate    offset

  1       1      23     24SEP1990    00002130     .        1         4    26SEP1990      -2
  2       1      23     25SEP1990    00002130     .        1         4    26SEP1990      -1
  3       1      23     26SEP1990    00002130     1        1         4    26SEP1990       0
  4       1      23     27SEP1990    00002130     .        1         4    26SEP1990       1
  5       1      23     28SEP1990    00002130     1        1         4    26SEP1990       2
  6       1      23     26SEP1990    00002130     1        2         6    28SEP1990      -2
  7       1      23     27SEP1990    00002130     .        2         6    28SEP1990      -1
  8       1      23     28SEP1990    00002130     1        2         6    28SEP1990       0
  9       1      23     01OCT1990    00002130     .        2         6    28SEP1990       1
 10       1      23     02OCT1990    00002130     .        2         6    28SEP1990       2
 11       1      23     16OCT1990    00002130     .        3        20    18OCT1990      -2
 12       1      23     17OCT1990    00002130     .        3        20    18OCT1990      -1
 13       1      23     18OCT1990    00002130     1        3        20    18OCT1990       0
 14       1      23     19OCT1990    00002130     .        3        20    18OCT1990       1
 15       1      23     22OCT1990    00002130     1        3        20    18OCT1990       2
 16       1      23     18OCT1990    00002130     1        4        22    22OCT1990      -2
 17       1      23     19OCT1990    00002130     .        4        22    22OCT1990      -1
 18       1      23     22OCT1990    00002130     1        4        22    22OCT1990       0
 19       1      23     23OCT1990    00002130     1        4        22    22OCT1990       1
 20       1      23     19OCT1990    00002130     .        5        23    23OCT1990      -2
 21       1      23     22OCT1990    00002130     1        5        23    23OCT1990      -1
 22       1      23     23OCT1990    00002130     1        5        23    23OCT1990       0
 23      24      29     24OCT1990    AAAAAAAA     .        1        26    26OCT1990      -2
 24      24      29     25OCT1990    AAAAAAAA     .        1        26    26OCT1990      -1
 25      24      29     26OCT1990    AAAAAAAA     1        1        26    26OCT1990       0
 26      24      29     29OCT1990    AAAAAAAA     .        1        26    26OCT1990       1
 27      24      29     30OCT1990    AAAAAAAA     1        1        26    26OCT1990       2
 28      24      29     26OCT1990    AAAAAAAA     1        2        28    30OCT1990      -2
 29      24      29     29OCT1990    AAAAAAAA     .        2        28    30OCT1990      -1
 30      24      29     30OCT1990    AAAAAAAA     1        2        28    30OCT1990       0
 31      24      29     31OCT1990    AAAAAAAA     1        2        28    30OCT1990       1
 32      24      29     29OCT1990    AAAAAAAA     .        3        29    31OCT1990      -2
 33      24      29     30OCT1990    AAAAAAAA     1        3        29    31OCT1990      -1
 34      24      29     31OCT1990    AAAAAAAA     1        3        29    31OCT1990       0

 

KS99
Obsidian | Level 7

Dear Tom, 

 

Furthermore: I applied your codes created above, and they worked more than perfectly! 

I applied them to my subsequent analysis also, and they work still more than perfectly and robustly! 

 

Thank you for helping me with this. 

 

I really appreciate your help-out, and 

I wish you a very good night! 

 

Sincerely, 

KS -, 

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 488 views
  • 0 likes
  • 3 in conversation