Hi, SAS Community,
I am sorry if I am bothering you guys with many questions today.
But your help is much more effective than my own creation of hours and hours.
I am dealing with a large dataset of US daily stock returns. A variable "Trading Day" indicates specific days in which firm insiders traded their stocks.
DATE | CUSIP | Trading Day |
9/21/1990 | 00002130 | |
9/24/1990 | 00002130 | |
9/25/1990 | 00002130 | |
9/26/1990 | 00002130 | 1 |
9/27/1990 | 00002130 | |
9/28/1990 | 00002130 | 1 |
10/1/1990 | 00002130 | |
10/2/1990 | 00002130 | |
10/3/1990 | 00002130 | |
10/4/1990 | 00002130 | |
10/5/1990 | 00002130 | |
10/8/1990 | 00002130 | |
10/9/1990 | 00002130 | |
10/10/1990 | 00002130 | |
10/11/1990 | 00002130 | |
10/12/1990 | 00002130 | |
10/15/1990 | 00002130 | |
10/16/1990 | 00002130 | |
10/17/1990 | 00002130 | |
10/18/1990 | 00002130 | 1 |
10/19/1990 | 00002130 | |
10/22/1990 | 00002130 | 1 |
10/23/1990 | 00002130 | 1 |
10/24/1990 | 00002130 | |
10/25/1990 | 00002130 | |
10/26/1990 | 00002130 | 1 |
10/29/1990 | 00002130 | |
10/30/1990 | 00002130 | 1 |
10/31/1990 | 00002130 | 1 |
…...... |
Now, I want to extract days +2 to +22 from the specific date Trading Day=1 (when insiders traded). For this, I used the following codes:
data Temp;
set Us_stock;
BY CUSIP;
if TradingDay>. then do p=max(_n_+2,1,p+1) to min(_n_+22,nobs);
set Us_stock point=p nobs=nobs;
output;
end;
run;
But what I obtained is the following:
DATE | CUSIP | Cluster |
9/28/1990 | 00002130 | 1 |
10/1/1990 | 00002130 | 1 |
10/2/1990 | 00002130 | 1 |
10/3/1990 | 00002130 | 1 |
10/4/1990 | 00002130 | 1 |
10/5/1990 | 00002130 | 1 |
10/8/1990 | 00002130 | 1 |
10/9/1990 | 00002130 | 1 |
10/10/1990 | 00002130 | 1 |
10/11/1990 | 00002130 | 1 |
10/12/1990 | 00002130 | 1 |
10/15/1990 | 00002130 | 1 |
10/16/1990 | 00002130 | 1 |
10/17/1990 | 00002130 | 1 |
10/18/1990 | 00002130 | 1 |
10/19/1990 | 00002130 | 1 |
10/22/1990 | 00002130 | 1 |
10/23/1990 | 00002130 | 1 |
10/24/1990 | 00002130 | 1 |
10/25/1990 | 00002130 | 1 |
10/26/1990 | 00002130 | 1 |
10/30/1990 | 00002130 | 2 |
11/1/1990 | 00002130 | 2 |
11/2/1990 | 00002130 | 2 |
11/5/1990 | 00002130 | 2 |
11/6/1990 | 00002130 | 2 |
11/7/1990 | 00002130 | 2 |
11/8/1990 | 00002130 | 2 |
11/9/1990 | 00002130 | 2 |
11/12/1990 | 00002130 | 2 |
…...... |
Never mind the variable "Cluster." I created it to do proc means by group.
For the trading day 09/26/1990 (Cusip: 00002130), I have successfully extracted the days from 09/28/1990 to 10/26/1990. And, for the next trade on 09/28/1990, I am supposed to have 10/02/1990 to 10/30/1990. However, my codes is giving me something else, from 10/30/1990 ~. How should I get out of this? Should I abandon "p=max() to min()"?
Many thanks in advance!!
Sincerely,
KS -,
So I would recommend first finding the first/last point= value for each CUSIP.
Then you can use those in your calculations of the range of observations to replicated.
You probably also want to keep something to indicate which particular "trade day" you results are for.
First let's adjust your data to have more than one CUSIP value. I also shortened you "TRADE DAY" column header into use DAY for the variable name.
data have;
infile cards truncover;
input DATE :mmddyy. CUSIP $ Day ;
format date date9.;
cards;
9/21/1990 00002130
9/24/1990 00002130
9/25/1990 00002130
9/26/1990 00002130 1
9/27/1990 00002130
9/28/1990 00002130 1
10/1/1990 00002130
10/2/1990 00002130
10/3/1990 00002130
10/4/1990 00002130
10/5/1990 00002130
10/8/1990 00002130
10/9/1990 00002130
10/10/1990 00002130
10/11/1990 00002130
10/12/1990 00002130
10/15/1990 00002130
10/16/1990 00002130
10/17/1990 00002130
10/18/1990 00002130 1
10/19/1990 00002130
10/22/1990 00002130 1
10/23/1990 00002130 1
10/24/1990 AAAAAAAA
10/25/1990 AAAAAAAA
10/26/1990 AAAAAAAA 1
10/29/1990 AAAAAAAA
10/30/1990 AAAAAAAA 1
10/31/1990 AAAAAAAA 1
;
Now we can use a double DOW loop. The first will find the start and stop record number (point= values) for each CUSIP. Then the second will look for the DAY=1 records and then output multiple observations relative to that date. I used a range of 5 observations from -2 to +2 just for illustration.
data want;
* find first/last obs for CUSIP ;
start=stop+1;
retain stop 0;
do stop=stop+1 by 1 until(last.cusip);
set have end=eof nobs=nobs;
by cusip;
end;
put (_n_ nobs cusip start stop ) (=);
* Look for trading days ;
tradeno=0;
do obs=start to stop;
set have;
if day=1 then do;
tradeno+1;
tradedate=date;
format tradedate date9.;
do p=max(obs-2,start) to min(obs+2,stop);
offset=p-obs;
set have point=p;
output;
end;
end;
end;
run;
Results:
Obs start stop DATE CUSIP Day tradeno obs tradedate offset 1 1 23 24SEP1990 00002130 . 1 4 26SEP1990 -2 2 1 23 25SEP1990 00002130 . 1 4 26SEP1990 -1 3 1 23 26SEP1990 00002130 1 1 4 26SEP1990 0 4 1 23 27SEP1990 00002130 . 1 4 26SEP1990 1 5 1 23 28SEP1990 00002130 1 1 4 26SEP1990 2 6 1 23 26SEP1990 00002130 1 2 6 28SEP1990 -2 7 1 23 27SEP1990 00002130 . 2 6 28SEP1990 -1 8 1 23 28SEP1990 00002130 1 2 6 28SEP1990 0 9 1 23 01OCT1990 00002130 . 2 6 28SEP1990 1 10 1 23 02OCT1990 00002130 . 2 6 28SEP1990 2 11 1 23 16OCT1990 00002130 . 3 20 18OCT1990 -2 12 1 23 17OCT1990 00002130 . 3 20 18OCT1990 -1 13 1 23 18OCT1990 00002130 1 3 20 18OCT1990 0 14 1 23 19OCT1990 00002130 . 3 20 18OCT1990 1 15 1 23 22OCT1990 00002130 1 3 20 18OCT1990 2 16 1 23 18OCT1990 00002130 1 4 22 22OCT1990 -2 17 1 23 19OCT1990 00002130 . 4 22 22OCT1990 -1 18 1 23 22OCT1990 00002130 1 4 22 22OCT1990 0 19 1 23 23OCT1990 00002130 1 4 22 22OCT1990 1 20 1 23 19OCT1990 00002130 . 5 23 23OCT1990 -2 21 1 23 22OCT1990 00002130 1 5 23 23OCT1990 -1 22 1 23 23OCT1990 00002130 1 5 23 23OCT1990 0 23 24 29 24OCT1990 AAAAAAAA . 1 26 26OCT1990 -2 24 24 29 25OCT1990 AAAAAAAA . 1 26 26OCT1990 -1 25 24 29 26OCT1990 AAAAAAAA 1 1 26 26OCT1990 0 26 24 29 29OCT1990 AAAAAAAA . 1 26 26OCT1990 1 27 24 29 30OCT1990 AAAAAAAA 1 1 26 26OCT1990 2 28 24 29 26OCT1990 AAAAAAAA 1 2 28 30OCT1990 -2 29 24 29 29OCT1990 AAAAAAAA . 2 28 30OCT1990 -1 30 24 29 30OCT1990 AAAAAAAA 1 2 28 30OCT1990 0 31 24 29 31OCT1990 AAAAAAAA 1 2 28 30OCT1990 1 32 24 29 29OCT1990 AAAAAAAA . 3 29 31OCT1990 -2 33 24 29 30OCT1990 AAAAAAAA 1 3 29 31OCT1990 -1 34 24 29 31OCT1990 AAAAAAAA 1 3 29 31OCT1990 0
You have not provided any example of Us_stock_1, no example data step code for the one you do show (here's one)
data us_stock; infile datalines missover; input DATE:mmddyy10. CUSIP $ TradingDay;
format date mmddyy10.; datalines; 9/21/1990 00002130 9/24/1990 00002130 9/25/1990 00002130 9/26/1990 00002130 1 9/27/1990 00002130 9/28/1990 00002130 1 10/1/1990 00002130 10/2/1990 00002130 10/3/1990 00002130 10/4/1990 00002130 10/5/1990 00002130 10/8/1990 00002130 10/9/1990 00002130 10/10/1990 00002130 10/11/1990 00002130 10/12/1990 00002130 10/15/1990 00002130 10/16/1990 00002130 10/17/1990 00002130 10/18/1990 00002130 1 10/19/1990 00002130 10/22/1990 00002130 1 10/23/1990 00002130 1 10/24/1990 00002130 10/25/1990 00002130 10/26/1990 00002130 1 10/29/1990 00002130 10/30/1990 00002130 1 10/31/1990 00002130 1 ;
so without the Us_stock_1 there isn't really any way to test this code.
I would expect some problem if you expect sequential records (from the _n_) to match dates of trading activity as I understand there are many days without trading. The way you show your TradingDay variable it appears that you are asking for many overlaps and that might be an issue. I also don't see any actual attempt to use the BY Cuspid (has to be a dental joke in there somewhere). So perhaps that is an issue with resetting P at some point but I haven't a clue what as I'm not sure I understand this process to begin with.
Thank you, Ballardw,
It was my mistake.
Us_stock_1 is actually Us_stock.
I revised it.
Maybe something like this? Which identifies the "groups" and creates separate output groups for each including overlaps. Start data set is small and has no output data for some of the groups because no dates exist for given cusip.
data have; infile cards truncover; input DATE :mmddyy. CUSIP $ Day ; format date date9.; cards; 9/21/1990 00002130 9/24/1990 00002130 9/25/1990 00002130 9/26/1990 00002130 1 9/27/1990 00002130 9/28/1990 00002130 1 10/1/1990 00002130 10/2/1990 00002130 10/3/1990 00002130 10/4/1990 00002130 10/5/1990 00002130 10/8/1990 00002130 10/9/1990 00002130 10/10/1990 00002130 10/11/1990 00002130 10/12/1990 00002130 10/15/1990 00002130 10/16/1990 00002130 10/17/1990 00002130 10/18/1990 00002130 1 10/19/1990 00002130 10/22/1990 00002130 1 10/23/1990 00002130 1 10/24/1990 AAAAAAAA 10/25/1990 AAAAAAAA 10/26/1990 AAAAAAAA 1 10/29/1990 AAAAAAAA 10/30/1990 AAAAAAAA 1 10/31/1990 AAAAAAAA 1 ; data helper; set have; by cusip; if first.cusip then call missing(group); if day then do; startdate = date+2; enddate = date+22; group+1; output; end; format startdate enddate date9.; run; proc sql; create table want as select b.date, b.cusip, a.group from helper as a left join have as b on a.cusip=b.cusip where a.startdate le b.date le a.enddate order by b.cusip,a.group,b.date ; quit;
So you want to convert 1 observation into 21 observations?
You do realize that means some of the observations will be output more than once. Unless that is a large gap between records with TRADING_DAY=1.
What is the variable that will indicate which block/period/group those 21 observations belong to? You don't seem to have created one and you cannot use any of the existing variables since they will get overwritten when you run the SET POINT= statement in the loop.
What are you going to do with these groups of 21 observations once you have them?
So I would recommend first finding the first/last point= value for each CUSIP.
Then you can use those in your calculations of the range of observations to replicated.
You probably also want to keep something to indicate which particular "trade day" you results are for.
First let's adjust your data to have more than one CUSIP value. I also shortened you "TRADE DAY" column header into use DAY for the variable name.
data have;
infile cards truncover;
input DATE :mmddyy. CUSIP $ Day ;
format date date9.;
cards;
9/21/1990 00002130
9/24/1990 00002130
9/25/1990 00002130
9/26/1990 00002130 1
9/27/1990 00002130
9/28/1990 00002130 1
10/1/1990 00002130
10/2/1990 00002130
10/3/1990 00002130
10/4/1990 00002130
10/5/1990 00002130
10/8/1990 00002130
10/9/1990 00002130
10/10/1990 00002130
10/11/1990 00002130
10/12/1990 00002130
10/15/1990 00002130
10/16/1990 00002130
10/17/1990 00002130
10/18/1990 00002130 1
10/19/1990 00002130
10/22/1990 00002130 1
10/23/1990 00002130 1
10/24/1990 AAAAAAAA
10/25/1990 AAAAAAAA
10/26/1990 AAAAAAAA 1
10/29/1990 AAAAAAAA
10/30/1990 AAAAAAAA 1
10/31/1990 AAAAAAAA 1
;
Now we can use a double DOW loop. The first will find the start and stop record number (point= values) for each CUSIP. Then the second will look for the DAY=1 records and then output multiple observations relative to that date. I used a range of 5 observations from -2 to +2 just for illustration.
data want;
* find first/last obs for CUSIP ;
start=stop+1;
retain stop 0;
do stop=stop+1 by 1 until(last.cusip);
set have end=eof nobs=nobs;
by cusip;
end;
put (_n_ nobs cusip start stop ) (=);
* Look for trading days ;
tradeno=0;
do obs=start to stop;
set have;
if day=1 then do;
tradeno+1;
tradedate=date;
format tradedate date9.;
do p=max(obs-2,start) to min(obs+2,stop);
offset=p-obs;
set have point=p;
output;
end;
end;
end;
run;
Results:
Obs start stop DATE CUSIP Day tradeno obs tradedate offset 1 1 23 24SEP1990 00002130 . 1 4 26SEP1990 -2 2 1 23 25SEP1990 00002130 . 1 4 26SEP1990 -1 3 1 23 26SEP1990 00002130 1 1 4 26SEP1990 0 4 1 23 27SEP1990 00002130 . 1 4 26SEP1990 1 5 1 23 28SEP1990 00002130 1 1 4 26SEP1990 2 6 1 23 26SEP1990 00002130 1 2 6 28SEP1990 -2 7 1 23 27SEP1990 00002130 . 2 6 28SEP1990 -1 8 1 23 28SEP1990 00002130 1 2 6 28SEP1990 0 9 1 23 01OCT1990 00002130 . 2 6 28SEP1990 1 10 1 23 02OCT1990 00002130 . 2 6 28SEP1990 2 11 1 23 16OCT1990 00002130 . 3 20 18OCT1990 -2 12 1 23 17OCT1990 00002130 . 3 20 18OCT1990 -1 13 1 23 18OCT1990 00002130 1 3 20 18OCT1990 0 14 1 23 19OCT1990 00002130 . 3 20 18OCT1990 1 15 1 23 22OCT1990 00002130 1 3 20 18OCT1990 2 16 1 23 18OCT1990 00002130 1 4 22 22OCT1990 -2 17 1 23 19OCT1990 00002130 . 4 22 22OCT1990 -1 18 1 23 22OCT1990 00002130 1 4 22 22OCT1990 0 19 1 23 23OCT1990 00002130 1 4 22 22OCT1990 1 20 1 23 19OCT1990 00002130 . 5 23 23OCT1990 -2 21 1 23 22OCT1990 00002130 1 5 23 23OCT1990 -1 22 1 23 23OCT1990 00002130 1 5 23 23OCT1990 0 23 24 29 24OCT1990 AAAAAAAA . 1 26 26OCT1990 -2 24 24 29 25OCT1990 AAAAAAAA . 1 26 26OCT1990 -1 25 24 29 26OCT1990 AAAAAAAA 1 1 26 26OCT1990 0 26 24 29 29OCT1990 AAAAAAAA . 1 26 26OCT1990 1 27 24 29 30OCT1990 AAAAAAAA 1 1 26 26OCT1990 2 28 24 29 26OCT1990 AAAAAAAA 1 2 28 30OCT1990 -2 29 24 29 29OCT1990 AAAAAAAA . 2 28 30OCT1990 -1 30 24 29 30OCT1990 AAAAAAAA 1 2 28 30OCT1990 0 31 24 29 31OCT1990 AAAAAAAA 1 2 28 30OCT1990 1 32 24 29 29OCT1990 AAAAAAAA . 3 29 31OCT1990 -2 33 24 29 30OCT1990 AAAAAAAA 1 3 29 31OCT1990 -1 34 24 29 31OCT1990 AAAAAAAA 1 3 29 31OCT1990 0
Dear Tom,
Furthermore: I applied your codes created above, and they worked more than perfectly!
I applied them to my subsequent analysis also, and they work still more than perfectly and robustly!
Thank you for helping me with this.
I really appreciate your help-out, and
I wish you a very good night!
Sincerely,
KS -,
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.