Hi, greetings,
Thank you in advance, SAS community!
I have a large US daily stock data. If a company (code: cusip) had an event on a certain day (code: date), I marked that observation with a new variable t=1, and t=. otherwise.
Now, I want to pick the stock price (code: prc) on the same day as the marked event day, but of a prior year. I also want to pick the stock price on the end day of the event's prior year.
I know that I should use the intnx function. But I have a poor command of it and, whenever I used it, the function produced mechanically same days or end days which are sometimes non-existing in the previous year in my sample. Now, what I want to do is this:
If such a same (end) day is missing, and thus its stock price is missing, I want to pick the nearest earlier day and its stock price instead.
To pick the same day from the prior year, I used the following code:
data usa_stock_a; set Usa_stock; by cusip; if t=1 then Year_ago = intnx ('year', date,-1,'s'); run;
So, in other words, my original dataset was:
CUSIP | DATE | PRC | t |
00030710 | 2/20/2015 | 30.38999939 | |
00030710 | 2/23/2015 | 32.09999847 | |
00030710 | 2/24/2015 | 33.25999832 | |
......................... | |||
00030710 | 2/22/2016 | 22.68000031 | |
00030710 | 2/23/2016 | 23.23999977 | 1 |
00030710 | 2/24/2016 | 23.05999947 | |
00030710 | 2/25/2016 | 22.72999954 | |
00030710 | 2/26/2016 | 21.73999977 | |
00030710 | 2/29/2016 | 21.4326678 | |
…......................... | |||
00030710 | 2/24/2017 | 7.579999924 | |
00030710 | 2/27/2017 | 7.590000153 | 1 |
00030710 | 2/28/2017 | 9.170000076 | |
00030710 | 3/1/2017 | 8.909999847 | |
00030710 | 3/2/2017 | 8.909999847 |
I want to create the following:
CUSIP | DATE | PRC | t | Year_ago_1 | PRC_ago_1 |
00030710 | 2/20/2015 | 30.38999939 | . | ||
00030710 | 2/23/2015 | 32.09999847 | . | ||
00030710 | 2/24/2015 | 33.25999832 | . | ||
......................... | |||||
00030710 | 2/22/2016 | 22.68000031 | . | ||
00030710 | 2/23/2016 | 23.23999977 | 1 | 2/23/2015 | 32.09999847 |
00030710 | 2/24/2016 | 23.05999947 | . | ||
00030710 | 2/25/2016 | 22.72999954 | . | ||
00030710 | 2/26/2016 | 21.73999977 | . | ||
00030710 | 2/29/2016 | 21.4326678 | . | ||
…......................... | |||||
00030710 | 2/24/2017 | 7.579999924 | . | ||
00030710 | 2/27/2017 | 7.590000153 | 1 | 2/26/2016 | 21.73999977 |
00030710 | 2/28/2017 | 9.170000076 | . | ||
00030710 | 3/1/2017 | 8.909999847 | . | ||
00030710 | 3/2/2017 | 8.909999847 | . |
If you teach me how to do this, I can do it for picking the end days by myself!
Many thanks!
KS -,
This should run reasonably quickly (assuming data is already sorted).
First note it is actually EASIER to post the code to create your dataset than it is to go to the trouble of trying to make a "pretty" table.
data have;
infile cards truncover;
input CUSIP $ DATE :mmddyy. PRC t ;
format date yymmdd10.;
cards;
00030710 2/20/2015 30.38999939
00030710 2/23/2015 32.09999847
00030710 2/24/2015 33.25999832
00030710 2/22/2016 22.68000031
00030710 2/23/2016 23.23999977 1
00030710 2/24/2016 23.05999947
00030710 2/25/2016 22.72999954
00030710 2/26/2016 21.73999977
00030710 2/29/2016 21.4326678
00030710 2/24/2017 7.579999924
00030710 2/27/2017 7.590000153 1
00030710 2/28/2017 9.170000076
00030710 3/1/2017 8.909999847
00030710 3/2/2017 8.909999847
;
So make a copy of the EVENTS and calculate the year ago date. Use that older date as the DATE value, move the actual date and actual PRC to new variable names so you don't create conflicts later. This step could be a VIEW instead of physical copy.
data year_ago / view=year_ago ;
set have;
where t=1 ;
year_ago = intnx('year',date,-1,'s');
format year_ago yymmdd10.;
rename year_ago = date date=event_date prc=event_prc ;
run;
Now INTERLEAVE this records with the original data. Remember the values read from the last original observation. Then when you hit one of these year ago dates output the result. You probably should add a test to prevent getting a value from more than say 370 days ago.
data want ;
set have (drop=t in=in1) year_ago;
by cusip date ;
retain old_date old_prc ;
if first.cusip then call missing(of old_date old_prc);
if in1 then do;
old_date = date;
old_prc = prc ;
format old_date yymmdd10.;
end;
if t=1 then do;
days = event_date - old_date ;
if (days > 370) then call missing(of old_date old_prc);
output;
end;
drop date prc ;
run;
Results
event_ Obs CUSIP event_date prc t old_date old_prc days 1 00030710 2016-02-23 23.2400 1 2015-02-23 32.1000 365 2 00030710 2017-02-27 7.5900 1 2016-02-26 21.7400 367
This is UNTESTED code. If you want tested code, please provide data as SAS data step code (instructions).
This solution uses PROC SQL to merge the data with itself selecting the date one year earlier to merge with — and if that date doesn't exist, we look back a maximum of 5 days to find a date that does exist and then select the most recent date that does exist.
proc sql;
create table want as select
a.*
,case when a.t=1 then b.date else . end as date_last_year format=date9.
,case when a.t=1 then b.prc else . end as prc_last_year
from have as a left join have as b
on a.cusip=b.cusip and (intnx('year',a.date,-1,'s')-b.date) between 0 and 5
group by a.cusip,a.date
having b.date=max(b.date);
quit;
Dear PaigeMiller,
Thank you for your code!
I tested your code, and it does work.
One thing about it is that it takes way too much time, since my dataset has a few tens of millions of observations.
I do not care about the code length. If you have time, can you write a less time-taking one for me?
Not only this time, but for many times in the past, I really appreciate your help, Miller!
Sincerely,
KS -
I think you should start a new thread, explain the problem, and clearly state you want a relatively speedy solution for data sets that have tens of millions of observations.
This should run reasonably quickly (assuming data is already sorted).
First note it is actually EASIER to post the code to create your dataset than it is to go to the trouble of trying to make a "pretty" table.
data have;
infile cards truncover;
input CUSIP $ DATE :mmddyy. PRC t ;
format date yymmdd10.;
cards;
00030710 2/20/2015 30.38999939
00030710 2/23/2015 32.09999847
00030710 2/24/2015 33.25999832
00030710 2/22/2016 22.68000031
00030710 2/23/2016 23.23999977 1
00030710 2/24/2016 23.05999947
00030710 2/25/2016 22.72999954
00030710 2/26/2016 21.73999977
00030710 2/29/2016 21.4326678
00030710 2/24/2017 7.579999924
00030710 2/27/2017 7.590000153 1
00030710 2/28/2017 9.170000076
00030710 3/1/2017 8.909999847
00030710 3/2/2017 8.909999847
;
So make a copy of the EVENTS and calculate the year ago date. Use that older date as the DATE value, move the actual date and actual PRC to new variable names so you don't create conflicts later. This step could be a VIEW instead of physical copy.
data year_ago / view=year_ago ;
set have;
where t=1 ;
year_ago = intnx('year',date,-1,'s');
format year_ago yymmdd10.;
rename year_ago = date date=event_date prc=event_prc ;
run;
Now INTERLEAVE this records with the original data. Remember the values read from the last original observation. Then when you hit one of these year ago dates output the result. You probably should add a test to prevent getting a value from more than say 370 days ago.
data want ;
set have (drop=t in=in1) year_ago;
by cusip date ;
retain old_date old_prc ;
if first.cusip then call missing(of old_date old_prc);
if in1 then do;
old_date = date;
old_prc = prc ;
format old_date yymmdd10.;
end;
if t=1 then do;
days = event_date - old_date ;
if (days > 370) then call missing(of old_date old_prc);
output;
end;
drop date prc ;
run;
Results
event_ Obs CUSIP event_date prc t old_date old_prc days 1 00030710 2016-02-23 23.2400 1 2015-02-23 32.1000 365 2 00030710 2017-02-27 7.5900 1 2016-02-26 21.7400 367
Thank you, Tom,
Your code works way faster than I expected!
The logic you are using is the one I was thinking. But I had a problem in creating workable codes. Yours are a bit over my head, but I will try to learn them.
I wish you a nice weekend!
Sincerely,
KS -,
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.