☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Obsidian | Level 7

## Choosing one-year prior 'same day' or 'end day' or their closest day if missing

Hi, greetings,

Thank you in advance, SAS community!

I have a large US daily stock data. If a company (code: cusip) had an event on a certain day (code: date), I marked that observation with a new variable t=1, and t=. otherwise.

Now, I want to pick the stock price (code: prc) on the same day as the marked event day, but of a prior year. I also want to pick the stock price on the end day of the event's prior year.

I know that I should use the intnx function. But I have a poor command of it and, whenever I used it, the function produced mechanically same days or end days which are sometimes non-existing in the previous year in my sample. Now, what I want to do is this:

If such a same (end) day is missing, and thus its stock price is missing, I want to pick the nearest earlier day and its stock price instead.

To pick the same day from the prior year, I used the following code:

``data usa_stock_a; set Usa_stock; by cusip; if t=1 then Year_ago = intnx ('year', date,-1,'s'); run; ``

So, in other words, my original dataset was:

 CUSIP DATE PRC t 00030710 2/20/2015 30.38999939 00030710 2/23/2015 32.09999847 00030710 2/24/2015 33.25999832 ......................... 00030710 2/22/2016 22.68000031 00030710 2/23/2016 23.23999977 1 00030710 2/24/2016 23.05999947 00030710 2/25/2016 22.72999954 00030710 2/26/2016 21.73999977 00030710 2/29/2016 21.4326678 …......................... 00030710 2/24/2017 7.579999924 00030710 2/27/2017 7.590000153 1 00030710 2/28/2017 9.170000076 00030710 3/1/2017 8.909999847 00030710 3/2/2017 8.909999847

I want to create the following:

 CUSIP DATE PRC t Year_ago_1 PRC_ago_1 00030710 2/20/2015 30.38999939 . 00030710 2/23/2015 32.09999847 . 00030710 2/24/2015 33.25999832 . ......................... 00030710 2/22/2016 22.68000031 . 00030710 2/23/2016 23.23999977 1 2/23/2015 32.09999847 00030710 2/24/2016 23.05999947 . 00030710 2/25/2016 22.72999954 . 00030710 2/26/2016 21.73999977 . 00030710 2/29/2016 21.4326678 . …......................... 00030710 2/24/2017 7.579999924 . 00030710 2/27/2017 7.590000153 1 2/26/2016 21.73999977 00030710 2/28/2017 9.170000076 . 00030710 3/1/2017 8.909999847 . 00030710 3/2/2017 8.909999847 .

If you teach me how to do this, I can do it for picking the end days by myself!

Many thanks!

KS -,

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Choosing one-year prior 'same day' or 'end day' or their closest day if missing

This should run reasonably quickly (assuming data is already sorted).

First note it is actually EASIER to post the code to create your dataset than it is to go to the trouble of trying to make a "pretty" table.

``````data have;
infile cards truncover;
input CUSIP \$ DATE :mmddyy.  PRC  t ;
format date yymmdd10.;
cards;
00030710  2/20/2015  30.38999939
00030710  2/23/2015  32.09999847
00030710  2/24/2015  33.25999832
00030710  2/22/2016  22.68000031
00030710  2/23/2016  23.23999977  1
00030710  2/24/2016  23.05999947
00030710  2/25/2016  22.72999954
00030710  2/26/2016  21.73999977
00030710  2/29/2016  21.4326678
00030710  2/24/2017  7.579999924
00030710  2/27/2017  7.590000153  1
00030710  2/28/2017  9.170000076
00030710  3/1/2017  8.909999847
00030710  3/2/2017  8.909999847
;``````

So make a copy of the EVENTS and calculate the year ago date.  Use that older date as the DATE value, move the actual date and actual PRC to new variable names so you don't create conflicts later.  This step could be a VIEW instead of physical copy.

``````data year_ago / view=year_ago ;
set have;
where t=1 ;
year_ago = intnx('year',date,-1,'s');
format year_ago yymmdd10.;
rename year_ago = date date=event_date prc=event_prc ;
run;
``````

Now INTERLEAVE this records with the original data.  Remember the values read from the last original observation.  Then when you hit one of these year ago dates output the result.  You probably should add a test to prevent getting a value from more than say 370 days ago.

``````data want ;
set have (drop=t in=in1) year_ago;
by cusip date ;
retain old_date old_prc ;
if first.cusip then call missing(of old_date old_prc);
if in1 then do;
old_date = date;
old_prc = prc ;
format old_date yymmdd10.;
end;
if t=1 then do;
days = event_date - old_date ;
if (days > 370) then call missing(of old_date old_prc);
output;
end;
drop date prc ;
run;``````

Results

```                                  event_
Obs     CUSIP      event_date      prc      t      old_date    old_prc    days

1     00030710    2016-02-23    23.2400    1    2015-02-23    32.1000     365
2     00030710    2017-02-27     7.5900    1    2016-02-26    21.7400     367

```
5 REPLIES 5
Diamond | Level 26

## Re: Choosing one-year prior 'same day' or 'end day' or their closest day if missing

This is UNTESTED code. If you want tested code, please provide data as SAS data step code (instructions).

This solution uses PROC SQL to merge the data with itself selecting the date one year earlier to merge with — and if that date doesn't exist, we look back a maximum of 5 days to find a date that does exist and then select the most recent date that does exist.

``````proc sql;
create table want as select
a.*
,case when a.t=1 then b.date else . end as date_last_year format=date9.
,case when a.t=1 then b.prc else . end as prc_last_year
from have as a left join have as b
on a.cusip=b.cusip and (intnx('year',a.date,-1,'s')-b.date) between 0 and 5
group by a.cusip,a.date
having b.date=max(b.date);
quit;``````

--
Paige Miller
Obsidian | Level 7

## Re: Choosing one-year prior 'same day' or 'end day' or their closest day if missing

Dear PaigeMiller,

Thank you for your code!

I tested your code, and it does work.

One thing about it is that it takes way too much time, since my dataset has a few tens of millions of observations.

I do not care about the code length. If you have time, can you write a less time-taking one for me?

Not only this time, but for many times in the past, I really appreciate your help, Miller!

Sincerely,

KS -

Diamond | Level 26

## Re: Choosing one-year prior 'same day' or 'end day' or their closest day if missing

I think you should start a new thread, explain the problem, and clearly state you want a relatively speedy solution for data sets that have tens of millions of observations.

--
Paige Miller
Super User

## Re: Choosing one-year prior 'same day' or 'end day' or their closest day if missing

This should run reasonably quickly (assuming data is already sorted).

First note it is actually EASIER to post the code to create your dataset than it is to go to the trouble of trying to make a "pretty" table.

``````data have;
infile cards truncover;
input CUSIP \$ DATE :mmddyy.  PRC  t ;
format date yymmdd10.;
cards;
00030710  2/20/2015  30.38999939
00030710  2/23/2015  32.09999847
00030710  2/24/2015  33.25999832
00030710  2/22/2016  22.68000031
00030710  2/23/2016  23.23999977  1
00030710  2/24/2016  23.05999947
00030710  2/25/2016  22.72999954
00030710  2/26/2016  21.73999977
00030710  2/29/2016  21.4326678
00030710  2/24/2017  7.579999924
00030710  2/27/2017  7.590000153  1
00030710  2/28/2017  9.170000076
00030710  3/1/2017  8.909999847
00030710  3/2/2017  8.909999847
;``````

So make a copy of the EVENTS and calculate the year ago date.  Use that older date as the DATE value, move the actual date and actual PRC to new variable names so you don't create conflicts later.  This step could be a VIEW instead of physical copy.

``````data year_ago / view=year_ago ;
set have;
where t=1 ;
year_ago = intnx('year',date,-1,'s');
format year_ago yymmdd10.;
rename year_ago = date date=event_date prc=event_prc ;
run;
``````

Now INTERLEAVE this records with the original data.  Remember the values read from the last original observation.  Then when you hit one of these year ago dates output the result.  You probably should add a test to prevent getting a value from more than say 370 days ago.

``````data want ;
set have (drop=t in=in1) year_ago;
by cusip date ;
retain old_date old_prc ;
if first.cusip then call missing(of old_date old_prc);
if in1 then do;
old_date = date;
old_prc = prc ;
format old_date yymmdd10.;
end;
if t=1 then do;
days = event_date - old_date ;
if (days > 370) then call missing(of old_date old_prc);
output;
end;
drop date prc ;
run;``````

Results

```                                  event_
Obs     CUSIP      event_date      prc      t      old_date    old_prc    days

1     00030710    2016-02-23    23.2400    1    2015-02-23    32.1000     365
2     00030710    2017-02-27     7.5900    1    2016-02-26    21.7400     367

```
Obsidian | Level 7

## Re: Choosing one-year prior 'same day' or 'end day' or their closest day if missing

Thank you, Tom,

Your code works way faster than I expected!

The logic you are using is the one I was thinking. But I had a problem in creating workable codes. Yours are a bit over my head, but I will try to learn them.

I wish you a nice weekend!

Sincerely,

KS -,

Discussion stats
• 5 replies
• 739 views
• 0 likes
• 3 in conversation