BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
KS99
Obsidian | Level 7

Hi, greetings, 

Thank you in advance, SAS community! 

 

I have a large US daily stock data. If a company (code: cusip) had an event on a certain day (code: date), I marked that observation with a new variable t=1, and t=. otherwise. 

 

Now, I want to pick the stock price (code: prc) on the same day as the marked event day, but of a prior year. I also want to pick the stock price on the end day of the event's prior year. 

 

I know that I should use the intnx function. But I have a poor command of it and, whenever I used it, the function produced mechanically same days or end days which are sometimes non-existing in the previous year in my sample. Now, what I want to do is this: 

 

If such a same (end) day is missing, and thus its stock price is missing, I want to pick the nearest earlier day and its stock price instead. 

 

To pick the same day from the prior year, I used the following code: 

data usa_stock_a; set Usa_stock; by cusip; if t=1 then Year_ago = intnx ('year', date,-1,'s'); run; 

So, in other words, my original dataset was: 

 

CUSIP DATE PRC t
00030710 2/20/2015 30.38999939  
00030710 2/23/2015 32.09999847  
00030710 2/24/2015 33.25999832  
.........................
00030710 2/22/2016 22.68000031  
00030710 2/23/2016 23.23999977 1
00030710 2/24/2016 23.05999947  
00030710 2/25/2016 22.72999954  
00030710 2/26/2016 21.73999977  
00030710 2/29/2016 21.4326678  
….........................
00030710 2/24/2017 7.579999924  
00030710 2/27/2017 7.590000153 1
00030710 2/28/2017 9.170000076  
00030710 3/1/2017 8.909999847  
00030710 3/2/2017 8.909999847  

 

I want to create the following: 

 

CUSIP DATE PRC t Year_ago_1 PRC_ago_1
00030710 2/20/2015 30.38999939   .  
00030710 2/23/2015 32.09999847   .  
00030710 2/24/2015 33.25999832   .  
.........................
00030710 2/22/2016 22.68000031   .  
00030710 2/23/2016 23.23999977 1 2/23/2015 32.09999847
00030710 2/24/2016 23.05999947   .  
00030710 2/25/2016 22.72999954   .  
00030710 2/26/2016 21.73999977   .  
00030710 2/29/2016 21.4326678   .  
….........................
00030710 2/24/2017 7.579999924   .  
00030710 2/27/2017 7.590000153 1 2/26/2016 21.73999977
00030710 2/28/2017 9.170000076   .  
00030710 3/1/2017 8.909999847   .  
00030710 3/2/2017 8.909999847   .  

 

If you teach me how to do this, I can do it for picking the end days by myself!  

 

Many thanks! 

 

KS -, 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

This should run reasonably quickly (assuming data is already sorted).

First note it is actually EASIER to post the code to create your dataset than it is to go to the trouble of trying to make a "pretty" table.

data have;
  infile cards truncover;
  input CUSIP $ DATE :mmddyy.  PRC  t ;
  format date yymmdd10.;
cards;
00030710  2/20/2015  30.38999939   
00030710  2/23/2015  32.09999847   
00030710  2/24/2015  33.25999832   
00030710  2/22/2016  22.68000031   
00030710  2/23/2016  23.23999977  1
00030710  2/24/2016  23.05999947   
00030710  2/25/2016  22.72999954   
00030710  2/26/2016  21.73999977   
00030710  2/29/2016  21.4326678   
00030710  2/24/2017  7.579999924   
00030710  2/27/2017  7.590000153  1
00030710  2/28/2017  9.170000076   
00030710  3/1/2017  8.909999847   
00030710  3/2/2017  8.909999847   
;

So make a copy of the EVENTS and calculate the year ago date.  Use that older date as the DATE value, move the actual date and actual PRC to new variable names so you don't create conflicts later.  This step could be a VIEW instead of physical copy.

data year_ago / view=year_ago ;
  set have;
  where t=1 ;
  year_ago = intnx('year',date,-1,'s');
  format year_ago yymmdd10.;
  rename year_ago = date date=event_date prc=event_prc ;
run;

Now INTERLEAVE this records with the original data.  Remember the values read from the last original observation.  Then when you hit one of these year ago dates output the result.  You probably should add a test to prevent getting a value from more than say 370 days ago.

data want ;
  set have (drop=t in=in1) year_ago;
  by cusip date ;
  retain old_date old_prc ;
  if first.cusip then call missing(of old_date old_prc);
  if in1 then do;
    old_date = date;
    old_prc = prc ;
    format old_date yymmdd10.;
  end;
  if t=1 then do;
    days = event_date - old_date ;
    if (days > 370) then call missing(of old_date old_prc);
    output;
  end;
  drop date prc ;
run;

Results

                                  event_
Obs     CUSIP      event_date      prc      t      old_date    old_prc    days

 1     00030710    2016-02-23    23.2400    1    2015-02-23    32.1000     365
 2     00030710    2017-02-27     7.5900    1    2016-02-26    21.7400     367

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

This is UNTESTED code. If you want tested code, please provide data as SAS data step code (instructions).

 

This solution uses PROC SQL to merge the data with itself selecting the date one year earlier to merge with — and if that date doesn't exist, we look back a maximum of 5 days to find a date that does exist and then select the most recent date that does exist.

 

proc sql;
    create table want as select
        a.*
        ,case when a.t=1 then b.date else . end as date_last_year format=date9.
        ,case when a.t=1 then b.prc else . end as prc_last_year
    from have as a left join have as b
        on a.cusip=b.cusip and (intnx('year',a.date,-1,'s')-b.date) between 0 and 5
    group by a.cusip,a.date
    having b.date=max(b.date);
quit;

 

--
Paige Miller
KS99
Obsidian | Level 7

Dear PaigeMiller, 

 

Thank you for your code! 

I tested your code, and it does work. 

One thing about it is that it takes way too much time, since my dataset has a few tens of millions of observations. 

I do not care about the code length. If you have time, can you write a less time-taking one for me? 

 

Not only this time, but for many times in the past, I really appreciate your help, Miller! 

 

Sincerely, 

 

KS - 

PaigeMiller
Diamond | Level 26

I think you should start a new thread, explain the problem, and clearly state you want a relatively speedy solution for data sets that have tens of millions of observations.

--
Paige Miller
Tom
Super User Tom
Super User

This should run reasonably quickly (assuming data is already sorted).

First note it is actually EASIER to post the code to create your dataset than it is to go to the trouble of trying to make a "pretty" table.

data have;
  infile cards truncover;
  input CUSIP $ DATE :mmddyy.  PRC  t ;
  format date yymmdd10.;
cards;
00030710  2/20/2015  30.38999939   
00030710  2/23/2015  32.09999847   
00030710  2/24/2015  33.25999832   
00030710  2/22/2016  22.68000031   
00030710  2/23/2016  23.23999977  1
00030710  2/24/2016  23.05999947   
00030710  2/25/2016  22.72999954   
00030710  2/26/2016  21.73999977   
00030710  2/29/2016  21.4326678   
00030710  2/24/2017  7.579999924   
00030710  2/27/2017  7.590000153  1
00030710  2/28/2017  9.170000076   
00030710  3/1/2017  8.909999847   
00030710  3/2/2017  8.909999847   
;

So make a copy of the EVENTS and calculate the year ago date.  Use that older date as the DATE value, move the actual date and actual PRC to new variable names so you don't create conflicts later.  This step could be a VIEW instead of physical copy.

data year_ago / view=year_ago ;
  set have;
  where t=1 ;
  year_ago = intnx('year',date,-1,'s');
  format year_ago yymmdd10.;
  rename year_ago = date date=event_date prc=event_prc ;
run;

Now INTERLEAVE this records with the original data.  Remember the values read from the last original observation.  Then when you hit one of these year ago dates output the result.  You probably should add a test to prevent getting a value from more than say 370 days ago.

data want ;
  set have (drop=t in=in1) year_ago;
  by cusip date ;
  retain old_date old_prc ;
  if first.cusip then call missing(of old_date old_prc);
  if in1 then do;
    old_date = date;
    old_prc = prc ;
    format old_date yymmdd10.;
  end;
  if t=1 then do;
    days = event_date - old_date ;
    if (days > 370) then call missing(of old_date old_prc);
    output;
  end;
  drop date prc ;
run;

Results

                                  event_
Obs     CUSIP      event_date      prc      t      old_date    old_prc    days

 1     00030710    2016-02-23    23.2400    1    2015-02-23    32.1000     365
 2     00030710    2017-02-27     7.5900    1    2016-02-26    21.7400     367

KS99
Obsidian | Level 7

Thank you, Tom, 

Your code works way faster than I expected! 

The logic you are using is the one I was thinking. But I had a problem in creating workable codes. Yours are a bit over my head, but I will try to learn them. 

 

I wish you a nice weekend! 

 

Sincerely, 

 

KS -, 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 515 views
  • 0 likes
  • 3 in conversation