BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
gotchj1
Fluorite | Level 6

I'm trying to clean up a large dataset with many stocks. The current dataset i have contains every stock that has ever existed and im trying to cut it down to all the stocks that have ever been in the S&P 500. I have a date column, a ticker column, a closing price column. Not all the stocks in the S&P 500 were added to the index at the same time and some of the stocks were delisted over time. I have currently have all of the historical data on each stock in the S&P 500. For example, Apple was added into the S&P 500 on Nov 30, 1982, but i have data for Apple going back to 1980. Im trying to write a code that doesn't delete the other stock information besides the information from 1980 to Nov 29, 1982 just for Apple. So for example, there will still be information for Amazon from 1970 to the current time, but for Apple the data will start at Nov 30, 1982.

 

So far i was thinking some thing along the lines of this code but it isnt working.

 

DATA CRSP1;
    SET CRSP;
    if Ticker='A' then "10Apr1995"d =< date => "05Jun2000"d;
    else if Ticker='AAPL' then date=>"23Mar2005"d;
    else if Ticker='AMZN' then date=>"08Dec2012"d;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@gotchj1 wrote:

I'm trying to clean up a large dataset with many stocks. The current dataset i have contains every stock that has ever existed and im trying to cut it down to all the stocks that have ever been in the S&P 500. I have a date column, a ticker column, a closing price column. Not all the stocks in the S&P 500 were added to the index at the same time and some of the stocks were delisted over time. I have currently have all of the historical data on each stock in the S&P 500. For example, Apple was added into the S&P 500 on Nov 30, 1982, but i have data for Apple going back to 1980. Im trying to write a code that doesn't delete the other stock information besides the information from 1980 to Nov 29, 1982 just for Apple. So for example, there will still be information for Amazon from 1970 to the current time, but for Apple the data will start at Nov 30, 1982.

 

So far i was thinking some thing along the lines of this code but it isnt working.

 

DATA CRSP1;
    SET CRSP;
    if Ticker='A' then "10Apr1995"d =< date => "05Jun2000"d;
    else if Ticker='AAPL' then date=>"23Mar2005"d;
    else if Ticker='AMZN' then date=>"08Dec2012"d;
run;


The way youo are using "then" above you are attempting to create a value and likely throwing an error or two.

If you want to select observation the code for the above would look like this to use a subsetting IF

DATA CRSP1;
    SET CRSP;
    if (Ticker='A' and (( "10Apr1995"d =< date) or (date => "05Jun2000"d))
    or ( Ticker='AAPL' and date=>"23Mar2005"d)
    or ( Ticker='AMZN' and date=>"08Dec2012"d);
run;

Note that your ""10Apr1995"d =< date => "05Jun2000"d" is probably framed incorrectly and my guess of what you may have intended is just an example of.  The only time that condition would be true is if date => '05Jun2000'd and so includes Apr 1995.

If you intended dates between then the code would be "10Apr1995"d =< date <= "05Jun2000"d.

 

However the suggestion by @HarrySnart is really a much more flexible, simpler to code and likely more robust in the long run.

View solution in original post

3 REPLIES 3
HarrySnart
SAS Employee

Hi @gotchj1 , have you tried doing this as a join in PROC SQL? If you have a reference table with date added and stock name you could try an inner join on your main table where stock = stock and date ge date_added.

ballardw
Super User

@gotchj1 wrote:

I'm trying to clean up a large dataset with many stocks. The current dataset i have contains every stock that has ever existed and im trying to cut it down to all the stocks that have ever been in the S&P 500. I have a date column, a ticker column, a closing price column. Not all the stocks in the S&P 500 were added to the index at the same time and some of the stocks were delisted over time. I have currently have all of the historical data on each stock in the S&P 500. For example, Apple was added into the S&P 500 on Nov 30, 1982, but i have data for Apple going back to 1980. Im trying to write a code that doesn't delete the other stock information besides the information from 1980 to Nov 29, 1982 just for Apple. So for example, there will still be information for Amazon from 1970 to the current time, but for Apple the data will start at Nov 30, 1982.

 

So far i was thinking some thing along the lines of this code but it isnt working.

 

DATA CRSP1;
    SET CRSP;
    if Ticker='A' then "10Apr1995"d =< date => "05Jun2000"d;
    else if Ticker='AAPL' then date=>"23Mar2005"d;
    else if Ticker='AMZN' then date=>"08Dec2012"d;
run;


The way youo are using "then" above you are attempting to create a value and likely throwing an error or two.

If you want to select observation the code for the above would look like this to use a subsetting IF

DATA CRSP1;
    SET CRSP;
    if (Ticker='A' and (( "10Apr1995"d =< date) or (date => "05Jun2000"d))
    or ( Ticker='AAPL' and date=>"23Mar2005"d)
    or ( Ticker='AMZN' and date=>"08Dec2012"d);
run;

Note that your ""10Apr1995"d =< date => "05Jun2000"d" is probably framed incorrectly and my guess of what you may have intended is just an example of.  The only time that condition would be true is if date => '05Jun2000'd and so includes Apr 1995.

If you intended dates between then the code would be "10Apr1995"d =< date <= "05Jun2000"d.

 

However the suggestion by @HarrySnart is really a much more flexible, simpler to code and likely more robust in the long run.

mkeintz
PROC Star

I suspect you'll find the proc sql suggestion most easily employed.

 

But, assuming the stock trading data (dataset STOCK) has variables ticker, date, and other variable of interest, ... and the SP500_LIST dataset has variablea TICKER, ENTRY_DATE, and  EXIT_DATE, then the code below should work.

 

data want;
  set sp500_list (in=insp500 keep=ticker entry_date rename=(entry_date=date))
      stocks (in=indaily) ;
  by ticker date ;
  if first.ticker then call missing(entry_date, exit_date);
  if insp500 then set sp500_list;  /* Retrieve entry_date and exit_date for this ticker*/
  if indaily=1 and entry_date <= date <=exit_date ;
run;

Note a ticker can enter and exit the S&P500 more than once, either because of changing finances of the company, or because TICKER has been re-assigned to different companies.  So SP500_LIST can have multiple obs per ticker.

 

The above code assumes:

  1. STOCKS file is sorted by ticker/date.
  2. SP500_LIST file is sorted by ticker/entry_date

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 651 views
  • 2 likes
  • 4 in conversation