BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Theo_Gh
Obsidian | Level 7

Hi everyone,

 

I have a dataset made up of daily stock prices and daily shares outstanding and I would want to extract the stock prices and shares outstanding on the last day of trading for each fiscal year.

 

Your help will be very much appreciated. Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Nobody is asking for the whole dataset.  A sample of a couple of gvkey's and a couple years would be fine.

 

First, can we nail down this question?  Are you trying to do fiscal year, or calendar year?  Your sample data show no information on fiscal year.

 

HOWEVER, you mention an annual dataset in addition to your daily dataset.  Since you are using compustat data (I recognise gvkey and iid as company and stock issue identifier), your annual file almost certainly has gvkey and datadate as well - where datadate in annual is end-of-fiscal-year. 

 

So let's assume you have two datasets

  1. DAILY, sorted by gvkey and datadate, with about 200 obs per year for each gvkey.  DATADATE is the calendar date of stock trading.
  2. ANNUAL, also sorted by gvkey and datadate, also sorted by gvkey/datadate, with one record per year, where datadate is the end-of-fiscal year.
  3. You want data for last DAILY record matching or just prior to the annual record, merged with data from the annual dataset.

 

data want;
  set daily (in=in_day keep=gvkey datadate)  annual (in=in_annual);
  by gvkey datadate;
  if in_day then set daily (drop=gvkey rename=(datadate=datadate_daily));
  if in_annual;
run;

 

Notes:

  1. The first SET statement has two data sets in it: daily (but with gvkey and datadate only) and annual (with all its vars).
  2. Ordinarily all the daily would be read first, then all the annual.  But the BY statement tells sas to interleave the two data sets, by gvkey and datadate.  Now if daily happens to have an identical gvkey/datadate with annual, the daily record will still precede the annual record, because the daily dataset name is to the left of the annual dataset name.
  3. The IN_DAY and IN_ANNUAL vars are just dummies telling you whether the record-in-hand is from daily or annual.
  4. the "if in_day then set daily" rereads the daily record, this time with all the remaining vars of the daily record.  This is done separately becuase the first SET statement would convert all the daily vars to missing when an annual record is read.  That's why the first SET limits the daily vars exposed to this action.
  5. The "if in_annual" is a subsetting if, so that the data step only output all those vars when the if_annual is true.

 

Caveat: this assumes only on stock issue per gvkey.  If you daily dataset has more than one IID per gvkey, you'll need some adjustment.

 

Editted addition:  Caveat Number 2:  The above assumes that, for each gvkey, the first datadate will be from DAILY.  Otherwise there is a risk of combinining daily data from a prior gvkey with annual data from the current gvkey.  If that is a concern then add these two statement after the "IF IN_ANNUAL":

 

  output;
  call missing (of _all_);
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

10 REPLIES 10
Reeza
Super User

Look up BY Group processing. Associate a BY group with the FiscalYear and Date and then use LAST. to extract the last date.

 

If you want code samples, post sample data. 

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n138da4gme3zb7n1nifp...

mkeintz
PROC Star

What does your dataset look like?  It must have a date variable, per your description.  Does it also already have a fiscal year variable?  If so, then you can implement by-group processing as documented in the link provided by @Reeza

 

If it doesn't already have a fiscal year var, what data does it have to determine fiscal year?  (I have dealt with dataset that have a fiscal-year-end-month variable,  i.e. 5 means May is the last month of each fiscal year, 12 means december, etc.).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Theo_Gh
Obsidian | Level 7

The dataset is too large to upload here but it looks like this :

 Date                     close price   s/oustanding

 

2 Jan 2014                   8                245

.

.

31 Dec 2014                 9               340

2 Jan 2015                   10              500

.

.

31 Dec 2015                7                 450

If I use this code:

data wanted; set dataset;

by month non sorted

if last.month;

run;

That would not give me data on 31 Dec 2014 and 2015, would it ?

Reeza
Super User

What if Dec 31 is a Saturday? You don't have values for every day. 

 

The gist of code is correct but don't use the NOTSORTED option if that's what you intended, your code appears garbled. 

mkeintz
PROC Star

You originally wrote fiscal year, but you are not showing any data that allows the determination of fiscal year, only calendar year.

 

Your example does not show a variable named month, yet your program uses such a variable.  How about a sample of the actual data, with the actual vars?

 

If you really only have a date var and you want calendar year-end and your data is sorted chronologically, then

 

data end_of_year;

  merge have  have (firstobs=2 keep=date rename=(date=next_date));

  if year(date)^=year(next_date);

run;

 

Note this will keep the last record of have, no matter what part of the year it is from.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

Here is an example using dataset sashelp.stocks, which is a monthly dataset sorted by ascending stock (IBM, Intel, Microsoft) and then descending chronological order of date.

proc sort data=sashelp.stocks out=mystocks;
  by stock date;
run;

data end_of_year;
  set mystocks;
  by stock;

  merge mystocks (keep=date) 
        mystocks (firstobs=2 keep=date rename=(date=next_date));
  if last.stock or year(next_date)>year(date);
run;

The SET statement followed by a BY statement tells sas to establish dummy vars first.stock and last.stock to indicate the status of the record-in-hand.  The MERGE statement (unaccompanied by a BY statement) allows the current date to be compared to the next date (i.e. a 1-period "lead") to determine change of year.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
user24feb
Barite | Level 11

I think you are thinking in the wrong direction. You probably only need proc means.

 

* .. imagine these are your data;
Data A (Drop=i);
  i=1;
  Format Date Date9.;
  Do While (i<=1500);
    Date='01JAN2010'd+i;
    Price=Round(Rannor(1)*5+100,0.01);
	Shares=Round(Ranuni(1)*1000,1);
	i=i+Ceil(Ranuni(1)*2+1); * .. want the time series to be intermittent;
	Output;
  End;
Run;

* .. then use a auxiliary variable ;
Data A;
  Set A;
  Year=Year(Date);
Run;

Proc Means Data=A NoPrint;
  By Year;
  Var Date;
  Output Out=A_Result (Drop=_: Year)
    MaxID(Date(Price) Date(Shares))=Price Shares
	Max(Date)=;
Run;
Theo_Gh
Obsidian | Level 7
Sorry, everyone. This is a sample of my dataset; the whole dataset is obviously too large to upload here. As you can see the only date variable I have is the data date. I have another dataset which is annual and has gvkey as an identifier with which I will merge this data after the extraction. I don't know if data date is an appropriate date but that's all I have as you can see.
mkeintz
PROC Star

Nobody is asking for the whole dataset.  A sample of a couple of gvkey's and a couple years would be fine.

 

First, can we nail down this question?  Are you trying to do fiscal year, or calendar year?  Your sample data show no information on fiscal year.

 

HOWEVER, you mention an annual dataset in addition to your daily dataset.  Since you are using compustat data (I recognise gvkey and iid as company and stock issue identifier), your annual file almost certainly has gvkey and datadate as well - where datadate in annual is end-of-fiscal-year. 

 

So let's assume you have two datasets

  1. DAILY, sorted by gvkey and datadate, with about 200 obs per year for each gvkey.  DATADATE is the calendar date of stock trading.
  2. ANNUAL, also sorted by gvkey and datadate, also sorted by gvkey/datadate, with one record per year, where datadate is the end-of-fiscal year.
  3. You want data for last DAILY record matching or just prior to the annual record, merged with data from the annual dataset.

 

data want;
  set daily (in=in_day keep=gvkey datadate)  annual (in=in_annual);
  by gvkey datadate;
  if in_day then set daily (drop=gvkey rename=(datadate=datadate_daily));
  if in_annual;
run;

 

Notes:

  1. The first SET statement has two data sets in it: daily (but with gvkey and datadate only) and annual (with all its vars).
  2. Ordinarily all the daily would be read first, then all the annual.  But the BY statement tells sas to interleave the two data sets, by gvkey and datadate.  Now if daily happens to have an identical gvkey/datadate with annual, the daily record will still precede the annual record, because the daily dataset name is to the left of the annual dataset name.
  3. The IN_DAY and IN_ANNUAL vars are just dummies telling you whether the record-in-hand is from daily or annual.
  4. the "if in_day then set daily" rereads the daily record, this time with all the remaining vars of the daily record.  This is done separately becuase the first SET statement would convert all the daily vars to missing when an annual record is read.  That's why the first SET limits the daily vars exposed to this action.
  5. The "if in_annual" is a subsetting if, so that the data step only output all those vars when the if_annual is true.

 

Caveat: this assumes only on stock issue per gvkey.  If you daily dataset has more than one IID per gvkey, you'll need some adjustment.

 

Editted addition:  Caveat Number 2:  The above assumes that, for each gvkey, the first datadate will be from DAILY.  Otherwise there is a risk of combinining daily data from a prior gvkey with annual data from the current gvkey.  If that is a concern then add these two statement after the "IF IN_ANNUAL":

 

  output;
  call missing (of _all_);
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Theo_Gh
Obsidian | Level 7
Thank you so much for your patience.
I apologize that I'm not always able to make my questions clear and straightforward.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 4972 views
  • 0 likes
  • 4 in conversation