Extracting data for last day of the year

Accepted Solution Solved
Reply
Contributor
Posts: 33
Accepted Solution

Extracting data for last day of the year

Hi everyone,

 

I have a dataset made up of daily stock prices and daily shares outstanding and I would want to extract the stock prices and shares outstanding on the last day of trading for each fiscal year.

 

Your help will be very much appreciated. Thanks


Accepted Solutions
Solution
‎02-28-2017 11:37 PM
Super User
Posts: 787

Re: Extracting data for last day of the year

[ Edited ]

Nobody is asking for the whole dataset.  A sample of a couple of gvkey's and a couple years would be fine.

 

First, can we nail down this question?  Are you trying to do fiscal year, or calendar year?  Your sample data show no information on fiscal year.

 

HOWEVER, you mention an annual dataset in addition to your daily dataset.  Since you are using compustat data (I recognise gvkey and iid as company and stock issue identifier), your annual file almost certainly has gvkey and datadate as well - where datadate in annual is end-of-fiscal-year. 

 

So let's assume you have two datasets

  1. DAILY, sorted by gvkey and datadate, with about 200 obs per year for each gvkey.  DATADATE is the calendar date of stock trading.
  2. ANNUAL, also sorted by gvkey and datadate, also sorted by gvkey/datadate, with one record per year, where datadate is the end-of-fiscal year.
  3. You want data for last DAILY record matching or just prior to the annual record, merged with data from the annual dataset.

 

data want;
  set daily (in=in_day keep=gvkey datadate)  annual (in=in_annual);
  by gvkey datadate;
  if in_day then set daily (drop=gvkey rename=(datadate=datadate_daily));
  if in_annual;
run;

 

Notes:

  1. The first SET statement has two data sets in it: daily (but with gvkey and datadate only) and annual (with all its vars).
  2. Ordinarily all the daily would be read first, then all the annual.  But the BY statement tells sas to interleave the two data sets, by gvkey and datadate.  Now if daily happens to have an identical gvkey/datadate with annual, the daily record will still precede the annual record, because the daily dataset name is to the left of the annual dataset name.
  3. The IN_DAY and IN_ANNUAL vars are just dummies telling you whether the record-in-hand is from daily or annual.
  4. the "if in_day then set daily" rereads the daily record, this time with all the remaining vars of the daily record.  This is done separately becuase the first SET statement would convert all the daily vars to missing when an annual record is read.  That's why the first SET limits the daily vars exposed to this action.
  5. The "if in_annual" is a subsetting if, so that the data step only output all those vars when the if_annual is true.

 

Caveat: this assumes only on stock issue per gvkey.  If you daily dataset has more than one IID per gvkey, you'll need some adjustment.

 

Editted addition:  Caveat Number 2:  The above assumes that, for each gvkey, the first datadate will be from DAILY.  Otherwise there is a risk of combinining daily data from a prior gvkey with annual data from the current gvkey.  If that is a concern then add these two statement after the "IF IN_ANNUAL":

 

  output;
  call missing (of _all_);

View solution in original post


All Replies
Grand Advisor
Posts: 17,308

Re: Extracting data for last day of the year

Look up BY Group processing. Associate a BY group with the FiscalYear and Date and then use LAST. to extract the last date.

 

If you want code samples, post sample data. 

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n138da4gme3zb7n1nifp...

Super User
Posts: 787

Re: Extracting data for last day of the year

What does your dataset look like?  It must have a date variable, per your description.  Does it also already have a fiscal year variable?  If so, then you can implement by-group processing as documented in the link provided by @Reeza

 

If it doesn't already have a fiscal year var, what data does it have to determine fiscal year?  (I have dealt with dataset that have a fiscal-year-end-month variable,  i.e. 5 means May is the last month of each fiscal year, 12 means december, etc.).

Contributor
Posts: 33

Re: Extracting data for last day of the year

The dataset is too large to upload here but it looks like this :

 Date                     close price   s/oustanding

 

2 Jan 2014                   8                245

.

.

31 Dec 2014                 9               340

2 Jan 2015                   10              500

.

.

31 Dec 2015                7                 450

If I use this code:

data wanted; set dataset;

by month non sorted

if last.month;

run;

That would not give me data on 31 Dec 2014 and 2015, would it ?

Grand Advisor
Posts: 17,308

Re: Extracting data for last day of the year

What if Dec 31 is a Saturday? You don't have values for every day. 

 

The gist of code is correct but don't use the NOTSORTED option if that's what you intended, your code appears garbled. 

Super User
Posts: 787

Re: Extracting data for last day of the year

You originally wrote fiscal year, but you are not showing any data that allows the determination of fiscal year, only calendar year.

 

Your example does not show a variable named month, yet your program uses such a variable.  How about a sample of the actual data, with the actual vars?

 

If you really only have a date var and you want calendar year-end and your data is sorted chronologically, then

 

data end_of_year;

  merge have  have (firstobs=2 keep=date rename=(date=next_date));

  if year(date)^=year(next_date);

run;

 

Note this will keep the last record of have, no matter what part of the year it is from.

Super User
Posts: 787

Re: Extracting data for last day of the year

Here is an example using dataset sashelp.stocks, which is a monthly dataset sorted by ascending stock (IBM, Intel, Microsoft) and then descending chronological order of date.

proc sort data=sashelp.stocks out=mystocks;
  by stock date;
run;

data end_of_year;
  set mystocks;
  by stock;

  merge mystocks (keep=date) 
        mystocks (firstobs=2 keep=date rename=(date=next_date));
  if last.stock or year(next_date)>year(date);
run;

The SET statement followed by a BY statement tells sas to establish dummy vars first.stock and last.stock to indicate the status of the record-in-hand.  The MERGE statement (unaccompanied by a BY statement) allows the current date to be compared to the next date (i.e. a 1-period "lead") to determine change of year.

 

Super Contributor
Posts: 336

Re: Extracting data for last day of the year

[ Edited ]

I think you are thinking in the wrong direction. You probably only need proc means.

 

* .. imagine these are your data;
Data A (Drop=i);
  i=1;
  Format Date Date9.;
  Do While (i<=1500);
    Date='01JAN2010'd+i;
    Price=Round(Rannor(1)*5+100,0.01);
	Shares=Round(Ranuni(1)*1000,1);
	i=i+Ceil(Ranuni(1)*2+1); * .. want the time series to be intermittent;
	Output;
  End;
Run;

* .. then use a auxiliary variable ;
Data A;
  Set A;
  Year=Year(Date);
Run;

Proc Means Data=A NoPrint;
  By Year;
  Var Date;
  Output Out=A_Result (Drop=_: Year)
    MaxID(Date(Price) Date(Shares))=Price Shares
	Max(Date)=;
Run;
Contributor
Posts: 33

Re: Extracting data for last day of the year

Sorry, everyone. This is a sample of my dataset; the whole dataset is obviously too large to upload here. As you can see the only date variable I have is the data date. I have another dataset which is annual and has gvkey as an identifier with which I will merge this data after the extraction. I don't know if data date is an appropriate date but that's all I have as you can see.
Solution
‎02-28-2017 11:37 PM
Super User
Posts: 787

Re: Extracting data for last day of the year

[ Edited ]

Nobody is asking for the whole dataset.  A sample of a couple of gvkey's and a couple years would be fine.

 

First, can we nail down this question?  Are you trying to do fiscal year, or calendar year?  Your sample data show no information on fiscal year.

 

HOWEVER, you mention an annual dataset in addition to your daily dataset.  Since you are using compustat data (I recognise gvkey and iid as company and stock issue identifier), your annual file almost certainly has gvkey and datadate as well - where datadate in annual is end-of-fiscal-year. 

 

So let's assume you have two datasets

  1. DAILY, sorted by gvkey and datadate, with about 200 obs per year for each gvkey.  DATADATE is the calendar date of stock trading.
  2. ANNUAL, also sorted by gvkey and datadate, also sorted by gvkey/datadate, with one record per year, where datadate is the end-of-fiscal year.
  3. You want data for last DAILY record matching or just prior to the annual record, merged with data from the annual dataset.

 

data want;
  set daily (in=in_day keep=gvkey datadate)  annual (in=in_annual);
  by gvkey datadate;
  if in_day then set daily (drop=gvkey rename=(datadate=datadate_daily));
  if in_annual;
run;

 

Notes:

  1. The first SET statement has two data sets in it: daily (but with gvkey and datadate only) and annual (with all its vars).
  2. Ordinarily all the daily would be read first, then all the annual.  But the BY statement tells sas to interleave the two data sets, by gvkey and datadate.  Now if daily happens to have an identical gvkey/datadate with annual, the daily record will still precede the annual record, because the daily dataset name is to the left of the annual dataset name.
  3. The IN_DAY and IN_ANNUAL vars are just dummies telling you whether the record-in-hand is from daily or annual.
  4. the "if in_day then set daily" rereads the daily record, this time with all the remaining vars of the daily record.  This is done separately becuase the first SET statement would convert all the daily vars to missing when an annual record is read.  That's why the first SET limits the daily vars exposed to this action.
  5. The "if in_annual" is a subsetting if, so that the data step only output all those vars when the if_annual is true.

 

Caveat: this assumes only on stock issue per gvkey.  If you daily dataset has more than one IID per gvkey, you'll need some adjustment.

 

Editted addition:  Caveat Number 2:  The above assumes that, for each gvkey, the first datadate will be from DAILY.  Otherwise there is a risk of combinining daily data from a prior gvkey with annual data from the current gvkey.  If that is a concern then add these two statement after the "IF IN_ANNUAL":

 

  output;
  call missing (of _all_);
Contributor
Posts: 33

Re: Extracting data for last day of the year

Thank you so much for your patience.
I apologize that I'm not always able to make my questions clear and straightforward.
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 153 views
  • 0 likes
  • 4 in conversation