BookmarkSubscribeRSS Feed
somebody
Lapis Lazuli | Level 10

I have a dataset of stock and their active periods. There are 3 variables: STOCK, START_DATE and END_DATE. I also have another dataset of trading dates ( i.e. there is only 1 variable DATE).

What I would like to create is a dataset that contains 2 variables: STOCK and DATE where DATE are trading dates between the START_DATE and END_DATE. 

For example, if the original data set has a row about stock ABC and START_DATE = 20150101 and END_DATE = 20150131 then the new data set would be like this 

STOCK	DATE
ABC	20150101
ABC	20150102
ABC	20150103
ABC	20150104
.....
.....
.....
ABC	20150131

What would be the best way to achieve this? One way that I can think of is to First create all dates for each stock, and then remove obs where date is outside of the start and end date. But this might not be sufficient. Is there a better way?

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

There is only one obs for each stock in your first data set, right?

 

Use this as a template

 

data have;
input stock $ (startdate enddate)(:anydtdte.);
datalines;
abc 20150101 20150131
def 20160401 20160430
hij 20170801 20170831
;

data want(keep=stock date);
    set have;
    do date=startdate to enddate;
        output;
    end;
    format startdate enddate date date9.;
run;
somebody
Lapis Lazuli | Level 10

One stock could have multiple obs to indicates their active periods.

somebody
Lapis Lazuli | Level 10

I just tested your code. It works even with multiple obs for stock. One is issue is it creates all dates whereas I only want trading dates. This can easily be fixed by another dataset that merges want and tradingdates dataset. cheers

mkeintz
PROC Star

Here's an approach using some sample data:  

 

 

data have;
input stock $ (startdate enddate)(:anydtdte.);
  format startdate enddate date9.;
datalines;
abc 20160701 20161130
def 20160601 20161231
hij 20160801 20160831
run;
data trdates;
  input date :yymmdd8. @@;
  format date date9. ;
datalines;
 20160601  20160602  20160603  20160606  20160607  20160608  20160609  20160610
 20160613  20160614  20160615  20160616  20160617  20160620  20160621  20160622
 20160623  20160624  20160627  20160628  20160629  20160630  20160701  20160705
 20160706  20160707  20160708  20160711  20160712  20160713  20160714  20160715
 20160718  20160719  20160720  20160721  20160722  20160725  20160726  20160727
 20160728  20160729  20160801  20160802  20160803  20160804  20160805  20160808
 20160809  20160810  20160811  20160812  20160815  20160816  20160817  20160818
 20160819  20160822  20160823  20160824  20160825  20160826  20160829  20160830
 20160831  20160901  20160902  20160906  20160907  20160908  20160909  20160912
 20160913  20160914  20160915  20160916  20160919  20160920  20160921  20160922
 20160923  20160926  20160927  20160928  20160929  20160930  20161003  20161004
 20161005  20161006  20161007  20161010  20161011  20161012  20161013  20161014
 20161017  20161018  20161019  20161020  20161021  20161024  20161025  20161026
 20161027  20161028  20161031  20161101  20161102  20161103  20161104  20161107
 20161108  20161109  20161110  20161111  20161114  20161115  20161116  20161117
 20161118  20161121  20161122  20161123  20161125  20161128  20161129  20161130
 20161201  20161202  20161205  20161206  20161207  20161208  20161209  20161212
 20161213  20161214  20161215  20161216  20161219  20161220  20161221  20161222
 20161223  20161227  20161228  20161229  20161230
run;

%let min_trdate=01jun2016;                         /* Make earliest sample date macrovar in user-friendly format */
%let lobound=%sysfunc(inputn(&min_trdate,date9.)); /* Convert to a numeric SAS date value, also in a macrovar */
%let max_trdate=31dec2016;                        
%let hibound=%sysfunc(inputn(&max_trdate,date9.));
%let size=%eval(1+&hibound-&lobound);              /* Get number of calendare from min_trdate to max_trdate */
%put _user_;

data want ;
  array dt_dummies {&lobound:&hibound} $1 _temporary_ (&size*'N');  /* Make array indexed by the sample calendar date range */

  if _n_=1 then do until (end_of_trdates);    /* Read all the trade dates and mark corresponding array element as 'Y'       */
    set trdates end=end_of_trdates;           
    dt_dummies{date}='Y';
  end;
  set have;
  do date=startdate to enddate;
    if dt_dummies{date}='Y' then output;
  end;
run;
  1. At the start of the final data step, the program reads in all the trade dates from the TRDATES data set, and populates the corresponding element of the DT_DUMMIES array with a 'Y'  (otherwise they default to 'N').  So this array is a sequence of Y's and N's.
  2. The array is indexed by the calendar date value (from 01jun2016 through 31dec2016 in my example.
  3. Each STOCK record is read and the program checks array elements corresponding to the date range for a "Y".
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
somebody
Lapis Lazuli | Level 10

Thanks. This works well, even there are multiple obs for 1 stock. I just have 1 small problem with the format of dates. I prefer the format YYMMDD10 so I modified your code a bit but I got an error. 

data have;
input stock $ (startdate enddate)(:anydtdte.);
  format startdate enddate YYMMDD10.;
datalines;
abc 20160701 20160710
abc 20160720 20160730
def 20160601 20161231
hij 20160801 20160831
run;

data trdates;
  input date :yymmdd8. @@;
  format date YYMMDD10. ;
datalines;
 20160601  20160602  20160603  20160606  20160607  20160608  20160609  20160610
 20160613  20160614  20160615  20160616  20160617  20160620  20160621  20160622
 20160623  20160624  20160627  20160628  20160629  20160630  20160701  20160705
 20160706  20160707  20160708  20160711  20160712  20160713  20160714  20160715
 20160718  20160719  20160720  20160721  20160722  20160725  20160726  20160727
 20160728  20160729  20160801  20160802  20160803  20160804  20160805  20160808
 20160809  20160810  20160811  20160812  20160815  20160816  20160817  20160818
 20160819  20160822  20160823  20160824  20160825  20160826  20160829  20160830
 20160831  20160901  20160902  20160906  20160907  20160908  20160909  20160912
 20160913  20160914  20160915  20160916  20160919  20160920  20160921  20160922
 20160923  20160926  20160927  20160928  20160929  20160930  20161003  20161004
 20161005  20161006  20161007  20161010  20161011  20161012  20161013  20161014
 20161017  20161018  20161019  20161020  20161021  20161024  20161025  20161026
 20161027  20161028  20161031  20161101  20161102  20161103  20161104  20161107
 20161108  20161109  20161110  20161111  20161114  20161115  20161116  20161117
 20161118  20161121  20161122  20161123  20161125  20161128  20161129  20161130
 20161201  20161202  20161205  20161206  20161207  20161208  20161209  20161212
 20161213  20161214  20161215  20161216  20161219  20161220  20161221  20161222
 20161223  20161227  20161228  20161229  20161230
run;

%let min_trdate=2016-07-01;                         /* Make earliest sample date macrovar in user-friendly format */
%let lobound=%sysfunc(inputn(&min_trdate,YYMMDD10.)); /* Convert to a numeric SAS date value, also in a macrovar */
%let max_trdate=2016-12-31;                        
%let hibound=%sysfunc(inputn(&max_trdate,YYMMDD10.));
%let size=%eval(1+&hibound-&lobound);              /* Get number of calendare from min_trdate to max_trdate */
%put _user_;


data want ;
	* Make array indexed by the sample calendar date range;
	array dt_dummies {&lobound:&hibound} $1 _temporary_ (&size*'N'); 
	* Read all the trade dates and mark corresponding array element as 'Y';
	if _n_=1 then do until (end_of_trdates);    
		set trdates end=end_of_trdates;           
		dt_dummies{date}='Y';
	end;
	set have;
		do date=startdate to enddate;
		if dt_dummies{date}='Y' then output;
	end;
run;

ERROR: Array subscript out of range at line 289 column 9.
end_of_trdates=0 date=2016-06-01 stock=  startdate=. enddate=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set WORK.TRDATES.
WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0
         observations and 4 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.00 seconds

Do you have any idea on how to fix this?

My next step is to put these codes into a macro that automates this procedure. the input datasets would be the HAVE, and trading_dates dataset. 

What this macro would do is automate your macrovar declaration step ( the %let min_trade=.......... part). I can get the first and last obs of trading_dates dataset but how do I let lobound = the first obs and hibound= the last obs?

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 3238 views
  • 1 like
  • 3 in conversation