I have a dataset of stock and their active periods. There are 3 variables: STOCK, START_DATE and END_DATE. I also have another dataset of trading dates ( i.e. there is only 1 variable DATE).
What I would like to create is a dataset that contains 2 variables: STOCK and DATE where DATE are trading dates between the START_DATE and END_DATE.
For example, if the original data set has a row about stock ABC and START_DATE = 20150101 and END_DATE = 20150131 then the new data set would be like this
STOCK DATE
ABC 20150101
ABC 20150102
ABC 20150103
ABC 20150104
.....
.....
.....
ABC 20150131
What would be the best way to achieve this? One way that I can think of is to First create all dates for each stock, and then remove obs where date is outside of the start and end date. But this might not be sufficient. Is there a better way?
There is only one obs for each stock in your first data set, right?
Use this as a template
data have;
input stock $ (startdate enddate)(:anydtdte.);
datalines;
abc 20150101 20150131
def 20160401 20160430
hij 20170801 20170831
;
data want(keep=stock date);
set have;
do date=startdate to enddate;
output;
end;
format startdate enddate date date9.;
run;
One stock could have multiple obs to indicates their active periods.
I just tested your code. It works even with multiple obs for stock. One is issue is it creates all dates whereas I only want trading dates. This can easily be fixed by another dataset that merges want and tradingdates dataset. cheers
Here's an approach using some sample data:
data have;
input stock $ (startdate enddate)(:anydtdte.);
format startdate enddate date9.;
datalines;
abc 20160701 20161130
def 20160601 20161231
hij 20160801 20160831
run;
data trdates;
input date :yymmdd8. @@;
format date date9. ;
datalines;
20160601 20160602 20160603 20160606 20160607 20160608 20160609 20160610
20160613 20160614 20160615 20160616 20160617 20160620 20160621 20160622
20160623 20160624 20160627 20160628 20160629 20160630 20160701 20160705
20160706 20160707 20160708 20160711 20160712 20160713 20160714 20160715
20160718 20160719 20160720 20160721 20160722 20160725 20160726 20160727
20160728 20160729 20160801 20160802 20160803 20160804 20160805 20160808
20160809 20160810 20160811 20160812 20160815 20160816 20160817 20160818
20160819 20160822 20160823 20160824 20160825 20160826 20160829 20160830
20160831 20160901 20160902 20160906 20160907 20160908 20160909 20160912
20160913 20160914 20160915 20160916 20160919 20160920 20160921 20160922
20160923 20160926 20160927 20160928 20160929 20160930 20161003 20161004
20161005 20161006 20161007 20161010 20161011 20161012 20161013 20161014
20161017 20161018 20161019 20161020 20161021 20161024 20161025 20161026
20161027 20161028 20161031 20161101 20161102 20161103 20161104 20161107
20161108 20161109 20161110 20161111 20161114 20161115 20161116 20161117
20161118 20161121 20161122 20161123 20161125 20161128 20161129 20161130
20161201 20161202 20161205 20161206 20161207 20161208 20161209 20161212
20161213 20161214 20161215 20161216 20161219 20161220 20161221 20161222
20161223 20161227 20161228 20161229 20161230
run;
%let min_trdate=01jun2016; /* Make earliest sample date macrovar in user-friendly format */
%let lobound=%sysfunc(inputn(&min_trdate,date9.)); /* Convert to a numeric SAS date value, also in a macrovar */
%let max_trdate=31dec2016;
%let hibound=%sysfunc(inputn(&max_trdate,date9.));
%let size=%eval(1+&hibound-&lobound); /* Get number of calendare from min_trdate to max_trdate */
%put _user_;
data want ;
array dt_dummies {&lobound:&hibound} $1 _temporary_ (&size*'N'); /* Make array indexed by the sample calendar date range */
if _n_=1 then do until (end_of_trdates); /* Read all the trade dates and mark corresponding array element as 'Y' */
set trdates end=end_of_trdates;
dt_dummies{date}='Y';
end;
set have;
do date=startdate to enddate;
if dt_dummies{date}='Y' then output;
end;
run;
Thanks. This works well, even there are multiple obs for 1 stock. I just have 1 small problem with the format of dates. I prefer the format YYMMDD10 so I modified your code a bit but I got an error.
data have;
input stock $ (startdate enddate)(:anydtdte.);
format startdate enddate YYMMDD10.;
datalines;
abc 20160701 20160710
abc 20160720 20160730
def 20160601 20161231
hij 20160801 20160831
run;
data trdates;
input date :yymmdd8. @@;
format date YYMMDD10. ;
datalines;
20160601 20160602 20160603 20160606 20160607 20160608 20160609 20160610
20160613 20160614 20160615 20160616 20160617 20160620 20160621 20160622
20160623 20160624 20160627 20160628 20160629 20160630 20160701 20160705
20160706 20160707 20160708 20160711 20160712 20160713 20160714 20160715
20160718 20160719 20160720 20160721 20160722 20160725 20160726 20160727
20160728 20160729 20160801 20160802 20160803 20160804 20160805 20160808
20160809 20160810 20160811 20160812 20160815 20160816 20160817 20160818
20160819 20160822 20160823 20160824 20160825 20160826 20160829 20160830
20160831 20160901 20160902 20160906 20160907 20160908 20160909 20160912
20160913 20160914 20160915 20160916 20160919 20160920 20160921 20160922
20160923 20160926 20160927 20160928 20160929 20160930 20161003 20161004
20161005 20161006 20161007 20161010 20161011 20161012 20161013 20161014
20161017 20161018 20161019 20161020 20161021 20161024 20161025 20161026
20161027 20161028 20161031 20161101 20161102 20161103 20161104 20161107
20161108 20161109 20161110 20161111 20161114 20161115 20161116 20161117
20161118 20161121 20161122 20161123 20161125 20161128 20161129 20161130
20161201 20161202 20161205 20161206 20161207 20161208 20161209 20161212
20161213 20161214 20161215 20161216 20161219 20161220 20161221 20161222
20161223 20161227 20161228 20161229 20161230
run;
%let min_trdate=2016-07-01; /* Make earliest sample date macrovar in user-friendly format */
%let lobound=%sysfunc(inputn(&min_trdate,YYMMDD10.)); /* Convert to a numeric SAS date value, also in a macrovar */
%let max_trdate=2016-12-31;
%let hibound=%sysfunc(inputn(&max_trdate,YYMMDD10.));
%let size=%eval(1+&hibound-&lobound); /* Get number of calendare from min_trdate to max_trdate */
%put _user_;
data want ;
* Make array indexed by the sample calendar date range;
array dt_dummies {&lobound:&hibound} $1 _temporary_ (&size*'N');
* Read all the trade dates and mark corresponding array element as 'Y';
if _n_=1 then do until (end_of_trdates);
set trdates end=end_of_trdates;
dt_dummies{date}='Y';
end;
set have;
do date=startdate to enddate;
if dt_dummies{date}='Y' then output;
end;
run;
ERROR: Array subscript out of range at line 289 column 9.
end_of_trdates=0 date=2016-06-01 stock= startdate=. enddate=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set WORK.TRDATES.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0
observations and 4 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
Do you have any idea on how to fix this?
My next step is to put these codes into a macro that automates this procedure. the input datasets would be the HAVE, and trading_dates dataset.
What this macro would do is automate your macrovar declaration step ( the %let min_trade=.......... part). I can get the first and last obs of trading_dates dataset but how do I let lobound = the first obs and hibound= the last obs?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.