BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
phillylee
Calcite | Level 5

I have written code to run through data for specified dates. The problem I have is that I have to repeat the same procedure for multiple dates. My understanding is I can create a loop in which SAS will iterate the same procedure and generate output for multiple dates. Below is the subset of the code I have written.


   

/* STEP 1: RETRIEVE DAILY TRADE AND QUOTE (DTAQ) FILES */
    libname ct '/wrds/nyse/sasdata/taqms/ct';

    /* Retrieve Trade data */
    data DailyTrade;

        /* Enter Trade file names in YYYYMMDD format for the same dates */
        set ct.ctm_20140425 ct.ctm_20140426;

	/* Enter the same company tickers as above */
        where sym_root in ('AA','AAL')
		
        /* Retrieve trades during normal market hours */
        and (("7:00:00.000000000"t) <= time_m <= ("20:00:00.000000000"t));
        type='T';
        format date date9.;
        format time_m part_time trf_time TIME13.;
    run;

/* STEP 2: CLEAN DAILY TRADES DATA - DELETE ABNORMAL TRADES */

data trade2;
    set DailyTrade;
    where Tr_Corr eq '00' and price gt 0;
	drop Tr_Corr Tr_Source TR_RF Part_Time RRN TRF_Time Sym_Suffix Tr_SCond 
         Tr_StopInd;
run;

/* STEP 3: remove seconds/miliseconds */

data trade2;
	set trade2;
	rtime = time_m - mod(time_m,60);
	format rtime: time13.;
run;

/* STEP 4: calculate simple average by minute */

proc means data=trade2 noprint;
by date rtime sym_root;
var price;
output out=simple_avg(drop=_type_ _freq_) mean=simple_mean;
run;

data trade2;
merge trade2 simple_avg;
by date rtime sym_root;
run;
	
/* STEP 5: calculate weighted average by minute */

proc summary data=trade2 noprint;
by date rtime sym_root;
var price;
weight size;
output out=weight_avg(drop=_type_ _freq_) mean=weight_mean;
run;


data trade2;
merge trade2 weight_avg;
by date rtime sym_root;
run;

/* STEP 6: duplicates drop */

proc sort data=trade2
	out=trade3
	nodupkeys;
	by date rtime sym_root;
run;

/* STEP 7: 5min (15m, 30m) -interval return computation */
data temp;
	set trade3;
run;


proc sql;
	create table ret5 as
		select *,(((select weight_mean from temp where date=a.date and sym_root=a.sym_root and rtime-300=a.rtime)-weight_mean)/weight_mean) as ret5, date, sym_root, rtime
			from temp as a;
quit;

data trade3;
merge trade3 ret5;
by date rtime sym_root;
run;

proc sql;
	create table ret15 as
		select *,(((select weight_mean from temp where date=a.date and sym_root=a.sym_root and rtime-900=a.rtime)-weight_mean)/weight_mean) as ret15, date, sym_root, rtime
			from temp as a;
quit;

data trade3;
merge trade3 ret15;
by date rtime sym_root;
run;

proc sql;
	create table ret30 as
		select *,(((select weight_mean from temp where date=a.date and sym_root=a.sym_root and rtime-1800=a.rtime)-weight_mean)/weight_mean) as ret30, date, sym_root, rtime
			from temp as a;
quit;

data trade3;
merge trade3 ret30;
by date rtime sym_root;
run;

data project.DailyTrade;
	set trade3;
run; 

If you look at the code, I have specified the dates of interest. Can anyone help me loop through the dates for multiple dates?

 

*I have dates of interest in my excel file.

 

**I want individual output (DailyTrade) for each date.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I see two issues here.  One which you have stated, and another in which you can generate a much more efficient program.

 

You are using the NYSE-supplied trade-and-quotes data, with time stamps (currently) to the nanosecond.

 

  1. Multiple dates.

    The CTM datasets all have the name format of CTM_yyyymmdd.  So, if you want all the trading dates for April 2014, instead of listing all the datasets individually in a SET statement you could use

    SET ct.ctm_201404: ;

    The trailing colon tells sas to use all datasets whose names begin with CTM_201404.

    BUT be careful about specifying, say, more than a month of datasets.  That's because SAS will generate a memory buffer for each dataset.
  2. Efficiency issues
    1. Before I give advice here, I see that you have many separate steps that can be combined.  Once you start using a lot of dates, you will be repeatedly writing a lot of data to disk, only to read it back in.  Many of these steps can be consolidated.

      In fact, why are you merging means for each RTIME back with all the trades with that time-stamp only to de-duplicate later?
    2. Are you trying to generate 5, 10, and 15 minutes weighted and unweighted rolling window returns?  (or are you generating non-overlapping discrete window returns)?  Since the data are sorted by TIME, this can be done in a data step, avoiding much excess computation and record matching via proc sql.  If you clarify the data you want, we can likely improve performance by orders of magnitude - which would be very important if you start looking at portfolios of more than 2 stocks.
  3. Research issue.   What if there are no trades for a given equity for a particular minute (or worse, for a 5-minute interval?)  If there are no trades for, say,  13:44,  do you just assume that the simple and weighted mean price is inherited from 13:43?   This matters if your are doing rolling 5-minute windows.

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

11 REPLIES 11
PeterClemmensen
Tourmaline | Level 20

Hi and welcome to the SAS Communities 🙂

 

So how many dates do you want to loop over? And are they consecutive dates? 

 

Also, do you want the resulting data to be in a single data set or one data set for each date?

phillylee
Calcite | Level 5

Yes. I needed to be more specific. So I have all the dates in an excel format. If you know a way to load the dates in the excel onto my code and run it, it would be the best!

 

And I would rather have individual data output for each day. 

PaigeMiller
Diamond | Level 26

You don't specify where the multiple dates are coming from. That makes a difference in how to write the program. Are the multiple dates in an Excel file or a text file? Are they consecutive dates? Are they consecutive Tuesdays?

--
Paige Miller
phillylee
Calcite | Level 5

Yes. I needed to be more specific. So I have all the dates in an excel format. If you know a way to load the dates in the excel onto my code and run it, it would be the best!

 

PeterClemmensen
Tourmaline | Level 20

I can't see your data, so needless to say this code is untested. You can do something like this to create a data set for each date though.

 

data callstack;
   length string $500;
   do dt='01jan2014'd to '31dec2018'd;
   string=compbl(cats(
      "
       data DailyTrade_", put(dt, yymmddn8.), ";

           set ct.ctm_", put(dt, yymmddn8.), ";

           where sym_root in ('AA')

           and (('7:00:00.000000000't) <= time_m <= ('20:00:00.000000000't));
           type='T';
           format date date9.;
           format time_m part_time trf_time TIME13.;
       run;

      "
      ));
   output;
   call execute(string);
   end;
run;
phillylee
Calcite | Level 5

I tried with you code, but it gives me 0 observations. This is because retrieving data from each date worked, but SAS did not know what to do with separate data in the next procedure. In other words, the STEP 1 works, but from STEP 2, SAS did not know what to do with separate data from each date. I have edited my original post to show the entire code I constructed. I am so sorry!

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @phillylee 

 

If you want a separate output data set per day, you will have to do the analysis per day inside the loop too. I think the following code does what you want. The day list is in a macro variable, not read from Excel. This would require an extra step, where you read the spreadsheet into SAS to get the value list.

 

/* test data */
data ctm_20140525 ctm_20140526 ctm_20140527 ctm_20140528;
	format time_m 18.;
	var = 1;
	sym_root = 'AA';
	time_m = time();
run;

/* list of wanted dates and sym_roots */
%let daylist = 20140525 20140526 20140527;
%let symrootlist = 'AA' 'BB';

%macro m;
	%let nd = %sysfunc(countw(&daylist));
	%do i = 1 %to &nd;
		%let day = %scan(&daylist,&i,%str( ));
		data work.DailyTrade;
			set work.ctm_&day (where=(sym_root in ("&symrootlist") and 
				('7:00:00.000000000't <= time_m <= '20:00:00.000000000't)));
			type = 'T';
			format date date9. time_m TIME13. part_time trf_time TIME13.;
		run;

		/* example - could be replaced with any number of steps */
		data work.analysis_&day; set work.DailyTrade;
			iteration = &i;
		run;
	%end;
%mend;
%m;
			

 

mkeintz
PROC Star

I see two issues here.  One which you have stated, and another in which you can generate a much more efficient program.

 

You are using the NYSE-supplied trade-and-quotes data, with time stamps (currently) to the nanosecond.

 

  1. Multiple dates.

    The CTM datasets all have the name format of CTM_yyyymmdd.  So, if you want all the trading dates for April 2014, instead of listing all the datasets individually in a SET statement you could use

    SET ct.ctm_201404: ;

    The trailing colon tells sas to use all datasets whose names begin with CTM_201404.

    BUT be careful about specifying, say, more than a month of datasets.  That's because SAS will generate a memory buffer for each dataset.
  2. Efficiency issues
    1. Before I give advice here, I see that you have many separate steps that can be combined.  Once you start using a lot of dates, you will be repeatedly writing a lot of data to disk, only to read it back in.  Many of these steps can be consolidated.

      In fact, why are you merging means for each RTIME back with all the trades with that time-stamp only to de-duplicate later?
    2. Are you trying to generate 5, 10, and 15 minutes weighted and unweighted rolling window returns?  (or are you generating non-overlapping discrete window returns)?  Since the data are sorted by TIME, this can be done in a data step, avoiding much excess computation and record matching via proc sql.  If you clarify the data you want, we can likely improve performance by orders of magnitude - which would be very important if you start looking at portfolios of more than 2 stocks.
  3. Research issue.   What if there are no trades for a given equity for a particular minute (or worse, for a 5-minute interval?)  If there are no trades for, say,  13:44,  do you just assume that the simple and weighted mean price is inherited from 13:43?   This matters if your are doing rolling 5-minute windows.

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
phillylee
Calcite | Level 5

Thank you for your feedback. To answer some of your issues raised

 

1. multiple dates: I didn't know adding ":" would do the trick! thank you!

 

2. efficiency:

    I just started learning SAS. So I needed to break down steps to get the output I need. I am sure experts in SAS would combine, consolidate programs to make it more efficient. But, I admit I am not that good at it, yet. 

 

3. event time: stocks I am studying are mostly large firms. There will most likely be trade at a given time. I will have to see whether there is any incidence where there is no trade.

 

But thank you for your feedback!

 

mkeintz
PROC Star

@phillylee wrote:

Thank you for your feedback. To answer some of your issues raised

 

1. multiple dates: I didn't know adding ":" would do the trick! thank you!

 


 

I should have mentioned: you are ignoring the variable SYM_SUFFIX at your peril.  If a company has multiple share classes trading, then a the corresponding SYM_ROOT will have multiple SYM_SUFFIX values.   By  sorting your data by DATE RTIME SYM_ROOT, you risk interleaving multiple share classes for a given SYM_ROOT, thereby conflating PRICE and SIZE values for two different equities.  Now it might be that your two sym_root's (AA and AAL) only have one share class, so you'd be lucky.  But, for any sorting or use of BY statements, I'd use DATE SYM_ROOT SYM_SUFFIX DATE, and I'd include SYM_SUFFIX in any class statements.  And for any sql statements you should match on sym_suffix as well as sym_root.

 

The CTM datasets (one dataset per date) are sorted by SYM_ROOT SYM_SUFFIX TIME.  So I would suggest dispensing with your own  sorts.  In your first set statement you read multiple dates, so data set dailytrade is defacto sorted by DATE SYM_ROOT SYM_SUFFIX TIME.

 

Also in your by statements, use BY DATE SYM_ROOT SYM_SUFFIX TIME.  Each CTM dataset is sorted by those three variables.  I suspect in the case of your companies, there is probably only a blank SYM_SUFFIX  (i.e. only one share class trading per SYM_ROOT).   But if a given SYM_ROOT has multiple suffixes, you have a problem, because later on you sort by DATE RTIME SYM_ROOT, which means that multiple share classes can be interleaved, thereby contaminating your PRICE and SIZE values. 

 

 

Also


3. event time: stocks I am studying are mostly large firms. There will most likely be trade at a given time. I will have to see whether there is any incidence where there is no trade.

 

You assumption is probably true for market trading hours, but you asked for time stamps well outside those hours, so be aware.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User

As always, you should listen to @mkeintz suggestions. 

However, here's a quick tutorial on turning a program into a macro, which answers your question. 

 

https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

 

It would be more efficient to change your program to do it all at once, and you'll learn how to program more efficiently by refactoring your code instead of just looping it. But ultimately your choice. 

 


@phillylee wrote:

I have written code to run through data for specified dates. The problem I have is that I have to repeat the same procedure for multiple dates. My understanding is I can create a loop in which SAS will iterate the same procedure and generate output for multiple dates. Below is the subset of the code I have written.


   

/* STEP 1: RETRIEVE DAILY TRADE AND QUOTE (DTAQ) FILES */
    libname ct '/wrds/nyse/sasdata/taqms/ct';

    /* Retrieve Trade data */
    data DailyTrade;

        /* Enter Trade file names in YYYYMMDD format for the same dates */
        set ct.ctm_20140425 ct.ctm_20140426;

	/* Enter the same company tickers as above */
        where sym_root in ('AA','AAL')
		
        /* Retrieve trades during normal market hours */
        and (("7:00:00.000000000"t) <= time_m <= ("20:00:00.000000000"t));
        type='T';
        format date date9.;
        format time_m part_time trf_time TIME13.;
    run;

/* STEP 2: CLEAN DAILY TRADES DATA - DELETE ABNORMAL TRADES */

data trade2;
    set DailyTrade;
    where Tr_Corr eq '00' and price gt 0;
	drop Tr_Corr Tr_Source TR_RF Part_Time RRN TRF_Time Sym_Suffix Tr_SCond 
         Tr_StopInd;
run;

/* STEP 3: remove seconds/miliseconds */

data trade2;
	set trade2;
	rtime = time_m - mod(time_m,60);
	format rtime: time13.;
run;

/* STEP 4: calculate simple average by minute */

proc means data=trade2 noprint;
by date rtime sym_root;
var price;
output out=simple_avg(drop=_type_ _freq_) mean=simple_mean;
run;

data trade2;
merge trade2 simple_avg;
by date rtime sym_root;
run;
	
/* STEP 5: calculate weighted average by minute */

proc summary data=trade2 noprint;
by date rtime sym_root;
var price;
weight size;
output out=weight_avg(drop=_type_ _freq_) mean=weight_mean;
run;


data trade2;
merge trade2 weight_avg;
by date rtime sym_root;
run;

/* STEP 6: duplicates drop */

proc sort data=trade2
	out=trade3
	nodupkeys;
	by date rtime sym_root;
run;

/* STEP 7: 5min (15m, 30m) -interval return computation */
data temp;
	set trade3;
run;


proc sql;
	create table ret5 as
		select *,(((select weight_mean from temp where date=a.date and sym_root=a.sym_root and rtime-300=a.rtime)-weight_mean)/weight_mean) as ret5, date, sym_root, rtime
			from temp as a;
quit;

data trade3;
merge trade3 ret5;
by date rtime sym_root;
run;

proc sql;
	create table ret15 as
		select *,(((select weight_mean from temp where date=a.date and sym_root=a.sym_root and rtime-900=a.rtime)-weight_mean)/weight_mean) as ret15, date, sym_root, rtime
			from temp as a;
quit;

data trade3;
merge trade3 ret15;
by date rtime sym_root;
run;

proc sql;
	create table ret30 as
		select *,(((select weight_mean from temp where date=a.date and sym_root=a.sym_root and rtime-1800=a.rtime)-weight_mean)/weight_mean) as ret30, date, sym_root, rtime
			from temp as a;
quit;

data trade3;
merge trade3 ret30;
by date rtime sym_root;
run;

data project.DailyTrade;
	set trade3;
run; 

If you look at the code, I have specified the dates of interest. Can anyone help me loop through the dates for multiple dates?

 

*I have dates of interest in my excel file.

 

**I want individual output (DailyTrade) for each date.

 

 


 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 5104 views
  • 5 likes
  • 6 in conversation