I have written code to run through data for specified dates. The problem I have is that I have to repeat the same procedure for multiple dates. My understanding is I can create a loop in which SAS will iterate the same procedure and generate output for multiple dates. Below is the subset of the code I have written.
/* STEP 1: RETRIEVE DAILY TRADE AND QUOTE (DTAQ) FILES */ libname ct '/wrds/nyse/sasdata/taqms/ct'; /* Retrieve Trade data */ data DailyTrade; /* Enter Trade file names in YYYYMMDD format for the same dates */ set ct.ctm_20140425 ct.ctm_20140426; /* Enter the same company tickers as above */ where sym_root in ('AA','AAL') /* Retrieve trades during normal market hours */ and (("7:00:00.000000000"t) <= time_m <= ("20:00:00.000000000"t)); type='T'; format date date9.; format time_m part_time trf_time TIME13.; run; /* STEP 2: CLEAN DAILY TRADES DATA - DELETE ABNORMAL TRADES */ data trade2; set DailyTrade; where Tr_Corr eq '00' and price gt 0; drop Tr_Corr Tr_Source TR_RF Part_Time RRN TRF_Time Sym_Suffix Tr_SCond Tr_StopInd; run; /* STEP 3: remove seconds/miliseconds */ data trade2; set trade2; rtime = time_m - mod(time_m,60); format rtime: time13.; run; /* STEP 4: calculate simple average by minute */ proc means data=trade2 noprint; by date rtime sym_root; var price; output out=simple_avg(drop=_type_ _freq_) mean=simple_mean; run; data trade2; merge trade2 simple_avg; by date rtime sym_root; run; /* STEP 5: calculate weighted average by minute */ proc summary data=trade2 noprint; by date rtime sym_root; var price; weight size; output out=weight_avg(drop=_type_ _freq_) mean=weight_mean; run; data trade2; merge trade2 weight_avg; by date rtime sym_root; run; /* STEP 6: duplicates drop */ proc sort data=trade2 out=trade3 nodupkeys; by date rtime sym_root; run; /* STEP 7: 5min (15m, 30m) -interval return computation */ data temp; set trade3; run; proc sql; create table ret5 as select *,(((select weight_mean from temp where date=a.date and sym_root=a.sym_root and rtime-300=a.rtime)-weight_mean)/weight_mean) as ret5, date, sym_root, rtime from temp as a; quit; data trade3; merge trade3 ret5; by date rtime sym_root; run; proc sql; create table ret15 as select *,(((select weight_mean from temp where date=a.date and sym_root=a.sym_root and rtime-900=a.rtime)-weight_mean)/weight_mean) as ret15, date, sym_root, rtime from temp as a; quit; data trade3; merge trade3 ret15; by date rtime sym_root; run; proc sql; create table ret30 as select *,(((select weight_mean from temp where date=a.date and sym_root=a.sym_root and rtime-1800=a.rtime)-weight_mean)/weight_mean) as ret30, date, sym_root, rtime from temp as a; quit; data trade3; merge trade3 ret30; by date rtime sym_root; run; data project.DailyTrade; set trade3; run;
If you look at the code, I have specified the dates of interest. Can anyone help me loop through the dates for multiple dates?
*I have dates of interest in my excel file.
**I want individual output (DailyTrade) for each date.
I see two issues here. One which you have stated, and another in which you can generate a much more efficient program.
You are using the NYSE-supplied trade-and-quotes data, with time stamps (currently) to the nanosecond.
Hi and welcome to the SAS Communities 🙂
So how many dates do you want to loop over? And are they consecutive dates?
Also, do you want the resulting data to be in a single data set or one data set for each date?
Yes. I needed to be more specific. So I have all the dates in an excel format. If you know a way to load the dates in the excel onto my code and run it, it would be the best!
And I would rather have individual data output for each day.
You don't specify where the multiple dates are coming from. That makes a difference in how to write the program. Are the multiple dates in an Excel file or a text file? Are they consecutive dates? Are they consecutive Tuesdays?
Yes. I needed to be more specific. So I have all the dates in an excel format. If you know a way to load the dates in the excel onto my code and run it, it would be the best!
I can't see your data, so needless to say this code is untested. You can do something like this to create a data set for each date though.
data callstack;
length string $500;
do dt='01jan2014'd to '31dec2018'd;
string=compbl(cats(
"
data DailyTrade_", put(dt, yymmddn8.), ";
set ct.ctm_", put(dt, yymmddn8.), ";
where sym_root in ('AA')
and (('7:00:00.000000000't) <= time_m <= ('20:00:00.000000000't));
type='T';
format date date9.;
format time_m part_time trf_time TIME13.;
run;
"
));
output;
call execute(string);
end;
run;
I tried with you code, but it gives me 0 observations. This is because retrieving data from each date worked, but SAS did not know what to do with separate data in the next procedure. In other words, the STEP 1 works, but from STEP 2, SAS did not know what to do with separate data from each date. I have edited my original post to show the entire code I constructed. I am so sorry!
Hi @phillylee
If you want a separate output data set per day, you will have to do the analysis per day inside the loop too. I think the following code does what you want. The day list is in a macro variable, not read from Excel. This would require an extra step, where you read the spreadsheet into SAS to get the value list.
/* test data */
data ctm_20140525 ctm_20140526 ctm_20140527 ctm_20140528;
format time_m 18.;
var = 1;
sym_root = 'AA';
time_m = time();
run;
/* list of wanted dates and sym_roots */
%let daylist = 20140525 20140526 20140527;
%let symrootlist = 'AA' 'BB';
%macro m;
%let nd = %sysfunc(countw(&daylist));
%do i = 1 %to &nd;
%let day = %scan(&daylist,&i,%str( ));
data work.DailyTrade;
set work.ctm_&day (where=(sym_root in ("&symrootlist") and
('7:00:00.000000000't <= time_m <= '20:00:00.000000000't)));
type = 'T';
format date date9. time_m TIME13. part_time trf_time TIME13.;
run;
/* example - could be replaced with any number of steps */
data work.analysis_&day; set work.DailyTrade;
iteration = &i;
run;
%end;
%mend;
%m;
I see two issues here. One which you have stated, and another in which you can generate a much more efficient program.
You are using the NYSE-supplied trade-and-quotes data, with time stamps (currently) to the nanosecond.
Thank you for your feedback. To answer some of your issues raised
1. multiple dates: I didn't know adding ":" would do the trick! thank you!
2. efficiency:
I just started learning SAS. So I needed to break down steps to get the output I need. I am sure experts in SAS would combine, consolidate programs to make it more efficient. But, I admit I am not that good at it, yet.
3. event time: stocks I am studying are mostly large firms. There will most likely be trade at a given time. I will have to see whether there is any incidence where there is no trade.
But thank you for your feedback!
@phillylee wrote:
Thank you for your feedback. To answer some of your issues raised
1. multiple dates: I didn't know adding ":" would do the trick! thank you!
I should have mentioned: you are ignoring the variable SYM_SUFFIX at your peril. If a company has multiple share classes trading, then a the corresponding SYM_ROOT will have multiple SYM_SUFFIX values. By sorting your data by DATE RTIME SYM_ROOT, you risk interleaving multiple share classes for a given SYM_ROOT, thereby conflating PRICE and SIZE values for two different equities. Now it might be that your two sym_root's (AA and AAL) only have one share class, so you'd be lucky. But, for any sorting or use of BY statements, I'd use DATE SYM_ROOT SYM_SUFFIX DATE, and I'd include SYM_SUFFIX in any class statements. And for any sql statements you should match on sym_suffix as well as sym_root.
The CTM datasets (one dataset per date) are sorted by SYM_ROOT SYM_SUFFIX TIME. So I would suggest dispensing with your own sorts. In your first set statement you read multiple dates, so data set dailytrade is defacto sorted by DATE SYM_ROOT SYM_SUFFIX TIME.
Also in your by statements, use BY DATE SYM_ROOT SYM_SUFFIX TIME. Each CTM dataset is sorted by those three variables. I suspect in the case of your companies, there is probably only a blank SYM_SUFFIX (i.e. only one share class trading per SYM_ROOT). But if a given SYM_ROOT has multiple suffixes, you have a problem, because later on you sort by DATE RTIME SYM_ROOT, which means that multiple share classes can be interleaved, thereby contaminating your PRICE and SIZE values.
Also
3. event time: stocks I am studying are mostly large firms. There will most likely be trade at a given time. I will have to see whether there is any incidence where there is no trade.
You assumption is probably true for market trading hours, but you asked for time stamps well outside those hours, so be aware.
As always, you should listen to @mkeintz suggestions.
However, here's a quick tutorial on turning a program into a macro, which answers your question.
https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
It would be more efficient to change your program to do it all at once, and you'll learn how to program more efficiently by refactoring your code instead of just looping it. But ultimately your choice.
@phillylee wrote:
I have written code to run through data for specified dates. The problem I have is that I have to repeat the same procedure for multiple dates. My understanding is I can create a loop in which SAS will iterate the same procedure and generate output for multiple dates. Below is the subset of the code I have written.
/* STEP 1: RETRIEVE DAILY TRADE AND QUOTE (DTAQ) FILES */ libname ct '/wrds/nyse/sasdata/taqms/ct'; /* Retrieve Trade data */ data DailyTrade; /* Enter Trade file names in YYYYMMDD format for the same dates */ set ct.ctm_20140425 ct.ctm_20140426; /* Enter the same company tickers as above */ where sym_root in ('AA','AAL') /* Retrieve trades during normal market hours */ and (("7:00:00.000000000"t) <= time_m <= ("20:00:00.000000000"t)); type='T'; format date date9.; format time_m part_time trf_time TIME13.; run; /* STEP 2: CLEAN DAILY TRADES DATA - DELETE ABNORMAL TRADES */ data trade2; set DailyTrade; where Tr_Corr eq '00' and price gt 0; drop Tr_Corr Tr_Source TR_RF Part_Time RRN TRF_Time Sym_Suffix Tr_SCond Tr_StopInd; run; /* STEP 3: remove seconds/miliseconds */ data trade2; set trade2; rtime = time_m - mod(time_m,60); format rtime: time13.; run; /* STEP 4: calculate simple average by minute */ proc means data=trade2 noprint; by date rtime sym_root; var price; output out=simple_avg(drop=_type_ _freq_) mean=simple_mean; run; data trade2; merge trade2 simple_avg; by date rtime sym_root; run; /* STEP 5: calculate weighted average by minute */ proc summary data=trade2 noprint; by date rtime sym_root; var price; weight size; output out=weight_avg(drop=_type_ _freq_) mean=weight_mean; run; data trade2; merge trade2 weight_avg; by date rtime sym_root; run; /* STEP 6: duplicates drop */ proc sort data=trade2 out=trade3 nodupkeys; by date rtime sym_root; run; /* STEP 7: 5min (15m, 30m) -interval return computation */ data temp; set trade3; run; proc sql; create table ret5 as select *,(((select weight_mean from temp where date=a.date and sym_root=a.sym_root and rtime-300=a.rtime)-weight_mean)/weight_mean) as ret5, date, sym_root, rtime from temp as a; quit; data trade3; merge trade3 ret5; by date rtime sym_root; run; proc sql; create table ret15 as select *,(((select weight_mean from temp where date=a.date and sym_root=a.sym_root and rtime-900=a.rtime)-weight_mean)/weight_mean) as ret15, date, sym_root, rtime from temp as a; quit; data trade3; merge trade3 ret15; by date rtime sym_root; run; proc sql; create table ret30 as select *,(((select weight_mean from temp where date=a.date and sym_root=a.sym_root and rtime-1800=a.rtime)-weight_mean)/weight_mean) as ret30, date, sym_root, rtime from temp as a; quit; data trade3; merge trade3 ret30; by date rtime sym_root; run; data project.DailyTrade; set trade3; run;If you look at the code, I have specified the dates of interest. Can anyone help me loop through the dates for multiple dates?
*I have dates of interest in my excel file.
**I want individual output (DailyTrade) for each date.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.