Hello,
I am writing to get help with an issue I am having at the moment.
I am working with a series of files named opprcdyyyy, with yyyy ranging from 1996 to 2020. All the files have exactly the same columns, and each of them has on the order of a million rows. I would like to combine all of them together, applying standard filters to each of them.
Moreover, I need to combine each year's opprcdyyyy file with another file, named optionmsecurd. After the merge, I will also apply further filters to the merged table.
I thought of using a couple of macros to perform the steps listed above:
%macro OptionM();
  %do i=1996 %to 2020;
    proc sql;
      create table dataset_optionm&i as
        select a.*, b.index_flag, b.issue_type
        from optionm.opprcd&i as a
        inner join optionm.securd as b
          on a.secid = b.secid;
    quit;
  %end;
%mend OptionM;

%macro OMFiltered();
  %do i=1996 %to 2020;
    proc sql;
      create table om_filtered&i as
        select *, (best_offer - best_bid) as bid_ask
        from dataset_optionm&i
        where index_flag = '0' and issue_type = '0'
          and ss_flag = '0' and volume > 0
          and calculated bid_ask > 0;
    quit;
  %end;
%mend OMFiltered;
Still, it takes ages, and I do not know whether this is the right approach.
Does anyone have better suggestions?
Thanks,
Riccardo
It takes ages because you are performing JOINs on millions of records, times 25 years. It doesn't take ages because you wrote a macro.
However, you could very easily apply the principle that you want to pass through the data as few times as possible. Especially when you have millions of rows, minimizing the number of passes through the data pays off. Right now you are passing through each data set twice, once in each SQL step. You can combine the two SQLs into a single SQL that does all the joining, calculations, and filtering. Then you pass through each data set only once, and you should be able to cut the run time roughly in half.
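A sketch of that single-pass version, assuming the same library, data set, and variable names as in the original macros (adjust them to your environment):

```sas
/* One macro, one PROC SQL per year: join, compute bid_ask,
   and filter in a single pass over each opprcd&i data set. */
%macro OptionMFiltered();
  %do i=1996 %to 2020;
    proc sql;
      create table om_filtered&i as
        select a.*, b.index_flag, b.issue_type,
               (a.best_offer - a.best_bid) as bid_ask
        from optionm.opprcd&i as a
        inner join optionm.securd as b
          on a.secid = b.secid
        where b.index_flag = '0' and b.issue_type = '0'
          and a.ss_flag = '0' and a.volume > 0
          and calculated bid_ask > 0;  /* CALCULATED reuses the select-list expression */
    quit;
  %end;
%mend OptionMFiltered;

%OptionMFiltered()
```

This also skips writing the 25 intermediate dataset_optionm&i tables to disk, which on tables this size can save as much time as the extra pass itself.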