RDellaVilla
Fluorite | Level 6

Hello,

I am writing to ask for help with an issue I am having at the moment.

I am dealing with a series of files, named opprcdyyyy, with yyyy ranging from 1996 to 2020. All the files have exactly the same columns, and each of them has several million rows. I would like to combine all of them together, applying standard filters to each of them.

Moreover, I need to combine the opprcdyyyy file for each given year with another file, named optionm.securd. After the merge, I will also apply different filters to the merged table.

 

I thought I would use a couple of macros to do the steps listed above:

 

%macro OptionM();
    %do i=1996 %to 2020;
        /* Join each year's price file to the security file */
        proc sql;
            create table dataset_optionm&i as
            select a.*, b.index_flag, b.issue_type
            from optionm.opprcd&i as a
                 inner join optionm.securd as b
                 on a.secid = b.secid;
        quit;
    %end;
%mend OptionM;

 

%macro OMFiltered();
    %do i=1996 %to 2020;
        /* Compute the bid-ask spread and apply the filters */
        proc sql;
            create table om_filtered&i as
            select *, (best_offer - best_bid) as bid_ask
            from dataset_optionm&i
            where index_flag = '0'
              and issue_type = '0'
              and ss_flag = '0'
              and volume > 0
              and calculated bid_ask > 0;
        quit;
    %end;
%mend OMFiltered;

 

Still, it takes ages, and I do not know whether this is the most efficient approach.

 

Does anyone have better suggestions? 

 

Thanks, 

Riccardo 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

It takes ages because you are performing JOINs on millions of records, times 25 years. It doesn't take ages because you wrote a macro.

 

However, you could very easily apply the principle that you want to pass through the data as few times as possible. Especially when you have millions of rows, minimizing the number of passes is a very good thing to do! Right now you are passing through each data set twice, once in each SQL. You can combine the two SQLs into one SQL that does all the calculations and filtering. Then you pass through the data only once per data set, and you should be able to cut the time roughly in half.
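A sketch of what the combined single-pass macro could look like, assuming the same join keys and filters as in the original two macros (the macro name OMCombined is just illustrative):

```sas
/* One pass per year: join, compute the spread, and filter in a single PROC SQL */
%macro OMCombined();
    %do i=1996 %to 2020;
        proc sql;
            create table om_filtered&i as
            select a.*, b.index_flag, b.issue_type,
                   (a.best_offer - a.best_bid) as bid_ask
            from optionm.opprcd&i as a
                 inner join optionm.securd as b
                 on a.secid = b.secid
            where b.index_flag = '0'
              and b.issue_type = '0'
              and a.ss_flag = '0'
              and a.volume > 0
              and calculated bid_ask > 0;
        quit;
    %end;
%mend OMCombined;

%OMCombined()
```

Because the WHERE clause is applied during the join, the large intermediate dataset_optionm&i tables are never written out at all, which also saves disk I/O in addition to the pass through the data.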

--
Paige Miller

View solution in original post

