BookmarkSubscribeRSS Feed
Kaos
Fluorite | Level 6

Hello -

 

I need help writing a macro to perform data manipulations on a variable to improve performance of my code. 

 

Simplified Code:

 

Proc sql;

select A.*, B.*

from Table1   A

left join Table2   B

on upcase(transtrn(transtrn(A.col1,"abcde-05",""),"123-A","") = B.col1

Quit;

 

Problem: Applying functions on join condition has degraded the performance of my code as the tables are big.

 

Question: Is there a way I can create a macro that takes care of data manipulation externally? 

 

Any other ideas or help will be much appreciated.

 

Thanks!

10 REPLIES 10
Tom
Super User Tom
Super User

In general NO.  You can make a macro to generate SAS code for you, but it does not REPLACE the SAS code.

Now if you figured out some complex series of steps that made your query work better you can write a macro that would make it easier for users to implement that complex series of steps.

 

kiranv_
Rhodochrosite | Level 12

I would do one more step, where in, I will create a temporay work table with upcase(transtrn(transtrn(A.col1,"abcde-05",""),"123-A","")  as somecol and then I will create an index on the somecol and then join with other table.

Kaos
Fluorite | Level 6

 

Thank you for your response. I cannot create any additional table due to space limitations. I am joining on ten different tables so creating temp columns for ten tables with 14 million rows would take up a lot space.

Tom
Super User Tom
Super User

Sounds like your real problem is how to optimize a query and has nothing to do with macro programming (at this time).

Once you have defined how to optimize the query then perhaps a macro could help implement it.  For example you might find that you  need to split your query into many separate queries that are then combined. You could use a macro to automate that splitting for you.

 

If you need help with optimizing your query you will need to provide more information, like the structure and size of the tables involved, the expected output of the query. Also where your data is stored. Perhaps it is already in a DBMS system like Oracle and you just need to push the query into the database and only return the result set to SAS, or even better do everything in the database and return nothing to SAS.

 

Kaos
Fluorite | Level 6

Hi Tom,

 

Thanks for your input, you're right, I am trying to improve performance of the code. In the actual code, there are 10 SAS datasets in joins, the final ouput is in SAS and has about 14Million records, and took nearly two and a half hours to execute the code. I am just curious, would creating a macro variable help in anyway?? Something like this -

 

%let new_col1 = %sysfunc(compress(transtrn(col1, "abcd", "")));

 

Proc sql;

select A.*

from Table1   A

join Table2   B

on &new_column. = B.col2;

Quit;

 

Please let me know your thoughts.

 

Thanks!

 

Kurt_Bremser
Super User

@Kaos wrote:

Hi Tom,

 

Thanks for your input, you're right, I am trying to improve performance of the code. In the actual code, there are 10 SAS datasets in joins, the final ouput is in SAS and has about 14Million records, and took nearly two and a half hours to execute the code. I am just curious, would creating a macro variable help in anyway?? Something like this -

 

%let new_col1 = %sysfunc(compress(transtrn(col1, "abcd", "")));

 

Proc sql;

select A.*

from Table1   A

join Table2   B

on &new_column. = B.col2;

Quit;

 

Please let me know your thoughts.

 

Thanks!

 


This does not work. The macro processor only sees the TEXT(!) col1 in the %sysfunc, and so the transtrm will not work.

Keep in mind that the macro processor is a PREprocessor for creating dynamic code, not for data manipulation. It's work is done long before the variable col1 will even exist in the data step.

 

As I already suggested, create a proper column when dataset A is created, and use that in subsequent operations. And if all datasets are in SAS, use sorting and data step merging if feasible. If you run out of space, get more storage. You always need at least 3 times the size of your biggest dataset as free space when working with SAS.

Patrick
Opal | Level 21

@Kaos

As @ChrisNZ writes "performance is very data dependent" and to tweak code for performance one needs to understand the data volumes and relationships between the tables.

 

You need to provide us with as much real detail as possible for us to support you. What we need to fully understand are the volumes of your tables (rows, bytes per row), the relationships between the tables and where the data is stored (SAS tables or a data base).

 

To give you an example:

Joining two SAS tables requires the tables to get sorted. If you execute a SQL such a sort will happen implicitly. Joining 10 tables can create a lot of resource intensive sorting. In a case where one table is low in volume and the other table is huge it can be much more efficient to use a data step hash table look-up as there you don't have to sort the high volume table. For us to propose such approaches we need to understand your tables so you need to provide us with this information.

 

Ideally (and that will be a bit of work for you): Create sample data for all the tables (with the real column names), provide the volumes for the tables and provide the underperforming but working SQL so we can understand the relationships between the tables and your desired result.

If you can provide this level of information then we should be able to propose to you better performing code and also provide working and tested code samples for our proposals.

Tom
Super User Tom
Super User

@Kaos wrote:

I am just curious, would creating a macro variable help in anyway?? Something like this -

 

%let new_col1 = %sysfunc(compress(transtrn(col1, "abcd", "")));

 


That will NOT help and in fact would make the program not work as intended.  If you try creating that macro vraiable you will see that it always has just the value col1.   So the generated SQL code will only have the column name and will no longer contain any code to convert the value of the column to make the comparison in the way that you wanted.

 

Kurt_Bremser
Super User

Move the function (upcase(transtrn(transtrn(A.col1,"abcde-05",""),"123-A",""))) into the step that creates table a, so you already have the column in there.

Depending on the relationship between the tables (if it is not many-to-many), sorting with proc sort and merging with a data step might outperform the SQL. If the datasets are all in SAS, of course.

ChrisNZ
Tourmaline | Level 20

Performance is very data-dependant, so we won't be able to replicate your conditions.

You just have to try. What's for sure is that macros on their own won't help, as described before. 

They just generate code. You have to come up with the code.

 

Where macros may help is, for example, if your code splits the tasks into smaller bits.

In this case, macros can automate the process of running all the bits.

 

Another angle is to try other processing options.

For instance WANT2 is generated twice as fast on my PC as WANT1 below:

data HAVE1(keep=I COL1)
     HAVE2(keep=I COL2);      
 retain COL1 'a abcde-05 123-A ' COL2 'A    ';
 do i=1 to 4e6; output; end;
run;

proc sql magic=102 _method; create table WANT1 as select a.* from HAVE1 a , HAVE2 b where a.I=b.I and upcase(transtrn(transtrn(a.COL1,"abcde-05",""),"123-A","")) = b.COL2; quit;

data _V/view=_V; set HAVE1; COL1=upcase(transtrn(transtrn(COL1,"abcde-05",""),"123-A","")) ; run; proc sql magic=102 _method; create table WANT2 as select a.* from _V a , HAVE2 b where a.I=b.I and a.COL1 = b.COL2; quit;

NOTE: Table WORK.WANT1 created, with 4000000 rows and 2 columns.

NOTE: PROCEDURE SQL used (Total process time):
real time 11.39 seconds
user cpu time 12.27 seconds
system cpu time 4.66 seconds

 

 

NOTE: Table WORK.WANT2 created, with 4000000 rows and 2 columns.

NOTE: PROCEDURE SQL used (Total process time):
real time 6.19 seconds
user cpu time 7.20 seconds

 

But which conditions will work best for you, we cannot guess.

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 808 views
  • 1 like
  • 6 in conversation