DATA Step, Macro, Functions and more

Macro Performace

Reply
N/A
Posts: 0

Macro Performace

In order to simplify a data step I placed all the repeated code in a macro.
The problem is that the program got 10 times slower. The call execute() method and the macro code seams to have a lower performance than replicating all code in a data step.
How can obtain the desired performance without having to replicate all code in the data set?
N/A
Posts: 0

Re: Macro Performace

Posted in reply to deleted_user
1) Would be helpful to see an example of the code.

2) macro's are not the same thing as functions.

3) Call execute splits work. Macro code is resolved during the execution of the calling data step. Generated code is executed after the data step has completed. This means additional compilation overhead of the generated code.
N/A
Posts: 0

Re: Macro Performace

Posted in reply to deleted_user
I would recommend a different approach to what you are trying to do.

Something along this line
[pre]
proc sort data=work01;
by account_rk datetime;

data work02;
set work01;
by account_rk datetime;

retain first_value first_datetime;

if first.account_rk then do;
first_value = value;
first_datetime = datetime;
end;

if last.account_rk then do;
last_value = value;
last_datetime = datetime;
output;
end;

keep account_rk first_value first_datetime last_value last_datetime;
run;

No need for macro madness.
N/A
Posts: 0

Re: Macro Performace

Posted in reply to deleted_user
Thanks. that really helps. However i still have some questions.


1. In this example i want to know the first and last values and datetimes for an account. How can i have the first and last values for a group of variables... like account and transaction type?

2. In this example i have to repeat the code for interest and principal amt. I will have other amt's and other operations (not only first and last). How can i avoid repeating the same code for each amt?
N/A
Posts: 0

Re: Macro Performace

Posted in reply to deleted_user
Always remember the following rules:

KISS -- keep it simple

SAS has probably already done it for you.
SAS has many ways to accomplish the same thing.
SAS usually/generally has a simple way to get/do what you want.

Sometimes you have to split things up and then put it back together.

Read, read, read, because SAS has all you wanted (not just need) to know documented somewhere.

You eat an elephant a bite at a time.

Think about what you are trying to do, don't just rush ahead. Brick walls hurt.

[pre]
proc sort data=work01;
by account_rk tran_type datetime;
run;

data work03; *---> note this is a different dataset, explicity ;
set work01;
by account_rk tran_type datetime;
retain first_value first_datetime;

if first.tran_type then do;
first_value = value;
first_datetime = datetime;
end;

if last.tran_type then do;
last_value = value;
last_datetime = datetime;
output;
end;

keep account_rk first_value last_value first_datetime last_datetime tran_type;
run;

/* this step interleaves the two datasets together */
data work04;
set work02 work03;
by account_rk;
run;
[/pre]

So, first, determine what is the goal, what is the desire, what is the end result you need or are after.
Second, what do you need to get there?

When I program, I have found that it is always helpful to have the end goal, report, whatever well defined, then to work backwards. If I need this piece of data, where did it come from? How do I get it? if it is calculated, then what is the calculation and what are the pieces/terms? Where do they come from?

I also tend work from the general to the specific -- top down.
Here's the question, ....
What is the answer? a chart? a report? what kind of chart will show the information best?
What do I need to produce the chart and report?
Where do I get what I need?
What process/tool will produce the chart or report?
What do those tools require?
etc.

Of course, when I am investigating data, to see if I can discover something interesting in it, then I start from the front and work to the end, selecting, charting, calculating, charting, etc. until I see something of use.
N/A
Posts: 0

Re: Macro Performace

Posted in reply to deleted_user
Here goes a sample of the code. The macro updates the first and last amount and datetime for each account, for interest and principal amounts.
The data set work01 is ordered by account and transaction datetime.


%macro upd_first_last( val_trans_amt, val_trans_dttm, val_first_dttm, var_first_amt, var_first_dttm, var_last_amt, var_last_dttm);
/*check if first transaction*/
%if &val_first_dttm = . %then
%do;
%let &var_first_amt = &val_trans_amt;
%let &var_first_dttm = &val_trans_dttm;
%end;
/*update as last transaction*/
%let &var_last_amt = &val_trans_amt;
%let &var_last_dttm = &val_trans_dttm;
%mend;

data work02(keep = LAST_ACCOUNT_RK FIRST_PRINCIPAL_AMT FIRST_PRINCIPAL_DTTM FIRST_INTEREST_AMT FIRST_INTEREST_DTTM LAST_PRINCIPAL_AMT LAST_PRINCIPAL_DTTM LAST_INTEREST_AMT LAST_INTEREST_DTTM);
set work01 END=is_last;
/*get last control vars*/
last_account_rk = symget('last_account_rk');
if (last_account_rk ne ACCOUNT_RK) & _n_ > 1 then
do;
/*output*/
first_principal_amt = symget('first_principal_amt');
first_principal_dttm = symget('first_principal_dttm');
first_interest_amt = symget('first_interest_amt');
first_interest_dttm = symget('first_interest_dttm');
last_principal_amt = symget('last_principal_amt');
last_principal_dttm = symget('last_principal_dttm');
last_interest_amt = symget('last_interest_amt');
last_interest_dttm = symget('last_interest_dttm');
output;

/**
reset vars
*/
call symput('first_principal_amt', 0);
call symput('first_principal_dttm', .);
call symput('first_interest_amt', 0);
call symput('first_interest_dttm', .);
call symput('last_principal_amt', 0);
call symput('last_principal_dttm', .);
call symput('last_interest_amt', 0);
call symput('last_interest_dttm', .);
end;
call execute('%upd_first_last('||
PRINCIPAL_AMT ||',' || TRANSACTION_DTTM ||',' || symget('first_principal_dttm') ||
',first_principal_amt,first_principal_dttm,last_principal_amt,last_principal_dttm)');
call execute('%upd_first_last('||
INTEREST_AMT ||',' || TRANSACTION_DTTM ||',' || symget('first_interest_dttm') ||
',first_interest_amt,first_interest_dttm,last_interest_amt,last_interest_dttm)');
/*update control vars*/
call symput('last_account_rk', account_rk);
if is_last = 1 then
do;
/*update values and output*/
first_principal_amt = symget('first_principal_amt');
first_principal_dttm = symget('first_principal_dttm');
first_interest_amt = symget('first_interest_amt');
first_interest_dttm = symget('first_interest_dttm');
last_principal_amt = symget('last_principal_amt');
last_principal_dttm = symget('last_principal_dttm');
last_interest_amt = symget('last_interest_amt');
last_interest_dttm = symget('last_interest_dttm');
output;
end;
run;
N/A
Posts: 0

Re: Macro Performace

Posted in reply to deleted_user
Thanks for your help. The "retain" "first" and "last" are very usefull tips and they really save some work.

Another thing I was trying to find out is how to avoid replicating code. Supose that each transaction is divided in 5 types of amounts and for each pair (acount, transaction), I have to make a lot of equal calculations for each type amout. Do I have to hard code for each type of amout or is there a way to avoid repeating this code? I was using a macro for that purpose but the program got a lot slower.
N/A
Posts: 0

Re: Macro Performance

Posted in reply to deleted_user
That is one of the few frustrating things about SAS.

They should have provided a "FUNCTION" statement many many years ago to go along with the IF...THEN...ELSE, DO...END, etc. structures. It would have saved a lot of headaches over the decades.

MACRO's are not functions.

But, you can use them for what you want in this case, just not the way that you think.

MACRO's are not functions, they are code generators.
SAS is not system control oriented, but data centric.
SAS is not an interpreted language, but is one of the first, if not the first, Just-In-Time compiled languages.
SAS has probably already/implemented what you need.

Think about these things and what they mean.

SAS reads in a block of text, compiles it to very efficient machine code, executes it, reads in another block of text, compiles, executes, etc. until it reaches the end.

Text block boundaries are marked by DATA, PROC, RUN, and a few other statements. OPTIONS is more of a directive than executable code.
Macro's on the otherhand are somewhat different.
%MACRO name ... %MEND; delineate a block of text that is compiled, but retained in memory and then "executed" by SAS when it runs across the %name macro statement. At that point, the macro is executed, and if it generates code, then that code is injected/inserted in the text block at that point, before the text block is compiled. Then the text block is compiled and executed, but the %name macro has already been executed, and is not again; its work has already been done.

How did I learn all this? By reading the SAS manuals, SAS documentation, which used to be excellent in its completeness. This cannot be said for the MetaData Server and SAS's use of MetaData. But the stuff from before 2000 was all very very good. In fact, I keep a copy of the SAS Documentation up on my workstation all day because I am constantly referring to it, even for this response. I have programmed and do program in so much different stuff on so many different platforms that at my age, I can't keep it all straight all the time, so I constantly refer back to the documentation to make sure I'm not confusing C, ksh, awk, Java, VBScript, Perl, SAS, etc. with each other. I also don't remember as well, partly because I am also continually researching how some system works. Details matter, and my memory is more conceptual than rote. SAS is so big, rich and deep, it would be a very rare individual who could keep the whole of it in their head and remember everything about it.

Anyway, to accomplish what you want.
[pre]
%macro standard_calc(var1,var2,var3);
&var3 = &var1 + &var2 ;
%mend;

DATA dummy;
set indata;
%standard_calc(field1,field2,sum1);
%standard_calc(field3,field4,sum2);
%standard_calc(field5,field6,sum3);
%standard_calc(field7,field8,sum4);
run;
[/pre]

Now, with cut/copy and paste readily available, think about what would be the least amount of work to accomplish what you want. Does the macro save typing and potentially simplify coding and maintenance? Or will it actually add more work and take more time to implement?
Look at my example again. For that specific situation, the right thing to do would have been to go to the SAS documentation for 9.1 (Help in PC SAS), SAS Products, Base SAS, SAS Language Dictionary, Dictionary of Language Elements, Functions and CALL Routines and found that SAS already has a "SUM" function. So, all we would really need to do is type "sum1 = sum(field1,field2);" and then copy and past it few times, and then update the numeric parts of the variable names.

SAS is so unique, that you have to play with it a little to get to know it, but playing with SAS is usually simple, especially within EG and in PC SAS. Running SAS batch processes on Unix or a Mainframe is a little more work, and Batch SAS on Windows can be even more tricky.

I would strongly recommend that you read the real SAS documentation. It is readily available online (off the web) now. Also, search for threads by Joshua. He is new to SAS this year and has posted an excellent list of books that have helped him. Message was edited by: Chuck
N/A
Posts: 0

Re: Macro Performace

Posted in reply to deleted_user
Thanks for your time and explanation. I used the macro to generate the code and it worked quit good.
N/A
Posts: 0

Re: Macro Performace

Posted in reply to deleted_user
Your welcome.

I know I can get rather wordy at times, so I hope what I wrote wasn't too much.
Ask a Question
Discussion stats
  • 9 replies
  • 201 views
  • 0 likes
  • 1 in conversation