Help using Base SAS procedures

Resolving a formula dynamically

Reply
N/A
Posts: 0

Resolving a formula dynamically

Hi there. I'd like to know if there's anybody out there who's got a better idea than mine to resolve a formula dynamically. Unfortunately, a macro couldn't help me. Or could it?

The problem: A table contains values in columns a, b and a formula in column c which decided the result which is stored in column d:
For index 1, columns a, b, and c contain 1, 2, and a+b, thus d should be 3.
For index 2, columns a, b, and c contain 1, 2, and a*b, thus d should be 2.
Columns a and be may contain other numeric values, c may contain other formulas in standard notation, and d should end up showing the result of the resolved formula in c.

One solution that works, but is incredibly slow when the source data set is, say, more than 10.000 rows:
data _null_;
set abcd;
call execute( "proc sql; update problem; set d = " || c || " where index = || index || "; quit;");
run;

I also tried three call executes, where the first one created a temporary 1-row data set, the second appended this to the master data set, and the third deleted the temporary data set. The append was slow, so I tried the update routine. Unfortunately, this was slow, too.

A short explanation to why I'm using call execute: Macros couldn't help me, since they evaluated in a completely different context from when my data steps did.

If you have any questions, be my guest, and if you have any new ideas, I think I love you!
SAS Super FREQ
Posts: 8,868

Re: Resolving a formula dynamically

Posted in reply to deleted_user
Hi:
Did you try the "old school" approach instead of CALL EXECUTE?? See the program and log below. Note that the formula used for C must be a valid expression for an assignment statement and that for more complex formulas, you may end up needing to use macro quoting functions to protect the operators. Also, I used a "brute force" set of 4 IF statements -- there are ways to be more elegant with the macro code and probably more brief.

cynthia

The Program:
[pre]
data abcd;
infile datalines;
input index a b c $;
call symput('form'||put(index,1.0),trim(c));
call symput('lastind',put(index,2.0));
return;
datalines;
1 1 2 a+b
2 1 2 a*b
3 1 2 a/b
4 1 2 a**b
;
run;

%put form1= &form1;
%put form2= &form2;
%put lastind= &lastind;

** Use the macro variable to "type" the formula into the program;
** You could make this program automatically generated with a macro;
** program, but depending on how many values you have for INDEX, this;
** approach is the quickest and easiest to code;
data new;
set abcd;
if index = 1 then d = &form1;
else if index = 2 then d = &form2;
else if index = 3 then d = &form3;
else if index = 4 then d = &form4;
putlog index= a= b= c= d=;
run;

ods listing;
proc print data=new;
title 'What is in the new dataset?';
run;
[/pre]

The LOG showing the use of the macro variable in PUTLOG:
[pre]
1065 %put form1= &form1;
form1= a+b
1066 %put form2= &form2;
form2= a*b
1067 %put lastind= &lastind;
lastind= 4
1068
1069 ** Use the macro variable to "type" the formula into the
1069! program;
1070 ** You could make this program automatically generated
1070! with a macro;
1071 ** program, but depending on how many values you have for
1071! INDEX, this;
1072 ** approach is the quickest and easiest to code;
1073 data new;
1074 set abcd;
1075 if index = 1 then d = &form1;
1076 else if index = 2 then d = &form2;
1077 else if index = 3 then d = &form3;
1078 else if index = 4 then d = &form4;
1079 putlog index= a= b= c= d=;
1080 run;

index=1 a=1 b=2 c=a+b d=3
index=2 a=1 b=2 c=a*b d=2
index=3 a=1 b=2 c=a/b d=0.5
index=4 a=1 b=2 c=a**b d=1
NOTE: There were 4 observations read from the data set
WORK.ABCD.
NOTE: The data set WORK.NEW has 4 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds


1081
1082 ods listing;
1083 proc print data=new;
1084 title 'What is in the new dataset?';
1085 run;

NOTE: There were 4 observations read from the data set WORK.NEW.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

[/pre]
Super Contributor
Super Contributor
Posts: 365

Re: Resolving a formula dynamically

Posted in reply to deleted_user
Hello Anders Eggum,

I do not understand why macro is not working. How about this solution:
[pre]
data i;
i=1; a=1; b=1; c="a*b"; output;
i=2; a=1; b=1; c="a+b"; output;
run;
proc SQL;
select COUNT(*) as n into :n from i;
%let n=%TRIM(&n);
select c into :c1-:c&n from i;
quit;
%macro a;
data r;
set i;
%do i=1 %to &n;
if &i EQ _n_ then d=&&c&i;
%end;
run;
%mend;
%a;
[/pre]
Sincerely,
SPR
Super User
Posts: 10,046

Re: Resolving a formula dynamically

Hi.
your code has lots lots of if statement which is a big mistake for a programmer,especially when have lots of if statement.And I am sure @chang_y_chung@hotmail.com will mock you very much.

I agree with Cynthia ,it is only execute one if statement each data step.And yours will execute lots lots of if statements for each data step to exhaust system source.
So I supported Cynthia.

And I also am curious that why your code can not use macro ? Especia for your situation which is not included function.

Ksharp
N/A
Posts: 0

Re: Resolving a formula dynamically

The problem I had with my macro variables is that they resolved before or after my data step. Somehow, this didn't work. Unfortunately, I didn't keep the bad code.

Anyway, thank you, both Cynthia and SPR for your replies. I'll check those solutions out. Here's another solution, which did work well (still a bit slow, but much faster than my call execute solution):

*a test dataset (I tried this program out with 100,000 rows, which took about 5 minutes. So one million rows will still take a while...);
data test;
a=1; b=2; c='a+b'; d=0; output;
a=1; b=2; c='a-b'; d=0; output;
run;

*create a temporary file;
filename test temp;

*build a sas program in the temporary file;
*for each row in input dataset, resolve the function in c into d;
data _null_;
set test end=last;
file test;
if _n_=1 then put 'data out;' /
'set test;' /
'if _n_=' _n_ 'then d=' c ';';
else put 'else if _n_ = ' _n_ 'then d=' c ';';
if last then put 'run;';
run;

*include the generated sas program, which runs the resolved functions;
%inc test;
Super Contributor
Super Contributor
Posts: 365

Re: Resolving a formula dynamically

Hello Ksharp,

I did not see mocking remarks on my code from Chang_y_chung. However, I tried to optimize my code and got the following result:[pre]
data i;
do l=1 to 500000;
i=1; a=1; b=1; c="a*b"; output;
i=2; a=1; b=1; c="a+b"; output;
end;
run;
options nomlogic nosymbolgen;
proc SQL ;
select COUNT(*) as n into :n from i;
quit;
%let n=%TRIM(&n);
%macro b(p=);
%do i=&p %to &n %by &p;
data a;
set i;
if %EVAL(&i-&p) < _n_ <= &i then output;
run;
proc SQL noprint;
select COUNT(*) as n into :n1 from a;
%let n1=%TRIM(&n1);
select c into :c1-:c&n1 from i;
quit;
data b;
set a;
%do j=1 %to &n1;
if &j EQ _n_ then do;
d=&&c&j;
end;
%end;
run;
%if &i=&p %then %do; data r; set b; run; %end;
%else %do; data r; set r b; run; %end;
%end;
%mend b;
%b(p=5000);
;
It does not compete with Chang's code and runs about 5 minutes for 1M observations.
Sincerely,
SPR
Regular Contributor
Posts: 241

Re: Resolving a formula dynamically

Posted in reply to deleted_user
If you don't need to maintain much numeric precision, then you can do something like below to take advantage of %sysevalf(). On my moderate PC, the second data step has 1M obs but ran in less than 30 seconds. Hope this helps a bit.



   /* test data */


   data one;


     do i = 1 to 1e6/2;


       a=1; b=2; c='a+b'; d=0output;


       a=1; b=2; c='a-b'; d=0output;


     end;


   run;


 


   data two;


     set one;


     length m $200;


     retain popen '(' pclose ')';


     m = c;


     m = transtrn(m, 'a', catt(popen, put(a,best.-l), pclose));


     m = transtrn(m, 'b', catt(popen, put(b,best.-l), pclose));


     m = catt('%sysevalf', popen, m, pclose);


     d = input(resolve(m), best.);


     keep a b c d;


   run;


   /* on log


   NOTE: There were 1000000 observations read from the data set WORK.ONE.


   NOTE: The data set WORK.TWO has 1000000 observations and 4 variables.


   NOTE: DATA statement used (Total process time):


         real time           25.53 seconds


         cpu time            25.52 seconds


   */


 


   proc print data=two(obs=3);


   run;


   /* on lst


    Obs    a    b     c      d


      1    1    2    a+b     3


      2    1    2    a-b    -1


      3    1    2    a+b     3


  */

N/A
Posts: 0

Re: Resolving a formula dynamically

Posted in reply to chang_y_chung_hotmail_com
Thanks for a very interesting little piece of code. I'll give it a go. Thanks a lot to all of you. I hope I can contribute back to the forum, too... I'm impressed with the quick response and your good ideas. I'll make sure to provide some feedback when I try this out. For right now, I have to finish up some tasks that depended on these results, but I need to optimize the code afterwards, and that is when this comes in handy. Maybe tomorrow, maybe next week.
Super User
Posts: 10,046

Re: Resolving a formula dynamically

Posted in reply to deleted_user
Just one more choice.
Under my IBM R50 notebook, It spents several second to process ten thousands observations.

[pre]
data _null_;
set temp end=last;
call symput(cats('d',_n_),c);
if last then call symput('nobs',_n_);
run;

%macro data;
data temp;
set temp;
select(_n_);
%do i=1 %to &nobs;
when (&i) d = &&d&i ;
%end;
end;
run;
%mend;

%data

[pre]





[pre]

194 data _null_;
195 set temp end=last;
196 call symput(cats('d',_n_),c);
197 if last then call symput('nobs',_n_);
198 run;

NOTE: Numeric values have been converted to character values at the places given by: (Line)Smiley SadColumn).
197:34
NOTE: There were 10000 observations read from the data set WORK.TEMP.
NOTE: DATA statement used (Total process time):
real time 0.25 seconds
cpu time 0.26 seconds


199
200 %macro data;
201 data temp;
202 set temp;
203 select(_n_);
204 %do i=1 %to &nobs;
205 when (&i) d = &&d&i ;
206 %end;
207 end;
208 run;
209 %mend;
210
211 %data

NOTE: There were 10000 observations read from the data set WORK.TEMP.
NOTE: The data set WORK.TEMP has 10000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 5.39 seconds
cpu time 5.32 seconds

[pre]




Ksharp
Super User
Posts: 10,046

Re: Resolving a formula dynamically

Posted in reply to deleted_user
Hi.
I found another way.It looks like 'call execute' can do it absolutely.

[pre]
options nonote;
data temp;
do i=1 to 100;
a=2; b=5; c='a*b'; output;
a=5; b=6; c='a+b'; output;
end;
run;

data temp;
set temp end=last;
if _n_ eq 1 then call execute('data result;');
call execute(cats('a=',a,';'));
call execute(cats('b=',b,';'));
call execute(cats("c='",c,"';"));
call execute(cats('d=',c,';'));
call execute('output;');
if last then call execute('run;');
run;
[/pre]




Ksharp
PROC Star
Posts: 1,760

Re: Resolving a formula dynamically

chang_y_chung's code is lot more efficient: you only process the data once, and you don't create a huge data step afterward.

Creating TWO is 10 times faster than creating THREE.

If you increase the number of loopings to 1e5, THREE runs out of memory.
[pre]
data ONE;
do _I = 1 to 1e4;
VAR1=ranuni(0); VAR2=ranuni(1); VAR3='VAR1+VAR2'; output;
VAR1=ranuni(2); VAR2=ranuni(3); VAR3='VAR1-VAR2'; output;
end;
run;

data TWO;
set ONE;
retain POPEN '(' PCLOSE ')';
length _M $200;
_M = tranwrd(VAR3, 'VAR1', cats(POPEN, VAR1, PCLOSE));
_M = tranwrd(_M , 'VAR2', cats(POPEN, VAR2, PCLOSE));
_M = cats('%sysevalf', POPEN, _M, PCLOSE);
VAR4 = input(resolve(_M),32.);
keep VAR1 VAR2 VAR3 VAR4;
run;

option nosource;
data _null_;
set ONE end=LASTOBS;
if _N_ eq 1 then call execute('data THREE;');
call execute(cats('VAR1=',VAR1 ,';'));
call execute(cats('VAR2=',VAR2 ,';'));
call execute(cats('VAR3=',quote(VAR3),';'));
call execute(cats('VAR4=',VAR3 ,';'));
call execute('output;');
if LASTOBS then call execute('run;');
run;
option source;
Super User
Posts: 10,046

Re: Resolving a formula dynamically

You are right.


Regards.

Ksharp
Ask a Question
Discussion stats
  • 11 replies
  • 241 views
  • 0 likes
  • 6 in conversation