BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi there. I'd like to know if there's anybody out there who's got a better idea than mine to resolve a formula dynamically. Unfortunately, a macro couldn't help me. Or could it?

The problem: A table contains values in columns a, b and a formula in column c which decided the result which is stored in column d:
For index 1, columns a, b, and c contain 1, 2, and a+b, thus d should be 3.
For index 2, columns a, b, and c contain 1, 2, and a*b, thus d should be 2.
Columns a and be may contain other numeric values, c may contain other formulas in standard notation, and d should end up showing the result of the resolved formula in c.

One solution that works, but is incredibly slow when the source data set is, say, more than 10.000 rows:
data _null_;
set abcd;
call execute( "proc sql; update problem; set d = " || c || " where index = || index || "; quit;");
run;

I also tried three call executes, where the first one created a temporary 1-row data set, the second appended this to the master data set, and the third deleted the temporary data set. The append was slow, so I tried the update routine. Unfortunately, this was slow, too.

A short explanation to why I'm using call execute: Macros couldn't help me, since they evaluated in a completely different context from when my data steps did.

If you have any questions, be my guest, and if you have any new ideas, I think I love you!
11 REPLIES 11
Cynthia_sas
SAS Super FREQ
Hi:
Did you try the "old school" approach instead of CALL EXECUTE?? See the program and log below. Note that the formula used for C must be a valid expression for an assignment statement and that for more complex formulas, you may end up needing to use macro quoting functions to protect the operators. Also, I used a "brute force" set of 4 IF statements -- there are ways to be more elegant with the macro code and probably more brief.

cynthia

The Program:
[pre]
data abcd;
infile datalines;
input index a b c $;
call symput('form'||put(index,1.0),trim(c));
call symput('lastind',put(index,2.0));
return;
datalines;
1 1 2 a+b
2 1 2 a*b
3 1 2 a/b
4 1 2 a**b
;
run;

%put form1= &form1;
%put form2= &form2;
%put lastind= &lastind;

** Use the macro variable to "type" the formula into the program;
** You could make this program automatically generated with a macro;
** program, but depending on how many values you have for INDEX, this;
** approach is the quickest and easiest to code;
data new;
set abcd;
if index = 1 then d = &form1;
else if index = 2 then d = &form2;
else if index = 3 then d = &form3;
else if index = 4 then d = &form4;
putlog index= a= b= c= d=;
run;

ods listing;
proc print data=new;
title 'What is in the new dataset?';
run;
[/pre]

The LOG showing the use of the macro variable in PUTLOG:
[pre]
1065 %put form1= &form1;
form1= a+b
1066 %put form2= &form2;
form2= a*b
1067 %put lastind= &lastind;
lastind= 4
1068
1069 ** Use the macro variable to "type" the formula into the
1069! program;
1070 ** You could make this program automatically generated
1070! with a macro;
1071 ** program, but depending on how many values you have for
1071! INDEX, this;
1072 ** approach is the quickest and easiest to code;
1073 data new;
1074 set abcd;
1075 if index = 1 then d = &form1;
1076 else if index = 2 then d = &form2;
1077 else if index = 3 then d = &form3;
1078 else if index = 4 then d = &form4;
1079 putlog index= a= b= c= d=;
1080 run;

index=1 a=1 b=2 c=a+b d=3
index=2 a=1 b=2 c=a*b d=2
index=3 a=1 b=2 c=a/b d=0.5
index=4 a=1 b=2 c=a**b d=1
NOTE: There were 4 observations read from the data set
WORK.ABCD.
NOTE: The data set WORK.NEW has 4 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds


1081
1082 ods listing;
1083 proc print data=new;
1084 title 'What is in the new dataset?';
1085 run;

NOTE: There were 4 observations read from the data set WORK.NEW.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

[/pre]
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello Anders Eggum,

I do not understand why macro is not working. How about this solution:
[pre]
data i;
i=1; a=1; b=1; c="a*b"; output;
i=2; a=1; b=1; c="a+b"; output;
run;
proc SQL;
select COUNT(*) as n into :n from i;
%let n=%TRIM(&n);
select c into :c1-:c&n from i;
quit;
%macro a;
data r;
set i;
%do i=1 %to &n;
if &i EQ _n_ then d=&&c&i;
%end;
run;
%mend;
%a;
[/pre]
Sincerely,
SPR
Ksharp
Super User
Hi.
your code has lots lots of if statement which is a big mistake for a programmer,especially when have lots of if statement.And I am sure @chang_y_chung@hotmail.com will mock you very much.

I agree with Cynthia ,it is only execute one if statement each data step.And yours will execute lots lots of if statements for each data step to exhaust system source.
So I supported Cynthia.

And I also am curious that why your code can not use macro ? Especia for your situation which is not included function.

Ksharp
deleted_user
Not applicable
The problem I had with my macro variables is that they resolved before or after my data step. Somehow, this didn't work. Unfortunately, I didn't keep the bad code.

Anyway, thank you, both Cynthia and SPR for your replies. I'll check those solutions out. Here's another solution, which did work well (still a bit slow, but much faster than my call execute solution):

*a test dataset (I tried this program out with 100,000 rows, which took about 5 minutes. So one million rows will still take a while...);
data test;
a=1; b=2; c='a+b'; d=0; output;
a=1; b=2; c='a-b'; d=0; output;
run;

*create a temporary file;
filename test temp;

*build a sas program in the temporary file;
*for each row in input dataset, resolve the function in c into d;
data _null_;
set test end=last;
file test;
if _n_=1 then put 'data out;' /
'set test;' /
'if _n_=' _n_ 'then d=' c ';';
else put 'else if _n_ = ' _n_ 'then d=' c ';';
if last then put 'run;';
run;

*include the generated sas program, which runs the resolved functions;
%inc test;
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello Ksharp,

I did not see mocking remarks on my code from Chang_y_chung. However, I tried to optimize my code and got the following result:[pre]
data i;
do l=1 to 500000;
i=1; a=1; b=1; c="a*b"; output;
i=2; a=1; b=1; c="a+b"; output;
end;
run;
options nomlogic nosymbolgen;
proc SQL ;
select COUNT(*) as n into :n from i;
quit;
%let n=%TRIM(&n);
%macro b(p=);
%do i=&p %to &n %by &p;
data a;
set i;
if %EVAL(&i-&p) < _n_ <= &i then output;
run;
proc SQL noprint;
select COUNT(*) as n into :n1 from a;
%let n1=%TRIM(&n1);
select c into :c1-:c&n1 from i;
quit;
data b;
set a;
%do j=1 %to &n1;
if &j EQ _n_ then do;
d=&&c&j;
end;
%end;
run;
%if &i=&p %then %do; data r; set b; run; %end;
%else %do; data r; set r b; run; %end;
%end;
%mend b;
%b(p=5000);
;
It does not compete with Chang's code and runs about 5 minutes for 1M observations.
Sincerely,
SPR
chang_y_chung_hotmail_com
Obsidian | Level 7
If you don't need to maintain much numeric precision, then you can do something like below to take advantage of %sysevalf(). On my moderate PC, the second data step has 1M obs but ran in less than 30 seconds. Hope this helps a bit.



   /* test data */


   data one;


     do i = 1 to 1e6/2;


       a=1; b=2; c='a+b'; d=0output;


       a=1; b=2; c='a-b'; d=0output;


     end;


   run;


 


   data two;


     set one;


     length m $200;


     retain popen '(' pclose ')';


     m = c;


     m = transtrn(m, 'a', catt(popen, put(a,best.-l), pclose));


     m = transtrn(m, 'b', catt(popen, put(b,best.-l), pclose));


     m = catt('%sysevalf', popen, m, pclose);


     d = input(resolve(m), best.);


     keep a b c d;


   run;


   /* on log


   NOTE: There were 1000000 observations read from the data set WORK.ONE.


   NOTE: The data set WORK.TWO has 1000000 observations and 4 variables.


   NOTE: DATA statement used (Total process time):


         real time           25.53 seconds


         cpu time            25.52 seconds


   */


 


   proc print data=two(obs=3);


   run;


   /* on lst


    Obs    a    b     c      d


      1    1    2    a+b     3


      2    1    2    a-b    -1


      3    1    2    a+b     3


  */

deleted_user
Not applicable
Thanks for a very interesting little piece of code. I'll give it a go. Thanks a lot to all of you. I hope I can contribute back to the forum, too... I'm impressed with the quick response and your good ideas. I'll make sure to provide some feedback when I try this out. For right now, I have to finish up some tasks that depended on these results, but I need to optimize the code afterwards, and that is when this comes in handy. Maybe tomorrow, maybe next week.
Ksharp
Super User
Just one more choice.
Under my IBM R50 notebook, It spents several second to process ten thousands observations.

[pre]
data _null_;
set temp end=last;
call symput(cats('d',_n_),c);
if last then call symput('nobs',_n_);
run;

%macro data;
data temp;
set temp;
select(_n_);
%do i=1 %to &nobs;
when (&i) d = &&d&i ;
%end;
end;
run;
%mend;

%data

[pre]





[pre]

194 data _null_;
195 set temp end=last;
196 call symput(cats('d',_n_),c);
197 if last then call symput('nobs',_n_);
198 run;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
197:34
NOTE: There were 10000 observations read from the data set WORK.TEMP.
NOTE: DATA statement used (Total process time):
real time 0.25 seconds
cpu time 0.26 seconds


199
200 %macro data;
201 data temp;
202 set temp;
203 select(_n_);
204 %do i=1 %to &nobs;
205 when (&i) d = &&d&i ;
206 %end;
207 end;
208 run;
209 %mend;
210
211 %data

NOTE: There were 10000 observations read from the data set WORK.TEMP.
NOTE: The data set WORK.TEMP has 10000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 5.39 seconds
cpu time 5.32 seconds

[pre]




Ksharp
Ksharp
Super User
Hi.
I found another way.It looks like 'call execute' can do it absolutely.

[pre]
options nonote;
data temp;
do i=1 to 100;
a=2; b=5; c='a*b'; output;
a=5; b=6; c='a+b'; output;
end;
run;

data temp;
set temp end=last;
if _n_ eq 1 then call execute('data result;');
call execute(cats('a=',a,';'));
call execute(cats('b=',b,';'));
call execute(cats("c='",c,"';"));
call execute(cats('d=',c,';'));
call execute('output;');
if last then call execute('run;');
run;
[/pre]




Ksharp
ChrisNZ
Tourmaline | Level 20
chang_y_chung's code is lot more efficient: you only process the data once, and you don't create a huge data step afterward.

Creating TWO is 10 times faster than creating THREE.

If you increase the number of loopings to 1e5, THREE runs out of memory.
[pre]
data ONE;
do _I = 1 to 1e4;
VAR1=ranuni(0); VAR2=ranuni(1); VAR3='VAR1+VAR2'; output;
VAR1=ranuni(2); VAR2=ranuni(3); VAR3='VAR1-VAR2'; output;
end;
run;

data TWO;
set ONE;
retain POPEN '(' PCLOSE ')';
length _M $200;
_M = tranwrd(VAR3, 'VAR1', cats(POPEN, VAR1, PCLOSE));
_M = tranwrd(_M , 'VAR2', cats(POPEN, VAR2, PCLOSE));
_M = cats('%sysevalf', POPEN, _M, PCLOSE);
VAR4 = input(resolve(_M),32.);
keep VAR1 VAR2 VAR3 VAR4;
run;

option nosource;
data _null_;
set ONE end=LASTOBS;
if _N_ eq 1 then call execute('data THREE;');
call execute(cats('VAR1=',VAR1 ,';'));
call execute(cats('VAR2=',VAR2 ,';'));
call execute(cats('VAR3=',quote(VAR3),';'));
call execute(cats('VAR4=',VAR3 ,';'));
call execute('output;');
if LASTOBS then call execute('run;');
run;
option source;
Ksharp
Super User
You are right.


Regards.

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1420 views
  • 0 likes
  • 6 in conversation