Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- SAS Procedures
- /
- Resolving a formula dynamically

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 12-07-2010 11:17 AM
(1766 views)

Hi there. I'd like to know if there's anybody out there who's got a better idea than mine to resolve a formula dynamically. Unfortunately, a macro couldn't help me. Or could it?

The problem: A table contains values in columns a, b and a formula in column c which decided the result which is stored in column d:

For index 1, columns a, b, and c contain 1, 2, and a+b, thus d should be 3.

For index 2, columns a, b, and c contain 1, 2, and a*b, thus d should be 2.

Columns a and be may contain other numeric values, c may contain other formulas in standard notation, and d should end up showing the result of the resolved formula in c.

One solution that works, but is incredibly slow when the source data set is, say, more than 10.000 rows:

data _null_;

set abcd;

call execute( "proc sql; update problem; set d = " || c || " where index = || index || "; quit;");

run;

I also tried three call executes, where the first one created a temporary 1-row data set, the second appended this to the master data set, and the third deleted the temporary data set. The append was slow, so I tried the update routine. Unfortunately, this was slow, too.

A short explanation to why I'm using call execute: Macros couldn't help me, since they evaluated in a completely different context from when my data steps did.

If you have any questions, be my guest, and if you have any new ideas, I think I love you!

The problem: A table contains values in columns a, b and a formula in column c which decided the result which is stored in column d:

For index 1, columns a, b, and c contain 1, 2, and a+b, thus d should be 3.

For index 2, columns a, b, and c contain 1, 2, and a*b, thus d should be 2.

Columns a and be may contain other numeric values, c may contain other formulas in standard notation, and d should end up showing the result of the resolved formula in c.

One solution that works, but is incredibly slow when the source data set is, say, more than 10.000 rows:

data _null_;

set abcd;

call execute( "proc sql; update problem; set d = " || c || " where index = || index || "; quit;");

run;

I also tried three call executes, where the first one created a temporary 1-row data set, the second appended this to the master data set, and the third deleted the temporary data set. The append was slow, so I tried the update routine. Unfortunately, this was slow, too.

A short explanation to why I'm using call execute: Macros couldn't help me, since they evaluated in a completely different context from when my data steps did.

If you have any questions, be my guest, and if you have any new ideas, I think I love you!

11 REPLIES 11

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi:

Did you try the "old school" approach instead of CALL EXECUTE?? See the program and log below. Note that the formula used for C must be a valid expression for an assignment statement and that for more complex formulas, you may end up needing to use macro quoting functions to protect the operators. Also, I used a "brute force" set of 4 IF statements -- there are ways to be more elegant with the macro code and probably more brief.

cynthia

The Program:

[pre]

data abcd;

infile datalines;

input index a b c $;

call symput('form'||put(index,1.0),trim(c));

call symput('lastind',put(index,2.0));

return;

datalines;

1 1 2 a+b

2 1 2 a*b

3 1 2 a/b

4 1 2 a**b

;

run;

%put form1= &form1;

%put form2= &form2;

%put lastind= &lastind;

** Use the macro variable to "type" the formula into the program;

** You could make this program automatically generated with a macro;

** program, but depending on how many values you have for INDEX, this;

** approach is the quickest and easiest to code;

data new;

set abcd;

if index = 1 then d = &form1;

else if index = 2 then d = &form2;

else if index = 3 then d = &form3;

else if index = 4 then d = &form4;

putlog index= a= b= c= d=;

run;

ods listing;

proc print data=new;

title 'What is in the new dataset?';

run;

[/pre]

The LOG showing the use of the macro variable in PUTLOG:

[pre]

1065 %put form1= &form1;

form1= a+b

1066 %put form2= &form2;

form2= a*b

1067 %put lastind= &lastind;

lastind= 4

1068

1069 ** Use the macro variable to "type" the formula into the

1069! program;

1070 ** You could make this program automatically generated

1070! with a macro;

1071 ** program, but depending on how many values you have for

1071! INDEX, this;

1072 ** approach is the quickest and easiest to code;

1073 data new;

1074 set abcd;

1075 if index = 1 then d = &form1;

1076 else if index = 2 then d = &form2;

1077 else if index = 3 then d = &form3;

1078 else if index = 4 then d = &form4;

1079 putlog index= a= b= c= d=;

1080 run;

index=1 a=1 b=2 c=a+b d=3

index=2 a=1 b=2 c=a*b d=2

index=3 a=1 b=2 c=a/b d=0.5

index=4 a=1 b=2 c=a**b d=1

NOTE: There were 4 observations read from the data set

WORK.ABCD.

NOTE: The data set WORK.NEW has 4 observations and 5 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

1081

1082 ods listing;

1083 proc print data=new;

1084 title 'What is in the new dataset?';

1085 run;

NOTE: There were 4 observations read from the data set WORK.NEW.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

[/pre]

Did you try the "old school" approach instead of CALL EXECUTE?? See the program and log below. Note that the formula used for C must be a valid expression for an assignment statement and that for more complex formulas, you may end up needing to use macro quoting functions to protect the operators. Also, I used a "brute force" set of 4 IF statements -- there are ways to be more elegant with the macro code and probably more brief.

cynthia

The Program:

[pre]

data abcd;

infile datalines;

input index a b c $;

call symput('form'||put(index,1.0),trim(c));

call symput('lastind',put(index,2.0));

return;

datalines;

1 1 2 a+b

2 1 2 a*b

3 1 2 a/b

4 1 2 a**b

;

run;

%put form1= &form1;

%put form2= &form2;

%put lastind= &lastind;

** Use the macro variable to "type" the formula into the program;

** You could make this program automatically generated with a macro;

** program, but depending on how many values you have for INDEX, this;

** approach is the quickest and easiest to code;

data new;

set abcd;

if index = 1 then d = &form1;

else if index = 2 then d = &form2;

else if index = 3 then d = &form3;

else if index = 4 then d = &form4;

putlog index= a= b= c= d=;

run;

ods listing;

proc print data=new;

title 'What is in the new dataset?';

run;

[/pre]

The LOG showing the use of the macro variable in PUTLOG:

[pre]

1065 %put form1= &form1;

form1= a+b

1066 %put form2= &form2;

form2= a*b

1067 %put lastind= &lastind;

lastind= 4

1068

1069 ** Use the macro variable to "type" the formula into the

1069! program;

1070 ** You could make this program automatically generated

1070! with a macro;

1071 ** program, but depending on how many values you have for

1071! INDEX, this;

1072 ** approach is the quickest and easiest to code;

1073 data new;

1074 set abcd;

1075 if index = 1 then d = &form1;

1076 else if index = 2 then d = &form2;

1077 else if index = 3 then d = &form3;

1078 else if index = 4 then d = &form4;

1079 putlog index= a= b= c= d=;

1080 run;

index=1 a=1 b=2 c=a+b d=3

index=2 a=1 b=2 c=a*b d=2

index=3 a=1 b=2 c=a/b d=0.5

index=4 a=1 b=2 c=a**b d=1

NOTE: There were 4 observations read from the data set

WORK.ABCD.

NOTE: The data set WORK.NEW has 4 observations and 5 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

1081

1082 ods listing;

1083 proc print data=new;

1084 title 'What is in the new dataset?';

1085 run;

NOTE: There were 4 observations read from the data set WORK.NEW.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

[/pre]

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hello Anders Eggum,

I do not understand why macro is not working. How about this solution:

[pre]

data i;

i=1; a=1; b=1; c="a*b"; output;

i=2; a=1; b=1; c="a+b"; output;

run;

proc SQL;

select COUNT(*) as n into :n from i;

%let n=%TRIM(&n);

select c into :c1-:c&n from i;

quit;

%macro a;

data r;

set i;

%do i=1 %to &n;

if &i EQ _n_ then d=&&c&i;

%end;

run;

%mend;

%a;

[/pre]

Sincerely,

SPR

I do not understand why macro is not working. How about this solution:

[pre]

data i;

i=1; a=1; b=1; c="a*b"; output;

i=2; a=1; b=1; c="a+b"; output;

run;

proc SQL;

select COUNT(*) as n into :n from i;

%let n=%TRIM(&n);

select c into :c1-:c&n from i;

quit;

%macro a;

data r;

set i;

%do i=1 %to &n;

if &i EQ _n_ then d=&&c&i;

%end;

run;

%mend;

%a;

[/pre]

Sincerely,

SPR

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi.

your code has lots lots of if statement which is a big mistake for a programmer,especially when have lots of if statement.And I am sure @chang_y_chung@hotmail.com will mock you very much.

I agree with Cynthia ,it is only execute one if statement each data step.And yours will execute lots lots of if statements for each data step to exhaust system source.

So I supported Cynthia.

And I also am curious that why your code can not use macro ? Especia for your situation which is not included function.

Ksharp

your code has lots lots of if statement which is a big mistake for a programmer,especially when have lots of if statement.And I am sure @chang_y_chung@hotmail.com will mock you very much.

I agree with Cynthia ,it is only execute one if statement each data step.And yours will execute lots lots of if statements for each data step to exhaust system source.

So I supported Cynthia.

And I also am curious that why your code can not use macro ? Especia for your situation which is not included function.

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

The problem I had with my macro variables is that they resolved before or after my data step. Somehow, this didn't work. Unfortunately, I didn't keep the bad code.

Anyway, thank you, both Cynthia and SPR for your replies. I'll check those solutions out. Here's another solution, which did work well (still a bit slow, but much faster than my call execute solution):

*a test dataset (I tried this program out with 100,000 rows, which took about 5 minutes. So one million rows will still take a while...);

data test;

a=1; b=2; c='a+b'; d=0; output;

a=1; b=2; c='a-b'; d=0; output;

run;

*create a temporary file;

filename test temp;

*build a sas program in the temporary file;

*for each row in input dataset, resolve the function in c into d;

data _null_;

set test end=last;

file test;

if _n_=1 then put 'data out;' /

'set test;' /

'if _n_=' _n_ 'then d=' c ';';

else put 'else if _n_ = ' _n_ 'then d=' c ';';

if last then put 'run;';

run;

*include the generated sas program, which runs the resolved functions;

%inc test;

Anyway, thank you, both Cynthia and SPR for your replies. I'll check those solutions out. Here's another solution, which did work well (still a bit slow, but much faster than my call execute solution):

*a test dataset (I tried this program out with 100,000 rows, which took about 5 minutes. So one million rows will still take a while...);

data test;

a=1; b=2; c='a+b'; d=0; output;

a=1; b=2; c='a-b'; d=0; output;

run;

*create a temporary file;

filename test temp;

*build a sas program in the temporary file;

*for each row in input dataset, resolve the function in c into d;

data _null_;

set test end=last;

file test;

if _n_=1 then put 'data out;' /

'set test;' /

'if _n_=' _n_ 'then d=' c ';';

else put 'else if _n_ = ' _n_ 'then d=' c ';';

if last then put 'run;';

run;

*include the generated sas program, which runs the resolved functions;

%inc test;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hello Ksharp,

I did not see mocking remarks on my code from Chang_y_chung. However, I tried to optimize my code and got the following result:[pre]

data i;

do l=1 to 500000;

i=1; a=1; b=1; c="a*b"; output;

i=2; a=1; b=1; c="a+b"; output;

end;

run;

options nomlogic nosymbolgen;

proc SQL ;

select COUNT(*) as n into :n from i;

quit;

%let n=%TRIM(&n);

%macro b(p=);

%do i=&p %to &n %by &p;

data a;

set i;

if %EVAL(&i-&p) < _n_ <= &i then output;

run;

proc SQL noprint;

select COUNT(*) as n into :n1 from a;

%let n1=%TRIM(&n1);

select c into :c1-:c&n1 from i;

quit;

data b;

set a;

%do j=1 %to &n1;

if &j EQ _n_ then do;

d=&&c&j;

end;

%end;

run;

%if &i=&p %then %do; data r; set b; run; %end;

%else %do; data r; set r b; run; %end;

%end;

%mend b;

%b(p=5000);

;

It does not compete with Chang's code and runs about 5 minutes for 1M observations.

Sincerely,

SPR

I did not see mocking remarks on my code from Chang_y_chung. However, I tried to optimize my code and got the following result:[pre]

data i;

do l=1 to 500000;

i=1; a=1; b=1; c="a*b"; output;

i=2; a=1; b=1; c="a+b"; output;

end;

run;

options nomlogic nosymbolgen;

proc SQL ;

select COUNT(*) as n into :n from i;

quit;

%let n=%TRIM(&n);

%macro b(p=);

%do i=&p %to &n %by &p;

data a;

set i;

if %EVAL(&i-&p) < _n_ <= &i then output;

run;

proc SQL noprint;

select COUNT(*) as n into :n1 from a;

%let n1=%TRIM(&n1);

select c into :c1-:c&n1 from i;

quit;

data b;

set a;

%do j=1 %to &n1;

if &j EQ _n_ then do;

d=&&c&j;

end;

%end;

run;

%if &i=&p %then %do; data r; set b; run; %end;

%else %do; data r; set r b; run; %end;

%end;

%mend b;

%b(p=5000);

;

It does not compete with Chang's code and runs about 5 minutes for 1M observations.

Sincerely,

SPR

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

If you don't need to maintain much numeric precision, then you can do something like below to take advantage of %sysevalf(). On my moderate PC, the second data step has 1M obs but ran in less than 30 seconds. Hope this helps a bit.

/* test data */

**data** one;

do i = **1** to **1e6**/**2**;

a=**1**; b=**2**; c='a+b'; d=**0**; output;

a=**1**; b=**2**; c='a-b'; d=**0**; output;

end;

**run**;

**data** two;

set one;

length m $**200**;

retain popen '(' pclose ')';

m = c;

m = transtrn(m, 'a', catt(popen, put(a,best.-l), pclose));

m = transtrn(m, 'b', catt(popen, put(b,best.-l), pclose));

m = catt('%sysevalf', popen, m, pclose);

d = input(resolve(m), best.);

keep a b c d;

**run**;

/* on log

NOTE: There were 1000000 observations read from the data set WORK.ONE.

NOTE: The data set WORK.TWO has 1000000 observations and 4 variables.

NOTE: DATA statement used (Total process time):

real time 25.53 seconds

cpu time 25.52 seconds

*/

**proc** **print** data=two(obs=**3**);

**run**;

/* on lst

Obs a b c d

1 1 2 a+b 3

2 1 2 a-b -1

3 1 2 a+b 3

*/

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks for a very interesting little piece of code. I'll give it a go. Thanks a lot to all of you. I hope I can contribute back to the forum, too... I'm impressed with the quick response and your good ideas. I'll make sure to provide some feedback when I try this out. For right now, I have to finish up some tasks that depended on these results, but I need to optimize the code afterwards, and that is when this comes in handy. Maybe tomorrow, maybe next week.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Just one more choice.

Under my IBM R50 notebook, It spents several second to process ten thousands observations.

[pre]

data _null_;

set temp end=last;

call symput(cats('d',_n_),c);

if last then call symput('nobs',_n_);

run;

%macro data;

data temp;

set temp;

select(_n_);

%do i=1 %to &nobs;

when (&i) d = &&d&i ;

%end;

end;

run;

%mend;

%data

[pre]

[pre]

194 data _null_;

195 set temp end=last;

196 call symput(cats('d',_n_),c);

197 if last then call symput('nobs',_n_);

198 run;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).

197:34

NOTE: There were 10000 observations read from the data set WORK.TEMP.

NOTE: DATA statement used (Total process time):

real time 0.25 seconds

cpu time 0.26 seconds

199

200 %macro data;

201 data temp;

202 set temp;

203 select(_n_);

204 %do i=1 %to &nobs;

205 when (&i) d = &&d&i ;

206 %end;

207 end;

208 run;

209 %mend;

210

211 %data

NOTE: There were 10000 observations read from the data set WORK.TEMP.

NOTE: The data set WORK.TEMP has 10000 observations and 4 variables.

NOTE: DATA statement used (Total process time):

real time 5.39 seconds

cpu time 5.32 seconds

[pre]

Ksharp

Under my IBM R50 notebook, It spents several second to process ten thousands observations.

[pre]

data _null_;

set temp end=last;

call symput(cats('d',_n_),c);

if last then call symput('nobs',_n_);

run;

%macro data;

data temp;

set temp;

select(_n_);

%do i=1 %to &nobs;

when (&i) d = &&d&i ;

%end;

end;

run;

%mend;

%data

[pre]

[pre]

194 data _null_;

195 set temp end=last;

196 call symput(cats('d',_n_),c);

197 if last then call symput('nobs',_n_);

198 run;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).

197:34

NOTE: There were 10000 observations read from the data set WORK.TEMP.

NOTE: DATA statement used (Total process time):

real time 0.25 seconds

cpu time 0.26 seconds

199

200 %macro data;

201 data temp;

202 set temp;

203 select(_n_);

204 %do i=1 %to &nobs;

205 when (&i) d = &&d&i ;

206 %end;

207 end;

208 run;

209 %mend;

210

211 %data

NOTE: There were 10000 observations read from the data set WORK.TEMP.

NOTE: The data set WORK.TEMP has 10000 observations and 4 variables.

NOTE: DATA statement used (Total process time):

real time 5.39 seconds

cpu time 5.32 seconds

[pre]

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi.

I found another way.It looks like 'call execute' can do it absolutely.

[pre]

options nonote;

data temp;

do i=1 to 100;

a=2; b=5; c='a*b'; output;

a=5; b=6; c='a+b'; output;

end;

run;

data temp;

set temp end=last;

if _n_ eq 1 then call execute('data result;');

call execute(cats('a=',a,';'));

call execute(cats('b=',b,';'));

call execute(cats("c='",c,"';"));

call execute(cats('d=',c,';'));

call execute('output;');

if last then call execute('run;');

run;

[/pre]

Ksharp

I found another way.It looks like 'call execute' can do it absolutely.

[pre]

options nonote;

data temp;

do i=1 to 100;

a=2; b=5; c='a*b'; output;

a=5; b=6; c='a+b'; output;

end;

run;

data temp;

set temp end=last;

if _n_ eq 1 then call execute('data result;');

call execute(cats('a=',a,';'));

call execute(cats('b=',b,';'));

call execute(cats("c='",c,"';"));

call execute(cats('d=',c,';'));

call execute('output;');

if last then call execute('run;');

run;

[/pre]

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

chang_y_chung's code is lot more efficient: you only process the data once, and you don't create a huge data step afterward.

Creating TWO is 10 times faster than creating THREE.

If you increase the number of loopings to 1e5, THREE runs out of memory.

[pre]

data ONE;

do _I = 1 to 1e4;

VAR1=ranuni(0); VAR2=ranuni(1); VAR3='VAR1+VAR2'; output;

VAR1=ranuni(2); VAR2=ranuni(3); VAR3='VAR1-VAR2'; output;

end;

run;

data TWO;

set ONE;

retain POPEN '(' PCLOSE ')';

length _M $200;

_M = tranwrd(VAR3, 'VAR1', cats(POPEN, VAR1, PCLOSE));

_M = tranwrd(_M , 'VAR2', cats(POPEN, VAR2, PCLOSE));

_M = cats('%sysevalf', POPEN, _M, PCLOSE);

VAR4 = input(resolve(_M),32.);

keep VAR1 VAR2 VAR3 VAR4;

run;

option nosource;

data _null_;

set ONE end=LASTOBS;

if _N_ eq 1 then call execute('data THREE;');

call execute(cats('VAR1=',VAR1 ,';'));

call execute(cats('VAR2=',VAR2 ,';'));

call execute(cats('VAR3=',quote(VAR3),';'));

call execute(cats('VAR4=',VAR3 ,';'));

call execute('output;');

if LASTOBS then call execute('run;');

run;

option source;

Creating TWO is 10 times faster than creating THREE.

If you increase the number of loopings to 1e5, THREE runs out of memory.

[pre]

data ONE;

do _I = 1 to 1e4;

VAR1=ranuni(0); VAR2=ranuni(1); VAR3='VAR1+VAR2'; output;

VAR1=ranuni(2); VAR2=ranuni(3); VAR3='VAR1-VAR2'; output;

end;

run;

data TWO;

set ONE;

retain POPEN '(' PCLOSE ')';

length _M $200;

_M = tranwrd(VAR3, 'VAR1', cats(POPEN, VAR1, PCLOSE));

_M = tranwrd(_M , 'VAR2', cats(POPEN, VAR2, PCLOSE));

_M = cats('%sysevalf', POPEN, _M, PCLOSE);

VAR4 = input(resolve(_M),32.);

keep VAR1 VAR2 VAR3 VAR4;

run;

option nosource;

data _null_;

set ONE end=LASTOBS;

if _N_ eq 1 then call execute('data THREE;');

call execute(cats('VAR1=',VAR1 ,';'));

call execute(cats('VAR2=',VAR2 ,';'));

call execute(cats('VAR3=',quote(VAR3),';'));

call execute(cats('VAR4=',VAR3 ,';'));

call execute('output;');

if LASTOBS then call execute('run;');

run;

option source;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You are right.

Regards.

Ksharp

Regards.

Ksharp

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.