Thanks for the interesting consideration of assignment statements.
Here's an example showing that short-circuiting (if it exists at all) has certain limitations:
%let n=300; /* compare to 301 */
data have;
array a[&n];
a1=1;
run;
%macro cond;
%do i=1 %to &n;
a&i+0=0 &
%end;
%mend cond;
data _null_;
set have;
if %cond 1;
run;
So the condition a&i+0=0 is never met, and this is already apparent from the very first term "a1+0=0 & ..." since a1=1. Yet for &n=301, ..., 1000 the note "operation on missing values" does occur (with a list of &n-1 "(Line):(Column)" positions), demonstrating that no short-circuiting has happened at all. For &n=2, ..., 300, however, the note does not occur, suggesting that short-circuiting has been applied. (Checked in a macro with Windows SAS 9.4M5.)
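A hand-written two-term version of the same test may make the idea easier to see. This is only a sketch: the variable names a and b are made up for illustration, and whether the note appears may depend on the SAS release and on how the condition is compiled, so it is something to try rather than a statement about what SAS does.

```sas
/* Minimal sketch (hypothetical variables a and b, not part of the example above). */
data _null_;
  a=1;   /* makes the first comparison a+0=0 false */
  b=.;   /* b+0 is an operation on a missing value */
  /* If the second comparison is skipped, the log should not contain the
     "operation on missing values" note; if it is evaluated, it should. */
  if a+0=0 & b+0=0 then put 'Both conditions met.';
run;
```

Swapping the two comparisons (b+0=0 & a+0=0) puts the missing-value operation first, so the note should appear regardless of short-circuiting, which gives a baseline for comparison.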
Using a different HAVE dataset, the (not-)short-circuiting can be observed in more detail:
%let n=300;
%let k=123;
data have;
array a[&n] (&k*0 %eval(&n-&k)*.);
run;
data _null_;
set have;
if %cond 1;
run;
Now the condition a&i+0=0 is met for the first &k variables, but not for the rest. With 2<=&n<=300 and 1<=&k<=&n-1 the log reports the "operation on missing values" for exactly one place:
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 370:5
where the line number (here: 370) always equals 3*&k+1, suggesting that the first missing value in the array causes the note (and terminates the check of the IF condition), which is plausible assuming short-circuiting and a sequential check of the individual conditions a&i+0=0. As in the original example, no short-circuiting seems to occur with &n>300: operations on missing values are then detected in all lines 3*i+1, i=&k, ..., &n-1.
Given the above results, I would have thought that there might be a performance gain when, in a Boolean expression of the form B1 & B2 & ... & B300, the first check (B1) fails vs. only the last (B300). But even with 10 million observations (22 GB) the run time differences were not very convincing (15.88 vs. 18.52 seconds real time in the log below, while the two steps creating HAVE differed by a similar margin):
1 %let n=300;
2
3 data have;
4 array a[&n] (&n*0);
5 a1=1;
6 do _n_=1 to 10000000;
7 output;
8 end;
9 run;
NOTE: The data set WORK.HAVE has 10000000 observations and 300 variables.
NOTE: DATA statement used (Total process time):
real time 17.77 seconds
cpu time 17.76 seconds
10
11 %macro cond;
12 %do i=1 %to &n;
13 a&i=0 &
14 %end;
15 %mend cond;
16
17 data _null_;
18 set have;
19 if %cond 1;
20 run;
NOTE: There were 10000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 15.88 seconds
cpu time 15.86 seconds
21
22 data have;
23 array a[&n] (&n*0);
24 a&n=1;
25 do _n_=1 to 10000000;
26 output;
27 end;
28 run;
NOTE: The data set WORK.HAVE has 10000000 observations and 300 variables.
NOTE: DATA statement used (Total process time):
real time 20.66 seconds
cpu time 20.67 seconds
29
30 data _null_;
31 set have;
32 if %cond 1;
33 run;
NOTE: There were 10000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 18.52 seconds
cpu time 18.51 seconds
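Single runs can differ by a few seconds for unrelated reasons (caching, I/O), so one way to firm up the comparison might be to repeat the timed step a few times and look at the spread. The sketch below assumes the %cond macro and the HAVE dataset from above already exist; the macro name timeit and its reps parameter are made up for illustration.

```sas
options fullstimer;  /* request more detailed timing notes in the log */

/* Hypothetical helper: runs the same subsetting-IF step &reps times. */
%macro timeit(reps=3);
  %local r;
  %do r=1 %to &reps;
    data _null_;
      set have;
      if %cond 1;   /* same condition as in the timed steps above */
    run;
  %end;
%mend timeit;

%timeit(reps=3)
```

Running this once for each version of HAVE (a1=1 and a&n=1) would give several timings per case instead of one.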