Consider the following log:
1552 if ( anrhi > .z
1553 and anrhi < aval <= 3.0 * anrhi
1554 and bnrlo > .z
1555 and bnrlo <= base <= bnrhi
1556 )
1557 or ( base > .z
1558 and 1.5 * base <= aval <= 3.0 * base
1559 and ~ ( bnrlo > .z
1560 and bnrlo <= base <= bnrhi
1561 )
1562 )
1563 then atoxgrh = "1" ;
1564
1565 if ( anrhi > .z
1566 and anrhi < aval <= 3.0 * anrhi
1567 and bnrlo > .z
1568 and bnrlo <= base <= bnrhi
1569 )
1570 or ( base > .z
1571 and 1.5 * base <= aval <= 3.0 * base
1572 and ( bnrlo > .z
1573 and bnrlo <= base <= bnrhi
1574 )
1575 )
1576 then atoxgrh = "T" ;
1577
1578 end ;
1579 end ;
1580 run ;
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
18 at 1553:33 36 at 1558:17 36 at 1558:39
NOTE: There were 105 observations read from the data set WORK.BCP.
WHERE paramcd='ALT';
NOTE: The data set WORK.LB_TOX has 105 observations and 97 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
I copied the IF statement and deleted the tilde (~) in the second copy. SAS does not generate the NOTE for the second statement.
Column 33 is the space after "*" and before "anrhi":
3.0 * anrhi
^
^
I had assumed that, because anrhi > .z fails, the subsequent conditions would not be evaluated. That seems to be true if I omit the tilde. What am I overlooking?
Any corrections or citations are appreciated.
PS This is part of a version 5.0 CTCAE toxicity grade macro.
Hey Kevin,
I don't think short-circuiting is actually defined for SAS. I did see a bit in the DS2 documentation where it says:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/ds2pg/p0vinswrxk1819n1qizkibfin0cy.htm
"SAS does not guarantee short-circuit evaluation. When using Boolean operators to join expressions, you might get undesired results if your intention is to short circuit, or avoid the evaluation of, the second expression. To guarantee the order in which SAS evaluates an expression, you can rewrite the expression using nested IF statements."
My guess is that a similar thing might be happening with your IF statement.
Even this simple statement doesn't seem to short-circuit:
1 data _null_ ;
2 x=. ;
3 if 0 and x*1 then foo=1;
4 run ;
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 3:13
I think of it kind of like the SQL optimizer, where I don't get to tell SAS how to evaluate an expression, so SAS can decide whether to short-circuit or not.
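The nested-IF rewrite that the DS2 documentation recommends can be sketched for the small example above. This is a minimal illustration, not a tested fix for the original macro; the variable name flag is invented here to stand in for the constant 0.

```sas
data _null_ ;
  x = . ;
  flag = 0 ;   /* hypothetical stand-in for the literal 0 in the example above */

  /* Nested IF: x*1 sits in a separate statement that only runs when flag is true, */
  /* so the arithmetic on the missing value is never reached for this observation. */
  if flag then do ;
    if x*1 then foo = 1 ;
  end ;
run ;
```

Because the inner IF is its own statement, its execution does not depend on how SAS chooses to evaluate a compound expression.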
Notice that you received a "NOTE" not an "ERROR" message, so SAS will do exactly what it is telling you in the log, fill in a missing value and continue processing.
Indeed, but I want a "clean" log because I report the log check to the lst file.
In general, I want our programmers to command their programs and to know when a value is missing. Such a message may indicate an issue that I may need to investigate.
You should include the entire log of that data step.
And best to provide some example of data that creates the same messages.
Then, what exactly is your question? SAS found missing values where you attempted to use them in the code.
When you start using lots of conditions nested inside () then the behavior of "short-cutting" a comparison may change because your code imposes additional order constraints.
The * is a priority 2 instruction and is evaluated before the AND or any of the value comparisons.
Search your online documentation for "Order of Evaluation in Compound Expressions"
I can't get the documentation site's search to return a result set small enough to find that page, or I would include the link.
If you really want a part of an expression evaluated before another you may need to use some () to enforce your required order.
SAS doesn't shortcut, at least to my knowledge. That's why it is safe to use LAG in a compound condition.
Hi @Kurt_Bremser ,
I think SAS does sometimes short-circuit. See e.g. Rick Wicklin's post https://blogs.sas.com/content/iml/2019/08/14/short-circuit-evaluation-and-logical-ligatures-in-sas.h... and Chris Hemedinger's post where he says it's not safe to use LAG in a compound expression, due to the potential for short-circuiting: https://blogs.sas.com/content/sasdummy/2012/01/03/pitfalls-of-the-lag-function/
Here's a little example of apparent short-circuiting:
1 data _null_;
2 x= -1 ;
3 y= . ;
4
5 if (x>0) & ((y+1)>0) then foo=1; *This does short circuit ;
6 if (0) & ((y+1)>0) then foo=1; *This does NOT short circuit ;
7 if (x>0) & ~((y+1)>0) then foo=1; *This does NOT short circuit ;
8 run;
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 6:18 1 at 7:18
In addition, the example shows short-circuiting. The tilde (not) is short-circuiting the short circuit, like the "if 0" did in your first example.
I will re-write that.
I note, without example, that %IF statements (in macros) suffer from the same problem. I nest %IF statements instead of using AND with %SYMEXIST.
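For readers who want to see the nested %IF pattern, here is a minimal sketch. The macro name check_flag and the macro variable flag are hypothetical and chosen only for illustration.

```sas
%macro check_flag ;
  /* Outer %IF confirms the macro variable exists; */
  /* the inner %IF resolves &flag only after that check has passed. */
  %if %symexist(flag) %then %do ;
    %if &flag = 1 %then %put flag is set to 1 ;
    %else %put flag exists but is not 1 ;
  %end ;
  %else %put flag does not exist ;
%mend check_flag ;

%check_flag
```

Written as a single condition, %symexist(flag) and &flag = 1, the reference &flag would be resolved even when the variable does not exist, which is what the nesting is meant to avoid.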
Hi @Quentin,
I agree that short-circuiting sometimes occurs, but I believe that @Kurt_Bremser is right about the LAG function. At least that was my conclusion regarding the program from Chris Hemedinger's 2012 blog post in the 2018 discussion "If statement Short circuiting and Lag function". I would hope that the occurrence of the LAG function in an IF condition prevents short-circuiting as in the example below.
data have;
input a b c;
cards;
0 1 0
1 2 1
;
data _null_;
set have;
if a=1 & lag(b)=1 then put _n_; /* not short-circuiting */
run;
data _null_;
set have;
if a=1 & 1/c>0 then put _n_; /* short-circuiting */
run;
data _null_;
set have;
if a=1 & 1/c>0 & lag(b)=1 then put _n_; /* not short-circuiting */
run;
Log:
613 data have;
614 input a b c;
615 cards;
NOTE: The data set WORK.HAVE has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds
618 ;
619
620 data _null_;
621 set have;
622 if a=1 & lag(b)=1 then put _n_; /* not short-circuiting */
623 run;
2
NOTE: There were 2 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 0.08 seconds
cpu time 0.07 seconds
624
625 data _null_;
626 set have;
627 if a=1 & 1/c>0 then put _n_; /* short-circuiting */
628 run;
2
NOTE: There were 2 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.07 seconds
629
630 data _null_;
631 set have;
632 if a=1 & 1/c>0 & lag(b)=1 then put _n_; /* not short-circuiting */
633 run;
NOTE: Division by zero detected at line 632 column 11.
a=0 b=1 c=0 _ERROR_=1 _N_=1
2
NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 632:11
NOTE: There were 2 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 0.12 seconds
cpu time 0.10 seconds
Thanks @FreelanceReinh . I appreciate your examples, and the links to prior discussion.
I share your hope the occurrence of the LAG function in an IF statement expression prevents short-circuiting. I wasn't able to make an example where an IF statement short-circuits with a lag, and your example is a nice one. So hopefully whatever the un-documented short circuiting rules are, they include some logic to say that if an expression involves LAG or DIF, then don't short-circuit.
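Since the rules are undocumented, one way to avoid relying on them is to call LAG unconditionally in its own assignment and test the stored result. The sketch below reuses the HAVE data from the example above; the variable name prev_b is made up for illustration.

```sas
data have ;
  input a b c ;
  cards ;
0 1 0
1 2 1
;

data _null_ ;
  set have ;
  /* LAG runs on every observation here, so its queue is always updated, */
  /* whatever SAS decides to do with the compound condition below.       */
  prev_b = lag(b) ;
  if a=1 & prev_b=1 then put _n_ ;
run ;
```

The IF condition then contains only plain comparisons, so whether or not SAS short-circuits it makes no difference to the result.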
I don't think that SAS short-circuits evaluations in DATA steps. It might simplify obviously true or false WHERE conditions, but expressions in IF clauses or assignment statements have all of their subexpressions evaluated. Looking quickly at either of those links, I do not see anything that is actually about short-circuiting. Chris's blog post is about user logic that does not always execute the LAG() function, not about SAS short-circuiting a calculation and thereby skipping the LAG() function.
Here is an example where SAS could have short circuited the logic and does not. (Note perhaps the example is too simple as SAS might not bother with even attempting to short circuit).
data test;
do a=.,1;
do b=.,1;
output;
end;
end;
run;
data test2;
set test;
c = a*1 and b*1 ;
run;
proc print data= test2;
run;
The notes clearly show that the missing values of both A and B cause multiplication with missing values, even though for either the second or the third observation (depending on the order of operations) one of them could have been avoided.
123 data test2;
124 set test;
125 c = a*1 and b*1 ;
126 run;
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
2 at 125:8 2 at 125:16
NOTE: There were 4 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST2 has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Data:
Obs  a  b  c
1    .  .  0
2    .  1  0
3    1  .  0
4    1  1  1
I'm confused @Tom . I agree that Chris's post was not about short-circuiting in the sense I had assumed from the title (my fault for not reading much of it before citing it. : )
But I still see my example, and @FreelanceReinh 's better example, as examples of the IF statement short-circuiting:
1
2 data _null_;
3 x= 0 ;
4 y= . ;
5
6 if (x > 0) & (y+1) then foo=1; *This does short circuit ;
7 if (x > 0) & (1/x) then foo=1; *This does short circuit ;
8 run;
You don't get a missing-values note or a division-by-zero message.
If you add a NOT, SAS decides not to short-circuit:
10 data _null_;
11 x= 0 ;
12 y= . ;
13
14 if (x > 0) & NOT (y+1) then foo=1; *This does NOT short circuit ;
15 if (x > 0) & NOT (1/x) then foo=1; *This does NOT short circuit ;
16 run;
NOTE: Division by zero detected at line 15 column 22.
x=0 y=. foo=. _ERROR_=1 _N_=1
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 14:22
NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 15:22
I suppose it's possible that the assignment statement doesn't short-circuit, but it seems clear to me that the IF statement does.
Thanks for the interesting consideration of assignment statements.
Here's an example showing that short-circuiting (if it exists at all) has certain limitations:
%let n=300; /* compare to 301 */
data have;
array a[&n];
a1=1;
run;
%macro cond;
%do i=1 %to &n;
a&i+0=0 &
%end;
%mend cond;
data _null_;
set have;
if %cond 1;
run;
So the condition a&i+0=0 is never met and this could already be seen from the beginning "a1+0=0 & ..." since a1=1. Yet, for &n=301, ..., 1000 the note "operation on missing values" does occur (with a list of &n-1 "(Line):(Column)" positions), demonstrating that no short-circuiting has happened at all. However, for &n=2, ..., 300 the note does not occur, suggesting that short-circuiting has been applied. (Checked in a macro with Windows SAS 9.4M5.)
Using a different HAVE dataset, the (not-)short-circuiting can be observed in more detail:
%let n=300;
%let k=123;
data have;
array a[&n] (&k*0 %eval(&n-&k)*.);
run;
data _null_;
set have;
if %cond 1;
run;
Now the condition a&i+0=0 is met for the first &k variables, but not for the rest. With 2<=&n<=300 and 1<=&k<=&n-1 the log reports the "operation on missing values" for exactly one place:
NOTE: Missing values were generated as a result of performing an operation on missing values. Each place is given by: (Number of times) at (Line):(Column). 1 at 370:5
where the line number (here: 370) always equals 3*&k+1, suggesting that the first missing value in the array causes the note (and terminates the check of the IF condition), which is plausible assuming short-circuiting and a sequential check of the individual conditions a&i+0=0. As in the original example, no short-circuiting seems to occur with &n>300: operations on missing values are then detected in all lines 3*i+1, i=&k, ..., &n-1.
Given the above results, I would have thought that there might be a performance gain when in a Boolean expression of the form B1 & B2 & ... & B300 the first check (B1) fails vs. only the last (B300). But even with 10 million observations (22 GB) the run time differences were not very convincing:
1 %let n=300;
2
3 data have;
4 array a[&n] (&n*0);
5 a1=1;
6 do _n_=1 to 10000000;
7 output;
8 end;
9 run;
NOTE: The data set WORK.HAVE has 10000000 observations and 300 variables.
NOTE: DATA statement used (Total process time):
real time 17.77 seconds
cpu time 17.76 seconds
10
11 %macro cond;
12 %do i=1 %to &n;
13 a&i=0 &
14 %end;
15 %mend cond;
16
17 data _null_;
18 set have;
19 if %cond 1;
20 run;
NOTE: There were 10000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 15.88 seconds
cpu time 15.86 seconds
21
22 data have;
23 array a[&n] (&n*0);
24 a&n=1;
25 do _n_=1 to 10000000;
26 output;
27 end;
28 run;
NOTE: The data set WORK.HAVE has 10000000 observations and 300 variables.
NOTE: DATA statement used (Total process time):
real time 20.66 seconds
cpu time 20.67 seconds
29
30 data _null_;
31 set have;
32 if %cond 1;
33 run;
NOTE: There were 10000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 18.52 seconds
cpu time 18.51 seconds
If you want to prevent SAS from operating on the missing values then just write the code so that you prevent SAS from operating on the missing values. Make sure to test ALL of the variables that could have missing values.
if 3=n(of anrhi bnrlo base) then
if ( anrhi < aval <= 3.0 * anrhi
and bnrlo <= base <= bnrhi
)
or ( 1.5 * base <= aval <= 3.0 * base
and not (bnrlo <= base <= bnrhi)
)
then atoxgrh = "1" ;
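A nested-IF variant that keeps the two branches of the original condition separate might look like the sketch below. It is only an illustration under assumptions: the data set names bcp_sketch and lb_tox_sketch, the helper variable base_normal, and the data values are all invented here, and the logic has not been checked against the CTCAE macro.

```sas
/* Hypothetical test data; the values are invented for illustration only */
data bcp_sketch ;
  input anrhi bnrlo bnrhi base aval ;
  datalines ;
40 10 40 30 60
.  .  .  20 45
.  .  .  .  50
;

data lb_tox_sketch ;
  set bcp_sketch ;
  length atoxgrh $1 ;

  /* Comparisons with missing values do not trigger the note; only arithmetic does */
  base_normal = (bnrlo > .z and bnrlo <= base <= bnrhi) ;

  if base_normal then do ;
    /* First branch of the original condition: multiply only when anrhi is present */
    if anrhi > .z then do ;
      if anrhi < aval <= 3.0 * anrhi then atoxgrh = "1" ;
    end ;
  end ;
  else if base > .z then do ;
    /* Second branch (the one with the tilde): multiply only when base is present */
    if 1.5 * base <= aval <= 3.0 * base then atoxgrh = "1" ;
  end ;
run ;
```

Each multiplication sits behind its own IF statement, so it runs only when its operand is non-missing, and the second branch still applies when bnrlo is missing, as it does in the original condition.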