KevinViel
Pyrite | Level 9

Consider the following log:

 

1552  if    (     anrhi > .z
1553          and anrhi < aval <= 3.0 * anrhi
1554          and bnrlo > .z
1555          and bnrlo <= base <= bnrhi
1556        )
1557     or (     base > .z
1558          and 1.5 * base <= aval <= 3.0 * base
1559          and ~ (     bnrlo > .z
1560                  and bnrlo <= base <= bnrhi
1561                )
1562        )
1563  then atoxgrh = "1" ;
1564
1565  if    (     anrhi > .z
1566          and anrhi < aval <= 3.0 * anrhi
1567          and bnrlo > .z
1568          and bnrlo <= base <= bnrhi
1569        )
1570     or (     base > .z
1571          and 1.5 * base <= aval <= 3.0 * base
1572          and   (     bnrlo > .z
1573                  and bnrlo <= base <= bnrhi
1574                )
1575        )
1576  then atoxgrh = "T" ;
1577
1578  end ;
1579  end ;
1580  run ;

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      18 at 1553:33   36 at 1558:17   36 at 1558:39
NOTE: There were 105 observations read from the data set WORK.BCP.
      WHERE paramcd='ALT';
NOTE: The data set WORK.LB_TOX has 105 observations and 97 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

I copied the first IF statement and deleted the tilde (~) in the copy.  SAS does not generate the NOTE for the second statement.

Column 33 is the space after "*" and before "anrhi":

3.0 * anrhi
     ^
     ^

I was thinking that since anrhi > .z failed, the subsequent conditions would not be evaluated.  That seems to be true if I omit the tilde.  What am I overlooking?

 

Any corrections or citations are appreciated.

 

PS This is part of a version 5.0 CTCAE toxicity grade macro.

1 ACCEPTED SOLUTION

Quentin
Super User

Hey Kevin,

 

I don't think short-circuiting is actually defined for SAS.  I did see a bit in the DS2 documentation where it says:

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/ds2pg/p0vinswrxk1819n1qizkibfin0cy.htm

SAS does not guarantee short-circuit evaluation. When using Boolean operators to join expressions, you might get undesired results if your intention is to short circuit, or avoid the evaluation of, the second expression. To guarantee the order in which SAS evaluates an expression, you can rewrite the expression using nested IF statements.

My guess is a similar thing might be happening on your IF statement.

 

Even this simple statement doesn't seem to short-circuit:

1    data _null_ ;
2      x=. ;
3      if 0 and x*1 then foo=1;
4    run ;

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 3:13

I think of it kind of like the SQL optimizer, where I don't get to tell SAS how to evaluate an expression, so SAS can decide whether to short-circuit or not.
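
As a minimal sketch of the nested-IF rewrite that the quoted documentation suggests (the variable name flag is made up for illustration), the arithmetic can be moved into an inner IF so that it is only reached when the outer guard is true:

```sas
data _null_ ;
  x    = . ;
  flag = 0 ;   /* hypothetical guard; stands in for a test such as anrhi > .z */

  /* The multiplication sits in its own statement inside the DO block,
     so it is only reached when the outer condition is true. */
  if flag then do ;
    if x * 1 then foo = 1 ;
  end ;
run ;
```

With flag=0 the inner IF statement is never executed, so this step should not write the missing-values NOTE to the log.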

13 REPLIES
JOL
SAS Employee

Notice that you received a "NOTE" not an "ERROR" message, so SAS will do exactly what it is telling you in the log, fill in a missing value and continue processing.  

KevinViel
Pyrite | Level 9

Indeed, but I want a "clean" log because I report the log check to the lst file.

 

In general, I want our programmers to command their programs and to know when a value is missing.  Such a message may indicate an issue that I may need to investigate.

ballardw
Super User

You should include the entire log of that data step.

And best to provide some example of data that creates the same messages.

Then, what exactly is your question? SAS found missing values where you attempted to use them in the code.

 

When you start using lots of conditions nested inside () then the behavior of "short-cutting" a comparison may change because your code imposes additional order constraints.

 

The * is a priority 2 instruction and is evaluated before the AND or any of the value comparisons.

Search your online documentation for "Order of Evaluation in Compound Expressions"

 

I can't get the documentation site to return a search with a small enough set to find this or I would include this.

 

If you really want a part of an expression evaluated before another you may need to use some () to enforce your required order.

Quentin
Super User

Hi @Kurt_Bremser ,

 

I think SAS does sometimes short-circuit.  See e.g. Rick Wicklin's post https://blogs.sas.com/content/iml/2019/08/14/short-circuit-evaluation-and-logical-ligatures-in-sas.h...   and Chris Hemedinger's post where he says it's not safe to use LAG in a compound expression, due to the potential for short-circuiting: https://blogs.sas.com/content/sasdummy/2012/01/03/pitfalls-of-the-lag-function/  

 

Here's a little example of apparent short-circuiting:

1    data _null_;
2      x= -1 ;
3      y= .  ;
4
5      if (x>0) &  ((y+1)>0) then foo=1; *This does short circuit ;
6      if (0)   &  ((y+1)>0) then foo=1; *This does NOT short circuit ;
7      if (x>0) & ~((y+1)>0) then foo=1; *This does NOT short circuit ;
8    run;

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 6:18   1 at 7:18
KevinViel
Pyrite | Level 9

In addition, the example shows short-circuiting.  The tilde (not) is short-circuiting the short circuit, like the "if 0" did in your first example.

 

I will re-write that.

 

I note, without example, that %IF statements (in macros) suffer the same.  I nest %IF statements instead of using an AND with %SYMEXIST.
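
A minimal sketch of that nesting, for illustration only (the macro name check_flag and the macro variable flag are hypothetical): the reference &flag appears only inside the block guarded by %SYMEXIST, so the macro processor is not asked to resolve it when the variable does not exist.

```sas
%macro check_flag ;
  %* Outer test: does the macro variable exist at all? ;
  %if %symexist(flag) %then %do ;
    %* Inner test: only reached when flag exists, so resolving it is safe ;
    %if &flag = 1 %then %put flag exists and equals 1 ;
    %else %put flag exists but is not 1 ;
  %end ;
  %else %put flag does not exist ;
%mend check_flag ;

%check_flag
```

Writing the same test as a single %IF with AND would put &flag into the expression even when the variable is undefined, which is the situation the nesting is meant to avoid.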

FreelanceReinh
Jade | Level 19

@Quentin wrote:

I think SAS does sometimes short-circuit.  See e.g. Rick Wicklin's post https://blogs.sas.com/content/iml/2019/08/14/short-circuit-evaluation-and-logical-ligatures-in-sas.h...   and Chris Hemedinger's post where he says it's not safe to use LAG in a compound expression, due to the potential for short-circuiting: https://blogs.sas.com/content/sasdummy/2012/01/03/pitfalls-of-the-lag-function/  

Hi @Quentin,

 

I agree that short-circuiting sometimes occurs, but I believe that @Kurt_Bremser is right about the LAG function. At least that was my conclusion regarding the program from Chris Hemedinger's 2012 blog post in the 2018 discussion "If statement Short circuiting and Lag function". I would hope that the occurrence of the LAG function in an IF condition prevents short-circuiting as in the example below.

data have;
input a b c;
cards;
0 1 0
1 2 1
;

data _null_;
set have;
if a=1 & lag(b)=1 then put _n_; /* not short-circuiting */
run;

data _null_;
set have;
if a=1 & 1/c>0 then put _n_; /* short-circuiting */
run;

data _null_;
set have;
if a=1 & 1/c>0 & lag(b)=1 then put _n_; /* not short-circuiting */
run;

Log:

613   data have;
614   input a b c;
615   cards;

NOTE: The data set WORK.HAVE has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.04 seconds


618   ;
619
620   data _null_;
621   set have;
622   if a=1 & lag(b)=1 then put _n_; /* not short-circuiting */
623   run;

2
NOTE: There were 2 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           0.08 seconds
      cpu time            0.07 seconds


624
625   data _null_;
626   set have;
627   if a=1 & 1/c>0 then put _n_; /* short-circuiting */
628   run;

2
NOTE: There were 2 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.07 seconds


629
630   data _null_;
631   set have;
632   if a=1 & 1/c>0 & lag(b)=1 then put _n_; /* not short-circuiting */
633   run;

NOTE: Division by zero detected at line 632 column 11.
a=0 b=1 c=0 _ERROR_=1 _N_=1
2
NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 632:11
NOTE: There were 2 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           0.12 seconds
      cpu time            0.10 seconds
Quentin
Super User

Thanks  @FreelanceReinh .  I appreciate your examples, and the links to prior discussion. 

 

I share your hope the occurrence of the LAG function in an IF statement expression prevents short-circuiting.  I wasn't able to make an example where an IF statement short-circuits with a lag, and your example is a nice one.  So hopefully whatever the un-documented short circuiting rules are, they include some logic to say that if an expression involves LAG or DIF, then don't short-circuit.
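
One common way to avoid depending on these undocumented rules is to call LAG unconditionally in its own assignment statement and then test the stored result. A minimal sketch using the same HAVE data as the example above (prev_b is a hypothetical variable name):

```sas
data have ;
  input a b c ;
  datalines ;
0 1 0
1 2 1
;

data _null_ ;
  set have ;
  /* LAG is executed on every observation, whatever the IF below decides */
  prev_b = lag(b) ;
  if a = 1 and prev_b = 1 then put _n_ ;
run ;
```

Because the LAG queue is updated once per observation in a separate statement, the result no longer depends on whether SAS short-circuits the compound condition.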

 

 

 

Tom
Super User

I don't think that SAS short circuits evaluations in data steps.  It might simplify obviously true or false WHERE conditions, but expressions in IF clauses or assignment statements have all of the expressions evaluated.  Looking quickly at either of those links, I do not see anything that is actually about short circuiting.  Chris's blog post is about user logic that does not always execute the LAG() function, not about SAS short circuiting a calculation and causing it not to execute the LAG() function.

 

Here is an example where SAS could have short circuited the logic and does not.  (Note perhaps the example is too simple as SAS might not bother with even attempting to short circuit).

data test;
  do a=.,1;
    do b=.,1;
      output;
    end;
  end;
run;

data test2;
  set test;
  c = a*1 and b*1 ;
run;

proc print data= test2;
run;

The notes clearly show that the missing values of both A and B cause multiplication with missing values, even though for either the second or the third observation (depending on order of operation) one of them could have been avoided.

123  data test2;
124    set test;
125    c = a*1 and b*1 ;
126  run;

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      2 at 125:8    2 at 125:16
NOTE: There were 4 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST2 has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Data:

Obs    a    b    c

 1     .    .    0
 2     .    1    0
 3     1    .    0
 4     1    1    1
Quentin
Super User

I'm confused @Tom .  I agree that Chris's post was not about short-circuiting in the sense I had assumed from the title (my fault for not reading much of it before citing it. : )

 

But I still see my example, and @FreelanceReinh 's better example, as examples of the IF statement short-circuiting:

1
2    data _null_;
3      x= 0 ;
4      y= . ;
5
6      if (x > 0) & (y+1) then foo=1; *This does short circuit ;
7      if (x > 0) & (1/x) then foo=1; *This does short circuit ;
8    run;

You don't get a missing values note or a division-by-zero note.

 

If you add a NOT, SAS decides not to short-circuit:

 

10   data _null_;
11     x= 0 ;
12     y= . ;
13
14     if (x > 0) & NOT (y+1) then foo=1; *This does NOT short circuit ;
15     if (x > 0) & NOT (1/x) then foo=1; *This does NOT short circuit ;
16   run;

NOTE: Division by zero detected at line 15 column 22.
x=0 y=. foo=. _ERROR_=1 _N_=1
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 14:22
NOTE: Mathematical operations could not be performed at the following places. The results of the
      operations have been set to missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 15:22

I suppose it's possible that the assignment statement doesn't short-circuit, but it seems clear to me that the IF statement does.

FreelanceReinh
Jade | Level 19

Thanks for the interesting consideration of assignment statements.

 

Here's an example showing that short-circuiting (if it exists at all) has certain limitations:

%let n=300; /* compare to 301 */

data have;
array a[&n];
a1=1;
run;

%macro cond;
%do i=1 %to &n;
a&i+0=0 &
%end;
%mend cond;

data _null_;
set have;
if %cond 1;
run;

So the condition a&i+0=0 is never met and this could already be seen from the beginning "a1+0=0 & ..." since a1=1. Yet, for &n=301, ..., 1000 the note "operation on missing values" does occur (with a list of &n-1 "(Line):(Column)" positions), demonstrating that no short-circuiting has happened at all. However, for &n=2, ..., 300 the note does not occur, suggesting that short-circuiting has been applied. (Checked in a macro with Windows SAS 9.4M5.)

 

Using a different HAVE dataset, the (not-)short-circuiting can be observed in more detail:

%let n=300;
%let k=123;

data have;
array a[&n] (&k*0 %eval(&n-&k)*.);
run;

data _null_;
set have;
if %cond 1;
run;

Now the condition a&i+0=0 is met for the first &k variables, but not for the rest. With 2<=&n<=300 and 1<=&k<=&n-1 the log reports the "operation on missing values" for exactly one place:

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 370:5

where the line number (here: 370) always equals 3*&k+1, suggesting that the first missing value in the array causes the note (and terminates the check of the IF condition), which is plausible assuming short-circuiting and a sequential check of the individual conditions a&i+0=0. As in the original example, no short-circuiting seems to occur with &n>300: operations on missing values are then detected in all lines 3*i+1, i=&k, ..., &n-1.

 

Given the above results, I would have thought that there might be a performance gain when in a Boolean expression of the form B1 & B2 & ... & B300 the first check (B1) fails vs. only the last (B300). But even with 10 million observations (22 GB) the run time differences were not very convincing:

1     %let n=300;
2
3     data have;
4     array a[&n] (&n*0);
5     a1=1;
6     do _n_=1 to 10000000;
7       output;
8     end;
9     run;

NOTE: The data set WORK.HAVE has 10000000 observations and 300 variables.
NOTE: DATA statement used (Total process time):
      real time           17.77 seconds
      cpu time            17.76 seconds


10
11    %macro cond;
12    %do i=1 %to &n;
13    a&i=0 &
14    %end;
15    %mend cond;
16
17    data _null_;
18    set have;
19    if %cond 1;
20    run;

NOTE: There were 10000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           15.88 seconds
      cpu time            15.86 seconds


21
22    data have;
23    array a[&n] (&n*0);
24    a&n=1;
25    do _n_=1 to 10000000;
26      output;
27    end;
28    run;

NOTE: The data set WORK.HAVE has 10000000 observations and 300 variables.
NOTE: DATA statement used (Total process time):
      real time           20.66 seconds
      cpu time            20.67 seconds


29
30    data _null_;
31    set have;
32    if %cond 1;
33    run;

NOTE: There were 10000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           18.52 seconds
      cpu time            18.51 seconds

 

 

Tom
Super User

If you want to prevent SAS from operating on the missing values then just write the code so that you prevent SAS from operating on the missing values.  Make sure to test ALL of the variables that could have missing values.

 

If 3=n(of anrhi bnrlo base) then
if    ( anrhi < aval <= 3.0 * anrhi
    and bnrlo <= base <= bnrhi
      )
   or ( 1.5 * base <= aval <= 3.0 * base
    and not (bnrlo <= base <= bnrhi)
      )
then atoxgrh = "1" ;
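
Another option, sketched here under the assumption that the goal is to keep the original two-branch logic unchanged, is to nest the IF statements so that each multiplication is only reached after its operand has been checked. The data set name demo, the helper variables base_in_range and atox_hit, and the three test rows are invented for illustration:

```sas
data demo ;
  length atoxgrh $1 ;
  input anrhi aval bnrlo bnrhi base ;

  /* Comparisons alone do not trigger the NOTE, so this flag is safe to
     compute even when bnrlo, bnrhi or base are missing. */
  base_in_range = ( bnrlo > .z and bnrlo <= base <= bnrhi ) ;

  atox_hit = 0 ;
  if anrhi > .z and base_in_range then do ;
    /* 3.0 * anrhi is only evaluated when anrhi is not missing */
    if anrhi < aval <= 3.0 * anrhi then atox_hit = 1 ;
  end ;
  else if base > .z and not base_in_range then do ;
    /* 1.5 * base and 3.0 * base are only evaluated when base is not missing */
    if 1.5 * base <= aval <= 3.0 * base then atox_hit = 1 ;
  end ;

  if atox_hit then atoxgrh = "1" ;
  drop base_in_range atox_hit ;
  datalines ;
40 60 10 40 30
. 60 . . 30
. 60 . . .
;
run ;
```

The two branches are mutually exclusive on base_in_range, so the ELSE IF form should select the same observations as the original OR of the two parenthesized conditions.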
