Solved: Re: in () affecting missing values

Kastchei · Posted 10-09-2019 07:16 PM

Edited to make the example simpler.

I'm seeing a difference in how SAS handles missing values when they are evaluated alongside a condition that uses the in () operator. The code below is very simple. The only difference between data steps ___b and ___c is how y is evaluated in the IF statement: y = 6 vs. y in (6). However, the log shows the missing-values NOTE below for data step ___b, but not for data step ___c.

NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 10619:8

What is it about adding a condition with the in () operator that suddenly makes SAS evaluate abs(x), whereas using = does not?

data ___a;
    x = .; y = 5; output;
run;

data ___b;
    set ___a;

    if y in (6) and 
       abs(x) = 5;
run;

data ___c;
    set ___a;

    if y = 6 and
       abs(x) = 5;
run;
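One way to stop depending on short-circuit behavior is to split the compound condition so that control flow, not the AND operator, enforces the evaluation order. Below is a minimal sketch that assumes the same one-row ___a. The output name ___d is hypothetical. A subsetting IF that is false ends processing of the current observation, so when y is not 6 the second IF, and with it abs(x), is never reached.

```sas
/* Recreate the one-observation sample data from the question */
data ___a;
    x = .; y = 5; output;
run;

/* ___d is a hypothetical output name used only for this sketch */
data ___d;
    set ___a;

    /* Subsetting IF: when false, SAS returns to the top of the step */
    /* and skips the statements below for this observation.          */
    if y in (6);

    /* Reached only when y is 6, so abs(x) is not evaluated for the  */
    /* sample row (y = 5).                                           */
    if abs(x) = 5;
run;
```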

Shmuel · Posted 10-09-2019 08:57 PM

SAS checks the conditions one by one.

If y > 5, then y is of course not in (1,2,3), so SAS skips checking the next condition.

When you mask (comment out) the IN condition, SAS has to check whether abs(x) = 5, and it writes the note.

Kastchei · Posted 10-10-2019 10:39 AM

I'm not sure I'm following. If SAS checks each condition one by one, then, in both ___b and ___c, SAS should stop checking after y > 5, meaning that in neither case should abs(.) be calculated. However, it is being calculated in the uncommented version, but not in the commented version. It's like the existence of the in () operator causes SAS to check all conditions even if the previous ones are already false.

I edited the original post to make the example simpler and the issue more clear.

Shmuel · Posted 10-10-2019 11:43 AM

It seems that you are right

"the existence of the in () operator causes SAS to check all conditions even if the previous ones are already false."

It may be because, when using the IN operator, SAS turns on a flag telling it to check all values in the () list, but does not turn it off when moving to the next condition.

Quentin · Posted 10-10-2019 11:53 AM

That is surprising. Unfortunately, SAS rules for short-circuiting (or not) are not well defined/documented. This blog post discusses it, and has some links. https://blogs.sas.com/content/iml/2019/08/14/short-circuit-evaluation-and-logical-ligatures-in-sas.h...

Tom · Posted 10-10-2019 12:16 PM

It looks like SAS does not build short-circuit logic into the code it generates when a condition uses the IN operator.

Remember that IN could refer to an array of variables rather than a simple list of constant values, so the compiler would need to know the difference between:

x in (1,2,3)

and

x in array_name

It needs to know this because it could hard-code the numbers at compile time for the first example. For the second example, it would need to generate code that checks the values of the array variables at execution time, since they could change.

It is also probably more work for the compiler to analyze

x in (1,2,3)

than a simple single-value comparison:

x = 6
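The sketch below illustrates Tom's distinction between a constant list and an array. The data set ___in_demo, the array vals and the flag variables are hypothetical names. The literal list (1, 2, 3) is fixed in the program text, while x in vals reads whatever the array holds when the statement executes, so changing an element changes the result. The sketch only shows the two forms. It does not show what code the compiler actually generates for either one.

```sas
/* ___in_demo, vals and the flag variables are hypothetical names */
data ___in_demo;
    array vals[3] _temporary_ (1, 2, 3);
    x = 2;

    /* Constant list: the values are literals in the program text */
    if x in (1, 2, 3) then in_list = 1;
    else in_list = 0;

    /* Array: the elements are read when the statement executes */
    if x in vals then in_array_before = 1;
    else in_array_before = 0;

    /* Change the array contents, then test the same value again */
    vals[2] = 99;

    if x in vals then in_array_after = 1;
    else in_array_after = 0;
run;
```

After the step, in_list and in_array_before are 1 and in_array_after is 0, because 2 is no longer an element of vals.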

ChrisNZ · Posted 10-10-2019 05:10 PM

> but would need to generate code to check the values of the variables in the array at execution time for the second example

I would be disappointed if the compiler did not translate

if VAL in ARR

to

if VAL=ARR[1] or VAL=ARR[2]

but it seems that it doesn't.

data ___b;           * realtime=2.5 seconds, no shortcut ;
  set ___a;
  do i=1 to 1e7;
    if Y in(1, 2) & abs(X)=5 then;
  end; 
run;
       
data ___b;           * realtime=3 seconds, no shortcut;
  set ___a;
  array ARR [2] _temporary_ (1,2) ;
  do i=1 to 1e7;
    if Y in ARR & abs(X)=5 then;
  end;
run;

data ___b;           * realtime=1 second, shortcut used;
  set ___a;
  array ARR [2] _temporary_ (1,2) ;
  do i=1 to 1e7;
    if (Y=ARR[1] | Y=ARR[2]) & abs(X)=5 then;
  end;
run;
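The hand-expanded form (Y=ARR[1] | Y=ARR[2]) has to be written out one element at a time. For a longer array, the same per-element comparisons can go in a loop. Below is a minimal sketch with the hypothetical names ___e, found and k. The DO UNTIL loop stops at the first matching element. The nested IF-THEN then evaluates abs(X) only after a match, so the result does not depend on whether SAS short-circuits the AND. The sketch has not been timed against the versions above.

```sas
/* Recreate the one-observation sample data from the question */
data ___a;
    x = .; y = 5; output;
run;

/* ___e, FOUND and K are hypothetical names used only for this sketch */
data ___e;
    set ___a;
    array ARR [2] _temporary_ (1, 2);

    /* Compare Y with each element; UNTIL ends the loop at the first match */
    found = 0;
    do k = 1 to dim(ARR) until (found);
        found = (Y = ARR[k]);
    end;

    /* abs(X) is evaluated only when Y matched an element of ARR */
    if found then if abs(X) = 5 then output;

    drop k found;
run;
```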

High-Performance SAS Coding - Third Edition

FreelanceReinh · Posted 10-11-2019 05:24 AM

Sadly, not even the most obvious FALSE conditions, i.e. a literal zero or numeric missing, trigger short-circuiting in an IF statement:

data _null_;
set ___a;
if 0 and abs(x)=5;
run;

NOTE: Missing values were generated as a result of performing an operation on missing values.

(Similar example here: https://communities.sas.com/t5/SAS-Programming/If-statement-Short-circuiting-and-Lag-function/m-p/47...)

Quentin · Posted 10-11-2019 09:47 AM

@FreelanceReinh wrote:

Sadly, not even the most obvious FALSE conditions, i.e. a literal zero or numeric missing, trigger short-circuiting in an IF statement:
data _null_;
set ___a;
if 0 and abs(x)=5;
run;

NOTE: Missing values were generated as a result of performing an operation on missing values.
(Similar example here: https://communities.sas.com/t5/SAS-Programming/If-statement-Short-circuiting-and-Lag-function/m-p/47...)

Funny that IF 0 will not shortcut but IF 0=1 will shortcut:

data ___c;
    set ___a;

    if 0=1 and
       abs(x) = 5;
run;
NOTE: There were 1 observations read from the data set WORK.___A.
NOTE: The data set WORK.___C has 0 observations and 2 variables.

Kastchei · Posted 10-11-2019 01:12 PM

Thanks everyone for the information. I didn't realize short-circuiting had a term, so I've learned a bit. I appreciate it. It looks like the summary is simply that SAS has not set up short-circuiting for use with the in() operator at this time.
