Edited to make the example simpler.
I'm seeing a difference in how SAS handles missing values when they are evaluated alongside a condition that uses the in () operator. The code below is deliberately simple. The only difference between data steps ___b and ___c is how y is evaluated in the if statement: y in (6) vs. y = 6. However, the log shows the "missing values" NOTE below for data step ___b, but not for data step ___c.
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 10619:8
What is it about adding a condition using the in () operator that suddenly makes SAS try to resolve abs(x), whereas using = does not?
data ___a;
x = .; y = 5; output;
run;
data ___b;
set ___a;
if y in (6) and
abs(x) = 5;
run;
data ___c;
set ___a;
if y = 6 and
abs(x) = 5;
run;
Looks like SAS has not implemented building short circuit logic into the code generated when the condition uses the IN operator.
Remember that IN could refer to an array of variables rather than a simple list of constant values, so the compiler would need to know the difference between:
x in (1,2,3)
and
x in array_name
It needs to know that since for the first it could hard code the numbers at compile time, but would need to generate code to check the values of the variables in the array at execution time for the second example since they could change.
It is also probably more work to understand
x in (1,2,3)
than a simple single value comparison:
x = 6
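To make the distinction concrete, here is a minimal sketch of the array form of IN. The names arr and x are only for illustration, and I have not timed or profiled it. The point is just that the list being searched can change while the step runs, which a list of constants cannot:

```sas
data _null_;
  /* Hypothetical temporary array; its contents can change at execution time. */
  array arr[3] _temporary_ (1, 2, 3);
  x = 2;
  if x in arr then put 'x found in arr';
  /* Overwrite one element, then search the same array again. */
  arr[2] = 99;
  if not (x in arr) then put 'x no longer in arr';
run;
```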
SAS checks the conditions one by one.
If y > 5, then of course y is not in (1,2,3), so SAS skips checking the next condition.
When you mask the IN condition, SAS has to check whether abs(x) = 5, and it writes the note.
I'm not sure I'm following. If SAS checks each condition one by one, then, in both ___b and ___c, SAS should stop checking after y > 5, meaning that in neither case should abs(.) be calculated. However, it is being calculated in the uncommented version, but not in the commented version. It's like the existence of the in () operator causes SAS to check all conditions even if the previous ones are already false.
I edited the original post to make the example simpler and the issue more clear.
It seems that you are right:
"the existence of the in () operator causes SAS to check all conditions even if the previous ones are already false."
It may be because, when the IN operator is used, SAS turns on a flag telling it to check all values in the () list but does not turn it off when moving on to the next condition.
That is surprising. Unfortunately, SAS rules for short-circuiting (or not) are not well defined/documented. This blog post discusses it, and has some links. https://blogs.sas.com/content/iml/2019/08/14/short-circuit-evaluation-and-logical-ligatures-in-sas.h...
> but would need to generate code to check the values of the variables in the array at execution time for the second example
I would be disappointed if the compiler did not translate
if VAL in ARR
to
if VAL=ARR[1] or VAL=ARR[2]
but it seems that it doesn't.
data ___b; * realtime=2.5 seconds, no shortcut ;
set ___a;
do i=1 to 1e7;
if Y in(1, 2) & abs(X)=5 then;
end;
run;
data ___b; * realtime=3 seconds, no shortcut;
set ___a;
array ARR [2] _temporary_ (1,2) ;
do i=1 to 1e7;
if Y in ARR & abs(X)=5 then;
end;
run;
data ___b; * realtime=1 second, shortcut used;
set ___a;
array ARR [2] _temporary_ (1,2) ;
do i=1 to 1e7;
if (Y=ARR[1] | Y=ARR[2]) & abs(X)=5 then;
end;
run;
Sadly, not even the most obvious FALSE conditions, i.e. a literal zero or numeric missing, trigger short-circuiting in an IF statement:
538  data _null_;
539  set ___a;
540  if 0 and abs(x)=5;
541  run;
NOTE: Missing values were generated as a result of performing an operation on missing values.
(Similar example here: https://communities.sas.com/t5/SAS-Programming/If-statement-Short-circuiting-and-Lag-function/m-p/47...)
Funny that IF 0 will not shortcut but IF 0=1 will shortcut:
25   data ___c;
26   set ___a;
27
28   if 0=1 and
29   abs(x) = 5;
30   run;
NOTE: There were 1 observations read from the data set WORK.___A.
NOTE: The data set WORK.___C has 0 observations and 2 variables.
Thanks everyone for the information. I didn't realize short-circuiting had a term, so I've learned a bit. I appreciate it. It looks like the summary is simply that SAS has not set up short-circuiting for use with the in() operator at this time.
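For anyone who needs to avoid the NOTE in the meantime, one possible workaround is to stop relying on short-circuiting and nest the tests instead, so that abs(x) is only reached when the IN test is true. This is a sketch against the toy data above. The data set name ___b2 is made up for illustration, and the explicit OUTPUT stands in for the subsetting IF:

```sas
data ___b2;
  set ___a;
  /* Outer test first: the inner statement only runs when y is in the list. */
  if y in (6) then do;
    /* abs(x) is evaluated only for rows that passed the IN test. */
    if abs(x) = 5 then output;
  end;
run;
```

The rows kept should be the same as with the combined condition, since a row is written only when both tests are true.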