Kastchei
Pyrite | Level 9

Edited to make the example simpler.

 

I'm seeing a difference in how SAS is handling missing values when evaluated alongside a condition using the in () operator.  The below code is really simplistic.  The only difference between data steps ___b and ___c is how y is evaluated in the if statement: y = 6 vs. y in (6).  However, the log shows the below missing NOTE for data step ___b, but not for data step ___c.

 

NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 10619:8

 

What is it about adding a condition using the in () operator that suddenly makes SAS try to resolve abs(x), whereas using = does not?

 

data ___a;
    x = .; y = 5; output;
run;

data ___b;
    set ___a;

    if y in (6) and 
       abs(x) = 5;
run;

data ___c;
    set ___a;

    if y = 6 and
       abs(x) = 5;
run;

 

1 ACCEPTED SOLUTION

Tom
Super User

Looks like SAS has not implemented building short circuit logic into the code generated when the condition uses the IN operator. 

 

Remember that IN could refer to an array of variables rather than a simple list of constant values, so the compiler would need to know the difference between:

x in (1,2,3)

and 

x in array_name

The distinction matters: for the first form it could hard-code the numbers at compile time, but for the second it would need to generate code that checks the values of the variables in the array at execution time, since they could change.

 

It is also probably more work to understand  

x in (1,2,3)

than a simple single value comparison:

x = 6
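
As a rough illustration of the two forms of IN described above, here is a minimal sketch. The array name vals and the values used are made up for the example; it only shows that the constant list is fixed when the step is compiled, while the array elements are looked up when the step executes.

```sas
data _null_;
    /* Hypothetical temporary array, used only for this illustration */
    array vals[3] _temporary_ (1, 2, 3);
    x = 2;

    /* Constant list: the values are known at compile time */
    if x in (1, 2, 3) then put 'x matches the constant list';

    /* Array: the elements are checked at execution time */
    if x in vals then put 'x matches the array';

    /* Changing an element changes the result of the same test */
    vals[2] = 99;
    if not (x in vals) then put 'x no longer matches the array';
run;
```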


9 REPLIES
Shmuel
Garnet | Level 18

SAS checks the conditions one by one.

If y > 5, then of course y is not in (1,2,3), so SAS skips checking the next condition.

When you mask the IN condition, SAS has to check whether abs(x) = 5, and it writes the note.

Kastchei
Pyrite | Level 9

I'm not sure I'm following.  If SAS checks each condition one by one, then, in both ___b and ___c, SAS should stop checking after y > 5, meaning that in neither case should abs(.) be calculated.  However, it is being calculated in the uncommented version, but not in the commented version.  It's like the existence of the in () operator causes SAS to check all conditions even if the previous ones are already false.

 

I edited the original post to make the example simpler and the issue more clear.

Shmuel
Garnet | Level 18

It seems that you are right:

       "the existence of the in () operator causes SAS to check all conditions even if the previous ones are already false."

It may be because, when using the IN operator, SAS turns on a flag telling it to check all values in the () list, but does not turn it off when moving to the next condition.

Quentin
Super User

That is surprising.  Unfortunately, SAS rules for short-circuiting (or not) are not well defined/documented.  This blog post discusses it, and has some links.  https://blogs.sas.com/content/iml/2019/08/14/short-circuit-evaluation-and-logical-ligatures-in-sas.h...

BASUG is hosting free webinars Next up: Mike Sale presenting Data Warehousing with SAS April 10 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
ChrisNZ
Tourmaline | Level 20

> but would need to generate code to check the values of the variables in the array at execution time for the second example

I would be disappointed if the compiler did not translate 

if VAL in ARR

to

if VAL=ARR[1] or VAL=ARR[2]

but it seems that it doesn't.

 

data ___b;           * realtime=2.5 seconds, no shortcut ;
  set ___a;
  do i=1 to 1e7;
    if Y in(1, 2) & abs(X)=5 then;
  end; 
run;
       
data ___b;           * realtime=3 seconds, no shortcut;
  set ___a;
  array ARR [2] _temporary_ (1,2) ;
  do i=1 to 1e7;
    if Y in ARR & abs(X)=5 then;
  end;
run;

data ___b;           * realtime=1 second, shortcut used;
  set ___a;
  array ARR [2] _temporary_ (1,2) ;
  do i=1 to 1e7;
    if (Y=ARR[1] | Y=ARR[2]) & abs(X)=5 then;
  end;
run;

 

 

FreelanceReinh
Jade | Level 19

Sadly, not even the most obvious FALSE conditions, i.e. a literal zero or numeric missing, trigger short-circuiting in an IF statement:

538  data _null_;
539  set ___a;
540  if 0 and abs(x)=5;
541  run;

NOTE: Missing values were generated as a result of performing an operation on missing values.

(Similar example here: https://communities.sas.com/t5/SAS-Programming/If-statement-Short-circuiting-and-Lag-function/m-p/47...)

Quentin
Super User

@FreelanceReinh wrote:

Sadly, not even the most obvious FALSE conditions, i.e. a literal zero or numeric missing, trigger short-circuiting in an IF statement:

538  data _null_;
539  set ___a;
540  if 0 and abs(x)=5;
541  run;

NOTE: Missing values were generated as a result of performing an operation on missing values.

(Similar example here: https://communities.sas.com/t5/SAS-Programming/If-statement-Short-circuiting-and-Lag-function/m-p/47...)


 

Funny that IF 0 will not short-circuit, but IF 0=1 will:

 

25   data ___c;
26       set ___a;
27
28       if 0=1 and
29          abs(x) = 5;
30   run;

NOTE: There were 1 observations read from the data set WORK.___A.
NOTE: The data set WORK.___C has 0 observations and 2 variables.

 

Kastchei
Pyrite | Level 9

Thanks everyone for the information.  I didn't realize short-circuiting had a term, so I've learned a bit.  I appreciate it.  It looks like the summary is simply that SAS has not set up short-circuiting for use with the in() operator at this time.
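
One way to avoid depending on short-circuit behavior at all is to nest the tests, so that abs(x) is only reached when the IN test is true. The sketch below is not from the thread; the data set name ___b_nested is made up, and the explicit OUTPUT stands in for the subsetting IF of the original example.

```sas
data ___a;
    x = .; y = 5; output;
run;

data ___b_nested;   /* hypothetical name for this illustration */
    set ___a;

    /* The outer test runs first; the inner statement, and so abs(x), */
    /* is only executed when y in (6) is true.                        */
    if y in (6) then do;
        if abs(x) = 5 then output;
    end;
run;
```

With the sample row (y = 5), the inner IF is never executed, so abs(x) should not be evaluated on the missing x and the missing-values NOTE should not appear.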
