DATA Step, Macro, Functions and more

if vs where

Accepted Solution Solved
Reply
Super Contributor
Posts: 484
Accepted Solution

if vs where

Why does 

if ECLASS IN ("F1","F2","M1","M2","O1","O2","VQF") AND  RANKREV NOT IN ("5","6") ;

produce a different number of records than

Where ECLASS IN ("F1","F2","M1","M2","O1","O2","VQF") AND  RANKREV NOT IN ("5","6") ;

 

The only difference is the where vs if.


Accepted Solutions
Solution
‎12-07-2016 07:50 PM
Valued Guide
Posts: 797

Re: if vs where

[ Edited ]

DavidPhillips2 wrote:

Its because of nulls.  If statements keep nulls where's do not.



Not True, Not True, Not True.

 

Consider:  In the logical expression

      ECLASS IN ("F1","F2","M1","M2","O1","O2","VQF") AND  RANKREV NOT IN ("5","6")

the only part where a null value should pass is in the "RANKREV NOT IN ("5","6")

 

So let's test that "NOT IN" clause with SASHELP.CLASS.  Set up data set HAVE with the first six observations from sashelp.class and blank out the name variable for the 5th obs, yielding these six name values, in order:   "Alfred", "Alice", "Barbara", "Carol", "", "James".

 

Then run the logical condition;
    NOT IN ("Alfred","Barbara")
once via subsetting IF and once via WHERE.  The program below shows that results are the same - i.e. null character values are treated the same by where and subsetting if:

data have;
  set sashelp.class (keep=name obs=6);
  if _n_=5 then name=' ';
  put (_n_ name) (:);
run;


data _null_;
  set have;
  if name not in ('Alfred','Carol');  
  put (_n_ name) (:);
run;


data _null_;
  set have;
  where name not in ('Alfred','Carol'); 
  put (_n_ name) (:);
run;

 

So what could explain the difference that you get?  Notice that the values of _N_ reported via the PUT statements in the program differ between the IF and WHERE tests.  That's because the subsetting IF brings every data record into the PDV, assigns an incremented value of _N_ and then deletes it when the logical condition is not satisfied..  But the WHERE statement outsources the filtering to the input/output engine, so only the successful records are brought into the PDV and therefore assigned an incremented value of _N_.

 

To test this rerun the above program with OPTIONS OBS=3 preceding the first data _null_, and you'll see that the subsetting IF does not report the null value.  The reason is not because it is NULL, it's because the null is in record 5 - i.e. beyond the global OBS limit.

 

Would that explain your results?

 

 

If this is your problem, then issue the statement

   options obs=max;

and rerun your programs.

 

regards,

Mark

 

 

*** Edited addition ***

My response explained why you would get a blank in the WHERE but not the IF but you report the opposite.  That can also be reproduced in the program above.  Make these changes:

 

  1. Change "IF _N_=5 ..." to "IF _N_=2 ...".
  2. Prior to "data have" make sure you have
    1. OPTIONS OBS=MAX;
    2. OPTIONS FIRSTOBS=1;
  3. Prior to the first DATA _NULL_ add:
    1. OPTIONS FIRSTOBS=2;

 

Rerun the program and you'll see a null character value pass the IF test, but not the WHERE test.  Again it's not because of the value, it's due to the location of the record containing the null value vs the value of FIRSTOBS.

 

So I'm betting you might have a global OPTIONS FIRSTOBS set in your program. And also that your IF test produces the NULL value at the beginning of the resulting data set.  Is that the case?

 

So the take-home message is

   WHERE is honored before FIRSTOBS and OBS

 

MK

 

 

View solution in original post


All Replies
Super Contributor
Posts: 484

If vs Where

Why does 

if ECLASS IN ("F1","F2","M1","M2","O1","O2","VQF") AND  RANKREV NOT IN ("5","6") ;

produce a different number of records than

Where ECLASS IN ("F1","F2","M1","M2","O1","O2","VQF") AND  RANKREV NOT IN ("5","6") ;

 

The only difference is the where vs if.

PROC Star
Posts: 554

Re: if vs where

Can you post the context where it yields a different number of observations? Smiley Happy

 

Generally the where statement specifies which observations are read into the PDV while the subsetting if statement specifies that you output that record only if the condition is met.

Super Contributor
Posts: 484

Re: if vs where

Its because of nulls.  If statements keep nulls where's do not.

Solution
‎12-07-2016 07:50 PM
Valued Guide
Posts: 797

Re: if vs where

[ Edited ]

DavidPhillips2 wrote:

Its because of nulls.  If statements keep nulls where's do not.



Not True, Not True, Not True.

 

Consider:  In the logical expression

      ECLASS IN ("F1","F2","M1","M2","O1","O2","VQF") AND  RANKREV NOT IN ("5","6")

the only part where a null value should pass is in the "RANKREV NOT IN ("5","6")

 

So let's test that "NOT IN" clause with SASHELP.CLASS.  Set up data set HAVE with the first six observations from sashelp.class and blank out the name variable for the 5th obs, yielding these six name values, in order:   "Alfred", "Alice", "Barbara", "Carol", "", "James".

 

Then run the logical condition;
    NOT IN ("Alfred","Barbara")
once via subsetting IF and once via WHERE.  The program below shows that results are the same - i.e. null character values are treated the same by where and subsetting if:

data have;
  set sashelp.class (keep=name obs=6);
  if _n_=5 then name=' ';
  put (_n_ name) (:);
run;


data _null_;
  set have;
  if name not in ('Alfred','Carol');  
  put (_n_ name) (:);
run;


data _null_;
  set have;
  where name not in ('Alfred','Carol'); 
  put (_n_ name) (:);
run;

 

So what could explain the difference that you get?  Notice that the values of _N_ reported via the PUT statements in the program differ between the IF and WHERE tests.  That's because the subsetting IF brings every data record into the PDV, assigns an incremented value of _N_ and then deletes it when the logical condition is not satisfied..  But the WHERE statement outsources the filtering to the input/output engine, so only the successful records are brought into the PDV and therefore assigned an incremented value of _N_.

 

To test this rerun the above program with OPTIONS OBS=3 preceding the first data _null_, and you'll see that the subsetting IF does not report the null value.  The reason is not because it is NULL, it's because the null is in record 5 - i.e. beyond the global OBS limit.

 

Would that explain your results?

 

 

If this is your problem, then issue the statement

   options obs=max;

and rerun your programs.

 

regards,

Mark

 

 

*** Edited addition ***

My response explained why you would get a blank in the WHERE but not the IF but you report the opposite.  That can also be reproduced in the program above.  Make these changes:

 

  1. Change "IF _N_=5 ..." to "IF _N_=2 ...".
  2. Prior to "data have" make sure you have
    1. OPTIONS OBS=MAX;
    2. OPTIONS FIRSTOBS=1;
  3. Prior to the first DATA _NULL_ add:
    1. OPTIONS FIRSTOBS=2;

 

Rerun the program and you'll see a null character value pass the IF test, but not the WHERE test.  Again it's not because of the value, it's due to the location of the record containing the null value vs the value of FIRSTOBS.

 

So I'm betting you might have a global OPTIONS FIRSTOBS set in your program. And also that your IF test produces the NULL value at the beginning of the resulting data set.  Is that the case?

 

So the take-home message is

   WHERE is honored before FIRSTOBS and OBS

 

MK

 

 

Community Manager
Posts: 2,771

Re: if vs where

You can see the behavior differences of IF and WHERE in the DATA step debugger, now available in SAS Enterprise Guide (and still available in PC SAS if you're running that).

 

WHERE rules are determined with the input/output criteria at compile time, so any record that doesn't meet the WHERE test is never loaded into the PDV.  IF statements are evaluated on the data values after they are loaded into the PDV.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 313 views
  • 0 likes
  • 4 in conversation