BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DavidPhillips2
Rhodochrosite | Level 12

Why does 

if ECLASS IN ("F1","F2","M1","M2","O1","O2","VQF") AND  RANKREV NOT IN ("5","6") ;

produce a different number of records than

Where ECLASS IN ("F1","F2","M1","M2","O1","O2","VQF") AND  RANKREV NOT IN ("5","6") ;

 

The only difference is the where vs if.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

@DavidPhillips2 wrote:

Its because of nulls.  If statements keep nulls where's do not.



Not True, Not True, Not True.

 

Consider:  In the logical expression

      ECLASS IN ("F1","F2","M1","M2","O1","O2","VQF") AND  RANKREV NOT IN ("5","6")

the only part where a null value should pass is in the "RANKREV NOT IN ("5","6")

 

So let's test that "NOT IN" clause with SASHELP.CLASS.  Set up data set HAVE with the first six observations from sashelp.class and blank out the name variable for the 5th obs, yielding these six name values, in order:   "Alfred", "Alice", "Barbara", "Carol", "", "James".

 

Then run the logical condition;
    NOT IN ("Alfred","Barbara")
once via subsetting IF and once via WHERE.  The program below shows that results are the same - i.e. null character values are treated the same by where and subsetting if:

data have;
  set sashelp.class (keep=name obs=6);
  if _n_=5 then name=' ';
  put (_n_ name) (:);
run;


data _null_;
  set have;
  if name not in ('Alfred','Carol');  
  put (_n_ name) (:);
run;


data _null_;
  set have;
  where name not in ('Alfred','Carol'); 
  put (_n_ name) (:);
run;

 

So what could explain the difference that you get?  Notice that the values of _N_ reported via the PUT statements in the program differ between the IF and WHERE tests.  That's because the subsetting IF brings every data record into the PDV, assigns an incremented value of _N_ and then deletes it when the logical condition is not satisfied..  But the WHERE statement outsources the filtering to the input/output engine, so only the successful records are brought into the PDV and therefore assigned an incremented value of _N_.

 

To test this rerun the above program with OPTIONS OBS=3 preceding the first data _null_, and you'll see that the subsetting IF does not report the null value.  The reason is not because it is NULL, it's because the null is in record 5 - i.e. beyond the global OBS limit.

 

Would that explain your results?

 

 

If this is your problem, then issue the statement

   options obs=max;

and rerun your programs.

 

regards,

Mark

 

 

*** Edited addition ***

My response explained why you would get a blank in the WHERE but not the IF but you report the opposite.  That can also be reproduced in the program above.  Make these changes:

 

  1. Change "IF _N_=5 ..." to "IF _N_=2 ...".
  2. Prior to "data have" make sure you have
    1. OPTIONS OBS=MAX;
    2. OPTIONS FIRSTOBS=1;
  3. Prior to the first DATA _NULL_ add:
    1. OPTIONS FIRSTOBS=2;

 

Rerun the program and you'll see a null character value pass the IF test, but not the WHERE test.  Again it's not because of the value, it's due to the location of the record containing the null value vs the value of FIRSTOBS.

 

So I'm betting you might have a global OPTIONS FIRSTOBS set in your program. And also that your IF test produces the NULL value at the beginning of the resulting data set.  Is that the case?

 

So the take-home message is

   WHERE is honored before FIRSTOBS and OBS

 

MK

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
DavidPhillips2
Rhodochrosite | Level 12

Why does 

if ECLASS IN ("F1","F2","M1","M2","O1","O2","VQF") AND  RANKREV NOT IN ("5","6") ;

produce a different number of records than

Where ECLASS IN ("F1","F2","M1","M2","O1","O2","VQF") AND  RANKREV NOT IN ("5","6") ;

 

The only difference is the where vs if.

PeterClemmensen
Tourmaline | Level 20

Can you post the context where it yields a different number of observations? 🙂

 

Generally the where statement specifies which observations are read into the PDV while the subsetting if statement specifies that you output that record only if the condition is met.

DavidPhillips2
Rhodochrosite | Level 12

Its because of nulls.  If statements keep nulls where's do not.

mkeintz
PROC Star

@DavidPhillips2 wrote:

Its because of nulls.  If statements keep nulls where's do not.



Not True, Not True, Not True.

 

Consider:  In the logical expression

      ECLASS IN ("F1","F2","M1","M2","O1","O2","VQF") AND  RANKREV NOT IN ("5","6")

the only part where a null value should pass is in the "RANKREV NOT IN ("5","6")

 

So let's test that "NOT IN" clause with SASHELP.CLASS.  Set up data set HAVE with the first six observations from sashelp.class and blank out the name variable for the 5th obs, yielding these six name values, in order:   "Alfred", "Alice", "Barbara", "Carol", "", "James".

 

Then run the logical condition;
    NOT IN ("Alfred","Barbara")
once via subsetting IF and once via WHERE.  The program below shows that results are the same - i.e. null character values are treated the same by where and subsetting if:

data have;
  set sashelp.class (keep=name obs=6);
  if _n_=5 then name=' ';
  put (_n_ name) (:);
run;


data _null_;
  set have;
  if name not in ('Alfred','Carol');  
  put (_n_ name) (:);
run;


data _null_;
  set have;
  where name not in ('Alfred','Carol'); 
  put (_n_ name) (:);
run;

 

So what could explain the difference that you get?  Notice that the values of _N_ reported via the PUT statements in the program differ between the IF and WHERE tests.  That's because the subsetting IF brings every data record into the PDV, assigns an incremented value of _N_ and then deletes it when the logical condition is not satisfied..  But the WHERE statement outsources the filtering to the input/output engine, so only the successful records are brought into the PDV and therefore assigned an incremented value of _N_.

 

To test this rerun the above program with OPTIONS OBS=3 preceding the first data _null_, and you'll see that the subsetting IF does not report the null value.  The reason is not because it is NULL, it's because the null is in record 5 - i.e. beyond the global OBS limit.

 

Would that explain your results?

 

 

If this is your problem, then issue the statement

   options obs=max;

and rerun your programs.

 

regards,

Mark

 

 

*** Edited addition ***

My response explained why you would get a blank in the WHERE but not the IF but you report the opposite.  That can also be reproduced in the program above.  Make these changes:

 

  1. Change "IF _N_=5 ..." to "IF _N_=2 ...".
  2. Prior to "data have" make sure you have
    1. OPTIONS OBS=MAX;
    2. OPTIONS FIRSTOBS=1;
  3. Prior to the first DATA _NULL_ add:
    1. OPTIONS FIRSTOBS=2;

 

Rerun the program and you'll see a null character value pass the IF test, but not the WHERE test.  Again it's not because of the value, it's due to the location of the record containing the null value vs the value of FIRSTOBS.

 

So I'm betting you might have a global OPTIONS FIRSTOBS set in your program. And also that your IF test produces the NULL value at the beginning of the resulting data set.  Is that the case?

 

So the take-home message is

   WHERE is honored before FIRSTOBS and OBS

 

MK

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisHemedinger
Community Manager

You can see the behavior differences of IF and WHERE in the DATA step debugger, now available in SAS Enterprise Guide (and still available in PC SAS if you're running that).

 

WHERE rules are determined with the input/output criteria at compile time, so any record that doesn't meet the WHERE test is never loaded into the PDV.  IF statements are evaluated on the data values after they are loaded into the PDV.

It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1360 views
  • 0 likes
  • 4 in conversation