Help using Base SAS procedures

Do “subsetting if” and “where” clauses do the same thing?

Accepted Solution Solved
Reply
Super Contributor
Posts: 338
Accepted Solution

Do “subsetting if” and “where” clauses do the same thing?

Hi SAS Community,

I have applied “subsetting if” and “where” clauses like below and they produced the same number of records in both table “want_1” and “want_2”.

1). Approach I - “subsetting if”

data want_1;

SET have;

if bank_number = 40;

run;

2). Approach II - “where”

data want_2;

SET have;

Where bank_number = 40;

run;

Question

However, SAS sometimes produce correct results for wrong reasons if we do not use it with full knowledge what is happening inside. This happened to me many times.

So, could any expert confirm me if the Approach I and II above produce same results always?

Thank you

Mirisage


Accepted Solutions
Solution
‎08-13-2012 03:36 PM
Super Contributor
Posts: 1,636

Re: Do “subsetting if” and “where” clauses do the same thing?

if variable bank_number is already in dataset have then if and where would produce the same result.

if variable bank_number is created in the same data step then only if works.

example:

data class;
  set sashelp.class;
  if age <14 then group=1;
    else group=2;
  if group=1;/* you can't use "where" */
  proc print;run;
 
                   Obs    Name       Sex    Age    Height    Weight    group

                      1    Alice       F      13     56.5       84.0      1
                      2    Barbara     F      13     65.3       98.0      1
                      3    James       M      12     57.3       83.0      1
                      4    Jane        F      12     59.8       84.5      1
                      5    Jeffrey     M      13     62.5       84.0      1
                      6    John        M      12     59.0       99.5      1
                      7    Joyce       F      11     51.3       50.5      1
                      8    Louise      F      12     56.3       77.0      1
                      9    Robert      M      12     64.8      128.0      1
                     10    Thomas      M      11     57.5       85.0      1

View solution in original post


All Replies
Super User
Posts: 19,770

Re: Do “subsetting if” and “where” clauses do the same thing?

No they don't.

See the following paper for some examples/details:

http://www.lexjansen.com/wuss/2005/sas_solutions/sol_where_vs_if_statements.pdf

Solution
‎08-13-2012 03:36 PM
Super Contributor
Posts: 1,636

Re: Do “subsetting if” and “where” clauses do the same thing?

if variable bank_number is already in dataset have then if and where would produce the same result.

if variable bank_number is created in the same data step then only if works.

example:

data class;
  set sashelp.class;
  if age <14 then group=1;
    else group=2;
  if group=1;/* you can't use "where" */
  proc print;run;
 
                   Obs    Name       Sex    Age    Height    Weight    group

                      1    Alice       F      13     56.5       84.0      1
                      2    Barbara     F      13     65.3       98.0      1
                      3    James       M      12     57.3       83.0      1
                      4    Jane        F      12     59.8       84.5      1
                      5    Jeffrey     M      13     62.5       84.0      1
                      6    John        M      12     59.0       99.5      1
                      7    Joyce       F      11     51.3       50.5      1
                      8    Louise      F      12     56.3       77.0      1
                      9    Robert      M      12     64.8      128.0      1
                     10    Thomas      M      11     57.5       85.0      1

PROC Star
Posts: 7,468

Re: Do “subsetting if” and “where” clauses do the same thing?

Someone will have to correct me if I'm wrong but, in the example you provided, I think that the two methods WILL always produce the same result.

And, the paper that Fareeza referenced and example that Linlin gave, the paper and example didn't take into account the where data step option.  e.g.,

data class (where=(group eq 1));

  set sashelp.class;

  if age <14 then group=1;

    else group=2;

run;

Super User
Posts: 19,770

Re: Do “subsetting if” and “where” clauses do the same thing?

And also a where in a data step that has a merge doesn't always do what I think it should sometimes.

The overall answer to your question is 'it depends on the situation'.

Super User
Posts: 5,498

Re: Do “subsetting if” and “where” clauses do the same thing?

In your examples, you get the same result as long as BANK_NUMBER already exists in your incoming data set, and as long as it is defined as being numeric.

The problem is really that you may add features to your DATA step without really understanding how the results could suddenly become different.  For example, here are two common situations.  If you add END= on the SET statement, or if you add a BY statement, the results could very easily be different.  While you are relatively safe for this particular example that you posted, those simple changes will suddenly produce different results.  Moreover, you may not even notice that the results would be different because they might depend on the data.  For some versions of HAVE, adding END= or BY still produces identical results.  For other versions of HAVE, the results could be different.  There is no substitute for understanding how IF and WHERE work.

Good luck.

Trusted Advisor
Posts: 1,018

Re: Do “subsetting if” and “where” clauses do the same thing?

The "where" clause is applied in the data engine.  Think of it as being applied BEFORE the data step sees the data - so cases that don't qualify are never in the PDV, which is why this won't work:

data want;
   set sashelp.class;

   hw=height*weight;

   where hw>0;
run;

But an "if hw>0" would work.  It doesn't get invoked until data is in the PDV.  This understanding also help to explain the impact on options like FIRSTOBS and OBS in the presence of WHERE.

Super User
Posts: 10,020

Re: Do “subsetting if” and “where” clauses do the same thing?

One more thing, WHERE is always faster than IF .not only it pick up obs before obs enter PDV (not like IF,it is after obs enter PDV) ,but also WHERE can use INDEX for you .

Ksharp

Respected Advisor
Posts: 3,156

Re: Do “subsetting if” and “where” clauses do the same thing?

Quote from Keshan: 'WHERE is always faster than IF'.

NOT always, especially when dealing huge data set. In my line of work (Part D claim data), multimillion level of records is the routine. Depending on the data retrieval rate, I found 'if' is faster than 'where' in many cases, sometimes a lot faster if the rate is a bit extreme. 'where' bears an edge when the retrieval rate is less than 2-3%. (sounds familiar? reminding me of  index)

Here is a example (95% data retrieval), please note, when data structure is more complicated (more variables, more variable types and length), the difference tends to be larger:

40   DATA test;

41   do i=1 to 100000000;

42   output;

43   end;

44   run;

NOTE: The data set WORK.TEST has 100000000 observations and 1 variables.

NOTE: DATA statement used (Total process time):

      real time           7.95 seconds

      cpu time            5.29 seconds

45

46   data h1;

47   set test;

48   where i>5000000;

49   run;

NOTE: There were 95000000 observations read from the data set WORK.TEST.

      WHERE i>5000000;

NOTE: The data set WORK.H1 has 95000000 observations and 1 variables.

NOTE: DATA statement used (Total process time):

      real time           20.37 seconds

      cpu time            16.98 seconds

50

51   data h2;

52   set test;

53   if i>5000000;

54   run;

NOTE: There were 100000000 observations read from the data set WORK.TEST.

NOTE: The data set WORK.H2 has 95000000 observations and 1 variables.

NOTE: DATA statement used (Total process time):

      real time           11.82 seconds

      cpu time            10.20 seconds

Haikuo

PROC Star
Posts: 7,468

Re: Do “subsetting if” and “where” clauses do the same thing?

haikuo: I get similar results with just 1,000,000 records (i.e., if being faster than where) .. even with a one variable indexed file.

Respected Advisor
Posts: 3,156

Re: Do “subsetting if” and “where” clauses do the same thing?

Yes, Art, I agree. Difference gets obvious when large data set involved. In general, it is a matter of personal preference, while I tremendously prefer 'if'. My major reason is that 'if' makes my life a lot easier, for 'if' can be used with '_n_', last-first.variable and 'by' statement, while 'where' can't. To me, the only case I use 'where' (besides procs of course) is that I have to use 'like' operator, which is really rare.

Haikuo

Contributor
Posts: 30

Re: Do “subsetting if” and “where” clauses do the same thing?

The two code examples you provide should give identical results, though WHERE may be faster to run.

However there are many cases where WHERE and subsetting IF do not give identical output. Compare:

data test;

length x $5;

input x;

cards;

One

Two

Three

Four

Five

;

run;

data subset_if;

set test;

sequence_number=_n_;

previous_x=lag(x);

if not(x="Three");

run;

data subset_where;

set test;

sequence_number=_n_;

previous_x=lag(x);

where not(x="Three");

run;

In creating SUBSET_IF, we read in and process all lines from TEST before dropping the third observation. Even though the third observation isn't output directly, it still affects what is output: sequence_number goes 1,2,4,5 and previous_x goes " ", "One", "Three", "Four".

In creating SUBSET_WHERE, the observation with x="Three" is deleted at an earlier stage, and there's no sign in the output that it ever existed.

Super Contributor
Posts: 338

Re: Do “subsetting if” and “where” clauses do the same thing?

Posted in reply to GeoffreyBrent

Hi Everyone,

Thank everyone of you for your time and expertise.

This generated a tremondous knowledge base on the subject.

Best regards

Mirisage

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 456 views
  • 8 likes
  • 9 in conversation