DATA Step, Macro, Functions and more

Performance Comparison IN vs BETWEEN

Reply
Super Contributor
Posts: 326

Performance Comparison IN vs BETWEEN

Hi,

Need help to determine which query is faster between the IN and BETWEEN operators.

select * from tbl where var in(1,2,3,4,n)

VS

select * from tbl where var between 1 and n


Thanks!

Super User
Posts: 10,041

Re: Performance Comparison IN vs BETWEEN

I guess FIRST.

For the FIRST, you only compare several sparse integer number ,while at SECOND , you define a range which also consider the float number . So My opinion is FIRST .

Ksharp

Respected Advisor
Posts: 3,799

Re: Performance Comparison IN vs BETWEEN

Performance questions are answered by testing.

Frequent Contributor
Posts: 78

Re: Performance Comparison IN vs BETWEEN

Posted in reply to data_null__

Are the benchmarking routines that will help with such testing?

Super User
Posts: 11,343

Re: Performance Comparison IN vs BETWEEN

First is make sure the STIMER option is turned on.

For SQL you then add the STIMER option to the procedure statemen and you'll get a report how long each statement takes.

proc sql stimer;

   <your sql code goes here>;

   <your other sql code goes here>;

quit;

Results will show real and cpu time for each sql code section.

Valued Guide
Posts: 765

Re: Performance Comparison IN vs BETWEEN

hi ... only one variable in the data set ... using a data set placed in memory, looks like a tie (lot slower with a range using 3:8) ...

4392  data big;

4393  do _n_ = 1 to 1e8;

4394     var = ceil(10 * ranuni(999));

4395     output;

4396  end;

4397  run;

NOTE: The data set WORK.BIG has 100000000 observations and 1 variables.

NOTE: DATA statement used (Total process time):

      real time           24.40 seconds

      cpu time            15.06 seconds

4398

4399  sasfile big load;

NOTE: The file WORK.BIG.DATA has been loaded into memory by the SASFILE statement.

4400

4401  options stimer;

4402

4403  proc sql;

4404  create table table1 as select * from big where var between 3 and 8;

NOTE: Table WORK.TABLE1 created, with 59996934 rows and 1 columns.

4405  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           23.04 seconds

      cpu time            18.37 seconds

4406

4407

4408  proc sql;

4409  create table table2 as select * from big where var in (3:8);

NOTE: Table WORK.TABLE2 created, with 59996934 rows and 1 columns.

4410  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           38.54 seconds

      cpu time            33.85 seconds

4411

4412  proc sql;

4413  create table table3 as select * from big where var in (3 4 5 6 7 8);

NOTE: Table WORK.TABLE3 created, with 59996934 rows and 1 columns.

4414  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           22.54 seconds

      cpu time            19.59 seconds



Respected Advisor
Posts: 3,799

Re: Performance Comparison IN vs BETWEEN

I would suggest FULLSTIMER.

I would not expect the use of range (index operator) to be significantly slower.  What's up with that.

Valued Guide
Posts: 765

Re: Performance Comparison IN vs BETWEEN

Posted in reply to data_null__

hi ... FULLSTIMER and same data (added BETWEEN 8 AND 3) ... agreed, what's with the RANGE time

4536  proc sql;

4537  create table table1 as select * from big where var between 3 and 8;

NOTE: Table WORK.TABLE1 created, with 59996934 rows and 1 columns.

4538  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           22.50 seconds

      user cpu time       14.54 seconds

      system cpu time     3.93 seconds

      memory              133.17k

      OS Memory           950128.00k

      Timestamp           05/18/2012 01:26:52 PM

4539

4540  proc sql;

4541  create table table1 as select * from big where var between 8 and 3;

NOTE: Table WORK.TABLE1 created, with 59996934 rows and 1 columns.

4542  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           22.17 seconds

      user cpu time       15.03 seconds

      system cpu time     3.37 seconds

      memory              133.17k

      OS Memory           950128.00k

      Timestamp           05/18/2012 01:27:28 PM

4543

4544  proc sql;

4545  create table table2 as select * from big where var in (3:8);

NOTE: Table WORK.TABLE2 created, with 59996934 rows and 1 columns.

4546  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           34.92 seconds

      user cpu time       30.04 seconds

      system cpu time     3.54 seconds

      memory              134.80k

      OS Memory           950128.00k

      Timestamp           05/18/2012 01:28:03 PM

4547

4548  proc sql;

4549  create table table3 as select * from big where var in (3 4 5 6 7 8);

NOTE: Table WORK.TABLE3 created, with 59996934 rows and 1 columns.

4550  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           22.10 seconds

      user cpu time       16.35 seconds

      system cpu time     3.29 seconds

      memory              133.17k

      OS Memory           950128.00k

      Timestamp           05/18/2012 01:28:25 PM


and with some character data ...


4565  proc sql;

4566  create table table1 as select * from big where var between '03' and '08';

NOTE: Table WORK.TABLE1 created, with 59996934 rows and 1 columns.

4567  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           17.75 seconds

      user cpu time       15.28 seconds

      system cpu time     1.68 seconds

      memory              133.17k

      OS Memory           276080.00k

      Timestamp           05/18/2012 01:33:35 PM

4568

4569  proc sql;

4570  create table table3 as select * from big where var in ('03' '04' '05' '06' '07' '08');

NOTE: Table WORK.TABLE3 created, with 59996934 rows and 1 columns.

4571  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           18.32 seconds

      user cpu time       15.51 seconds

      system cpu time     1.86 seconds

      memory              133.17k

      OS Memory           276080.00k

      Timestamp           05/18/2012 01:34:05 PM

PROC Star
Posts: 1,760

Re: Performance Comparison IN vs BETWEEN

Wow, the range operator does seem to behave strangely.

I would ask tech support to check that behaviour and record the defect if confirmed.

The in() operator is slightly slower it seems and I would expect it to grow slower as you add values to the list:

between only requires 2 tests (should be equivalent to Z >= X and X >= Y)

in() requires as many tests as there are values.

PROC Star
Posts: 1,760

Re: Performance Comparison IN vs BETWEEN

The range operator is slower because it calls the int() function.

 

data T;

  set T;

  where X in(1:4);

run;

generates:

 

NOTE: There were 0 observations read from the data set WORK.T.

  WHERE (x=INT(x)) and (x>=1 and x<=4);

NOTE: DATA statement used (Total process time):

real time 0.07 seconds

cpu time 0.00 seconds

So this shortcut is best avoided for large tables.

Ask a Question
Discussion stats
  • 9 replies
  • 362 views
  • 2 likes
  • 7 in conversation