BookmarkSubscribeRSS Feed
milts
Pyrite | Level 9

Hi,

Need help to determine which query is faster between the IN and BETWEEN operators.

select * from tbl where var in(1,2,3,4,n)

VS

select * from tbl where var between 1 and n


Thanks!

9 REPLIES 9
Ksharp
Super User

I guess FIRST.

For the FIRST, you only compare several sparse integer number ,while at SECOND , you define a range which also consider the float number . So My opinion is FIRST .

Ksharp

data_null__
Jade | Level 19

Performance questions are answered by testing.

jdmarino
Fluorite | Level 6

Are the benchmarking routines that will help with such testing?

ballardw
Super User

First is make sure the STIMER option is turned on.

For SQL you then add the STIMER option to the procedure statemen and you'll get a report how long each statement takes.

proc sql stimer;

   <your sql code goes here>;

   <your other sql code goes here>;

quit;

Results will show real and cpu time for each sql code section.

MikeZdeb
Rhodochrosite | Level 12

hi ... only one variable in the data set ... using a data set placed in memory, looks like a tie (lot slower with a range using 3:8) ...

4392  data big;

4393  do _n_ = 1 to 1e8;

4394     var = ceil(10 * ranuni(999));

4395     output;

4396  end;

4397  run;

NOTE: The data set WORK.BIG has 100000000 observations and 1 variables.

NOTE: DATA statement used (Total process time):

      real time           24.40 seconds

      cpu time            15.06 seconds

4398

4399  sasfile big load;

NOTE: The file WORK.BIG.DATA has been loaded into memory by the SASFILE statement.

4400

4401  options stimer;

4402

4403  proc sql;

4404  create table table1 as select * from big where var between 3 and 8;

NOTE: Table WORK.TABLE1 created, with 59996934 rows and 1 columns.

4405  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           23.04 seconds

      cpu time            18.37 seconds

4406

4407

4408  proc sql;

4409  create table table2 as select * from big where var in (3:8);

NOTE: Table WORK.TABLE2 created, with 59996934 rows and 1 columns.

4410  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           38.54 seconds

      cpu time            33.85 seconds

4411

4412  proc sql;

4413  create table table3 as select * from big where var in (3 4 5 6 7 8);

NOTE: Table WORK.TABLE3 created, with 59996934 rows and 1 columns.

4414  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           22.54 seconds

      cpu time            19.59 seconds



data_null__
Jade | Level 19

I would suggest FULLSTIMER.

I would not expect the use of range (index operator) to be significantly slower.  What's up with that.

MikeZdeb
Rhodochrosite | Level 12

hi ... FULLSTIMER and same data (added BETWEEN 8 AND 3) ... agreed, what's with the RANGE time

4536  proc sql;

4537  create table table1 as select * from big where var between 3 and 8;

NOTE: Table WORK.TABLE1 created, with 59996934 rows and 1 columns.

4538  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           22.50 seconds

      user cpu time       14.54 seconds

      system cpu time     3.93 seconds

      memory              133.17k

      OS Memory           950128.00k

      Timestamp           05/18/2012 01:26:52 PM

4539

4540  proc sql;

4541  create table table1 as select * from big where var between 8 and 3;

NOTE: Table WORK.TABLE1 created, with 59996934 rows and 1 columns.

4542  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           22.17 seconds

      user cpu time       15.03 seconds

      system cpu time     3.37 seconds

      memory              133.17k

      OS Memory           950128.00k

      Timestamp           05/18/2012 01:27:28 PM

4543

4544  proc sql;

4545  create table table2 as select * from big where var in (3:8);

NOTE: Table WORK.TABLE2 created, with 59996934 rows and 1 columns.

4546  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           34.92 seconds

      user cpu time       30.04 seconds

      system cpu time     3.54 seconds

      memory              134.80k

      OS Memory           950128.00k

      Timestamp           05/18/2012 01:28:03 PM

4547

4548  proc sql;

4549  create table table3 as select * from big where var in (3 4 5 6 7 8);

NOTE: Table WORK.TABLE3 created, with 59996934 rows and 1 columns.

4550  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           22.10 seconds

      user cpu time       16.35 seconds

      system cpu time     3.29 seconds

      memory              133.17k

      OS Memory           950128.00k

      Timestamp           05/18/2012 01:28:25 PM


and with some character data ...


4565  proc sql;

4566  create table table1 as select * from big where var between '03' and '08';

NOTE: Table WORK.TABLE1 created, with 59996934 rows and 1 columns.

4567  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           17.75 seconds

      user cpu time       15.28 seconds

      system cpu time     1.68 seconds

      memory              133.17k

      OS Memory           276080.00k

      Timestamp           05/18/2012 01:33:35 PM

4568

4569  proc sql;

4570  create table table3 as select * from big where var in ('03' '04' '05' '06' '07' '08');

NOTE: Table WORK.TABLE3 created, with 59996934 rows and 1 columns.

4571  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           18.32 seconds

      user cpu time       15.51 seconds

      system cpu time     1.86 seconds

      memory              133.17k

      OS Memory           276080.00k

      Timestamp           05/18/2012 01:34:05 PM

ChrisNZ
Tourmaline | Level 20

Wow, the range operator does seem to behave strangely.

I would ask tech support to check that behaviour and record the defect if confirmed.

The in() operator is slightly slower it seems and I would expect it to grow slower as you add values to the list:

between only requires 2 tests (should be equivalent to Z >= X and X >= Y)

in() requires as many tests as there are values.

ChrisNZ
Tourmaline | Level 20

The range operator is slower because it calls the int() function.

 

data T;

  set T;

  where X in(1:4);

run;

generates:

 

NOTE: There were 0 observations read from the data set WORK.T.

  WHERE (x=INT(x)) and (x>=1 and x<=4);

NOTE: DATA statement used (Total process time):

real time 0.07 seconds

cpu time 0.00 seconds

So this shortcut is best avoided for large tables.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1312 views
  • 2 likes
  • 7 in conversation