Hi,
Need help to determine which query is faster between the IN and BETWEEN operators.
select * from tbl where var in(1,2,3,4,n)
VS
select * from tbl where var between 1 and n
Thanks!
I guess FIRST.
For the FIRST, you only compare several sparse integer number ,while at SECOND , you define a range which also consider the float number . So My opinion is FIRST .
Ksharp
Performance questions are answered by testing.
Are the benchmarking routines that will help with such testing?
First is make sure the STIMER option is turned on.
For SQL you then add the STIMER option to the procedure statemen and you'll get a report how long each statement takes.
proc sql stimer;
<your sql code goes here>;
<your other sql code goes here>;
quit;
Results will show real and cpu time for each sql code section.
hi ... only one variable in the data set ... using a data set placed in memory, looks like a tie (lot slower with a range using 3:8) ...
4392 data big;
4393 do _n_ = 1 to 1e8;
4394 var = ceil(10 * ranuni(999));
4395 output;
4396 end;
4397 run;
NOTE: The data set WORK.BIG has 100000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 24.40 seconds
cpu time 15.06 seconds
4398
4399 sasfile big load;
NOTE: The file WORK.BIG.DATA has been loaded into memory by the SASFILE statement.
4400
4401 options stimer;
4402
4403 proc sql;
4404 create table table1 as select * from big where var between 3 and 8;
NOTE: Table WORK.TABLE1 created, with 59996934 rows and 1 columns.
4405 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 23.04 seconds
cpu time 18.37 seconds
4406
4407
4408 proc sql;
4409 create table table2 as select * from big where var in (3:8);
NOTE: Table WORK.TABLE2 created, with 59996934 rows and 1 columns.
4410 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 38.54 seconds
cpu time 33.85 seconds
4411
4412 proc sql;
4413 create table table3 as select * from big where var in (3 4 5 6 7 8);
NOTE: Table WORK.TABLE3 created, with 59996934 rows and 1 columns.
4414 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 22.54 seconds
cpu time 19.59 seconds
I would suggest FULLSTIMER.
I would not expect the use of range (index operator) to be significantly slower. What's up with that.
hi ... FULLSTIMER and same data (added BETWEEN 8 AND 3) ... agreed, what's with the RANGE time
4536 proc sql;
4537 create table table1 as select * from big where var between 3 and 8;
NOTE: Table WORK.TABLE1 created, with 59996934 rows and 1 columns.
4538 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 22.50 seconds
user cpu time 14.54 seconds
system cpu time 3.93 seconds
memory 133.17k
OS Memory 950128.00k
Timestamp 05/18/2012 01:26:52 PM
4539
4540 proc sql;
4541 create table table1 as select * from big where var between 8 and 3;
NOTE: Table WORK.TABLE1 created, with 59996934 rows and 1 columns.
4542 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 22.17 seconds
user cpu time 15.03 seconds
system cpu time 3.37 seconds
memory 133.17k
OS Memory 950128.00k
Timestamp 05/18/2012 01:27:28 PM
4543
4544 proc sql;
4545 create table table2 as select * from big where var in (3:8);
NOTE: Table WORK.TABLE2 created, with 59996934 rows and 1 columns.
4546 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 34.92 seconds
user cpu time 30.04 seconds
system cpu time 3.54 seconds
memory 134.80k
OS Memory 950128.00k
Timestamp 05/18/2012 01:28:03 PM
4547
4548 proc sql;
4549 create table table3 as select * from big where var in (3 4 5 6 7 8);
NOTE: Table WORK.TABLE3 created, with 59996934 rows and 1 columns.
4550 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 22.10 seconds
user cpu time 16.35 seconds
system cpu time 3.29 seconds
memory 133.17k
OS Memory 950128.00k
Timestamp 05/18/2012 01:28:25 PM
and with some character data ...
4565 proc sql;
4566 create table table1 as select * from big where var between '03' and '08';
NOTE: Table WORK.TABLE1 created, with 59996934 rows and 1 columns.
4567 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 17.75 seconds
user cpu time 15.28 seconds
system cpu time 1.68 seconds
memory 133.17k
OS Memory 276080.00k
Timestamp 05/18/2012 01:33:35 PM
4568
4569 proc sql;
4570 create table table3 as select * from big where var in ('03' '04' '05' '06' '07' '08');
NOTE: Table WORK.TABLE3 created, with 59996934 rows and 1 columns.
4571 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 18.32 seconds
user cpu time 15.51 seconds
system cpu time 1.86 seconds
memory 133.17k
OS Memory 276080.00k
Timestamp 05/18/2012 01:34:05 PM
Wow, the range operator does seem to behave strangely.
I would ask tech support to check that behaviour and record the defect if confirmed.
The in() operator is slightly slower it seems and I would expect it to grow slower as you add values to the list:
between only requires 2 tests (should be equivalent to Z >= X and X >= Y)
in() requires as many tests as there are values.
The range operator is slower because it calls the int() function.
data T;
set T;
where X in(1:4);
run;
generates:
NOTE: There were 0 observations read from the data set WORK.T.
WHERE (x=INT(x)) and (x>=1 and x<=4);
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.00 seconds
So this shortcut is best avoided for large tables.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.