Hi,
Is there a performance difference between numeric and character values when used as filter for the IN operator?
Ex.
select * from a where var in(1,2,3,4);
VS
select * from a where var in('1','2','3','4');
Thanks!
As My imagination ..
Episode1 is faster than Episode2.
Of course, I am not one hundred percent sure.
Ksharp
And I think that the performance difference would only be noticeable (if ever) when processing millions+ observations.
Lengths of strings may be a factor if the strings don't differ until you get to character 200 or so ...
But that's just a guess and since you're unlikely to be generating numbers with that many significant figures probably immaterial.
You may have more of a point than you realize. Numerics are typically stored in 8 bytes. If the character string is truly 1 byte long, that's probably faster. Of course, that might be just for illustration purposes in the original post and the actual lengths might be different.
Hi. SASBigot,
Did you test it ?
I hope you can give us some LOG information to show how different they are.
Ksharp
I didn't plan on testing it, but since it's Friday here you go. See how this runs in your environment.
* create 2 identically valued datasets that only ;
* differ by variable attributes;
data
dschar (keep=varchar)
dsnum (keep=varnum);
length varchar $1;
* 2 million obs - values 0-9 uniformly dist;
do i = 1 to 2e6;
varnum = int(ranuni(5) * 10);
varchar = put( varnum, 2.-l );
output;
end;
run;
options fullstimer;
* the following data steps select ~ 40% of obs;
data testchar;
set dschar;
where varchar in ('1','2','3','4');
run;
data testnum;
set dsnum;
where varnum in (1,2,3,4);
run;
I ran the 2 data steps several times and got similar results. I pasted part of the log below. This ran on my laptop with Windows XP on Intel Core2 Dual.
118 data testchar;
119 set dschar;
120 where varchar in ('1','2','3','4');
121 run;
NOTE: There were 799284 observations read from the data set WORK.DSCHAR.
WHERE varchar in ('1', '2', '3', '4');
NOTE: The data set WORK.TESTCHAR has 799284 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.39 seconds
user cpu time 0.15 seconds
system cpu time 0.04 seconds
Memory 206k
OS Memory 8304k
Timestamp 5/11/2012 10:38:39 AM
122
123 data testnum;
124 set dsnum;
125 where varnum in (1,2,3,4);
126 run;
NOTE: There were 799284 observations read from the data set WORK.DSNUM.
WHERE varnum in (1, 2, 3, 4);
NOTE: The data set WORK.TESTNUM has 799284 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 2.53 seconds
user cpu time 0.21 seconds
system cpu time 0.09 seconds
Memory 206k
OS Memory 8304k
Timestamp 5/11/2012 10:38:42 AM
Hi all,
Thank you for your comments. In addition to my question above, I would also like to ask if there is which one is faster:
where var in('A','B','C',........'Z') /* a number of values in the IN operator */
VS
where var in('A','B','C') /* few values in the IN operator. in this case only 3 */
Many thanks!
Of course, it is second, which has fewer members and less time to search .
I suspect that the ORDER of the comparison values may make a difference as well. From other SAS comparisons as soon as a true condition is found, such as in a long string of OR comparisons, the SAS stops evaluating and returns true. If you have some idea which of your values might be true more often, place them at the start of the list.
Oh. That might be true.
Did you mean something like SELECT statement? That makes some sense.
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.