BookmarkSubscribeRSS Feed
milts
Pyrite | Level 9


Hi,

Is there a performance difference between numeric and character values when used as filter for the IN operator?

Ex.

select * from a where var in(1,2,3,4);

VS

select * from a where var in('1','2','3','4');

Thanks!

10 REPLIES 10
Ksharp
Super User

As My imagination ..

Episode1 is faster than Episode2.

Of course, I am not one hundred percent sure.

Ksharp

FloydNevseta
Pyrite | Level 9

And I think that the performance difference would only be noticeable (if ever) when processing millions+ observations.

ballardw
Super User

Lengths of strings may be a factor if the strings  don't differ until you get to character 200 or so ...

But that's just a guess and since you're unlikely to be generating numbers with that many significant figures probably immaterial.

Astounding
PROC Star

You may have more of a point than you realize.  Numerics are typically stored in 8 bytes.  If the character string is truly 1 byte long, that's probably faster.  Of course, that might be just for illustration purposes in the original post and the actual lengths might be different.

Ksharp
Super User

Hi. SASBigot,

Did you test it ?

I hope you can give us some LOG information to show how different they are.

Ksharp

FloydNevseta
Pyrite | Level 9

I didn't plan on testing it, but since it's Friday here you go. See how this runs in your environment.

* create 2 identically valued datasets that only ;
* differ by variable attributes;
data
   dschar (keep=varchar)
   dsnum (keep=varnum);
length varchar $1;
* 2 million obs - values 0-9 uniformly dist;
do i = 1 to 2e6;
   varnum = int(ranuni(5) * 10);
   varchar = put( varnum, 2.-l );
   output;
end;
run;

options fullstimer;

* the following data steps select ~ 40% of obs;
data testchar;
set dschar;
where varchar in ('1','2','3','4');
run;

data testnum;
set dsnum;
where varnum in (1,2,3,4);
run;


I ran the 2 data steps several times and got similar results. I pasted part of the log below. This ran on my laptop with Windows XP on Intel Core2 Dual.

118  data testchar;
119  set dschar;
120  where varchar in ('1','2','3','4');
121  run;

NOTE: There were 799284 observations read from the data set WORK.DSCHAR.
      WHERE varchar in ('1', '2', '3', '4');
NOTE: The data set WORK.TESTCHAR has 799284 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.39 seconds
      user cpu time       0.15 seconds
      system cpu time     0.04 seconds
      Memory                            206k
      OS Memory                         8304k
      Timestamp            5/11/2012  10:38:39 AM


122
123  data testnum;
124  set dsnum;
125  where varnum in (1,2,3,4);
126  run;

NOTE: There were 799284 observations read from the data set WORK.DSNUM.
      WHERE varnum in (1, 2, 3, 4);
NOTE: The data set WORK.TESTNUM has 799284 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           2.53 seconds
      user cpu time       0.21 seconds
      system cpu time     0.09 seconds
      Memory                            206k
      OS Memory                         8304k
      Timestamp            5/11/2012  10:38:42 AM

milts
Pyrite | Level 9

Hi all,

Thank you for your comments. In addition to my question above, I would also like to ask if there is which one is faster:

where var in('A','B','C',........'Z') /* a number of values in the IN operator */

VS

where var in('A','B','C') /* few values in the IN operator. in this case only 3 */

Many thanks!

Ksharp
Super User

Of course, it is second, which has fewer members and less time to search .

ballardw
Super User

I suspect that the ORDER of the comparison values may make a difference as well. From other SAS comparisons as soon as a true condition is found, such as in a long string of OR comparisons, the SAS stops evaluating and returns true. If you have some idea which of your values might be true more often, place them at the start of the list.

Ksharp
Super User

Oh. That might be true.

Did you mean something like SELECT statement? That makes some sense.

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1079 views
  • 0 likes
  • 5 in conversation