Help using Base SAS procedures

IN operator filtering

Reply
Super Contributor
Posts: 326

IN operator filtering


Hi,

Is there a performance difference between numeric and character values when used as filter for the IN operator?

Ex.

select * from a where var in(1,2,3,4);

VS

select * from a where var in('1','2','3','4');

Thanks!

Super User
Posts: 10,028

Re: IN operator filtering

As My imagination ..

Episode1 is faster than Episode2.

Of course, I am not one hundred percent sure.

Ksharp

Frequent Contributor
Posts: 101

Re: IN operator filtering

And I think that the performance difference would only be noticeable (if ever) when processing millions+ observations.

Super User
Posts: 11,343

Re: IN operator filtering

Posted in reply to SAS_Bigot

Lengths of strings may be a factor if the strings  don't differ until you get to character 200 or so ...

But that's just a guess and since you're unlikely to be generating numbers with that many significant figures probably immaterial.

Super User
Posts: 5,505

Re: IN operator filtering

You may have more of a point than you realize.  Numerics are typically stored in 8 bytes.  If the character string is truly 1 byte long, that's probably faster.  Of course, that might be just for illustration purposes in the original post and the actual lengths might be different.

Super User
Posts: 10,028

Re: IN operator filtering

Posted in reply to SAS_Bigot

Hi. SASBigot,

Did you test it ?

I hope you can give us some LOG information to show how different they are.

Ksharp

Frequent Contributor
Posts: 101

Re: IN operator filtering

I didn't plan on testing it, but since it's Friday here you go. See how this runs in your environment.

* create 2 identically valued datasets that only ;
* differ by variable attributes;
data
   dschar (keep=varchar)
   dsnum (keep=varnum);
length varchar $1;
* 2 million obs - values 0-9 uniformly dist;
do i = 1 to 2e6;
   varnum = int(ranuni(5) * 10);
   varchar = put( varnum, 2.-l );
   output;
end;
run;

options fullstimer;

* the following data steps select ~ 40% of obs;
data testchar;
set dschar;
where varchar in ('1','2','3','4');
run;

data testnum;
set dsnum;
where varnum in (1,2,3,4);
run;


I ran the 2 data steps several times and got similar results. I pasted part of the log below. This ran on my laptop with Windows XP on Intel Core2 Dual.

118  data testchar;
119  set dschar;
120  where varchar in ('1','2','3','4');
121  run;

NOTE: There were 799284 observations read from the data set WORK.DSCHAR.
      WHERE varchar in ('1', '2', '3', '4');
NOTE: The data set WORK.TESTCHAR has 799284 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.39 seconds
      user cpu time       0.15 seconds
      system cpu time     0.04 seconds
      Memory                            206k
      OS Memory                         8304k
      Timestamp            5/11/2012  10:38:39 AM


122
123  data testnum;
124  set dsnum;
125  where varnum in (1,2,3,4);
126  run;

NOTE: There were 799284 observations read from the data set WORK.DSNUM.
      WHERE varnum in (1, 2, 3, 4);
NOTE: The data set WORK.TESTNUM has 799284 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           2.53 seconds
      user cpu time       0.21 seconds
      system cpu time     0.09 seconds
      Memory                            206k
      OS Memory                         8304k
      Timestamp            5/11/2012  10:38:42 AM

Super Contributor
Posts: 326

Re: IN operator filtering

Hi all,

Thank you for your comments. In addition to my question above, I would also like to ask if there is which one is faster:

where var in('A','B','C',........'Z') /* a number of values in the IN operator */

VS

where var in('A','B','C') /* few values in the IN operator. in this case only 3 */

Many thanks!

Super User
Posts: 10,028

Re: IN operator filtering

Of course, it is second, which has fewer members and less time to search .

Super User
Posts: 11,343

Re: IN operator filtering

I suspect that the ORDER of the comparison values may make a difference as well. From other SAS comparisons as soon as a true condition is found, such as in a long string of OR comparisons, the SAS stops evaluating and returns true. If you have some idea which of your values might be true more often, place them at the start of the list.

Super User
Posts: 10,028

Re: IN operator filtering

Oh. That might be true.

Did you mean something like SELECT statement? That makes some sense.

Ksharp

Ask a Question
Discussion stats
  • 10 replies
  • 322 views
  • 0 likes
  • 5 in conversation