Solved: Filtering Data - Efficiency Gains by removing ANDs?

GBL__ · Posted 03-07-2023 12:51 PM

I know this subject line is not the best, but I am curious whether there are any efficiency gains from rewriting a filter like this:

numeric_variable > 0 AND numeric_variable < 50000

to just be this:

0 < numeric_variable < 50000

I know that using a subsetting WHERE is better than IF when possible, but I am interested to know whether any background logic applied to filter example #2 makes it faster to process, or whether performance differs between DATA steps and PROC SQL.
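A minimal sketch for checking this on your own data, assuming a small made-up data set (`work.sample` and its values are hypothetical). Both WHERE forms are applied in DATA steps and the results are compared with PROC COMPARE. SAS treats the chained form as the two comparisons joined by AND, so both steps should keep the same rows.

```sas
/* Hypothetical sample data: values on and around the filter boundaries, */
/* plus a missing value (.)                                              */
data work.sample;
  input numeric_variable @@;
  datalines;
-5 0 1 49999 50000 60000 .
;
run;

/* Filter written with an explicit AND */
data work.with_and;
  set work.sample;
  where numeric_variable > 0 and numeric_variable < 50000;
run;

/* Same filter written as a chained comparison */
data work.chained;
  set work.sample;
  where 0 < numeric_variable < 50000;
run;

/* Both data sets should contain the same two rows (1 and 49999); */
/* the missing value fails both forms because . compares below 0  */
proc compare base=work.with_and compare=work.chained;
run;
```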

Thanks in advance for any guidance!

ballardw · Posted 03-07-2023 02:08 PM

In general, SQL is slower than a DATA step. If nothing else, the overhead implied by SQL's support for aggregate functions across observations is a factor. You may not see any practical difference with fewer than many thousands of observations, though.
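One way to see whether the DATA step vs. PROC SQL difference matters at your data volume is to turn on the FULLSTIMER system option and compare the timing notes each step writes to the log. This is a minimal sketch, assuming a hypothetical generated data set `work.big`; the row count, seed, and value range are arbitrary, and actual timings will depend on your environment.

```sas
/* Write detailed timing statistics for each step to the log */
options fullstimer;

/* Hypothetical test data: one million uniform random values */
/* between 0 and 100000                                      */
data work.big;
  call streaminit(12345);
  do id = 1 to 1000000;
    numeric_variable = 100000 * rand('uniform');
    output;
  end;
run;

/* DATA step with a subsetting WHERE (chained form) */
data work.big_ds;
  set work.big;
  where 0 < numeric_variable < 50000;
run;

/* The same subset in PROC SQL, written with AND */
proc sql;
  create table work.big_sql as
  select *
  from work.big
  where numeric_variable > 0 and numeric_variable < 50000;
quit;
```

Compare the real time and CPU time lines in the log notes for the last two steps; for small tables the gap may well be negligible.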

One of the bigger efficiencies, IMHO, is that when I see code with

0 < numeric_variable < 50000

it is pretty obvious what is intended. Depending on indentation choices and the length of statement lines, that may not be quite as obvious when an AND is involved.

That advantage becomes more obvious with more complex expressions like

0 < numeric_variable < another_variable < 50000
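As a sketch of what the longer chain stands for, assuming SAS expands a chained comparison into adjacent pairwise comparisons joined by AND, the two DATA steps below should keep the same rows. The data set `work.pairs` and its values are hypothetical, and a subsetting IF is used here rather than WHERE.

```sas
/* Hypothetical data with two numeric variables */
data work.pairs;
  input numeric_variable another_variable;
  datalines;
10 20
30 20
10 60000
-1 5
;
run;

/* Chained form */
data work.chain3;
  set work.pairs;
  if 0 < numeric_variable < another_variable < 50000;
run;

/* The same condition spelled out with AND */
data work.and3;
  set work.pairs;
  if 0 < numeric_variable
     and numeric_variable < another_variable
     and another_variable < 50000;
run;

/* Only the first row (10, 20) satisfies every comparison */
```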

GBL__ · Posted 03-10-2023 08:14 AM

Thank you, @ballardw, for your response! I agree completely that readability is key; it was my main reason for using this type of filter/subsetting layout.

Thanks again!

s_lassen · Posted 03-08-2023 03:41 AM

AFAIK, there is no difference in efficiency between the two ways of expressing the filter.

The advantage of the second may be, as @ballardw remarked, that it is easier to read. The problem with it is that a<b<c is rather specific SAS shorthand, which does not work the same way in other languages: in standard SQL you get a syntax error, and in C++ it compiles but is parsed as (a<b)<c, which compares the true/false result of a<b with c.

GBL__ · Posted 03-10-2023 08:16 AM

Thank you for your response! I also agree that 'a < b < c' filtering is rather specific SAS shorthand, but it is nonetheless probably the most appropriate choice (in my situation, at least).
