BookmarkSubscribeRSS Feed
SAS_user_n
Calcite | Level 5
in the DATA step, I have my conditional statements as IF THEN,

But time is a practical concern for me, since the dataset is large, so I was wondering if SELECT WHEN tends to run faster than IF THEN.

I've read different things, some posts said IF THEN is less efficient than SELECT WHEN and vice versa.

So, I wanted to ask before I go ahead and change all my IF THEN's into SELECT WHEN's.

And what makes people choose IF THEN or SELECT WHEN if one is more efficient than the other?

Any suggestions from your experience? Thank you.
6 REPLIES 6
Doc_Duke
Rhodochrosite | Level 12
It seems that you are asking two questions, performance and preference.

Performance: "your mileage may vary" and you will just need to test it for your situation. My guess is that performance differences will be a function of the complexity of the SELECT and the number of WHEN categories, as the is complied just once for each SELECT and is compiled separately for each IF statement. So, a SELECT with a NULL expression [e.g. SELECT () ] may well perform very similarly to an IF/THEN and a complex expression could well be faster.

On preference, I tend to use IF statements for few levels of nesting or when the expression changes from one level to the next. I find it easier to read.

Doc Muhlbaier
Duke
chang_y_chung_hotmail_com
Obsidian | Level 7
The worst performer is the repeated IF... THEN... statements that decorate the code like an ugly wall-paper:
[pre]
if a then x=1;
if b then x=2;
if c then x=3;
[/pre]
As far as I know, there is no special compiler optimization going on for the SELECT statement, thus, the IF... THEN... ELSE IF... and SELECT... WHEN ... should perform equally well.

You can speed up the execution a little bit by testing the most likely condition first and then least likely condition last.
WaltSmith
Fluorite | Level 6
probably more important than if then / else vs select when (I've done no testing comparing these) - but standard advice to optimize performance is to test the most common condition first, second most common second, etc.

Easily seen using the reverse case - for a dataset with 10M rows, if the most common case is listed last in a sequence of 100 tests, then the most common case requires 100 * 10M tests. If listed first it requires 1 * 10M tests.
Cynthia_sas
SAS Super FREQ
Hi:
Another possibility to test and benchmark, if you have only one variable/condition that you need to test with IF/SELECT statements is to use the PROC FORMAT lookup technique. Using the PUT statement with a user-defined format causes the lookup to be done in the format list. SAS uses a binary search technique to search the format list -- which can be efficient depending on the number of lookups and whether or not you list the most common condition first.

This paper has some pros and cons for lookup techniques:
http://www2.sas.com/proceedings/forum2008/095-2008.pdf

cynthia
[pre]proc format;
value newv 11, 12 = 'Yes'
13-15 = 'No'
16 = 'Maybe';
run;

data alt_approach;
length newflag $5;
set sashelp.class;
newflag = put(age,newv.);
run;

ods html file='c:\temp\alt_if.html' style=sasweb;
proc print data=alt_approach;
var name age newflag;
run;
ods html close;

[/pre]
chang_y_chung_hotmail_com
Obsidian | Level 7
WaltSmith posted:
>probably more important than if then / else vs select when (I've done no testing comparing these) - but standard advice to optimize performance is to test the most common condition first, second most common second, etc.
>Easily seen using the reverse case - for a dataset with 10M rows, if the most common case is listed last in a sequence of 100 tests, then the most common case requires 100 * 10M tests. If listed first it requires 1 * 10M tests.
...

Hi, Walt,

Your suggestion may have been true some time ago, but no longer.

We don't have to argue about it. Below code shows that at least on my two year old PC, the CPU time saving by re-arranging the if conditions is not big enough to make the elapsed time difference at all, even when we force sas to load the input data on the memory.

Notice that my input data set has 10M obs with only 2 vars, which is unrealistically conservative. In reality, there will be more variables and this will only increase the time SAS spends on I/O instead of evaluating logical expressions or branching.

Conclusion? It does help to order your if conditions so that the most common cases listed first, but only a *little* bit, if any.

(Code was too long for this forum: posted on sas-l:
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind1009d&L=sas-l&D=1&O=D&P=18636)
WaltSmith
Fluorite | Level 6
I would agree that its going to help only a little bit - computers are very fast these days - but the question was about which is faster if then else vs select when - the difference between these options I suspect is going to be miniscule - greater savings (however small) will be obtained by ordering the test conditions.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 8285 views
  • 0 likes
  • 5 in conversation