- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 09-23-2010 10:40 AM
(10192 views)
in the DATA step, I have my conditional statements as IF THEN,
But time is a practical concern for me, since the dataset is large, so I was wondering if SELECT WHEN tends to run faster than IF THEN.
I've read different things, some posts said IF THEN is less efficient than SELECT WHEN and vice versa.
So, I wanted to ask before I go ahead and change all my IF THEN's into SELECT WHEN's.
And what makes people choose IF THEN or SELECT WHEN if one is more efficient than the other?
Any suggestions from your experience? Thank you.
But time is a practical concern for me, since the dataset is large, so I was wondering if SELECT WHEN tends to run faster than IF THEN.
I've read different things, some posts said IF THEN is less efficient than SELECT WHEN and vice versa.
So, I wanted to ask before I go ahead and change all my IF THEN's into SELECT WHEN's.
And what makes people choose IF THEN or SELECT WHEN if one is more efficient than the other?
Any suggestions from your experience? Thank you.
6 REPLIES 6
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It seems that you are asking two questions, performance and preference.
Performance: "your mileage may vary" and you will just need to test it for your situation. My guess is that performance differences will be a function of the complexity of the SELECT and the number of WHEN categories, as the is complied just once for each SELECT and is compiled separately for each IF statement. So, a SELECT with a NULL expression [e.g. SELECT () ] may well perform very similarly to an IF/THEN and a complex expression could well be faster.
On preference, I tend to use IF statements for few levels of nesting or when the expression changes from one level to the next. I find it easier to read.
Doc Muhlbaier
Duke
Performance: "your mileage may vary" and you will just need to test it for your situation. My guess is that performance differences will be a function of the complexity of the SELECT
On preference, I tend to use IF statements for few levels of nesting or when the expression changes from one level to the next. I find it easier to read.
Doc Muhlbaier
Duke
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The worst performer is the repeated IF... THEN... statements that decorate the code like an ugly wall-paper:
[pre]
if a then x=1;
if b then x=2;
if c then x=3;
[/pre]
As far as I know, there is no special compiler optimization going on for the SELECT statement, thus, the IF... THEN... ELSE IF... and SELECT... WHEN ... should perform equally well.
You can speed up the execution a little bit by testing the most likely condition first and then least likely condition last.
[pre]
if a then x=1;
if b then x=2;
if c then x=3;
[/pre]
As far as I know, there is no special compiler optimization going on for the SELECT statement, thus, the IF... THEN... ELSE IF... and SELECT... WHEN ... should perform equally well.
You can speed up the execution a little bit by testing the most likely condition first and then least likely condition last.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
probably more important than if then / else vs select when (I've done no testing comparing these) - but standard advice to optimize performance is to test the most common condition first, second most common second, etc.
Easily seen using the reverse case - for a dataset with 10M rows, if the most common case is listed last in a sequence of 100 tests, then the most common case requires 100 * 10M tests. If listed first it requires 1 * 10M tests.
Easily seen using the reverse case - for a dataset with 10M rows, if the most common case is listed last in a sequence of 100 tests, then the most common case requires 100 * 10M tests. If listed first it requires 1 * 10M tests.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi:
Another possibility to test and benchmark, if you have only one variable/condition that you need to test with IF/SELECT statements is to use the PROC FORMAT lookup technique. Using the PUT statement with a user-defined format causes the lookup to be done in the format list. SAS uses a binary search technique to search the format list -- which can be efficient depending on the number of lookups and whether or not you list the most common condition first.
This paper has some pros and cons for lookup techniques:
http://www2.sas.com/proceedings/forum2008/095-2008.pdf
cynthia
[pre]proc format;
value newv 11, 12 = 'Yes'
13-15 = 'No'
16 = 'Maybe';
run;
data alt_approach;
length newflag $5;
set sashelp.class;
newflag = put(age,newv.);
run;
ods html file='c:\temp\alt_if.html' style=sasweb;
proc print data=alt_approach;
var name age newflag;
run;
ods html close;
[/pre]
Another possibility to test and benchmark, if you have only one variable/condition that you need to test with IF/SELECT statements is to use the PROC FORMAT lookup technique. Using the PUT statement with a user-defined format causes the lookup to be done in the format list. SAS uses a binary search technique to search the format list -- which can be efficient depending on the number of lookups and whether or not you list the most common condition first.
This paper has some pros and cons for lookup techniques:
http://www2.sas.com/proceedings/forum2008/095-2008.pdf
cynthia
[pre]proc format;
value newv 11, 12 = 'Yes'
13-15 = 'No'
16 = 'Maybe';
run;
data alt_approach;
length newflag $5;
set sashelp.class;
newflag = put(age,newv.);
run;
ods html file='c:\temp\alt_if.html' style=sasweb;
proc print data=alt_approach;
var name age newflag;
run;
ods html close;
[/pre]
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
WaltSmith posted:
>probably more important than if then / else vs select when (I've done no testing comparing these) - but standard advice to optimize performance is to test the most common condition first, second most common second, etc.
>Easily seen using the reverse case - for a dataset with 10M rows, if the most common case is listed last in a sequence of 100 tests, then the most common case requires 100 * 10M tests. If listed first it requires 1 * 10M tests.
...
Hi, Walt,
Your suggestion may have been true some time ago, but no longer.
We don't have to argue about it. Below code shows that at least on my two year old PC, the CPU time saving by re-arranging the if conditions is not big enough to make the elapsed time difference at all, even when we force sas to load the input data on the memory.
Notice that my input data set has 10M obs with only 2 vars, which is unrealistically conservative. In reality, there will be more variables and this will only increase the time SAS spends on I/O instead of evaluating logical expressions or branching.
Conclusion? It does help to order your if conditions so that the most common cases listed first, but only a *little* bit, if any.
(Code was too long for this forum: posted on sas-l:
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind1009d&L=sas-l&D=1&O=D&P=18636)
>probably more important than if then / else vs select when (I've done no testing comparing these) - but standard advice to optimize performance is to test the most common condition first, second most common second, etc.
>Easily seen using the reverse case - for a dataset with 10M rows, if the most common case is listed last in a sequence of 100 tests, then the most common case requires 100 * 10M tests. If listed first it requires 1 * 10M tests.
...
Hi, Walt,
Your suggestion may have been true some time ago, but no longer.
We don't have to argue about it. Below code shows that at least on my two year old PC, the CPU time saving by re-arranging the if conditions is not big enough to make the elapsed time difference at all, even when we force sas to load the input data on the memory.
Notice that my input data set has 10M obs with only 2 vars, which is unrealistically conservative. In reality, there will be more variables and this will only increase the time SAS spends on I/O instead of evaluating logical expressions or branching.
Conclusion? It does help to order your if conditions so that the most common cases listed first, but only a *little* bit, if any.
(Code was too long for this forum: posted on sas-l:
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind1009d&L=sas-l&D=1&O=D&P=18636)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would agree that its going to help only a little bit - computers are very fast these days - but the question was about which is faster if then else vs select when - the difference between these options I suspect is going to be miniscule - greater savings (however small) will be obtained by ordering the test conditions.