BookmarkSubscribeRSS Feed
deleted_user
Not applicable
What is the benefit of using else then with specified ranges?
data one;
set two;
if 8 gt age lt 10 then agecat=1;
else if 11 gt age lt 14 then agecat=2;
run;

Is there any benefit to adding "else" in the statement if you specify the range?

I'm just doing some searching and wondering if it's a preference of serves a purpose
21 REPLIES 21
NickR
Quartz | Level 8
Using IF-THEN statements without the ELSE statement causes SAS to evaluate all IF-THEN statements.

Using IF-THEN statements with the ELSE statement causes SAS to execute IF-THEN statements until it encounters the first true statement. Subsequent IF-THEN statements are not evaluated.

You may refer SAS Online Documentation for more details.
chang_y_chung_hotmail_com
Obsidian | Level 7
Maybe no benefits at all. See some of more recently published SAS code, for instance,
KNOWLEDGE BASE / SAMPLES & SAS NOTES 33017 -- It does not bother to use ELSE IF at all. It goes, in part, like:



data new.newshoes;


 set sashelp.shoes; /* the original data set */


         if PRODUCT = "Boot" then NEWPRODUCT=1;


         if PRODUCT = "Men's Casual" then NEWPRODUCT=2;


         if PRODUCT = "Women's Casual" then NEWPRODUCT=3;


         if PRODUCT = "Men's Dress" then NEWPRODUCT=4;


         if PRODUCT = "Women's Dress" then NEWPRODUCT=5;


         if PRODUCT = "Sandal" then NEWPRODUCT=6;


         if PRODUCT = "Slipper" then NEWPRODUCT=7;


         if PRODUCT = "Sport Shoe" then NEWPRODUCT=8;


 


format NEWPRODUCT productfmt.;


run;

Gatineau1
Calcite | Level 5
Simplify your life with SELECT instead of IF THEN ELSE

data new.newshoes;
set sashelp.shoes;
Select(PRODUCT);
When ("Boot") NEWPRODUCT=1;
When ("Men's Casual") NEWPRODUCT=2;
..........
..........
Otherwise;
end;
run;

Message was edited by: Gatineau1
Gatineau1
Calcite | Level 5
None Message was edited by: Gatineau1
Reeza
Super User
What Nick R said.

Small enough difference, but this doesn't do much of a comparison, probably more pronounced in character or other types of comparison.
[pre]
data ifelse;
do i=1 to 1000000;
x=rand('normal', 0,1);
if x>0 then y=1;
else if x<=0 and x>-0.5 then y=0;
else if x<=-0.5 then y=2;
output;
end;
run;


data if;
do i=1 to 1000000;
x=rand('normal', 0,1);
if x>0 then y=1;
if x<=0 and x>-0.5 then y=0;
if x<=-0.5 then y=2;
output;
end;
run;
[/pre]

Message was edited by: Reeza Message was edited by: Reeza
Peter_C
Rhodochrosite | Level 12
I think there is no benefit in excluding the ELSE in that example.
I prefer to avoid the IF as well as the ELSE for these situations.
I would create a "user defined format" with PROC FORMAT and use a put function to derive that ageCat, something like

ageCat = put( age, myFmt. );


creating that range-to-value format is simply:
proc format;
value myFmt
8 - 10 = 1
11-14 = 2
;
run;


of course the nuisance is that the put() returns a string only, but if the category really has to be a number, then
ageCat = input( put( age, myFmt. ), best8. ) ;

Probably it is always worth using the user-format approach when there is only one outcome variable for a series of if-tests.

peterC
ChrisNZ
Tourmaline | Level 20
If there is only one outcome variable for a series of if-tests, I also find
[pre]
data new.newshoes;
set sashelp.shoes;
NEWPRODUCT=ifn(PRODUCT = "Boot" , 1
,ifn(PRODUCT = "Men's Casual" , 2
,ifn(PRODUCT = "Women's Casual", 3
,ifn(PRODUCT = "Men's Dress" , 4
,ifn(PRODUCT = "Women's Dress" , 5
,ifn(PRODUCT = "Sandal" , 6
,ifn(PRODUCT = "Slipper" , 7
,ifn(PRODUCT = "Sport Shoe" , 8
, . ))))))));
run;
[/pre]
more legible than if tests. Otherwise the select statement is also handy.
Peter_C
Rhodochrosite | Level 12
Would the which() function help in that situation?
ChrisNZ
Tourmaline | Level 20
Peter, in this case it would, but this is an example, values are random and don't matter here.


One thing: the embedded ifn() suite above would be a wee bit slower than a suite of if else statement as sas in their wisdom have chosen to always evaluate all arguments of the ifn/ifc functions, even though by design some will never be used. I hope that one day C=ifn(B=0,.,A/B); will not generate a division by zero error when B=0. It shouldn't.



Anyway, I am ready to trade the tiny performance hit for better code legibility.

Peter_C
Rhodochrosite | Level 12
> Peter, in this case it would, but this is an example,
> values are random and don't matter here.
?????

> One thing: the embedded ifn() suite above would be a
> wee bit slower than a suite of if else statement
> Anyway, I am ready to trade the tiny performance hit
> for better code legibility.

Chris

I thought the informat technique would provide the clearest code while being simplest to manage.
Does it not also perform better than a series of logic tests on the same data (with/out ELSE)?

peterC
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
I would say that a SAS FORMAT look-up (using a PUT function, or sometimes when using a PROC with a FORMAT statement) can be a more efficient performer depending on the search size. And I also would agree that the idea of maintaining a SAS format (no moving parts, like SAS statements) can also be more effective / efficient from a support perspective, particularly when using CNTLIN= to generate the FORMAT itself.

As well, consider that PROC FORMAT was enhanced with SAS version 6 to permit character-value range strings which afforded the opportunity for data-string prefix wildcarding, when coded correctly. Consider that SAS can be influenced in the search/match behavior, choosing either "most discrete match" or "first entry match" when using a particular user FORMAT, as well.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic / post:

proc format character value range site:sas.com
ChrisNZ
Tourmaline | Level 20
I meant to say that several embedded ifn() calls can be more legible than if/then statements.


Informats can be clearer or not, and easier to maintain or not depending on the situation. They are defined outside of the datastep in any case, so require non-sequential reading to understand the logic.



All methods have their place, as usual: select, if/then, ifn(), proc format with values, proc format with cntlin datasets. I just wanted to mention embedded ifn()'s legibility as this function is more recent.



Formats using put() can be slower than tests, as functions are much slower than tests. In any case, unless the code is run billions of times it doesn't really matter. Legibility matters most imho.



Here are a few tests. [pre]
proc format;
value type low - 12 = '1'
13 - 19 = '2'
20 - high= '3';
run;
data _null_; * format 5.7s;
AGE=1;
do I=1 to 1e8;
TYPE=put(AGE, type.);
end;
run;
data _null_; * format+informat 6.9s;
AGE=1;
do I=1 to 1e8;
TYPE=input(put(AGE, type.),32.);
end;
run;
data _null_; * if 0.6s;
AGE=1;
do I=1 to 1e8;
if AGE le 12 then TYPE= 1;
if 13 le AGE le 19 then TYPE= 2;
if AGE ge 20 then TYPE= 3;
end;
run;
data _null_; * if else x1 0.5s;
AGE=1;
do I=1 to 1e8;
if AGE le 12 then TYPE= 1;
else if 13 le AGE le 19 then TYPE= 2;
else if AGE ge 20 then TYPE= 3;
end;
run;
data _null_;
AGE=31; * if else x3 0.7s;
do I=1 to 1e8;
if AGE le 12 then TYPE= 1;
else if 13 le AGE le 19 then TYPE= 2;
else if AGE ge 20 then TYPE= 3;
end;
run;
data _null_;
AGE=31; * ifn 7.8s;
do I=1 to 1e8;
TYPE=ifn( AGE le 12 , 1
,ifn( 13 le AGE le 19 , 2
,ifn( AGE ge 20 , 3, .)));
end;
run;
data _null_; * select 0.9s;
AGE=31;
do I=1 to 1e8;
select;
when( AGE le 12) TYPE= 1;
when(13 le AGE le 19) TYPE= 2;
when( AGE ge 20) TYPE= 3;
otherwise;
end;
end;
run;
data _null_; * test as number 1.8s;
AGE=1;
do I=1 to 1e8;
TYPE=( AGE le 12 ) * 1
+( 13 le AGE le 19 ) * 2
+( AGE ge 20 ) * 4;
end;
run;
[/pre]

Added "test as number" benchmark (Thanks André).
ChrisNZ
Tourmaline | Level 20
>> Peter, in this case it would, but this is an example,
>> values are random and don't matter here.

>?????

I mean to say that instead of 1,2,3, etc, resulting values could be 1, 22, 56.

Chances are test-result values they will almost never be 1,2,3 in real life.
The discussion is about the best testing techniques. Which() only works if you want to retrieve the 1,2,3 suite of values, which have been chosen here by pure chance/laziness/lack of imagination. I have changed 3 to 4 in my last benchmark above. 🙂


>I thought the informat technique would provide the clearest code while being simplest to manage.

If one has a few cut-off values, I find having the values visible within the data step convenient most times. Of course there are times when one wants to keep all cut-off values together somewhere. Also, if there is a large number of cut-offs, the data step might be more legible without them. Clarity is in the eye of the beholder. 🙂

>Does it not also perform better than a series of logic tests on the same data (with/out ELSE)?

Not in the runs above. Functions are slower than tests. Using formats in procs is still very fast and convenient though.
Peter_C
Rhodochrosite | Level 12
Chris
may I respond?

> >> Peter, in this case it would, but this is an example,
> >> values are random and don't matter here.
>
> >?????
>
> I mean to say that instead of 1,2,3, etc, resulting
> values could be 1, 22, 56.

That sounds like replacing input strings or values with a randomly distributed (but subset of the continuum of) values.
I have nothing to offer as a "best way" to derive one randon set of values from another, but I do have a use for (and thought we were discussing) determining an array index or table pointer from raw data.

> The discussion is about the best testing techniques.
> Which() only works if you want to retrieve the 1,2,3


>
> >I thought the informat technique would provide the clearest code while being simplest to manage.

> If one has a few cut-off values, I find having the
> values visible within the data step convenient most
> times. Of course there are times when one wants to
> keep all cut-off values together somewhere. Also, if
> there is a large number of cut-offs, the data step
> might be more legible without them. Clarity is in the
> eye of the beholder. 🙂

> >Does it not also perform better than a series of logic tests on the same data (with/out ELSE)?

> Not in the runs above.
> Functions are slower than tests.

This will be true when there are 4 tests for 1 function, and perhaps above 10:1.
I guess the threshhold is platform dependant.

I had in mind a credit risk staging area for a bank in which their solution for deriving "business product hierarchy" from a couple of data items was if/then/else SAS code built with a VBA macro from a decision tree table stored in excel.
For the user, changes meant re-aligning the decision tree, re-running the VBA macro and moving the generated code to the staging area application. The generated if/then/else logic ran to over 3000 lines (around 1500 tests). It was easy to avoid these tests appearing on the SASlog (/nosource2 on the %include).
Maintaining that VBA became unmanageable after personnel changed more than once. Re-writing the VBA in SAS did not take long, but still generated the vast collection of if/then/else code. The alternative solution, involving 2 informats with cntlin data sets from the decision trees, enormously reduced the if/then/else logic and ran a whole lot faster.

> Using formats in procs is still very fast and convenient though.

sure is!
and it is handy to have consistency when using the same format in both PROCs and data steps.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 4571 views
  • 0 likes
  • 9 in conversation