BookmarkSubscribeRSS Feed
SAS93
Quartz | Level 8

I've seen how different codes might do:

 

If PAY in (1,2) then PAY_CAT = 1;

Else if PAY in (3) then PAY_CAT =2;

Else PAY_CAT = .;

 

Versus below, where a different frequency for each group would be output:

 

If PAY in (.) then PAY_CAT = .;

Else if PAY in (1,2) then PAY_CAT = 1;

Else PAY_CAT = 2;

 

Am I missing (no pun intended) something? I was taught and have read numerous times that IF-THEN-ELSE is usually advised to be written in "descending order" of probability/frequency of occurrence in the dataset, so naturally I put missing at the bottom. But some official CDC and other code guidance doesn't use that approach all the time, instead placing missing at the top of some (while putting it at the bottom in others).

 

What gives, and how can I best decide when to use one approach over the other? 

 

 

4 REPLIES 4
Tom
Super User Tom
Super User

The main reason would be if the other test involved performing operations on the value of the variable that might be missing. By eliminating the cases with missing values you prevent performing calculations on missing values.

if N(PAY,PAY_MEAN,PAY_STD) < 3 then PAY_CATEGORY='MISSING';
else if ( (PAY-PAY_MEAN)/PAY_STD) > 2 ) then PAY_CATEGORY='HIGH';
else PAY_CATEGORY='OTHER';

NOTE:  Your original example will result in different PAY_CAT when PAY has a value like 10 that is not in the list (. 1 2 3).

ballardw
Super User

Sometimes the order of evaluations, i.e. the IF /Then will relate to execution time. If you have large data sets then placing the evaluation that is more likely first can result in reduced run time for code. The "else" only gets considered when the "if" is false. So if you know which is more likely to be true test it first.

 

Another cause can be what I think of as "archeological". A change needed to be made to code and the order chosen may have been the easiest that generated desired output with minimal change to other existing code.

 

There is also the SELECT/When block that may be more appropriate for multiple comparisons.

Select (Pay);
   when (1,2) Pay_cat=1;
   when (3)   Pay_cat=2;
   when (.)   Pay_cat=.;
   otherwise ; /* this what is executed when no listed condition is true*/
end;

Not much nicer code when using 3 values but dealing with 30 can be significantly easier to maintain. Besides a simple assignment you can use Do/End blocks as the result for any When group.

 

And of course another option when dealing with single variable values custom Informat or Formats might be appropriate.

 

Astounding
PROC Star

First, note that the logic might be different when PAY_CAT already exists vs. when it is a new variable created by your programming statements.

 

This statement can be removed when PAY_CAT is not in your incoming data:

 

else PAY_CAT = .;

It begins with a missing value, and if you never change it it ends up with a missing value.

 

Second, if speed is of the essence then order of your statements makes a difference

if PAY in (1,2) then PAY_CAT = 1;
else if PAY in (3) then PAY_CAT =2;

These statements run faster when PAY is usually 1 or 2.  But if PAY is usually 3, this would run faster:

if PAY in (3) then PAY_CAT = 2;
else if PAY in (1, 2) then PAY_CAT =1;

So perhaps the best code guidance would be to take it on a case by case basis and not blindly follow a set of rules.

mkeintz
PROC Star

I wonder if putting the specific condition that results in a missing is just a protocol to make the coder more likely to account for conditions that should result in such an assignment - even at the cost of some lost efficiency.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 558 views
  • 0 likes
  • 5 in conversation