BookmarkSubscribeRSS Feed
tennytivvytutu
Fluorite | Level 6

I am working with historical code, and my task is to update the code to run more efficiently. I have a very specific question about using an IF statement to subset the data. Code is below. 

 

DATA sci_tested;

SET 2022.Science (rename = (performance = perform));

IF perform = 1 THEN X = 1; ELSE X = 0;

IF perform = 2 THEN Y = 1; ELSE Y = 0;

IF perform ne .;

tested = 1;

RUN;

 

According to SAS documentation, IF perform should return all values in perform that are not missing or 0. If this is the case, what is the difference between IF perform; and IF perform ne .;?

 

Is the second IF statement redundant? Could I not achieve the same results by simply using the first IF statement? 

 

I am further unclear on the tested = 1; line of code. 

 

Can someone walk through this section of code with me, to help understand the functions of IF THEN ELSE vs IF vs the last "tested = 1" line? 



Thanks in advance!

2 REPLIES 2
Kurt_Bremser
Super User
if perform;

will be true when perform is neither zero nor missing.

if perform ne .;

will also be true when perform is zero.

These two statements

IF perform = 1 THEN X = 1; ELSE X = 0;
IF perform = 2 THEN Y = 1; ELSE Y = 0;

can be simplified to

x = (perform = 1);
y = (perform = 2);

You can even omit the brackets, I only used them for clarity.

Tom
Super User Tom
Super User

So this pair of statements :

 

IF perform = 1 THEN X = 1; 
ELSE X = 0;

Will set to X to either 1 or 0 depending on the value of PERFORM.

 

 

The second IF/THEN/ELSE pair is making a different variable, Y in this case.

 

This statement will determine if the data step iteration continues or not.  

 

IF perform ne .;

If the condition is TRUE then this iteration continues on to the following statements, including the implied OUTPUT at the end of the data step.  So when it is FALSE that observation is NOT output. 

 

 

Is the variable Y redundant? Yes.  The values of X and Y contain the same information.  Whenever X is 1 then Y is 0 and whenever X is 0 then Y is 1. 

 

And if PERFORM never has values like zero or .Z then both X and Y are redundant to PERFORM since there are only two possible values of each and only two possible combinations of all three.

PERFORM  X Y
1        1 0
2        0 1

 

 

This line is executed unconditionally for every observation that is going to be written so every observation will have TESTED=1.

tested = 1;

As to why it is there you would need to see how the variable is used later on.

 

 

These two statements are using a different test to determine whether or not to continue:

IF perform; 
IF perform ne .;

The first one is FALSE when PERFORM is zero or has any of SAS's 28 different missing values. (., ._, and .A to .Z).

 

The second is FALSE only when PERFORM is just the regular missing value.  So PREFORM=0 values and PERFORM=.Z values will make it past this statement but would not make it past the first one.

 

It is marginally more efficient to subset the observations earlier.  There is no need to create X or Y values for observations that are going to get discarded.

DATA sci_tested;
  SET Science (rename = (performance = perform));
  IF perform ne .;
  IF perform = 1 THEN X = 1; ELSE X = 0;
  IF perform = 2 THEN Y = 1; ELSE Y = 0;
  tested = 1;
RUN;

You could also convert the subsetting IF to a WHERE and avoid even reading those observations into the data step. (Normally that is more important when the input is from some external database as SAS can potentially avoid moving those observations from the remote database to the machine where SAS is running).

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 499 views
  • 2 likes
  • 3 in conversation