ForrestYao
Fluorite | Level 6

Hi all,

 

The following questions confuse me a lot. I would like to understand the code below, specifically how the results change when the rename (as a RENAME statement or a RENAME= dataset option) is placed at different locations within the data step.

 

first piece:

Data new_student1;

set student;

rename gender = sex;

if name = 'jack' then gender = 'm';

run;

 

second piece:

Data new_student2;

set student (rename = (gender = sex));

if name = 'jack' then sex = 'm';

run;

 

third piece:

Data new_student3  (rename = (gender = sex));

set student;

if name = 'jack' then gender = 'm';

run;

 

Can anyone explain in detail how the code runs in sequence and what the results for each step are?

 

Much appreciated!

Sam

 

ACCEPTED SOLUTION
s_lassen
Meteorite | Level 14

The first data step and the second do the same thing, but in different ways. When RENAME is used as a statement as in the first data step, the variable is renamed on the output dataset, but not in the statements in the data step. So, in data step 1, the gender variable is modified, and then renamed to "sex" on the output dataset. In the second data step, the gender variable is renamed when the data is read, so it is called "sex" in the data step, and in the output data.

 

RENAME used as a statement is non-executable, meaning that it does not matter where you put it in the code (unless you put it before the SET statement, in which case the RENAME statement will be the definition of the GENDER variable, meaning it will be numeric, and the code will generate an error).

 

In the third data step, the RENAME= option is on the output dataset (in the DATA statement), so the gender variable keeps its name while the data step runs. It is modified when name='jack', and it is renamed to "sex" only when the observation is written to the output dataset. The result is the same as in the first data step.
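To check how the three steps compare, here is a minimal sketch. It assumes a small made-up STUDENT dataset (the names and values are hypothetical, since the original post does not show the data) and uses PROC COMPARE to report any differences between the outputs.

```sas
/* Hypothetical sample data; the original post does not show STUDENT. */
data student;
  length name $8 gender $1;
  input name $ gender $;
  datalines;
jack x
mary f
;
run;

/* Step 1: RENAME statement. GENDER keeps its name while the step runs. */
data new_student1;
  set student;
  rename gender = sex;
  if name = 'jack' then gender = 'm';
run;

/* Step 2: RENAME= on the input dataset. The step only ever sees SEX. */
data new_student2;
  set student (rename = (gender = sex));
  if name = 'jack' then sex = 'm';
run;

/* Step 3: RENAME= on the output dataset, applied when observations are written. */
data new_student3 (rename = (gender = sex));
  set student;
  if name = 'jack' then gender = 'm';
run;

/* PROC COMPARE reports any differences in variables or values. */
proc compare base=new_student1 compare=new_student2;
run;

proc compare base=new_student1 compare=new_student3;
run;
```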

 

Hope that answers your questions.

Tom
Super User

Why are you confused?

 

Can you tell us which statement is specifying the INPUT to the data step?

Can you tell us which statement specifies where the OUTPUT of the data step will be saved?

 

Once you can see that, then for each step explain:

1) What variable names will come into the data step.

2) What variable names will exist while the data step is running.

3) What variable names will be written to the output dataset.
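One way to answer these three questions empirically is sketched below for the third data step. The STUDENT data here is made up for illustration. PROC CONTENTS shows the names on the input and output datasets, and PUT _ALL_ writes the names that exist while the step runs to the log.

```sas
/* Hypothetical sample data; the original post does not show STUDENT. */
data student;
  length name $8 gender $1;
  input name $ gender $;
  datalines;
jack x
mary f
;
run;

/* 1) Names coming in: the variables of the dataset named on SET. */
proc contents data=student short;
run;

/* DATA names the output dataset; SET names the input dataset. */
data new_student3 (rename = (gender = sex));
  set student;
  if name = 'jack' then gender = 'm';
  /* 2) Names during execution: written to the log for each observation. */
  put _all_;
run;

/* 3) Names written out: the variables of the output dataset. */
proc contents data=new_student3 short;
run;
```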

ForrestYao
Fluorite | Level 6
Thanks. May I ask one more question? I guess the really confusing part for me is when the IF statement takes effect. For example, in the first step, the rename happens after new_student1 is created, and it will not change table student_score. But when does the IF statement happen? Does it happen before or after the RENAME statement? In new_student1 the gender variable has already been renamed to sex, so why does the IF statement still say gender = 'm'? I would much appreciate your reply!
ForrestYao
Fluorite | Level 6
I meant student instead of student_score, just FYI.
Tom
Super User

The IF statement is an EXECUTABLE statement, just like an assignment statement.

It executes for every iteration of the data step when the program flow gets to that part of the program.   So in your simple data steps the IF statement will execute once for every observation read in by the SET statement.

 

The RENAME statement is NOT an EXECUTABLE statement. It just helps the compiler set up the data step. It lets SAS know that when it is ready to set up the output dataset(s), the name in the new dataset should be the new name, not the name by which the variable was referenced while the data step was executing.
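The difference between the two kinds of statement can be seen in a small sketch. It assumes a made-up STUDENT dataset and a hypothetical output name, DEMO_RENAME_LAST. The RENAME statement is deliberately placed after the IF statement, and a PUT statement logs the variable as it exists during execution.

```sas
/* Hypothetical sample data; the original post does not show STUDENT. */
data student;
  length name $8 gender $1;
  input name $ gender $;
  datalines;
jack x
mary f
;
run;

data demo_rename_last;
  set student;
  /* Executable: runs once per observation read by SET. */
  if name = 'jack' then gender = 'm';
  /* Executable: the log shows the variable under the name GENDER. */
  put _n_= name= gender=;
  /* Declarative: only affects the name on the output dataset. */
  rename gender = sex;
run;

/* The output dataset lists the variable under its new name. */
proc contents data=demo_rename_last short;
run;
```

The log lines come from the executing step and use the old name, while PROC CONTENTS reads the finished dataset and shows the new one.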
