tianerhu
Pyrite | Level 9
libname results 'C:\practice everyday\input';
proc sort data = cert.input06 out = input06std;
by department descending income;
run;
data results.output06;
set input06std;
by department descending income;
if first.department;
run;
proc print data = results.output06;
run;

After PROC SORT, input06std is already sorted by department and, within each department, by descending income:

proc sort data = cert.input06 out = input06std; by department descending income; /* first time */ run;

Why repeat the same BY statement in the next step, the DATA step?

data results.output06; set input06std; by department descending income; /* second time */ if first.department; run;

  

Accepted solution
sbxkoenk
SAS Super FREQ

Hello,

In the DATA step, the BY statement simply tells the DATA step that there IS a sort order the code can exploit. It does not perform any sorting.

If you don't use the BY statement, the FIRST. and LAST. variables won't exist!

Kind regards,

Koen
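To see what the BY statement contributes, here is a minimal sketch. It assumes hypothetical sample data (WORK.EMP stands in for cert.input06). PROC SORT does the sorting; the DATA step's BY statement only reads that order and sets the temporary FIRST./LAST. flags, which are copied into ordinary variables because automatic variables are not written to the output data set.

```sas
/* Hypothetical sample data standing in for cert.input06 */
data work.emp;
  input department $ income;
  datalines;
IT 75000
Sales 52000
IT 80000
Sales 61000
IT 68000
;
run;

/* PROC SORT performs the actual sort */
proc sort data=work.emp out=work.emp_sorted;
  by department descending income;
run;

/* BY does not sort: it creates FIRST.DEPARTMENT and LAST.DEPARTMENT (0/1) */
data work.flags;
  set work.emp_sorted;
  by department descending income;
  first_dept = first.department;  /* copy temporary flags into real variables */
  last_dept  = last.department;
run;

proc print data=work.flags;
run;
```

In the printed output, first_dept is 1 on the highest-income row of each department, which is exactly the row that `if first.department;` keeps.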

tianerhu
Pyrite | Level 9

Thank you for your help.

tianerhu
Pyrite | Level 9

In the second BY statement, can I delete 'descending income'? The output is the same when I run the program.

Kurt_Bremser
Super User

@tianerhu wrote:

In the second BY statement, can I delete 'descending income'? The output is the same when I run the program.


Just think about it.

Spoiler: Which FIRST. variable is used?
tianerhu
Pyrite | Level 9

The FIRST. variable is only used for department (first.department), so we can delete 'descending income'.

 

Thank you for enlightening me.

FreelanceReinh
Jade | Level 19

@tianerhu wrote:

The FIRST. variable is only used for department (first.department), so we can delete 'descending income'.


Hi @tianerhu,

 

Wait a moment. It's true that only department is needed in the BY statement of the DATA step for first.department to work. But it's safer (and therefore good practice) to repeat descending income as well:

 

Consider the situation that, for some reason (e.g., later changes to the program), the department BY-groups in dataset INPUT06STD were not sorted by descending income. With the fully repeated BY statement from the PROC SORT step the DATA step would then produce an error message ("ERROR: BY variables are not properly sorted ...") and thus remind you that this second-level sort order is needed for the program logic (i.e., for selecting an observation with the highest income per department). With the "parsimonious" by department;, however, the DATA step would produce a clean log, yet the result might be wrong! You might think that you've selected the highest income per department, as intended, while in fact you've just selected the first observation of each department BY-group, regardless of the income value.
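The scenario above can be reproduced with a minimal sketch. It assumes hypothetical data (WORK.EMP2) in which income is not in descending order within each department, and a PROC SORT that, after a later edit, sorts by department only. With the default EQUALS option, PROC SORT keeps the original row order within each department.

```sas
/* Hypothetical data: within each department, income is NOT descending */
data work.emp2;
  input department $ income;
  datalines;
IT 68000
IT 80000
Sales 52000
Sales 61000
;
run;

/* Suppose a later change dropped DESCENDING INCOME from the sort */
proc sort data=work.emp2 out=work.emp2_sorted;
  by department;
run;

/* Parsimonious BY: clean log, but keeps IT 68000 and Sales 52000, */
/* i.e. the first row per department, not the highest income       */
data work.top_parsimonious;
  set work.emp2_sorted;
  by department;
  if first.department;
run;

/* Full BY: the step stops with                                    */
/* ERROR: BY variables are not properly sorted ...                 */
data work.top_full;
  set work.emp2_sorted;
  by department descending income;
  if first.department;
run;
```

The first DATA step silently returns the wrong rows; the second one fails loudly, which is the reminder described above.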
