libname results 'C:\practice everyday\input';
proc sort data = cert.input06 out = input06std;
by department descending income;
run;
data results.output06;
set input06std;
by department descending income;
if first.department;
run;
proc print data = results.output06;
run;
After PROC SORT, the dataset 'input06std' is already sorted by department and, within each department, by descending income:
proc sort data = cert.input06 out = input06std;
by department descending income; /* the first time */
run;
Why do we write the same BY statement again in the next step, the DATA step?
data results.output06;
set input06std;
by department descending income; /* the second time */
if first.department;
run;
Hello,
In the DATA step, the BY statement simply tells the DATA step that there IS a sort order that the code can exploit. It does not perform a sort operation itself.
If you don't use the BY statement, the FIRST. and LAST. variables are not created!
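To make this concrete, here is a minimal sketch using a hypothetical toy dataset (`work.demo` and the variables `is_first` and `is_last` are made up for illustration and are not part of the original program). The FIRST. and LAST. variables are temporary and are not written to the output dataset, so the sketch copies them into ordinary variables to make them visible.

```sas
/* Hypothetical toy data, already grouped by department */
data work.demo;
  input department $ income;
  datalines;
A 300
A 200
B 500
B 100
;
run;

data work.flags;
  set work.demo;
  by department;                  /* creates first.department and last.department */
  is_first = first.department;    /* 1 on the first row of each department, else 0 */
  is_last  = last.department;     /* 1 on the last row of each department, else 0 */
run;

proc print data=work.flags;
run;
```

If the BY statement is removed from the second step, SAS should, as far as I know, treat first.department as an ordinary uninitialized variable with a missing value, so a subsetting `if first.department;` would keep no observations.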
Kind regards,
Koen
Thank you for your help.
In the second BY statement, can I delete 'descending income'? The resulting output is the same when I run the program.
FIRST. is only used on the variable 'department', so we can delete the code 'descending income'.
Thank you for enlightening me.
@tianerhu wrote:
FIRST. is only used on the variable 'department', so we can delete the code 'descending income'.
Hi @tianerhu,
Wait a moment. It's true that only department is necessary in the BY statement of the DATA step in order to make first.department work. But it's safer (and therefore good practice) to repeat descending income as well:
Consider the situation that, for some reason (e.g., later changes to the program), the department BY-groups in dataset INPUT06STD were not sorted by descending income. With the fully repeated BY statement from the PROC SORT step the DATA step would then produce an error message ("ERROR: BY variables are not properly sorted ...") and thus remind you that this second-level sort order is needed for the program logic (i.e., for selecting an observation with the highest income per department). With the "parsimonious" by department;, however, the DATA step would produce a clean log, yet the result might be wrong! You might think that you've selected the highest income per department, as intended, while in fact you've just selected the first observation of each department BY-group, regardless of the income value.
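The difference can be reproduced with a small sketch. The dataset `work.unsorted` and the output names below are hypothetical; the data is grouped by department, but income is deliberately NOT in descending order within each group.

```sas
/* Hypothetical data: grouped by department, income ascending within each group */
data work.unsorted;
  input department $ income;
  datalines;
A 200
A 300
B 100
B 500
;
run;

/* Short BY statement: the log stays clean, but the step keeps
   A 200 and B 100, which are not the highest incomes */
data work.top_unchecked;
  set work.unsorted;
  by department;
  if first.department;
run;

/* Full BY statement: the DATA step checks the second-level order too
   and stops with "ERROR: BY variables are not properly sorted ..." */
data work.top_checked;
  set work.unsorted;
  by department descending income;
  if first.department;
run;
```

The second step is expected to fail on purpose: the error is the safeguard that tells you the PROC SORT step no longer delivers the order the selection logic relies on.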