hcstritz
Calcite | Level 5

Hi all,

I am working through the 2019 SAS Certified Specialist Prep Guide and I am confused about an explanation of the FIRST. and LAST. variables on page 133. Here is the code in the book:

 

data work.budget(keep=dept payroll);
   set work.temp;
   by dept;
   if wagecat='S' then Yearly=wagerate*12;
   else if wagecat='H' then Yearly=wagerate*2000;
   if first.dept then Payroll=0;
   payroll+yearly;
   if last.dept;
run;

The part that I'm confused by is that the explanation for the line

if last.dept ; 

is: "If this observation is the last in the variable, Dept, then end. If not, then read the next observation." 

 

 

I don't understand this explanation - it seems to imply that BY-group processing requires you to tell SAS when the last observation in a BY group occurs. My understanding is that this line tells SAS to output the line where last.Dept=1. However, the next figure (8.1) on page 134 shows all observations for the dataset and says, "When you print the new dataset, you can now list and sum the annual payroll by department."

 

 

proc print data=work.budget noobs;
   sum payroll;
   format payroll dollar12.2;
run;

I'm confused as to why the second snippet of code is necessary. Shouldn't

if last.dept ;

create the sum of the annual payroll by department? And does this line accomplish anything other than controlling the output?

 

 

1 ACCEPTED SOLUTION

Tom
Super User

"If this observation is the last in the variable, Dept, then end. If not, then read the next observation."

That is definitely poorly worded.

 

First, "last in the variable" makes no sense at all, or if it does make sense, it is NOT what the LAST. variable means.  LAST.DEPT is TRUE when this OBSERVATION is the last one in the current group of observations that share the same value of DEPT.
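
To see what FIRST.DEPT and LAST.DEPT actually hold, one option is to copy them into ordinary variables, since the FIRST. and LAST. variables themselves are temporary and are not written to the output dataset. This is only a small sketch: the dataset WORK.DEMO, its three rows, and the names is_first and is_last are made up for illustration.

```sas
/* Made-up sample data, already sorted by dept */
data work.demo;
   input dept $ amount;
   datalines;
A 1
A 2
B 3
;
run;

data work.flags;
   set work.demo;
   by dept;
   is_first = first.dept;   /* 1 on the first row of each dept group, else 0 */
   is_last  = last.dept;    /* 1 on the last row of each dept group, else 0  */
run;

proc print data=work.flags noobs;
run;
/* is_first is 1, 0, 1 and is_last is 0, 1, 1 for the three rows.
   Dept B has a single row, so both flags are 1 on it. */
```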

 

The second part is sort of true, but it does not really explain why the SUBSETTING IF statement is being used here.  A subsetting IF statement means that when the condition is TRUE the current iteration continues, and when it is FALSE the iteration ends immediately, skipping the rest of the data step.  That includes, most importantly for this data step, the IMPLIED OUTPUT statement that is executed at the end of a data step iteration when the step does not contain any explicit OUTPUT statement.  It is like a Go to Jail card in Monopoly: you go directly to jail without passing GO or collecting $200.

 

So the end result is that only the last observation in each group actually gets written to the output dataset.
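
To make this concrete, here is a minimal sketch that runs the book's step on a small WORK.TEMP. The four rows and their values are invented for illustration; only the variable names come from the book's code. It also suggests why the PROC PRINT step is still useful: the data step leaves one row per department holding that department's total, and the SUM statement in PROC PRINT adds a grand total across departments to the report.

```sas
/* Invented sample data, already sorted by Dept */
data work.temp;
   input Dept $ WageCat $ WageRate;
   datalines;
ADM S 3000
ADM H 20
ENG S 5000
ENG S 4000
;
run;

data work.budget(keep=dept payroll);
   set work.temp;
   by dept;
   if wagecat='S' then Yearly=wagerate*12;
   else if wagecat='H' then Yearly=wagerate*2000;
   if first.dept then Payroll=0;   /* reset the running total for a new dept */
   payroll+yearly;                 /* sum statement: Payroll is retained     */
   if last.dept;                   /* only the last row per dept reaches the
                                      implied OUTPUT                         */
run;

proc print data=work.budget noobs;
   sum payroll;                    /* adds a grand total to the report */
   format payroll dollar12.2;
run;
/* work.budget has two rows: ADM 76,000 (36,000 + 40,000) and
   ENG 108,000 (60,000 + 48,000). The printed total is 184,000. */
```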

2 REPLIES
quickbluefish
Barite | Level 11

The 

if last.dept;

...is just shorthand for

if last.dept=1;

...as you suspected.  It's known as a "subsetting IF", I think, in either case. 

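In this particular step, where the subsetting IF is the last statement and there is no other OUTPUT statement, these have the same effect as:

if last.dept=1 then output;

...although in general a subsetting IF behaves more like "if not (condition) then delete;", because it also skips any statements that come after it.  I agree that the book's explanation is confusing - I've never seen it put that way before.  My understanding is that - yes - all this line does is control which observations are output.  The 0/1 variable last.dept itself is created by the BY statement, and the IF does not do any sort of summation (re: your 2nd question); the running total comes from the sum statement, payroll+yearly.  I never use any of those features of PROC PRINT, so I'm not really sure about the syntax there.  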
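
One caveat on treating the subsetting IF and "if ... then output;" as interchangeable: they select the same rows, but they can differ once other statements follow them. The sketch below uses a made-up dataset WORK.DEMO and a made-up multiplication step to show the difference.

```sas
/* Made-up sample data, already sorted by dept */
data work.demo;
   input dept $ amount;
   datalines;
A 1
A 2
B 3
;
run;

/* Subsetting IF: rows that fail the test leave the iteration here.
   Rows that pass continue, so the multiplication happens before the
   implied OUTPUT at the end of the step. */
data work.subset_if;
   set work.demo;
   by dept;
   if last.dept;
   amount = amount * 100;
run;
/* work.subset_if: A 200, B 300 */

/* IF-THEN OUTPUT: the row is written at the OUTPUT statement, and the
   explicit OUTPUT turns off the implied one, so the later change to
   amount never reaches the dataset. */
data work.then_output;
   set work.demo;
   by dept;
   if last.dept then output;
   amount = amount * 100;
run;
/* work.then_output: A 2, B 3 */
```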

 

Note that:

if last.dept;

...is really just evaluating whether this resolves to 0 / missing vs. some other numeric value.  So if you had a value like $15.75 in a variable called "amount", then:

if amount;

...would also selectively only output those lines where amount was not zero and not missing.  
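
A quick sketch of that behavior, using a made-up dataset WORK.AMOUNTS with one ordinary value, a zero, a missing value, and a negative value:

```sas
/* Made-up sample data */
data work.amounts;
   input id amount;
   datalines;
1 15.75
2 0
3 .
4 -2
;
run;

data work.nonzero;
   set work.amounts;
   if amount;   /* true for any nonmissing, nonzero number */
run;
/* work.nonzero keeps id 1 and id 4. The negative value counts as
   true because it is neither zero nor missing. */
```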
Also note that:

if sex="F";

...is also just evaluating, essentially, to 0 or 1 (SAS does not have typical FALSE / TRUE Boolean values).  This is why:

n_females + (sex="F");

...will create an accumulating variable called "n_females" which is just adding 0 or 1 to the value for every observation.  
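
For example, with a made-up three-row dataset WORK.PEOPLE (the names and values are invented for illustration):

```sas
/* Made-up sample data */
data work.people;
   input name $ sex $;
   datalines;
Ann F
Bob M
Cat F
;
run;

data work.counted;
   set work.people;
   n_females + (sex="F");   /* the comparison yields 1 or 0; the sum
                               statement starts at 0 and retains it */
run;
/* n_females is 1, 1, 2 on the three rows */
```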

You could even just write:

if 1;

...to take it to the ridiculous extreme.
