I am reviewing someone's code and came across a SET statement with an END= option. I've read the SAS documentation on this combination, but I am still not clear on why it is necessary. To my understanding, SAS waits until the last observation of the set has been read before performing the subsequent action, but I don't understand the benefit of this. In case it helps, the code is below:
data _null_;
  set e_gen_count end=eof;
  if eof then do;
    call symput("macro_counter",put(trim(left(macro_counter)),2.0));
  end;
run;
%put &macro_counter;
Hi,
A key point is that your understanding "SAS waits until the last observation of the set has been read before performing the subsequent action" is wrong.
The DATA step is an iterative loop. Consider the following step:
data _null_ ;
set sashelp.class end=eof ;
put (_n_ name eof)(=) ;
run ;
When the step executes, the SET statement reads the first record, then the PUT statement executes. Then the step loops back, the SET statement reads the second record, and the PUT statement executes again. Each time the SET statement executes, it reads only one record. It does not read all of the records at once.
The log shows the values read on each iteration of the loop, and the EOF variable created by the end= option:
5    data _null_;
6    set sashelp.class end=eof;
7    put (_n_ name eof)(=) ;
8    run ;
_N_=1 Name=Alfred eof=0
_N_=2 Name=Alice eof=0
_N_=3 Name=Barbara eof=0
_N_=4 Name=Carol eof=0
_N_=5 Name=Henry eof=0
_N_=6 Name=James eof=0
_N_=7 Name=Jane eof=0
_N_=8 Name=Janet eof=0
_N_=9 Name=Jeffrey eof=0
_N_=10 Name=John eof=0
_N_=11 Name=Joyce eof=0
_N_=12 Name=Judy eof=0
_N_=13 Name=Louise eof=0
_N_=14 Name=Mary eof=0
_N_=15 Name=Philip eof=0
_N_=16 Name=Robert eof=0
_N_=17 Name=Ronald eof=0
_N_=18 Name=Thomas eof=0
_N_=19 Name=William eof=1
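To make that one-record-per-execution behavior more visible, here is a minimal sketch that wraps the SET statement in an explicit DO UNTIL loop (a pattern often called a "DOW loop"; it is a swapped-in illustration, not part of the code under review). The variable n_reads is a hypothetical counter added for the demo. Because the DO loop calls SET repeatedly inside a single iteration of the DATA step, _N_ stays at 1 while n_reads climbs to 19, and EOF is what ends the loop.

```sas
/* Sketch: an explicit DO UNTIL loop around SET ("DOW loop").      */
/* n_reads is a hypothetical counter used only for illustration.   */
data _null_;
  do until (eof);
    set sashelp.class end=eof;   /* still reads ONE record per call */
    n_reads + 1;                 /* sum statement: retained, starts at 0 */
    put (_n_ n_reads name eof)(=);
  end;
  /* Reached once, right after the last record has been read.      */
  /* On the next implicit iteration SET finds nothing and stops.   */
  call symputx("n_reads", n_reads);
run;

%put &=n_reads;
```

In the log, each line should show _N_=1, which is the point: reading one record at a time is how the SET statement works, whether the loop around it is the DATA step's implicit loop or an explicit DO loop.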
Now suppose you wanted to calculate the total weight of all students and assign that value to a macro variable. You could do it like this:
data _null_;
set sashelp.class end=eof;
totalweight+weight ;
put (_n_ name totalweight eof)(=) ;
call symputx("totalweight",totalweight) ;
run ;
%put &totalweight ;
And that code works, but there's an inefficiency: that CALL SYMPUTX statement will execute 19 times. You only need it to execute once, after you have read the last record and calculated the totalweight for all 19 records. You can add that efficiency with an IF statement:
data _null_;
set sashelp.class end=eof;
totalweight+weight ;
put (_n_ name totalweight eof)(=) ;
if eof then do ;
call symputx("totalweight",totalweight) ;
end ;
run ;
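One caveat of the IF EOF pattern, shown in a minimal sketch below: if the input has no observations, the very first SET execution finds nothing to read and stops the DATA step, so the IF EOF block never runs and the macro variable is never assigned by the step. Here the OBS=0 data set option is an assumption used only to simulate an empty input, and the %LET default is one possible guard, not the only one.

```sas
/* Sketch: IF EOF with an empty input.                            */
/* OBS=0 simulates a data set with no rows (demo assumption).     */
%let totalweight = 0;   /* default, in case IF EOF never executes */

data _null_;
  set sashelp.class(obs=0) end=eof;
  /* Neither statement below runs: the first SET has nothing to   */
  /* read, so the DATA step stops at the SET statement.           */
  totalweight + weight;
  if eof then call symputx("totalweight", totalweight);
run;

%put &=totalweight;   /* still holds the %LET default */
```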
Hi @raivester
To me, it is quite similar to the FIRST. and LAST. automatic variables that SAS creates when you use a BY statement.
FIRST.variable equals 1 for the first observation of each BY group and 0 otherwise; LAST.variable equals 1 for the last observation of each BY group and 0 otherwise.
Similarly, the variable named in the END= option equals 1 only for the very last observation of the data set, and 0 otherwise.
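A minimal sketch of that comparison, assuming a sorted copy of sashelp.class (the data set names class_by_sex and last_per_sex are hypothetical). LAST.SEX turns to 1 at the end of each Sex group, while EOF turns to 1 only once, on the final row of the whole data set.

```sas
/* sashelp.class is ordered by Name, so sort a copy by Sex first; */
/* a BY statement needs input sorted (or indexed) by that variable. */
proc sort data=sashelp.class out=class_by_sex;
  by sex;
run;

data last_per_sex;
  set class_by_sex end=eof;
  by sex;
  /* FIRST.SEX / LAST.SEX mark each group's boundaries;           */
  /* EOF is 1 only on the last row of the entire data set.        */
  put sex= name= first.sex= last.sex= eof=;
  if last.sex then output;   /* keep one row per Sex value */
run;
```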
It is useful, for example, when you want to output only the very last record, or as the stopping condition of a DO loop.
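For the "output only the very last record" case, a minimal sketch (the output name last_student is hypothetical) uses a subsetting IF on the END= variable, so only the final observation reaches the implicit OUTPUT at the end of the step:

```sas
data last_student;
  set sashelp.class end=eof;
  if eof;   /* subsetting IF: rows where eof=0 are not written */
run;
```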
In your case, END=eof creates an indicator variable named eof that equals 1 on the last observation, so the CALL SYMPUT statement runs only once, and the macro variable is created from the value of macro_counter on that last record.
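As a sketch of the same step written with CALL SYMPUTX (which removes leading and trailing blanks itself and converts numbers to text for you, so the TRIM(LEFT(...)) and PUT(..., 2.0) wrapping is not needed): the thread does not show what e_gen_count contains, so the small data set built here is a hypothetical stand-in that assumes a numeric column named macro_counter.

```sas
/* Hypothetical stand-in for e_gen_count: assumes a numeric      */
/* column named macro_counter (real contents are not shown).     */
data e_gen_count;
  do macro_counter = 1 to 5;
    output;
  end;
run;

data _null_;
  set e_gen_count end=eof;
  /* Runs only on the last observation; SYMPUTX strips blanks,   */
  /* so no TRIM/LEFT/PUT is required.                            */
  if eof then call symputx("macro_counter", macro_counter);
run;

%put &=macro_counter;
```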
Best,
Sir @Quentin, very elegant and neat explanation. Just class! Thank you. I have just copied it to my notes. Hmm, sounds like you have a lot of free time today. lol
Thanks @novinosrin. I'm listening in the background to a not-so-engaging corporate webinar. : )