raivester
Quartz | Level 8

I am reviewing someone's code and came across a SET statement with an END= option. I've read the SAS documentation on this combination, but I am still not clear on why it is necessary. To my understanding, SAS waits until the last observation of the data set has been read before performing the subsequent action, but I don't understand the benefit of this. In case it is helpful, the code is below:

 

data _null_;
  set e_gen_count end=eof;
  if eof then do;
    call symput("macro_counter",put(trim(left(macro_counter)),2.0));
  end;
run;

%put &macro_counter;
Accepted Solution
Quentin
Super User

Hi,

 

A key point is that your understanding "SAS waits until the last observation of the set has been read before performing the subsequent action" is wrong.

 

The DATA step is an iterative loop. Consider the following step:

data _null_ ;
  set sashelp.class end=eof ;
  put (_n_ name eof)(=) ;
run ;

When the step executes, the SET statement reads the first record and the PUT statement executes; the step then loops, the SET statement reads the second record, and the PUT statement executes again.  Each time the SET statement executes, it reads only one record.  It does not read all of the records at once.

 

The log shows the values read on each iteration of the loop, and the EOF variable created by the end= option:

 

5    data _null_;
6      set sashelp.class end=eof;
7      put (_n_ name eof)(=) ;
8    run ;

_N_=1 Name=Alfred eof=0
_N_=2 Name=Alice eof=0
_N_=3 Name=Barbara eof=0
_N_=4 Name=Carol eof=0
_N_=5 Name=Henry eof=0
_N_=6 Name=James eof=0
_N_=7 Name=Jane eof=0
_N_=8 Name=Janet eof=0
_N_=9 Name=Jeffrey eof=0
_N_=10 Name=John eof=0
_N_=11 Name=Joyce eof=0
_N_=12 Name=Judy eof=0
_N_=13 Name=Louise eof=0
_N_=14 Name=Mary eof=0
_N_=15 Name=Philip eof=0
_N_=16 Name=Robert eof=0
_N_=17 Name=Ronald eof=0
_N_=18 Name=Thomas eof=0
_N_=19 Name=William eof=1

Now suppose you wanted to calculate the total weight of all students, and assign that value to a macro variable.  You could do it like:

 

data _null_;
  set sashelp.class end=eof;
  totalweight+weight ;
  put (_n_ name totalweight eof)(=) ;
  call symputx("totalweight",totalweight) ;
run ;

%put &totalweight ;

And that code works, but there's an inefficiency.  The CALL SYMPUTX statement will execute 19 times, once per record.  You only need it to execute once, after you have read the last record and calculated totalweight for all 19 records.  You can get that efficiency with an IF statement:

data _null_;
  set sashelp.class end=eof;
  totalweight+weight ;
  put (_n_ name totalweight eof)(=) ;

  if eof then do ;
    call symputx("totalweight",totalweight) ;
  end ;
run ;
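
Applied to the code in the original question, the same pattern might look like the sketch below. The contents of e_gen_count are not shown in the thread, so the small stand-in data set here is an assumption made only so the example can run on its own. CALL SYMPUTX converts a numeric value to text and strips leading and trailing blanks itself, so the PUT/TRIM/LEFT wrapping is not needed.

```sas
/* Hypothetical stand-in for e_gen_count; the real data set is not shown in the thread */
data e_gen_count;
  do macro_counter = 1 to 3;
    output;
  end;
run;

data _null_;
  set e_gen_count end=eof;
  /* eof is 0 on every iteration except the one that reads the last record */
  if eof then call symputx("macro_counter", macro_counter);
run;

%put &macro_counter;
```

With this stand-in data, the macro variable ends up holding the value of macro_counter from the last record only.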

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.


4 REPLIES
ed_sas_member
Meteorite | Level 14

Hi @raivester 

To me, it is quite like the notion of first. and last. internal variables when you have a BY statement.

first.variable will be equal to 1 for the first occurrence of each group and 0 otherwise.

The variable named in the END= option will be equal to 1 for the very last observation and 0 otherwise.
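
A small sketch of that comparison, using sashelp.class sorted by sex (the output data set name class_sorted is just an illustrative choice): first.sex and last.sex flag the start and end of each BY group, while the END= variable is 1 only on the final record of the whole data set.

```sas
/* Sort a copy so that BY-group processing is valid */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

data _null_;
  set class_sorted end=eof;
  by sex;
  /* first./last. flags are set per BY group; eof is set once, on the final record */
  put name= sex= first.sex= last.sex= eof=;
run;
```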

It can be useful for example if you want to output the very last record only or even in some do loop statements.
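
Both uses can be sketched roughly as follows. The names last_record and n_students are made up for illustration. The first step keeps only the final record; the second puts the SET statement inside an explicit DO UNTIL loop, so the whole data set is read within a single iteration of the DATA step.

```sas
/* Keep only the very last record */
data last_record;
  set sashelp.class end=eof;
  if eof;   /* subsetting IF: only the iteration where eof=1 reaches the implicit OUTPUT */
run;

/* Explicit loop: read every record, then act once after the loop */
data _null_;
  do until (eof);
    set sashelp.class end=eof;
    n_students + 1;   /* sum statement counts the records read */
  end;
  put n_students=;    /* executes once, after the last record has been read */
run;
```

In the second step there is no IF test on eof at all; the step ends on its own when the SET statement next tries to read past the last record.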

In your case, END=eof creates an indicator variable named eof that equals 1 only when the last observation is being processed, so the CALL SYMPUT statement that creates the macro variable executes only for that record.

Best,

novinosrin
Tourmaline | Level 20

Sir @Quentin  Very elegant and neat explanation. Just class! Thank you. I have just copied to my notes. Hmm sounds like you have a lot of free time today. lol

Quentin
Super User

Thanks @novinosrin .  I'm listening in the background to a not so engaging corporate webinar. : )
