novinosrin
Tourmaline | Level 20

Trivia END= SET statement option?

 

Why is the END= variable designed so that it is initialized to zero at compile time and then set to 1 only once, when SAS reaches the end-of-file marker? Of course, we all understand what happens, how, and when. However, I am a little intrigued to find out why the design doesn't force the value back to zero on every read of a record that is not the last one, i.e., until the end-of-file marker is reached.

 

Sure, it's trivial; however, I would like some opinions and thoughts. This chat came up at an alumni party, when someone said that NOT TRUE (0) should be forced by the design, rather than "it is what it is" 😃. Hmm!


data w;
 put z=;                        /* Z before SET has executed */
 set sashelp.class end=z;
 stop;
run;

data w;
 put 'before' +3 z=;
 set sashelp.class end=z;
 z+1;                           /* the user modifies the END= variable */
 put 'after' +3 z=;
run;

data w;
 put z=;
 set sashelp.class( where=( name='William')) end=z;   /* one matching row */
run;

data w;
 put z=;
 set sashelp.class( where=( name='William')) end=z;
 z+1;
 put z=;
run;

 

Tom
Super User

Probably because it is easier and executes faster. At compile time it sets the initial value (or generates the code that sets the initial value). Then it adds code to set the variable to 1, along with whatever other actions it was already going to take when it hits the end of the input.
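For anyone who actually wants the "forced" behavior from the original question (a flag that is recomputed on every read), a minimal sketch is to leave the END= variable alone and copy it into an ordinary variable. CLASS_FLAGS and IS_LAST are hypothetical names. Because IS_LAST is created by an assignment statement, it is not retained: SAS sets it to missing at the top of each iteration, and the assignment then gives it 0 or 1.

```sas
/* Sketch only: CLASS_FLAGS and IS_LAST are hypothetical names. */
data class_flags;
   set sashelp.class end=eof;   /* EOF: automatic, retained, not written out */
   is_last = eof;               /* ordinary variable: reset to missing at the */
                                /* top of each iteration, then recomputed     */
run;
```

The point of the design is simply never to modify EOF itself, so whether SAS resets it or not stops mattering.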

mkeintz
PROC Star

@novinosrin

 

First things first: I have not thought about this before, so below is an off-the-cuff response:

 

You've shown that Z (created by the "end=z" option) is retained. As an automatic variable, it is not output. The user can modify it, and SAS will leave it unchanged until SET actually reads the last record. I did a test on another such variable (using the option NOBS=NRECS). After I assigned a new value to NRECS, SAS kept the new value throughout the DATA step. Unlike Z, it was never changed by SAS at execution time.
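A test along those lines might look like the following sketch. It is not the exact code used in the test described above, and NOBS_TEST, NRECS and NRECS_AT_END are illustrative names. SASHELP.CLASS has 19 observations, so if NOBS= behaves as described, NRECS_AT_END should be -1 rather than 19.

```sas
/* Sketch only: NOBS_TEST, NRECS and NRECS_AT_END are illustrative names. */
data nobs_test;
   set sashelp.class nobs=nrecs end=z;
   if _n_ = 1 then nrecs = -1;   /* overwrite the NOBS= variable once   */
   if z then do;                 /* last record: capture NRECS, because */
      nrecs_at_end = nrecs;      /* NOBS= variables are not written out */
      output;
   end;
   keep name nrecs_at_end;
run;
```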

 

I guess the principle is: after the compiler assigns a value to an automatic variable, the value is retained and is only changed by SAS when a relevant event is encountered (which would be "never" for the NOBS= variable). Unless, of course, the user explicitly changes it.

 

There's a certain parsimony to this approach. Imagine reading a data set of a couple of billion records (I've had many such): why add the overhead of resetting the (retained) variable to zero on every read, instead of merely waiting for end of data to set it to one? Notable resources could be saved, especially given that the "birthdate" of the END= option was probably in the 1960s, when conserving that sort of resource was more valuable.

 

Of course, some automatic variables HAVE to be reset during SAS execution, most notably _N_. The user can change this variable all they want, but every time the DATA step iterates, _N_ is reset to the iteration counter.
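To check the _N_ claim, a minimal sketch (N_TEST and N_BEFORE are hypothetical names) overwrites _N_ on every iteration and records the value SAS supplied beforehand. If _N_ is reset from an internal iteration counter as described above, N_BEFORE should read 1, 2, 3 rather than continuing from 100.

```sas
/* Sketch only: N_TEST and N_BEFORE are hypothetical names. */
data n_test;
   set sashelp.class(obs=3);
   n_before = _n_;   /* the value SAS supplied for this iteration  */
   _n_ = 100;        /* the user overwrites the automatic variable */
   keep name n_before;
run;
```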

 

I suppose you could classify automatic temporary variables into 3 groups:

  1. automatically reset with each iteration:  _N_, curobs=, ...
  2. automatically reset once per event:  end=, indsname=, point=, ...
      Of course, point= is intended to be modified by the user.
  3. never reset (nobs=).
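Group 2 can be observed with INDSNAME=. In this sketch (SOURCE_TEST, SRC and SOURCE are hypothetical names), SRC should change only when SET moves from SASHELP.CLASS to SASHELP.SHOES. Because INDSNAME= variables are not written to the output data set, the value is copied into an ordinary variable.

```sas
/* Sketch only: SOURCE_TEST, SRC and SOURCE are hypothetical names. */
data source_test;
   set sashelp.class(obs=2 keep=name)
       sashelp.shoes(obs=2 keep=region) indsname=src;
   length source $ 41;
   source = src;   /* INDSNAME= variables are not written out, so copy */
run;
```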

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

> There's a certain parsimony to this approach. 

Exactly: Only toggle the value when it changes. Why waste CPU doing otherwise?

That's the case for all (most?) automatic variables, I think.

If the value is tampered with, then so be it. It's not a better or worse choice than doing otherwise, such as locking automatic variables. It's just a design choice, and it's a frugal one.

 

RichardDeVen
Barite | Level 11

As for the why: perhaps to save clock cycles and meet the needs of expected use cases. Further research into late-1960s coding manuals for IBM mainframe programming in PL/1, Fortran, and Assembly would likely stumble upon an end= mechanism that was propagated into the SAS DATA step grammar.

 

Maybe Jim Goodnight has some 50-year-old notebooks from the project that resulted in SAS 71.

 

From the perspective of event- or queue-based programming, the I/O end-of-file event would be the trigger for setting the automatic PDV variable.

Astounding
PROC Star

Would somebody please run a test for me to verify how SAS executes this?

 

data have;
finished=5;
do i=1 to 5;
   output;
end;
run;
data want;
   set have end=finished;
   put finished=;
run;
FreelanceReinh
Jade | Level 19
8    data want;
9      put 'before:' +1 finished=;
10     set have end=finished;
11     put 'after:' +2 finished= /;
12   run;

WARNING: The variable finished exists on an input data set and is also set by
         an I/O statement option.  The variable will not be included on any
         output data set and unexpected results can occur.
before: finished=0
after:  finished=5

before: finished=5
after:  finished=5

before: finished=5
after:  finished=5

before: finished=5
after:  finished=5

before: finished=5
after:  finished=1

before: finished=1
NOTE: There were 5 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 5 observations and 1 variables.

Hi  @Astounding,

 

I've slightly extended the data step. Whether the above results are really "unexpected," as the warning says, depends on one's expectations. I think they are fairly plausible: the automatic variable behaves as usual, but the SET statement overwrites it with the value of FINISHED from HAVE on every execution. On the last observation, the END= option has the final say.

 

The initialization of finished to 0 by virtue of END= would even override a RETAIN statement with a nonzero initial value.
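A minimal sketch of that RETAIN case (RETAIN_TEST and Z_BEFORE are hypothetical names). Going by the observation above, Z_BEFORE should be 0, not 5; this expectation rests on the reported test rather than on documented behavior.

```sas
/* Sketch only: RETAIN_TEST and Z_BEFORE are hypothetical names. */
data retain_test;
   retain z 5;        /* ask for an initial value of 5 ...         */
   z_before = z;      /* ... and record Z before the first SET runs */
   set sashelp.class(obs=1 keep=name) end=z;
   output;
   stop;
run;
```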

Astounding
PROC Star

Thanks for conducting the test.  Very thorough, and very much appreciated.

 

Yes, you and I would not need this warning.  But pity the average programmer who codes a simple application:

data want;
   set have end=done;
   total + amount;
   if done;
run;

There is no way that said average programmer would expect multiple observations in the output data set (which is what happens if HAVE also contains a variable named DONE with nonzero values). Some less-than-average programmers might not even notice. So I'm happy to see the warning being generated.
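One defensive version of that application, sketched with a made-up HAVE that contains a stray DONE variable: drop the conflicting variable on input (a RENAME= data set option would also work), so that DONE is only ever set by END=. With this data, WANT should contain a single observation with TOTAL=60.

```sas
/* Hypothetical input: AMOUNT plus a stray variable named DONE */
data have;
   input amount done;
   datalines;
10 5
20 5
30 5
;

data want;
   set have(drop=done) end=done;   /* DONE is now only the END= flag */
   total + amount;                 /* running total, retained        */
   if done;                        /* keep only the last observation */
run;
```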

novinosrin
Tourmaline | Level 20

Thank you @Astounding, @FreelanceReinh and the other great pundits in the thread. FYI, for what it's worth, I am happy to share that the problem actually stems from a real event at my friend's office in Chicago, where his team "inadvertently" had production code developed and signed off with this mess, so some conditional execution caused unexpected results, leading to a big escalation.

PS: The company is listed in the "Fortune 20", with a trillion dollars in assets 😊

mkeintz
PROC Star

Translation: The production code developers did not pay sufficient attention to the SAS log, making for an interesting interpretation of the term "inadvertently".

Quentin
Super User

Happy New Year @novinosrin and all.  Thanks for an interesting thread to start off the year!

 

I noticed that the end= variable is not always initialized to false during compile time.  It looks like during compile time, SAS actually checks to see if there is a next (first) record to read.  If there is not, it will be initialized to 1.  This seems handy.

 

1    data w;
2     put z=;
3     stop;
4     set sashelp.class(where=(name='Foo')) end=z;
5    run;

z=1
NOTE: There were 0 observations read from the data set SASHELP.CLASS.
      WHERE name='Foo';
NOTE: The data set WORK.W has 0 observations and 5 variables.
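One way this could be put to use, sketched under the assumption that Quentin's observation holds in general (INPUT_EMPTY is a hypothetical macro variable): report an empty input before SET stops the step. The _N_ = 1 condition matters, because after the last record of a non-empty input, EOF stays 1 into the next iteration, where the check would otherwise fire as well.

```sas
%let input_empty = NO;   /* hypothetical flag, used only for this demo */

data _null_;
   /* Check only on the first iteration: after the last record of a */
   /* non-empty input, EOF is still 1 at the top of the next pass.   */
   if _n_ = 1 and eof then do;
      putlog 'NOTE: No observations satisfy the WHERE clause.';
      call symputx('input_empty', 'YES');
   end;
   set sashelp.class(where=(name='Foo')) end=eof;
run;
```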

 

Along these lines, it's interesting to note that FIRST. and LAST. variables are (I think) always initialized to 1:

 

1    data _null_ ;
2      put _all_ ;
3      stop ;
4      set sashelp.shoes(keep=region) ;
5      by region ;
6    run ;

Region=  FIRST.Region=1 LAST.Region=1 _ERROR_=0 _N_=1
NOTE: There were 1 observations read from the data set SASHELP.SHOES.

I couldn't find a way to trick them into being initialized to 0.

 

Which explains why DoW looping through BY groups with a DO WHILE loop wouldn't work well: NOT LAST.REGION is already false before the first SET executes, so the loop body never runs:

1    data _null_ ;
2      do while (NOT last.region) ;
3        set sashelp.shoes(keep=region) ;
4        by region ;
5        put "I don't execute" ;
6      end ;
7    run ;

NOTE: DATA STEP stopped due to looping.
NOTE: There were 1 observations read from the data set SASHELP.SHOES.
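The usual way around this is DO UNTIL, which evaluates its condition at the bottom of the loop, so SET and BY have already run before LAST.REGION is tested. Below is a sketch that counts rows per region; SHOES, REGION_COUNTS and N are hypothetical names, and the PROC SORT is included so the BY statement does not depend on how SASHELP.SHOES happens to be sorted.

```sas
proc sort data=sashelp.shoes(keep=region) out=shoes;
   by region;
run;

data region_counts;
   n = 0;                     /* reset the per-group counter           */
   do until (last.region);    /* tested at the bottom: SET runs first  */
      set shoes;
      by region;
      n + 1;
   end;                       /* implicit OUTPUT: one row per region   */
run;
```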

Kind Regards,

--Q.

BASUG is hosting free webinars! Check out our recordings of past webinars: https://www.basug.org/videos. Be sure to subscribe to our email list for notification of future BASUG events.
Kurt_Bremser
Super User

Did they really ignore a WARNING?

novinosrin
Tourmaline | Level 20

lol, "ignore" translated to "inadvertent". It seems they love the TRANWRD/TRANSLATE functions 🙂, as @mkeintz spotted.
