Datastep process causing confusion, plz help?

Frequent Contributor
Posts: 75

Hello everybody,

The output of the code below is confusing. I expected only the first record to be output, so why does it write 2 records?

/*incorrect code*/

data want;
put _n_= ;
if _n_=1 then set sashelp.class;

if _n_=1 then set sashelp.class(rename=(name=name1 age=age1 height=height1 weight=weight1 sex=gender));


run;

However, the corrected code works fine, and I want to know why. /*please help*/

/*corrected code*/

data want;
put _n_= ;
if _n_=1 then set sashelp.class;

if _n_=1 then set sashelp.class(rename=(name=name1 age=age1 height=height1 weight=weight1 sex=gender));
if _n_=2 then stop;


run;

Why does the first code execute when _n_=2? And if it does execute, why doesn't it keep going, with the retained values, for all 19 records in sashelp.class until the step boundary is reached?

Super User
Posts: 7,859

Re: Datastep process causing confusion, plz help?

Since no set statement is executed when _n_ ne 1, there is no EOF condition to stop the iteration of the data step, so SAS stops it as a precaution (see the "NOTE: DATA STEP stopped due to looping." in the log).

The contents of the PDV are still written out.

Frequent Contributor
Posts: 75

Re: Datastep process causing confusion, plz help?

Posted in reply to KurtBremser

Fair enough, but if you replace the stop with a subsetting if, this works fine too. So in this case does it not require an EOF condition? I presume so.

Is set statement retention what is causing that?

data want;
put _n_= ;
if _n_=1 then set sashelp.class;

if _n_=1 then set sashelp.class(rename=(name=name1 age=age1 height=height1 weight=weight1 sex=gender));
if _n_=1;

run;

again the same note:

NOTE: DATA STEP stopped due to looping.

Where is the loop here?

Super User
Posts: 7,859

Re: Datastep process causing confusion, plz help?

The subsetting if is NOT part of the data step looping logic; it is just a filter applied in every single iteration of the data step.

Therefore SAS stops the loop by itself (it would otherwise run indefinitely, only without writing additional records because of the subsetting if) and writes the NOTE.

SAS sees the data step like this:

- I have a set statement (or several set statements), so I will end data step iteration when EOF occurs (without a set, the data step would have only 1 iteration by default)

- after iteration 1, no set statement executes any more, so the file pointer does not move (and never will) and EOF will never occur -> stop iteration and write the NOTE

Without the stop, your data step would run indefinitely if SAS didn't stop it on its own.
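To see the "1 iteration by default" case from the list above in isolation, here is a minimal sketch of my own (not from the earlier posts) with no set, merge or input statement at all:

```sas
/* No SET, MERGE, or INPUT statement: the implicit loop runs exactly once. */
data _null_;
  put _n_=;   /* writes the iteration counter to the log */
run;
```

The log should show a single line for _N_=1 and no looping NOTE, because SAS never expected an EOF in the first place.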

PROC Star
Posts: 1,324

Re: Datastep process causing confusion, plz help?

Excellent question: where is the loop here?

Answer: The data step ITSELF is an implicit loop, with an implicit OUTPUT statement at the bottom of it.  (The implicit OUTPUT at the bottom of the loop is the answer to your main question).

Paul Dorfman has an excellent paper describing the implicit data step loop, and benefits of explicit looping:

http://support.sas.com/resources/papers/proceedings13/126-2013.pdf

One way a data step stops is by having a SET statement hit end of file, i.e. try to read past the last record.  Another way a data step can stop is if it iterates through a full loop (i.e. from the top of the data step to the bottom) and doesn't execute a SET statement (or other file reading statement), even though a SET statement was executed on a previous loop.  In that case, the DATA step stops 'due to looping' (e.g., "something odd happened, I was reading records fine, and then I went all the way through an iteration of the data step code and couldn't find anything to read.  I'll stop, to avoid the risk that I'm in an infinite loop.").  I generally think this situation should be avoided, and that it is better to code an explicit STOP statement when needed.
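As a rough sketch of the explicit STOP approach (my own variation on the original poster's step, so treat it as one possible way to write it rather than the only one), the _N_ tests can be dropped entirely by writing the record and stopping in the first iteration:

```sas
data want;
  set sashelp.class;
  set sashelp.class(rename=(name=name1 age=age1 height=height1
                            weight=weight1 sex=gender));
  output;   /* write the first record explicitly */
  stop;     /* end the step here, so there is no second iteration */
run;
```

The explicit OUTPUT is needed because STOP ends the step before the implied OUTPUT at the bottom would be reached. With this form the step never goes through an iteration without reading, so the looping NOTE should not appear.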

When you add the subsetting IF _N_=1; just before the RUN statement, only 1 record is output, because on the second iteration of the data step the condition is false, SAS returns to the top of the step, and the implied OUTPUT statement at the bottom of the loop is never reached.
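One way to make that visible is to write the subsetting IF in its longhand form. The step below is my own simplified sketch (I dropped the second SET to keep it short); the DELETE statement has the same effect as the subsetting IF _N_=1; here:

```sas
data want;
  put _n_=;
  if _n_=1 then set sashelp.class;
  if _n_ ne 1 then delete;   /* skip the implied OUTPUT and return to the top */
run;
```

On iteration 2 the DELETE fires, nothing is written, and because no SET executed in that iteration SAS still stops the step with the looping NOTE.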

Suggest thinking through something like:

data want ;

  if _n_ ne 3 then set sashelp.class ;

run ;
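If it helps while thinking that through, here is the same step with a PUT statement added (my addition, purely for tracing) so the log shows what is sitting in the program data vector on each iteration:

```sas
data want;
  if _n_ ne 3 then set sashelp.class;
  put _n_= name=;   /* trace the iteration number and the current NAME */
run;
```

I would expect three lines in the log, with the third repeating the NAME from the second (nothing was read on iteration 3, and variables read by SET are retained), followed by the looping NOTE.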

Understanding the implied loop is a key to understanding the data step language, along with understanding the program data vector. It will help tremendously with understanding what the RETAIN statement is for, what it means to have variables implicitly retained, how the MERGE statement works, and more.

HTH,

--Q.

Frequent Contributor
Posts: 75

Re: Datastep process causing confusion, plz help?

That is so clear and makes complete sense. Cheers and Thanks!
