Tom
Super User

Q3    Is my idea right that everything hangs on some internal _N_ counter? Using SET multiple times doesn't go to the next row, so it has to be some inherent property of what is calling it that indicates what row SET will access.

Nothing depends on the _N_ counter.  SAS just increments that variable every time the data step iterates. You can use it if you want, or just ignore it, which is the normal thing to do.
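One way to convince yourself of this is to overwrite _N_ and watch what the SET statement does. The step below is only a sketch; it uses sashelp.class as a convenient sample data set, and the value 999 is arbitrary.

```sas
data _null_;
  set sashelp.class;     /* reads the next record using its own pointer */
  put _N_= Name=;
  _N_ = 999;             /* overwrite the counter; SET never consults it */
run;
```

Whatever values _N_ shows in the log, the names should still come out in data set order, one per iteration.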

 

Astounding
PROC Star

OK, for now let's stick to the first question.  

 

This part of your understanding is not quite right:

 

so my understanding is the data step starts its iteration _N_= 1, gets to the do loop and 

 

The DATA step does start with _N_=1, but it does not go straight to the DO loop.  It first reaches the first SET statement, which reads the first observation, and only then reaches the DO loop.  Inside the DO loop, the second SET statement reads that first observation over again (when VAR=1), then reads the second observation (when VAR=2).  Because of the OUTPUT statement, both observations get output.  Notice that the SET statements operate independently of one another, each beginning with the first observation.

 

At the end of the programming statements, SAS returns to the top of the DATA step and executes those statements again.  It reaches the first SET statement, which now reads the second observation.  Then it enters the DO loop, where the second SET statement reads the third observation (which is output) and, on the next pass through the loop, looks for a fourth observation.  Since there is no fourth observation, the DATA step is over.
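If you want to reproduce that walkthrough, something like the following should do it. This is a sketch under assumptions: the original TEST data set is not shown in the thread, so the three rows here (names and rates) are invented, and the DATA step is reconstructed from the description above.

```sas
/* Hypothetical three-row input; the names and rates are invented. */
data test;
  length name $13;
  input name $ rate;
  datalines;
Alpha 10
Bravo 20
Charlie 30
;
run;

/* Two SET statements reading the same data set. */
data test2;
  set test;            /* first pointer: obs 1, then obs 2 */
  do var = 1 to 2;
    set test;          /* second pointer: obs 1, 2, 3, then end of file */
    output;
  end;
run;

proc print data=test2;
run;
```

With three input rows, TEST2 ends up with three observations, all of them written from inside the DO loop, and the step ends during the second iteration when the second SET statement finds no fourth observation.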

 

Start there, see if this starts to make sense.

Tom
Super User

Q4    say I wanted to use set and a do loop from the 50th to 60th observation.  Is there a nice way to do that instead of some if statement while iterating through every observation?

It depends on what you mean by the question.  If you want your SET statement to read only the 50th through 60th observations, use the FIRSTOBS= and OBS= data set options.

set one (firstobs=50 obs=60);

The SET statement works as it always does, except that the first time it executes it skips the first 49 observations, and it treats observation 60 as the last one.
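For a version you can run right away, here is the same idea against sashelp.class, which has only 19 observations, so the range is scaled down to 5 through 10. The output data set name SUBSET is just a placeholder.

```sas
data subset;
  /* OBS= is the number of the last observation to read, not a count */
  set sashelp.class (firstobs=5 obs=10);
run;

proc print data=subset;
run;
```

Because OBS= names the last observation rather than a count, this reads observations 5 through 10, which is six rows. In the same way, firstobs=50 obs=60 reads eleven.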

 

If you really wanted to do that in a DO loop then use the POINT= option of the SET statement instead.  

do obs_num=50 to 60;
  set one point=obs_num ;
  ...
end;

But take care that you have a way for your data step to end.  The normal way a data step ends is when SAS reads past the end of the input data (caused by execution of a SET/MERGE/UPDATE statement or an INPUT statement).  If you use the POINT= option then the SET statement will never read past the end.  It might error out if you point too far, but in this example, if the dataset has at least 60 observations, nothing ever stops the step and you are at risk of writing an infinite loop.  A STOP statement after the DO loop prevents that.
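A minimal sketch of a POINT= loop that does end, again scaled down to sashelp.class (19 observations) so it can be run as is. The names SUBSET and obs_num are placeholders.

```sas
data subset;
  do obs_num = 5 to 10;
    set sashelp.class point=obs_num;   /* direct access by observation number */
    output;
  end;
  stop;   /* POINT= never hits end of file, so end the step explicitly */
run;
```

The POINT= variable is not written to the output data set, so SUBSET contains only the columns of sashelp.class.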

Quentin
Super User

Hi,

 

These are really EXCELLENT questions, especially for a beginner.  I wish I had asked these questions earlier in my SAS programming.  It took me a few years to really understand the DATA step.

 

Importantly, your general understanding is wrong:

My general understanding of `set lib.table` is that it reads the _N_'th observation from the  given table where _N_ is the PDV's internal counter.

 

The DATA step is an implied loop.  _N_ is simply a counter of the number of times the loop has executed.  There is no causal relationship / dependency between _N_ and the SET statement.  When the SET statement executes, it reads the next record from a data set.  What does "next record" mean? The SET statement has its own pointer which tracks which record to read from a data set.  There is often a correlation between _N_ and the SET statement because many steps happen to execute the SET statement once for each iteration of the DATA step.  But that is just a correlation.

 

The step below iterates 20 times.  The log shows that on each of the first 19 iterations of the DATA step, one record is read (and output):

 

data want ;
  put "Top of loop " _N_= ;
  set sashelp.class ;
  put "Bottom of loop " _N_= Name= ;
run ;

 

On the 20th iteration of the loop, the SET statement executes.  Because there is no next record to read, it hits the end of file marker and that causes the DATA step to stop executing.  Note that the "Bottom of loop" PUT statement does not execute on the 20th iteration, because the step stopped executing when the SET statement executed.

 

 

The step below iterates only four times, but it also reads and outputs all 19 records from sashelp.class.

data want ;
  put "Top of loop " _N_= ;
  do i=1 to 5 ;
    set sashelp.class ;
    output ;
    put "Inside DO loop " _N_= i= Name= ;
  end ;
  put "Bottom of loop " _N_= Name= /;
run ;

 

On the first iteration of the DATA step (_N_=1), the explicit DO loop iterates 5 times, so the SET statement reads the first 5 records.  On the second iteration (_N_=2), the DO loop iterates 5 times again, so the SET statement reads records 6-10.  The third iteration (_N_=3) reads records 11-15 the same way.  On the fourth iteration (_N_=4), the DO loop starts as before: for i=1 to i=4, the SET statement reads records 16-19.  When i=5, the SET statement tries to read the next record, hits the end of file marker, and the DATA step stops executing immediately.

 

In answer to other questions:

1. The SET statement does NOT change its meaning inside of a DO loop.  That would be chaos if it did.  If this is not clear, please post an example where you think this is happening.

2. Each SET statement that is reading from a data set uses its own internal pointer to keep track of which record to read.  If there are two SET statements in a step, each still has its own pointer, and the two SET statements are independent of each other.  
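A quick way to see the independent pointers is to read the same data set twice in one step under different variable names. This is only a sketch; first_name and second_name are made-up names introduced through the RENAME= option.

```sas
data _null_;
  /* each SET statement keeps its own position in sashelp.class */
  set sashelp.class (keep=Name rename=(Name=first_name));
  set sashelp.class (keep=Name rename=(Name=second_name));
  put _N_= first_name= second_name=;
run;
```

On every iteration both variables hold the same name, because each SET statement started at record 1 and each advances one record per execution. If the two statements shared a pointer, the second would always be one record ahead of the first.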

 

To work through your second example (sorry, I'm out of time to write more), I would recommend you add some PUT statements, something like below, and remember that each SET statement has its own pointer, the two SET statements are independent of each other, and there is only one PDV for the step.

data test2;
  length name $13 ;
  put "Top of DATA step " (_N_ Var Name Rate)(=) ;
  set test;
  put "Before DO loop " (_N_ Var Name Rate)(=) ;

  do var=1 to 2;
    put "Top of DO loop "  (_N_ Var Name Rate)(=) ; 
    set test;
    output;
    put "Bottom of DO loop "  (_N_ Var Name Rate)(=) ; 
  end;
  put "Bottom of DATA step " (_N_ Var Name Rate)(=)  /;
run;

 

If you can't figure out what is happening, respond with more questions. I or others will happily explain more.  These are good questions, and will raise a lot of issues critical to understanding DATA step programming.  

