☑ This topic is **solved**.
Posted 03-19-2023 05:18 AM
Hi, just reading about **end=** option in the set statement of data step in this thread.

I don't think the OP of that thread ever made another about the existence of **start=** option.

When I type **start=** or **begin=** in the set statement, no pop-up with links to the documentation appears, so I assume a **start=** option does not actually exist in the set statement of the data step?

What would a START= option do? Would it signal the beginning of a data step (similar to END= signalling the end of a data step)? If that's what you want, you can use

`if _n_=1 then do;`
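As a minimal sketch of how `_N_ = 1` and `END=` can bracket a data step, the step below writes a message on the first iteration and a summary when the last observation has been read. It assumes the sample data set SASHELP.CLASS is available; the names `n_read` and `total_age` are illustrative only.

```sas
/* Sketch only: SASHELP.CLASS is assumed to exist;      */
/* n_read and total_age are hypothetical variable names */
data _null_;
  set sashelp.class end=eof;

  /* "start" logic: runs once, on the first iteration */
  if _n_ = 1 then put "Start of data";

  /* sum statements create retained accumulators */
  n_read + 1;
  total_age + age;

  /* "end" logic: runs when the last observation has been read */
  if eof then put "End of data: " n_read= total_age=;
run;
```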

Paige Miller

```
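/* Five observations: obs = 1 to 5 */
data a;
  do obs = 1 to 5;
    output;
  end;
run;

/* Flag the first and last observations of the full data set */
data b;
  set a end=eof;
  if _N_ = 1 then firstobs = 1;
  if eof then lastobs = 1;
run;

/* Same flags, but only observations 2 to 4 are read */
data c;
  set a (firstobs=2 obs=4) end=eof;
  if _N_ = 1 then firstobs = 1;
  if eof then lastobs = 1;
run;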
```

Hi @Nietzsche

Sorry, my text seems to have disappeared.

There is no **start=** option. The **end=xxx** option sets the value of variable **xxx** to 1 (true) when the current observation is the last observation read into the program vector. The automatic variable **_N_** holds the number of the current observation read into the program vector, so **if _N_ = 1** is true in the first observation read. Neither the variable created by the **end=** option nor the automatic variable **_N_** is written to the output data set.

Note that the data set options **firstobs=** and **obs=** control which observations are read into the program vector and are applied first, so **end=** and **_N_** work on the resulting subset. Try the code in the previous post and see what happens.

@ErikLund_Jensen, if I may, let me continue your thread and add something more.

The firstobs= and obs= options are applied before end= and _N_, but we have to be careful when combining them with the WHERE statement:

```
data have;
do x = 1 to 3;
output;
end;
run;
data want;
set have(firstobs=2);
where x > 1;
run;
```

In this case the WHERE statement first removes "1" from the input data set, and then firstobs= removes "2" from what is left after filtering, so only "3" is kept.
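For contrast, here is a small sketch of the same filter written as a subsetting IF. A subsetting IF is evaluated only after an observation has been read, so firstobs= counts physical observations in that case. The data set names `want_where` and `want_if` are made up for this illustration.

```sas
data have;
  do x = 1 to 3;
    output;
  end;
run;

/* WHERE is applied first; FIRSTOBS= then counts within the filtered rows */
data want_where;
  set have(firstobs=2);
  where x > 1;
run;   /* keeps x=3 */

/* The subsetting IF runs after the read; FIRSTOBS= counts physical rows */
data want_if;
  set have(firstobs=2);
  if x > 1;
run;   /* keeps x=2 and x=3 */
```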

And a note about "start=": one option is to test "_N_=1", but when, for example, we are reading several data sets with a single SET statement, we can use the CUROBS= option to find out which observation we are reading into the PDV, e.g.

```
data A B C;
do x = 1 to 3;
output;
end;
run;
data ABC;
set A B C curobs=curobs;
if curobs=1 then output;
run;
```

So "curobs=1" tests whether we are reading the first observation from a given data set (provided, of course, that the first observation of that data set is actually read, which does not always have to be the case).

Bart

**Polish SAS Users Group**: www.polsug.com and communities.sas.com/polsug

"**SAS Packages: the way to share**" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.

Hands-on-Workshop: "**Share your code with SAS Packages**"

"**My First SAS Package: A How-To**" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three

SAS Documentation

Let me correct your language. _N_ does not count observations. It counts iterations of the data step. The confusion arises because in the normal simple data step:

```
data new;
set old;
run;
```

they amount to the same thing.

But once you get more complicated, say by using a DOW loop, they diverge. For example, in this data step the value of _N_ can be seen as a count of the number of ID values seen.

```
data want;
  do until(last.id);
    set old;
    by id;
    total = sum(total, amount);  /* accumulate AMOUNT within the current ID */
  end;
  keep id total;
run;
```

But even in the simple data step you can see that the value of _N_ is different from "the number of observations read in". The most obvious case is that it increments beyond the number of observations in the source dataset, since such a data step ends at the SET statement and not at the RUN statement.

```
2327  data want;
2328    put _n_= eof= ;
2329    set sashelp.class(obs=2) end=eof;
2330  run;

_N_=1 eof=0
_N_=2 eof=0
_N_=3 eof=1
NOTE: There were 2 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.WANT has 2 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
```
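The opposite case can be sketched as well: when an explicit loop reads every observation within a single pass of the data step, _N_ is still 1 after the whole data set has been read. This is only an illustration, assuming SASHELP.CLASS is available; `n_obs` is a hypothetical counter name.

```sas
/* Sketch only: SASHELP.CLASS is assumed to exist; n_obs is illustrative */
data _null_;
  /* the explicit loop reads all observations in one data step iteration */
  do until (eof);
    set sashelp.class end=eof;
    n_obs + 1;
  end;

  /* executed once: _N_ is still 1, n_obs holds the number of rows read */
  put _n_= n_obs=;
run;
```

On the second pass the SET statement attempts to read past the end of the data set, so the step stops there and the PUT statement is not executed again.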

I would go even one step further.

"*_N_ does not count observations. It counts iterations of the data step.*" - In fact, _N_ does not itself count iterations; it is a placeholder for the value of an internal iteration counter. Indeed, you can modify it, and at the beginning of the next iteration it is automatically updated with the current iteration number:

```
data have;
do x = "A", "B", "C";
output;
end;
run;
data _null_;
put "1)" _all_;
set have;
put "2)" _all_;
do _N_ = 1 to 5;
put _N_= @;
end;
put;
put "3)" _all_;
put;
run;
```

Log:

```
1 data have;
2 do x = "A", "B", "C";
3 output;
4 end;
5 run;
NOTE: The data set WORK.HAVE has 3 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
6
7 data _null_;
8 put "1)" _all_;
9 set have;
10 put "2)" _all_;
11
12 do _N_ = 1 to 5;
13 put _N_= @;
14 end;
15 put;
16
17 put "3)" _all_;
18 put;
19 run;
1)x= _ERROR_=0 _N_=1
2)x=A _ERROR_=0 _N_=1
_N_=1 _N_=2 _N_=3 _N_=4 _N_=5
3)x=A _ERROR_=0 _N_=6
1)x=A _ERROR_=0 _N_=2
2)x=B _ERROR_=0 _N_=2
_N_=1 _N_=2 _N_=3 _N_=4 _N_=5
3)x=B _ERROR_=0 _N_=6
1)x=B _ERROR_=0 _N_=3
2)x=C _ERROR_=0 _N_=3
_N_=1 _N_=2 _N_=3 _N_=4 _N_=5
3)x=C _ERROR_=0 _N_=6
1)x=C _ERROR_=0 _N_=4
NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
```

A very good read about looping is the article "The Magnificent DO" by Paul Dorfman ( @hashman ); the link is here: https://support.sas.com/resources/papers/proceedings13/126-2013.pdf

Bart

Hi Bartek,

You're exactly right. Methinks the necessary and sufficient definition could be this:

*Regardless of its current value, _N_ is assigned the next consecutive natural number every time program control is passed to the top of the DATA step (by the action of the implied loop).*

Thus, since at first program control is at the top of the implied loop, 1 is moved to _N_. The next time program control is passed to the top of the implied loop, 2 is moved to _N_, and so forth. Hence, as you have indicated, the program can assign any numeric value to _N_ between two consecutive returns of program control to the top of the DATA step, yet it has no effect on the new value moved to _N_ at the top of the DATA step from the independent internal counter.

Perhaps one could say that an internal equivalent of the statement:

`_N_ = monotonic() ;`

is executed at the top of the implied loop.

Thanks for the plug 😉.

Regards,

Paul D.

