Does the start= option inside of a data step actually exist?

Posted 03-19-2023 05:18 AM

Hi, just reading about **end=** option in the set statement of data step in this thread.

I don't think the OP of that thread ever made another about the existence of **start=** option.

When I type **start=** or **begin=** in the set statement, there is no pop-up with links to the doc, so I assume the **start=** option does not actually exist in the set statement of the data step?

SAS Base Programming (2022 Dec), Preparing for SAS Advanced Programming (Cancelled).

1 ACCEPTED SOLUTION

Hi @Nietzsche

Sorry, my text seems to have disappeared.

There is no **start=** option. The **end=xxx** option sets the value of variable **xxx** to 1 (true) when the current observation is the last observation read into the program vector. The automatic variable **_N_** holds the number of the current observation read into the program vector, so **if _N_ = 1** is true in the first observation read. Neither the variable created by the **end=** option nor the automatic variable **_N_** is written to the output data set.

Note that the data set options **firstobs=** and **obs=** control which observations are read into the program vector and are applied first, so **end=** and **_N_** work on the resulting subset. Try the code in the previous post and see what happens.

7 REPLIES

@Nietzsche wrote:

Hi, just reading about **end=** option in the set statement of data step in this thread.

I don't think the OP of that thread ever made another about the existence of **start=** option.

When I type **start=** or **begin=** in the set statement, there is no pop-up with links to the doc, so I assume the **start=** option does not actually exist in the set statement of the data step?

What would a START= option do? Would it signal the beginning of a data step (similar to END= signalling the end of a data step)? If that's what you want, you can use

`if _n_=1 then do;`
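As a rough sketch of that idea (using SASHELP.CLASS only as a convenient sample table; the variable name `eof` and the message texts are illustrative choices), an `_n_=1` block can hold whatever should happen once at the start, mirroring an END= block at the finish:

```sas
data _null_;
  set sashelp.class end=eof;

  /* runs once, on the first iteration of the data step */
  if _n_ = 1 then do;
    put "Starting with " name=;
  end;

  /* runs once, after the last observation has been read */
  if eof then do;
    put "Finishing with " name=;
  end;
run;
```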

--

Paige Miller


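```
/* Five observations: obs = 1 to 5 */
data a;
  do obs = 1 to 5;
    output;
  end;
run;

/* Flag the first and the last observation read from a */
data b;
  set a end=eof;
  if _N_ = 1 then firstobs = 1;
  if eof then lastobs = 1;
run;

/* firstobs= and obs= are applied first, so the flags land on obs 2 and 4 */
data c;
  set a (firstobs=2 obs=4) end=eof;
  if _N_ = 1 then firstobs = 1;
  if eof then lastobs = 1;
run;
```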

@ErikLund_Jensen, if I may, let me continue your thread and add something more.

The firstobs= and obs= options work before end= and _N_, but we have to be careful when we use them in combination with the WHERE statement:

```
data have;
  do x = 1 to 3;
    output;
  end;
run;

data want;
  set have(firstobs=2);
  where x > 1;
run;
```

In this case the WHERE statement first drops x=1 from the input data set, and then firstobs= drops x=2 from what is left after the filtering.
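For contrast, here is a sketch of the same step with a subsetting IF instead of WHERE (the output name `want_if` is made up for illustration). Here firstobs=2 is applied while reading, so x=1 is skipped first, and only then does the IF test each observation that was actually read:

```sas
data have;
  do x = 1 to 3;
    output;
  end;
run;

data want_if;
  set have(firstobs=2);  /* reading starts at the second observation, x=2 */
  if x > 1;              /* subsetting IF runs on observations already read */
run;
```

With WHERE the result should contain only x=3, while with the subsetting IF it should contain x=2 and x=3.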

And a note about "start=": one option is to test "_N_=1", but when, for example, we read several data sets with a single SET statement, we can use the CUROBS= option to find out which observation we are reading into the PDV, e.g.

```
data A B C;
  do x = 1 to 3;
    output;
  end;
run;

data ABC;
  set A B C curobs=curobs;
  if curobs=1 then output;
run;
```

So "curobs=1" tests whether we are reading the first observation from a given data set (provided, of course, that the data set has a first observation, which does not always have to be the case).
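A sketch of that caveat, assuming an empty data set B built with obs=0 (one of several ways to get a table with no rows): B never delivers a first observation, so the test fires only for A and C.

```sas
data A C;
  do x = 1 to 3;
    output;
  end;
run;

/* B has the same variable as A but no observations */
data B;
  set A(obs=0);
run;

data ABC;
  set A B C curobs=curobs;
  /* curobs restarts at 1 for each data set that actually supplies rows */
  if curobs = 1 then output;
run;
```

ABC should end up with two observations rather than three.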

Bart

_______________

**Polish SAS Users Group**: www.polsug.com and communities.sas.com/polsug

"**SAS Packages: the way to share**" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.

Hands-on-Workshop: "**Share your code with SAS Packages**"

"**My First SAS Package: A How-To**" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three

SAS Documentation

Let me correct your language. _N_ does not count observations. It counts iterations of the data step. The confusion arises because in the normal simple data step:

```
data new;
set old;
run;
```

they amount to the same thing.

But once you get more complicated, say by using a DOW loop, they diverge. For example, in this data step the value of _N_ can be seen as a count of the number of ID values seen.

```
data want;
  do until(last.id);
    set old;
    by id;
    total=sum(total,amount);
  end;
  keep id total;
run;
```

But even in the simple data step you can see that the value of _N_ is different from "the number of observations read in". The most obvious case is when it increments beyond the number of observations in the source data set, since such a data step ends at the SET statement and not at the RUN statement.

```
2327 data want;
2328 put _n_= eof= ;
2329 set sashelp.class(obs=2) end=eof;
2330 run;
_N_=1 eof=0
_N_=2 eof=0
_N_=3 eof=1
NOTE: There were 2 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.WANT has 2 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
```
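A small sketch of the same effect, with a PUT statement on each side of the SET (the message texts are only illustrative): the PUT at the top should run three times, while the PUT after the SET should run only twice, because the third iteration stops at the SET statement when there is nothing left to read.

```sas
data _null_;
  put "top of step: " _n_=;   /* executes on every iteration, including the last */
  set sashelp.class(obs=2);   /* the step ends here once the input is exhausted */
  put "after SET:   " _n_=;   /* executes only when an observation was read */
run;
```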

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I would go even one step further.

"*_N_ does not count observations. It counts iterations of the data step.*" - In fact, _N_ does not count iterations itself; it is a placeholder for the value of an internal iteration counter. Indeed, you can modify it, and at the beginning of the next iteration it is automatically updated with the current iteration number:

```
data have;
  do x = "A", "B", "C";
    output;
  end;
run;

data _null_;
  put "1)" _all_;
  set have;
  put "2)" _all_;

  do _N_ = 1 to 5;
    put _N_= @;
  end;
  put;

  put "3)" _all_;
  put;
run;
```

Log:

```
1 data have;
2 do x = "A", "B", "C";
3 output;
4 end;
5 run;
NOTE: The data set WORK.HAVE has 3 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
6
7 data _null_;
8 put "1)" _all_;
9 set have;
10 put "2)" _all_;
11
12 do _N_ = 1 to 5;
13 put _N_= @;
14 end;
15 put;
16
17 put "3)" _all_;
18 put;
19 run;
1)x= _ERROR_=0 _N_=1
2)x=A _ERROR_=0 _N_=1
_N_=1 _N_=2 _N_=3 _N_=4 _N_=5
3)x=A _ERROR_=0 _N_=6
1)x=A _ERROR_=0 _N_=2
2)x=B _ERROR_=0 _N_=2
_N_=1 _N_=2 _N_=3 _N_=4 _N_=5
3)x=B _ERROR_=0 _N_=6
1)x=B _ERROR_=0 _N_=3
2)x=C _ERROR_=0 _N_=3
_N_=1 _N_=2 _N_=3 _N_=4 _N_=5
3)x=C _ERROR_=0 _N_=6
1)x=C _ERROR_=0 _N_=4
NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
```

A very good read about looping is the article "The Magnificent DO" by Paul Dorfman ( @hashman ); the link is here: https://support.sas.com/resources/papers/proceedings13/126-2013.pdf

Bart

Hi Bartek,

You're exactly right. Methinks the necessary and sufficient definition could be this:

*Regardless of its current value, _N_ is assigned the next consecutive natural number every time program control is passed to the top of the DATA step (by the action of the implied loop).*

Thus, since at first program control is at the top of the implied loop, 1 is moved to _N_. The next time program control is passed to the top of the implied loop, 2 is moved to _N_, and so forth. Hence, as you have indicated, the program can assign any numeric value to _N_ between two consecutive returns of program control to the top of the DATA step, yet it has no effect on the new value moved to _N_ at the top of the DATA step from the independent internal counter.

Perhaps one could say that an internal equivalent of the statement:

`_N_ = monotonic() ;`

is executed at the top of the implied loop.

Thanks for the plug 😉.

Regards,

Paul D.
