Selecting every other row in a dataset

New Contributor
Posts: 3

This seems like it should have a fairly easy solution, but I have not been able to find it. I want to keep every second row in a data set, deleting each odd-numbered row. The only way I can think of to do this is to create an index spanning the whole data set and then delete based on that index, but I'm unsure how to do that.

Any help would be appreciated.
Respected Advisor
Posts: 3,777

Re: Selecting every other row in a dataset

MOD: Returns the remainder from the division of the first argument by the second argument, fuzzed to avoid most unexpected floating-point results.

[pre]
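/* Method 1: subsetting IF on the automatic variable _N_ */
data classEven;
  set sashelp.class;
  if mod(_n_, 2) eq 0;   /* keep only even-numbered iterations */
  obs = _n_;             /* iteration number, equal to the row number here */
run;
proc print;
run;

/* Method 2: WHERE with the undocumented MONOTONIC() function */
data classEven;
  set sashelp.class;
  where mod(monotonic(), 2) eq 0;
run;
proc print;
run;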
[/pre]
SAS Super FREQ
Posts: 8,743

Re: Selecting every other row in a dataset

Hi:
My usual warning about monotonic(), which may work in this case, but is not guaranteed to work in all cases... Any warning like this gives me something to worry about:
http://support.sas.com/kb/15/138.html

A simple alternative follows, assuming that you are not doing anything other than a simple read and write in the DATA step program, so that _n_ corresponds to the observation number in the input data set.

cynthia
[pre]
data classSimple;
  set sashelp.class;
  orig_obs = _n_;                     /* iteration number, equal to the row number here */
  if mod(_n_, 2) eq 0 then output;    /* write only even-numbered rows */
run;

proc print data=classSimple;
run;
[/pre]
New Contributor
Posts: 3

Re: Selecting every other row in a dataset

Thanks, I didn't know the row indexes could be referenced as _n_. Works perfectly now.
Respected Advisor
Posts: 3,777

Re: Selecting every other row in a dataset

_N_ is not the row index. It is the data step iteration counter. In this simple situation they are equal, but not always.
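
As a minimal sketch of a case where they differ, assume the step filters its input with a WHERE= data set option (the data set name girls and the variable iteration are hypothetical, chosen only for illustration). _N_ then counts only the rows that pass the filter, so it no longer matches the row number in sashelp.class:

```sas
/* Sketch: _N_ counts DATA step iterations, not input row numbers.      */
/* The names girls and iteration are illustrative, not from the thread. */
data girls;
  set sashelp.class(where=(sex = 'F'));  /* only matching rows are read */
  iteration = _n_;                       /* 1, 2, 3, ... over the filtered rows */
run;

proc print data=girls;
run;
```

Here a test such as mod(_n_,2) would pick every other filtered row, not every other row of the original data set.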
SAS Super FREQ
Posts: 8,743

Re: Selecting every other row in a dataset

Hi:
Data _NULL_ is correct. _N_ is not a row index; it is a count of loops through the DATA step program. That is why I qualified what I said about using _N_: as long as you are doing a simple read/write in your program, you can use _N_, because in that case _N_ corresponds to the observation number, assuming that each observation is read with one loop of the DATA step program.

You could write DATA step programs that read more than one observation in a single loop of the program, in which case _N_ would no longer correspond to the observation number. But in your case, assuming your program works as you have described the problem (a simple read/write in which you read only one observation per loop through the program), you can use _N_ as shown.
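
To illustrate the multiple-read case, here is a rough sketch, not something posted earlier in the thread, in which each loop of the DATA step executes SET twice. The names classPairs, k, iteration, and row are hypothetical. Because the second read overwrites the first, the implicit output at the end of each loop writes the even-numbered row, and _N_ counts pairs of rows rather than rows:

```sas
/* Sketch: two observations are read per DATA step iteration,  */
/* so _N_ counts pairs of rows rather than individual rows.    */
data classPairs;
  do k = 1 to 2;         /* k is an illustrative loop counter */
    set sashelp.class;   /* the second read overwrites the first */
  end;
  iteration = _n_;       /* 1, 2, 3, ... */
  row = 2 * _n_;         /* original row number of the kept observation */
  drop k;
run;

proc print data=classPairs;
run;
```

If the input has an odd number of rows, the step stops when the second read in the last loop runs past the end of the data, so the final odd row is never written.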

cynthia