## Keep only first N observations for certain by variables

Hello All,

I have data with many variables and many observations, and I want to keep the first n observations of each group, where the groups are defined by whichever BY variables I choose.

If I sort the data by (name) then I want the first n observations by name, but if I sort the data by (name date) then I want the first n observations for each name on a given date.  I am hoping that this makes sense.

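| Obs | name | date |
|-----|------|------|
| 1   | a    | x    |
| 2   | a    | x    |
| 3   | a    | x    |
| 4   | a    | x    |
| 5   | a    | y    |
| 6   | a    | y    |
| 7   | b    | x    |
| 8   | b    | x    |
| 9   | b    | y    |
| 10  | b    | y    |
| 11  | b    | y    |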

So if I choose my BY variable to be just (name) and my n to be 2, then I would retain observations 1, 2, 7 and 8. But if I choose the BY variables to be (name, date) and my n to be 2, then I would retain observations 1, 2, 5, 6, 7, 8, 9 and 10.

Any help is greatly appreciated!!!

Thanks,

John

## Re: Keep only first N observations for certain by variables

Just use a DATA step with a BY statement, and reset a counter each time you encounter a new BY group.

Pair this with a conditional, explicit OUTPUT statement.

You could probably quite easily embed this in a macro (if you, or someone you can get help from, knows macro programming) so that it is easy to change the BY variables and n on each run.
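A minimal sketch of that approach, applied to the sample data from the question. The data set names (`have`, `first2_name`, `first2_name_date`), the macro name `%first_n`, and its parameters are hypothetical choices for illustration, not something from the thread. The counter is reset on the FIRST. flag of the last BY variable, which is set whenever any BY group starts.

```
/* Sample data from the question (HAVE is a hypothetical name) */
data have;
  input obs name $ date $;
  datalines;
1 a x
2 a x
3 a x
4 a x
5 a y
6 a y
7 b x
8 b x
9 b y
10 b y
11 b y
;
run;

/* Hypothetical macro: keep the first &N observations of each BY group */
%macro first_n(data=, out=, by=, n=);
  proc sort data=&data out=&out;
    by &by;
  run;

  data &out;
    set &out;
    by &by;
    /* FIRST. of the last BY variable is 1 whenever any BY group starts */
    if first.%scan(&by, -1) then _count = 0;
    _count + 1;                /* sum statement: value is retained */
    if _count le &n then output;
    drop _count;
  run;
%mend first_n;

%first_n(data=have, out=first2_name,      by=name,      n=2)
%first_n(data=have, out=first2_name_date, by=name date, n=2)
```

With `n=2`, the first call keeps observations 1, 2, 7 and 8, and the second keeps 1, 2, 5, 6, 7, 8, 9 and 10, matching the results described in the question.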

Data never sleeps
## Re: Keep only first N observations for certain by variables

Hey ,

How would I get the counter to restart at the beginning of each BY group?

This is what I originally tried to do but couldn't figure it out.

Thanks!

## Re: Keep only first N observations for certain by variables

Thank you !!!

## Re: Keep only first N observations for certain by variables

This is somewhat generic and uses an array of FIRST. variables. Poor choice of example data, but you get the idea.

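```
options validvarname=any; /* needed for the 'first.'n name literal below */
%let obs=2;
%let data=class;
%let by=sex age;

proc sort data=sashelp.&data out=&data;
  by &by;
run;
proc print;
run;

data keepobs;
  set &data;
  by &by;
  /* array of all FIRST. variables: FIRST.sex, FIRST.age */
  array _by[*] 'first.'n:;
  /* test the last element, FIRST. of the innermost BY variable */
  if _by[dim(_by)] then c = 0;
  c + 1;
  if c le &obs then output;
run;
proc print;
run;
```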

## Re: Keep only first N observations for certain by variables

Nice! I didn't know you could refer to those automatic variables using name literals, and this is the first time I've seen them used as array elements! Happy to learn! Thanks for sharing, John.

Haikuo

## Re: Keep only first N observations for certain by variables

You have to use VALIDVARNAME=ANY to refer to them as in the example. If not, you can use FIRST:, but that is not so safe. Sometimes we forget that ARRAYs are just variable lists, and FIRST.this or FIRST.that are just variables.

It is not my original idea.  I have no original ideas.

## Re: Keep only first N observations for certain by variables

But do you ever forget anything?

## Re: Keep only first N observations for certain by variables

```
%let obs=2;
%let data=class;
%let by=sex age;

proc sort data=sashelp.&data out=&data;
  by &by;
run;
proc print;
run;
data keepobs;
  set &data;
  by &by;
  /* %scan(&by,-1) resolves to the last BY variable, so this tests first.age */
  if first.%scan(&by,-1) then c = 0;
  c + 1;
  if c le &obs then output;
run;
proc print;
run;

```

Xia Keshan

## Re: Keep only first N observations for certain by variables

What's the fun in that?

