Keep only first N observations for certain by variables

Hello All,

I have a dataset with many variables and many observations, and I want to keep only the first n observations within each group defined by whichever BY variables I choose.

If I sort the data by (name), then I want the first n observations for each name, but if I sort the data by (name date), then I want the first n observations for each name on a given date. I hope this makes sense.

Please see the example below:

Obs    name    date
1      a       x
2      a       x
3      a       x
4      a       x
5      a       y
6      a       y
7      b       x
8      b       x
9      b       y
10     b       y
11     b       y

So if I chose my by variable to be just (name) and my n to be 2, then I would retain observations 1, 2, 7 and 8.  But if I chose the by variables to be (name, date) and my n to be two, then I would retain observations 1, 2, 5, 6, 7, 8, 9 and 10.

Any help is greatly appreciated!!!

Thanks,

John


Accepted Solutions
Solution
‎12-17-2014 03:29 PM
Super User
Posts: 19,815

Re: Keep only first N observations for certain by variables

Posted in reply to mahler_ji

All Replies
Super User
Posts: 5,430

Re: Keep only first N observations for certain by variables

Posted in reply to mahler_ji

Just use a DATA step with a BY statement, and restart a counter each time you encounter a new BY group.

Then pair this with a conditional, explicit OUTPUT statement.

You could probably embed this quite easily in a macro (if you, or someone you can get help from, knows macro programming) so that it is easy to change the rules for each run.
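
As a minimal sketch of the counter approach described above (the dataset names have, sorted, and want and the counter c are assumptions for illustration; the data mirror the example in the question):

```sas
/* Example data from the question (dataset name "have" is an assumption) */
data have;
   input name $ date $;
   datalines;
a x
a x
a x
a x
a y
a y
b x
b x
b y
b y
b y
;
run;

/* BY-group processing needs the data sorted by the BY variables */
proc sort data=have out=sorted;
   by name;
run;

data want;
   set sorted;
   by name;
   if first.name then c = 0;   /* restart the counter at the start of each BY group */
   c + 1;                      /* sum statement: c is retained across observations */
   if c le 2 then output;      /* explicit, conditional output: first 2 per group */
   drop c;
run;
```

With n set to 2 and name as the only BY variable, this should keep observations 1, 2, 7, and 8. For the (name date) case, sort and group with `by name date;` and reset on `first.date` instead, since FIRST.date is also set whenever name changes.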

Data never sleeps
Frequent Contributor
Posts: 101

Re: Keep only first N observations for certain by variables

Hey ,

How would I get the counter to restart at the beginning of a by variable?

This is what I originally tried to do but couldn't figure it out.

Thanks!

Frequent Contributor
Posts: 101

Re: Keep only first N observations for certain by variables

Thank you !!!

Respected Advisor
Posts: 3,799

Re: Keep only first N observations for certain by variables

Posted in reply to mahler_ji

This is somewhat generic and uses an array of first dot variables. Poor choice of example data, but you get the idea.



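options validvarname=any;  /* needed for the 'first.'n name-literal prefix list below */

%let obs=2;
%let data=class;
%let by=sex age;

proc sort data=sashelp.&data out=&data;
   by &by;
   run;
proc print;
   run;
data keepobs;
   set &data;
   by &by;
   /* array of the FIRST. automatic variables created by the BY statement */
   array _by[*] 'first.'n:;
   /* the FIRST. flag of the last BY variable is set whenever any BY variable changes */
   if _by[dim(_by)] then c = 0;
   c + 1;
   if c le &obs then output;
   run;
proc print;
   run;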
Respected Advisor
Posts: 3,156

Re: Keep only first N observations for certain by variables

Posted in reply to data_null__

Nice! I didn't know you could refer to those automatic variables using name literals, and this is the first time I've seen them used as array elements. Happy to learn! Thanks for sharing, John.

Haikuo

Respected Advisor
Posts: 3,799

Re: Keep only first N observations for certain by variables

You have to use VALIDVARNAME=ANY to refer to them as in the example. If not, you can use FIRST: but that is not so safe. Sometimes we forget that ARRAYs are just variable lists and that FIRST dot this or that are just variables.

It is not my original idea. I have no original ideas. :)

Super User
Posts: 19,815

Re: Keep only first N observations for certain by variables

Posted in reply to data_null__

But do you ever forget anything? ;)

Super User
Posts: 10,029

Re: Keep only first N observations for certain by variables

Posted in reply to data_null__

John, how about:

%let obs=2;
%let data=class;
%let by=sex age;

proc sort data=sashelp.&data out=&data;
   by &by;
   run;
proc print;
   run;
data keepobs;
   set &data;
   by &by;
   if first.%scan(&by,-1) then c = 0;
   c + 1;
   if c le &obs then output;
   run;
proc print;
   run;
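
The %scan version above lends itself to the macro wrapper suggested earlier in the thread. A rough sketch follows; the macro name keep_first_n, its parameters, and the work dataset _sorted are hypothetical names chosen for illustration, not something posted in the thread:

```sas
/* Hypothetical macro: name and parameters are illustrative only */
%macro keep_first_n(data=, by=, n=, out=);
   /* BY-group processing needs the data sorted by the BY variables */
   proc sort data=&data out=_sorted;
      by &by;
   run;

   data &out;
      set _sorted;
      by &by;
      /* %scan(&by,-1) returns the last BY variable; its FIRST. flag is
         set whenever any of the BY variables changes */
      if first.%scan(&by,-1) then _c = 0;
      _c + 1;                        /* counter within the current BY group */
      if _c le &n then output;       /* keep at most &n rows per group */
      drop _c;
   run;
%mend keep_first_n;

/* Example calls against sashelp.class, as in the posts above */
%keep_first_n(data=sashelp.class, by=sex,     n=2, out=by_sex)
%keep_first_n(data=sashelp.class, by=sex age, n=2, out=by_sex_age)
```

Changing the grouping rule then only means changing the by= and n= arguments, which is the kind of per-run flexibility the original question asked for.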
    
    

Xia Keshan

Respected Advisor
Posts: 3,799

Re: Keep only first N observations for certain by variables

What's the fun in that? :smileysilly:

Super User
Posts: 10,029

Re: Keep only first N observations for certain by variables

Posted in reply to data_null__

Nothing. Just another way.
