DATA Step, Macro, Functions and more

Select top 3 records for each id

Reply
Super Contributor
Posts: 647

Select top 3 records for each id

there are 10 ids and scores,with varying records for each id. how to get top 3 records for each id based on score?

there is a function "TOP" in other SQL's, what is the equivalent in SAS?

Trusted Advisor
Posts: 1,300

Re: Select top 3 records for each id

data want;

do _n_=1 by 1 until(last.id);

set have;

by id;

if _n_ <= 3 then output;

end;

run;

PROC Star
Posts: 7,363

Re: Select top 3 records for each id

Of course that assumes that the data is pre-sorted and that ties aren't relevant.

Trusted Advisor
Posts: 1,300

Re: Select top 3 records for each id

Yes, it relies upon the data being presorted, unlike sql, however the TOP clause itself is indiscriminate about and type of rank or ties, so that shouldn't be an issue to repeat functionality.

Respected Advisor
Posts: 3,124

Re: Select top 3 records for each id

If still choose SQL (SAS SQL in this case) and include ties:

data have;

input id $  value;

cards;

S12 -20

S12  20

S12  10

S12  10

S12  20

S12 -50

S13 -40

S13  60

S13 -100

S14  10

S14  20

S14  0

S14 -30

S14 -20

;

proc sql;

select id,value from

  (select distinct b.id,b.value, a.value as avalue from have a

  left join have b

  on a.id=b.id)

group by id,value

having sum(avalue>=value)<=3;

quit;

Haikuo

PROC Star
Posts: 7,363

Re: Select top 3 records for each id

  I don't think that works as intended.

You still haven't marked this question as being answered, thus everyone will still keep trying to come up with solutions.  I really think you are best off doing this in either a datastep or using a proc like proc rank.  However, since you asked how you might be able to do it using proc sql, here is a proc sql solution that takes ties into consideration (i.e., if there is a tie for the 3rd position, it will provide all records that have a value greater than of equal to the tied value).

Unfortunately, it has to be modified to account for the specific ids of interest.  I used Haikuo's example dataset, but added one record for testing purposes:

data have;

  input id $  value;

  cards;

S12 -20

S12  20

S12  10

S12  10

S12  20

S12 -50

S13 -40

S13 15

S13  60

S13 -100

S14  10

S14  20

S14  0

S14 -30

S14 -20

;

proc sql;

  create table temp as

    select *

      from have

        where id="S12"

          order by value desc

  ;

  create table S12 as

    select id, value as bottom_value

      from temp (firstobs=3 obs=3)

  ;

  create table temp as

    select *

      from have

        where id="S13"

          order by value desc

  ;

  create table S13 as

    select id, value as bottom_value

      from temp (firstobs=3 obs=3)

  ;

  create table temp as

    select *

      from have

        where id="S14"

          order by value desc

  ;

  create table S14 as

    select id, value as bottom_value

      from temp (firstobs=3 obs=3)

  ;

  create table criteria as

    select * from S12

      union all select * from S13

      union all select * from S14

  ;

  create table want (drop=count) as

    select a.*,count(*) as count

      from have a

        left join criteria b

          on a.id=b.id

            group by a.id

              having a.value>=b.bottom_value

                order by id, value desc

  ;

quit;

Trusted Advisor
Posts: 1,300

Re: Select top 3 records for each id

data have;

input id $  value @@;

cards;

S12 -20 S12  20 S12  10

S12  10 S12  20 S12 -50

S13 -40 S13  60 S13 -100

S14  10 S14  20 S14  0

S14 -30 S14 -20

;

proc means data=have nway noprint;

class id;

var value;

output out=top3arr(drop=_Smiley Happy idgroup ( max(value) out[3] (value)=) /autolabel autoname;

run;

proc transpose data=top3arr out=top3list(drop=_: rename=(col1=value));

by id;

var value:;

run;

proc sql;

drop table top3arr;

quit;

IDVALUE
S1220
S1220
S1210
S1360
S13-40
S13-100
S1420
S1410
S140
Respected Advisor
Posts: 4,646

Re: Select top 3 records for each id

With proc rank you get some control over the way ties are treated :

data have;
input id $ score @@;
datalines;
S12 -20 S12  20 S12  10
S12  10 S12  20 S12 -50
S13 -40 S13  60 S13 -100
S14  10 S14  20 S14  0
S14 -30 S14 -20
;

proc rank data=have descending ties=low /*ties=dense*/
     out=want(where=(order<=3));
by id notsorted;
var score; ranks order;
run;

PG

PG
Respected Advisor
Posts: 3,124

Re: Select top 3 records for each id

Agree with Art, SQL is not really optimal for this kind of task. Comparing to PG's clean-cut proc rank, Data step will have same amount of control, if not more:

data have;

input id $ score @@;

datalines;

S12 -20 S12  20 S12  10

S12  10 S12  20 S12 -50

S13 -40 S13  60 S13 -100

S14  10 S14  20 S14  0

S14 -30 S14 -20

;

proc sort data=have;

by id descending value;

run;

data want;

  do _n_=1 by 1 until (last.id);

   set have nobs=nobs;

     by id descending value;

     if first.id then _f=nobs;

     _n+ifn(value=lag(value),0,1);

     _n=ifn(first.id,1,_n);

     _f=ifn(_n_=3,_n,_f) ;

     if _n<=_f then output;

  end;

  drop _:;

  run;

  proc print;run;

Contributor
Posts: 24

Re: Select top 3 records for each id

Here's my take on this.  A mix of proc sql and data step.

Alan

data have;
input id $ score @@;
datalines;
S12 -20 S12  20 S12  10
S12  10 S12  20 S12 -50
S13 -40 S13  60 S13 -100
S14  10 S14  20 S14  0
S14 -30 S14 -20
;


/* change as needed to bring back desired ranking  */
%let start= 1;
%let end = 3;


/*  create table containing distinct id and value  -- needed b/c more than one id can have the same score */
proc sql;
create table dst_id_score as
select distinct id, score from have
order by id, score desc;  /* keep this sort order for ranking */
* NOTE: Table WORK.DST_ID_SCORE created, with 12 rows and 2 columns. ;


/*  assign ranking by id by score */
data temp;
set dst_id_score ;
by id;
if first.id then rnk=0;
rnk + 1;
run;
* NOTE: The data set WORK.TEMP has 12 observations and 3 variables. ;


/*  go back to original file and bring back all data meeting desired ranking  */
proc sql;
create table final_top_ranked as

select
  a.id
, a.score
, b.rnk as rank
from have a inner join temp b on a.id = b.id and a.score = b.score
where rnk between &start and &end
order by id, rnk;
* NOTE: Table WORK.FINAL_TOP_RANKED created, with 11 rows and 3 columns. ;

New Contributor
Posts: 3

Re: Select top 3 records for each id

Great solution. It is transparent, and that is critical for debugging and portability.
New Contributor
Posts: 3

Re: Select top 3 records for each id

I meant, great solution by alandool. It totally matches my coding style, where I spell out each and every item explicitly for QA and later use. Kudos!
Ask a Question
Discussion stats
  • 11 replies
  • 15126 views
  • 2 likes
  • 7 in conversation