BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SASPhile
Quartz | Level 8

there are 10 ids and scores,with varying records for each id. how to get top 3 records for each id based on score?

there is a function "TOP" in other SQL's, what is the equivalent in SAS?

1 ACCEPTED SOLUTION

Accepted Solutions
alandool
Quartz | Level 8

Editor's Note: Thanks to all who contributed, showing that there are mulitple ways to get the same results in SAS.

 

Here's my take on this.  A mix of proc sql and data step.

 

Alan

 

 

data have;
input id $ score @@;
datalines;
S12 -20 S12  20 S12  10
S12  10 S12  20 S12 -50
S13 -40 S13  60 S13 -100
S14  10 S14  20 S14  0
S14 -30 S14 -20
;


/* change as needed to bring back desired ranking  */
%let start= 1;
%let end = 3;


/*  create table containing distinct id and value  -- needed b/c more than one id can have the same score */
proc sql;
create table dst_id_score as
select distinct id, score from have
order by id, score desc;  /* keep this sort order for ranking */
* NOTE: Table WORK.DST_ID_SCORE created, with 12 rows and 2 columns. ;


/*  assign ranking by id by score */
data temp;
set dst_id_score ;
by id;
if first.id then rnk=0;
rnk + 1;
run;
* NOTE: The data set WORK.TEMP has 12 observations and 3 variables. ;


/*  go back to original file and bring back all data meeting desired ranking  */
proc sql;
create table final_top_ranked as

select
  a.id
, a.score
, b.rnk as rank
from have a inner join temp b on a.id = b.id and a.score = b.score
where rnk between &start and &end
order by id, rnk;
* NOTE: Table WORK.FINAL_TOP_RANKED created, with 11 rows and 3 columns. ;

View solution in original post

11 REPLIES 11
FriedEgg
SAS Employee

data want;

do _n_=1 by 1 until(last.id);

set have;

by id;

if _n_ <= 3 then output;

end;

run;

art297
Opal | Level 21

Of course that assumes that the data is pre-sorted and that ties aren't relevant.

FriedEgg
SAS Employee

Yes, it relies upon the data being presorted, unlike sql, however the TOP clause itself is indiscriminate about and type of rank or ties, so that shouldn't be an issue to repeat functionality.

Haikuo
Onyx | Level 15

If still choose SQL (SAS SQL in this case) and include ties:

data have;

input id $  value;

cards;

S12 -20

S12  20

S12  10

S12  10

S12  20

S12 -50

S13 -40

S13  60

S13 -100

S14  10

S14  20

S14  0

S14 -30

S14 -20

;

proc sql;

select id,value from

  (select distinct b.id,b.value, a.value as avalue from have a

  left join have b

  on a.id=b.id)

group by id,value

having sum(avalue>=value)<=3;

quit;

Haikuo

art297
Opal | Level 21

  I don't think that works as intended.

You still haven't marked this question as being answered, thus everyone will still keep trying to come up with solutions.  I really think you are best off doing this in either a datastep or using a proc like proc rank.  However, since you asked how you might be able to do it using proc sql, here is a proc sql solution that takes ties into consideration (i.e., if there is a tie for the 3rd position, it will provide all records that have a value greater than of equal to the tied value).

Unfortunately, it has to be modified to account for the specific ids of interest.  I used Haikuo's example dataset, but added one record for testing purposes:

data have;

  input id $  value;

  cards;

S12 -20

S12  20

S12  10

S12  10

S12  20

S12 -50

S13 -40

S13 15

S13  60

S13 -100

S14  10

S14  20

S14  0

S14 -30

S14 -20

;

proc sql;

  create table temp as

    select *

      from have

        where id="S12"

          order by value desc

  ;

  create table S12 as

    select id, value as bottom_value

      from temp (firstobs=3 obs=3)

  ;

  create table temp as

    select *

      from have

        where id="S13"

          order by value desc

  ;

  create table S13 as

    select id, value as bottom_value

      from temp (firstobs=3 obs=3)

  ;

  create table temp as

    select *

      from have

        where id="S14"

          order by value desc

  ;

  create table S14 as

    select id, value as bottom_value

      from temp (firstobs=3 obs=3)

  ;

  create table criteria as

    select * from S12

      union all select * from S13

      union all select * from S14

  ;

  create table want (drop=count) as

    select a.*,count(*) as count

      from have a

        left join criteria b

          on a.id=b.id

            group by a.id

              having a.value>=b.bottom_value

                order by id, value desc

  ;

quit;

FriedEgg
SAS Employee

data have;

input id $  value @@;

cards;

S12 -20 S12  20 S12  10

S12  10 S12  20 S12 -50

S13 -40 S13  60 S13 -100

S14  10 S14  20 S14  0

S14 -30 S14 -20

;

proc means data=have nway noprint;

class id;

var value;

output out=top3arr(drop=_:) idgroup ( max(value) out[3] (value)=) /autolabel autoname;

run;

proc transpose data=top3arr out=top3list(drop=_: rename=(col1=value));

by id;

var value:;

run;

proc sql;

drop table top3arr;

quit;

IDVALUE
S1220
S1220
S1210
S1360
S13-40
S13-100
S1420
S1410
S140
PGStats
Opal | Level 21

With proc rank you get some control over the way ties are treated :

data have;
input id $ score @@;
datalines;
S12 -20 S12  20 S12  10
S12  10 S12  20 S12 -50
S13 -40 S13  60 S13 -100
S14  10 S14  20 S14  0
S14 -30 S14 -20
;

proc rank data=have descending ties=low /*ties=dense*/
     out=want(where=(order<=3));
by id notsorted;
var score; ranks order;
run;

PG

PG
Haikuo
Onyx | Level 15

Agree with Art, SQL is not really optimal for this kind of task. Comparing to PG's clean-cut proc rank, Data step will have same amount of control, if not more:

data have;

input id $ score @@;

datalines;

S12 -20 S12  20 S12  10

S12  10 S12  20 S12 -50

S13 -40 S13  60 S13 -100

S14  10 S14  20 S14  0

S14 -30 S14 -20

;

proc sort data=have;

by id descending value;

run;

data want;

  do _n_=1 by 1 until (last.id);

   set have nobs=nobs;

     by id descending value;

     if first.id then _f=nobs;

     _n+ifn(value=lag(value),0,1);

     _n=ifn(first.id,1,_n);

     _f=ifn(_n_=3,_n,_f) ;

     if _n<=_f then output;

  end;

  drop _:;

  run;

  proc print;run;

alandool
Quartz | Level 8

Editor's Note: Thanks to all who contributed, showing that there are mulitple ways to get the same results in SAS.

 

Here's my take on this.  A mix of proc sql and data step.

 

Alan

 

 

data have;
input id $ score @@;
datalines;
S12 -20 S12  20 S12  10
S12  10 S12  20 S12 -50
S13 -40 S13  60 S13 -100
S14  10 S14  20 S14  0
S14 -30 S14 -20
;


/* change as needed to bring back desired ranking  */
%let start= 1;
%let end = 3;


/*  create table containing distinct id and value  -- needed b/c more than one id can have the same score */
proc sql;
create table dst_id_score as
select distinct id, score from have
order by id, score desc;  /* keep this sort order for ranking */
* NOTE: Table WORK.DST_ID_SCORE created, with 12 rows and 2 columns. ;


/*  assign ranking by id by score */
data temp;
set dst_id_score ;
by id;
if first.id then rnk=0;
rnk + 1;
run;
* NOTE: The data set WORK.TEMP has 12 observations and 3 variables. ;


/*  go back to original file and bring back all data meeting desired ranking  */
proc sql;
create table final_top_ranked as

select
  a.id
, a.score
, b.rnk as rank
from have a inner join temp b on a.id = b.id and a.score = b.score
where rnk between &start and &end
order by id, rnk;
* NOTE: Table WORK.FINAL_TOP_RANKED created, with 11 rows and 3 columns. ;

neekum
Calcite | Level 5
Great solution. It is transparent, and that is critical for debugging and portability.
neekum
Calcite | Level 5
I meant, great solution by alandool. It totally matches my coding style, where I spell out each and every item explicitly for QA and later use. Kudos!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 43399 views
  • 3 likes
  • 7 in conversation