DATA Step, Macro, Functions and more

Select only the highest-ranked in each ranking group using the Rank column (1 is the highest possible rank).

Reply
Super Contributor
Posts: 673

Select only the highest-ranked in each ranking group using the Rank column (1 is the highest possible rank).

Select only the highest-ranked in each ranking group using the Rank column (1 is the highest possible rank).

.

contract        Rankgroup           CC                      Rank

1001             Neoplasm 1        CC-10                      4

1001             Neoplasm 1        CC-9                        2

1001             Neoplasm 1        CC-7                        3

1001             Neoplasm 2         CC-10                      1

1001             Neoplasm 2        CC-10                       1

1001             Neoplasm 2        CC-7                         3

Regular Contributor
Posts: 233

Select only the highest-ranked in each ranking group using the Rank column (1 is the highest possible rank).

data test;                                        
input contract Rankgroup $5-15 CC $ Rank ;
datalines;
1001 Neoplasm 1 CC-10 4
1001 Neoplasm 1 CC-9 2
1001 Neoplasm 1 CC-7 3
1001 Neoplasm 2 CC-10 1
1001 Neoplasm 2 CC-10 1
1001 Neoplasm 2 CC-7 3

;
RUN;
PROC PRINT; RUN;

PROC SQL;
SELECT * FROM test WHERE RANK IN (SELECT MIN(RANK) FROM tesT);
QUIT;

PROC Star
Posts: 7,491

Select only the highest-ranked in each ranking group using the Rank column (1 is the highest possible rank).

Or, with just one pass through the data:

proc sql;

  create table want as

    select *

      from have

        group by rankgroup

          having rank=min(rank)

  ;

quit;

Contributor
Posts: 65

Select only the highest-ranked in each ranking group using the Rank column (1 is the highest possible rank).

Not arguing, Art, just taking a learning opportunity. I get the "query requires remerging" message from your query, so Proc Sql technically is reading the data twice, no?

Also, SasPhile, Art's elegant query reproduces the duplicate record you have in your original dataset. If you don't want that, you can add "distinct" as in "select distinct *. . .".

Karl

PROC Star
Posts: 7,491

Select only the highest-ranked in each ranking group using the Rank column (1 is the highest possible rank).

Karl, I agree!  Conversely, Hima's code has an additional problem, in that a group by clause would have to be added in order for the OP to obtain the desired result.

Super User
Posts: 5,516

Select only the highest-ranked in each ranking group using the Rank column (1 is the highest possible rank).

SASPhile,

Other posters have correctly pointed out ... you have to decide what should happen when the same Rank appears multiple times within the same grouping.

I'm just asking about something else here ... Did you really mean to select the highest rank for each Rankgroup?  Or should it be the combination of contract and Rankgroup together that determines the grouping?

Good luck.

Super Contributor
Posts: 673

Select only the highest-ranked in each ranking group using the Rank column (1 is the highest possible rank).

Posted in reply to Astounding

Astounding,

   select the highest rank for each Rankgroup.

Super User
Posts: 5,516

Select only the highest-ranked in each ranking group using the Rank column (1 is the highest possible rank).

SasPhile,

In that case, you have solutions posted by others that should work.

If you want to select just one record per Rankgroup (even if many are tied for the highest Rank), try Keith's below.  If you want to select all records from a Rankgroup that share the highest Rank, try one of the SQL solutions above.

If you posted a solution that I misinterpreted, please say so.  But I don't think DISTINCT(*) would limit the selection to one observation per Rankgroup if the variable CC might be different.

Good luck.   

Regular Contributor
Posts: 151

Select only the highest-ranked in each ranking group using the Rank column (1 is the highest possible rank).

If you only want one of the highest rank records returned, then proc summary does this very easily.

proc summary data=test nway;

output out=top_rank (drop=_Smiley Happy minid(rank(_all_))=;

run;

Ask a Question
Discussion stats
  • 8 replies
  • 403 views
  • 6 likes
  • 6 in conversation