topic Re: proc rank (ties = mean) returning duplicates of the same rank in SAS Programming

proc rank (ties = mean) returning duplicates of the same rank

nathanleggett — Thu, 22 Aug 2019 05:32:19 GMT

I have this data:

data dat;
   input ID 10 domain rowpercent;
   datalines;
RZL34500 1 2.8
RZL34500 2 0.0
RZL34500 3 71.1
RZL34500 4 0.0
RZL34500 5 4.8
RZL34500 6 2.8
RZL34500 7 2.9
RZL34500 8 4.8
RZL34500 9 4.8
RZL34500 10 0.0
RZL34500 11 0.0
RZL34500 12 0.0
RZL34500 13 0.0
RZL34500 14 2.5
RZL34500 15 0.0
RZL34500 16 0.0
RZL34500 17 0.0
RZL34500 18 0.0
RZL34500 19 0.0
;

I am trying to rank it so that higher values are ranked higher:

proc rank data=dat out=temp_1 ties=mean descending;
	by id;
	var RowPercent;
	ranks rank;
run;

My results are:

ID	         domain rowpercent	rank
RZL34500	1	2.8	                 6.5
RZL34500	2	0	                 14
RZL34500	3	71.1	                 1
RZL34500	4	0	                 14
RZL34500	5	4.8	                 3
RZL34500	6	2.8	                 6.5
RZL34500	7	2.9	                 5
RZL34500	8	4.8	                 3
RZL34500	9	4.8	                 3
RZL34500	10	0	                 14
RZL34500	11	0	                 14
RZL34500	12	0	                 14
RZL34500	13	0	                 14
RZL34500	14	2.5	                 8
RZL34500	15	0	                 14
RZL34500	16	0	                 14
RZL34500	17	0	                 14
RZL34500	18	0	                 14
RZL34500	19	0	                 14

Why are there multiple '3's and '14's?

Secondary question: Is there a way that I can rank these values such that I get an accurate 'top 3'?

Thanks in advance

Re: proc rank (ties = mean) returning duplicates of the same rank

Reeza — Thu, 22 Aug 2019 15:18:06 GMT

Because you have ties....a value of 0 is a value of 0 and it will have the same rank as other 0s. Is there another way you'd like ties to be accounted for? You could just sort and create a counter but then you don't really have a true indication of the rank because does a rank of 15 mean anything if the rank of 14 and 16 have the same underlying value?

Re: proc rank (ties = mean) returning duplicates of the same rank

FreelanceReinh — Thu, 22 Aug 2019 15:25:34 GMT

Hello @nathanleggett and welcome to the SAS Support Communities!

First off a correction to your INPUT statement in order to make the DATA step work:

input ID :$10. domain rowpercent;

@nathanleggett wrote:
Why are there multiple '3's and '14's?

[As Reeza has pointed out already:] This is because there are tied observations, i.e. groups of observations with the same ROWPERCENT value, and PROC RANK assigns the same rank to all observations within a group. For instance, the 2nd, 3rd and 4th largest value are 4.8 so that each of them is assigned rank 3, the average of 2, 3 and 4 because you specified ties=mean.

Secondary question: Is there a way that I can rank these values such that I get an accurate 'top 3'?

Sure, but you need to decide how to handle the ties. For example, you could select all observations with the top three values by using

proc rank data=dat out=temp_1(where=(rank<=3)) ties=dense descending;

and then restrict dataset TEMP_1 to three observations in a subsequent step if needed.

Or use a different procedure instead of PROC RANK:

E.g., PROC SUMMARY:

proc summary data=dat;
by id;
var rowpercent;
output out=top3(drop=_:) idgrp (max(rowpercent) out[3] (rowpercent domain)=) / autoname;
run;

(There are options to modify the handling of ties.)

Or PROC UNIVARIATE:

ods select none;
ods output extremeobs=top3obs(drop=low: varname);
proc univariate data=dat nextrobs=3;
by id;
var rowpercent;
run;
ods select all;

(A variant of this could select the top three values [disregarding multiplicities] rather than the top three observations.)