I have this data:
data dat;
input ID 10 domain rowpercent;
datalines;
RZL34500 1 2.8
RZL34500 2 0.0
RZL34500 3 71.1
RZL34500 4 0.0
RZL34500 5 4.8
RZL34500 6 2.8
RZL34500 7 2.9
RZL34500 8 4.8
RZL34500 9 4.8
RZL34500 10 0.0
RZL34500 11 0.0
RZL34500 12 0.0
RZL34500 13 0.0
RZL34500 14 2.5
RZL34500 15 0.0
RZL34500 16 0.0
RZL34500 17 0.0
RZL34500 18 0.0
RZL34500 19 0.0
;
I am trying to rank it so that higher values are ranked higher:
proc rank data=dat out=temp_1 ties=mean descending; by id; var RowPercent; ranks rank; run;
My results are:
ID domain rowpercent rank RZL34500 1 2.8 6.5 RZL34500 2 0 14 RZL34500 3 71.1 1 RZL34500 4 0 14 RZL34500 5 4.8 3 RZL34500 6 2.8 6.5 RZL34500 7 2.9 5 RZL34500 8 4.8 3 RZL34500 9 4.8 3 RZL34500 10 0 14 RZL34500 11 0 14 RZL34500 12 0 14 RZL34500 13 0 14 RZL34500 14 2.5 8 RZL34500 15 0 14 RZL34500 16 0 14 RZL34500 17 0 14 RZL34500 18 0 14 RZL34500 19 0 14
Why are there multiple '3's and '14's?
Secondary question: Is there a way that I can rank these values such that I get an accurate 'top 3'?
Thanks in advance
Hello @nathanleggett and welcome to the SAS Support Communities!
First off a correction to your INPUT statement in order to make the DATA step work:
input ID :$10. domain rowpercent;
@nathanleggett wrote:Why are there multiple '3's and '14's?
[As Reeza has pointed out already:] This is because there are tied observations, i.e. groups of observations with the same ROWPERCENT value, and PROC RANK assigns the same rank to all observations within a group. For instance, the 2nd, 3rd and 4th largest value are 4.8 so that each of them is assigned rank 3, the average of 2, 3 and 4 because you specified ties=mean.
Secondary question: Is there a way that I can rank these values such that I get an accurate 'top 3'?
Sure, but you need to decide how to handle the ties. For example, you could select all observations with the top three values by using
proc rank data=dat out=temp_1(where=(rank<=3)) ties=dense descending;
and then restrict dataset TEMP_1 to three observations in a subsequent step if needed.
Or use a different procedure instead of PROC RANK:
E.g., PROC SUMMARY:
proc summary data=dat;
by id;
var rowpercent;
output out=top3(drop=_:) idgrp (max(rowpercent) out[3] (rowpercent domain)=) / autoname;
run;
(There are options to modify the handling of ties.)
Or PROC UNIVARIATE:
ods select none;
ods output extremeobs=top3obs(drop=low: varname);
proc univariate data=dat nextrobs=3;
by id;
var rowpercent;
run;
ods select all;
(A variant of this could select the top three values [disregarding multiplicities] rather than the top three observations.)
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.