03-03-2014 03:03 AM
I have a question regarding ranking (proc rank) by a secondary variable (i.e. dealing with ties):
I have branches that I want ranked based on a variable (var1), but I want the ties resolved based on a secondary variable (var2). This idea is very similar to sorting by more than one variable (i.e. have two variables in the class statement). However I cannot seem to find something like this with proc rank.
Currently I am using proc rank to rank based on var1, then I have a data step to manually sort out the ties based on the secondary variable.
I asked on the Twitter group, but the guys there don't seem to understand my problem.
I would be glad for any help
03-03-2014 06:14 AM
Proc Rank treats each variable to be ranked as a separate exercise, which is why you are having to do a follow up data step (presumably with a sort as well) to mediate the ties.
If you used a view before running proc rank that created a single variable out of the two you currently use you could rank on that in one pass (always assuming that ties are not a problem in the second variable). Some thing like this (assuming var1 and var2 are numeric and less than 999,999,999,999)
Proc SQL ;
Create view temp_rank as
Select Put (var1, z12.) || ' ' || Put (var2, z12.) as new_var
Proc rank data = temp_rank ;
var new_var ;
[untested code - add in whatever Rank options you are using]
03-03-2014 08:24 AM
I'm not sure I understand either. Show example data, "I have this and I need that" works well usually makes the need pretty clear.
Given what I think I understand it sounds like maybe you should just sort by the two variables and create and index variable. For example... note that Mary and William are still tied.
03-05-2014 08:54 AM
Thnx for the reply. i do appreciate it.
Ok, so lets hope this explains my problem:
input branchcode type Volume Rate ;
1001 1 10 0.1
1012 1 20 0.3
1291 1 15 0.3
2008 2 10 0.2
2343 2 20 0.2
2455 2 20 0.4
proc sort data=branches;
by type Branchcode Volume decending Rate;
proc rank data=branches out=branches_ranked ties=low descending;
proc sort data=branches_ranked;
by type branch_rank decending Volume;
proc print data=branches_ranked;
This gives output :
So the problem is the ties...What I want is actually to resolve the ties using the Volume column. Hence branch 1291 would get a rank 2, based on the volume and branch 2008 will get a rank 3 based on its volume. This is why I stated that this is similar to sorting based on two variables. I wrote a data step (as I stated before) which does exactly what I want. But from your answers I gather there is no way to do this in proc rank itself.
Thnx for the answers though, I am going to try what you suggested anyway