Last time I asked the question with SINGLE individuals competing in each of several events, and how to determine the overall ranking of winners.
https://communities.sas.com/thread/44071
Now I return with things a little more complicated: Suppose we have PAIRS in each contestant group.
I would like to find a way of ranking the top INDIVIDUALS, even though they are taking part as a PAIR.
In this competition, every individual was paired up round-robin, with every other individual. This was repeated seven times.... over a period of one year. We have the list of winners for each round, seven rounds in all.
Unfortunately some pairs missed certain rounds. Thus, not a complete and even data set.
One might guess that the top PAIR would include the, say, #1 and #2 contestants overall. My hunch is that such need not be the case. Could be that a single contestant from each of Pair 1, Pair 2, Pair 3 might be the top rankers overall.
So, again, what we want is a ranking list of all INDIVIDUALS, based on the data set.
Any help and advice is greatly appreciated.
Thanks,
Nicholas Kormanik
=======
Edit by Nicholas:
I've added a new data file. In it I've separated the 'Pairs' into two columns each, so that one needn't do it in SAS.
'Round 1' becomes 'R1a' and 'R1b', etc.
I tried using Vince28's code and kept getting SAS errors. Bummer. Maybe by separating the 'Pairs' the task can be handled more easily.
Again, any help greatly appreciated.
=======
Data set follows (and additionally as an attachment):
Order | Round 1 | Round 2 | Round 3 | Round 4 | Round 5 | Round 6 | Round 7 |
1 | DREW_EDGAR | HENRY_ISAAC | EDGAR_GRANT | CHUCK_GRANT | HENRY_ISAAC | EDGAR_ISAAC | AARON_FRANK |
2 | GRANT_ISAAC | CHUCK_HENRY | BLAKE_DREW | GRANT_HENRY | AARON_EDGAR | EDGAR_HENRY | DREW_EDGAR |
3 | DREW_HENRY | DREW_ISAAC | AARON_BLAKE | BLAKE_CHUCK | FRANK_HENRY | AARON_ISAAC | CHUCK_DREW |
4 | CHUCK_HENRY | AARON_EDGAR | AARON_CHUCK | DREW_GRANT | AARON_DREW | BLAKE_CHUCK | FRANK_GRANT |
5 | CHUCK_FRANK | FRANK_ISAAC | AARON_GRANT | AARON_GRANT | DREW_GRANT | FRANK_GRANT | AARON_EDGAR |
6 | AARON_EDGAR | DREW_FRANK | BLAKE_ISAAC | EDGAR_GRANT | BLAKE_FRANK | DREW_FRANK | EDGAR_ISAAC |
7 | BLAKE_GRANT | CHUCK_FRANK | DREW_GRANT | DREW_HENRY | BLAKE_HENRY | BLAKE_ISAAC | FRANK_HENRY |
8 | AARON_CHUCK | AARON_ISAAC | GRANT_ISAAC | EDGAR_ISAAC | BLAKE_GRANT | DREW_ISAAC | BLAKE_HENRY |
9 | EDGAR_GRANT | AARON_HENRY | BLAKE_CHUCK | AARON_BLAKE | AARON_ISAAC | FRANK_HENRY | AARON_BLAKE |
10 | CHUCK_ISAAC | FRANK_HENRY | AARON_FRANK | BLAKE_ISAAC | CHUCK_ISAAC | AARON_CHUCK | CHUCK_ISAAC |
11 | AARON_BLAKE | DREW_GRANT | BLAKE_HENRY | EDGAR_HENRY | CHUCK_FRANK | DREW_HENRY | CHUCK_HENRY |
12 | BLAKE_CHUCK | BLAKE_ISAAC | AARON_EDGAR | DREW_FRANK | DREW_ISAAC | AARON_GRANT | EDGAR_GRANT |
13 | FRANK_HENRY | BLAKE_EDGAR | FRANK_ISAAC | EDGAR_FRANK | FRANK_ISAAC | GRANT_HENRY | BLAKE_DREW |
14 | BLAKE_DREW | EDGAR_GRANT | DREW_HENRY | DREW_EDGAR | DREW_HENRY | EDGAR_GRANT | CHUCK_EDGAR |
15 | BLAKE_HENRY | AARON_DREW | DREW_ISAAC | AARON_FRANK | AARON_HENRY | BLAKE_EDGAR | AARON_ISAAC |
16 | CHUCK_DREW | CHUCK_GRANT | BLAKE_GRANT | CHUCK_EDGAR | BLAKE_CHUCK | BLAKE_DREW | DREW_GRANT |
17 | FRANK_GRANT | CHUCK_ISAAC | GRANT_HENRY | CHUCK_DREW | EDGAR_ISAAC | DREW_EDGAR | BLAKE_CHUCK |
18 | FRANK_ISAAC | AARON_CHUCK | EDGAR_FRANK | CHUCK_HENRY | EDGAR_GRANT | AARON_FRANK | GRANT_HENRY |
19 | GRANT_HENRY | BLAKE_HENRY | AARON_ISAAC | FRANK_GRANT | CHUCK_GRANT | BLAKE_HENRY | BLAKE_ISAAC |
20 | DREW_FRANK | AARON_GRANT | FRANK_HENRY | BLAKE_DREW | DREW_FRANK | CHUCK_GRANT | DREW_ISAAC |
21 | BLAKE_ISAAC | BLAKE_DREW | AARON_HENRY | CHUCK_FRANK | CHUCK_DREW | CHUCK_ISAAC | AARON_CHUCK |
22 | AARON_DREW | GRANT_ISAAC | CHUCK_EDGAR | HENRY_ISAAC | GRANT_HENRY | AARON_EDGAR | GRANT_ISAAC |
23 | AARON_GRANT | DREW_EDGAR | CHUCK_GRANT | GRANT_ISAAC | CHUCK_HENRY | CHUCK_DREW | EDGAR_HENRY |
24 | HENRY_ISAAC | EDGAR_HENRY | CHUCK_FRANK | CHUCK_ISAAC | CHUCK_EDGAR | HENRY_ISAAC | BLAKE_EDGAR |
25 | EDGAR_HENRY | FRANK_GRANT | DREW_FRANK | BLAKE_GRANT | AARON_FRANK | AARON_HENRY | HENRY_ISAAC |
26 | CHUCK_EDGAR | AARON_FRANK | BLAKE_FRANK | AARON_HENRY | GRANT_ISAAC | CHUCK_FRANK | BLAKE_GRANT |
27 | EDGAR_ISAAC | DREW_HENRY | EDGAR_HENRY | FRANK_ISAAC | AARON_GRANT | FRANK_ISAAC | BLAKE_FRANK |
28 | CHUCK_GRANT | AARON_BLAKE | CHUCK_HENRY | AARON_DREW | FRANK_GRANT | CHUCK_HENRY | FRANK_ISAAC |
29 | DREW_GRANT | CHUCK_EDGAR | HENRY_ISAAC | BLAKE_FRANK | BLAKE_EDGAR | AARON_DREW | AARON_DREW |
30 | DREW_ISAAC | CHUCK_DREW | CHUCK_ISAAC | AARON_CHUCK | BLAKE_ISAAC | DREW_GRANT | DREW_FRANK |
31 | AARON_HENRY | GRANT_HENRY | CHUCK_DREW | DREW_ISAAC | DREW_EDGAR | BLAKE_FRANK | CHUCK_GRANT |
32 | BLAKE_FRANK | EDGAR_ISAAC | BLAKE_EDGAR | AARON_ISAAC | AARON_BLAKE | CHUCK_EDGAR | AARON_HENRY |
33 | AARON_FRANK | BLAKE_FRANK | DREW_EDGAR | BLAKE_HENRY | EDGAR_HENRY | BLAKE_GRANT | DREW_HENRY |
34 | BLAKE_EDGAR | EDGAR_FRANK | EDGAR_ISAAC | FRANK_HENRY | AARON_CHUCK | AARON_BLAKE | AARON_GRANT |
35 | AARON_ISAAC | BLAKE_GRANT | AARON_DREW | * | BLAKE_DREW | GRANT_ISAAC | CHUCK_FRANK |
36 | * | BLAKE_CHUCK | FRANK_GRANT | * | EDGAR_FRANK | * | EDGAR_FRANK |
Hi Nicholas,
I just posted a hash table solution that could accomodate your need, at least if I understand properly, in the following thread https://communities.sas.com/message/176069#176069
Obviously, some tweaks would need to be made.
data _null_;
if _n_=1 then do;
declare hash h();
h.defineKey('name');
h.defineData('name', 'points');
h.defineDone;
end;
set have;
array round {*} round_1-round_7; /* This effectively allows us to use loops on the 7 rounds rather than hard code them */
do i=1 to 7;
if trim(round{i}) NE "*" then do;
win1 = scan(round{i}, 1, "_");
win2 = scan(round{i}, 2, "_");
roundpoints = max(0, 10-order+1);
rc=h.find(name: win1);
if rc NE 0 then do;
name=win1;
points=roundpoints;
h.add();
end; else do;
points=points+roundpoints;
h.replace();
end;
rc=h.find(name: win2);
if rc NE 0 then do;
name=win2;
points=roundpoints;
h.add();
end; else do;
points=points+roundpoints;
h.replace();
end;
end;
end;
rc=h.output(dataset: "work.want");
drop rc win1 win2 points;
run;
Here, you are using a hash object to add a new member each time a new name is encountered in the dataset initiating his "wins" value to 1. Afterwards, each time the same name shows in the DS, the 'wins' value is incremented by 1. The resulting data is stored in the hash object hence, the "want" data is output via the hash output method. This will create a dataset with columns 'name' and 'wins'.
It is currently untested but the logic and syntax element is all there.
Vincent
*Edit*
Reading Astounding's comment and going through the post you've linked, I believe I had misinterpreted your data. I believe the biggest question which we can't answer for you is how do you measure performance in each round. For instance, you could consider the sum of all orders for a given individual to be the best indicator of his performances (ascending). Or, similar to, say, F1 standings, you can attribute a score value per position and set everything below a certain position to 0 and then take the total score per round or the best score per round per individual etc.
For the sake of providing a better code sample, I've updated the above code to output a dataset that "scores" each round as follow: Top 10 positions gives 10/9.../1 points resp. The sum of all points on all rounds gives the score of an individual. You can then simply sort the Want dataset by descending order to rank players from best to worst.
Well, if you already have a solution for individuals instead of pairs, you might piggyback on that. For the pairs data, turn each observation into four observations: (A + B) vs. (C + D) becomes A vs. C, A vs. D, B vs. C, and B vs. D. Then apply your original solutions for individuals, but count each win as 0.5 instead of 1.
Good luck.
This is a great idea, as it is immediately generalizable to n-tuples on the one hand, and to weighted rankings based on opponents' record on the other.
Steve Denham
I'm not sure to follow or rather, the way I understand the data in the original post, I don't think each records are presenting matches of a pair vs another pair. It reads to me more like a "race against the clock" by pair and then each pair is ranked based on the time they put up.
Maybe I really misinterpreted the data but given that there are 8 distinct names or competitors and thus sum_(i=1)^(n-1) i = n*(n+1)/2 = 36 distinct pairs and that for each round robin, there are 36 records minus missings, this truly reads to me as though each round results are ordered by best pair performance to worst pair performance similar to the race against the clock concept. I can't seem to think of any team v team based competition in a round robin setup of all distinct pairs matchup that could result in 36 single matchup winners.
If you considered a round robin of matches with a single winer on a multinom(8,2,2,2,2) scenario, you'd have to have 7*6*5*4*3=2520 matches or 1/3 of that if the records represented the winner of a round robin within a round robin to effectively give even chances to each competitor of facing every other possible team configuration within a given team.
Anyway I'll watch the thread closely I'm quite intrigued by the concept
Vincent
Thanks Vincent , , for the input.
Vincent asks, "How do we measure performance in each round?"
Assume only what's given -- the 'Order' the 'Pair' came in for each 'Round'. Seems reasonable to sum 'Order' across all 'Rounds' for each individual, and compare grand sums, ascending, for overall rank orderings.
It should be pointed out that the data set here was created just for this post. The actual data set is far larger -- more than 100 'Contestants' x 100 'Rounds'. Thus Astounding's suggestion may not be practicable, because the resulting numbers of combinations would be so large. I'm not sure how to code it anyway. If Astounding or Steve wants to take a crack at it..... Super.
A "race against the clock" is one good way of viewing the contest. Thanks Vincent. For pairs..., lowest time sum wins. Pairs, along this line, could be broken down as well. Lowest grand time sum wins. Of course, we lack the 'Time' data. Only have the 'Rounds' rankings.
An additional question I'd like to ask regarding the data set is:
Is it possible to derive any additional information from the results, besides the primary objective of determining individual 'winners'?
Any procedure in SAS come to mind, to bring out a pattern, that is not so obvious, but that statistically is present?
Finally, as a motivator of sorts, let's imagine the data set representing professional basketball. And the top five contestants will be awarded multi-million dollar contracts.
Huge money awaits the grand winners.
All up to our statistical pronouncements.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.