Suppose we have 100 runners. And they run 100 races. All we are given is the order that the racers come in, 1 through 100.
At the end of the race series what would be the fairest way to determine the top five winners?
What SAS procedure might one use to arrive at the solution?
We have 100 lists, of results. All lists numbered 1 to 100. A different ordering of names on each list, depending on the place they finish each race. We concatenate horizontally and come up with one large spreadsheet. Here would be a small sample of the spreadsheet:
Order Finished | Race 1 | Race 2 | Race 3 |
1 | D | A | D |
2 | A | D | B |
3 | C | B | A |
4 | B | C | C |
The whole spreadsheet is 100 x 100. Let's assume that all runners run in all races.
Any help and insights greatly appreciated.
Nicholas Kormanik
Add rank by person over all races, persons with the 5 lowest rank are the top 5.
data input;
input rank race1 $ race2 $ race3 $;
cards;
1 D A D
2 A D B
3 C B A
4 B C C
;
run;
proc transpose data=input out=data1 ;
by rank;
var race1 race2 race3;
run;
proc means data=data1 sum noprint;
ways 1;
class col1;
var rank;
output out=summary1 sum(rank)=total;
run;
proc sort data=summary1;
by total ;
run;
Add rank by person over all races, persons with the 5 lowest rank are the top 5.
data input;
input rank race1 $ race2 $ race3 $;
cards;
1 D A D
2 A D B
3 C B A
4 B C C
;
run;
proc transpose data=input out=data1 ;
by rank;
var race1 race2 race3;
run;
proc means data=data1 sum noprint;
ways 1;
class col1;
var rank;
output out=summary1 sum(rank)=total;
run;
proc sort data=summary1;
by total ;
run;
Reeza, expertly done. Thank you so very much. I'll leave the question unanswered until tomorrow night in case anyone else has something more to add.
For now, though, you're the champ!
I would suggest principal component analysis . make the data like these and get the final score for every competitor based on the first two principal components.
race1 race2
A (the score of winning the champion has highest score)
B
Ksharp
Ksharp, why PCA as opposed to what Reeza has done?
Would you please write the complete code for the simple example above so one can compare?
Reeza's code seems to work flawlessly (copy and paste right into SAS). It would be interesting to see if PCA comes up with a different result.
Oh. Sorry . That doesn't matter whether the champion has highest score or lowest score. only if they has the same directory.
I used multi-variable analysis, not like Reeza , Principal Component Analysis will use the principal component of correlation matrix to explain the entire dataset data .
Or I want hear the opinion from Dr SteveDenham who is a statistical expert.
Ksharp
This is an interesting problem. Suppose you have an entrant who never wins a race, yet finishes second in all one hundred races, while there are never more than three or four races won by any other individual. If I were ranking the runners, I would certainly rank the first one highest. Now, what sort of analysis would yield this? This is an example of 's principal components analysis. That method extracts the vectors that explain the greatest variation, in order. I would then pick the scores on the first component's loadings to get my rankings.
But since the variable is ordinal, rather than continuous, and there may be ties, I would explore PROC PRINQUAL, and in particular, look at Example 74.2 Principal Components of Basketball Rankings in the PRINQUAL documentation.
Steve Denham
Dr SteveDenham
I am glad to hear your explanation for this question and know a new proc to process Principal Component analysis.
I hope I have some time to learn statistical methods which I really am amazed.
Best.
Ksharp
Thanks very much for the lead. Tonight I'll be reading up on PROC PRINQUAL.
Now what's needed here is to come up with the SAS code using PROC PRINQUAL to solve the initial problem. I'm sure the same result will be had for this small test. Will be very interesting, though, to see how it works with the larger data set.
Steve Denham wrote:
Now, what sort of analysis would yield this?
Actually, Reeza's ranking would yield that. With her total ranks method, the always-second runner would score 100*2 = 200 whereas no other runner would score lower than 4+96*3=292.
PG
We still need the code for the original small test in the Proc PRINQUAL manner, to compare results with the other ways suggested. I read the Basketball example, and it appears to be very similar. The original data set would need to be transformed, though. Please include that as well in your code.
The approaches suggested by and by yielded the same results.
Thanks to for suggesting PCA to begin with.
Yes there are many different methodologies for determining rankings in sports. In the present circumstance, however, we are given this particular set of results, and must try to come up with the fairest conclusions.
Particular concerns include:
-- Ties. In a tie between S (Strong) and W (Weak), if S beat better opponents than W, seems the tie should be broken in the favor of S.
-- Individuals missing some races, though in the races he/she participated in did very well.
PRINQUAL is not going to behave well at all with a 4x4 sample matrix, due to the lack of sample size, so I'm going to pass on writing code for something that needs more data to be valid. However, for the 100x100 situation, I would use the code for the basketball example, including the datasteps prior to calling the PROC. These address ties and missing values.
Steve Denham
Here is a transformed data set (thanks ), slightly expanded to include an additional runner, and some missing data (those runners missed a race). This is in line with the data format in the very similar example mentioned above by , Example 74.2 Principal Components of Basketball Rankings, SAS/STAT(R) 12.1 User's Guide.
Runner | Order_1 | Order_2 | Order_3 |
A | 2 | * | 3 |
B | 5 | 4 | 2 |
C | 4 | 5 | 4 |
D | 1 | 2 | 1 |
E | 3 | 1 | * |
The Basketball example is quite complicated with parts I'm not familiar with. If one of you wants to take a crack at translating that example to fit the present one, that would be fantastic.
At least the data is now in the required format, a substantial step forward.
I'll continue working on it as well.
Here's the stripped down version of the example mentioned above. What's needed is to translate it for the data set we have:
proc means data=nicholas.bballm;
output out=nicholas.maxrank
max=mcsn mdurs mdurh mwas musa mspom mins mupi map mspoi;
run;
data nicholas.bball;
set nicholas.bballm;
if _n_=1 then set nicholas.maxrank;
array services[10] CSN--SportsIllustrated;
array maxranks[10] mcsn--mspoi;
keep School CSN--SportsIllustrated;
do i=1 to 10;
if services=. then services=maxranks+1;
end;
run;
ods graphics on;
proc prinqual data=nicholas.bball out=nicholas.tbball scores n=1 tstandard=z
plots=transformations;
transform untie(CSN -- SportsIllustrated);
id School;
run;
proc factor nfactors=1 plots=scree;
ods select factorpattern screeplot;
var TCSN -- TSportsIllustrated;
run;
proc sort;
by Prin1;
run;
proc print;
var School Prin1;
run;
Another scoring option is to look at who finished first the most often and to declare that runner the winner. If there is a tie, look among the ties who finished second the most often, and so on. This ranking is not trivial to compute but I would rely on the absence of a documented maximum on the number of sorting (BY) variables in proc sort to do the job, as follows :
data input;
input rank (race1-race3) ($) ;
cards;
1 D A D
2 A D B
3 C B A
4 B C C
;
data list;
set input;
array race race:;
do _n_ = 1 to dim(race);
name = race{_n_};
output;
end;
drop race:;
run;
proc freq data=list noprint;
table name*rank / sparse out=scores(drop=percent);
run;
proc transpose data=scores out=scoreTable(drop=_:) prefix=r_;
var count;
by name;
id rank;
run;
proc sort data=scoreTable; key r_: / descending; run;
proc print data=scoreTable(obs=5) obs="Final rank"; run;
PG
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.