BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
NKormanik
Barite | Level 11

Suppose we have 100 runners. And they run 100 races. All we are given is the order that the racers come in, 1 through 100.

At the end of the race series what would be the fairest way to determine the top five winners?

What SAS procedure might one use to arrive at the solution?

We have 100 lists, of results. All lists numbered 1 to 100. A different ordering of names on each list, depending on the place they finish each race.  We concatenate horizontally and come up with one large spreadsheet.  Here would be a small sample of the spreadsheet:


Order FinishedRace 1Race 2Race 3
1DAD
2ADB
3CBA
4BCC


The whole spreadsheet is 100 x 100.  Let's assume that all runners run in all races.

Any help and insights greatly appreciated.

Nicholas Kormanik

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Add rank by person over all races, persons with the 5 lowest rank are the top 5.

data input;

input rank race1 $ race2 $  race3 $;

cards;

1 D A D

2 A D B

3 C B A

4 B C C

;

run;

proc transpose data=input out=data1 ;

by rank;

var race1 race2 race3;

run;

proc means data=data1 sum noprint;

ways 1;

class col1;

var rank;

output out=summary1 sum(rank)=total;

run;

proc sort data=summary1;

by total ;

run;

View solution in original post

23 REPLIES 23
Reeza
Super User

Add rank by person over all races, persons with the 5 lowest rank are the top 5.

data input;

input rank race1 $ race2 $  race3 $;

cards;

1 D A D

2 A D B

3 C B A

4 B C C

;

run;

proc transpose data=input out=data1 ;

by rank;

var race1 race2 race3;

run;

proc means data=data1 sum noprint;

ways 1;

class col1;

var rank;

output out=summary1 sum(rank)=total;

run;

proc sort data=summary1;

by total ;

run;

NKormanik
Barite | Level 11

Reeza, expertly done.  Thank you so very much.  I'll leave the question unanswered until tomorrow night in case anyone else has something more to add.

For now, though, you're the champ!

Ksharp
Super User

I would suggest principal component analysis . make the data like these and get the final score for every competitor based on the first two principal components.

          race1  race2

A     (the score of winning the champion has highest score)

B

Ksharp

NKormanik
Barite | Level 11

Ksharp, why PCA as opposed to what Reeza has done?

Would you please write the complete code for the simple example above so one can compare?

Reeza's code seems to work flawlessly (copy and paste right into SAS).  It would be interesting to see if PCA comes up with a different result.

Ksharp
Super User

Oh. Sorry . That doesn't matter whether the champion has highest score or lowest score. only if they has the same directory.

I used multi-variable analysis, not like Reeza , Principal Component Analysis will use the principal component of correlation matrix to explain the entire dataset data .

Or I want hear the opinion from Dr SteveDenham   who is a statistical expert.

Ksharp

SteveDenham
Jade | Level 19

This is an interesting problem.  Suppose you have an entrant who never wins a race, yet finishes second in all one hundred races, while there are never more than three or four races won by any other individual.  If I were ranking the runners, I would certainly rank the first one highest.  Now, what sort of analysis would yield this?  This is an example of 's principal components analysis.  That method extracts the vectors that explain the greatest variation, in order.  I would then pick the scores on the first component's loadings to get my rankings.

But since the variable is ordinal, rather than continuous, and there may be ties, I would explore PROC PRINQUAL, and in particular, look at Example 74.2 Principal Components of Basketball Rankings in the PRINQUAL documentation.

Steve Denham

Ksharp
Super User

Dr SteveDenham

I am glad to hear your explanation for this question and know a new proc   to process Principal Component analysis.

I hope I have some time to learn statistical methods which I really am amazed.

Best.

Ksharp

NKormanik
Barite | Level 11

Thanks very much for the lead.  Tonight I'll be reading up on PROC PRINQUAL.

Now what's needed here is to come up with the SAS code using PROC PRINQUAL to solve the initial problem.  I'm sure the same result will be had for this small test.  Will be very interesting, though, to see how it works with the larger data set.

PGStats
Opal | Level 21

Steve Denham wrote:

Now, what sort of analysis would yield this?

Actually, Reeza's ranking would yield that. With her total ranks method, the always-second runner would score 100*2 = 200 whereas no other runner would score lower than 4+96*3=292.

PG

PG
NKormanik
Barite | Level 11

  We still need the code for the original small test in the Proc PRINQUAL manner, to compare results with the other ways suggested.  I read the Basketball example, and it appears to be very similar.  The original data set would need to be transformed, though.  Please include that as well in your code.

The approaches suggested by and by yielded the same results.

Thanks to for suggesting PCA to begin with.

  Yes there are many different methodologies for determining rankings in sports.  In the present circumstance, however, we are given this particular set of results, and must try to come up with the fairest conclusions.

Particular concerns include:

-- Ties.  In a tie between S (Strong) and W (Weak), if S beat better opponents than W, seems the tie should be broken in the favor of S.

-- Individuals missing some races, though in the races he/she participated in did very well.

SteveDenham
Jade | Level 19

PRINQUAL is not going to behave well at all with a 4x4 sample matrix, due to the lack of sample size, so I'm going to pass on writing code for something that needs more data to be valid.  However, for the 100x100 situation, I would use the code for the basketball example, including the datasteps prior to calling the PROC.  These address ties and missing values.

Steve Denham

NKormanik
Barite | Level 11

Here is a transformed data set (thanks ), slightly expanded to include an additional runner, and some missing data (those runners missed a race).  This is in line with the data format in the very similar example mentioned above by , Example 74.2 Principal Components of Basketball Rankings, SAS/STAT(R) 12.1 User's Guide.

RunnerOrder_1Order_2Order_3
A2*3
B542
C454
D121
E31*

The Basketball example is quite complicated with parts I'm not familiar with.  If one of you wants to take a crack at translating that example to fit the present one, that would be fantastic.

At least the data is now in the required format, a substantial step forward.

I'll continue working on it as well.

NKormanik
Barite | Level 11

Here's the stripped down version of the example mentioned above.  What's needed is to translate it for the data set we have:

proc means data=nicholas.bballm;

   output out=nicholas.maxrank

      max=mcsn mdurs mdurh mwas musa mspom mins mupi map mspoi;

run;

data nicholas.bball;

   set nicholas.bballm;

   if _n_=1 then set nicholas.maxrank;

   array services[10] CSN--SportsIllustrated;

   array maxranks[10] mcsn--mspoi;

   keep School CSN--SportsIllustrated;

   do i=1 to 10;

      if services=. then services=maxranks+1;

   end;

run;

ods graphics on;

proc prinqual data=nicholas.bball out=nicholas.tbball scores n=1 tstandard=z

   plots=transformations;

   transform untie(CSN -- SportsIllustrated);

   id School;

run;

proc factor nfactors=1 plots=scree;

   ods select factorpattern screeplot;

   var TCSN -- TSportsIllustrated;

run;

proc sort;

   by Prin1;

run;

proc print;

   var School Prin1;

run;

PGStats
Opal | Level 21

Another scoring option is to look at who finished first the most often and to declare that runner the winner. If there is a tie, look among the ties who finished second the most often, and so on. This ranking is not trivial to compute but I would rely on the absence of a documented maximum on the number of sorting (BY) variables in proc sort to do the job, as follows :


data input;
input rank (race1-race3) ($) ;
cards;
1 D A D
2 A D B
3 C B A
4 B C C
;

data list;
set input;
array race race:;
do _n_ = 1 to dim(race);
     name = race{_n_};
     output;
     end;
drop race:;
run;

proc freq data=list noprint;
table name*rank / sparse out=scores(drop=percent);
run;

proc transpose data=scores out=scoreTable(drop=_:) prefix=r_;
var count;
by name;
id rank;
run;

proc sort data=scoreTable; key r_: / descending; run;

proc print data=scoreTable(obs=5) obs="Final rank"; run;

PG

PG

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 1666 views
  • 7 likes
  • 6 in conversation