Help using Base SAS procedures

Determine top five winners in a series of races

Accepted Solution Solved
Reply
Regular Contributor
Posts: 212
Accepted Solution

Determine top five winners in a series of races

Suppose we have 100 runners. And they run 100 races. All we are given is the order that the racers come in, 1 through 100.

At the end of the race series what would be the fairest way to determine the top five winners?

What SAS procedure might one use to arrive at the solution?

We have 100 lists, of results. All lists numbered 1 to 100. A different ordering of names on each list, depending on the place they finish each race.  We concatenate horizontally and come up with one large spreadsheet.  Here would be a small sample of the spreadsheet:


Order FinishedRace 1Race 2Race 3
1DAD
2ADB
3CBA
4BCC


The whole spreadsheet is 100 x 100.  Let's assume that all runners run in all races.

Any help and insights greatly appreciated.

Nicholas Kormanik


Accepted Solutions
Solution
‎04-26-2013 09:09 PM
Super User
Posts: 17,750

Re: Determine top five winners in a series of races

Add rank by person over all races, persons with the 5 lowest rank are the top 5.

data input;

input rank race1 $ race2 $  race3 $;

cards;

1 D A D

2 A D B

3 C B A

4 B C C

;

run;

proc transpose data=input out=data1 ;

by rank;

var race1 race2 race3;

run;

proc means data=data1 sum noprint;

ways 1;

class col1;

var rank;

output out=summary1 sum(rank)=total;

run;

proc sort data=summary1;

by total ;

run;

View solution in original post


All Replies
Solution
‎04-26-2013 09:09 PM
Super User
Posts: 17,750

Re: Determine top five winners in a series of races

Add rank by person over all races, persons with the 5 lowest rank are the top 5.

data input;

input rank race1 $ race2 $  race3 $;

cards;

1 D A D

2 A D B

3 C B A

4 B C C

;

run;

proc transpose data=input out=data1 ;

by rank;

var race1 race2 race3;

run;

proc means data=data1 sum noprint;

ways 1;

class col1;

var rank;

output out=summary1 sum(rank)=total;

run;

proc sort data=summary1;

by total ;

run;

Regular Contributor
Posts: 212

Re: Determine top five winners in a series of races

Reeza, expertly done.  Thank you so very much.  I'll leave the question unanswered until tomorrow night in case anyone else has something more to add.

For now, though, you're the champ!

Super User
Posts: 9,662

Re: Determine top five winners in a series of races

I would suggest principal component analysis . make the data like these and get the final score for every competitor based on the first two principal components.

          race1  race2

A     (the score of winning the champion has highest score)

B

Ksharp

Regular Contributor
Posts: 212

Re: Determine top five winners in a series of races

Ksharp, why PCA as opposed to what Reeza has done?

Would you please write the complete code for the simple example above so one can compare?

Reeza's code seems to work flawlessly (copy and paste right into SAS).  It would be interesting to see if PCA comes up with a different result.

Super User
Posts: 9,662

Re: Determine top five winners in a series of races

Oh. Sorry . That doesn't matter whether the champion has highest score or lowest score. only if they has the same directory.

I used multi-variable analysis, not like Reeza , Principal Component Analysis will use the principal component of correlation matrix to explain the entire dataset data .

Or I want hear the opinion from Dr SteveDenham   who is a statistical expert.

Ksharp

Respected Advisor
Posts: 2,655

Re: Determine top five winners in a series of races

This is an interesting problem.  Suppose you have an entrant who never wins a race, yet finishes second in all one hundred races, while there are never more than three or four races won by any other individual.  If I were ranking the runners, I would certainly rank the first one highest.  Now, what sort of analysis would yield this?  This is an example of 's principal components analysis.  That method extracts the vectors that explain the greatest variation, in order.  I would then pick the scores on the first component's loadings to get my rankings.

But since the variable is ordinal, rather than continuous, and there may be ties, I would explore PROC PRINQUAL, and in particular, look at Example 74.2 Principal Components of Basketball Rankings in the PRINQUAL documentation.

Steve Denham

Super User
Posts: 9,662

Re: Determine top five winners in a series of races

Dr SteveDenham

I am glad to hear your explanation for this question and know a new proc   to process Principal Component analysis.

I hope I have some time to learn statistical methods which I really am amazed.

Best.

Ksharp

Regular Contributor
Posts: 212

Re: Determine top five winners in a series of races

Thanks very much for the lead.  Tonight I'll be reading up on PROC PRINQUAL.

Now what's needed here is to come up with the SAS code using PROC PRINQUAL to solve the initial problem.  I'm sure the same result will be had for this small test.  Will be very interesting, though, to see how it works with the larger data set.

Respected Advisor
Posts: 4,641

Re: Determine top five winners in a series of races

Steve Denham wrote:

Now, what sort of analysis would yield this?

Actually, Reeza's ranking would yield that. With her total ranks method, the always-second runner would score 100*2 = 200 whereas no other runner would score lower than 4+96*3=292.

PG

PG
Regular Contributor
Posts: 212

Re: Determine top five winners in a series of races

  We still need the code for the original small test in the Proc PRINQUAL manner, to compare results with the other ways suggested.  I read the Basketball example, and it appears to be very similar.  The original data set would need to be transformed, though.  Please include that as well in your code.

The approaches suggested by and by yielded the same results.

Thanks to for suggesting PCA to begin with.

  Yes there are many different methodologies for determining rankings in sports.  In the present circumstance, however, we are given this particular set of results, and must try to come up with the fairest conclusions.

Particular concerns include:

-- Ties.  In a tie between S (Strong) and W (Weak), if S beat better opponents than W, seems the tie should be broken in the favor of S.

-- Individuals missing some races, though in the races he/she participated in did very well.

Respected Advisor
Posts: 2,655

Re: Determine top five winners in a series of races

PRINQUAL is not going to behave well at all with a 4x4 sample matrix, due to the lack of sample size, so I'm going to pass on writing code for something that needs more data to be valid.  However, for the 100x100 situation, I would use the code for the basketball example, including the datasteps prior to calling the PROC.  These address ties and missing values.

Steve Denham

Regular Contributor
Posts: 212

Re: Determine top five winners in a series of races

Here is a transformed data set (thanks ), slightly expanded to include an additional runner, and some missing data (those runners missed a race).  This is in line with the data format in the very similar example mentioned above by , Example 74.2 Principal Components of Basketball Rankings, SAS/STAT(R) 12.1 User's Guide.

RunnerOrder_1Order_2Order_3
A2*3
B542
C454
D121
E31*

The Basketball example is quite complicated with parts I'm not familiar with.  If one of you wants to take a crack at translating that example to fit the present one, that would be fantastic.

At least the data is now in the required format, a substantial step forward.

I'll continue working on it as well.

Regular Contributor
Posts: 212

Re: Determine top five winners in a series of races

Here's the stripped down version of the example mentioned above.  What's needed is to translate it for the data set we have:

proc means data=nicholas.bballm;

   output out=nicholas.maxrank

      max=mcsn mdurs mdurh mwas musa mspom mins mupi map mspoi;

run;

data nicholas.bball;

   set nicholas.bballm;

   if _n_=1 then set nicholas.maxrank;

   array services[10] CSN--SportsIllustrated;

   array maxranks[10] mcsn--mspoi;

   keep School CSN--SportsIllustrated;

   do i=1 to 10;

      if services=. then services=maxranks+1;

   end;

run;

ods graphics on;

proc prinqual data=nicholas.bball out=nicholas.tbball scores n=1 tstandard=z

   plots=transformations;

   transform untie(CSN -- SportsIllustrated);

   id School;

run;

proc factor nfactors=1 plots=scree;

   ods select factorpattern screeplot;

   var TCSN -- TSportsIllustrated;

run;

proc sort;

   by Prin1;

run;

proc print;

   var School Prin1;

run;

Respected Advisor
Posts: 4,641

Re: Determine top five winners in a series of races

Another scoring option is to look at who finished first the most often and to declare that runner the winner. If there is a tie, look among the ties who finished second the most often, and so on. This ranking is not trivial to compute but I would rely on the absence of a documented maximum on the number of sorting (BY) variables in proc sort to do the job, as follows :


data input;
input rank (race1-race3) ($) ;
cards;
1 D A D
2 A D B
3 C B A
4 B C C
;

data list;
set input;
array race race:;
do _n_ = 1 to dim(race);
     name = race{_n_};
     output;
     end;
drop race:;
run;

proc freq data=list noprint;
table name*rank / sparse out=scores(drop=percent);
run;

proc transpose data=scores out=scoreTable(drop=_Smiley Happy prefix=r_;
var count;
by name;
id rank;
run;

proc sort data=scoreTable; key r_: / descending; run;

proc print data=scoreTable(obs=5) obs="Final rank"; run;

PG

PG
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 23 replies
  • 558 views
  • 7 likes
  • 6 in conversation