turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Determine top five winners in a series of races

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-26-2013 06:39 PM

Suppose we have 100 runners. And they run 100 races. All we are given is the order that the racers come in, 1 through 100.

At the end of the race series what would be the fairest way to determine the top five winners?

What SAS procedure might one use to arrive at the solution?

We have 100 lists, of results. All lists numbered 1 to 100. A different ordering of names on each list, depending on the place they finish each race. We concatenate horizontally and come up with one large spreadsheet. Here would be a small sample of the spreadsheet:

Order Finished | Race 1 | Race 2 | Race 3 |

1 | D | A | D |

2 | A | D | B |

3 | C | B | A |

4 | B | C | C |

The whole spreadsheet is 100 x 100. Let's assume that all runners run in all races.

Any help and insights greatly appreciated.

Nicholas Kormanik

Accepted Solutions

Solution

04-26-2013
09:09 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-26-2013 09:09 PM

Add rank by person over all races, persons with the 5 lowest rank are the top 5.

**data** input;

input rank race1 $ race2 $ race3 $;

cards;

1 D A D

2 A D B

3 C B A

4 B C C

;

**run**;

**proc** **transpose** data=input out=data1 ;

by rank;

var race1 race2 race3;

**run**;

**proc** **means** data=data1 sum noprint;

ways **1**;

class col1;

var rank;

output out=summary1 sum(rank)=total;

**run**;

**proc** **sort** data=summary1;

by total ;

**run**;

All Replies

Solution

04-26-2013
09:09 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-26-2013 09:09 PM

Add rank by person over all races, persons with the 5 lowest rank are the top 5.

**data** input;

input rank race1 $ race2 $ race3 $;

cards;

1 D A D

2 A D B

3 C B A

4 B C C

;

**run**;

**proc** **transpose** data=input out=data1 ;

by rank;

var race1 race2 race3;

**run**;

**proc** **means** data=data1 sum noprint;

ways **1**;

class col1;

var rank;

output out=summary1 sum(rank)=total;

**run**;

**proc** **sort** data=summary1;

by total ;

**run**;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-27-2013 03:05 AM

Reeza, expertly done. Thank you so very much. I'll leave the question unanswered until tomorrow night in case anyone else has something more to add.

For now, though, you're the champ!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-27-2013 05:21 AM

I would suggest principal component analysis . make the data like these and get the final score for every competitor based on the first two principal components.

race1 race2

A (the score of winning the champion has highest score)

B

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-27-2013 10:38 AM

Ksharp, why PCA as opposed to what Reeza has done?

Would you please write the complete code for the simple example above so one can compare?

Reeza's code seems to work flawlessly (copy and paste right into SAS). It would be interesting to see if PCA comes up with a different result.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-27-2013 11:10 AM

Oh. Sorry . That doesn't matter whether the champion has highest score or lowest score. only if they has the same directory.

I used multi-variable analysis, not like Reeza , Principal Component Analysis will use the principal component of correlation matrix to explain the entire dataset data .

Or I want hear the opinion from Dr SteveDenham who is a statistical expert.

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-29-2013 09:28 AM

This is an interesting problem. Suppose you have an entrant who never wins a race, yet finishes second in all one hundred races, while there are never more than three or four races won by any other individual. If I were ranking the runners, I would certainly rank the first one highest. Now, what sort of analysis would yield this? This is an example of 's principal components analysis. That method extracts the vectors that explain the greatest variation, in order. I would then pick the scores on the first component's loadings to get my rankings.

But since the variable is ordinal, rather than continuous, and there may be ties, I would explore PROC PRINQUAL, and in particular, look at Example 74.2 Principal Components of Basketball Rankings in the PRINQUAL documentation.

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-29-2013 10:01 AM

Dr SteveDenham

I am glad to hear your explanation for this question and know a new proc to process Principal Component analysis.

I hope I have some time to learn statistical methods which I really am amazed.

Best.

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-29-2013 08:57 PM

Thanks very much for the lead. Tonight I'll be reading up on PROC PRINQUAL.

Now what's needed here is to come up with the SAS code using PROC PRINQUAL to solve the initial problem. I'm sure the same result will be had for this small test. Will be very interesting, though, to see how it works with the larger data set.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-29-2013 09:40 PM

Steve Denham wrote:

Now, what sort of analysis would yield this?

Actually, Reeza's ranking would yield that. With her total ranks method, the always-second runner would score 100*2 = 200 whereas no other runner would score lower than 4+96*3=292.

PG

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-30-2013 03:28 AM

We still need the code for the original small test in the Proc PRINQUAL manner, to compare results with the other ways suggested. I read the Basketball example, and it appears to be very similar. The original data set would need to be transformed, though. Please include that as well in your code.

The approaches suggested by and by yielded the same results.

Thanks to for suggesting PCA to begin with.

Yes there are many different methodologies for determining rankings in sports. In the present circumstance, however, we are given this particular set of results, and must try to come up with the fairest conclusions.

Particular concerns include:

-- Ties. In a tie between S (Strong) and W (Weak), if S beat better opponents than W, seems the tie should be broken in the favor of S.

-- Individuals missing some races, though in the races he/she participated in did very well.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-30-2013 07:08 AM

PRINQUAL is not going to behave well at all with a 4x4 sample matrix, due to the lack of sample size, so I'm going to pass on writing code for something that needs more data to be valid. However, for the 100x100 situation, I would use the code for the basketball example, including the datasteps prior to calling the PROC. These address ties and missing values.

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-01-2013 03:08 AM

Here is a transformed data set (thanks ), slightly expanded to include an additional runner, and some missing data (those runners missed a race). This is in line with the data format in the very similar example mentioned above by , Example 74.2 Principal Components of Basketball Rankings, SAS/STAT(R) 12.1 User's Guide.

Runner | Order_1 | Order_2 | Order_3 |

A | 2 | * | 3 |

B | 5 | 4 | 2 |

C | 4 | 5 | 4 |

D | 1 | 2 | 1 |

E | 3 | 1 | * |

The Basketball example is quite complicated with parts I'm not familiar with. If one of you wants to take a crack at **translating** that example to fit the present one, that would be fantastic.

At least the data is now in the required format, a substantial step forward.

I'll continue working on it as well.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-02-2013 02:56 AM

Here's the stripped down version of the example mentioned above. What's needed is to translate it for the data set we have:

proc means data=nicholas.bballm;

output out=nicholas.maxrank

max=mcsn mdurs mdurh mwas musa mspom mins mupi map mspoi;

run;

data nicholas.bball;

set nicholas.bballm;

if _n_=1 then set nicholas.maxrank;

array services[10] CSN--SportsIllustrated;

array maxranks[10] mcsn--mspoi;

keep School CSN--SportsIllustrated;

do i=1 to 10;

if services*=. then services =maxranks+1;*

end;

run;

ods graphics on;

proc prinqual data=nicholas.bball out=nicholas.tbball scores n=1 tstandard=z

plots=transformations;

transform untie(CSN -- SportsIllustrated);

id School;

run;

proc factor nfactors=1 plots=scree;

ods select factorpattern screeplot;

var TCSN -- TSportsIllustrated;

run;

proc sort;

by Prin1;

run;

proc print;

var School Prin1;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-27-2013 11:37 PM

Another scoring option is to look at who finished first the most often and to declare that runner the winner. If there is a tie, look among the ties who finished second the most often, and so on. This ranking is not trivial to compute but I would rely on the absence of a documented maximum on the number of sorting (BY) variables in **proc sort** to do the job, as follows :

**data input;****input rank (race1-race3) ($) ;****cards;****1 D A D****2 A D B****3 C B A****4 B C C****;**

**data list;****set input;****array race race:;****do _n_ = 1 to dim(race);**** name = race{_n_};**** output;**** end;****drop race:;****run;**

**proc freq data=list noprint;****table name*rank / sparse out=scores(drop=percent);****run;**

**proc transpose data=scores out=scoreTable(drop=_ prefix=r_;****var count;****by name;****id rank;****run;**

**proc sort data=scoreTable; key r_: / descending; run;**

**proc print data=scoreTable(obs=5) obs="Final rank"; run;**

PG

PG