Ranking prediction | ordered logit/probit



Posted 04-11-2021 05:52 AM (1042 views)

Hi everyone,

When trying to predict the final ranking of a football competition (dependent variable) based on multiple independent variables, I wanted to use an ordered logit/probit model.

I therefore split my data into training data and test data, so the model could learn from the training data and make predictions on the test data.

__My code is the following:__

```sas
proc logistic data=trainingdata;
   /* With a multi-level ordinal response, PROC LOGISTIC fits a
      cumulative logit model by default; add LINK=PROBIT on the
      MODEL statement for an ordered probit instead. */
   model ranking = &inputvariables;
   /* Apply the fitted model to the held-out data. */
   score data=testdata out=work.ologitoutput;
run;
```

The problem is that the predicted rankings are not unique: not every rank is assigned exactly once.

For example, the model places 3 teams in first position in the final ranking and no team in second. I think the problem is that the model assigns each team the rank with the highest probability for that team alone, without taking the other teams' probabilities into account.
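The independent per-team argmax described above, and an alternative that treats the final table as an assignment problem so each rank is used exactly once, can be sketched in Python. The `probs` matrix is a made-up stand-in for the predicted probabilities PROC LOGISTIC would output; brute force over permutations is only for illustration (the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`, scales to realistic league sizes):

```python
from itertools import permutations

# Hypothetical per-team probabilities P(team i finishes at rank j);
# rows = teams, columns = ranks (illustrative numbers only).
probs = [
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.40, 0.35, 0.25],
]

# Independent argmax: every team independently picks its most likely rank,
# so all three teams end up predicted "first".
naive = [row.index(max(row)) for row in probs]

# Joint assignment: pick the permutation of ranks that maximizes the total
# probability, subject to each rank being used exactly once.
n = len(probs)
joint = max(permutations(range(n)),
            key=lambda p: sum(probs[i][p[i]] for i in range(n)))
```

Here `naive` comes out as `[0, 0, 0]` (three champions), while `joint` assigns the three distinct ranks.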

Could anyone help me out, I would really appreciate it!!

Best regards,

Simon

8 REPLIES


Hello,

You touch on an interesting problem.

I don't think I have ever had to make an ordered (ordinal) prediction where each 'category' could only be predicted (assigned) once.

Your diagnosis of why you get three rank-1 predictions and no rank-2 prediction seems correct to me.

Although an ordered logit model is definitely a good choice, I doubt that it can all be done with a simple extension to the (simple) code that you provided.

I say that an ordered logit model is a good choice because it outperformed other (more complex) models in this study:

Forecasting the FIFA World Cup – Combining result- and goal-based team ability parameters

Pieter Robberechts and Jesse Davis

KU Leuven, Department of Computer Science

(It's the university where I have studied by the way 😇)

I haven't read the article yet, and hence cannot offer a ready-to-consume answer.

Also have a look at this interesting blog (though it does not answer your question directly):

Basketball tournaments, Moneyball, and sports analytics

By Robert Allison on SAS Learning Post March 21, 2013

https://blogs.sas.com/content/sastraining/2013/03/21/march-madness-moneyball-and-sports-analytics/

Somebody will surely provide an appropriate answer. I will follow up with great interest.

Cheers,

Koen


Hi Koen,

Thank you for your reply.

I have also read some interesting papers that try to predict a final ranking, in which the ordered probit model outperforms other approaches. However, none of them mentions my problem, so I guess there must be a solution.

Regards,

Simon



Could you come up with a tie-breaker among the current tied ranks (say previous year's ranks?), and then use the new "synthesized ranks" as the outcome measure for your training data set?

Of course, you would want a tie-breaking rule that would make the new ranks completely distinct.

I say the following as a person who once knew a tiny bit about model estimation but was never involved in rank prediction:

Really, tied ranks in the training data set just tell you that there's not much difference between adjacent ranks, yes?

If so, isn't there some justification for randomly breaking ties, with the knowledge that adjacent ranks in the subsequent test data set won't be meaningfully distinct? Maybe this is a case where some type of random resampling over randomized tie-breakings of the training data would make sense.
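The random tie-breaking idea above can be sketched in plain Python. The tied ranks are invented, and the seed is only there to make the sketch reproducible; ties are ordered by a random jitter and the teams then reassigned the distinct ranks 1..n:

```python
import random

# Hypothetical training outcomes with a tie: two teams share rank 2.
tied_ranks = [1, 2, 2, 4, 5]

random.seed(0)  # reproducibility of the sketch only

# Sort on (observed rank, random jitter), then reassign distinct ranks 1..n.
order = sorted(range(len(tied_ranks)),
               key=lambda i: (tied_ranks[i], random.random()))
distinct = [0] * len(tied_ranks)
for new_rank, i in enumerate(order, start=1):
    distinct[i] = new_rank
```

Untied teams keep their positions; the two teams tied at rank 2 are split randomly across ranks 2 and 3. Repeating this with different seeds would give the resampled randomized tie-breakings suggested above.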

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------



Hi @mkeintz ,

Thank you for your reply, a tie-breaker is an interesting idea!

PROC LOGISTIC estimates, for each club, the probability of achieving ranking x, and the ranking with the highest probability is then assigned to that club. Maybe an interesting tie-breaker would be the probability of obtaining the first ranking? This would be an intuitive way to give the stronger teams the higher ranking in case of a tie.

Do you think this makes sense?


@Simon123 wrote:

Hi @mkeintz ,

Thank you for your reply, a tie-breaker is an interesting idea!

PROC LOGISTIC estimates, for each club, the probability of achieving ranking x, and the ranking with the highest probability is then assigned to that club. Maybe an interesting tie-breaker would be the probability of obtaining the first ranking? This would be an intuitive way to give the stronger teams the higher ranking in case of a tie.

Do you think this makes sense?

So you want to run two logistic models on the training data set: (1) one to generate each club's probability of achieving first rank, used as tie-break scores, and (2) one using the adjusted ranks (i.e., with ties broken by those scores) as the outcome for the main model estimation.

Frankly, I have no opinion, but it does offer the advantage of avoiding the use of some external source of tie-breaking data. Fair warning: I've never done this, so I offer no experience.
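As a sketch of how stage (1) would feed stage (2), assume we already have each club's estimated probability of finishing first from the stage-1 model (the numbers below are invented). Ties in the observed ranks are broken in favor of the higher P(first):

```python
# Hypothetical observed (possibly tied) ranks and each club's
# stage-1 estimated probability of finishing first.
ranks   = [1, 2, 2, 4]
p_first = [0.40, 0.25, 0.30, 0.05]

# Break ties by giving the club with the higher P(first) the better rank,
# then reassign distinct ranks 1..n as the stage-2 outcome.
order = sorted(range(len(ranks)), key=lambda i: (ranks[i], -p_first[i]))
adjusted = [0] * len(ranks)
for new_rank, i in enumerate(order, start=1):
    adjusted[i] = new_rank
```

Clubs 1 and 2 (0-indexed) were tied at rank 2; club 2, with the higher P(first) of 0.30, is moved ahead, giving adjusted ranks `[1, 3, 2, 4]`.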



Hi @StatDave ,

Thank you for your reply, my data looks as follows for each of the entities (football clubs):

football club | year | ranking obtained (= dependent variable) | turnover | net income | ...

The goal is to predict the final ranking directly rather than model it indirectly through pairs of competitors, because of the 'nature' of the independent variables (turnover, net income, ...). The independent variables are financial information from the football clubs' annual reports prior to the season in which the ranking is obtained.

By splitting off the first years of my study period as training data and keeping the last years as test data, I hoped that the ordered probit model would notice that, in each year, each ranking is given exactly once.

