Summer is here! All across Europe temperatures are soaring, and after a long and difficult year of the pandemic, people are making the most of the good weather. And summer goes hand-in-hand with sport. It’s certainly been an exciting and controversial summer of sport so far this year, but the event I’ll be watching most closely is more usually associated with muddy puddles than hot summer evenings. I am, of course, talking about rugby.
The British & Irish Lions is an invitational touring side made up of the best players from the ‘Four Nations’: England, Scotland, Wales and Ireland. The team comes together once every four years to play one of the top Southern Hemisphere sides: New Zealand, Australia or South Africa. This year the Lions have travelled to South Africa, where they will face the current World Champions in a gruelling series of three test matches as well as a series of warm-up games against some of the regional sides.
So, what does this have to do with Data Analytics? A few weeks ago I was scrolling through social media and came across a sports pundit’s predictions for the starting line-up for the first test match in the series, scheduled (COVID permitting) for July 24th. My first thought was that it definitely wasn’t the line-up I’d have picked or expected, and my second thought, rather sadly, was “I wonder if we could pick something better using analytics”.
I’d seen other sports analytics demonstrations, such as picking a fantasy league team with optimization procedures subject to a budget constraint. But picking the Lions line-up by numbers is a rather different task.
Firstly, we are picking the best starting line-up from a squad of players who have already been shortlisted. They’ve boarded the plane to South Africa and have been training in camp, readying themselves for selection. There is no ‘budget’ constraint here, and the pool of talent is fixed: the coach simply has to pick the best team to face South Africa. This line-up is likely to change between tests based on evolving tactics, injuries and second-guessing which side the Boks will choose.
The code files to run this simulation can be found on my GitHub profile here: https://github.com/HarrySnart/lionsOptimization
The goal of this exercise was to come up with a model that generates a starting line-up based on performance data and analyst preference weightings. I wanted a model flexible enough to be tweaked to generate the best side even if playing preferences change. In simulation and optimization, testing how a solution responds to changes in its inputs is often referred to as sensitivity analysis. This has a very practical application for this dataset too: if the coach wanted to re-run the model to select a side better suited to pure attacking or defensive play, one would only need to change the parameter weights to re-select the side.
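To make the idea of sensitivity analysis concrete, here is a minimal Python sketch (not the author’s SAS code) that sweeps a single attacking-versus-defensive weight and records which player comes out on top at each setting. The player names, the two normalised metrics and the simple weighted-sum score are all hypothetical, chosen only to show how the selection can flip as the weighting moves.

```python
def top_player(metrics, w_attack):
    """Return the player with the highest weighted score.

    metrics: dict of player -> (attack, defence), both assumed normalised to [0, 1].
    w_attack: weight on attack; defence receives the remaining 1 - w_attack.
    """
    def score(player):
        attack, defence = metrics[player]
        return w_attack * attack + (1 - w_attack) * defence
    return max(metrics, key=score)


# Hypothetical, illustrative metrics (not real player data).
metrics = {"P1": (0.9, 0.3), "P2": (0.4, 0.8)}

# Sweep the attacking weight and record the selected player at each setting.
sweep = {w: top_player(metrics, w) for w in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]}
print(sweep)
```

With these made-up numbers the two scores cross at an attacking weight of 0.5, so the sweep selects P2 up to 0.4 and P1 from 0.6 onwards. Mapping where the selection changes like this is the essence of a sensitivity analysis.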
We do this with a two-step approach. First, each eligible player receives a weighted preference ranking for each position, computed from a set of inputs using the PROMETHEE algorithm. Here we use PROC IML to implement PROMETHEE II in order to get a complete preference ranking of the eligible players for each position. Since many players cover multiple positions, these players get a rank for each position they can cover.
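For readers unfamiliar with PROMETHEE II, the sketch below computes net outranking flows in plain Python rather than PROC IML. It assumes the simplest ‘usual’ preference function, where any positive difference on a criterion counts as full preference. The article does not say which preference function the SAS implementation uses. The three flyhalf candidates and their metrics are hypothetical.

```python
def promethee_ii(scores, weights):
    """Compute PROMETHEE II net outranking flows (assumes at least two candidates).

    scores: dict of player -> list of criterion values (higher is better).
    weights: list of criterion weights; normalised here to sum to 1.
    Uses the 'usual' preference function: P(d) = 1 if d > 0 else 0.
    """
    names = list(scores)
    n = len(names)
    total = sum(weights)
    w = [x / total for x in weights]
    net = {}
    for a in names:
        plus = minus = 0.0  # running sums of pi(a, b) and pi(b, a) over all b
        for b in names:
            if a == b:
                continue
            for j, wj in enumerate(w):
                d = scores[a][j] - scores[b][j]
                if d > 0:
                    plus += wj   # a strictly preferred to b on criterion j
                elif d < 0:
                    minus += wj  # b strictly preferred to a on criterion j
        net[a] = (plus - minus) / (n - 1)  # phi+(a) - phi-(a)
    return net


# Hypothetical criteria: kicking accuracy (%), line breaks per game, caps.
scores = {"A": [85, 0.5, 90], "B": [78, 1.2, 40], "C": [80, 0.8, 10]}
weights = [0.4, 0.4, 0.2]

net = promethee_ii(scores, weights)
ranking = sorted(net, key=net.get, reverse=True)  # complete ranking, best first
print(ranking, net)
```

With these made-up numbers the complete ranking is A, B, C. Net flows across all candidates always sum to zero, which makes a quick sanity check on any implementation.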
Second, once we have the complete rankings by position, we reduce them to one player per position by treating selection as a Linear Assignment Problem, solved as a minimum-weight matching in a bipartite directed graph via PROC OPTMODEL.
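The matching step can be illustrated with a deliberately tiny, exhaustive Python sketch. The article itself uses PROC OPTMODEL. The sketch assumes that the cost of placing a player at a position is the negative of that player’s preference score for the position, so minimising total cost maximises total preference. A missing player-position pair means the player is ineligible. Brute force only works for toy instances like this hypothetical two-position example, but it makes the objective explicit.

```python
from itertools import permutations


def assign_min_weight(positions, players, cost):
    """Exhaustive minimum-weight matching of positions to distinct players.

    cost: dict of (player, position) -> edge weight; a missing key means the
    player cannot cover that position. Returns (None, inf) if no feasible
    matching exists. Only practical for tiny instances.
    """
    best_total, best = float("inf"), None
    # Each permutation is an ordered choice of distinct players, one per position.
    for chosen in permutations(players, len(positions)):
        pairs = list(zip(positions, chosen))
        if all((player, pos) in cost for pos, player in pairs):
            total = sum(cost[(player, pos)] for pos, player in pairs)
            if total < best_total:
                best_total, best = total, dict(pairs)
    return best, best_total


# Hypothetical costs: negative per-position preference scores (lower is better).
positions = ["10", "12"]
players = ["A", "B", "C"]
cost = {
    ("A", "10"): -0.6, ("A", "12"): -0.5,
    ("B", "10"): -0.5,  # B cannot cover 12
    ("C", "12"): -0.3,  # C cannot cover 10
}

lineup, total = assign_min_weight(positions, players, cost)
print(lineup, total)
```

A greedy pick would put A at 10, its single best edge, and then C at 12, for a total cost of -0.9. The matching instead places B at 10 and A at 12, for -1.0. This is why the selection is solved jointly rather than position by position. Realistically sized squads need a polynomial-time solver. In Python, `scipy.optimize.linear_sum_assignment` handles rectangular cost matrices.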
It is worth noting that you could replace the custom PROMETHEE implementation with a simpler weighted-ranking approach using PROC RANK. I opted for PROMETHEE because it’s an algorithm I had used before, and its weightings were useful for the cost minimisation step in the graph solver.
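For comparison, a weighted-ranking alternative can be sketched in a few lines of Python. The sketch ranks the candidates on each criterion separately, then combines the ranks using the analyst weights. This is an assumption about what such an approach might look like, not a reproduction of PROC RANK. The candidates and metrics are hypothetical.

```python
def weighted_rank(scores, weights):
    """Order candidates by a weighted sum of per-criterion ranks (lower is better).

    scores: dict of player -> list of criterion values (higher is better).
    Ties within a criterion are broken by input order, a simplification.
    """
    names = list(scores)
    combined = {n: 0.0 for n in names}
    for j, wj in enumerate(weights):
        # Rank 1 is the best value on criterion j.
        order = sorted(names, key=lambda n: scores[n][j], reverse=True)
        for r, n in enumerate(order, start=1):
            combined[n] += wj * r
    return sorted(names, key=combined.get), combined


# Hypothetical criteria: kicking accuracy (%), line breaks per game, caps.
scores = {"A": [85, 0.5, 90], "B": [78, 1.2, 40], "C": [80, 0.8, 10]}
order, combined = weighted_rank(scores, [0.4, 0.4, 0.2])
print(order, combined)
```

Lower combined scores are better. With these made-up numbers the order is A, B, C. Ranks record only the order of players on each criterion, not the size of the gaps between them.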
This process is done twice: first for the starting line-up (positions 1-15) and then again for the remaining players to fill out the bench (positions 16-23). The only difference between these steps is a change to our preference weightings. When picking the bench we increase our preference weighting for the number of positions a player can cover, since we often want utility players on the bench to cover possible in-match injuries.
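The two passes can be sketched in Python. To keep the example short, it collapses the per-position ranking and matching into a single weighted score per player, which simplifies the article’s method. The squad, the ‘form’ metric and the normalised ‘versatility’ metric (share of positions covered) are all hypothetical, as is the helper name `pick`. The point is that removing the starters and raising the versatility weight changes who makes the bench.

```python
def pick(pool, weights, k):
    """Hypothetical helper: return the k players with the highest weighted score."""
    def score(player):
        return sum(w * v for w, v in zip(weights, pool[player]))
    return sorted(pool, key=score, reverse=True)[:k]


# Hypothetical squad: player -> (form, versatility), both scaled to [0, 1].
squad = {
    "P1": (0.9, 0.25), "P2": (0.8, 0.25), "P3": (0.7, 0.75),
    "P4": (0.65, 0.25), "P5": (0.5, 1.0),
}

# Pass 1: starters, weighting form only.
starters = pick(squad, (1.0, 0.0), k=2)

# Pass 2: bench from the remaining players, with extra weight on versatility.
remaining = {p: v for p, v in squad.items() if p not in starters}
bench = pick(remaining, (0.5, 0.5), k=2)
print(starters, bench)
```

Under the starting weights, P4 would have taken the second bench spot ahead of P5. Raising the versatility weight favours the utility player P5 instead.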
Of course, as with any model, it serves best as a data-driven guide to business decisions and should be seen as a decision-making tool rather than the de facto optimal solution. I feel this is particularly pertinent to something like selecting a team, because strategy relies on an element of subject matter expertise and many of the variables in the model may be partly subjective. For example, if you want to select the best goal kicker, choosing purely on kicking accuracy can be misleading, since many factors, such as the point in the game, wind and angle, may influence it. Therefore, how you define kicking accuracy may depend on what you are looking for in a player.
The same can be said of over-relying on quantitative metrics in isolation for player selection. Take the flyhalf position: in many of the models Owen Farrell is selected as the starting 10. The reason for this is very simple. He has many international caps and previous Lions caps, has very strong performance metrics on average, and is well known for his laser-precise kicking. Where he often comes under scrutiny is in not being as creative in distribution and playmaking as other flyhalves. Most recently he has been playing at 12 for England, with the playmaking done by George Ford. It goes without saying that a mathematical model for player selection selects players by looking at the numbers. If you cannot define a metric for creativity or flair, it won’t be considered when modelling player selection.
The final solution for this project was an interactive dashboard showing several selected sides, with the ability to bring in a new set of analyst preferences and re-select the side dynamically. Figure 1 shows the interactive report built in SAS Visual Analytics. It shows pre-run selections for varying preference weights associated with team-level attributes such as Attacking, Defensive, Youthful and Experienced.
The value of this dashboard lies in seeing how it picks players based on league and international performance. The ‘Attacking’ side is an interesting line-up. It includes ‘bolter’ players who haven’t had much international experience but have set their domestic leagues alight this season, most notably Sam Simmonds and Louis Rees-Zammit. Ali Price is also selected at scrumhalf, which, interestingly, the BBC discussed this week as a selection headache, since he has been giving incumbent Conor Murray a run for his money.
What also makes this report interesting is that every player in the squad was selected as a contender for the starting line-up, and every fan will have a different view of who deserves a starting spot. The beauty of this dashboard is that, depending on how the parameters are changed, every player gets a fair evaluation.
Figure 1 - Team Selection Simulations Dashboard
In the next article we explore the dataset before we begin building our model. Read Part 2 of the series here: Using SAS Viya to Select the British & Irish Lions Rugby Team: Part 2