When I tell people I work with data science (I would never have the ambition to call myself a fully grown-up data scientist!), this is typically the response I get: “Oh, you do? Wow, so you’re a crazy geek secluded in her lab writing crazy lines of incomprehensible code at all times with no purpose whatsoever!”
Well, the answer is no, no, no. At the beginning of this century, I chose to focus on analytics (I guess the term data science had not been invented yet) because, while I did love to be the crazy geek writing lines of code and learning about probabilities, I also wanted to live in the real world (possibly a world full of shoes).
The beauty of using analytics on data is that it allows you to make informed decisions about… well, not everything, but a lot of things: somehow, data shows you the way. It’s like having Roberto Baggio’s magic foot.
Speaking of which, today I’d like to talk to you about creating an analytics factory to predict the results of soccer games. If I describe, with an analytical mindset, the type of problems I’m trying to solve when predicting soccer games, these are things I might want to know:
However, let’s face it, not every country is the same, and not every division is the same. I need to build out proper segments to take into account the specific differences and characteristics when building out similar predictive models.
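To make the idea of segments a bit more concrete, here is a minimal Python sketch that groups matches by country and division, so that each group could later get its own model. The records and field names (`country`, `division`, `home`, `away`) are made up for illustration; this is not how SAS Factory Miner works internally.

```python
from collections import defaultdict

# Hypothetical match records; the field names are illustrative assumptions.
matches = [
    {"country": "Italy", "division": "Serie A", "home": "Juventus", "away": "Milan"},
    {"country": "Italy", "division": "Serie B", "home": "Bari", "away": "Palermo"},
    {"country": "Spain", "division": "La Liga", "home": "Barcelona", "away": "Sevilla"},
    {"country": "Italy", "division": "Serie A", "home": "Roma", "away": "Napoli"},
]


def split_into_segments(rows):
    """Group matches by (country, division) so each segment can get its own model."""
    segments = defaultdict(list)
    for row in rows:
        segments[(row["country"], row["division"])].append(row)
    return dict(segments)


segments = split_into_segments(matches)
for key, rows in segments.items():
    # One line per segment: the segment key and how many matches it holds.
    print(key, len(rows))
```

In a real setting, each segment would need enough historical matches to train a model of its own; very small segments might have to be merged.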
I also need to make sure I collect data in a way that represents well the events I want to predict. Say I want to predict whether the home team will win the game: I’ll need to collect information regarding the “static” characteristics of the match itself, but also regarding the “dynamic” events that happen during the game. I can summarize a subset of the information I need in the following schema:
A collection of this information at individual match level will provide me with the features, or inputs, of the model I want to build, while the outcome of the game (win/no win) will represent the label, or the target. Keep in mind you’ll be able to use events as they happen in the game before the game is actually over (e.g., red/yellow cards) or events based on historic data, depending on how early in the process you want to make the prediction.
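As a rough sketch of that split between features and label, the Python snippet below turns one match record into model inputs and a win/no win target. The class `MatchRecord`, its fields, and the two derived features are assumptions chosen for illustration, not the actual schema.

```python
from dataclasses import dataclass


@dataclass
class MatchRecord:
    # "Static" characteristics, known before kick-off (assumed examples).
    home_team: str
    away_team: str
    home_rank: int
    away_rank: int
    # "Dynamic" events, observed while the game is in progress.
    home_red_cards: int
    away_red_cards: int
    # Final score, only known after the final whistle.
    home_goals: int
    away_goals: int


def to_features(match, include_dynamic=True):
    """Build model inputs; drop in-game events when predicting before kick-off."""
    features = {"rank_gap": match.away_rank - match.home_rank}
    if include_dynamic:
        features["red_card_gap"] = match.away_red_cards - match.home_red_cards
    return features


def to_label(match):
    """Target: 1 if the home team won, 0 otherwise (draw or loss)."""
    return 1 if match.home_goals > match.away_goals else 0


example = MatchRecord("Barcelona", "Sevilla", 1, 5, 0, 1, 2, 0)
print(to_features(example), to_label(example))
```

The `include_dynamic` switch mirrors the point about timing: a prediction made before the game can only rely on static and historic information.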
This is an example of the set of analytical models I want to build out:
A tool like SAS® Factory Miner is the perfect tool to build similar segmented models, as described in this article. If we think of this process as a strategic decision for this hypothetical soccer-invested company, it is actually very rare for decisions to be made on analytics only. Typically you’ll be bringing in “the experts”, that is, leveraging business rules derived from historical knowledge of the market you’re trying to model. If this were my husband, he’d probably be telling me that if Lionel Messi doesn’t play, there’s no way Barcelona will win the game, no matter how high the odds my model gives.
SAS® Decision Manager can help design similar rules in a straightforward yet powerful way, and is easy to integrate with the outcome of the models.
Now all I’m missing is to really get the ball rolling (no pun intended!) and start making decisions on the results of soccer games across the world. You’d be surprised by the amount of public data that is available out there containing all kinds of details on these games!
How do I make decisions? First, I need to decide how I want to act. Let's say I make group decisions every Friday afternoon, right before the soccer weekend begins, to gather an idea of how the games will play out. This allows me to get a first feeling of the results based on the info I have at the time and then make my first set of informed decisions (e.g., prepare the team for my fantasy league).
Ideally, I could also set up a real-time execution that monitors events in the game as they happen and sends me a periodic update (say every 5 or 10 minutes), so that I could trigger the models and rules once again and update or correct my decision (maybe some soccer team will hire me as their coach too!).
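Outside of any specific tool, the combination of a model score with an expert rule can be sketched in a few lines of Python. The function `decide`, its parameters, and the 0.5 threshold are hypothetical; the rule simply encodes the "no star player, no win" opinion above.

```python
def decide(model_probability, star_player_available, threshold=0.5):
    """Combine a model score with one expert rule (illustrative only)."""
    if not star_player_available:
        # The expert rule overrides the model, however confident it is.
        return "no win"
    return "win" if model_probability >= threshold else "no win"


print(decide(0.9, star_player_available=False))  # rule overrides the model
print(decide(0.9, star_player_available=True))   # model decides
```

Keeping the rule separate from the model makes it easy to change the experts' opinion without retraining anything.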
Again, SAS® Decision Manager can help me set up the best design to trigger these decisions, and deploy them as batch jobs or real-time web services.
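For the periodic update idea, here is a small simulation in Python, assuming events arrive as `(minute, event)` pairs and the score is recomputed every 10 minutes. The function `toy_home_win_score` is a hand-made stand-in, not a trained model, and all names and weights are invented for the example.

```python
def toy_home_win_score(state):
    """Stand-in for a trained model: a hand-made score clipped to [0, 1]."""
    score = 0.5 + 0.2 * (state["home_goals"] - state["away_goals"])
    score -= 0.1 * (state["home_red_cards"] - state["away_red_cards"])
    return max(0.0, min(1.0, score))


def rescore(events, interval=10, match_length=90):
    """Replay in-game events and recompute the score every `interval` minutes."""
    state = {"home_goals": 0, "away_goals": 0,
             "home_red_cards": 0, "away_red_cards": 0}
    pending = sorted(events)  # (minute, counter name) pairs, earliest first
    updates = []
    for minute in range(interval, match_length + 1, interval):
        # Apply every event that happened up to this checkpoint.
        while pending and pending[0][0] <= minute:
            _, counter = pending.pop(0)
            state[counter] += 1
        updates.append((minute, round(toy_home_win_score(state), 2)))
    return updates


events = [(12, "home_goals"), (37, "home_red_cards"), (58, "away_goals")]
updates = rescore(events)
for minute, score in updates:
    print(minute, score)
```

A Friday-afternoon batch run would be the same scoring call made once, before kick-off, with only the information available at that time.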
Do you still think I am a crazy geek secluded in her lab writing crazy lines of incomprehensible code??? Well, I’ll meet you when I am up there: