By “The Data Caddie”
Golf is often described as a game of art and feeling. But at its core, it is a physics engine governed by strict constraints: distance, friction, aerodynamics, and probability. For the data scientist, a round of golf isn't just a walk down the fairway; it's a resource allocation problem waiting to be solved.
In this post, we break down how we utilized SAS PROC OPTMODEL to build a "Digital Caddie" for Oslo Golfklubb (Bogstad). By treating the course as a mathematical network and the player’s bag as a set of constrained resources, we moved beyond simple heuristics to find the mathematically optimal path to the hole.
This blog is of course not a complete guide to becoming Tiger Woods. It is merely a fun Christmas project showing how SAS procedures can hopefully support your improvement and help you simulate your way to becoming a better golfer.
I take no responsibility if your swing does not improve; only your own practice can do that. I understand there are countless options we could build into the optimization, but we need to set a limit at some stage, and this blog post describes my approach. Don't shoot the messenger, or at least yell FORE first so I can duck.
Golf is a game played on a six-inch course: the one between your ears. A perfect, optimal strategy is useless if the player cannot execute the required shot consistently under pressure. The optimization of strokes is a powerful data problem, but the real game is one of mental resilience, muscle memory, and managing the inevitable "noise" of a bad shot.
Our model might prove that a 3-Wood and a 9-Iron is the "optimal" path to the green. But this is worthless if the player's 3-Wood is their least consistent club, one they only hit purely 1 out of 5 times. The true best strategy for that player might be a "sub-optimal" 4-Hybrid and an 8-Iron, which they can hit confidently 9 out of 10 times.
So, what is the purpose of our PROC OPTMODEL caddie if it can't account for this human element?
The answer is that our model is not a replacement for the golfer; it is a decision-support tool designed to reduce cognitive load.
The average golfer is already processing dozens of variables: "Is the pin in the front or back? Is the wind helping or hurting? Will this lie cause a hook? Is a 5-Iron enough to clear that bunker?"
Our model's job is to solve all the complex physics before the round begins. It takes the "Plays Like" distance, the dynamic wind, the elevation, and the player's scaled ability and provides a clear, data-driven recommendation. It removes the guesswork.
By having a trusted, optimal plan, the player is freed from the burden of calculation. They can step up to the ball with a single, clear-minded task: focus on execution.
This post is not about replacing the art of golf with a cold algorithm. It's about using a powerful algorithm to handle the complex science, so the player can be free to perform the art.
Let's build the model.
Most GPS watches and casual golfers use a "Greedy Algorithm" approach: hit the longest club that doesn't go over the green. While simple, this heuristic fails to account for complex trade-offs. I believe most amateurs, and even pros on some holes, play this way.
Studying the course before we play helps a lot. But will remembering everything and arriving prepared with data make me a better player? I must say maybe, because the most critical factor in nailing the course is you. My aim is to help you prepare, and hopefully improve your score, by modelling the course together with your own stats.
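To make the greedy rule concrete, here is a minimal Python sketch (not the post's SAS code). The club names and carry distances in `CLUBS` are made-up illustration values, and `greedy_plan` is a hypothetical helper that treats the hole as a single distance to cover.

```python
# Hypothetical carry distances in meters (illustration only, not from the post).
CLUBS = {"Driver": 210, "3-Wood": 190, "5-Iron": 150, "7-Iron": 130,
         "9-Iron": 110, "PW": 95, "SW": 70}


def greedy_plan(distance, clubs):
    """Repeatedly hit the longest club that does not fly past the target."""
    plan = []
    remaining = distance
    while remaining > 0:
        # Clubs that would not carry beyond the remaining distance.
        candidates = [c for c, d in clubs.items() if d <= remaining]
        if candidates:
            club = max(candidates, key=clubs.get)
        else:
            # Every club is too long: finish with the shortest one.
            club = min(clubs, key=clubs.get)
        plan.append(club)
        remaining -= clubs[club]
    return plan


print(greedy_plan(300, CLUBS))  # ['Driver', 'SW', 'SW']
```

With these assumed distances the greedy rule needs three shots for 300 m, although 190 m + 110 m (3-Wood, then 9-Iron) would cover it exactly in two. That is the kind of trade-off the heuristic cannot see.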
Figure 1: Overview of the Bogstad golf course
For example, on a 310-meter Par 4 with a strong tailwind:
Why might the second be better? Perhaps the Driver brings a hazard into play, or the 4-Iron landing zone is flatter. To solve this, we need Mixed Integer Linear Programming (MILP). We aren't just looking for a valid sequence of shots; we are looking for a set of shots that minimizes risk and effort while adhering to a strict stroke budget.
Optimization requires precise input. We defined the Bogstad course using vector mathematics rather than just static lengths.
Our model relies on three key pieces of information:
We utilized the official Oslo GK Slope/Scorecard to determine the Course Handicap (Spillehandicap), which gives us the "Allocated Strokes" per hole based on the hole's difficulty index (HCP Index).
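The handicap arithmetic is not shown in the post. As a hedged sketch, the World Handicap System formula for Course Handicap and the usual per-hole allocation by stroke index can be written in Python as below. The rating values in the example are invented, the function names are hypothetical, and it is an assumption that the result matches the club's published slope table; plus handicaps (negative values) are not handled.

```python
import math


def course_handicap(handicap_index, slope_rating, course_rating, par):
    """WHS Course Handicap: index * slope / 113 + (course rating - par)."""
    raw = handicap_index * slope_rating / 113 + (course_rating - par)
    return math.floor(raw + 0.5)  # round half up to a whole number


def allocated_strokes(course_hcp, stroke_index):
    """Extra strokes on one hole, given its HCP index (1 = hardest hole)."""
    base, extra = divmod(course_hcp, 18)
    return base + (1 if stroke_index <= extra else 0)


# Invented example values: index 18.0, slope 125, course rating 71.2, par 72.
ch = course_handicap(18.0, 125, 71.2, 72)
print(ch)                         # 19
print(allocated_strokes(ch, 1))   # 2 strokes on the hardest hole
print(allocated_strokes(ch, 10))  # 1 stroke on the hole with HCP index 10
```

The per-hole stroke budget used later would then be the hole's par plus its allocated strokes.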
We then modeled the physical properties of every hole:
Before the optimization runs, we calculate an Effective Distance Factor. This is not a static variable; it is calculated dynamically for every hole based on the interaction between the hole's bearing and the live weather data.
We use trigonometry to decompose the wind vector relative to the hole direction:
/* SAS Logic for Dynamic Wind */
This means a 15 km/h wind from the West will lengthen a hole played toward the West (headwind), shorten a hole played toward the East (tailwind), and apply a crosswind penalty to North/South holes.
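The trigonometry can be sketched in a few lines of Python (this is an illustration, not the post's SAS listing). It assumes the hole bearing is the compass direction of play and the wind direction is the compass direction the wind blows from, the usual meteorological convention. The function names and the coefficients `k_head` and `k_cross` are hypothetical placeholders, not values from the model.

```python
import math


def wind_components(hole_bearing_deg, wind_from_deg, wind_speed):
    """Split the wind into components along and across the line of play.

    headwind  > 0: wind against the shot, < 0: tailwind
    crosswind > 0: wind from the right,   < 0: from the left
    """
    theta = math.radians(wind_from_deg - hole_bearing_deg)
    headwind = wind_speed * math.cos(theta)
    crosswind = wind_speed * math.sin(theta)
    return headwind, crosswind


def effective_distance_factor(headwind, crosswind, k_head=0.01, k_cross=0.003):
    """Assumed linear 'plays like' multiplier; the coefficients are made up."""
    return 1 + k_head * headwind + k_cross * abs(crosswind)


# 15 km/h wind from the West (270 degrees) on three hole directions.
for bearing in (270, 90, 0):
    head, cross = wind_components(bearing, 270, 15)
    print(bearing, round(head, 1), round(cross, 1),
          round(effective_distance_factor(head, cross), 3))
```

Under these conventions a hole played due West gets the full 15 km/h as headwind, a hole played due East gets it as tailwind, and a hole played due North sees it entirely as crosswind from the left.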
This is the heart of the operation. We use the SAS Optimization procedure to formulate the problem.
We define an integer variable NumberOfStrokes[h, k], representing how many times club k is used on hole h.
We want to minimize the total strokes. However, to prevent the solver from simply picking the longest clubs that "just barely" pass the hole, we introduced a secondary objective: Minimal Overshoot.
This tiny penalty on the summed overshoot forces the solver to choose the combination of clubs that lands closest to the pin without being short, rather than just any combination that covers the distance.
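One way to write such a combined objective for a single hole h is sketched below. Here x_{h,k} stands for NumberOfStrokes[h, k]; the symbols d_k (distance of club k), D_h (effective distance of hole h) and ε (penalty weight) are assumptions introduced for illustration, not notation taken from the post.

```latex
\min_{x} \;\; \sum_{k} x_{h,k}
\;+\; \varepsilon \Bigl( \sum_{k} d_k \, x_{h,k} \;-\; D_h \Bigr),
\qquad x_{h,k} \in \mathbb{Z}_{\ge 0}
```

For the stroke count to remain the primary goal, ε times the largest possible overshoot has to stay below 1. Stroke counts differ by at least one whole stroke, so with such an ε no reduction in overshoot can ever pay for an extra shot.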
The model must obey two strict physical laws:
/* The MILP Model formulation */
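For readers without the listing at hand, the same idea can be tried on one hole in plain Python. Note the swap: this sketch replaces the MILP solver with brute-force enumeration of club combinations, which is only practical for a handful of clubs and strokes. The two constraints it enforces (total club distance covers the effective hole distance, and the stroke count stays within the budget) are inferred from the surrounding text; the club distances and the name `best_inventory` are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical carry distances in meters (illustration only).
CLUBS = {"Driver": 210, "3-Wood": 190, "5-Iron": 150, "7-Iron": 130,
         "9-Iron": 110, "PW": 95, "SW": 70}


def best_inventory(effective_distance, max_strokes, clubs=CLUBS, eps=1e-4):
    """Fewest strokes first, then least overshoot, by exhaustive search."""
    best = None
    for n in range(1, max_strokes + 1):            # stroke budget
        for combo in combinations_with_replacement(clubs, n):
            total = sum(clubs[c] for c in combo)
            if total < effective_distance:         # must not come up short
                continue
            cost = n + eps * (total - effective_distance)
            if best is None or cost < best[0]:
                best = (cost, Counter(combo), total)
    if best is None:
        return None                                # no feasible combination
    return dict(best[1]), best[2]


print(best_inventory(305, max_strokes=4))
# ({'Driver': 1, 'PW': 1}, 305)
```

Like the solver output, the result is an unordered inventory of clubs, not a shot sequence.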
PROC OPTMODEL returns an Optimal Inventory. For Hole 1, it might tell us: {Driver: 1, PW: 1}. It tells us what to use, but not when.
To generate the final "Caddie Card," we post-process this inventory. Since we assume a standard strategy of maximizing distance off the tee, we sort the optimal inventory by club distance (descending). This transforms the raw optimization data into a human-readable sequence:
Hole 1 (Par 4, 321m) - Headwind
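The sorting step described above can be sketched in Python as follows. The function name `caddie_card`, the output wording, and the club distances are illustrative assumptions; only the rule itself (longest club first) comes from the post.

```python
def caddie_card(inventory, club_distance):
    """Turn an unordered club inventory into a longest-first shot sequence."""
    # Expand counts into individual shots, e.g. {"PW": 2} -> ["PW", "PW"].
    shots = [club for club, count in inventory.items() for _ in range(count)]
    # Standard strategy assumed in the post: maximize distance off the tee.
    shots.sort(key=lambda club: club_distance[club], reverse=True)
    return [f"Shot {i}: {club} ({club_distance[club]} m)"
            for i, club in enumerate(shots, start=1)]


# Hypothetical distances in meters.
distances = {"Driver": 210, "PW": 95}
for line in caddie_card({"PW": 1, "Driver": 1}, distances):
    print(line)
# Shot 1: Driver (210 m)
# Shot 2: PW (95 m)
```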
Reports you can get:
These reports can be made available on your phone or tablet to bring onto the course, and presumably even on your watch.
By combining PROC OPTMODEL with robust data preparation, we created a system that adapts to the player's handicap, the course's specific layout, and the live weather conditions. It demonstrates that golf strategy isn't just about intuition; it's a solvable linear programming challenge.
For those playing Bogstad GK, remember to check your Course Handicap before running the model—parameters matter!
Happy holidays with coding and golfing.
/Ole-Martin Hafslund - novice green card golf player
I'm impressed, Ole Martin! I didn't even know you played golf! This is so interesting! Thank you for sharing!