Finding a Needle in a Haystack: Bayesian Search Theory with SAS IML

6 Likes

Introduction

Having recently finished reading “The Theory That Would Not Die”, which is a history on the evolution and applications of Bayesian statistics over the decades, I found myself particularly in awe of one of the applications during the Cold War for orchestrating search effectiveness. In 1966 a US bomber which had flown to Europe from Raleigh, North Carolina, unfortunately crashed into the waters around Palomares, Spain, after a collision during mid-air refuelling. The bomber was carrying four hydrogen bombs at the time. Three were found on land, but the fourth fell into the Mediterranean Sea.

A salvage operation was organised and after two and a half months the lost bomb was found. In order to mount the search effectively, US Naval scientist Dr. John P. Craven developed a novel mathematical approach to guide the coordination of search effort effectively.

Search effectiveness is an important factor in the success of any salvage operation. Salvage operations can be challenging, costly, and risky, depending on the depth, location, and condition of the wreckage. It is, therefore, essential to have a reliable and efficient method of locating and identifying the lost object before deploying a search.

In this blog, I will discuss how search effectiveness can be improved by using statistical methods, namely Bayesian Search Theory. By the end of this blog, you will have a better understanding of how Bayesian Search Theory works and how it can be used to estimate effective search windows when coordinating a salvage operation. In particular I use a combination of SAS code and SAS Visual Analytics to demonstrate a simulation of 250 ‘search days’ and evaluate how the search coordination evolves over time, and how the probability of success evolves. The code for implementing the search algorithm is on GitHub here.

I would also like to thank Katie Howgate, a mathematician and PhD candidate with the University of Lancaster, for her excellent blog introducing the computational steps in applying Bayesian Search Theory. With her blessing I have used her example search grid probabilities to apply Bayesian Search Theory with SAS IML. If you are interested in understanding how the probabilities are computed in more detail, I’d strongly recommend reading her original blog post found here.

What is Bayesian Search Theory?

Bayesian Search Theory, also known as Search Effectiveness Probability, is an application of mathematics and statistics that deals with the problem of finding a target in a large and uncertain space. It is based on the idea of using prior knowledge and evidence to update the probability of the target's location, and then choosing the optimal search strategy to maximise the chance of success.

The evidence in this case is not finding the lost object in a given search. We use this to update the probability of finding the object such that, after several unsuccessful attempts, we shift our search effort from one area to another by looking at the relative probabilities of finding the lost object.

At a high level, the Bayesian Search Theory is applied by:

Initially formulating beliefs on where the object might be
Assign initial probabilities to each location it might be
Assign a probability of successfully finding it, given it is actually there. For underwater salvage this might be a consideration on depth, visibility or known hazards
Combine these probabilities to give an overall probability of finding the object in a given search effort. This represents an overall likelihood of finding the object on a salvage operation. The highest probability location should be seen as the initial search location
If the search is unsuccessful this is used to update the combined probability in step 4.
Continue steps 4 & 5 to direct search effort until either the object is located or the search is abandoned

This methodology uses Bayes’ Theorem to update the probabilities. As our search continues, this new information on unsuccessful searches is used to update the probability of finding the object and consider refocusing on a different location. This approach is more efficient that just using random search, and both less costly and faster than running a full exhaustive search by optimizing the distribution of search effort.

Bayes' Theorem is a powerful tool for reasoning under uncertainty and learning from data. It can help us make better decisions and understand complex phenomena. A benefit of applying Bayes' Theorem in search effectiveness is the ability to continuously update our beliefs. Bayesian updating can be applied to any situation where we have uncertain beliefs and new information and it is a powerful tool for rational reasoning and decision making under uncertainty.

These methods have been applied to many salvage operations including the search for the USS Scorpion, Air France Flight 447 and the attempt to find Malaysia Airlines Flight 370.

Applying Search Effectiveness to an example of Maritime Salvage Operations

Let's see how this can help us with search effectiveness and salvage operations. Suppose we are looking for a sunken ship in a large area of water. Having been raised in Devon I’ve chosen to illustrate this as an example of a lost object in the waters around Plymouth, a port city known for its maritime history and ship building.

We have some prior information about where the ship might be, such as its last known location. Using this information we establish a search grid around this point. On any given day we coordinate one search operation in one of the grids of the search area. Our search zone is shown in Figure 1.

Figure 1 - Our best guess on where the object might be with search zone

We also have some domain understanding on the environment we’re searching in. We know which areas of the search will be harder given depth, visibility, etc.

Finally, we use Bayes' Theorem to update our probabilities with the new evidence (following a failed search). The posterior probability will tell us how likely it is that the ship is in each location given the evidence. We can then focus our search on the locations with high posterior probabilities and ignore the locations with low posterior probabilities. Over time, as the search progresses, the probabilities will begin to shift and we start to focus on the lower posterior regions.

Using this approach we can improve our search effectiveness by using Bayesian updating to narrow down our search space and increase our chances of finding what we are looking for.

Implementing an example of Bayesian Search Theory with SAS IML

Using PROC IML I created some initial probabilities to demonstrate the search zones and beliefs shown in Figure 1. Figure 2 shows our probabilities as matrices. Firstly, we have our initial belief on where the item was lost. Secondly, we have our belief on how likely we are to find the object in a search zone provided it is actually there. Finally, we combine these two probability tables to estimate a total probability of finding the object in a given search zone.

Figure 2 - Starting probabilities at the beginning of the search

Once the initial probabilities are created a macro is used to simulate a failed search in the area with the highest probability of success. Looking at Figure 2 we can see we would start our search at [x=3,y=3] as it has the highest probability (0.1462). We then update the tables using Bayes’ Theorem following an unsuccessful search. The updated probability of finding the object at [x=3,y=3] falls and our next highest probability zone is then searched.

Visualizing Search Coordination in SAS Visual Analytics

By simulating 250 search days we can visualize how the probability of success changes over time and where the allocation of search effort shifts too. Intuitively, we start our search closest to the last known point searching the inner four squares the most. Looking at the dashboard shown in Figure 3, we can see that Zone 6 has a relatively low combined probability of success. This reflects that it is an area which has a more challenging terrain. It therefore makes sense to not search there initially if there is an equal chance it is in Zones 7, 10 and 11. However, over time, proportionally more search effort will be allocated to Zone 6 as the other zones become exhausted.

Figure 3 - Using SAS Visual Analytics to our posterior probabilities

We can see how the probabilities shift in the animation shown in Figure 4.

Figure 4 - Animated plot of how the posterior probabilities change as the search progresses

We can see that the search spirals out from the centre over time as we become less confident that the lost object is near the last known location. Figure 5 shows how the combined probability of success decays and after August 12^th there is a sharp drop in probability of success. This is where we begin to shift the search outside of the inner squares near the last known location.

Figure 5 also shows us that probability of success continues to drop with every failed search day. Despite simulating 250 days from August 1^st, it is clear that by September 24^th the combined probability is getting very close to zero.

Figure 5 - Plot of expected probability of success by day of search operation

This shows us another advantage of using Bayesian Search Theory in that not only can we estimate where we should be concentrating our effort, but we can also use this to estimate what our optimal search window is in order to plan budget and resources.

Finally, Figure 6 shows an animated bar chart of search coordination across time. As we approach September we significantly increase the number of searches focused on Zone 6.

Figure 6 - Animated bar chart of the total number of searches by zone by day of search

Summary

We’ve seen how Bayesian Search Theory is a powerful tool for improving search effectiveness and salvage operations. Bayesian Search Theory can help us make better decisions in search and salvage operations by allowing us to incorporate new information as it becomes available, and adjust our search strategies accordingly.

We’ve also seen that Bayesian Search Theory can also help us optimise our budget and allocation of resources for search and salvage operations. By using Bayesian updating, we can estimate how likely we are to find the target in a given area, and how much it would cost to search that area. We can then compare these estimates with the expected value or benefit of finding the target, and decide whether it is worth searching that area or not. This way, we can avoid wasting time and money on areas that are unlikely to yield results, and allocate more resources to areas that are more promising.

And finally, we’ve demonstrated how easy it is to implement these principles using SAS IML and visualize our results for scenario analysis in SAS Visual Analytics.

Finding a Needle in a Haystack: Bayesian Search Theory with SAS IML

Free course: Data Literacy Essentials

Get Started