Keep on with the force don’t stop
Don’t stop ’til you get enough.
– Michael Jackson
The world is a rich place of opportunities waiting to be discovered, of actions to be taken and choices to be made.
We mere mortals have to navigate this world with partial and biased information and make decisions under uncertainty. Add to that the issue that many decisions have to be made quick and dirty, without much time to consider them in depth and without the luxury of looking at them from all sides. This is especially true if we have to act in an environment full of competitors, all pursuing their own self-interests and all struggling with similar uncertainties: be it when looking for an apartment in a popular city, looking for the dream job, or waiting until you finally meet your soulmate and get to live happily ever after.
In some sense, many of our strategies in the world can be summed up by a tradeoff between exploration and exploitation. The unknown can offer both threats and rewards. When we settle on a path through reality, we are always at the same time discarding many unexplored possibilities.
This is the tragedy of having all of our dreams and imaginations collapse into just one lived reality: when we are children, we are roaming around, free to discover and free to imagine many different versions of ourselves. After we grow up, we have to deal with the life we chose, for better or worse.
Mathematicians and computer scientists have been dealing with very similar questions, and have been working on formalizing them for decades. As an example, in the branch of artificial intelligence called reinforcement learning, the exploration-exploitation trade-off is a well-known problem.
A simple example to illustrate this is the multi-armed bandit, endowed with K levers, all offering different rewards with different probabilities.
Say you are faced with this machine and want to make as much money as possible. At every point in time, you can pull one of K levers, and after each lever-pull, you receive a reward based on a probability distribution unknown to you (we assume that you don’t know the probabilities of the individual arms for giving you rewards, but they stay constant in time). Given you have a certain amount of lever-pulls available to you, what is the optimal policy to choose that maximizes the chance of you walking away with the largest amount of money? In other words: assuming there is a lever that offers the highest reward, how much time should you spend exploring all of the levers, and at what point in time should you start exploiting the lever that you think is the most rewarding?
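To make the trade-off concrete, here is a minimal sketch of one simple way to play such a machine: most of the time pull the lever that has paid best so far, but with a small probability pull a random lever instead (a strategy usually called epsilon-greedy). The function name, the win probabilities, and the assumption that each lever pays either 1 or 0 are illustrative choices, not part of the problem statement above.

```python
import random

def run_epsilon_greedy(win_probs, pulls, epsilon, seed=0):
    """Play a bandit whose levers pay 1 with the given (hidden) probabilities.

    Returns the total reward and how often each lever was pulled.
    """
    rng = random.Random(seed)
    k = len(win_probs)
    counts = [0] * k    # number of pulls per lever
    means = [0.0] * k   # running average reward per lever
    total = 0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore: random lever
        else:
            arm = max(range(k), key=lambda i: means[i])   # exploit: best so far
        reward = 1 if rng.random() < win_probs[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward
    return total, counts

if __name__ == "__main__":
    total, counts = run_epsilon_greedy([0.2, 0.5, 0.8], pulls=10000, epsilon=0.1)
    print("total reward:", total)
    print("pulls per lever:", counts)
```

Setting `epsilon` to 0 means never exploring on purpose, and setting it to 1 means never exploiting; the interesting behavior lies in between.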
There are indeed well-studied strategies to tackle this problem, having such beautiful names as the Adaptive epsilon-greedy strategy based on Bayesian ensembles, the intricacies of which lie beyond the scope of this article. But to bring this a little closer to everyday life, we can move on to the job market, and apply it to the famous secretary problem.
Say you are looking for a secretary, and you have 100 days scheduled for this task, on each of which you can conduct one interview with a potential candidate. We make two somewhat idealized assumptions: every secretary that you interview would instantly agree to start working for you, but you need to make a decision right after each interview, and if you don’t hire the candidate, there is no turning back.
Given these conditions, what is the optimal policy to pursue in order to find the best possible secretary? The perhaps surprising result is that you should start by interviewing 37 secretaries without hiring anyone, and then continue your interviews until you find one who is better than all the previous candidates.
It can be mathematically proven that this policy maximizes your chance of finding the best possible applicant, which also lies at about 37 percent. Rephrasing this in terms of the exploration-exploitation dilemma: you should start out with an exploration period that lasts for 1/e (where e=2.71828… is Euler’s number) of the total period, in which you force yourself not to make any decision, even though some candidates might already seem perfectly suitable. This of course only makes sense if you really have 100 days to spend, have no additional incentive to be done as quickly as possible, and are looking for the best possible candidate. After the exploratory phase, you use your first opportunity to hire a candidate who impresses you more than everyone before (exploitation).
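The 37 percent figure can be checked with a small simulation. The sketch below assumes that candidates arrive in random order and can be ranked without ties; the function names and the number of trials are illustrative choices.

```python
import random

def hire_with_cutoff(scores, cutoff):
    """Return the index of the candidate hired under the look-then-leap rule.

    The first `cutoff` candidates are only observed. After that, the first
    candidate who beats everyone seen during the observation phase is hired.
    """
    best_seen = max(scores[:cutoff]) if cutoff > 0 else float("-inf")
    for i in range(cutoff, len(scores)):
        if scores[i] > best_seen:
            return i
    return len(scores) - 1  # nobody was better: stuck with the last candidate

def success_rate(n, cutoff, trials=20000, seed=0):
    """Estimate how often the rule ends up hiring the single best candidate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        scores = list(range(n))   # n - 1 is the best candidate
        rng.shuffle(scores)       # candidates show up in random order
        if scores[hire_with_cutoff(scores, cutoff)] == n - 1:
            wins += 1
    return wins / trials

if __name__ == "__main__":
    for cutoff in (5, 20, 37, 60, 90):
        print(cutoff, round(success_rate(100, cutoff), 3))
```

With 100 candidates, the estimated success rate should peak near a cutoff of 37 and fall off on either side: stopping the exploration too early means settling for a mediocre candidate, and stopping too late means the best one has often already been sent home.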
To make a perfectly unromantic transition, the multi-armed bandit and the secretary problem are a bit like dating. There are, after all, plenty of fish in the sea, but only limited time, and the pressure of justifying yourself at family gatherings increases rapidly at the end of your twenties (I have also found that talking about multi-armed bandits instead doesn’t help).
You are usually not dating K partners at the same time, and dating is not quite as simple as pulling someone’s lever on the first date and checking for rewards. Dating is more of a sequential enterprise, with us meeting a person for a while and then switching to the next one if it doesn’t work out.
Some of the mathematical assumptions of the secretary problem also don’t seem to translate too well to dating at first glance. We usually don’t know for how long we will be dating in total and how many potential partners we have, and it is difficult for us to objectively rank the quality of a partner. We learn much along the way, and sometimes there is in fact a turning back (although you can ask yourself how many people you know that successfully got back together with their ex-partners).
But biologically speaking, dating and mating have evolved for a reason, and in a time before contraceptives and with sparser resources, this put a certain responsibility on the question of whom to mate with. As evolutionary psychologist David Buss points out, we have evolved sophisticated strategies to quickly assess potential mate value and to judge whether a prospective partner will have the resources and genes to increase the chance of healthy and surviving offspring, and indeed we are quite good at picking up many important, often unconscious, cues in the first minutes of meeting someone new. And while we could now potentially date until we are 100, the aim of many marriages is to build a family, which puts additional biological constraints on the realistic time period we have for exploring our options.
Finding your soulmate ain’t easy, but assuming you want to be married by 32-35 (which is the average age for women and men, respectively), and start seriously dating around 18, you have some 15 years to roam around living the single life. Mathematics then teaches us that you should explore dating for at least 15/e≈5.5 years and then settle on the first partner who is “better” than everyone before, whatever that means. Sorry to break this to all the high-school sweethearts, but optimal stopping theory is not on your side.
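The arithmetic is easy to redo for your own numbers. The helper below is a hypothetical illustration that simply divides the available time window by e, as in the paragraph above; the function name and the example ages are assumptions.

```python
import math

def exploration_cutoff(start_age, target_age):
    """Split a dating window by the 1/e rule.

    Returns the length of the exploration phase in years and the age
    at which that phase ends.
    """
    window = target_age - start_age
    explore_years = window / math.e
    return explore_years, start_age + explore_years

if __name__ == "__main__":
    years, age = exploration_cutoff(18, 33)
    print(f"explore for {years:.1f} years, until age {age:.1f}")
```

Starting at 18 with a target of 33, this gives an exploration phase of about 5.5 years, which ends around age 23.5.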
This insight can have practical consequences: as Daniel Kahneman explains in Thinking, Fast and Slow, human brains are prone to loss aversion. We fear losing something much more than we are happy about winning something of equivalent value. So be it in our dating lives, on the housing market or when looking for a job: many of us tend to agree to the first thing that comes along for fear of ending up with nothing.
The pressure we put on ourselves and the fear of ending up with empty hands can frequently stand in the way of giving ourselves enough time for exploration. While this might come across as hyperrational, it has actually changed my perspective in a very useful way on several things in my life, for example when it came to taking a sabbatical and switching fields after my master’s degree (in the sense you shouldn’t necessarily settle your life on the first thing you picked at the end of your teenage years).
Mathematics shows us we shouldn’t exclusively see exploring as taking a risk, but also as a means to find places in our lives where it might be better to stop.
And who knows, if we play by this rule we might even end up with our soulmate, and live (with 37% probability) happily ever after.
About the Author
Manuel Brenner studied Physics at the University of Heidelberg and is now pursuing his PhD in Theoretical Neuroscience at the Central Institute for Mental Health in Mannheim at the intersection between AI, Neuroscience and Mental Health. He is head of Content Creation for ACIT, for which he hosts the ACIT Science Podcast. He is interested in music, photography, chess, meditation, cooking, and many other things. Connect with him on LinkedIn at https://www.linkedin.com/in/manuel-brenner-772261191/