Simple Heuristics That Make Us Smart

I have recommended Gerd Gigerenzer, Peter Todd and the ABC Research Group’s Simple Heuristics That Make Us Smart enough times on this blog that I figured it was time to post a synopsis or review.

Having re-read it for the first time in five or so years, I will keep this book high on my recommended reading list. It provides a nice contrast to the increasing use of complex machine learning algorithms for decision making, although it is that same increasing use that makes some parts of the book seem a touch dated.

The crux of the book is that much human (or other animal) decision making is based on fast and frugal heuristics. These heuristics are fast in that they do not rely on heavy computation, and frugal in that they only search for or use some of the available information.

Importantly, fast and frugal heuristics do not simply trade off speed for accuracy. They can be both fast and accurate, as the tradeoff is between generality and specificity. The simplicity of fast and frugal heuristics allows them to be robust in the face of environmental change and generalise well to new situations, leading to more accurate predictions for new data than a complex, information-guzzling strategy. The heuristics avoid the problem of overfitting as they don’t assume every detail to be of utmost relevance, and tend to ignore the noise in many cues by looking for the cues that swamp all others.

These fast and frugal heuristics often fail the test of logical coherence, a point often made in the heuristics and biases program kicked off by Kahneman and Tversky. But as Gigerenzer and Todd argue in the opening chapter, pursuing rationality of this nature as an ideal is misguided, as many of our forms of reasoning are powerful and accurate despite not being logically coherent. The function of heuristics is not to be coherent. Their function is to make reasonable adaptive inference with limited time and knowledge.

As a result, Gigerenzer and Todd argue that we should replace the coherence criteria with an assessment of real-world functionality. Heuristics are the way the mind takes advantage of the structure of the environment. They are not unreliable aids used by humans despite their inferior performance.

This assessment of the real-world functionality is also not a general assessment. Heuristics will tend to be domain specific solutions, which means that “ecological rationality” is not simply a feature of the heuristic, but a result of the interaction between the heuristic and the environment.

Bounded rationality

If you have read much Gigerenzer you will have seen his desire to make clear what bounded rationality actually is.

Bounded rationality is often equated with decision making under constraints (particularly in economics). Instead of having perfect foresight, information must be obtained through search. Search is conducted until the costs of search balance the benefits of the additional information.

One of the themes of the first chapter is mocking the idea that decision making under constraints brings us closer to a model of human decision making. Gigerenzer and Todd draw on the example of Charles Darwin, who created a list of the pros and cons of marriage to assist his decision. This unconstrained optimisation problem is difficult. How do you balance children and the charms of female chit chat against the conversation of clever men at clubs?

But suppose a constrained Darwin is starting this list from scratch. He already has two reasons for marriage. Should he try to find another? To understand whether he should continue his search he effectively needs to know the costs and benefits of all the possible third options and understand how each would affect his final decision. He effectively needs to know and consider more than the unconstrained man. You could even go the next order of consideration and look at the costs and benefits of all the cost and benefit calculations, and so on. Infinite regress.

So rather than bounded rationality being decision making under constraints, Gigerenzer argues for something closer to Herbert Simon’s conception, where bounded rationality is effectively adaptive decision making. The mind is computationally constrained, and uses approximations to achieve most tasks as optimal solutions often do not exist or are not tractable (think the relatively simple world of chess). The effectiveness of this approximation is then assessed in the environment in which the mind makes the decisions, resulting in what Gigerenzer terms the “ecological rationality” of the decision.

The recognition heuristic

The first fast and frugal heuristic to be examined in detail in the book is the recognition heuristic. Goldstein and Gigerenzer (the authors of that chapter) define the recognition heuristic as “If one of two objects is recognized and the other is not, then infer that the recognized object has the higher value.”

The recognition heuristic is frugal as it requires a lack of knowledge to work – a failure to recognise one of the alternatives. The lack of computation required to apply it points to its speed. Goldstein and Gigerenzer argue that the recognition heuristic is a good model for how people actually choose, and present evidence that it is often applied despite conflicting or additional information being available.
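The rule as quoted is simple enough to sketch in a few lines. This is only a minimal illustration: the function name, the set of recognised cities and the decision to return None when the heuristic cannot discriminate are my assumptions, not anything specified in the book.

```python
from typing import Optional, Set


def recognition_heuristic(a: str, b: str, recognised: Set[str]) -> Optional[str]:
    """Infer which of two objects has the higher value using recognition alone.

    Returns None when both or neither object is recognised, because the
    heuristic is silent in that case (an assumption of this sketch).
    """
    knows_a = a in recognised
    knows_b = b in recognised
    if knows_a and not knows_b:
        return a
    if knows_b and not knows_a:
        return b
    return None  # some other strategy would have to decide


# Hypothetical set of German cities that one person happens to recognise
recognised_cities = {"Berlin", "Munich", "Hamburg"}
choice = recognition_heuristic("Munich", "Bielefeld", recognised_cities)
```

Note that the function only works when something is unrecognised: a person who recognises every city gets None every time, which is the “knowing too much” problem in miniature.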

Recognition is different from the concept of “availability” developed by Tversky and Kahneman. The availability heuristic works by drawing on the most immediate or recent examples when making an evaluation. Availability refers to the availability of terms or concepts in memory, whereas recognition relies on the differences between things in and out of memory.

As an example application (and success) of the recognition heuristic, American and German students were asked to compare pairs of German or American cities and select the larger. American students comparing pairs of American cities did worse than Germans on those same American cities – the Americans knew too much to apply the recognition heuristic. The Americans did as well comparing less familiar German cities as they did American cities.

The success of the recognition heuristic results in what could be described as a “less is more” effect. There are situations where decisions based on missing information can be more accurate than those made with more knowledge. There is information implicit in the failure to recognise something.

A second chapter on the recognition heuristic by Borges and friends involves the authors using the recognition heuristic to guide their stock market purchases. They surveyed US and German experts and laypeople about US and German shares and invested based on those that were recognised.

Overall, the authors’ returns beat the aggregate market indices. A German share portfolio based on the recognition of any of the US and German experts or US and German laypeople outperformed the market indices, as did the US stock portfolio based on recognition by Germans. The only group for which recognition delivered lower returns was the US portfolio based on US expert or layperson recognition.

Borges and friends did note that this was a one-off experiment in a bull market, so there is a question of whether it would generalise to other market conditions (or even if it was more than a stroke of luck). But the next chapter took the question of the robustness of simple heuristics somewhat more seriously.

The competition

One of the more interesting chapters in the book is a contest across a terrain of 20 datasets between a fast and frugal heuristic, “take-the-best”, and a couple of other approaches, including the more computationally intensive multiple linear regression. In each of these 20 contests, the competitors were tasked with selecting, for every pair of options, which had the higher value. This included predicting which of two schools had the higher drop out rate, which of two stretches of highway had the higher accident rate, or which of two people had the higher body fat percentage.

The take-the-best heuristic works as follows: Choose the cue most likely to distinguish correctly between the two. If the two choices differ on that cue, select the one with the highest value, and end the search. If they are the same, move to the cue with the next highest validity and repeat.

For example, suppose you are comparing the size of two German cities and the best predictor (cue) of size is whether they are a capital city. If neither is a capital city, you then move to the next best cue of whether they have a soccer team. If one does and the other doesn’t, select the city with the soccer team as being the larger.
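The steps above can be sketched as follows. This is a minimal illustration rather than the book’s implementation: the cue names, the binary coding of cues and the choice to treat a missing cue as 0 are all assumptions I have made for the example.

```python
from typing import Dict, List, Optional

Cues = Dict[str, int]  # cue name -> 1 (present) or 0 (absent)


def take_the_best(a: Cues, b: Cues, cues_by_validity: List[str]) -> Optional[str]:
    """Compare two options cue by cue, in order of decreasing validity.

    Stops at the first cue on which the options differ and picks the option
    with the higher cue value. Returns "a", "b", or None if no cue
    discriminates (in which case one would have to guess).
    """
    for cue in cues_by_validity:
        value_a = a.get(cue, 0)  # assumption: missing cue treated as absent
        value_b = b.get(cue, 0)
        if value_a != value_b:
            return "a" if value_a > value_b else "b"
    return None


# Hypothetical cue ordering and cue values for two non-capital cities
cue_order = ["is_capital", "has_soccer_team"]
city_a = {"is_capital": 0, "has_soccer_team": 1}
city_b = {"is_capital": 0, "has_soccer_team": 0}
larger = take_the_best(city_a, city_b, cue_order)
```

The search ends at the first discriminating cue, so any cues further down the list are never looked up. That early exit is where the frugality comes from.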

The general story is that in terms of fitting the full dataset, take-the-best performs well but is narrowly beaten by multiple regression (75% to 77% – although multiple regression was only fed cue direction, not quantitative variables). The closeness across the range of datasets suggests that the power of take-the-best is not just restricted to one environment.

The story changes more in favour of take-the-best when the assessment shifts to prediction out-of-sample, with multiple regression suffering a severe penalty. Regression accuracy dropped to 68%, whereas take-the-best dropped less to 71%.

There was also a model in the competition – the minimalist – which considered only a randomly chosen cue to see if it pointed in one direction or the other. If so, it selected that choice; otherwise it moved to another randomly chosen cue. The performance of the minimalist suggested frugality can be pushed too far, although it did perform only 3 percentage points below regression in out-of-sample prediction.
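A rough sketch of the minimalist, under the same kind of assumptions as any toy example: binary cues, hypothetical cue names, and a seeded random number generator passed in so the behaviour can be reproduced.

```python
import random
from typing import Dict, List, Optional

Cues = Dict[str, int]  # cue name -> 1 (present) or 0 (absent)


def minimalist(a: Cues, b: Cues, cues: List[str], rng: random.Random) -> Optional[str]:
    """Try cues in a random order and decide on the first one that discriminates."""
    order = list(cues)
    rng.shuffle(order)  # no knowledge of cue validity is used
    for cue in order:
        value_a = a.get(cue, 0)
        value_b = b.get(cue, 0)
        if value_a != value_b:
            return "a" if value_a > value_b else "b"
    return None  # no cue discriminates; a guess would be needed


# Hypothetical cues: only one of them distinguishes the two options
option_a = {"has_university": 1, "has_soccer_team": 1}
option_b = {"has_university": 1, "has_soccer_team": 0}
pick = minimalist(option_a, option_b, ["has_university", "has_soccer_team"], random.Random(0))
```

The only information the minimalist needs is the direction in which each cue points, not how good the cue is.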

The results of the challenge suggest that take-the-best tends not to sacrifice accuracy for its frugality. The relative performance of take-the-best is particularly strong when there is a low number of training examples, with regression having less chance of overfitting in larger environments. Regression tended to perform relatively worse when there were fewer examples per cue. One point that favoured take-the-best is that the trial didn’t have many large environments. Only two had more than 100 examples, and many had between 10 and 30.

The restriction of regression to use cue direction rather than the quantitative variable also dampened its effectiveness. If able to use quantitative predictors, regression tied take-the-best on 76% out of sample, even though take-the-best doesn’t use these quantitative values. There was effectively no penalty for the frugality.

A later chapter added computationally expensive Bayesian models to the competition. Bayesian networks won the competition on out-of-sample testing by three percentage points over take-the-best. Again, take-the-best did best relatively when there were small numbers of examples. The more frugal naive Bayes also did pretty well – falling somewhere between the two approaches.

The results suggest that each approach has its place. Use fast and frugal approaches when you need to be quick with low numbers of examples, and use Bayesian approaches when you have time, computational power and knowledge. This is where some of the examples start to feel dated, as the size of the datasets in many domains is rapidly growing in combination with cheaper computational power.

This dated feel is even more apparent in the competition between another heuristic, categorisation by elimination, and neural networks across 3 datasets.

Categorisation by elimination is a classification algorithm that walks through examples and cues, starting from the cue with the highest probability of success. If the example can be categorised, categorise it and move to the next example. If not, move to the next cue, with possible categories limited to those possible given earlier cues. Repeat until classified.
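For a single example, the elimination step might look something like the sketch below. The data structure (a lookup from each cue value to the categories consistent with it), the toy cue names and the tie-break at the end are my assumptions; the chapter’s own implementation may differ in these details.

```python
from typing import Dict, List, Set


def categorise_by_elimination(
    example: Dict[str, str],
    cue_order: List[str],
    cue_to_categories: Dict[str, Dict[str, Set[str]]],
    all_categories: Set[str],
) -> str:
    """Narrow the set of possible categories cue by cue until one remains."""
    possible = set(all_categories)
    for cue in cue_order:  # cues sorted by probability of success
        allowed = cue_to_categories[cue].get(example[cue], set())
        narrowed = possible & allowed
        if narrowed:  # assumption: ignore a cue that would eliminate everything
            possible = narrowed
        if len(possible) == 1:
            return next(iter(possible))  # categorised: stop searching
    return sorted(possible)[0]  # assumption: arbitrary but deterministic tie-break


# Hypothetical toy data: three categories and two binned cues
categories = {"setosa", "versicolor", "virginica"}
lookup = {
    "petal_length": {
        "short": {"setosa"},
        "medium": {"versicolor", "virginica"},
        "long": {"virginica"},
    },
    "petal_width": {
        "narrow": {"setosa", "versicolor"},
        "wide": {"virginica"},
    },
}
flower = {"petal_length": "medium", "petal_width": "narrow"}
label = categorise_by_elimination(flower, ["petal_length", "petal_width"], lookup, categories)
```

An example with a short petal is classified after one cue, while the example above needs two. The number of cues consulted varies from example to example.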

In measured performance, categorisation by elimination was only a few percentage points behind neural networks, although the datasets contained only 150, 178 and 8124 examples. The performance of neural networks also capped out at 100% on the largest dataset, which covered mushrooms (not bad when you are picking what to eat and face the consequences), and 94 and 96% on the other two. There wasn’t much room for a larger victory.

A couple of the chapters are also just a touch too keen to show the effectiveness of the simple heuristics. This was one such case. An additional competition was run giving neural networks only a limited number of cues, in which case their performance plunged. But these cues were chosen based on the number of cues used by categorisation by elimination, rather than a random selection.

The 37% rule

One interesting chapter is on the “secretary problem” and the resulting 37% rule. The basic idea is that you have a series of candidates you are interviewing for the role of secretary (this conception of the problem spread in the 1950s). You view each candidate one by one and must decide on the spot if you will stop your search there and hire the candidate in front of you. If you move to the next candidate, the past candidate is gone forever.

To maximise your probability of finding the best secretary, you should view 37% of the candidates without making any choice, and then accept the next candidate who is better than all you have seen to date. This rule gives (coincidentally) a 37% chance of ending up with the best candidate.

But this rule is not without risks. If the best candidate was in that first 37%, you will end up with the last person you see, effectively a random person from the population. So there is effectively a 37% chance of a random choice. Because of that random choice, the 37% rule leaves you with a 9% chance you will end up with someone in the bottom quartile.

But what if, like most people, you have a degree of risk aversion – particularly if you are applying the rule to serious questions such as mate choice. Suppose there are 100 candidates and you want someone out of the top 10%. In that case you only want to look at the first 14% of candidates and choose the next candidate who is better than all previous candidates. That gives you an 83% chance of a top 10% candidate. If you will settle for the top 25%, you only need look at the first 7% for a 92% chance of getting someone in the top quartile.

In larger populations, you need to look at even fewer. With 1000 people, you need only look at 3% of the candidates to maximise your chance of a top 10% candidate, at 97% probability. For a top 25% mate, you should only check out 1 to 2%.

The net result is that the 37% rule sets aspirations too high unless you will settle for nothing but the best. It is less robust than other rules.
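These numbers are easy to explore with a small simulation. The sketch below is my own, not the authors’ code: it assumes candidates arrive in random order, that ranks are strict (1 is the best), and that you are stuck with the last candidate if nobody beats the benchmark. The function names and the number of trials are arbitrary.

```python
import random
from typing import Dict


def search(n: int, look: int, rng: random.Random) -> int:
    """Return the rank (1 = best) of the candidate chosen by a cutoff rule.

    The first `look` candidates are only observed. After that, the first
    candidate better than everyone seen so far is accepted.
    """
    ranks = list(range(1, n + 1))
    rng.shuffle(ranks)
    best_seen = min(ranks[:look]) if look > 0 else n + 1
    for rank in ranks[look:]:
        if rank < best_seen:
            return rank
    return ranks[-1]  # nobody beat the benchmark: take the last candidate


def summarise(n: int, look: int, trials: int = 10000, seed: int = 0) -> Dict[str, float]:
    """Estimate how often the rule lands the best, a top 10% or a bottom 25% candidate."""
    rng = random.Random(seed)
    chosen = [search(n, look, rng) for _ in range(trials)]
    return {
        "best": sum(r == 1 for r in chosen) / trials,
        "top_10pct": sum(r <= n // 10 for r in chosen) / trials,
        "bottom_quartile": sum(r > n - n // 4 for r in chosen) / trials,
    }


if __name__ == "__main__":
    for look in (37, 14, 7):
        print(look, summarise(100, look))
```

Changing `look` and `n` lets you compare the 37% rule against shorter searches for whichever aspiration level you care about.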

This exploration points to the potential for a simple search heuristic. “Try a dozen” will generally outperform the 37% rule across most population sizes for getting a good but not perfect mate. “Try a few dozen” is a great rule for someone in New York who wants close to the best.

Then there is the issue that the success of the 37% rule depends on your own value. On finding the mate you will finally propose to, what is the probability that the two-sided choice will end up with them saying yes? In domains such as mate choice, only one or two people could get away with applying that rule – and that leads to a whole new range of considerations.

Odds and ends

The book is generally interesting throughout. Here are a few odds and ends:

  • One chapter argues that the hindsight bias is the product of a fast and frugal approach to recalling decisions. We update knowledge when it is received. If we cannot recall the original decision, we can approximate it by going through the same process as used to generate the decision last time. But if we have updated our knowledge, we get a new answer.
  • As mentioned, some chapters are a bit out of date. One chapter is on using heuristics to predict intention from motion. I expect neural networks will likely be in another league on domains such as this compared to when the book was written.
  • Another chapter is on investment in offspring. Heuristics such as invest in the oldest do almost as well as the optimal investment rules developed by Becker, despite their relative lack of complexity. The best rule for a particular time will depend on the harshness of the environment.

Domingos’s The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

My view of Pedro Domingos’s The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World depends on which part of the book I am thinking about.

The opening and the close of the book verge on techno-Panglossianism. The five chapters on the various “tribes” of machine learning, plus the chapter on learning without supervision, are excellent. And I simply don’t have the knowledge to judge the value of Domingos’s reports on his own progress to the master algorithm.

Before getting to the details, The Master Algorithm is a book on machine learning. Machine learning involves the development of algorithms that can learn from data. Domingos describes it as computers programming themselves, but I would prefer to describe it as humans engaging in a higher level of programming. Give the computer some data and the objective, provide a framework for developing the solution (each of Domingos’s tribes has a different approach to this), and let the computer develop it.

Machine learning’s value is becoming more apparent with increasing numbers of problems involving “big data” and mountains of variables that cannot feasibly be incorporated into explicitly designed programs. Tasks such as predicting the tastes of Amazon’s customers or deciding which updates to show each Facebook user are effectively intractable given the millions of choices available. In response, the Facebooks and Amazons of the world are designing learning algorithms that can use the massive amounts of data available to attempt to determine what their customers or users want.

Similarly, explicitly programming a self-driving car for every possible scenario is not feasible. But train it on massive amounts of data and it can learn to drive itself.

The master algorithm of the book’s title is a learning algorithm that can be used across all domains. Today there are five tribes (as categorised by Domingos), each with their own master algorithm. The ultimate master algorithm combines them into a general purpose learning machine.

The first tribe, the symbolists, believe in the power of logic. Their algorithms build sets of rules that can classify the examples in front of them. Induction, or as Domingos notes, inverse deduction, can be used to generate further rules to fill in the gaps.

To give the flavour of this approach, suppose you are trying to find the conditions under which certain genes are expressed. You run a series of experiments and your algorithm generates an initial set of rules from the results.

If the temperature is high, gene A is expressed.

If the temperature is high, genes B and D are not expressed.

If gene C is expressed, gene D is not.

Gaps in these rules can then be filled in by inverse deduction. From the above, the algorithm might induce If gene A is expressed and gene B is not, gene C is expressed. This could then be tested in experiments and possibly form the basis for further inductions. These rules are then applied to new examples to predict whether the gene will be expressed or not.

One tool in the symbolist toolbox is the decision tree. Start at the first rule, and go down the branch pointed to by the answer. Keep going until you reach the end of a branch. Considering massive bodies of rules together is computationally intensive, but the decision tree saves on this by ordering the rules and going through them one-by-one until you get the class. (This also solves the problem of conflicting rules.)
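Walking a decision tree is simple to sketch. The tree below is a hypothetical toy loosely based on the gene rules above; the nested-dictionary representation, the question names and in particular the final “D expressed” leaf are my inventions for illustration.

```python
from typing import Dict, Union

Tree = Union[str, dict]  # a leaf is a class label; an internal node is a dict


def classify(tree: Tree, example: Dict[str, bool]) -> str:
    """Follow the branch chosen by each answer until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        answer = example[node["question"]]
        node = node["branches"][answer]
    return node


# Hypothetical tree: questions are asked one at a time, in a fixed order
gene_d_tree = {
    "question": "temperature_high",
    "branches": {
        True: "D not expressed",
        False: {
            "question": "gene_c_expressed",
            "branches": {
                True: "D not expressed",
                False: "D expressed",  # invented leaf, not a rule from the text
            },
        },
    },
}

prediction = classify(gene_d_tree, {"temperature_high": False, "gene_c_expressed": True})
```

Each example only ever touches the rules along one path from the root to a leaf, which is why the ordering removes both the computational load and the possibility of two rules firing with conflicting answers.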

The second tribe are the connectionists. The connectionists take their inspiration from the workings of the brain. Similar to the way that connections between neurons in our brain are shaped by experience, the connectionists build a model of neurons and connect them in a network. The strength of the connections between the neurons is then determined by training on the data.

Of the tribes, the connectionists could be considered to be in the ascendency at the moment. Increases in computational power and data have laid the foundations for the success of their deep learning algorithms – effectively stacks or chains of connectionist networks – in applications such as image recognition, natural language processing and driving cars.

The third tribe are the evolutionaries, who use the greatest algorithm on earth as their inspiration. The evolutionaries test learning algorithms by their “fitness”, a scoring function as to how well the algorithm meets its purpose. The fitter algorithms are more likely to live. The successful algorithms are then mutated and recombined (sex) to produce new algorithms that can continue the competition for survival. Eventually an algorithm will find a fitness peak where further mutations or recombination do not increase the algorithm’s success.

A major contrast with the connectionists is the nature of evolutionary progress. Neural networks start with a predetermined structure. Genetic algorithms can learn their structure (although a general form would be specified). Backpropagation, the staple process by which neural networks are trained, starts from an initial random point for a single hypothesis but then proceeds deterministically in steps to the solution. A genetic algorithm has a sea of hypotheses competing at any one moment, with the randomness of mutation and sex potentially producing big jumps at any point, but also generating many useless algorithm children.
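To give a feel for the evolutionary loop, here is a minimal genetic algorithm sketch. It evolves bit strings rather than learning algorithms, and the fitness function (count the ones), population size, mutation rate and the rule that the fitter half survives unchanged are all assumptions chosen to keep the example small.

```python
import random
from typing import List, Tuple


def fitness(bits: List[int]) -> int:
    """Toy scoring function: the more ones, the fitter the individual."""
    return sum(bits)


def evolve(
    length: int = 20,
    pop_size: int = 30,
    generations: int = 40,
    mutation_rate: float = 0.02,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return the fittest individual and the best fitness seen in each generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]  # the fitter half lives on
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)  # recombination ("sex")
            child = mum[:cut] + dad[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best, best_by_generation = evolve()
```

Because the survivors are carried over unchanged, the best fitness never falls from one generation to the next. Once it stops rising, the population has found a peak, though not necessarily the highest one.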

The fourth tribe are the Bayesians. The Bayesians start with a set of hypotheses that could be used to explain the data, each of which has a probability of being true (their ‘priors’). Those hypotheses are then tested against the data, with those hypotheses that better explain the data increasing in their probability of being true, and those that can’t decreasing in their probability. This updating of the probability is done through Bayes’ Rule. The effective result of this approach is that there is always a degree of uncertainty – although often the uncertainty relating to improbable hypotheses is negligible.
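A single step of this updating can be sketched as below. The two coin hypotheses and their numbers are a hypothetical example of mine; the only thing taken as given is Bayes’ Rule itself, where the posterior is proportional to the prior times the likelihood of the data.

```python
from typing import Dict


def bayes_update(priors: Dict[str, float], likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Apply Bayes' Rule to a set of competing hypotheses.

    `likelihoods[h]` is the probability of the observed data if hypothesis h is true.
    """
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(unnormalised.values())  # total probability of the data
    return {h: p / evidence for h, p in unnormalised.items()}


# Hypothetical example: is a coin fair, or biased 80:20 towards heads?
priors = {"fair": 0.5, "biased": 0.5}
heads_likelihood = {"fair": 0.5, "biased": 0.8}

posterior = bayes_update(priors, heads_likelihood)  # after seeing one head
posterior = bayes_update(posterior, heads_likelihood)  # after seeing a second head
```

Each head shifts probability towards the biased hypothesis, but the fair hypothesis never reaches zero.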

This Bayesian approach is typically implemented through Bayesian networks, which are arrangements of events that each have specified probabilities and conditional probabilities (the probability that an event will occur conditional on another event or set of events occurring). To prevent explosions in the number of probability combinations required to specify a network, assumptions about the degree of independence between events are typically made. Despite these possibly unrealistic assumptions, Bayesian networks can still be quite powerful.

The fifth and final tribe are the analogisers, who, as the name suggests, reason by analogy. Domingos suggests this is perhaps the loosest tribe, and some members might object to being grouped together, but he suggests their common reliance on similarity justifies their common banner.

The two dominant approaches in this tribe are nearest neighbour and support vector machines. Domingos describes nearest neighbour as a lazy learner, in that there is no learning process. The work occurs when a new test example arrives and it needs to be compared across all existing examples for similarity. Each data point (or group of data points for k-nearest neighbour) is its own classifier, in that the new example is classified into the same class as that nearest neighbour. Nearest neighbour is particularly useful in recommender systems such as those run by the Netflixes and Amazons of the world.
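The laziness of nearest neighbour shows up clearly in code: there is no training function at all. This sketch assumes numeric features, Euclidean distance and a simple majority vote among the k closest examples; the data points and labels are made up.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (features, class label)


def knn_classify(query: Sequence[float], examples: List[Example], k: int = 3) -> str:
    """Classify `query` by majority vote among its k nearest stored examples."""
    by_distance = sorted(examples, key=lambda ex: math.dist(query, ex[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical stored examples: all the "learning" is just keeping them around
stored = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.9), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.1, 4.8), "b"),
]

label_near_a = knn_classify((1.1, 1.0), stored)
label_near_b = knn_classify((5.0, 4.9), stored, k=1)
```

All of the work happens inside `knn_classify`, when the new example arrives and has to be compared with every stored example.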

Support vector machines are a demonstration of the effectiveness of gratuitously complex models. Support vector machines classify examples by developing boundaries between the positive and negative examples, with a specified “margin” of safety between the examples. They do this by mapping the points into a hyper-dimensional space and developing boundaries that are straight lines. The examples along the margin are the “support vectors”.

Of Domingos’s tribes, I feel a degree of connection to them all. Simple decision trees can be powerful decision tools, despite their simplicity (or possibly because of it). It is hard not to admire the progress of the connectionists in recent years in not just technical improvement but also practical applications in areas such as medical imaging and driverless cars. Everyone seems to be a Bayesian nowadays (or wants to be), including me. And having played around with support vector machines a bit, I’m both impressed and perplexed by their potential.

From a machine learning perspective, it is the evolutionaries I feel possibly the least connection with. Despite my interest and background in evolutionary biology, it’s the one group I haven’t seen practically applied in any of the domains I operate in. I’ve read a few John Holland books and articles (Holland being one of the main protagonists in the evolutionary chapter) and always appreciate the ideas, but have never felt close to the applications.

Outside of the chapters on the five tribes, Domingos’s Panglossianism grates, but it is relatively contained to the opening and closing of the book. In Domingos’s view, the master algorithm will make stock market crashes fewer and smaller, and the play of our personal algorithms with everyone else’s will make our lives happier, longer and more productive. Every job will be better than it is today. Democracy will work better because of higher bandwidth communication between voters and politicians.

But Domingos gives little thought to what occurs where people have different algorithms, different objectives, different data they have trained their algorithm on and, in effect, different beliefs. Little thought is given to the complex high-speed interaction of these algorithms.

There are a few other interesting threads in the book worth highlighting. One is the idea that you need bias to learn. If you don’t have preconceived notions of the world, you could conceive of a world where everything you haven’t seen is the opposite of what you predict (known as the ‘No free lunch theorem’).

Another is the idea that once computers get to a certain level of advancement, the work of scientists will largely be trying to understand the outputs of computers rather than generate the outputs themselves.

So all up, a pretty good read. For a snapshot of the book, the EconTalk episode featuring Domingos is (as usual) excellent.

Coursera’s Executive Data Science Specialisation: A Review

As my day job has shifted toward a statistics and data science focus, I’ve been reviewing a lot of online materials to get a feel for what is available – both for my learning and to see what might be good training for others.

One course I went through was Coursera’s Executive Data Science Specialisation, created by Johns Hopkins University. Billed as the qualification to allow you to run a data science team, it is made up of five “one week” courses covering the basics of data science, building data science teams and managing data analysis processes.

There are some good parts to the courses, but despite the tagline that you will learn what you need to know “to begin assembling and leading a data science enterprise”, the specialisation falls some way short of that benchmark. For managers who have data scientists sitting under them, or who use a data science team in their organisation, it might give them a sense of what is possible and an understanding of how data scientists think. But it is not much more than that.

If I were to recommend any part of the specialisation, it would be the third and fourth courses – Managing Data Analysis and Data Science in Real Life (notes below). They offer a better crash course in data science than the first unit, A Crash Course in Data Science, and might help those unfamiliar with data science processes to understand how to think about statistical problems. That said, someone doing them with zero statistical knowledge will likely find themselves lost.

With Coursera’s subscription option you can subscribe to the specialisation for $50 or so per month, and smash through all five units in a few days (as I did, and you could do it in one day if you had nothing else on). From that perspective, it’s not bad value – although the only material change through paying versus auditing is the ability to submit the multiple choice quizzes. Otherwise, just pick videos that look interesting.

Here’s a few notes on the five courses:

  1. A Crash Course in Data Science: Not bad, but likely too shallow to give someone much feeling about data science. The later units provide a better crash course for managers as they focus on methodology and practice rather than techniques.
  2. Building a Data Science Team: Some interesting thoughts on the skills required in a team, but the material on managing teams and communication was generic.
  3. Managing Data Analysis: A good crash course in data science – better than the course with that title. Walks through the data science process.
  4. Data Science in Real Life: Another good crash course in data science, although you will likely need some statistical background to fully benefit. A reality check on how the data science process is likely to go relative to the perfect scenario.
  5. Executive Data Science Capstone: You appreciate the effort that went into producing an interactive “choose your own adventure”, but the entire effort was around half a dozen decisions in less than an hour.

Christian and Griffiths’s Algorithms to Live By: The Computer Science of Human Decisions

In a sea of books describing a competition between perfectly rational decision makers and biased humans who make systematic errors in the way they decide, Brian Christian and Tom Griffiths’s Algorithms to Live By: The Computer Science of Human Decisions provides a nice contrast.

Christian and Griffiths’s decision-making benchmarks are the algorithms developed by mathematicians, computer scientists and their friends. In that world, decision making under uncertainty involves major trade-offs between efficiency, accuracy and the types of errors you are willing to accept. As they write:

The solutions to everyday problems that come from computer science tell a different story about the human mind. Life is full of problems that are, quite simply, hard. And the mistakes made by people often say more about the intrinsic difficulties of the problem than about the fallibility of human brains. Thinking algorithmically about the world, learning about the fundamental structures of the problems we face and about the properties of their solutions, can help us see how good we actually are, and better understand the errors that we make.

Even where perfect algorithms haven’t been found, however, the battle between generations of computer scientists and the most intractable real-world problems has yielded a series of insights. These hard-won precepts are at odds with our intuitions about rationality, and they don’t sound anything like the narrow prescriptions of a mathematician trying to force the world into clean, formal lines. They say: Don’t always consider all your options. Don’t necessarily go for the outcome that seems best every time. Make a mess on occasion. Travel light. Let things wait. Trust your instincts and don’t think too long. Relax. Toss a coin. Forgive, but don’t forget. To thine own self be true.

And as they close:

The intuitive standard for rational decision-making is carefully considering all available options and taking the best one. At first glance, computers look like the paragons of this approach, grinding their way through complex computations for as long as it takes to get perfect answers. But as we’ve seen, that is an outdated picture of what computers do: it’s a luxury afforded by an easy problem. In the hard cases, the best algorithms are all about doing what makes the most sense in the least amount of time, which by no means involves giving careful consideration to every factor and pursuing every computation to the end. Life is just too complicated for that.

Here are a few examples.

Suppose you face a choice between two uncertain options. Those options have an expected value – the probability-weighted average result. If your objective is to maximise your outcome, you pick the option with the highest expected value.

But what if your objective is to minimise regret – the feeling of pain when you look back at what you did compared to what you could have done? In that case it may be worth looking at the confidence intervals around that expected value – the plausible ranges in which the actual value could lie. Picking the option which has the highest upper confidence bound – the highest plausible value – is the rational approach, even if it has the lower expected value. It is “optimism” in the way a behavioural scientist might frame it, but for an objective of minimising regret, it is rational.
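A minimal sketch of that rule in Python, assuming each option is summarised by a list of observed payoffs and that "mean plus a multiple of the standard error" is a good enough stand-in for the top of the plausible range. The function names, the payoffs and the multiplier of 1.96 are illustrative assumptions, not taken from the book.

```python
import math
import statistics


def upper_confidence_bound(payoffs, z=1.96):
    """Mean plus z standard errors: a rough upper end of the plausible range."""
    mean = statistics.mean(payoffs)
    std_err = statistics.stdev(payoffs) / math.sqrt(len(payoffs))
    return mean + z * std_err


def pick_optimistically(options, z=1.96):
    """Return the name of the option with the highest upper confidence bound."""
    return max(options, key=lambda name: upper_confidence_bound(options[name], z))


# Hypothetical payoffs: "familiar" has the higher average, but "novel" has
# been tried only a few times, so its plausible range is much wider.
options = {
    "familiar": [10, 11, 9, 10, 10, 11, 9, 10],
    "novel": [2, 16, 9],
}

best_by_mean = max(options, key=lambda name: statistics.mean(options[name]))
best_by_ucb = pick_optimistically(options)
```

With these made-up numbers the two rules disagree: the familiar option wins on average payoff, while the barely explored option wins on its upper bound, because so little is known about it that it could still turn out to be much better.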


Or consider memory. From a computer science perspective, memory is often not a question of storage but of organisation – particularly in today’s world of cheap storage. How does a computer predict which items it will want from its memory in the future such that they are accessible within a reasonable time? Faced with that problem, it makes sense to forget things. In particular, it is often useful to forget things with time – those items least recently used. The human mind mimics this strategy, as more recently used items are more likely to be used in the future. It is too expensive to maintain access to an unbounded number of items.
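The "forget the least recently used item" strategy can be sketched in a few lines of Python. This is an illustrative toy, assuming a fixed capacity and using the standard library's `OrderedDict`; the class name and the stored items are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Keep at most `capacity` items, forgetting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

    def keys(self):
        """Keys ordered from least to most recently used."""
        return list(self._items)


cache = LRUCache(capacity=2)
cache.put("keys", "hallway table")
cache.put("wallet", "coat pocket")
cache.get("keys")           # recalling "keys" makes it recent again
cache.put("phone", "desk")  # over capacity: "wallet" is forgotten
```

The item that drops out is the one that has gone longest without being touched, not the oldest one stored.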

One chapter of the book covers the idea of “less is more”, which you may be familiar with if you know the work of Gerd Gigerenzer and friends. The idea behind “less is more” is that it is often rational to ignore information in making decisions to prevent “overfitting”. Overfitting is an over-sensitivity to the observed data in developing a model. The inclusion of every detail helps the model match the observed data, but prevents generalisation to new situations, so predictions based on new data lack reliability.

To avoid overfitting you might deliberately exclude certain factors, impose penalties for including factors in analysis, or stop the analysis early. There is no shortage of computer science or machine learning applications where these types of strategies, often employed by humans in their decision making, can result in better decisions.
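As a rough illustration of the "impose penalties for including factors" option, here is a small Python sketch. It assumes we already know how well models with different numbers of factors fit the observed data, and charges a fixed cost per factor. The function name, the error values and the penalty are all hypothetical.

```python
def select_model(candidates, penalty_per_factor):
    """Pick the candidate with the lowest penalised error.

    `candidates` maps a number of factors to the error measured on the
    observed data; each extra factor costs `penalty_per_factor`.
    """
    return min(
        candidates,
        key=lambda n_factors: candidates[n_factors] + penalty_per_factor * n_factors,
    )


# Hypothetical errors on the observed data: adding factors always improves
# the fit, but with sharply diminishing returns.
fit_error = {1: 10.0, 2: 6.0, 3: 5.5, 8: 5.0}

unpenalised_choice = select_model(fit_error, penalty_per_factor=0.0)
penalised_choice = select_model(fit_error, penalty_per_factor=1.0)
```

Without a penalty the eight-factor model wins because it matches the observed data most closely. With the penalty, the two-factor model wins: the extra factors do not buy enough improvement to justify themselves.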

Christian and Griffiths suggest that evolution tends not to overfit as it is constrained by existing infrastructure and time – features of the environment need some degree of persistence before adaptations to that environment spread, preventing major changes in response to short-term phenomena. Preventing overfitting is also a benefit of a conservative bias in society – preventing us getting caught up in the boom and bust of fads.

There are times in the book where Christian and Griffiths jump too far from experiment or algorithm to real world application. As an example, they suggest that analysis of routing tells us not to try to control traffic congestion using a top down coordinator, as the selfish solution is only 33% worse than best case top down coordination. They give little thought to whether congestion has more dimensions of control than just routing. The prisoner’s dilemma chapter also seemed shallow at points – possibly reflecting that it is the area for which I already had the most understanding.

But those are small niggles about an otherwise excellent book.

Best books I read in 2016

The best books I read in 2016 – generally released in other years – are below (in no particular order). For the non-fiction books, the links lead to my reviews.

Joe Henrich’s The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter (2015): A lot of interesting ideas, but left me with a lot of questions.
Garett Jones’s Hive Mind: How Your Nation’s IQ Matters So Much More Than Your Own (2015): A fantastic exposition of some important but neglected features of the world.
Phil Rosenzweig’s Left Brain, Right Stuff: How Leaders Make Winning Decisions (2014): An entertaining examination of how behavioural economics findings hold up for real world decision making.
Philip Tetlock’s Expert Political Judgment: How Good Is It? How Can We Know? (2006): I re-read this before reading Tetlock’s also good Superforecasting, but Expert Political Judgment is the superior of the two books.
Jonathan Last’s What to Expect When No One’s Expecting: America’s Coming Demographic Disaster (2014): Much to disagree or argue with, but entertaining and a lot to like.
John Kay’s Other People’s Money (2015): A fantastic critique of the modern financial system and regulation.
Fyodor Dostoevsky’s Crime and Punishment: The classic I most enjoyed.

Newport’s So Good They Can’t Ignore You: Why Skills Trump Passion in the Quest for Work You Love

I suspect I would have enjoyed Cal Newport’s So Good They Can’t Ignore You more if it had been written by a grumpy armchair economist. Newport’s advice is just what you would expect that economist to give:

  • Get good at what you do (build human capital), then someone might be willing to pay you for it. If you simply follow your passion but you don’t offer anything of value, you likely won’t succeed.
  • If you become valuable, you might be able to leverage that value into control of your career and a mission. Control without value is dangerous – ask anyone who tries to set up their own business or passive income website without having something that people are willing to pay for.

Since we have Newport’s version and not the grumpy economist’s, the advice is framed somewhat less bluntly and Newport tells us a series of stories about people who became good (through deliberate practice) and leveraged that skill into a great career. It’s not easy to connect with many of the examples – TV hosts, script writers, organic farmers, a programmer working at the boundary of programming and music – but I suppose they are more interesting than stories of those in dreary jobs who simply bite the bullet, skill up and get promoted.

In TED / self-help style, Newport introduces us to a new set of buzzwords (“career capital”, “craftsman mindset” etc.) and “laws”. I’m glad Newport independently discovered the “law of financial viability” – do what people are willing to pay for – but at many points of the book we are left witnessing a battle between common sense and “conventional wisdom” rather than the discovery of new deep insights.

One piece of advice that the economist might not have given was how to find a mission. Newport’s advice is that you should become so skilled that you are at the frontier of your field. You then might be able to see new possibilities in the “adjacent possible” that you can turn into a mission. And not only does the approach need to be cutting edge, it should also be “remarkable”, defined as being so compelling that people remark on it and it can be launched in a venue that compels remarking (luckily Newport has the venue of peer review…). I suspect this might be interesting advice for a few people, but not a lot of help for the average person stuck behind a desk.

Despite entering the book with a high degree of mood affiliation – I believe the basic argument is right – there was little in the book that convinced me either way. The storytelling and buzzwords were accompanied by little data. Threads such as those on the 10,000 hours rule and the unimportance of innate ability were somewhat off-putting.

That said, some points were usefully framed. Entering the workplace expecting to follow your passion will likely lead to chronic unhappiness and job shifting. Instead, suck it up and get good. There are a lot of books and blogs encouraging you to follow your passion, and most of them are garbage. So if more people follow Newport’s fluffed up way of giving some basic economic advice, that seems like a good thing.

Rosenzweig’s Left Brain, Right Stuff: How Leaders Make Winning Decisions

I was triggered to write my recent posts on overconfidence and the illusion of control – pointing to doubts about the pervasiveness of these “biases” – by Phil Rosenzweig’s entertaining Left Brain, Right Stuff: How Leaders Make Winning Decisions. Some of the value of Rosenzweig’s book comes from his examination of some classic behavioural findings, as those recent posts show. But much of Rosenzweig’s major point concerns the application of behavioural findings to real-world decision making.

Rosenzweig’s starting point is that laboratory experiments have greatly added to our understanding about how people make decisions. By carefully controlling the setup, we are able to focus on individual factors affecting decisions and tease out where decision making might go wrong (replication crisis notwithstanding). One result of this body of work is the famous catalogue of heuristics and biases where we depart from the model of the perfectly rational decision maker.

Some of this work has been applied with good results to areas such as public policy, finance or forecasting political and economic events. Predictable errors in how people make decisions have been demonstrated, and in some cases substantial changes in behaviour have been generated by changing the decision environment.

But as Rosenzweig argues  – and this is the punchline of the book – this research does not easily translate across to many areas of decision-making. Laboratory experiments typically involve choices from options that cannot be influenced, involve absolute payoffs, provide quick feedback, and are made by individuals rather than leaders. Change any of these elements, and crude applications of the laboratory findings to the outside world can go wrong. In particular, we should be careful not to compare predictions of an event with scenarios where we can influence the outcomes and will be in competition with others.

Let’s take the first, whether outcomes can be influenced. Professional golfers believe they sink around 70 per cent of their 6 foot putts, compared to an actual success rate closer to 55 per cent. This is typically labelled as overconfidence and an error (although see my recent post on overconfidence).

Now, is this irrational? Not necessarily, suggests Rosenzweig, as the holder of the belief can influence the outcome. Thinking you are better at sinking six-foot putts than you actually are will increase the chance that you will.

In one experiment, participants putted toward a hole that was made to look bigger or smaller by using lighting to create an optical illusion. Putting from a little less than six feet, the (amateur) participants sank almost twice as many putts when putting toward the larger looking hole. They were more likely to sink the putts when it appeared an easier task.

This points to the question of whether we want to ward off biases. Debiasing might be good practice if you can’t influence the outcome, but if it’s up to you to make something happen, that “bias” might be an important part of making it happen.

More broadly, there is evidence that positive illusions allow us to take action, cope with adversity and persevere in the face of competition. Positive people have more friends and stronger social bonds, suggesting a “healthy” person is not necessarily someone who sees the world exactly as it is.

Confidence may also be required to lead people. If confidence is required to inspire others to succeed, it may be necessary rather than excessive. As Rosenzweig notes, getting people to believe they can perform is the supreme act of leadership.

A similar story about the application of laboratory findings is the difference between relative and absolute payoffs. If the competition is relative, playing it safe may be guaranteed failure. The person who comes out ahead will almost always be the one who takes the bigger risk, meaning that an exaggerated level of confidence may be essential to operate in some areas – although as Rosenzweig argues, the “excessive” risk may be calculated.

One section of the book focuses on people starting new ventures. With massive failure rates – around 50% failure after five years (depending on your study) – it is common for entrepreneurs to be said to be overconfident or naive. Sometimes their “reckless ambition” and “blind faith” is praised as necessary for the broader economic benefits that flow from new business formation. (We rarely hear people lamenting we aren’t starting enough businesses).

Rosenzweig points to evidence that calls this view into question – from the evidence of entrepreneurs as persistent tinkerers rather than bold arrogant visionaries, to the constrained losses they incur in the event of failure. While there are certainly some wildly overconfident entrepreneurs, closure of their business should not always be taken as failure and overconfidence as the cause. There are many types of errors – calculation, memory, motor skills, tactics etc. – and even good decisions sometimes turn out badly. Plus, as many as 92% of firms close with no debt – 25% with a profit.

Rosenzweig also notes evidence that, at least in an experimental setting, entrepreneurs enter at less than optimal rates. As noted in my recent post on overconfidence, people tend to overplace themselves relative to the rest of the population for easy tasks (e.g. most drivers believe they are above average). But for hard tasks, they underplace. In experiments by Don Moore and friends on firm entry, they found a similar effect – excess entry when the industry appeared an easy one in which to compete, but too few entered when it appeared difficult. Hubristic entrepreneurs didn’t flood into all areas, and myopia about one’s own and competing firms’ abilities appears a better explanation for what is occurring than overconfidence.

There is the occasional part of the book that falls flat with me – the section on the limitations of mathematical models and some of the story telling around massive one-off decisions – but it’s generally a fine book.

* Listen to Russ Roberts interview Rosenzweig on Econtalk for a summary of some of the themes from the book.

The illusion of the illusion of control

In the spirit of my recent post on overconfidence, the illusion of control is another “bias” where imperfect information might be a better explanation for what is occurring.

The illusion of control is a common finding in psychology that people believe they can control things that they cannot. People would prefer to pick their lottery numbers than have them randomly allocated – being willing to even pay for the privilege. In laboratory games, people often report having control over outcomes that were randomly generated.

This effect was labelled by Ellen Langer as the illusion of control (for an interesting read about Langer’s other work, see here). The decision making advice that naturally flows out of this – and you will find in plenty of books building on the illusion of control literature – is that we need to recognise that we can control less than we think. Luck plays a larger role than we believe.

But when you ask about people’s control of random events, which is the typical experimental setup in this literature, you can only get errors in one direction – the belief that they have more control than they actually do. It is not possible to believe you have less than no control.

So what do people believe in situations where they do have some control?

In Left Brain, Right Stuff, Phil Rosenzweig reports on research (pdf) by Francesca Gino, Zachariah Sharek and Don Moore in which people have varying degrees of control over whether clicking a mouse would change the colour of the screen. For those that had no or little control (clicking the mouse worked 0% or 15% of the time), the participants tended to believe they had more control than they did – an illusion of control.

But when it came to those who had high control (clicking the mouse worked 85% of the time), they believed they had less control than they did. Rather than having an illusion of control, they failed to recognise the degree of control that they had. The one point where there was accurate calibration was when there was 100% control.

The net finding of this and other experiments is that we don’t systematically have an illusion of control. Rather, we have imperfect information about our level of control. When low, we tend to overestimate. When high (but not perfect), we tend to underestimate.

That the illusion of control was previously seen to be largely acting in one direction was due to experimental design. When people have no control and can only err in one way, that is naturally what will be found. Gino and friends term this problem the illusion of the illusion of control.

So when it comes to decision making advice, we need to be aware of the context. If someone is picking stocks or something of that nature, the illusion of control is not helpful. But in their day-to-day life where they have influence over many outcomes, underestimating control could be a serious error.

Should we be warning against underestimating control? If we were to err consistently in one direction, it is not clear to me that having an illusion of control is of greater concern. Maybe we should err on the side of believing we can get things done.

*As an aside, there is a failed replication (pdf) of one of Langer’s 1975 experiments from the paper for which the illusion is named.

Overconfident about overconfidence

In 1995 Werner De Bondt and Richard Thaler wrote “Perhaps the most robust finding in the psychology of judgment and choice is that people are overconfident.” They have hardly been alone in making such a proclamation. And looking at the evidence, they seem to have a case. Take the following examples:

  • When asked to estimate the length of the Nile by providing a range the respondent is 90% sure contains the correct answer, the estimate typically contains the correct answer only 50% of the time.
  • PGA golfers typically believe they sink around 75% of 6 foot putts – some even believe they sink as many as 85% – when the average is closer to 55%.
  • 93% of American drivers rate themselves as better than average. 25% of high school seniors believe they are in the top 1% in ability to get along with others.

There is a mountain of similar examples, all seemingly making the case that people are generally overconfident. 

But despite all being labelled as showing overconfidence, these examples are actually quite different. As pointed out by Don Moore and Paul Healy in “The Trouble with Overconfidence” (pdf), several different phenomena are being captured. Following Moore and Healy, let’s call them overprecision, overestimation and overplacement.

Overprecision is the tendency to believe that our predictions or estimates are more accurate than they actually are. The typical study seeking to show overprecision asks for someone to give confidence ranges for their estimates, such as estimating the length of the Nile. The evidence that we are overprecise is relatively robust (although I have to admit I haven’t seen any tests asking for 10% confidence intervals).

Overestimation is the belief that we can perform at a level beyond that which we realistically can (I tend to think of this as overoptimism). The evidence here is more mixed. When attempting a difficult task such as a six foot putt, we typically overestimate. But on easy tasks, the opposite is often the case – we tend to underestimate our performance. Whether over or underestimation occurs depends upon the domain.

Overplacement is the erroneous relative judgment that we are better than others. Obviously, we cannot all be better than average. But this relative judgment, like overestimation, tends to vary with task difficulty. For easy tasks, such as driving a car, we overplace and consider ourselves better than most. But as Phil Rosenzweig points out in his book Left Brain, Right Stuff (which contains a great summary of Moore and Healy’s paper), ask people where they rate for a skill such as drawing, and most people will rate themselves as below average. People don’t suffer from pervasive overplacement. Whether they overplace depends on what the situation is.

You might note from the above that we tend to both underestimate and overplace our performance on easy tasks. We can also overestimate but underplace our performance on difficult tasks.

So are we both underconfident and overconfident at the same time? The blanket term of overconfidence does little justice to what is actually occurring.

Moore and Healy’s explanation for what is going on in these situations is that, after performing a task, we have imperfect information about our own performance, and even less perfect information about that of others. As Rosenzweig puts it, we are myopic, which is a better descriptor of what is going on than saying we are biased.

Consider an easy task. We do well because it is easy. But because we imperfectly assess our performance, our assessment is regressive – that is, it tends to revert to the typical level of performance. Since we have even less information about others, our assessment of them is even more regressive. The net result is we believe we performed worse than we actually did but better than others.
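The arithmetic behind that regressive story can be sketched in Python. This is a toy illustration rather than Moore and Healy's actual model: it assumes everyone really scores the same, that beliefs are a weighted average of the actual score and a "typical" score, and that we put more weight on evidence about ourselves than about others. The weights and scores are made-up values.

```python
def regressive_estimate(actual, typical, weight_on_evidence):
    """Shrink an estimate from the actual score toward the typical score.

    A weight of 1 means perfect information; lower weights mean noisier
    information and an estimate closer to the typical level.
    """
    return typical + weight_on_evidence * (actual - typical)


def beliefs(actual_score, typical=5.0, own_weight=0.75, others_weight=0.25):
    """Beliefs about own and others' scores when everyone scores the same."""
    own = regressive_estimate(actual_score, typical, own_weight)
    others = regressive_estimate(actual_score, typical, others_weight)
    return own, others


easy_own, easy_others = beliefs(actual_score=9.0)  # easy quiz: all score 9/10
hard_own, hard_others = beliefs(actual_score=1.0)  # hard quiz: all score 1/10
```

On the easy quiz we believe we scored 8 (below the actual 9) and that others scored 6, so we underestimate and overplace at the same time. On the hard quiz the beliefs are 2 for ourselves and 4 for others, so we overestimate and underplace.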

Rosenzweig provides a couple of more intuitive examples of myopia at work. Taking one, we know about our excellent driving record and that there are plenty of people out there who die in car accidents. With a narrow view of that information, it seems logical to place yourself above average.

But when considering whether we are an above or below average juggler, the knowledge of our own ineptitude and the knowledge of the existence of excellent jugglers makes for a myopic assessment of being below average. In one example Rosenzweig cites, 94% of students believed they would be below average in a quiz on indigenous Amazon vegetation – hardly a tendency for overplacement, but rather the result of myopic consideration of the outcomes from a difficult task.

The conflation of these different effects under the umbrella of overconfidence often plays out in stories of how overconfidence (rarely assessed before the fact) led to someone’s fall. Evidence that people tend to believe they are better drivers than average (overplacement) is not evidence that overconfidence led someone to pursue a disastrous corporate merger (overestimation). Evidence that people tend to be overprecise in estimating the year of Mozart’s birth is not evidence that hubris led the US into the Bay of Pigs fiasco.

Putting this together, the claims we are systematically overconfident can be somewhat overblown and misapplied. I am not sure Moore and Healy’s labelling is the best available, but recognising the differing forces are at play seems important in understanding how “overconfidence” affects our decisions.

Henrich’s The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter

When humans compete against chimps in tests of working memory, information processing or strategic play, chimps often come out on top. If you briefly flash 10 digits on a screen before covering them up, a trained chimp will often better identify the order in which the numbers appeared (see here). Have us play matching pennies, and the chimp can converge on the predicted (Nash equilibrium) result faster than the slow to adapt humans.

So given humans don’t appear to dominate chimps in raw brain power (I’ll leave contesting this particular fact until another day), what can explain the ecological dominance of humans?

Joe Henrich’s answer to this question, laid out in The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter, is that humans are superior learning machines. Once there is an accumulated stock of products of cultural evolution – fire, cutting tools, clothing, hunting tools and so on – natural selection favoured those who were better cultural learners. Natural selection shaped us to be a cultural species, as Henrich explains:

The central argument in this book is that relatively early in our species’ evolutionary history, perhaps around the origin of our genus (Homo) about 2 million years ago … cultural evolution became the primary driver of our species genetic evolution. The interaction between cultural and genetic evolution generated a process that can be described as autocatalytic, meaning that it produces the fuel that propels it. Once cultural information began to accumulate and produce cultural adaptations, the main selection pressure on genes revolved around improving our psychological abilities to acquire, store, process, and organize the array of fitness-enhancing skills and practices that became increasingly available in the minds of the others in one’s group. As genetic evolution improved our brains and abilities for learning from others, cultural evolution spontaneously generated more and better cultural adaptations, which kept the pressure on for brains that were better at acquiring and storing this cultural information.

The products of cultural evolution make us (in a sense) smarter. We receive a huge cultural download when growing up, from a base 10 counting system, to a large vocabulary allowing us to communicate complex concepts, to the ability to read and write, not to mention the knowhow to survive. Henrich argues that we don’t have all these tools because we are smart – we are smart because we have these tools. These cultural downloads can’t be devised in a few years by a few smart people. They comprise packages of adaptations developed over generations.

As one illustration of this point, Henrich produces a model where people can be either geniuses who produce more ideas, or social with more friends. Parameterise the model right and social groups end up much “smarter” with a larger stock of ideas. It is better to be able to learn and have more friends to learn from (again, within certain parameters) than to have fewer, smarter friends. The natural extension of this is that larger populations will have more complex technologies (as Michael Kremer and others have argued – although see my extension on the evolving capacity to generate ideas).
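A toy version of this kind of model can be written in a few lines of Python. It is a sketch under my own assumptions, not necessarily the model as Henrich specifies it: each person either invents an idea alone or picks it up from at least one friend who already has it, and we iterate to the long-run share of the group holding the idea. The parameter values are illustrative.

```python
def share_with_idea(invent_prob, n_friends, learn_prob, rounds=200):
    """Long-run share of a group that holds an idea.

    Each person either invents the idea themselves or learns it from at
    least one friend who already has it.
    """
    share = 0.0
    for _ in range(rounds):
        # Chance of failing to learn the idea from every one of your friends.
        miss_all_friends = (1 - learn_prob * share) ** n_friends
        share = 1 - (1 - invent_prob) * miss_all_friends
    return share


# Assumed values: geniuses invent 100 times as often but have one friend;
# the social group rarely invents but each person has ten friends.
geniuses = share_with_idea(invent_prob=0.1, n_friends=1, learn_prob=0.5)
socialites = share_with_idea(invent_prob=0.001, n_friends=10, learn_prob=0.5)
```

With these assumed values, roughly 18% of the genius group ends up with the idea, against over 99% of the social group. Change the parameters enough (say, make learning from a friend very unlikely) and the advantage shrinks, which is the "within certain parameters" caveat.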

One interesting feature of these cultural adaptations is that the bearers don’t necessarily understand how they work. They simply know how to effectively use them. An example Henrich draws on is the food processing techniques developed over generations to remove toxins from otherwise inedible plants. People need, to a degree, to learn on faith. An unwillingness to learn can kill.

Take the consumption of unprocessed manioc (cassava), which can cause cyanide poisoning. South American groups that have consumed it for generations have developed multi-stage processes involving grating, scraping, separating, washing, boiling and waiting. Absent those, the poisoning emerges slowly after years of eating. Given the non-obvious nature of the negative outcomes and link between the practices and outcomes, the development of processing techniques is a long process.

When manioc was transported from South America to West Africa by the Portuguese, minus the cultural protocols, the result has been hundreds of years of cyanide poisoning. The problem remains today. Some African groups have evolved processing techniques to remove the cyanide, but these are only slowly spreading.

Beyond the natural selection for learning ability, Henrich touches on a few other genetic and biological angles. One of the more interesting is the idea that gene-culture co-evolution can lead to non-genetic biological adaptations. The culture we are exposed to shapes our minds during development, leading to taxi drivers in London having a larger hippocampus, or people from different cultures having different perceptual ability when it comes to judging relative or absolute size. Growing up in different cultures also alters fairness motivations, patience, response to honour threats and so on.

Henrich is right to point out that his argument does not imply that seeing differences between groups implies cultural differences. They could be genetic, and different cultures over time could have moulded group differences. That said, Henrich also suggests genes play a tiny role, although it’s not a position brimming with analysis. As an example, he points out the high levels of violence among Scottish immigrants in the US Deep South who transported and retained an honour culture, compared to the low levels of violence in Scotland itself (or New England where there were also Scottish immigrants), without investing much effort in exploring other possibilities.

Henrich briefly addresses some of the competing hypotheses for why we evolved large brains and developed a theory of mind (the ability to infer others’ goals). For example, the Machiavellian hypothesis posits that our brains evolved to outthink each other in strategic competition. As Henrich notes, possessing a theory of mind can also lead us to more effectively copy and learn from others (the cultural intelligence hypothesis). Successful Machiavellians must be good cultural learners – you need to learn the rules before you can bend them.

Since the release of Henrich’s book, I have seen little response from the Steven Pinkers and evolutionary psychologists of the world, and I am looking forward to some critiques of Henrich’s argument.

So let me pose a few questions. As a start, until the last few hundred years, most of the world’s population didn’t use a base 10 counting system, couldn’t write and so on. Small scale societies might have a vocabulary of 3,000 to 5,000 words, compared to the 40,000 to 60,000 words held in the mind of the typical American 17-year old. The cultural download has shifted from something that could be passed on in a few years to something that takes a couple of decades of solid learning. Why did humans have so much latent capacity to increase the size of the cultural download? Was that latent capacity possibly generated by other mechanisms? Or has there been strong selection to continue to increase the stock of cultural knowledge we can hold?

Second, is there any modern evidence for the success of those who have better cultural learning abilities? We have evidence of the higher reproductive success of those who kill in battle (see Napoleon Chagnon’s work) or those with higher income. What would an equivalent study to show the higher reproductive success of better cultural learners look like (assuming selection for that trait is still ongoing)? Or is it superior learning ability that leads people to have higher success in battle or greater income? And in that case, are we just talking IQ?

Having been reading about cultural evolution for a few years now, I still struggle to grasp the extent to which it is a useful framework.

Partly, this question arises due to the lack of a well-defined cultural evolution framework. The definition of culture is often loose (see Arnold Kling on Henrich’s definition) and it typically varies between cultural evolution proponents. Even once it is defined, what is the evolutionary mechanism? If it is natural selection, what is the unit of selection? And so on.

Then there is the question of whether evolution is the right framework for all the forms of cultural transmission. Are models for the spread of disease a better fit? You will find plenty of discussions of this type of question across the cultural evolution literature, but little convergence.

Contrast cultural evolution with genetic natural selection. In the latter, high fidelity information is transmitted from parent to offspring in particulate form. Cultural transmission (whatever the cultural unit is) is lower-fidelity and can be in multiple directions. For genetic natural selection, selection is at the level of the gene, but the future of a gene and its vessels are typically tightly coupled within a generation. Not so with culture. As a result we shouldn’t expect to see the types of results we see in population/quantitative genetics in the cultural sphere. But can cultural evolution get even close?

You get a flavour of this when you look through the bespoke models produced in Henrich’s past work or, say, the work by Boyd and Richerson. Lots of interesting thinking tools and models, but hardly a unified framework.

A feature of the book that I appreciated was that Henrich avoided framing the group-based cultural evolutionary mechanisms he describes as “group selection”, preferring instead to call them “intergroup competition” (the term group selection only appears in the notes). In the cultural evolution space, group selection is a label that tends to be attached to all sorts of dynamics – whether they resemble genetic group selection processes or not – only leading to confusion. Henrich notes at one point that there are five forms of intergroup competition. Perhaps one of these might be described as approaching a group selection mechanism. (See West and friends on the point that, in much of the cultural evolution literature, group selection is used to refer to many different things.) By avoiding going down this path, Henrich has thankfully not added to the confusion.

One thread that I have rarely seen picked up in discussion of the book (excepting Arnold Kling) is the inherently conservative message that can be taken out of it. A common story through the book is that the bearers of cultural adaptations rarely understand how they work. In that world, one should be wary of replacing existing institutions or frameworks.

When Henrich offers his closing eight “insights”, he also seems to be suggesting we use markets (despite the absence of that word). Don’t design what we believe will work and impose it on all:

Humans are bad at intentionally designing effective institutions and organizations, although I’m hoping that as we get deeper insights into human nature and cultural evolution this can improve. Until then, we should take a page from cultural evolution’s playbook and design “variation and selection systems” that will allow alternative institutions or organizational forms to compete. We can dump the losers, keep the winners, and hopefully gain some general insights during the process.