Replicating anchoring effects

The classic Ariely, Loewenstein, and Prelec experiment (ungated pdf) ran as follows. Students were asked to treat the last two digits of their social security number – essentially a random number – as a dollar price. They were then asked whether they would be willing to buy certain consumer goods for that price. Finally, they were asked the most they would be willing to pay for each of these goods.

The result was that those with a higher starting price – that is, a higher last two digits on their social security number – were willing to pay more for the consumer goods. That random number “anchored” how much they were willing to pay.

In Is Behavioural Economics Doomed? (review to come soon), David Levine mentions the following attempted replication:

On the Robustness of Anchoring Effects in WTP and WTA Experiments (ungated pdf)

Drew Fudenberg, David K. Levine, and Zacharias Maniadis

We reexamine the effects of the anchoring manipulation of Ariely, Loewenstein, and Prelec (2003) on the evaluation of common market goods and find very weak anchoring effects. We perform the same manipulation on the evaluation of binary lotteries, and find no anchoring effects at all. This suggests limits on the robustness of anchoring effects.

And from the body of the article:

Our first finding is that we are unable to replicate the results of ALP [Ariely, Loewenstein, and Prelec]: we find very weak anchoring effects both with WTP [willingness to pay] and with WTA [willingness to accept]. The Pearson correlation coefficients between the anchor and stated valuation are generally much lower than in ALP, and the magnitudes of the anchoring effects (as measured by the ratio of top to bottom quintile) are smaller. Repeating the ALP procedure for lotteries we do not find any anchoring effects at all.

Unlike ALP, we carried out laboratory rather than classroom experiments. This necessitated some minor changes—discussed below—from ALP’s procedures. It is conceivable that these changes are responsible for the differences in our findings; if so the robustness of their results is limited.

Our results do not confirm the very strong anchoring effects found in ALP. They are more in agreement with the results of Simonson and Drolet (2004) and Alevy, Landry, and List (2011). Simonson and Drolet (2004) used the same SSN-based anchor as ALP, and found no anchoring effects on WTA, and moderate anchoring effects on WTP for four common consumer goods. Alevy, Landry, and List (2011) performed a field experiment, eliciting the WTP for peanuts and collectible sports cards, and they found no anchoring effects. Bergman et al. (2010) also used the design of ALP for six common goods, and found anchoring effects, but of smaller magnitude than in ALP.

Tufano (2010) and Maniadis, Tufano, and List (2011) also failed to confirm the robustness of the magnitude of the anchoring effects of ALP, using hedonic experiences, rather than common goods. Tufano (2010) used the anchoring manipulation to increase the variance in subjects’ WTA for a bad-tasting liquid, but the manipulation had no effect. Notice that this liquid offers a simple (negative) hedonic experience, like the “annoying sounds” used in Experiment 2 of ALP. Maniadis, Tufano, and List (2011) replicated Experiment 2 of ALP and found weaker (and nonsignificant) anchoring effects. Overall our results suggest that anchoring is real—it is hard to reconcile otherwise the fact that in the WTA treatment with goods the ratios between highest and lowest quintile is always bigger than one—but that quantitatively the effect is small. Additionally our data supports the idea that anchoring goes away when bidding on objects with greater familiarity, such as lotteries.

Saint-Paul’s The Tyranny of Utility: Behavioral Social Science and the Rise of Paternalism

The growth in behavioural science has given a new foundation for paternalistic government interventions. Governments now try to help “biased” humans make better decisions – from nudging them to pay their taxes on time, to constraining the size of the soda they can buy, to making them save for that retirement so far in the future.

There is no shortage of critics of these interventions. Are people actually biased? Do these interventions change behaviour or improve outcomes? Is an equally biased government the right agent to fix these problems? Ultimately, do the costs of government action outweigh its benefits?

In The Tyranny of Utility: Behavioral Social Science and the Rise of Paternalism, Gilles Saint-Paul points out the danger in this line of defence. By fighting the utilitarian battle based on costs and benefits, there will almost certainly be circumstances in which the scientific evidence on human behaviour and the effect of the interventions will point in the freedom-reducing direction. Arguing about whether a certain behaviour is rational at best leads to an empirical debate. Similarly, arguments about the irrationality of government can be countered by empirical debate on how particular government interventions change behaviour and outcomes.

As a result, Saint-Paul argues that:

[I]f we want to provide intellectual foundations for limited governments, we cannot do it merely on the basis of instrumental arguments. Instead, we need a system of values that delivers those limits and such a system cannot be utilitarian.

Saint-Paul argues that part of the problem is that the utilitarian approach is the backbone of neoclassical economics – once (and still in some respects) a major source of arguments in favour of freedom. Now that the assumptions about human behaviour underpinning many neoclassical models are seen to no longer hold, you are still left with utility maximisation as the policy objective. As Saint-Paul writes:

It should be emphasized that the drift toward paternalism is entirely consistent with the research program of traditional economics, which supposes that policies should be advocated on the basis of a consequentialist cost-benefit analysis, using some appropriate social welfare function. Paternalism then derives naturally from these premises, by simply adding empirical knowledge about how people actually behave …

When Saint-Paul describes the practical costs of this increased paternalism, his choice of examples often makes it hard to share his anger. One of his prime cases of infringed liberty is a five-time public transport molester who was banned from using the train after a court determined he lacked the self-control to travel on it. On gun control laws, he suggests authoritarian governments could rise in the absence of an armed citizenry.

Still, some of the other stories (or even these more extreme examples) lead to an important point. Saint-Paul points out that many of these interventions extend beyond the initial cause of the problem and impose responsibility on people for the failings of others. For example, in many countries you need a pool fence even if you don’t have kids. You effectively need to look after other people’s children. Similarly, liquor laws can extend to preventing sales to people who are drunk or likely to drive. Where does the chain of responsibility transfer stop?

One of the more interesting threads in the book concerns what the objective of policy is. Is it consumption? Or happiness? And based on this objective, how far does the utilitarian argument extend? If it is happiness, should we just load everyone up with Prozac? And then what of the flow-on costs if everyone decides to check out and be happy?

What if a cardiologist decides that experts and studies are right, that it’s stupid after all to buy a glossy Lamborghini, and dumps a few of his patients in order to take more time off with his family? How is the well-being of the patients affected? What if that entrepreneur who works seventy hours a week to gain market shares calls it a day and closes his factory? In a market society the pursuit of status and material achievement is obtained through voluntary exchange, and must thus benefit somebody else. Owning a Lamborghini is futile, but curing a heart disease is not. The cardiologist may be selfish and alienated; he makes his neighbors feel bad; and he is tired of the Lamborghini. His foolishness, however, has improved the lives of many people, even by the standards of happiness researchers. Competition to achieve status may be unpleasant to my future incarnations and those of my neighbors, but it increases the welfare of those who buy the goods I am producing to achieve this goal.

Saint-Paul’s response to these problems – presented more as suggestions than a manifesto, and thinly summarised in only two pages at the end of the book – is not to ignore science but to set some limits:

I am not advocating that scientific evidence should be disregarded in the decision-making process. That is obviously a recipe for poor outcomes. Instead, I am pointing out that the increased power and reliability of Science makes it all the more important that strict limits define what is an acceptable government intervention and that it is socially accepted that policies which trespass those limits cannot be implemented regardless of their alleged beneficial outcomes. We are going in the opposite direction from such discipline.

These limits could involve a minimal redistributive state to rule out absolute poverty – allowing some values to supersede freedom – but these values would not include “statistical notions of public health or aggregate happiness”, nor most forms of strong paternalism.

But despite pointing to the dangers of utilitarian arguments against paternalistic interventions, Saint-Paul finds them hard to resist. He regularly refers to the biases of government, noting the irony that “the government could well offset such deficiencies with its own policy tools but soon chose not to by having high public deficits and low interest rates.” And his picture of his preferred world has a utilitarian flavour itself.

Being treated by society as responsible and unitary goes a long way toward eliciting responsible and unitary behavior. The incentives to solve my own behavioral problems are much larger if I expect society to hold me responsible for the consequences of my actions.

Bad Behavioural Science: Failures, bias and fairy tales

Below is the text of my presentation to the Sydney Behavioural Economics and Behavioural Science Meetup on 11 May 2016. The talk is aimed at an intelligent non-specialist audience. I expect the behavioural science knowledge of most attendees is drawn from popular behavioural science books and meetups such as this.


The typical behavioural science or behavioural economics event is a love-in. We all get together to laugh at people’s irrationality – that is, the irrationality of others – and opine that if only we designed the world more intelligently, people would make better decisions.

We can point to a vast literature – described in books such as Dan Ariely’s Predictably Irrational, Daniel Kahneman’s Thinking, Fast and Slow, and Richard Thaler and Cass Sunstein’s Nudge – all demonstrating the fallibility of humans, the vast array of biases we exhibit in our everyday decision making, and how we can help to overcome these problems.

Today I want to muddy the waters. Not only is the “we can save the world” TED talk angle that tends to accompany behavioural science stories boring, but this angle also ignores the problems and debates in the field.

I am going to tell you four stories – stories that many of you will have heard before. Then I am going to look at the foundations of each of these stories and show that the conclusions you should draw from each are not as clear as you might have been told.

I will say at the outset that the message of this talk is not that all behavioural science is bunk. Rather, you need to critically assess what you hear.

I should also point out that I am only covering one of the possible angles of critique. There are plenty of others.

For those who want to capture what I say, at 7pm tonight (AEST) the script of what I propose to talk about and the important images from my slides will be posted on my blog, Evolving Economics. That post will include links to all the studies I refer to.

Story one – the Florida effect.

John Bargh and friends asked two groups of 30 psychology students to rearrange scrambled words into a sentence that made sense. Within each group, students were randomly assigned to one of two conditions. Some students received scrambled sentences containing words related to elderly stereotypes, such as worried, Florida, old, lonely, grey, wrinkle, and so on. The other students were given sentences with non-age-specific words.

After completing this exercise the participants were debriefed and thanked. They then exited the laboratory by walking down a corridor.

Now for the punch line. The experimenters timed the participants as they walked down the corridor. Those who had rearranged the sentences with non-age-specific words walked down the corridor in a touch over seven seconds. Those who had rearranged the sentences with the elderly “primes” walked more slowly, taking a bit over eight seconds. A very cool result, which has become known as the Florida effect.

Except… the study doesn’t seem to replicate. In 2012, Stéphane Doyen and friends published a paper in PLOS ONE in which they used a laser timer to measure how long people took to walk down the corridor after rearranging their scrambled sentences. The presence of the elderly words did not change walking speed (unless the experimenters knew about the treatment – but that’s another story). There’s another failed replication on PsychFileDrawer.

What was most striking about this failed replication – apart from putting a big question mark next to the result – was the way the lead researcher, John Bargh, attacked the PLOS ONE paper in a post on his Psychology Today blog (the post appears to have been deleted, but you can see a description of its content in Ed Yong’s article). Apart from titling the post “Nothing in their heads” and describing the researchers as incompetent, he desperately tried to explain away the results – arguing there were differences in methodology (which in some cases did not actually exist) and suggesting that the replication team used too many primes.

I don’t want to pick on this particular study alone (although I’m happy to pick on the reaction). After all, failure to replicate is not proof that the effect does not exist. But failure to replicate is a systematic problem in the behavioural sciences (in fact, many sciences). A study by Brian Nosek and friends published in Science examined 100 cognitive and social psychology studies published in several major psychology journals. They subjectively rated 39% of the studies they attempted to replicate as having replicated. Only 25% of social psychology studies in that study met that mark. The size of the effect in these studies was also around half of that in the originals – as shown in this plot of original versus replication effect sizes. The Florida effect is just the tip of the iceberg.

Nosek et al (2015)

Priming studies seem to be particularly problematic. Another priming area in trouble is “money priming”, where exposure to images of money or the concept of money makes people less willing to help others or more likely to endorse a free market economy. As an example, one set of replications by Rohrer and friends of the effect of money primes on political views – shown in these four charts – found no effect (ungated pdf). Analysis of the broader literature on money priming suggests, among other things, massive publication bias.

As a non-priming example, those of you who have read Daniel Kahneman’s Thinking, Fast and Slow or Malcolm Gladwell’s David and Goliath might recall a study by Adam Alter and friends. In that study, 40 students were exposed to two versions of the cognitive reflection task. One of the typical questions in the cognitive reflection task is the following classic:

A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?
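For anyone reaching for the intuitive answer of 10 cents: the two constraints force the ball to cost 5 cents, which a two-line check confirms.

```python
total, difference = 1.10, 1.00   # bat + ball = 1.10; bat costs 1.00 more
ball = (total - difference) / 2  # 0.05, not the intuitive 0.10
bat = ball + difference          # 1.05
print(f"ball = ${ball:.2f}, bat = ${bat:.2f}")
```

If the ball cost 10 cents, the bat would cost $1.10 and the total $1.20 – the intuitive answer fails the first constraint.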

The two versions differed in that one used small light grey font that made the questions hard to read. Those exposed to the harder to read questions achieved higher scores.

Cognitive reflection test results, Meyer et al (2015)

It all sounds very cool. Slowing people down made them do better. But while the original study with 40 subjects found a large effect, replications involving thousands of people found nothing (Terry Burnham discusses this paper in more detail here). As you can see in the chart, the positive result is a small sample outlier.

Then there is ego depletion – the idea that we have a fixed stock of willpower that becomes depleted through use. If we have to use our willpower in one setting, we’re more likely to crumble later on as our ego is depleted.

Now, this theory doesn’t rest on one study – a 2010 meta-analysis examined 83 studies with 198 experiments and concluded there was an ego depletion effect. But that meta-analysis had a lot of flaws, including that it covered only published studies.

Soon a pre-registered replication of one ego depletion experiment involving 23 labs and over 2,000 subjects will be published in Psychological Science. The result? If there is any effect of ego depletion – at least as captured in that experiment – it is close to zero.

So what is going on here? Why all these failures? First, there is likely publication bias: only those studies with positive results make it into print. And because many studies have small samples, a good share of those positive results will be false positives.

Then there is p-hacking. People play around with their hypotheses and the data until they get the result they want.

Then there is the garden of forking paths, which is the more subtle process whereby people choose their method of analysis or what data to exclude by what often seems to be good reasons after the fact. All of these lead to a higher probability of positive results and these positive results end up being the ones that we read.
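To see how significance filtering alone can manufacture a literature, here is a toy simulation – my own illustration, not drawn from any of the studies above. It runs thousands of small two-group studies of an effect whose true size is zero, then “publishes” only the significant ones. Around one in twenty null studies clears the bar, and the published effects are all sizeable by construction.

```python
import math
import random

random.seed(1)

def run_study(n_per_group, true_effect=0.0):
    """One two-group study: return the observed mean difference and
    whether it clears the 5% significance bar (population sd known = 1)."""
    control = [random.gauss(0, 1) for _ in range(n_per_group)]
    treated = [random.gauss(true_effect, 1) for _ in range(n_per_group)]
    diff = sum(treated) / n_per_group - sum(control) / n_per_group
    standard_error = math.sqrt(2 / n_per_group)
    return diff, abs(diff) > 1.96 * standard_error

# 10,000 small studies of a non-existent effect; "publish" the significant ones
results = [run_study(20) for _ in range(10_000)]
published = [diff for diff, significant in results if significant]
print(f"published: {len(published) / len(results):.1%} of studies")
print(f"mean |effect| in print: {sum(map(abs, published)) / len(published):.2f} sd")
```

With 20 subjects per group, any result that reaches significance must be at least about 0.6 standard deviations – so the file drawer guarantees the literature overstates the effect, before any p-hacking even starts.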

Now that these bodies of research are crumbling, some of the obfuscation going on is deplorable. John Bargh’s response concerning the Florida effect is one of the more extreme examples. Many original study proponents erect defences, claiming poor replication technique or that the replicators haven’t captured all the subtleties of the situation. Personally, I’d like to see a lot more admissions of “well, that didn’t turn out”.

But what is also surprising was the level of confidence some people had in these findings. Here’s a passage from Kahneman’s Thinking, Fast and Slow – straight out of the chapter on priming:

When I describe priming studies to audiences, the reaction is often disbelief. This is not a surprise: System 2 believes that it is in charge and that it knows the reasons for its choices. Questions are probably cropping up in your mind as well: How is it possible for such trivial manipulations of the context to have such large effects? …

The idea you should focus on, however, is that disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true.

Err, no.

So, don’t believe every study you read. Maintain some of that scepticism even for large bodies of published research. Look for pre-registered replications where people clearly stated what they were going to do before they did it.

And I should say, this recommendation doesn’t just apply to academic studies. There are now plenty of governments and consultants running around advertising the results of their behavioural work with approaches also likely to be subject to similar problems.

Story two – the jam study

On two Saturdays in a California supermarket, Mark Lepper and Sheena Iyengar (ungated pdf) set up tasting displays of either six or 24 jars of jam. Consumers could taste as many jams as they wished, and if they approached the tasting table they received a $1 discount coupon to buy the jam.

For attracting initial interest, the large display of 24 jams did a better job, with 60 per cent of people who passed the display stopping. Forty per cent of people stopped at the six jam display. But only three per cent of those who stopped at the 24 jam display purchased any of the jam, compared with almost 30 per cent who stopped at the six jam display.
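Combining the two stages gives purchases per passer-by – a back-of-envelope check of my own, using the figures quoted above, not the paper’s analysis.

```python
# stop rate x purchase rate = purchases per passer-by
displays = {"24 jams": (0.60, 0.03), "6 jams": (0.40, 0.30)}
for label, (stopped, bought) in displays.items():
    print(f"{label}: {stopped * bought:.1%} of passers-by bought")  # 1.8% vs 12.0%
```

So although the big display attracted more tasters, the small display converted more than six times as many passers-by into buyers – which is why the result was so striking.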

This result has been one of the centrepieces of the argument that more choice is not necessarily good. The larger display seemed to reduce consumer motivation to buy the product. The theories around this concept and the associated idea that more choice does not make us happy are often labelled the choice overload hypothesis or the paradox of choice. Barry Schwartz wrote a whole book on this topic.

Fast-forward 10 years to another paper, this one by Benjamin Scheibehenne and friends (ungated pdf). They surveyed the literature on the choice overload hypothesis – there is plenty. And across the basket of studies – shown in this chart – evidence of choice overload does not emerge so clearly. In some cases, choice increases purchases. In others it reduces them. Scheibehenne and friends determined that the mean effect size of changing the number of choices across the studies was effectively zero.

These reviewed studies included a few attempts to replicate the jam study results. An experiment using jam in an upscale German supermarket found no effect. Other experiments found no effect of choice size using chocolates or jelly beans. There were small differences in study design between these and the original jam study (as original authors are often quick to point out when replications fail), but if studies are so sensitive to study design and hard to replicate, it seems foolhardy to extrapolate the results of the original study too far.

There is a great quote from one of my favourite books, Jim Manzi’s Uncontrolled, which captures this danger.

[P]opularizers telescoped the conclusions derived from one coupon-plus-display promotion in one store on two Saturdays, up through assertions about the impact of product selection for jam for this store, to the impact of product selection for jam for all grocery stores in America, to claims about the impact of product selection for all retail products of any kind in every store, ultimately to fairly grandiose claims about the benefits of choice to society.

While these study results often lead to grandiose extrapolations, the defences of these studies when there is a failure to replicate or ambiguous evidence often undermine the extent of these claims. Claiming that the replication didn’t perfectly copy the original study suggests the original effect applies to a small set of circumstances. This is no longer TED talk material that can be applied across our whole life.

That is not to say that there is not something interesting going on in these choice studies. Scheibehenne and friends suggest that there may be a set of restrictive conditions under which choice overload occurs. These conditions might involve the complexity (and not the size) of the choice, the lack of dominant alternatives, assortment of options, time pressure or the distribution of product quality (as suggested by another meta-analysis). And since the jam study appears tough to replicate, these conditions might be narrow. They suggest more subtle solutions than simply reducing choice. Let’s not recommend supermarkets get rid of 75% of their product lines to boost their sales by 900%.

So, even if a study suggests something interesting is going on, don’t immediately swallow the TED talk and book on how this completely changes our understanding of the world. Even if the result is interesting, the story is likely more subtle than the way it is told.

Story three – organ donation

Organ donation rates are an often-used example of the power of defaults. I’m now going to take a moment to read a passage by Dan Ariely explaining how defaults affect organ donation rates. He refers to this chart from Johnson and Goldstein (2003) (ungated pdf):

One of my favorite graphs in all of social science is the following plot from an inspiring paper by Eric Johnson and Daniel Goldstein. This graph shows the percentage of people, across different European countries, who are willing to donate their organs after they pass away. When people see this plot and try to speculate about the cause for the differences between the countries that donate a lot (in blue) and the countries that donate little (in orange) they usually come up with “big” reasons such as religion, culture, etc.

But you will notice that pairs of similar countries have very different levels of organ donations. For example, take the following pairs of countries: Denmark and Sweden; the Netherlands and Belgium; Austria and Germany (and depending on your individual perspective France and the UK). These are countries that we usually think of as rather similar in terms of culture, religion, etc., yet their levels of organ donations are very different.

So, what could explain these differences? It turns out that it is the design of the form at the DMV. In countries where the form is set as “opt-in” (check this box if you want to participate in the organ donation program) people do not check the box and as a consequence they do not become a part of the program. In countries where the form is set as “opt-out” (check this box if you don’t want to participate in the organ donation program) people also do not check the box and are automatically enrolled in the program. In both cases large proportions of people simply adopt the default option.

Johnson and Goldstein (2003) Organ donation rates in Europe

But does this chart seem right given that story? Only 2 in every 10,000 people opt out in Austria? Only 3 in 10,000 in Hungary? It seems too few. And for Dan Ariely’s story, it is too few, because the process is not as described.

The hint is in the term “presumed consent” in the chart description. There is actually no time where Austrians or Hungarians are presented with a form where they can simply change from the default. Instead, they are presumed to consent to organ donation. To change that presumption, they have to take steps such as contacting government authorities to submit forms stating they don’t want their organs removed. Most people probably don’t even think about it. It’s like calling my Australian citizenship – resulting from my birth in Australia – a default and praising the Australian Government for its fine choice architecture.

And what about the outcomes we care about – actual organ donation rates? Remember, the numbers on the Johnson and Goldstein chart aren’t the proportion of people with organs removed from their bodies. It turns out that the relationship is much weaker there.

Here is a second chart with actual donation rates – the same countries in the same order. The relationship suddenly looks a lot less clear. Germany at 15.3 deceased donors per million people is not far from Austria’s 18.8 and above Sweden’s 15.1. For two countries not on this chart, Spain, which has an opt-out arrangement, is far ahead of most countries at 33.8 deceased donors per million, but the United States, an opt-in country, is also ahead of most opt-out countries with a donation rate of 26.0.

Deceased donors per million people (Wikipedia, 2016)

[To be clear, I am not suggesting that Johnson and Goldstein did not analyse the actual donation rates, nor that no difference exists – there is an estimate of the effect of presumed consent in their paper, and other papers also attempt to do this. Those papers generally find a positive effect. However, the story is almost always told using the first chart. A difference of 16.4 versus 14.1 donors per million (Johnson and Goldstein’s estimate) is not quite as striking as 99.98% for Austria versus 12% for Germany. Even my uncontrolled chart could be seen to be exaggerating the difference – the averages in my chart are 13.1 per million for opt-in and 19.3 per million for presumed consent. See the comments from Johnson and Goldstein at the end of this post.]

So, if you can, read the original papers, not the popularised version – and I should say that although I’ve picked on Dan Ariely’s telling of the story here, he is hardly Robinson Crusoe in telling the organ donation story in that way. I’ve lost count of the number of times reading the original paper has completely derailed what I thought was the paper’s message.

In fact, sometimes you will discover there is no evidence for the story at all – Richard Titmuss’s suggestion that paying for blood donations might reduce supply by crowding out intrinsic motivations was a thought experiment, not an observed effect. Recent evidence suggests that – as per most economic goods – paying for blood could increase supply.

And this organ donation story provides a second more subtle lesson – if you can, look at the outcomes we want to influence, not some proxy that might not lead where you hope.

Story four – the hot hand

This last story is going to be somewhat technical. I actually chose it as a challenge to myself to see if I could communicate this idea to a group of intelligent non-technical people. It’s also a very cool story, based on work by Joshua Miller and Adam Sanjurjo. I don’t expect you to be able to immediately go and give these explanations to someone else at the end of this talk, but I hope you can see something interesting is going on.

So, when people watch sports such as basketball, they often see a hot hand. They will describe players as “hot” and “in form”. Our belief is that the person who has just hit a shot or a series of shots is more likely to hit their next one.

But is this belief in the ‘hot hand’ a rational belief? Or is it the case that people are seeing something that doesn’t exist? Is the ‘hot hand’ an illusion?

To answer this question, Thomas Gilovich, Robert Vallone and Amos Tversky took masses of shot data from a variety of sources, including the Philadelphia 76ers and Boston Celtics, and examined it for evidence of a hot hand. This included shots in games, free throws and a controlled shooting experiment.

What did they find? The hot hand was an illusion.

So, let’s talk a bit about how we might show this. This table shows a set of four shots by a player in each of 10 games. In the first column are the results of their shots. An X is a hit, an O is a miss. This particular player took 40 shots and hit 20 – so they are a 50% shooter.

So what would count as evidence of a hot hand? What we can do is compare 1) the proportion of shots they hit if the shot immediately before was a hit with 2) their normal shooting percentage. If their hit rate after a hit is higher than their normal shot probability, then we might say they get hot.

The second column of the table shows the proportion of shots hit by the player if the shot before was a hit. Looking at the first sequence, the first shot was a hit, and it is followed by a hit. The second shot, a hit, is followed by a miss. So, for that first sequence, the proportion of hits if the shot before was a hit is 50%. The last shot, the third hit, is not followed by any other shots, so does not affect our calculation. The rest of that column shows the proportion of hits followed by hits for the other sequences. Where there is no hit in the first three shots, those sequences don’t enter our calculations.

Basketball player shot sequences (X=hit, O=miss)

Shots p(X|X)
XXOX 50%
OOXX 100%
XXXX 100%
XXOO 50%
OOXX 100%

Across these sequences, the average proportion of hits following a hit is 50%. (That average is also the expected value we would get if we randomly picked one of these sequences.) Since the proportion of hits after a hit is the same as their shooting percentage, we could argue that they don’t have a hot hand.
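The per-sequence calculation described above is easy to sketch in a few lines of code – a minimal illustration of my own, run over the sequences shown in the table.

```python
def prop_hits_after_hit(shots):
    """Proportion of hits among shots taken immediately after a hit
    ('X' = hit, 'O' = miss); None if no shot follows a hit."""
    after_hit = [nxt for prev, nxt in zip(shots, shots[1:]) if prev == "X"]
    return after_hit.count("X") / len(after_hit) if after_hit else None

# Sequences from the table above
for game in ["XXOX", "OOXX", "XXXX", "XXOO", "OOXX"]:
    print(game, prop_hits_after_hit(game))  # 0.5, 1.0, 1.0, 0.5, 1.0
```

Note that the last shot of each game never enters the calculation (nothing follows it), and a game with no hit in its first three shots returns None and is dropped from the average – exactly the exclusions described above.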

Now, I am going to take you on a detour, and then we’ll come back to this example. And that detour involves the coin flipping that I got everyone to do before we commenced.

34 people flipped a coin four times, and I asked you to try to flip a heads on each flip. [The numbers obtained for the coin flipping were, obviously, added after the fact. The raw data is here. And as it turned out they did not quite tell the story I expected, so there are some slight amendments below to the original script.] Here are the results of our experiment. In the second column is the proportion of heads that you threw. Across all of you, you flipped heads 49% of the time – pretty close to 50%. Obviously you have no control over your flips. But what is more interesting is the third column: on average, the proportion of heads flipped after an earlier flip of heads was 48%.

Meetup experiment results – flipping a coin four times

Number of players p(H) p(H|H)
34 49% 48%

Now, intuition tells us the probability of a heads after flipping an earlier heads will be 50% (unless you suffer from the gambler’s fallacy). So this seems to be about the right result.

But let’s have a closer look at this. This next table shows the 16 possible combinations of heads and tails you could have flipped. Each of these 16 combinations has an equal probability of occurring. What is the average proportion of heads following a previous flip of heads? It turns out it is 40.5%. That doesn’t seem right. But let’s delve deeper. In the third column is how many heads follow a heads, and in the fourth how many tails follow a heads. If we count across all the sequences, we see that we have 12 heads and 12 tails on the 24 flips that follow a heads – spot on the 50% you expect.

16 possible combinations of heads and tails across four flips

Flips p(H|H) n(H|H) n(T|H)
HHHH 100% 3 0
HHHT 67% 2 1
HHTH 50% 1 1
HHTT 50% 1 1
HTHH 50% 1 1
HTHT 0% 0 2
HTTH 0% 0 1
HTTT 0% 0 1
THHH 100% 2 0
THHT 50% 1 1
THTH 0% 0 1
THTT 0% 0 1
TTHH 100% 1 0
TTHT 0% 0 1
TTTH – 0 0
TTTT – 0 0
AVERAGE 40.5% 12 12

So what is going on in that second column? By looking at these short sequences, we are introducing a bias. Most of the cases of heads following heads are clustered together – for example, the first sequence has three cases of a heads following a heads. Yet it has the same weight in our average as the sequence TTHT, which has only one flip occurring after a heads. A tails appears more likely to follow a heads because of this bias. The actual probability of a heads following a heads is 50%.
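The full enumeration is easy to verify in Python: averaging within each sequence first reproduces the biased 40.5%, while pooling every flip that follows a heads recovers the true 50%.

```python
from itertools import product

def p_h_after_h(seq):
    """Proportion of flips following a heads that are heads (None if no flip follows a heads)."""
    after = [nxt for prev, nxt in zip(seq, seq[1:]) if prev == "H"]
    return after.count("H") / len(after) if after else None

seqs = ["".join(s) for s in product("HT", repeat=4)]  # all 16 four-flip sequences

# Biased measure: compute the proportion within each sequence, then average.
rates = [r for r in (p_h_after_h(s) for s in seqs) if r is not None]
print(round(sum(rates) / len(rates), 3))  # 0.405

# Unbiased measure: pool every flip that follows a heads across all sequences.
pooled = [nxt for s in seqs for prev, nxt in zip(s, s[1:]) if prev == "H"]
print(pooled.count("H"), len(pooled))  # 12 24 – i.e. 12 heads out of 24 flips
```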

And if we do the same exercise for your flips, the result now looks a bit different – you flipped 28 heads and 22 tails on the 50 flips directly after a head: 56% heads, 44% tails. It seems you have a hot hand, although our original analysis clouded that result. (Obviously, you didn’t really have a hot hand – it is a chance result. There was a 24% probability of getting 28 or more heads. Ideally I should have used a larger sample size.)

Meetup experiment results – flipping a coin four times

Number of players p(H) p(H|H) n(H|H) n(T|H)
34 49% 48% 28 22
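As an aside, the 24% figure quoted above is a simple exact binomial calculation:

```python
from math import comb

# Probability of 28 or more heads in 50 fair coin flips.
p = sum(comb(50, k) for k in range(28, 51)) / 2**50
print(round(p, 2))  # 0.24
```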

Turning back to the basketball example I showed you at the beginning: there I suggested there was a 50% chance of a hit after a hit for a 50% shooter – the first two columns of the table below. But let’s count the shots that occur after a hit. There are 12 shots that occur after a hit, and it turns out that 7 of them are hits. Our shooter hits 58% of shots immediately following a hit, and misses only 42% of them. They have a hot hand (noting the small sample size here… but you get the picture).

Basketball player shot sequences (X=hit, O=miss)

Shots p(X|X) n(X|X) n(O|X)
XXOX 50% 1 1
OXOX 0% 0 1
OOXX 100% 1 0
OXOX 0% 0 1
XXXX 100% 3 0
XOOX 0% 0 1
XXOO 50% 1 1
OOXX 100% 1 0
OOOX – 0 0
OOOO – 0 0
AVERAGE 50% 7 5
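The pooled count in the last two columns of this table can be reproduced in a couple of lines:

```python
# Pool every shot that follows a hit across all sequences (X = hit, O = miss).
sequences = ["XXOX", "OXOX", "OOXX", "OXOX", "XXXX", "XOOX", "XXOO", "OOXX"]
after_hit = [nxt for s in sequences for prev, nxt in zip(s, s[1:]) if prev == "X"]
print(after_hit.count("X"), len(after_hit))  # 7 12 – i.e. 7 hits on the 12 shots after a hit
```

Pooling weights every shot equally, whereas the sequence-by-sequence average weights every sequence equally – that difference is the source of the bias.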

So, why have I bothered with this stats lesson? By taking short sequences of shots and measuring the proportion of hits following a hit, I have introduced a bias into the measurement. This matters because the papers that supposedly show there is no hot hand used a methodology that suffers from this same bias. When you correct for the bias, there is a hot hand.

Take the famous paper by Tom Gilovich and friends that I mentioned at the beginning. They did not average across sequences as I have done here, but by looking at short sequences of shots – selecting each hit (or sequence of hits) and seeing the result of the following shot – they introduced the same bias. The bias acts in the opposite direction to the hot hand, effectively cancelling it out and leading to the conclusion that each shot is independent of the last.
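The bias isn’t an artifact of four-shot sequences either. Here is a small simulation of my own (an illustration, not the method used in any of these papers): even over 100-shot games, the average within-game proportion of hits following a hit comes out below 50% for a shooter with no hot hand.

```python
import random

random.seed(1)  # fixed seed so the result is reproducible

def p_after_hit(shots):
    """Proportion of shots following a hit that are hits (None if no shot follows a hit)."""
    after = [nxt for prev, nxt in zip(shots, shots[1:]) if prev]
    return sum(after) / len(after) if after else None

# Simulate many independent 100-shot games for a 50% shooter and average
# the within-game proportion of hits that follow a hit.
rates = []
for _ in range(20_000):
    shots = [random.random() < 0.5 for _ in range(100)]
    r = p_after_hit(shots)
    if r is not None:
        rates.append(r)

print(sum(rates) / len(rates))  # just below 0.5, despite every shot being independent
```

The remaining gap is small here, but it grows as the sequences get shorter or the streaks being conditioned on get longer.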

Miller and Sanjurjo crunched the numbers for one of the studies in the Gilovich and friends paper, and found that the probability of hitting a three pointer following a sequence of three previous hits is 13 percentage points higher than after a sequence of three misses. There truly is a hot hand. To give you a sense of the scale of that difference, Miller and Sanjurjo note that the difference between the median and best three point shooter in the NBA is only 10 percentage points.

Apart from the fact that this statistical bias slipped past everyone’s attention for close to thirty years, I find this result extraordinarily interesting for another reason. We have a body of research that suggests that even slight cues in the environment can change our actions. Words associated with old people can slow us down. Images of money can make us selfish. And so on. Yet why haven’t these same researchers been asking why a basketball player would not be influenced by their earlier shots – surely a more salient part of the environment than the word “Florida”? The desire to show one bias allowed them to overlook another.

So, remember that behavioural scientists are as biased as anyone.

If you are interested in learning more…

Before I close, I’ll leave you with a few places you can go if you found tonight’s presentation interesting.

First is Andrew Gelman’s truly wonderful blog Statistical Modeling, Causal Inference, and Social Science. Please don’t be put off by the name – you will learn something from Gelman even if you know little about statistics. Personally, I’ve learnt more about statistics from this blog than I did through the half a dozen statistics and econometrics units I completed through university. This is the number one place to see crap papers skewered and for discussion about why we see so much poor research. Google Andrew Gelman and his blog will be at the top of the list.

Second, read Jim Manzi’s Uncontrolled. It will give you a new lens with which to think about causal associations in our world. Manzi’s plea for humility about what we believe we know is important.

Third, read some Gerd Gigerenzer. I only touched on a couple of the critiques of behavioural science tonight. There are many others, such as the question of how irrational we really are. On this angle, Gigerenzer’s work is among the most interesting. I suggest starting with Simple Heuristics That Make Us Smart by Gigerenzer, Peter Todd and the ABC Research Group, and go from there.

Finally, check out my blog Evolving Economics. I’m grumpy about more than the material that I covered tonight. I will point you to one piece – Please Not Another Bias: An Evolutionary Take on Behavioural Economics – where I complain about how behavioural economics needs to be more than a collection of biases, but hopefully you will find more there that’s of interest to you.

And that’s it for tonight.

Evolutionary Biology in Economics: A Review

I’ve just had a new article published in the Economic Record – Evolutionary Biology in Economics: A Review.

Evolutionary Biology in Economics: A Review

Jason Collins, Boris Baer and Ernst Juerg Weber

As human traits and preferences were shaped by natural selection, there is substantial potential for the use of evolutionary biology in economic analysis. In this paper, we review the extent to which evolutionary theory has been incorporated into economic research. We examine work in four areas: the evolution of preferences, the molecular genetic basis of economic traits, the interaction of evolutionary and economic dynamics, and the genetic foundations of economic development. These fields comprise a thriving body of research, but have significant scope for further investigation. In particular, the growing accessibility of low-cost molecular data will create more opportunities for research on the relationship between molecular genetic information and economic traits.

I previously posted about an earlier version of this paper when it was called The Evolutionary Foundations of Economics. You can access an ungated version of that earlier paper here. Drop me a line if you want a copy of the published paper but can’t get access.

It’s not the most exciting article – it was the introductory chapter of my PhD thesis and I wrote it to provide the foundations for the substantive chapters rather than to spark a revolution. However, it will give you a decent snapshot of what is going on.

Ariely’s The Honest Truth About Dishonesty

I rate the third of Dan Ariely’s books, The Honest Truth About Dishonesty: How We Lie to Everyone – Especially Ourselves, somewhere between his first two books.

One of the strengths of Ariely’s books is that he is largely writing about his own experiments, and not simply scraping through the same barrel as every other pop behavioural science author. The Honest Truth has a smaller back catalogue of experiments to draw from than Predictably Irrational, so it sometimes meanders in the same way as The Upside of Irrationality. But the thread that ties The Honest Truth together – how and why we cheat – and Ariely’s investigations into it gave those extended riffs more substance than the storytelling that filled some parts of The Upside.

The basic story of the book is that we like to see ourselves as honest, but are quite willing and able to indulge in a small amount of cheating where we can rationalise it. This amount of cheating is quite flexible based on situational factors, such as what other people are doing, and is not purely the result of a cost-benefit calculation.

The experiment that crops up again and again through the book is a task to find numbers in a series of matrices. People then shred the answers before collecting payment based on how many they completed. Most people cheat a little, possibly because they can rationalise that they could have solved more, or had almost completed the next one. Few cheat to the maximum, even when it is clear they have the opportunity to do so.

For much of the first part of the book, Ariely frames his research against the Simple Model of Rational Crime (or ‘SMORC’) – where people do a rational cost-benefit analysis as to whether to commit the crime. He shows experiments where people don’t cheat to the maximum amount when they have no chance of being caught – almost no-one says that they solved all the puzzles (amusingly, a few say they solved 20 out of 20, but no-one says 18 or 19). And most people do not increase their level of cheating when the potential gains increase.

As Ariely works through the various experiments attempting to isolate parts of the SMORC and show they don’t hold, I never felt fully satisfied. It is always possible to see how people might rationally respond in a way that thwarts the experimental design.

For example, Ariely found that changes in the stake with no change in enforcement did not result in an increase in cheating. But if I am in an environment with more money, I might assume there is more monitoring and enforcement, even if I can’t see it. However, I believe Ariely is right in arguing that the decision is not a pure cost-benefit analysis.

One of the more interesting parts of the book concerned how increasing the degrees of separation from the monetary outcome increases cheating. Having people collect tokens, which could be later exchanged for cash, increased cheating. In that light, a decision to cheat in an area such as financial services, where the ultimate cost is cash but there are many degrees of separation (e.g. manipulating an interest rate benchmark which changes the price I get on a trade which affects my profit and loss which affects the size of my bonus), might not feel like cheating at all.

As is the case when I read any behavioural science book, the part that leaves me slightly cold is that I’m not sure I can trust some of the results. The recent replication failures involving priming and ego depletion – and both phenomena feature in the book – resulted in me taking some of the results with a grain of salt. How many will stand the test of time?

The Macrogenoeconomics of Comparative Development

Oded Galor has pointed me to his forthcoming article with Quamrul Ashraf in The Journal of Economic Literature.

The Macrogenoeconomics of Comparative Development

A vibrant literature has emerged in recent years to explore the influences of human evolution and the genetic composition of populations on the comparative economic performance of societies, highlighting the roles played by the Neolithic Revolution and the prehistoric “out of Africa” migration of anatomically modern humans in generating worldwide variations in the composition of genetic traits across populations. The recent attempt by Nicholas Wade’s “A Troublesome Inheritance: Genes, Race and Human History” to expose the evolutionary origins of comparative economic development to a wider audience provides an opportunity to review this important literature in the context of his theory.

A couple of paragraphs from the introduction:

Wade advances a modified evolutionary theory of long-run economic development, based on regional variation in the intensity of positive selection of traits that are conducive to growth-enhancing institutions. His theory suggests that variation in the duration of selective pressures on genetic traits across regions form the basis of differences in social behaviors across racial groups, thereby shaping variations in the nature of institutions and, thus, the level of economic development across the globe. Although at the outset, the broad outline of this argument appears plausible and largely consistent with existing evolutionary theories of comparative development, there is currently no compelling evidence for supporting the actual mechanisms proposed by Wade. …

The two fundamental building blocks of Wade’s theory are rather speculative. In particular, his narrative relies on unsubstantiated selection mechanisms and on empirically unsupported conjectures regarding the determinants of institutional variation across societies. … Rather than subjecting his hypothesized mechanism to the scrutiny of evolutionary growth theory, Wade follows the speculative supposition of Clark (2007), merely positing that in historically densely populated regions of the world that were characterized by early statehood, there existed a class of rich elites, endowed with genetic traits (e.g., nonviolence, cooperation, and thrift) conducive to growth-enhancing institutions, whose evolutionary advantage increased the prevalence of these favorable traits in the populations of those regions over time. It is far from evident, however, that the traits emphasized by Wade necessarily generated higher incomes in a Malthusian environment and were, thus, necessarily favored by the forces of natural selection. Moreover, Wade provides no evidence on how variations across societies in their geographical setting or historical experience could have given rise to differential selective pressures on these traits and, thus, generated variation in the growth-promoting genetic makeup of their populations. Furthermore, there is currently little scientific consensus on the extent to which the key behavioral traits of nonviolence, cooperation, and thrift, as emphasized by Wade’s theory, are genetically determined.

The second building block of Wade’s theory that links genetic traits to institutions is equally speculative. In particular, there is little evidence to support the claim that the variation in institutions across societies is driven by differences in their endowment of specific genetic traits that might govern key social behaviors.

Failure to replicate: ego depletion edition

Ego depletion is the idea that we have a limited supply of willpower. As we use it through the day, we become depleted and more likely to experience a willpower failure.

There is a mountain of published experiments providing evidence of ego depletion. Meta-analyses of the studies have supported the concept. The typical trick in these experiments is to have someone engage in an ego-depleting task – such as resisting chocolate – and then watch them cave in more quickly on a later task than those who weren’t subject to the earlier depletion.

But now the evidence is looking shaky. A pre-registered replication involving 23 labs and over 2,000 subjects will be published in Psychological Science. A smaller scale attempt to replicate was also published in PLOS One. The result? If there is any effect of ego depletion, it is close to zero.

Daniel Engber at Slate has the full story. One of the interesting points is how the meta-analysis didn’t show any problems:

To figure out what went wrong, Carter reviewed the 2010 meta-analysis—the study using data from 83 studies and 198 experiments. The closer he looked at the paper, though, the less he believed in its conclusions. First, the meta-analysis included only published studies, which meant the data would be subject to a standard bias in favor of positive results. Second, it included studies with contradictory or counterintuitive measures of self-control. One study, for example, suggested that depleted subjects would give more money to charity while another said depleted subjects would spend less time helping a stranger. When he and his adviser, Michael McCullough, reanalyzed the 2010 paper’s data using state-of-the-art analytic methods, they found no effect. For a second paper published last year, Carter and McCullough completed a second meta-analysis that included different studies, including 48 experiments that had never been published. Again, they found “very little evidence” of a real effect.

Roy Baumeister, one of the founders of this work on ego depletion, provided a response to Slate. It’s typical of many responses to this growing replication ‘crisis’ in psychology – suggest that those replicating the experiments haven’t captured all the experimental nuances, or that the effect is context specific.

In his lab, Baumeister told me, the letter e task [the task used in the replication] would have been handled differently. First, he’d train his subjects to pick out all the words containing e, until that became an ingrained habit. Only then would he add the second rule, about ignoring words with e’s and nearby vowels. That version of the task requires much more self-control, he says.

Second, he’d have his subjects do the task with pen and paper, instead of on a computer. It might take more self-control, he suggested, to withhold a gross movement of the arm than to stifle a tap of the finger on a keyboard.

If the replication showed us anything, Baumeister says, it’s that the field has gotten hung up on computer-based investigations. “In the olden days there was a craft to running an experiment. You worked with people, and got them into the right psychological state and then measured the consequences. There’s a wish now to have everything be automated so it can be done quickly and easily online.” These days, he continues, there’s less and less actual behavior in the science of behavior. “It’s just sitting at a computer and doing readings.”

Engber nicely points out the consequence of this line of defence. The big idea – and you only need to read Willpower to see that Baumeister and friends sell ego depletion as a big idea – loses its power:

One of the idea’s major selling points is its flexibility: Ego depletion applied not just to experiments involving chocolate chip cookies and radishes, but to those involving word games, conversations between white people and black people, decisions on whether to purchase soap, and even the behavior of dogs. In fact, the incredible range of the effect has often been cited in its favor. How could so many studies, performed in so many different ways, have all been wrong?

Yet now we know that ego depletion might be very fragile. It might be so sensitive to how a test is run that switching from a pen and paper to a keyboard and screen would be enough to make it disappear. If that’s the case, then why should we trust all those other variations on the theme? If that’s the case, then the Big Idea has shrunk to something very small.

Personally, I don’t believe that this is a case of experimental outcomes being subject to specific experimental context. Rather, the ‘experimental context’ is the ‘garden of forking paths’, p-hacking and publication bias.

Notes on a few books

The Advertising Effect: How to Change Behaviour by Adam Ferrier

If you’ve read a couple of behavioural economics/behavioural science books, it doesn’t take long to become bored with hearing the same experiments and examples over and over again.

Ferrier manages to largely avoid that problem. He works in advertising, so has plenty of new stories to tell, and it’s interesting to hear how advertisers go about their job (and desperately try to win the beer accounts). It also helps that Ferrier is a trained psych, so he brings a bit more psychology to the task than you typically see in the pop behavioural science literature.

That said, when The Advertising Effect does stray into those familiar studies, you start to run into the problem that many of them aren’t standing the test of time particularly well (power posing being one example).

Digital Gold: Bitcoin and the Inside Story of the Misfits and Millionaires Trying to Reinvent Money by Nathaniel Popper

Even though this book is less than a year old, it already feels like it is missing a chapter or two at the end. Still, it’s an easy and entertaining history of Bitcoin.

Mine-Field: The Dark Side of Australia’s Resources Rush by Paul Cleary

As Cleary notes, “regulation is more focused on flora and fauna than on the people affected by mining and energy developments.”

Radical Chic and Mau-Mauing the Flak Catchers by Tom Wolfe

New York society throws a party to raise funds for the Black Panthers. Many great passages – here’s one instance:

One rule is that nostalgie de la boue – i.e., the styles of romantic, raw-vital, Low Rent primitives – are good; and middle class, whether black or white, is bad. Therefore, Radical Chic invariably favors radicals who seem primitive, exotic and romantic, such as the grape workers, who are not merely radical and ‘of the soil,’ but also Latin; the Panthers, with their leather pieces, Afros, shades, and shoot-outs; and the Red Indians, who, of course, had always seemed primitive, exotic and romantic. …

Rule No. 2 was that no matter what, one should always maintain a proper address, a proper scale of interior decoration, and servants. Servants, especially, were one of the last absolute dividing lines between those truly “in Society,” New or Old, and the great scuffling mass of middle-class strivers paying up to $1,250-a-month rent or buying expensive co-ops all over the East Side. …

In the era of Radical Chic, then, what a collision course was set between the absolute need for servants—and the fact that the servant was the absolute symbol of what the new movements, black or brown, were struggling against! How absolutely urgent, then, became the search for the only way out: white servants!

Crime and Punishment by Fyodor Dostoevsky (Richard Pevear and Larissa Volokhonsky translation)

Another classic well worth reading.

Masel’s Bypass Wall Street: A Biologist’s Guide to the Rat Race

Tyler Cowen described Joanna Masel’s Bypass Wall Street: A Biologist’s Guide to the Rat Race as “Darwin plus Fred Hirsch on positional goods as applied to finance and portfolios. Unorthodox, interesting.”

I agree with Cowen’s description of the book as unorthodox and interesting, although I was looking forward to more Darwin and more of a biological lens. As the title of the book implies, it provides a biologist’s view on savings and investment, and Masel’s background as a biologist – she is Associate Professor of Ecology & Evolutionary Biology at the University of Arizona – has likely guided her as to what arguments she is sympathetic to.

But the examination is not, on the face of it, from a biological perspective. Only two biological arguments are directly referenced. The first is the distinction between absolute and relative competition. Relative competition can lead to wasteful arms races that are, on net, destructive of value. The second is a brief pointer to the competition between siblings for their parents’ finite attention and resources. If you asked someone to read Masel’s book and Robert Frank’s The Darwin Economy and guess who is the economist and who is the biologist, they’d likely guess their occupations the wrong way around.

A stronger influence has been some of Masel’s reading in economics. In the preface, she points to two books to which she owes an intellectual debt – Keynes’s The General Theory of Employment, Interest and Money, and Fred Hirsch’s Social Limits to Growth. Her analysis of savings and investment rests heavily on Keynes, and Hirsch’s views on positional goods provides a hook for her biological intuition that competition can be wasteful and zero sum.

The main thread of the book is the journey of “Jen” (a thinly disguised Masel?) as she decides how she should invest for her retirement. Masel builds up the analysis from near first principles and works through a set of possible investment options. Should Jen invest in stocks? Which stocks? Index funds? What are the future prospects of the stock market? If returns are unlikely to be strong, what are the other options? Is there a way to tap into areas traditionally the domain of public investment, such as health and infrastructure? What of more unorthodox options? And so on.

I won’t go into detail about where Masel lands – in some ways the most compelling part of the book is wondering just where Jen will end up – except to say that I doubt many people are going to find much guidance relevant to themselves. There are some points along Jen’s journey where I’m not convinced I agree, but they mostly relate to the finer points of what exactly savings and investment are, how they flow, and the like.

There are many moments in the book where Masel channels arguments made in detail elsewhere – even though there is no sign that she has read these other sources. She shares Tyler Cowen and Robert Gordon’s view that many of the big innovations are behind us, as part of her view that the stock market may be overvalued (although she is closer to Gordon’s pessimism). There are also many times where I could hear Robert Frank talking out of the pages, with her views on relative competition and public investment reflecting his.

On that point, the book is quite reference light – something Masel admits was deliberately done to avoid it becoming a heavily footnoted academic tome. I have some sympathy for that, but there are occasions in the book where I was longing for Masel to put up complements or counterpoints to her thinking and to discuss them.

Despite the different paths to get there, Masel often lands on conclusions that I have a lot of sympathy for. For example, she points out the crudeness of regulation defining “sophisticated” investors based on income or assets – which limits investment options for those who don’t meet the threshold. A university lecturer, who has likely forsaken material income in their career choice, does not meet the threshold despite likely being much more sophisticated than others who do.

She also mounts a strong argument for setting retirement accounts free. Today’s poor need the money now. There are many vested interests keen to keep people’s money locked in retirement accounts because of the fees they can charge. (As an aside, in Australia you can self manage your compulsory retirement savings – you can’t access them before retirement, but you have effective control on the asset allocation and who takes a cut.)

One other argument I have sympathy for is the role of education as a signal. Education can become susceptible to arms races, leading to over-investment compared to what would optimally be obtained absent the relative competition.

To close, I will suggest a short reading list for Masel. Maybe she has already read some of these, but I expect she will find a lot of material of interest.

Gottschall’s The Storytelling Animal

In The Storytelling Animal: How Stories Make Us Human, Jonathan Gottschall asks why we live and breathe stories. We are prolific storytellers. We consume movies, novels and plays. We even create stories in our sleep.

Gottschall’s argument is that our propensity to storytelling is an evolved trait that helps us navigate problems. He likens stories to flight simulators that prepare us for problems when they arise.

Here are snippets from two chapters. First, the idea that the mind is a storyteller – an idea common in Nassim Taleb’s writings:

[W]hile Sherlock Holmes stories are good fun, it pays to notice that Holmes’s method is ridiculous.

Take the rich story Holmes concocts after glancing at Watson in the lab [at the beginning of A Study in Scarlet]. Watson is dressed in ordinary civilian clothes. What gives him “the air of a military man”? Watson is not carrying his medical bag or wearing a stethoscope around his neck. What identifies him as “a gentleman of a medical type”? And why is Holmes so sure that Watson had just returned from Afghanistan rather than from one of many other dangerous tropical garrisons where Britain, at the height of its empire, stationed troops? (Let’s ignore the fact that Afghanistan is not actually in the tropical band.) …

In short, Sherlock Holmes’s usual method is to fabricate the most confident and complete explanatory stories from the most ambiguous clues. Holmes seizes on one of a hundred different interpretations of a clue and arbitrarily insists that the interpretation is correct. This then becomes the basis for a multitude of similarly improbable interpretations that all add up to a neat, ingenious, and vanishingly improbable explanatory story. …

We each have a little Sherlock Holmes in our brain. His job is to “reason backwards” from what we can observe in the present and show what orderly series of causes led to particular effects. Evolution has given us an “inner Holmes” because the world really is full of stories (intrigues, plots, alliances, relationships of cause and effect), and it pays to detect them. …

But the storytelling mind is imperfect. … The storytelling mind is allergic to uncertainty, randomness and coincidence. It is addicted to meaning. If the storytelling mind cannot find meaningful patterns in the world, it will try to impose them. In short, the storytelling mind is a factory that churns out true stories when it can, but will manufacture lies when it can’t.

The second snippet relates to the fallibility of our memories in telling stories. Memories are open to contamination, and are fictionalisations of past events rather than perfect recollections.

In a classic experiment, Elizabeth Loftus and her colleagues gathered information from independent sources about undergraduate students’ childhoods. The psychologists then brought students into the lab and went over lists of actual events in their lives. The lists were Trojan horses that hid a single lie: When the student was five years old, the psychologists claimed, he wandered away from his parents in a mall. His parents were frightened, and so was he. Eventually an old man reunited him with his parents. At first, the students had no memory of this fictional event. But when they were later called back into the lab and asked about the mall episode, 25 percent of them said they remembered it. These students not only recalled the bare events that the researchers had supplied, but they also added many vivid details of their own.

The study was among the first of many to show how shockingly vulnerable the memory system is to contamination by suggestion.

I have several “clear” childhood memories that I suspect did not occur. That doesn’t overly worry me, but what does is my recollection of papers and books that I regularly refer to in conversation. Each time I recall the paper or book, I affect my memory of it. More than once I have gone back to the original after several years to re-read it, and realised that, even if not wrong in fact, my recollection of the tone, nuance and strength of the argument was well off.

Although I have pulled out two snippets of storytelling gone wrong, the book is positive about the effect of storytelling on the world. Gottschall argues that storytelling is often deeply moral, normally deals with problems of great (evolutionary) relevance to us, and is a major cohering force in society. And I tend to agree.