Author: Jason Collins

Economics. Behavioural and data science. PhD economics and evolutionary biology. Blog at Evolving Economics

Psychology as a knight in shining armour, and other thoughts by Paul Ormerod on Thaler’s Misbehaving

I have been meaning to write some notes on Richard Thaler’s Misbehaving: The Making of Behavioral Economics for some time, but having now come across a review by Paul Ormerod (ungated pdf) – together with his perspective on the position of behavioural economics in the discipline – I feel somewhat less need. Below are some interesting sections of Ormerod’s review.

First, on the incorporation of psychology into economics:

With a few notable exceptions, psychologists themselves have not engaged with the area. ‘Behavioral economics has turned out to be primarily a field in which economists read the work of psychologists and then go about their business of doing research independently’ (p. 179). One reason for this which Thaler gives is that few psychologists have any attachment to the rational choice model, so studying deviations from it is not interesting. Another is that ‘the study of “applied” problems in psychology has traditionally been considered a low status activity’ (p. 180).

It is fashionable in many social science circles to deride economics, and to imagine that if only these obstinate and ideological economists would import social science theories into the discipline, all would be well. All manner of things would be well, for somehow these theories would not only be scientifically superior, but their policy implications would lead to the disappearance of all sorts of evils, such as austerity and even neo-liberalism itself. This previous sentence deliberately invokes a caricature, but one which will be all too recognisable to economists in Anglo-Saxon universities who have dealings with their colleagues in the wider social sciences.

A recent article in Science (Open Science Collaboration 2015) certainly calls into question whether psychology can perform this role of knight in shining armour. A team of no fewer than 270 co-authors attempted to replicate the results of 100 experiments published in leading psychology journals. … [O]nly 36 per cent of the attempted replications led to results which were statistically significant. Further, the average size of the effects found in the replicated studies was only half that reported in the original studies. …

Either the original or the replication work could be flawed, or crucial differences between the two might be unappreciated. … So the strategy adopted by behavioural economists of choosing for themselves which bits of psychology to use seems eminently sensible.

On generalising behavioural economics:

The empirical results obtained in behavioural economics are very interesting and some, at least, seem to be well established. But the inherent indeterminacy discussed above is the main reason for unease with the area within mainstream economics. Alongside Misbehaving, any economist interested in behavioural economics should read the symposium on bounded rationality in the June 2013 edition of the Journal of Economic Literature. …

In a paper titled ‘Bounded-Rationality Models: Tasks to Become Intellectually Competitive’, Harstad and Selten make a key point that although models have been elaborated which incorporate insights of boundedly rational behaviour, ‘the collection of alternative models has made little headway supplanting the dominant paradigm’ (2013, p. 496). Crawford’s symposium paper notes that ‘in most settings, there is an enormous number of logically possible models… that deviate from neoclassical models. In attempting to improve upon neoclassical models, it is essential to have some principled way of choosing among alternatives’ (2013, p. 524). He continues further on the same page ‘to improve on a neoclassical model, one must identify systematic deviations; otherwise one would do better to stick with a noisier neoclassical model’.

Rabin is possibly the most sympathetic of the symposium authors, noting for example that ‘many of the ways humans are less than fully rational are not because the right answers are so complex. They are instead because the wrong answers are so enticing’ (2013, p. 529). Rabin does go on, however, to state that ‘care should be taken to investigate whether the new models improve insight on average… in my view, many new models and explanations for experimental findings look artificially good and artificially insightful in the very limited domain to which they are applied’ (2013, p. 536). …

… Misbehaving does not deal nearly as well with the arguments that in many situations agents will learn to be rational. The arguments in the Journal of Economic Literature symposium both encompass and generalise this problem for behavioural economics. The authors accept without question that in many circumstances deviations from rationality are observed. However, no guidelines, no heuristics, are offered as to the circumstances in which systematic deviations might be expected, and circumstances where the rational model is still appropriate. Further, the theoretical models developed to explain some of the empirical findings in behavioural economics are very particular to the area of investigation, and do not readily permit generalisation.

On applying behavioural economics to policy:

In the final part (Part VIII) he discusses a modest number of examples where the insights of behavioural economics seem to have helped policymakers. He is at pains to point out that he is not trying to ‘replace markets with bureaucrats’ (p. 307). He discusses at some length the term he coined with Sunstein, ‘libertarian paternalism’. …

We might perhaps reflect on why it is necessary to invent this term at all. The aim of any democratic government is to improve the lot of the citizens who have elected it to power. A government may attempt to make life better for everyone, for the interest groups who voted for it, for the young, for the old, or for whatever division of the electorate which we care to name. But to do so, it has to implement policies that will lead to outcomes which are different from those which would otherwise have happened. They may succeed, they may fail. They may have unintended consequences, for good or for ill. By definition, government acts in paternalist ways. By the use of the word ‘libertarian’, Thaler could be seen as trying to distance himself from the world of the central planner.

… And yet the suspicion remains that the central planning mind set lurks beneath the surface. On page 324, for example, Thaler writes that ‘in our increasingly complicated world, people cannot be expected to have the experience to make anything close to the optimal decisions in all the domains in which they are forced to choose’. The implication is that behavioural economics both knows what is optimal for people and can help them get closer to the optimum.

Further, we read that ‘[a] big picture question that begs for more thorough behavioral analysis is the best way to encourage people to start new businesses (especially those which might be successful)’ (p. 351). It is the phrase in brackets which is of interest. Very few people, we can readily conjecture, start new businesses in order for them to fail. But most new firms do exactly that. Failure rates are very high, especially in the first two or three years of life. How exactly would we know whether a start-up was likely to be successful? There is indeed a point from the so-called ‘Gauntlet’ of orthodox economics which is valid in this particular context. Anyone who had a good insight into which start-ups were likely to be successful would surely be extremely rich.

Unchanging humans

One interesting thread to Don Norman’s excellent The Design of Everyday Things is the idea that while our tools and technologies are subject to constant change, humans stay the same. The fundamental psychology of humans is a relative constant.

Evolutionary change to people is always taking place, but the pace of human evolutionary change is measured in thousands of years. Human cultures change somewhat more rapidly over periods measured in decades or centuries. Microcultures, such as the way by which teenagers differ from adults, can change in a generation. What this means is that although technology is continually introducing new means of doing things, people are resistant to changes in the way they do things.

I feel this is generally the right perspective to think about human interaction with technology. There are certainly biological changes to humans based on their life experience. Take the larger hippocampus of London taxi drivers, increasing height through industrialisation, or the Flynn effect. But the basic building blocks are relatively constant. The humans of today and twenty years ago are close to being the same.

Every time I hear arguments about changing humans (or any discussion of millennials, generation X and the like), I recall the following quote from Bill Bernbach (I think first pointed out to me by Rory Sutherland):

It took millions of years for man’s instincts to develop. It will take millions more for them to even vary. It is fashionable to talk about changing man. A communicator must be concerned with unchanging man, with his obsessive drive to survive, to be admired, to succeed, to love, to take care of his own.

(If I were making a similar statement, I’d use a shorter time period than “millions”, but I think Bernbach’s point still stands.)

But for how long will this hold? Don Norman again:

For many millennia, even though technology has undergone radical change, people have remained the same. Will this hold true in the future? What happens as we add more and more enhancements inside the human body? People with prosthetic limbs will be faster, stronger, and better runners or sports players than normal players. Implanted hearing devices and artificial lenses and corneas are already in use. Implanted memory and communication devices will mean that some people will have permanently enhanced reality, never lacking for information. Implanted computational devices could enhance thinking, problem-solving, and decision-making. People might become cyborgs: part biology, part artificial technology. In turn, machines will become more like people, with neural-like computational abilities and humanlike behavior. Moreover, new developments in biology might add to the list of artificial supplements, with genetic modification of people and biological processors and devices for machines.

I suspect much of this, at least in the short term, will only relate to some humans. The masses will experience these changes with some lag.

(See also my last post on the human-machine mix.)

Getting the right human-machine mix

Much of the storytelling about the future and humans and machines runs with a theme that machines will not replace us, but that we will work with machines to create a combination greater than either alone. If you have heard the freestyle chess example, which now seems to be everywhere, you will understand the idea. (See my article in Behavioral Scientist if you haven’t.)

An interesting angle to this relationship is just how unsuited some of our existing human-machine combinations are to the unique skills a human brings. As Don Norman writes in his excellent The Design of Everyday Things:

People are flexible, versatile, and creative. Machines are rigid, precise, and relatively fixed in their operations. There is a mismatch between the two, one that can lead to enhanced capability if used properly. Think of an electronic calculator. It doesn’t do mathematics like a person, but can solve problems people can’t. Moreover, calculators do not make errors. So the human plus calculator is a perfect collaboration: we humans figure out what the important problems are and how to state them. Then we use calculators to compute the solutions.

Difficulties arise when we do not think of people and machines as collaborative systems, but assign whatever tasks can be automated to the machines and leave the rest to people. This ends up requiring people to behave in machine like fashion, in ways that differ from human capabilities. We expect people to monitor machines, which means keeping alert for long periods, something we are bad at. We require people to do repeated operations with the extreme precision and accuracy required by machines, again something we are not good at. When we divide up the machine and human components of a task in this way, we fail to take advantage of human strengths and capabilities but instead rely upon areas where we are genetically, biologically unsuited.

The result is that at the moments when we expect the humans to act, we have set them up for failure:

We design equipment that requires people to be fully alert and attentive for hours, or to remember archaic, confusing procedures even if they are only used infrequently, sometimes only once in a lifetime. We put people in boring environments with nothing to do for hours on end, until suddenly they must respond quickly and accurately. Or we subject them to complex, high-workload environments, where they are continually interrupted while having to do multiple tasks simultaneously. Then we wonder why there is failure.


Automation keeps getting more and more capable. Automatic systems can take over tasks that used to be done by people, whether it is maintaining the proper temperature, automatically keeping an automobile within its assigned lane at the correct distance from the car in front, enabling airplanes to fly by themselves from takeoff to landing, or allowing ships to navigate by themselves. When the automation works, the tasks are usually done as well as or better than by people. Moreover, it saves people from the dull, dreary routine tasks, allowing more useful, productive use of time, reducing fatigue and error. But when the task gets too complex, automation tends to give up. This, of course, is precisely when it is needed the most. The paradox is that automation can take over the dull, dreary tasks, but fail with the complex ones.

When automation fails, it often does so without warning. … When the failure occurs, the human is “out of the loop.” This means that the person has not been paying much attention to the operation, and it takes time for the failure to be noticed and evaluated, and then to decide how to respond.

There is an increasing catalogue of these types of failures. Air France flight 447, which crashed into the Atlantic in 2009, is a classic case. The autopilot suddenly handed to the pilots an otherwise well-functioning plane due to an airspeed indicator problem, leading to disaster. But perhaps this new type of failure is an acceptable result of the overall improvement in system safety or performance.

This human-machine mismatch is also a theme in Charles Perrow’s Normal Accidents. Perrow notes that many systems are poorly suited to human psychology, with long periods of inactivity interspersed by bunched workload. The humans are often pulled into the loop just at the moments things are starting to go wrong. The question is not how much work humans can safely do, but how little.

Coursera’s Data Science Specialisation: A Review

As I mentioned in my comments on Coursera’s Executive Data Science specialisation, I have looked at a lot of online data science and statistics courses to find useful training material, understand the skills of people who have done these online courses, plus learn a bit myself.

One of the best known sets of courses is Coursera’s Data Science Specialisation, created by Johns Hopkins University. It is a ten-course program that covers the data science process from data collection to the production of data science products. It focuses on implementing the data science process in R.

This specialisation is a signal that someone is familiar with data analysis in R – and the units are not bad if learning R is your goal. But neither this specialisation nor any other similar-length course I have reviewed to date offers a shortcut to the statistical knowledge necessary for good data science. A few university-length units seem to be the minimum, and even they need to be paired with experience and self-directed study (not to mention some skepticism of what we can determine).

The specialisation assessments are such that you can often pass the courses without understanding what you have been taught. Points for some courses are awarded for “effort” (see Statistical Inference below). While capped at three attempts per 8 hours, the multiple choice quizzes have effectively unlimited attempts. I don’t have a great deal of faith in university assessment processes either – particularly in Australia where no-one wants to disrupt the flood of fees from international students by failing someone – but the assessments in these specialisations require even less knowledge or effort. They’re not much of a signal of anything.

If you are wondering whether you should audit or pay for the specialisation, you can’t submit the assignments under the audit option. But the quizzes are basic and you can find plenty of assignment submissions on GitHub or RPubs against which you can check your work.

Here are some notes on each course. I looked through each of these over a year or so, so there might be some updates to the earlier courses (although a quick revisit suggests my comments still apply).

  1. The Data Scientist’s Toolbox: Little more than an exercise in installing R and git, together with an overview of the other courses in the specialisation. If you are familiar with R and git, skip.
  2. R Programming: In some ways the specialisation could have been called R Programming. This unit is one of the better of the ten, and gives a basic grounding in R.
  3. Getting and Cleaning Data: Not bad for getting a grasp of the various ways of extracting data into R, but watching video after video of imports of different formats makes for less-than-exciting viewing. The principles on tidy data are important – the unit is worth doing for this alone.
  4. Exploratory Data Analysis: Really a course in charting in R, but a decent one at that. There is some material on principal components analysis and clustering that will likely go over most people’s heads – too much material in too little time.
  5. Reproducible Research: The subject of this unit – literate (statistical) programming – is one of the more important subjects covered in the specialisation. However, this unit seemed cobbled together – lectures repeated points and didn’t seem produced to a logical structure. The last lecture is a conference video (albeit one worth watching). If you compare this unit to the (outstanding) production effort that has gone into the Applied Data Science with Python specialisation, this unit compares poorly.
  6. Statistical Inference: Likely too basic for someone with a decent stats background, but confusing for someone without. This unit hits home how it isn’t possible to build a stats background in a couple of hours a week over four weeks. The peer assessment caters to this through criteria such as “Here’s your opportunity to give this project +1 for effort.”, with option “Yes, this was a nice attempt (regardless of correctness)”.
  7. Regression Models: As per statistical inference, but possibly even more confusing for those without a stats background.
  8. Practical Machine Learning: Not a bad course for getting across implementing a few machine learning models in R, but there are better background courses. Start with Andrew Ng’s Machine Learning, and then work through Stanford’s Statistical Learning (which also has great R materials). Then return to this unit for a slightly different perspective. As for many of the other specialisation units, it is at a level too high for someone with no background. For instance, there is no point where they actually describe what machine learning is.
  9. Developing Data Products: This course is quite good, covering some of the major publishing tools, such as Shiny, R Markdown and Plotly (although skip the videos on Swirl). The strength of this specialisation is training in R, and that is what this unit focuses on.
  10. Data Science Capstone: This course can be best thought of as a commitment device that will force you to learn a certain amount about natural language processing in R (the topic of the project). You are given a task with a set of milestones, and you’re left to figure it out for yourself. Unless you already know something about natural language processing, you will have to review other courses and materials and spend a lot of time on the discussion boards to get yourself across the line. Skip it and do a natural language processing course such as Coursera’s Applied Text Mining in Python (although this assumes a fair bit of skill in Python). Besides, you can only access the capstone if you have paid for and completed the other nine units in the specialisation.

Perrow’s Normal Accidents: Living with High-Risk Technologies

A typical story in Charles Perrow’s Normal Accidents: Living with High-Risk Technologies runs like this.

We start with a plant, airplane, ship, biology laboratory, or other setting with a lot of components (parts, procedures, operators). Then we need two or more failures among components that interact in some unexpected way. No one dreamed that when X failed, Y would also be out of order and the two failures would interact so as to both start a fire and silence the fire alarm. Furthermore, no one can figure out the interaction at the time and thus know what to do. The problem is just something that never occurred to the designers. Next time they will put in an extra alarm system and a fire suppressor, but who knows, that might just allow three more unexpected interactions among inevitable failures. This interacting tendency is a characteristic of a system, not of a part or an operator; we will call it the “interactive complexity” of the system.

For some systems that have this kind of complexity, … the accident will not spread and be serious because there is a lot of slack available, and time to spare, and other ways to get things done. But suppose the system is also “tightly coupled,” that is, processes happen very fast and can’t be turned off, the failed parts cannot be isolated from other parts, or there is no other way to keep the production going safely. Then recovery from the initial disturbance is not possible; it will spread quickly and irretrievably for at least some time. Indeed, operator action or the safety systems may make it worse, since for a time it is not known what the problem really is.

Take this example:

A commercial airplane … was flying at 35,000 feet over Iowa at night when a cabin fire broke out. It was caused by chafing on a bundle of wire. Normally this would cause nothing worse than a short between two wires whose insulations rubbed off, and there are fuses to take care of that. But it just so happened that the chafing took place where the wire bundle passed behind a coffee maker, in the service area in which the attendants have meals and drinks stored. One of the wires shorted to the coffee maker, introducing a much larger current into the system, enough to burn the material that wrapped the whole bundle of wires, burning the insulation off several of the wires. Multiple shorts occurred in the wires. This should have triggered a remote-control circuit breaker in the aft luggage compartment, where some of these wires terminated. However, the circuit breaker inexplicably did not operate, even though in subsequent tests it was found to be functional. … The wiring contained communication wiring and “accessory distribution wiring” that went to the cockpit.

As a result:

Warning lights did not come on, and no circuit breaker opened. The fire was extinguished but reignited twice during the descent and landing. Because fuel could not be dumped, an overweight (21,000 pounds), night, emergency landing was accomplished. Landing flaps and thrust reversing were unavailable, the antiskid was inoperative, and because heavy braking was used, the brakes caught fire and subsequently failed. As a result, the aircraft overran the runway and stopped beyond the end where the passengers and crew disembarked.

As Perrow notes, there is nothing complicated in putting a coffee maker on a commercial aircraft. But in a complex interactive system, simple additions can have large consequences.

Accidents of this type in complex, tightly coupled systems are what Perrow calls “normal accidents”. When Perrow uses the word “normal”, he does not mean these accidents are expected or predictable. Many of these accidents are baffling. Rather, it is an inherent property of the system to experience an interaction of this kind from time to time.

While it is fashionable to talk of culture as a solution to organisational failures, in complex and tightly coupled systems even the best culture is not enough. There is no improvement to culture, organisation or management that will eliminate the risk. That we continue to have accidents in industries with mature processes, good management and decent incentives not to blow up suggests there might be something intrinsic about the system behind these accidents.

Perrow’s message on how we should deal with systems prone to normal accidents is that we should stop trying to fix them in ways that only make them riskier. Adding more complexity is unlikely to work. We should focus instead on reducing the potential for catastrophe when there is failure.

In some cases, Perrow argues that the potential scale of the catastrophe is such that the systems should be banned. He argues nuclear weapons and nuclear energy are both out on this count. In other systems, the benefit is such that we should continue tinkering to reduce the chance of accidents, but accept they will occur despite our best efforts.

One possible approach to complex, tightly coupled systems is to reduce the coupling, although Perrow does not dwell deeply on this. He suggests that the aviation industry has done this to an extent through measures such as corridors that exclude certain types of flights. But in most of the systems he examines, decoupling appears difficult.

Despite Perrow’s thesis being that accidents are normal in some systems, and that no organisational improvement will eliminate them, he dedicates considerable effort to critiquing management error, production pressures and general incompetence. The book could have been half the length with a more focused approach, but it does suggest that despite the inability to eliminate normal accidents, many complex, tightly coupled systems could be made safer through better incentives, competent management and the like.

Other interesting threads:

  • Normal Accidents was published in 1984, but the edition I read had an afterword written in 1999 in which Perrow examined new domains to which normal accident theory might be applied. Foreshadowing how I first came across the concept, he points to financial markets as a new domain for application. I first heard of “normal accidents” in Tim Harford’s discussion of financial markets in Adapt. Perrow’s analysis of the upcoming Y2K bug under his framework seems slightly overblown in hindsight.
  • The maritime accident chapter introduced (to me) the concepts of radar assisted collisions and non collision course collisions. Radar assisted collisions are a great example of the Peltzman effect, whereby vessels that would have once remained stationary or crawled through fog now speed through. The first vessels with radar were comforted that they could see all the stationary or slow-moving obstacles as dots on their radar screen. But as the number of vessels with radar increased and those other dots also started moving at speed, we had more radar assisted collisions. On non collision course collisions, Perrow notes that most collisions involve two (or more) ships that were not on a collision course, but on becoming aware of each other managed to change course to effect a collision. Coordination failures are rife.
  • Perrow argues that nuclear weapon systems are so complex and prone to failure that there is inherent protection against catastrophic accident. Not enough pieces are likely to work to give us the catastrophe. Of course, this gives reason for concern about whether they will work when we actually need them (again, maybe a positive). Perrow even asks if complexity and coupling can be so problematic that the system ceases to exist.
  • Perrow spends some time critiquing hindsight bias in assessing accidents. He gives one example of a Union Carbide plant that received a glowing report from a US government department. Following an accidental gas release some months later, that same government department described the plant as an accident waiting to happen. I recommend Phil Rosenzweig’s The Halo Effect for a great analysis of this problem in assessing the factors behind business performance after the fact.

The benefit of doing nothing

From Tim Harford:

[I]n many areas of life we demand action when inaction would serve us better.

The most obvious example is in finance, where too many retail investors trade far too often. One study, by Brad Barber and Terrance Odean, found that the more retail investors traded, the further behind the market they lagged: active traders underperformed by more than 6 percentage points (a third of total returns) while the laziest investors enjoyed the best performance.

This is because dormant investors not only save on trading costs but avoid ill-timed moves. Another study, by Ilia Dichev, noted a distinct tendency for retail investors to pile in when stocks were riding high and to sell out at low points. …

The same can be said of medicine. It is a little unfair on doctors to point out that when they go on strike, the death rate falls. Nevertheless it is true. It is also true that we often encourage doctors to act when they should not. In the US, doctors tend to be financially rewarded for hyperactivity; everywhere, pressure comes from anxious patients. Wiser doctors resist the temptation to intervene when there is little to be gained from doing so — but it would be better if the temptation was not there. …

Harford also reflects on the competition between humans and computers, covering similar territory to that in my Behavioral Scientist article Don’t Touch the Computer (even referencing the same joke).

The argument for passivity has been strengthened by the rise of computers, which are now better than us at making all sorts of decisions. We have been resisting this conclusion for 63 years, since the psychologist Paul Meehl published Clinical vs. Statistical Prediction. Meehl later dubbed it “my disturbing little book”: it was an investigation of whether the informal judgments of experts could outperform straightforward statistical predictions on matters such as whether a felon would violate parole.

The experts almost always lost, and the algorithms are a lot cleverer these days than in 1954. It is unnerving how often we are better off without humans in charge. (Cue the old joke about the ideal co-pilot: a dog whose job is to bite the pilot if he touches the controls.)

The full article is here.

Alter’s Irresistible: Why We Can’t Stop Checking, Scrolling, Clicking and Watching

I have a lot of sympathy for Adam Alter’s case in Irresistible: Why We Can’t Stop Checking, Scrolling, Clicking and Watching. Despite the abundant benefits of being online, the hours I have burnt over the last 20 years through aimless internet wandering and social media engagement could easily have delivered a book or another PhD.

It’s unsurprising that we are surrounded by addictive tech. Game, website and app designers are all designing their products to gain and hold our attention. In particular, the tools at the disposal of modern developers are fantastic at introducing what Alter describes as the six ingredients of behavioural addiction:

[C]ompelling goals that are just beyond reach; irresistible and unpredictable positive feedback; a sense of incremental progress and improvement; tasks that become slowly more difficult over time; unresolved tensions that demand resolution; and strong social connections.

Behavioural addictions have a lot of similarity with substance addictions (some people question whether we should distinguish between them at all). They activate the same brain regions. They are fueled by some of the same human needs, such as the need for social engagement and support, mental stimulation and a sense of effectiveness. [Parts of the book seem to be a good primer on addiction, although see my endnote.]

Based on one survey of the literature, as many as 41 per cent of the population may have suffered a behavioural addiction in the past month. While having so many people classified as addicts dilutes the concept of “addiction”, it does not seem unrealistic given the way many people use tech.

As might be expected given the challenge, Alter’s solutions on how we can manage addiction in the modern world fall somewhat short of providing a fix. For one, Alter suggests we need to start training the young when they are first exposed to technology. However, it is likely that the traps present in later life will be much different from those present when young. After all, most of Alter’s examples of addicts were born well before the advent of World of Warcraft, the iPhone or the iPad that derailed them.

Further, the ability of tech to capture our attention is only in its infancy. It is not hard to imagine the eventual creation of immersive virtual worlds so attractive that some people will never want to leave.

Alter’s chapter on gamification is interesting. Gamification is the idea of turning a non-game experience into a game. One of the more inane but common examples of gamification is turning a set of stairs into a piano to encourage people to take those stairs in preference to the neighbouring escalator (see on YouTube). People get more exercise as a result.

The flip side is that gamification is part of the problem itself (unsurprising given the theme of Alter’s book). For example, exercise addicts using wearables can lose sight of why they are exercising. They push on for their gamified goals despite injuries and other costs. One critic introduced by Alter is particularly scathing:

Bogost suggested that gamification “was invented by consultants as a means to capture the wild, coveted beast that is video games and to domesticate it.” Bogost criticized gamification because it undermined the “gamer’s” well-being. At best, it was indifferent to his well-being, pushing an agenda that he had little choice but to pursue. Such is the power of game design: a well-designed game fuels behavioral addiction. …

But Bogost makes an important point when he says that not everything should be a game. Take the case of a young child who prefers not to eat. One option is to turn eating into a game—to fly the food into his mouth like an airplane. That makes sense right now, maybe, but in the long run the child sees eating as a game. It takes on the properties of games: it must be fun and engaging and interesting, or else it isn’t worth doing. Instead of developing the motivation to eat because food is sustaining and nourishing, he learns that eating is a game.

Taking this critique further, Alter notes that “[c]ute gamified interventions like the piano stairs are charming, but they’re unlikely to change how people approach exercise tomorrow, next week, or next year.” [Also read this story about Bogost and his game Cow Clicker.]

There are plenty of other interesting snippets in the book. Here’s one on uncertainty of reward:

Each one [pigeon] waddled up to a small button and pecked persistently, hoping that it would release a tray of Purina pigeon pellets. … During some trials, Zeiler would program the button so it delivered food every time the pigeons pecked; during others, he programmed the button so it delivered food only some of the time. Sometimes the pigeons would peck in vain, the button would turn red, and they’d receive nothing but frustration.

When I first learned about Zeiler’s work, I expected the consistent schedule to work best. If the button doesn’t predict the arrival of food perfectly, the pigeon’s motivation to peck should decline, just as a factory worker’s motivation would decline if you only paid him for some of the gadgets he assembled. But that’s not what happened at all. Like tiny feathered gamblers, the pigeons pecked at the button more feverishly when it released food 50–70 percent of the time. (When Zeiler set the button to produce food only once in every ten pecks, the disheartened pigeons stopped responding altogether.) The results weren’t even close: they pecked almost twice as often when the reward wasn’t guaranteed. Their brains, it turned out, were releasing far more dopamine when the reward was unexpected than when it was predictable.

I have often wondered to what extent surfing is attractive due to the uncertain arrival of waves during a session, or the inconsistency in swell from day-to-day.


Now for a closing gripe. Alter tells the following story:

When young adults begin driving, they’re asked to decide whether to become organ donors. Psychologists Eric Johnson and Dan Goldstein noticed that organ donations rates in Europe varied dramatically from country to country. Even countries with overlapping cultures differed. In Denmark the donation rate was 4 percent; in Sweden it was 86 percent. In Germany the rate was 12 percent; in Austria it was nearly 100 percent. In the Netherlands, 28 percent were donors, while in Belgium the rate was 98 percent. Not even a huge educational campaign in the Netherlands managed to raise the donation rate. So if culture and education weren’t responsible, why were some countries more willing to donate than others?

The answer had everything to do with a simple tweak in wording. Some countries asked drivers to opt in by checking a box:

If you are willing to donate your organs, please check this box: □

Checking a box doesn’t seem like a major hurdle, but even small hurdles loom large when people are trying to decide how their organs should be used when they die. That’s not the sort of question we know how to answer without help, so many of us take the path of least resistance by not checking the box, and moving on with our lives. That’s exactly how countries like Denmark, Germany, and the Netherlands asked the question—and they all had very low donation rates.

Countries like Sweden, Austria, and Belgium have for many years asked young drivers to opt out of donating their organs by checking a box:

If you are NOT willing to donate your organs, please check this box: □

The only difference here is that people are donors by default. They have to actively check a box to remove themselves from the donor list. It’s still a big decision, and people still routinely prefer not to check the box. But this explains why some countries enjoy donation rates of 99 percent, while others lag far behind with donation rates of just 4 percent.

This story is rubbish, as I have posted about here, here, here and here. This difference has nothing to do with ticking boxes on driver’s licence forms. In Austria they are never even asked. 99 per cent of Austrians aren’t organ donors in the way anyone would normally define it. 99% are presumed to consent, and if they happen to die their organs might not be taken because the family objects (or whatever other obstacle gets in the way) in the absence of any understanding of the actual intentions of the deceased.

To top it off, Alter embellishes the incorrect version of the story as told by Daniel Kahneman or Dan Ariely with phrasing from driver’s licence forms that simply don’t exist. Did he even read the Johnson and Goldstein paper (ungated copy)?

After reading a well-written and entertaining book about a subject I don’t know much about, I’m left questioning whether this is a single slip or Alter’s general approach to his writing and research. How many other factoids from the book simply won’t hold up once I go to the original source?

Rats in a casino

From Adam Alter’s Irresistible: Why We Can’t Stop Checking, Scrolling, Clicking and Watching:

Juice refers to the layer of surface feedback that sits above the game’s rules. It isn’t essential to the game, but it’s essential to the game’s success. Without juice, the same game loses its charm. Think of candies replaced by gray bricks and none of the reinforcing sights and sounds that make the game fun. …

Juice is effective in part because it triggers very primitive parts of the brain. To show this, Michael Barrus and Catharine Winstanley, psychologists at the University of British Columbia, created a “rat casino.” The rats in the experiment gambled for delicious sugar pellets by pushing their noses through one of four small holes. Some of the holes were low-risk options with small rewards. One, for example, produced one sugar pellet 90 percent of the time, but punished the rat 10 percent of the time by forcing him to wait five seconds before the casino would respond to his next nose poke. (Rats are impatient, so even small waits register as punishments.) Other holes were high-risk options with larger rewards. The riskiest hole produced four pellets, but only 40 percent of the time—on 60 percent of trials, the rat was forced to wait in time-out for forty seconds, a relative eternity.

Most of the time, rats tend to be risk-averse, preferring the low-risk options with small payouts. But that approach changed completely for rats who played in a casino with rewarding tones and flashing lights. Those rats were far more risk-seeking, spurred on by the double-promise of sugar pellets and reinforcing signals. Like human gamblers, they were sucked in by juice. “I was surprised, not that it worked, but how well it worked,” Barrus said. “We expected that adding these stimulating cues would have an effect. But we didn’t realize that it would shift decision making so much.”
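As a rough back-of-the-envelope check on the numbers in that passage (my own arithmetic, not something from Alter or the original study), the sketch below works out the expected pellets and expected time-out per nose poke for the two holes described. The probabilities, pellet counts and time-out lengths come from the quote; the function and variable names are purely illustrative, and the sketch ignores everything else about the experimental design.

```python
# Expected outcome per nose poke for the two holes described in the quote.
# Probabilities, pellet counts and time-out lengths are taken from the passage;
# the function and variable names are illustrative only.

def expected_outcome(pellets, p_win, timeout_seconds):
    """Return (expected pellets, expected time-out in seconds) per poke."""
    expected_pellets = pellets * p_win
    expected_timeout = timeout_seconds * (1 - p_win)
    return expected_pellets, expected_timeout

# Low-risk hole: one pellet 90 percent of the time, otherwise a 5 second wait.
low_risk = expected_outcome(pellets=1, p_win=0.9, timeout_seconds=5)

# Riskiest hole: four pellets 40 percent of the time, otherwise a 40 second wait.
high_risk = expected_outcome(pellets=4, p_win=0.4, timeout_seconds=40)

for name, (pellets, timeout) in [("low risk", low_risk), ("high risk", high_risk)]:
    print(f"{name}: {pellets:.2f} pellets, {timeout:.2f} s time-out per poke")
```

On these numbers the riskiest hole pays more per poke (about 1.6 pellets against 0.9) but carries an expected time-out of about 24 seconds against half a second. Whether that makes it the worse choice overall depends on how long a trial otherwise takes, which the passage does not say.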

I’ll post some other thoughts on the book later this week.

Ip’s Foolproof: Why Safety Can Be Dangerous and How Danger Makes Us Safe

Greg Ip’s framework in Foolproof: Why Safety Can Be Dangerous and How Danger Makes Us Safe is the contrast between what he calls the ecologists and engineers. Engineers seek to use the sum of our human knowledge to make us safer and the world more stable. Ecologists recognise that the world is complex and that people adapt, meaning that many of our solutions will have unintended consequences that can be worse than the problems we are trying to solve.

Much of Ip’s book is a catalogue of the failures of engineering. Build more and larger levees, and people will move into those flood-protected areas. When the levees eventually fail, the damage is larger than it would otherwise have been. There is a self-reinforcing link between flood protection and development, ensuring the disasters grow in scale.

Similarly, if you put out every forest fire as soon as it pops up, eventually a large fire will get out of control and take advantage of the build up in fuel that occurred due to the suppression of the earlier fires.

Despite these engineering failures, there is often pressure for regulators or those with responsibility to keep us safe to act as engineers. In Yellowstone National Park, the “ecologists” had taken the perspective that fires did not have to be suppressed immediately, as in combination with prescribed burning they could reduce the build up of fuel. But the economic interests around Yellowstone, largely associated with tourism, fought this use of fire. After all, prescribed burning and letting fires burn for a while is not costless or risk free. But refusing to bear those short-term costs and risks, as those responsible were under pressure to do, lets fuel build up and creates the long-term risk of a massive fire.

Despite the problems with engineers, Ip suggests we need to take the best of both the engineering and ecologist approaches in addressing safety. Engineers have made car crashes more survivable. Improved flood protection allows us to develop areas that were previously out of reach. What we need to do, however, is not expect too much of the engineers. You cannot eliminate risks and accidents. Some steps to do so will simply shift, change or exacerbate the risk.

One element of Ip’s case for retaining parts of the engineering approach is confidence. People need a degree of confidence or they won’t take any risks. There are many risks we want people to take, such as starting a business or trusting their money with a bank. The evaporation of confidence can be the problem itself, so if you prevent the loss of confidence, you don’t actually need to deploy the safety device. Deposit insurance is the classic example.

Ip ultimately breaks down the balance of engineering and ecology to a desire to maximise the units of innovation per unit of instability. An acceptance of instability is required for people to innovate. This could be through granting people the freedom to take risks, or by creating an impression of safety (and a degree of moral hazard – the taking of risks when the costs are not borne by the risk taker) to retain confidence.

Despite being an attempt to balance the two approaches, the innovation versus instability formula sounds much like what an engineer might suggest. I agree with Ip that the simple ecologist solution of removing the impression of safety to expunge moral hazard is not without costs. But it is not clear to me that you would ever get this balance right through design. Part of the appeal of the ecologist approach is the acceptance of the complexity of these systems and an acknowledgement of the limits of our knowledge about them.

Another way that Ip frames his balanced landing point is that we should accept small risks and the benefits, and save the engineering for the big problems. Ip hints at, but does not directly get to, Taleb’s concept of antifragility in this idea. Antifragility would see us develop a system where those small shocks strengthen the system rather than simply being a cost we incur to avoid moral hazard.

The price of risk

Some of Ip’s argument is captured by what is known as the Peltzman effect, named after University of Chicago economist Sam Peltzman. Peltzman published a paper in 1975 examining the effect of safety improvements in cars over the previous 10 years. Peltzman found a reduction in deaths per mile travelled for vehicle occupants, but also an increase in pedestrian injuries and property damage.

Peltzman’s point was that risky driving has a price. If safety improvements reduce that price, people will take more risk. The costs of that additional risk can offset the safety gains.

While this is in some ways an application of basic economics – make something cheaper and people will consume more – the empirical evidence on the Peltzman effect is interesting.

On one level, it is obvious that the Peltzman effect does not make all safety improvements a waste of effort. The large declines in driver deaths relative to the distance travelled over the last 50 years, without fully offsetting pedestrian deaths or other damage, establish this case.

But when you look at individual safety improvements, there are some interesting outcomes. In the case of seatbelts, empirical evidence suggests the absence of the Peltzman effect. For example, one study looked at the effects across states as each introduced seatbelt laws and found a decrease in deaths but no increase in pedestrian fatalities.

In contrast, anti-lock brakes were predicted to materially reduce crashes, but the evidence suggests effectively no net change. Drivers with anti-lock brakes drive faster and brake harder. While reducing some risks, such as front-end collisions, they increase others, such as the rear-end collisions induced by their hard braking.

So why the difference between seatbelts and anti-lock brakes? Ip argues that the difference depends on what the safety improvement allows us to do and how it feeds back into our behaviour. Anti-lock brakes give a driver a feeling of control and a belief they can drive faster. This belief is correct, but occasionally it backfires and they have an accident they would not have had otherwise. With seatbelts, most people want to avoid a crash and a car crash remains unpleasant even when wearing a seatbelt. Much of the time the seatbelt is not even in people’s minds.

Irrational risk taking?

One of the interesting threads through the book (albeit one that I wish Ip had explored in more detail) is the mix of rational and irrational decision making in our approach to risk.

Much of this “irrationality” concerns our myopia. We rebuild on sites where hurricanes and storms have swept away or destroyed the previous structures. The lack of personal experience with the disaster leads people to underweight the probability. We also have short memories, with houses built immediately after a hurricane being more likely to survive the next hurricane than those built a few years later.

A contrasting effect is our fear response to vivid events, which leads us to overweight them in our decision making despite the larger costs of the alternative.

But despite the ease of spotting these anomalies, for many of Ip’s real world examples of individual actions that might be myopic or irrational, it wouldn’t be hard to craft an argument that the individual might be making a good decision. If the previous building on the site was destroyed by a hurricane, can you still get flood insurance (possibly subsidised), making it a good investment all the same? As Ip points out, there are also many benefits to living in disaster prone areas, which are often sites of great economic opportunity (such as proximity to water).

In a similar vein, Ip points to the individual irrationality of “overconfident” entrepreneurs, whose businesses will more often than not end up failing. But as catalogued by Phil Rosenzweig, the idea that these “failed” businesses generally involve large losses is wrong. Overconfident is a poor word to describe these entrepreneurs’ actions (see also here on overconfidence).

I have a few other quibbles with the book. One was when Ip’s discussion of our response to uncertainty conflated risk aversion with loss aversion, the certainty effect and the endowment effect. But as I say, they are just quibbles. Ip’s book is well worth the read.

Does presuming you can take a person’s organs save lives?

I’ve pointed out several times on this blog the confused story about organ donation arising from Johnson and Goldstein’s Do Defaults Save Lives? (ungated pdf). Even greats such as Daniel Kahneman are not immune from misinterpreting what is going on.

Again, here’s Dan Ariely explaining the paper:

One of my favorite graphs in all of social science is the following plot from an inspiring paper by Eric Johnson and Daniel Goldstein. This graph shows the percentage of people, across different European countries, who are willing to donate their organs after they pass away. …

But you will notice that pairs of similar countries have very different levels of organ donations. For example, take the following pairs of countries: Denmark and Sweden; the Netherlands and Belgium; Austria and Germany (and depending on your individual perspective France and the UK). These are countries that we usually think of as rather similar in terms of culture, religion, etc., yet their levels of organ donations are very different.

So, what could explain these differences? It turns out that it is the design of the form at the DMV. In countries where the form is set as “opt-in” (check this box if you want to participate in the organ donation program) people do not check the box and as a consequence they do not become a part of the program. In countries where the form is set as “opt-out” (check this box if you don’t want to participate in the organ donation program) people also do not check the box and are automatically enrolled in the program. In both cases large proportions of people simply adopt the default option.

Johnson and Goldstein (2003) Organ donation rates in Europe

I keep hearing this story in new places, so it’s clearly got some life to it (and I’ll keep harping on about it). The problem is that there is no DMV form. These aren’t people “willing” to donate their organs. And a turn to the second page of Johnson and Goldstein’s paper makes it clear that the translation from “presumed consent” to donation appears mildly positive but is far from direct. 99.98% of Austrians (or deceased Austrians with organs suitable for donation) are not organ donors.

Although Johnson and Goldstein should not be blamed for the incorrect stories arising from their paper, I suspect their choice of title – particularly the word “default” – has played some part in allowing the incorrect stories to linger. What of an alternative title “Does presuming you can take a person’s organs save lives?”

One person who is clear on the story is Richard Thaler. In his surprisingly good book Misbehaving (I went in with low expectations after reading some reviews), Thaler gives his angle on this story:

In other cases, the research caused us to change our views on some subject. A good example of this is organ donations. When we made our list of topics, this was one of the first on the list because we knew of a paper that Eric Johnson had written with Daniel Goldstein on the powerful effect of default options in this domain. Most countries adopt some version of an opt-in policy, whereby donors have to take some positive step such as filling in a form in order to have their name added to the donor registry list. However, some countries in Europe, such as Spain, have adopted an opt-out strategy that is called “presumed consent.” You are presumed to give your permission to have your organs harvested unless you explicitly take the option to opt out and put your name on a list of “non-donors.”

The findings of Johnson and Goldstein’s paper showed how powerful default options can be. In countries where the default is to be a donor, almost no one opts out, but in countries with an opt-in policy, often less than half of the population opts in! Here, we thought, was a simple policy prescription: switch to presumed consent. But then we dug deeper. It turns out that most countries with presumed consent do not implement the policy strictly. Instead, medical staff members continue to ask family members whether they have any objection to having the deceased relative’s organs donated. This question often comes at a time of severe emotional stress, since many organ donors die suddenly in some kind of accident. What is worse is that family members in countries with this regime may have no idea what the donor’s wishes were, since most people simply do nothing. That someone failed to fill out a form opting out of being a donor is not a strong indication of his actual beliefs.

We came to the conclusion that presumed consent was not, in fact, the best policy. Instead we liked a variant that had recently been adopted by the state of Illinois and is also used in other U.S. states. When people renew their driver’s license, they are asked whether they wish to be an organ donor. Simply asking people and immediately recording their choices makes it easy to sign up. In Alaska and Montana, this approach has achieved donation rates exceeding 80%. In the organ donation literature this policy was dubbed “mandated choice” and we adopted that term in the book.