Manzi’s Uncontrolled

In social science, a myriad of factors can affect outcomes. Think of all the factors claimed to affect school achievement – student characteristics such as intelligence, conscientiousness, patience and willingness to work hard, parental characteristics such as income and education, and then there is genetics, socioeconomic status, school peers, teacher quality, class size, local crime and so on.

In assessing the effect of any policy or program, researchers typically attempt to control for these confounding factors. But as James Manzi forcefully argues in Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society, the high “causal density” in these settings nearly always results in the possibility that there is an important factor you have missed or do not understand.

As a result, Manzi advocates the use of randomised field trials (RFTs) to attempt to tease out whether interventions are having the desired effect. If control and treatment groups are randomised, any unidentified factors affecting the outcome should affect each group equally.
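To see why randomisation helps, here is a minimal simulation sketch. It is not from Manzi's book. The function name `estimate_effect`, the effect sizes and the self-selection rule are all illustrative assumptions. A hidden factor drives both the outcome and, in the non-randomised case, who ends up in the treatment group.

```python
import random
import statistics


def estimate_effect(n=20000, true_effect=1.0, randomised=True, seed=0):
    """Return the treated-minus-control difference in mean outcomes.

    All numbers here are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        hidden = rng.gauss(0, 1)  # a factor the researcher never observes
        if randomised:
            # A coin flip decides the group, so it is unrelated to `hidden`.
            in_treatment = rng.random() < 0.5
        else:
            # Self-selection: a high hidden factor makes treatment more likely.
            in_treatment = hidden + rng.gauss(0, 1) > 0
        outcome = 2.0 * hidden + rng.gauss(0, 1)
        if in_treatment:
            outcome += true_effect
        (treated if in_treatment else control).append(outcome)
    return statistics.mean(treated) - statistics.mean(control)


if __name__ == "__main__":
    print("randomised estimate:   ", round(estimate_effect(randomised=True), 2))
    print("self-selected estimate:", round(estimate_effect(randomised=False), 2))
```

Under these assumptions the randomised comparison lands close to the true effect of 1.0, while the self-selected comparison is inflated well beyond it, because the hidden factor is unevenly spread across the two groups.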

The ubiquity of uncontrolled factors and the ability of RFTs to do a better job of capturing them were demonstrated by John Ioannidis in a 2005 paper evaluating the reliability of forty-nine studies. As Manzi reports, 90 per cent of the large randomized experiments had produced results that could be replicated, compared to only 20 per cent of the non-randomized studies.

Manzi notes that RFTs have critics and limitations, and people such as James Heckman have argued that it is possible to achieve the same results as RFTs using non-experimental mathematical techniques. However, as Manzi points out, Heckman and friends’ demonstration that RFT results can be replicated using improved econometric methods after the fact is not the same as defining a set of procedures that can produce the same effect as future RFTs.

Although Manzi is a strong advocate of RFTs, he is clear that RFTs will not lead to a new era where we will understand everything. High causal density will always place limits on the ability to generalise experimental results. Manzi writes:

[I]ncreasing complexity has another pernicious effect: it becomes far harder to generalize the results of experiments. We can run a clinical trial in Norfolk, Virginia, and conclude with tolerable reliability that “Vaccine X prevents disease Y.” We can’t conclude that if literacy program X works in Norfolk, then it will work everywhere. The real predictive rule is usually closer to something like “Literacy program X is effective for children in urban areas, and who have the following range of incomes and prior test scores, when the following alternatives are not available in the school district, and the teachers have the following qualifications, and overall economic conditions in the district are within the following range.” And by the way, even this predictive rule stops working ten years from now, when different background conditions obtain in the society.

Manzi’s critique of the famous jam study is indicative. Can you truly generalise from 10 hours in one store with shoppers randomised into one hour chunks? Taken literally, the result implies that eliminating 75 per cent of products could increase sales by 900 per cent. However, that hasn’t stopped popularisers telescoping “the conclusions derived from one coupon-plus-display promotion in one store on two Saturdays, up through assertions about the impact of product selection for jam for this store, to the impact of product selection for jam for all grocery stores in America, to claims about the impact of product selection for all retail products of any kind in every store, ultimately to fairly grandiose claims about the benefits of choice to society.”
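The arithmetic behind the "75 per cent" and "900 per cent" figures can be sketched as follows. The inputs are not stated in this post. They are the figures commonly reported for the jam study (displays of 24 versus 6 jams, with roughly 3 versus 30 purchases per 100 shoppers who stopped), so treat them as assumptions.

```python
def percent_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


# Assumed inputs, as commonly reported for the jam study (not given in this post).
jams_large_display = 24
jams_small_display = 6
purchases_per_100_large = 3
purchases_per_100_small = 30

products_removed = -percent_change(jams_large_display, jams_small_display)
sales_increase = percent_change(purchases_per_100_large, purchases_per_100_small)

if __name__ == "__main__":
    print(f"products removed: {products_removed:.0f} per cent")
    print(f"sales increase:   {sales_increase:.0f} per cent")
```

A tenfold jump in the purchase rate is a 900 per cent increase, which is what makes the literal reading so implausible as a general rule for retail.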

It’s not hard to come up with other studies that are generalised in this manner. The Perry Pre-School project that found benefits for disadvantaged African American children in public pre-schools in the 1960s is generalised to promote more intensive early childhood education for everyone, regardless of country, race, socioeconomic status or era. A single Kenyan case study of deworming leads to a plan to deworm the world. And so on.

As a result, succeeding or failing in a single trial doesn’t usually constitute adequate evaluation of a program. Rather, promising ideas need to be subject to iterative evaluation in the relevant contexts.

Manzi’s reluctance to suggest RFTs will lead us to a new era also stems from the results of the few RFTs conducted in social science. Most programs fail replicated, independent, well-designed RFTs, so we should be sceptical of claims about the effectiveness of new programs. As Manzi states, innovative ideas rarely work.

In his review of RFTs in the social sciences, he does suggest one pattern emerges. Programs targeted at improving behaviour or raising skills or consciousness are more likely to fail than changes in incentives or environment. This might be considered a nod to both standard and behavioural economic tools.

At the end of the book, Manzi provides some guidance on how government should consider programs in an environment of high causal density.

First, he recommends that government build strong experimental capability. To keep the foxes out of the henhouse and avoid program advocates influencing results, he recommends a separate organisational entity be established to evaluate programs.

Second, there should be experimentation at the state level, or at the smallest possible competent authority. This might involve state by state deviation from Federal laws or programs on a trial basis.

Manzi recommends a broader scope for experimentation than you might normally hear advocated, with his suggestion that experimentation extend to examining different levels of coercion:

The characteristic error of the contemporary Right and Left in this is enforcing too many social norms on a national basis. All that has varied has been which norms predominate. The characteristic error of liberty-as-goal libertarians has been more subtle but no less severe: the parallel failure to appreciate that a national rule of “no restrictions on non-coercive behavior” (which, admittedly, is something of a cartoon) contravenes a primary rationale for liberty. What if social conservatives are right and the wheels really will come off society in the long run if we don’t legally restrict various sexual behaviors? What if some left-wing economists are right and it is better to have aggressive zoning laws that prohibit big-box retailers? I think both are mistaken, but I might be wrong. What if I’m right for some people at this moment in time but wrong for others, or what if I’m wrong for the same people ten years from now?

The freedom to experiment needs to include freedom to experiment with different governmental (i.e., coercive) rules. So here we have the paradox: a liberty-as-means libertarian ought to support, in many cases, local autonomy to restrict at least some personal freedoms.

To enable experimentation, Manzi uses an evolutionary framing and notes there is a need to encourage variation, cross-pollination of ideas and selection pressure. Encouraging variation requires a willingness to allow failure and deviation from whatever vision of social organisation we believe is best.

Our ignorance demands that we let social evolution operate as the least bad of the alternatives for determining what works. Subsocieties that behave differently on many dimensions are both the raw materials for an evolutionary process that sifts through and hybridizes alternative institutions, and also are analogous to the kind of evolutionary “reserves” of variation that may not be adaptive now but might be in some future changed environment. We want variation in human social arrangements for some of the same reasons that biodiversity can be useful in genetic evolution. This is the standard libertarian insight that the open society is well suited to developing knowledge in the face of a complex and changing environment. As per the first two parts of this book, it remains valid. But if we take our ignorance seriously, the implications of this insight significantly diverge from much of what the modern libertarian movement espouses.

Manzi highlights the importance of selection pressure in his discussion of school vouchers. He considers that “giving choice” to parents does not necessarily provide an environment in which trial-and-error improvement will occur, as there may not be alternatives to the status quo, the right incentives for market participants or adequate information for parents. Manzi is also concerned that taxpayer-funded vouchers will come with so many controls as to render the experiment useless.

Manzi’s proposals to provide selection pressure are not without problems. He suggests a comprehensive national exam for all schools receiving government funding, with those results published. But is the need to do well in this test a form of control that kills off much of the experimentation, turning the education system into a group of organisations competing for high test scores?

One of Manzi’s more interesting ideas relates to immigration. Manzi supports programs to attract highly skilled immigrants, such as skills-based immigration programs, or offering entry to foreign students upon completing certain degrees. He proposes testing this idea by using a subset of the visas granted through lotteries to run an RFT. Immigrant outcomes could then be tracked.

Ultimately, however, Manzi’s message is one of humility. No matter what our worldview, we should be prepared to allow experimentation with alternatives, as we may well be wrong. And that favourite program you have been promoting? Feel free to experiment, but don’t expect success. And if it works in that context, test and test again, as it may not work somewhere else.


  1. Interesting post! I will have to check this book out. I would just add that my own skepticism of RFTs would extend beyond problems with generalizing their findings. Specifically, I would argue that even when RFTs can tell us THAT a program works, they can never really tell us WHY it works. So I would say it is even harder than Manzi implies to take the lessons learned from creating and implementing one intervention and apply them to another.

    This isn’t a problem unique to the social sciences either. Clinical trials in medicine are essentially RFTs. And while a properly administered clinical trial can help you figure out whether a particular compound can treat a particular condition, it will tell you absolutely nothing about why that compound is effective. So finding one drug won’t necessarily help you discover another.

  2. Economics can never be reduced to mathematical analysis alone. Intuition often leads the way to new hypotheses. “Our Political Nature: The evolutionary origins of what divides us” by Avi Tuschman provides an example of how mathematical analysis can be put together in an intuitive direction.

  3. “[…] turning the education system into a group of organisations competing for high test scores?” What about making other measures mandatory as well? Suicide rate, for instance, would be an easy and important one to measure. If test scores go up but so do suicide rates, then alarms should go off for sure. Same if physical fitness goes down. Income of past pupils and self-reported happiness in school as well as later in life would also be helpful.
