Keywords: Popular science, statistics, data analysis

Title: Super Crunchers

Author: Ian Ayres

Publisher: Bantam

ISBN: 0553384732


On the face of it, a pop science book about data mining and linear regression hardly seems to be a recipe for a best seller, yet a best seller is precisely what Ian Ayres has produced with Super Crunchers. Subtitled 'How Anything Can Be Predicted', this slim little tome is a catalogue of anecdotes and stories. They illustrate the ways in which large datasets, fast computers and clever algorithms can be combined to discover interesting and unexpected correlations buried deep in the numbers. All of this is done with minimal mathematical or technical content - this certainly isn't a book that is going to scare off the average mathematophobe.

Ayres' primary thesis is that the conjunction of massive amounts of data, fast computers and statistical techniques makes ever more issues and decisions amenable to analysis and fact-based decision-making. He looks at fields as diverse as wine-drinking, sports, economics and medicine to show how intuition and human expertise are being out-performed by numerical analysis. In many cases the number crunchers beat the old-fashioned expert by a considerable margin. This, of course, creates problems as well as solving them, particularly when vested interests are at work. Who wants to be out-smarted by an equation?

Some of the most interesting material covers health-care and the quest for evidence-based medicine. This includes diagnostic assistance (what used to be called an expert system a long, long time ago) and efforts to find evidence for (or against) the efficacy of common health-care procedures and protocols.

Such work raises other issues, particularly when it comes to privacy and surveillance. The power of computers and the ubiquity of networked data make it easy for governments to track citizens and for companies to manipulate consumers. The book doesn't shy away from these issues and discusses how consumers can also use the technology to regain some power.

For those who are already versed in the technology, there's not much new to be gleaned here. There's certainly little discussion of the different kinds of data mining techniques on offer - this isn't the place to read about machine learning, association rules, genetic algorithms and the like. What discussion there is of neural networks is very high-level. This really is a book aimed at the general reader who has only the haziest idea of how this stuff works.

The big claim is that 'anything can be predicted', and unfortunately the author doesn't really test this assertion in any way. Climate change, for example, is one area where number crunching doesn't deliver the goods - the global climate models that underpin much of the work of the Intergovernmental Panel on Climate Change (IPCC) are seriously flawed. It would have been interesting to see this covered, along with other fields at the limits of statistical analysis.

Some have compared the book to Freakonomics, but it doesn't carry the same weight. It's interesting, but where Freakonomics included some genuinely surprising results, the same can't be said of this book. On the other hand, for those who want to know how an online store can accurately recommend future purchases, this is as good an introduction as any.

Contents © London Book Review 2008. Published September 01 2008