Foundation and Reality

Last week in the Guardian, economist Paul Krugman discussed the books that were most influential to him and his life. Much to my surprise, he cited Isaac Asimov's Foundation Trilogy: Foundation, Foundation and Empire and Second Foundation. (Other books in the Foundation series came after this initial trilogy.) Krugman's article offers an excellent recap (with spoilers) of what happens in those books, and you are certainly encouraged to go and read it. But I want to focus on the central conceit of the trilogy: psychohistory.

The Foundation books revolve around a scientist-historian named Hari Seldon, who developed an incredibly dense model for analyzing socio-historical trends that he termed "psychohistory." Using this model, he predicted that the galactic empire, within which this series of books is set, was on the brink of collapse and would enter a massive dark age. However, he recognized that by manipulating certain social vectors he could reduce this galactic dark age from something like 30,000 years of barbarity to approximately 1,000, and reconstitute the galactic empire anew. To that end, he established The Foundation, whose purpose is to monitor this process and help keep the societal shift on pace with redevelopment. That's a rather bold premise.

Krugman sees his work in macroeconomics as serving a similar function: helping steer political and economic decisions through analysis, tracking, and looking at the impact of different chaotic vectors on the economies of the world. In his column, he talks about how chaotic systems -- like weather, politics and economics -- are far too complex to predict accurately over any long period of time. Science is really on Krugman's side with this analysis: on the YouTube channel Sixty Symbols, the professors at the University of Nottingham give a great 10-minute introduction to chaos theory and the butterfly effect. (For a deeper look into chaos theory, you should definitely pick up James Gleick's book Chaos.)
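
If you want to see the butterfly effect for yourself, here is a minimal sketch in Python. It uses the logistic map, a standard textbook example of a chaotic system; it is not taken from Krugman's article or the Sixty Symbols video, and the function names and the starting value of 0.2 are my own choices for illustration. Two runs begin one ten-billionth apart and, a few dozen steps later, no longer resemble each other.

```python
def logistic_map(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x) and return the full trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def max_divergence(x0, nudge=1e-10, steps=60):
    """Largest gap between two runs whose starting points differ by `nudge`."""
    a = logistic_map(x0, steps=steps)
    b = logistic_map(x0 + nudge, steps=steps)
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    a = logistic_map(0.2)
    b = logistic_map(0.2 + 1e-10)
    # Print the gap every ten steps to watch it grow.
    for step in range(0, 61, 10):
        print(step, abs(a[step] - b[step]))
```

The gap stays tiny for the first handful of steps and then swamps the signal, which is why long-range forecasts of such systems break down even when the model itself is exactly right.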

But there are things happening right now in the world of big data that are really starting to make us question just how unpredictable chaotic systems are, and whether number-crunching history to predict the future is so far out of the realm of reality after all. The biggest example right now is statistician Nate Silver's accurate prediction of the electoral vote count in the 2012 presidential election. The uncanny accuracy of Silver's predictions led some people to question, jokingly, whether or not he was actually a witch.

Interestingly, this leads us back to one of the plot points in the Foundation series. Knowing that a dark age is coming -- and how science, education, libraries and intellectuals fared in dark ages -- the Foundation instead puts forth a false front as a religion. Using their knowledge of science, they are able to work "miracles" that reaffirm people's belief in the Foundation's holy power. This is exactly what Arthur C. Clarke meant when he stated in his third law that "Any sufficiently advanced technology is indistinguishable from magic."

But Nate Silver isn't doing witchcraft, just a deeply analytical form of mathematics. In fact, Nate Silver honed this skill in the world of baseball. You may remember the movie Moneyball with Brad Pitt; that film was based on Michael Lewis's book Moneyball: The Art of Winning an Unfair Game, which outlined the analytical approach, associated with the Society for American Baseball Research, that we now know as sabermetrics. Roughly, sabermetrics analyzes players based on their performance data over a year of play, and then compares them to other players with similar records. This approach can lead to remarkably accurate assessments of future performance, as the 2002 Oakland Athletics demonstrated. Silver took a similar approach to analyzing polling data and the historic accuracy of those polls over time to predict political outcomes. Using the dozens and dozens of polls that come out on a near-daily basis during an election season, he was able to crunch all that data, normalize it against past performance, and correctly forecast the outcome in all fifty states.

That's pretty amazing.
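
To make the idea of "normalizing polls against past performance" a little more concrete, here is a toy sketch in Python. To be clear, this is not Silver's actual model, which is far more involved; the pollster names, the numbers, and the choice to weight each poll by the inverse of its historical error are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str        # hypothetical pollster name
    candidate_a: float   # share for candidate A, in percent
    candidate_b: float   # share for candidate B, in percent


def weighted_margin(polls, past_error):
    """Average the A-minus-B margin, weighting each poll by 1 / past error.

    `past_error` maps a pollster name to its average miss (in points) in
    earlier elections, so historically accurate pollsters count for more.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for poll in polls:
        weight = 1.0 / past_error[poll.pollster]
        weighted_sum += weight * (poll.candidate_a - poll.candidate_b)
        total_weight += weight
    return weighted_sum / total_weight


if __name__ == "__main__":
    polls = [Poll("Alpha", 50.0, 46.0), Poll("Beta", 47.0, 49.0)]
    past_error = {"Alpha": 2.0, "Beta": 4.0}  # invented track records
    print(weighted_margin(polls, past_error))
```

In this made-up example the two polls disagree about who is ahead, but because "Alpha" has the better track record its result pulls the combined estimate toward candidate A.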

And big data analysis doesn't stop at politics and baseball. When Google scanned the collections of some of the world's best research libraries, they developed a textual repository the likes of which had never been seen in history. And because they had all of these texts in a searchable format, literary historians were able to begin doing textual analyses they had never been able to do before. Using the Google Ngram Viewer, you can see how often any particular word or phrase has been used over several hundred years of printed books. This has sparked an entirely new field of literary and cultural analysis that has been dubbed "Culturomics." While this may seem like a throw-away attempt by the humanities to capitalize on data analysis, you can gain an amazing amount of insight into how the use of words over time can become a predictor for things like outbreaks of war, cycles of drought and famine, and outbreaks of disease.
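
The counting behind this kind of analysis is simple enough to sketch in a few lines of Python. This is only a toy version run on a tiny invented corpus; it does not call Google's service, and the function name and the sample sentences are hypothetical. For each year, it reports what fraction of all same-length word sequences match the phrase you ask about.

```python
def ngram_frequency_by_year(corpus, phrase):
    """Return {year: relative frequency of `phrase`} for a toy corpus.

    `corpus` maps a year to a list of text strings. Matching is
    case-insensitive and splits on whitespace only.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, texts in corpus.items():
        hits = 0
        total = 0
        for text in texts:
            words = text.lower().split()
            # Slide a window of n words across the text.
            grams = list(zip(*(words[i:] for i in range(n))))
            total += len(grams)
            hits += sum(1 for gram in grams if gram == target)
        result[year] = hits / total if total else 0.0
    return result


if __name__ == "__main__":
    corpus = {
        1900: ["the great war", "war and peace"],
        1950: ["peace in our time"],
    }
    print(ngram_frequency_by_year(corpus, "war"))
    print(ngram_frequency_by_year(corpus, "great war"))
```

Dividing by the total number of word sequences matters because far more books survive from some years than others, so raw counts alone would mostly show how much was printed.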

Though perhaps the most amazing exploration of big data comes from Stephen Wolfram. In his book A New Kind of Science, Wolfram talks about how the power of computation can help us discover correlations and analyze data to find causes and principles behind these complex systems. If you really want to have your mind blown, you should pop over to the TED website and watch his talk, where he explains the computational systems that he uses to generate statistical models using his math engine Mathematica and analyze millions of different data sets using natural language queries in his data search engine Wolfram Alpha, and how having all of these data sets can help generate a statistical model for the development of the universe which, he hopes, will ultimately lead to a grand unified theory of physics.

So, will we be able to develop accurate models of social collapse and rebuilding on the scale of a galactic empire? I think we're pretty far from it right now. But we may be able to do some other equally miraculous things with data that we would never even have been able to dream of only 50 years ago.

-- Eric Riley