When James Shinn was working for the CIA as a senior East Asia expert more than a decade ago, he longed for the tools of a weatherman. He wanted to be able to predict that the chance of North Korea test-firing a missile within a month was, say, 60 percent. It remained a fantasy, he says, until now.
Shinn and his 14-person team at Predata have developed software that numerically describes political volatility and risk. It vacuums up vast quantities of data from online conversations and comments, compares them with past patterns, and spits out a probability. (A version of Predata’s service is accessible on the Bloomberg Professional service.) Shinn likens his product to sabermetrics, the statistics-driven baseball strategy popularized in Michael Lewis’s Moneyball. “By carefully gathering lots and lots of statistics on their past performance from all corners of the Internet, we are predicting how a large number of players on a team will bat or pitch in the future,” Shinn says, by way of analogy.
Predata doesn’t replace human analysts so much as offer them a new tool. Without people choosing what to follow, metadata scraping has limited use. Moreover, Shinn argues, while risk-analysis companies are increasingly offering clients numerical percentages, those figures are often pulled out of thin air. “This is a machine-driven, carefully calculated risk index,” says Shinn, the company’s founder and chief executive officer. “There is no arbitrary scoring by a human analyst.”
Each day, Predata monitors about 1,000 Twitter feeds, 10,000 Wikipedia pages, 50,000 YouTube videos, and several dozen newspapers and magazines in some 200 countries. It covers 300 topics, including news about individual companies, the debate over the U.K. leaving the European Union, and interest rate decisions by central banks.
Historical data are paramount. For instance, Predata didn’t make a statistically useful prediction for the March 22 attacks in Brussels, in part because Belgium had experienced few such incidents. The software needs at least five previous events to find a correlation between digital conversations and an act of terrorism, according to Shinn. France, on the other hand, had witnessed 13 incidents prior to the Paris attacks on Nov. 13; the company says its model put the likelihood of an event at 61 percent or higher a month in advance. Similarly, on Dec. 27, Predata says it calculated a 68 percent chance that North Korea would engage in some activity regarding weapons of mass destruction within 45 days. Ten days later, on Jan. 6, the Kim Jong Un regime conducted the nation’s fourth nuclear test.
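To make the idea of comparing current chatter with past patterns concrete, here is a minimal sketch in Python. It is not Predata’s model, which the company has not published in this article. The function name, the threshold, and the look-ahead horizon are all hypothetical; the only figure taken from the text is the five-event minimum Shinn cites. The sketch simply asks how often past days of elevated online activity were followed by an event within a set number of days.

```python
from typing import Optional, Sequence

MIN_EVENTS = 5  # the minimum cited in the article; everything else is assumed


def event_probability(
    signal: Sequence[float],
    event_days: Sequence[int],
    threshold: float,
    horizon: int,
) -> Optional[float]:
    """Share of past high-signal days followed by an event within `horizon` days.

    `signal` is a hypothetical daily activity score and `event_days` holds the
    day indexes of past events. Returns None when there are fewer than
    MIN_EVENTS distinct events, or no high-signal day to learn from.
    """
    events = set(event_days)
    if len(events) < MIN_EVENTS:
        return None  # too little history to say anything useful

    hits = 0
    trials = 0
    # Only days with a complete look-ahead window are counted.
    for day in range(len(signal) - horizon):
        if signal[day] >= threshold:
            trials += 1
            if any((day + k) in events for k in range(1, horizon + 1)):
                hits += 1

    if trials == 0:
        return None
    return hits / trials
```

Under these assumptions, a country with only a handful of past incidents yields no estimate at all, which mirrors the Brussels case described above. A real system would presumably weigh many signals at once and correct for chance; this sketch does neither.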
Shinn, who served as an assistant secretary for East Asia at the U.S. Department of Defense after his CIA stint, began developing the technology in 2014 while teaching at his alma mater, Princeton. At the time he was also serving on the advisory board of Kensho Technologies, which develops analytics software for investment management. Kensho’s CEO, Daniel Nadler, and Shinn experimented in their free time with a crude prototype that monitored online conversations among labor unions in South Africa, thinking the data offered a handle on the country’s volatility. They found that back-and-forth argumentation in English and Afrikaans on sites as public as the unions’ Wikipedia pages spiked before mining strikes, after which gold and platinum prices surged.