Computers have a reputation for churning through numbers but lacking intuition. Now, though, an algorithm developed by researchers at MIT to find predictive patterns in unfamiliar data has performed better than two-thirds of human teams.
The researchers, from MIT’s Computer Science and Artificial Intelligence Laboratory, are trying to take some of the strain out of analyzing large data sets by creating algorithms that can identify interesting features hidden in gigantic pools of figures. They give an example like this:
In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.
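That derivation can be sketched in a few lines of Python. The records and figures below are hypothetical, invented purely to illustrate the idea: from raw start/end dates and total profit, compute the span and the average weekly profit across it.

```python
from datetime import date

# Hypothetical promotion records: the raw fields are start/end dates
# and total profit for each sales promotion.
promotions = [
    {"start": date(2015, 3, 2), "end": date(2015, 3, 16), "total_profit": 4200.0},
    {"start": date(2015, 6, 1), "end": date(2015, 6, 8), "total_profit": 1750.0},
]

# Derived features: the span between the dates, and the average weekly
# profit across that span, rather than the raw fields themselves.
for promo in promotions:
    span_days = (promo["end"] - promo["start"]).days
    promo["span_days"] = span_days
    promo["avg_weekly_profit"] = promo["total_profit"] / (span_days / 7)
```

The derived columns (`span_days`, `avg_weekly_profit`) are the kind of second-order features that may predict outcomes better than the values they were computed from.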
Spotting that kind of insight is much easier for humans than it is for computers, and it’s what the team has been trying to get an algorithm to achieve. The result is a piece of software that they call Data Science Machine, and to test it they entered a prototype into a series of data science competitions, where it was pitted against human teams to identify predictive patterns in unfamiliar data sets.
It did pretty well.
Across the three competitions in aggregate, it managed to beat 615 of 906 human teams. In two of the three competitions, its predictions were 94 percent and 96 percent as accurate as those of the winning teams (in the third, it managed only 87 percent of the winners' accuracy). But, as MIT News points out, the human teams spent days, weeks, or in some cases months reaching their conclusions; Data Science Machine took between 2 and 12 hours. The findings are to be presented next week at the IEEE International Conference on Data Science and Advanced Analytics, but you can already read the paper online.
The algorithm uses several tricks to replicate the abilities of humans. First, it uses the structure of the databases it analyzes to create a bewildering array of new metrics for comparison, then performs a series of different calculations to find correlations between those new metrics. It also pays special attention to categorical data — like the name of a month or a brand name — and studies relationships between the new metrics and those categories.
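The categorical trick can be illustrated with a minimal sketch. This is not the Data Science Machine's actual code — the function name, the sample rows, and the choice of aggregations are all assumptions for illustration — but it shows the general idea: group numeric values by each categorical value and apply a battery of aggregations, producing many candidate features automatically.

```python
from statistics import mean

# Hypothetical rows with one categorical column ("month") and one
# numeric column ("profit"), invented for illustration.
rows = [
    {"month": "Jan", "profit": 100.0},
    {"month": "Jan", "profit": 300.0},
    {"month": "Feb", "profit": 250.0},
]

# A battery of aggregations to apply to each category's numbers.
AGGREGATIONS = {"sum": sum, "mean": mean, "min": min, "max": max}

def synthesize_features(rows, category_key, numeric_key):
    """Build one candidate feature per (category value, aggregation) pair."""
    groups = {}
    for row in rows:
        groups.setdefault(row[category_key], []).append(row[numeric_key])
    features = {}
    for value, numbers in groups.items():
        for name, fn in AGGREGATIONS.items():
            features[f"{numeric_key}_{name}_by_{category_key}={value}"] = fn(numbers)
    return features

features = synthesize_features(rows, "month", "profit")
```

Even this toy version yields eight candidate features from three rows; applied across every column pairing in a real database, the combinatorial explosion produces the "bewildering array" of metrics the algorithm then screens for predictive power.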
While it’s unlikely such algorithms will become a replacement for human intuition, it seems plausible that they could help make the analysis of large pools of data a little faster. “There’s so much data out there to be analyzed,” explains Max Kanter, the lead author of the research paper, in a press release. “And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”
Image by r2hox under Creative Commons license