It's tough for humans to predict how well a book will sell until after it's published; it's something of a gamble. But now, a new algorithm can tell whether a book will be a commercial success long before it hits the shelves, with a staggering 84 percent accuracy.
A team of researchers from Stony Brook University, New York, has been applying a technique called statistical stylometry to mathematically examine words and grammar in books. Turns out, it can predict far better than most humans whether a novel will be a bestseller.
The team worked with a pretty sizable corpus—all the classic books held in the Project Gutenberg archive—and analyzed all the texts, developing an algorithm that predicted success based on lexicon and structure. Then, they compared its predictions to historical data about the success of all the books. The algorithm's predictions of success matched the real-world data 84 percent of the time.
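To make the 84 percent figure concrete, here is a minimal sketch of how an accuracy score of that kind is typically computed: the share of books where the predicted label matches the historical one. The function name and the sample labels are made up for illustration and are not taken from the study.

```python
def accuracy(predicted, actual):
    """Return the fraction of books whose predicted label matches the real outcome."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one book to score")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)


# Invented labels for five hypothetical books: True means "successful".
predicted = [True, False, True, True, False]
actual = [True, False, False, True, False]

print(accuracy(predicted, actual))  # 0.8, i.e. 4 of 5 predictions matched
```

A score of 0.84 on this scale would mean the predictions agreed with the historical record for 84 out of every 100 books.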
So, what makes a bestseller? There are a few key findings, according to the researchers:
- Successful books make heavy use of conjunctions—like "and" and "but"—as well as large numbers of nouns and adjectives.
- Unsuccessful works include more verbs and adverbs, explicitly describing actions and emotions—like "wanted," "took" or "promised."
- Verbs in successful books more commonly describe thought processes, like "recognized" or "remembered."
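As a rough illustration of the kind of signal these findings describe, the sketch below measures how often a few word groups appear in a passage. The word lists are tiny, hand-picked assumptions built from the examples above, and `style_rates` is a hypothetical helper; a real system would presumably rely on proper grammatical tagging rather than fixed lists.

```python
import re
from collections import Counter

# Assumed word lists, taken only from the examples quoted in the findings.
CONJUNCTIONS = {"and", "but"}
ACTION_VERBS = {"wanted", "took", "promised"}
THOUGHT_VERBS = {"recognized", "remembered"}


def style_rates(text):
    """Return each word group's share of all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = Counter(words)

    def rate(vocabulary):
        # Guard against empty input so we never divide by zero.
        if total == 0:
            return 0.0
        return sum(counts[w] for w in vocabulary) / total

    return {
        "conjunctions": rate(CONJUNCTIONS),
        "action_verbs": rate(ACTION_VERBS),
        "thought_verbs": rate(THOUGHT_VERBS),
    }


sample = "She remembered the house and the garden, but he took the keys."
print(style_rates(sample))
```

Rates like these could then serve as inputs to a classifier that separates successful books from unsuccessful ones, which is the general idea behind the approach described here.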
Amusingly, the researchers scoured Amazon for low-ranking books to test their algorithm on poorly written and unsuccessful books, and their findings were borne out there, too. The research is published by the Association for Computational Linguistics.