The Wizard and the Seer: Wordprint studies in the news again

08.12.2013 | Blair Hodges

I didn’t know about Robert Galbraith’s new crime novel The Cuckoo’s Calling until I heard it was actually written by J. K. Rowling, author of the Harry Potter series. An anonymous tweet tipped off the UK’s Sunday Times about the author’s possible identity, and the Times enlisted computer science professor Patrick Juola to investigate.

Over the past decade, Juola has dabbled in forensic linguistics, analyzing texts to identify distinct fingerprints left by authors. He performed a wordprint analysis of The Cuckoo’s Calling by comparing it to four other books and concluded that Rowling was the most probable candidate. He was quick to point out the limitations of his work, however. From the TIME magazine coverage:

“It doesn’t prove that [the Cuckoo author] was Rowling, but it’s a starting point,” he says. “In this particular case, I wasn’t that certain at all.” … Of those four [tested books], Cuckoo showed the highest similarity to Rowling’s work, but that only means the author was more likely to be Rowling than to be one of three other writers.

As it happens, Rowling was the author, the wordprint study was vindicated, and Rowling received an apology and settlement from the law firm that originally tipped off the Times.

But what if Juola had not included Rowling as a possible author in his tests? The tests would have identified another most probable author, but it wouldn’t have been Rowling. The test results are only as accurate as the selected candidates.
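To make that point concrete, here is a toy sketch in Python. It is not Juola's actual method, and every name and number in it is invented for illustration: each hypothetical author is reduced to the relative frequencies of three function words, and the disputed text is assigned to whichever candidate profile lies closest.

```python
import math

# Hypothetical relative frequencies of three function words ("the", "of", "and").
# All authors and numbers are invented for illustration only.
CANDIDATES = {
    "Author A": [0.060, 0.030, 0.028],
    "Author B": [0.052, 0.036, 0.025],
    "Author C": [0.070, 0.024, 0.031],
}
DISPUTED_TEXT = [0.045, 0.041, 0.020]   # profile of the disputed text
ACTUAL_AUTHOR = [0.044, 0.042, 0.021]   # profile of the (hypothetical) real author


def distance(p, q):
    """Euclidean distance between two word-frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def most_probable_author(text_profile, candidates):
    """Return the candidate whose profile is closest to the text.

    The function always returns one of the supplied candidates; it has
    no way to answer "none of the above".
    """
    return min(candidates, key=lambda name: distance(text_profile, candidates[name]))


# Without the real author in the pool, the test still names a winner.
print(most_probable_author(DISPUTED_TEXT, CANDIDATES))

# Add the real author to the pool and the answer changes.
with_actual = dict(CANDIDATES, **{"Author D": ACTUAL_AUTHOR})
print(most_probable_author(DISPUTED_TEXT, with_actual))
```

In the first call the sketch picks Author B simply because B is the least distant of the three wrong candidates. Only when the real author's profile is added to the pool does the right name come out.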

Rewind to 2008. Oxford’s Literary and Linguistic Computing published a study showcasing a new wordprint technique, which incidentally attributed authorship of the Book of Mormon to Sidney Rigdon. This wasn’t the first wordprint analysis of the Book of Mormon, but it was the first I know of to appear in a respected, mainstream academic publication. The Maxwell Institute’s Matthew Roper teamed up with Bruce Schaalje (BYU statistics professor) and Paul Fields (research and statistics consultant) to analyze the study. Roper handled the historical questions (the translation and publication timeline, the relationship between Joseph Smith and Sidney Rigdon), while Drs. Schaalje and Fields evaluated the paper’s proposed method, which, in good academic jargon, is referred to as “Nearest Shrunken Centroid Classification.”

Clearly, attributing authorship of the Book of Mormon to Sidney Rigdon directly challenges the claims Joseph and his associates made about the book’s marvelous translation. The quickest apologetic response would have been to simply note the technical problem: a test run on any group of potential authors will automatically identify a most probable author, whether or not the actual author is included, as Juola pointed out in his analysis of the J. K. Rowling novel. Or a response could have simply emphasized problems with the historical claim that Smith and Rigdon collaborated prior to the publication of the Book of Mormon. The response from Schaalje, Fields, and Roper does point these problems out.

But the response goes further. The authors don’t simply acknowledge the interesting possibilities offered by the new proposed methodology; they improve on the technique itself. In this way, their paper on “Extended Nearest Shrunken Centroid Classification” is more than apologetics. It’s an excellent contribution to the field of literary and linguistic computing.

In 2011 their work was published in Oxford’s Literary and Linguistic Computing, while a popularization of the more technical paper appeared in the FARMS Review. Schaalje’s work was also featured last fall in Frontiers magazine.

