Thursday, December 23, 2010

Google's ngrams: for entertainment only?

I admit it: I like to look for the problems, the bugs, the anomalies. So I couldn't resist using Google's new ngrams toy, with its nifty view of usage over time, to look for anachronisms. Part of what prompted that: I was recently looking up a technical term in Google Books, and it showed up in a book dated at least a decade before the term should have existed. But the book turned out not to be what it claimed, in the most literal sense: the catalog information described one book, but the scanned content was of another book entirely.

The occasional mixup is not surprising in a project the scale of Google Books. But looking for anachronisms turned up a disturbing catalog error: periodicals misclassified as books. Whichever volume of the Memoirs and Proceedings of the Manchester Literary & Philosophical Society discusses computer programming languages, it is certainly not one from 1888. Yet it seems every volume gets that publication date, because (according to Google's catalog data) the whole series is a single book.

Some other misleading finds, with the publication year Google assigns:

  • Annual Report of the National Academy of Sciences, 1888
  • IEEE Science Abstracts, 1898
  • ACS Chemical Abstracts, 1907
  • Journalism Quarterly (American Association of Schools and Departments of Journalism), 1928

I could go on. The point is, if you're looking for broad trends, you are probably fine. If you're trying to pin down when a term first appeared, use ngrams and Google Books with caution: a single misdated volume is enough to push an apparent first appearance back by decades.
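To make the trends-versus-first-appearance distinction concrete, here is a toy sketch in Python. The records are entirely hypothetical (invented year/count pairs, not Google's data), with one entry standing in for a periodical volume that inherited an early series date; `first_appearance` and `counts_by_decade` are illustrative helpers, not part of any Google tool. The earliest year is decided by the single misdated record, while the later decade totals that a trend line depends on are unaffected by it.

```python
# Hypothetical records: (year the catalog assigns, occurrences of a term).
# All numbers are invented for illustration; the 1888 entry plays the role
# of a later periodical volume that inherited the series' first date.
records = [
    (1888, 1),   # misdated volume
    (1957, 3),
    (1958, 40),
    (1959, 55),
    (1960, 80),
]

def first_appearance(records):
    """Earliest year with any occurrence: one misdated volume decides it."""
    return min(year for year, count in records if count > 0)

def counts_by_decade(records):
    """Total occurrences per decade, a crude stand-in for a trend line."""
    totals = {}
    for year, count in records:
        decade = year - year % 10
        totals[decade] = totals.get(decade, 0) + count
    return totals

clean = [r for r in records if r[0] != 1888]  # drop the misdated volume

print(first_appearance(records))   # 1888
print(first_appearance(clean))     # 1957
print(counts_by_decade(records))   # {1880: 1, 1950: 98, 1960: 80}
print(counts_by_decade(clean))     # {1950: 98, 1960: 80}
```

Removing one record moves the "first appearance" by almost seventy years, while the 1950s and 1960s totals stay exactly the same.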