The Blind Spot of Search Engines

The financial meltdown of the American newspaper industry, and the subsequent shrinking of editorial staffs, may have claimed yet another casualty: good scholarship.

Researchers trying to establish the amount of information on a given topic that reaches the public — or to analyze that content for conscious or unconscious bias — tend to rely on the LexisNexis database. Since 1979, that service has been compiling and archiving news stories from a wide variety of sources, including the world’s major newspapers and magazines.

But its approach has fallen behind the reality of the times, according to a paper published in the Autumn 2008 issue of the Journalism and Mass Communication Quarterly.

Professor Bruce Bimber and doctoral candidate David A. Weaver of the University of California, Santa Barbara political science department note that LexisNexis, like most archives kept by individual newspapers, excludes wire-service copy that appears in those papers’ pages (although its database does include raw wire stories, without attribution to a specific paper). Through the company’s resources, one can determine what staff writers for the Chicago Tribune or Boston Globe wrote about a specific issue on a given day, but there is no way of knowing whether the paper ran an Associated Press or Reuters story in place of, or to supplement, the staff report.

“The fraction of news reaching the public that originates with wire services is not known,” the researchers report. They add, however, that with “substantial reductions in news staff at newspapers across the country,” there is no question that AP copy is filling up more and more pages of the nation’s newspapers.

To gauge the significance of this problem, the researchers focused on a single topic that is just emerging as a news item: nanotechnology. (Their choice is in line with their current affiliations: Bimber is a principal investigator and Weaver a graduate fellow with UCSB’s Center for Nanotechnology in Society.) Using both LexisNexis and Google News, which has been aggregating items from thousands of news sources since 2002, they looked at how many stories they could find on the subject between Jan. 29, 2006, and Aug. 15, 2007.

“Google News revealed that LexisNexis missed more than half of the news stories in major papers,” the researchers concluded. “LexisNexis is blind to a great many news stories because of the wire exclusion and this problem extends to major news outlets.”

So, should scholars simply shift to using Google News (which has the added advantage of being free)? Well, there’s a problem there. In September 2007, that service agreed to license stories directly from four major wire services and stopped displaying “duplicate content.” In other words, it no longer shows readers when a wire service report has been picked up by a particular publication.

So both services now suffer from what the scholars call “the wire service blind spot,” making it quite difficult for researchers to determine after the fact precisely what stories subscribers to, say, the San Francisco Chronicle read on any particular topic.

Given the increasing tendency of people to use the Internet to find sources from all over, this line of research may be nearing its conclusion anyway. But if you’re looking at how public opinion is shaped, there is value in knowing whether a particular AP analysis of a presidential overseas trip — or, for that matter, a movie review — was printed prominently in major papers or confined to the back pages of a few small dailies.

So if you run across any graduate students leafing through stacks of back issues in library basements, don’t dismiss them as hopelessly out of date. They may be employing the only available technology to gather the data they need.
