Digital Disappearance

Never has the world’s historical and cultural record been more accessible, or more fragile.

Once upon a time, news stories were entombed in newspaper “morgues” and rarely saw the dusty light of day.

Now the news never dies. Millions of people can search the archives online — an amazing benefit unless, perhaps, you’re someone who was actually in the news.

In a recent survey of 110 news organizations, the Toronto Star found that increasingly, publishers are fielding regular requests from anxious and embarrassed readers to “unpublish” information, sometimes months or years after it first appeared online.

Some readers don’t want their marital status or the price of their home known, or they were quoted saying something they now regret. They may be angry because the news of their arrest was reported, but not the news that they were acquitted or that charges were dropped, and their names keep popping up on Internet searches in connection with the crimes, usually misdemeanors.

Pre-Internet, of course, the reports remained on paper, intact and inviolable, but also inaccessible to the casual viewer and probably unknown. The Internet has opened up the past and made it malleable with a few keystrokes. The offended know it’s physically easy to change a story online.

“Most often, these individuals don’t understand a newspaper’s greater responsibility to its readers and the public record,” said Kathy English, the Star’s public editor, the author of the report, and the person who handles reader requests to “unpublish,” in consultation with the Star’s lawyers and senior editors.

“… To erase the record of what has been published would diminish transparency and credibility with readers,” English said.

Nearly 80 percent of the editors who participated in the survey said the circumstances sometimes do warrant changing the record. Some said they would remove information if there were a legal reason to do so. Others were more open to adding information than subtracting it. Overall, though, they were strongly resistant to altering published stories, even as they said they want to be fair to those named in the news.

As Kathy Steiner of the Jamestown Sun wrote, “‘Unpublishing’ is a word that doesn’t accurately reflect what people are asking. They’re asking to censor or rewrite history.”

‘Haphazard deterioration’
On a much broader scale, “unpublishing” is the wholesale loss of content that can occur when an online journal or Web archive is sold or goes bankrupt, or the software needed to read it becomes obsolete. It’s expensive to transfer records from an old server to a newer, faster version that operates with different formats and programs. A floppy disk has a half-life of about five years.

“It’s not clear who’s responsible to archive digital material,” said Stanley Katz, director of the Princeton University Center for Arts and Cultural Policy Studies. “Some of the stuff’s going to go away altogether. We are likely to lose whole subsets of it. If we keep renewing everything, we can keep it going. But the question is whether there is money and commitment enough to keep it going. The odds are that money will be applied selectively. …”

“If the New York Times goes out of business, whose responsibility is it to preserve their digital archive? This kind of thing is happening as we watch. It’s not speculation.”

In George Orwell’s sci-fi classic, Nineteen Eighty-Four, the Records Department of the super-state of Oceania is abruptly forced to work around the clock to rectify five years’ worth of political newspapers, books and pamphlets, expunging all references to Oceania’s long-term alliance with Eastasia. Because, the clerks learn, “Oceania was at war with Eastasia: Oceania has always been at war with Eastasia.”

As so-called cloud computing becomes more common, consolidating lots of data (and perhaps someday most of it) in servers physically removed from users, the ability to make quick and global changes grows with it. Web-based software such as Hotmail or Gmail is already allowing quick, and welcome, updates for all users simultaneously without tapping their personal hard drives. Once much archival material and almost all new material starts living exclusively online, the cloud-keepers’ ability to amend that cloud becomes less fantastic.

But direct governmental control of information is not the biggest threat in the digital age, at least in this country, said Clifford A. Lynch, executive director of the Coalition for Networked Information, a Washington, D.C.-based nonprofit group with 220 members, primarily libraries and universities. Rather, it’s the “low-key, haphazard deterioration of the record” that’s alarming, he said.

“I don’t imagine the centralized attack that you see in Orwell,” Lynch said. “It’s very hard to see us getting to the stage where there’s a government department in charge of rewriting history. It’s more that there are thousands of small players who can chip away at things, people who can litigate or write in to a paper and get themselves removed, or somebody who had a prominent presence on the Web and then dies, and the Web site vanishes.”

So the future is less 1984 or Fahrenheit 451, and more “99 bottles of beer on the wall.”

Will the past last?
At Columbia University, a team of seven people, including two full-time librarians, has recently founded the Human Rights Web Archive to preserve Web sites that are providing valuable information on struggles for democracy in other countries. Many of these sites are being hacked, suspended or shut down by repressive regimes.

“This is the early days yet,” said Bob Wolven, an associate librarian at Columbia. “The basic concern is with information on the Web that is freely available but fragile and might disappear. We’re still working things out. We’re starting with human rights, drawing on the expertise of librarians and scholars. Knowing what’s there is a big part of the challenge.”

Preservation is not cheap. Columbia’s effort is funded by a three-year, $716,000 grant from the Andrew W. Mellon Foundation. Print collections in libraries can weather a few years of budget austerity, scholars say, but a few lean years could cause large portions of the electronic record to disappear.

Twenty-one complete copies of the Gutenberg Bible are still in existence today after more than 500 years. The Dead Sea Scrolls (which are going online) survived more than two millennia. How long will electronic books survive?

“We kind of wonder what’s going to happen to them, 20 years from now,” said Oya Y. Rieger, who oversees digital technologies at Cornell University Library. Cornell is digitizing 15,000 books monthly in partnership with Google.

“Fifty years from now, Cornell will still probably have the print originals,” Rieger said. “There are many initiatives creating technical infrastructures to ensure that digital books will still be available, but they are untested. I’m afraid some of it will be defined by time.”

In the past, there were multiple print copies of journals on the shelves of multiple university libraries. Today, schools are saving money by canceling journals in print and subscribing exclusively online. In many cases, the journals are “born-digital.”

On Feb. 26, Princeton is hosting a conference titled “Is there a past in your future?” to discuss how to preserve scholarly journals that are no longer stored by universities, either in print or online.

Hundreds of libraries, including Cornell, Princeton and Columbia, work with publishers to ensure that electronic copies will be deposited with nonprofit “dark archives,” a kind of insurance policy for the intellectual record. But the participants represent only a fraction of libraries around the world.

“We are at the mercy of publishers, in a way,” Rieger said. “We don’t have the back files. If a publisher were to shut down a server, there would be thousands of libraries losing a journal.”

Also, there’s the problem of how to save the life’s work of prominent scholars who, on their retirement, leave a computer hard drive and boxes of CDs to their schools.

“It’s extremely challenging to ensure that even 10 years from now, some of these files are going to be retrievable,” Rieger said. Information has never been so accessible or so fragile, she said.

“If you can’t use it, it means it has not been preserved,” Rieger said.

‘Google is a menace’
Access is the flip side of preservation in the digital age. And access can effectively vanish when a Web site changes hands, as a group of frustrated U.S. and Mexican historians discovered last year.

In May 2009, these historians published a letter of protest online, alleging that Google had “disabled the access” to an important archive of 19th- and 20th-century Mexican newspapers it had purchased from the Paper of Record. The archive was an essential resource, the letter said: It included more than 490 newspapers in Spanish and 20 million pages, spanning three centuries.

The historians demanded that Google “restore free access to the contents of Paper of Record,” an archive they said represented the intellectual property and heritage of Mexico.

Historians who depended heavily on the Paper of Record for their research said they were being directed online to Google News Archives and WorldVitalRecords.com, but got garbled results on their keyword searches. They said they could no longer “turn” the pages of the scanned newspapers as if they were holding the actual documents. The rich repository of information was now virtually impossible to search, the historians said.

Google was unresponsive to their concerns, they said.

“I’d be better off going to Mexico City for a week,” said Richard Salvucci, a history professor at Trinity University who signed the letter of protest. “Google is a menace.”

Google and WorldVitalRecords did not respond to Miller-McCune’s e-mails or phone calls this month requesting comment regarding the Paper of Record archive.

In December, Ted Beatty, a history professor at the University of Notre Dame who signed the letter, said the old search engine and user-friendly features of Paper of Record were available with an institutional subscription at WorldVitalRecords, but for a high price: $2,500 for 10 users and $4,500 for 20 users.

The high institutional fee means that “only a small number of researchers at major institutions will get access,” Beatty said. Salvucci said he wouldn’t be one of them.

“Paper of Record made this widely available and accessible at no cost,” Beatty said. “It made it so that Mexicans could access their cultural history for the first time. It represented a real democratization of the ability to look at and engage our past. It was a tremendously powerful tool.”

“Google has deep pockets and they see these easily-gobbled-up chunks of material and it’s their mission to buy it … but in the short term, they didn’t think it through,” Beatty said. “They clearly weren’t ready to replicate the quality of access the Paper of Record had achieved.”

Salvucci said he had tried but failed to draw the attention of congressional committees to the looming threats to the historical record online.

“With what I know right now, every time I hear somebody is going to digitize something for posterity, I think to myself, ‘Good luck,’” Salvucci said. “Because if you can digitalize it, you can vaporize it.

“This is not just a vision of some crazed social democrat in Britain,” he said, in a reference to Orwell. “This is no joke. This is happening.”

‘Like stealing a book’
While historians of Mexico’s past lament their loss, current-day newspaper editors are struggling to keep the record of today’s news intact. In the Star survey, they wrote about their reluctance to revise history.

“For one, we have an ethical obligation to historians, researchers and readers to uphold the content record we have created,” said Peter Crowley of the Adirondack Daily Enterprise. “We should only remove something if it is libelous — and even then, we should be careful.”

Doug Ernst of the St. Helena Star said, “The online archive is like a library. Unpublishing is like stealing a book from the library.”

There are no industry-wide “best practices” for handling the public’s requests. In the report, English recommended that papers draw up a policy and choose several top executives to make decisions by consensus. They should be humane but not give in to “source remorse,” English said.

It’s been an uneven approach so far. The Times-Tribune/Times Shamrock Communications agreed to remove a story that police said would compromise an investigation. But the Wisconsin State Journal declined to take down a years-old column that mentioned a man’s immigration problems.

Capital Gazette Communications removed all the columns written by an author after discovering plagiarism in one of them. The Dayton Daily took down a nude picture that a woman had accidentally uploaded to an online photo gallery. The Summit Daily News removed a story in which a rookie reporter had switched the name of the suspect and the reporting party.

The Big Spring Herald declined to remove the story of a man arrested for indecent exposure, even though the case was later expunged from his record. The Houston Chronicle agreed to remove an in-depth story, several years old, about a heroin addict who was now clean and looking for work as a paralegal.

“We have done this in the rarest of cases, based on facts specific to individual cases,” said Dean Betz of the Chronicle. “It’s largely done for humanitarian reasons.”

At GateHouse Media, a company that owns hundreds of dailies, weeklies and local Web sites in the U.S., officials decided to institute a pilot “sunset” policy at some New England papers, in which police blotter reports of misdemeanors are programmed to “fall off” the Web sites six months after publication.

“How long does something minor like a shoplifting charge have to follow someone on the Web?” asked Brad Dennison, a GateHouse vice president. “My moral barometer tells me that’s not fair. There’s no rule that says this stuff has to live forever.” (Or even be published in the first place, English notes.)

Yet, as Paulette Haddix of the Post-Tribune of Northwest Indiana said: “If something happened, it happened. If it was said, it was said. We don’t want to set any ‘unpublishing’ precedent where we are rewriting history.”
