Deep Throat Meets Data Mining

In the nick of time, the digital revolution comes to democracy’s rescue. And, perhaps, journalism’s.

If you pay passing attention to the media landscape, you know that most mainstream news outlets have had their business models undermined by the digital revolution. As their general-interest monopolies have been pillaged by niche online competitors, traditional news organizations have lost revenue and cachet, laying off journalists in waves that have grown into tsunamis. This process has created dire prospects for the future of investigative reporting, often seen as the most costly of journalistic forms.

In the middle of November, Sam Zell, the occasionally foul-mouthed chief executive officer of Tribune Company, concisely summarized a common 21st-century media-titan view of public interest journalism in an interview by Joanne Lipman, editor in chief of Condé Nast’s business magazine, Portfolio: “I haven’t figured out how to cash in a Pulitzer Prize.”

Now, though, the digital revolution that has been undermining in-depth reportage may be ready to give something back, through a new academic and professional discipline known in some quarters as “computational journalism.” James Hamilton is director of the DeWitt Wallace Center for Media and Democracy at Duke University and one of the leaders in the emergent field; just now, he’s in the process of filling an endowed chair with a professor who will develop sophisticated computing tools that enhance the capabilities — and, perhaps more important in this economic climate, the efficiency — of journalists and other citizens who are trying to hold public officials and institutions accountable.

In fact, Hamilton says, his interest in computational journalism grew out of his studies of media economics, which suggested that accountability journalism — the watchdog function that really can involve sorting through tens of thousands of documents, looking for the one smoking invoice — does not have the kind of everyday utility or broad interest value that will ever make it a revenue center in the disaggregated media world of the Internet. For every Watergate exposé that brings renown and readers to a Washington Post, there are dozens of other investigative reports — often equally time-consuming and expensive — that pass largely unnoticed, as the public mind focuses on the return of Britney or the current vogue in sexy vampires. (And if you don’t think Britney and vampires are important, go find yourself a Dec. 11 copy of Rolling Stone, which was, once upon a distant time, the politico-cultural bible of the Watergate age.)

On a disaggregated Web, it seems, people and advertisers simply will not pay anything like the whole freight for investigative reporting. But Hamilton thinks advances in computing can alter the economic equation, supplementing and, in some cases, even substituting for the slow, expensive and eccentric humans required to produce in-depth journalism as we’ve known it.

Already, complex algorithms — programming often placed under the over-colorful umbrella of “artificial intelligence” — are used to gather content for Web sites like Google News, which serves up a wide selection of journalism online, without much intervention from actual journalists. Hamilton sees a not-too-distant future in which that process would be extended, with algorithms mining information from multiple sources and using it to write parts of articles or even entire personalized news stories.

Hamilton offers a theoretical example, taking off from EveryBlock, the set of Web sites masterminded by Adrian Holovaty, one of the true pioneers of database journalism and a former innovation editor at washingtonpost.com. If you live in one of the 11 American cities EveryBlock covers, you now can enter your address, and the site gives you civic information (think building permits, police reports and so on), news reports, blog items and other Web-based information, such as consumer reviews and photos, all connected to your immediate geographic neighborhood. Before long, Hamilton suggests, an algorithm could take information from EveryBlock and other databases and actually write articles personalized to your neighborhood and your interests, giving you, for example, a story about crime in your neighborhood this week and whether it has increased or decreased compared with a month or a year ago.
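To make the idea concrete, here is a minimal, hypothetical sketch of the template-plus-data pattern such a program might rest on. Nothing below is Hamilton's or EveryBlock's actual code; the neighborhood, the counts and the function are invented for illustration.

```python
# Hypothetical sketch: turn neighborhood crime counts into a templated,
# personalized sentence. The function, neighborhood and numbers are all
# invented; EveryBlock does not publish such code.

def crime_blurb(neighborhood, this_week, four_weeks_ago):
    """Return a one-sentence crime summary comparing two periods."""
    if four_weeks_ago == 0:
        return f"{neighborhood} logged {this_week} police reports this week."
    change = (this_week - four_weeks_ago) / four_weeks_ago * 100
    direction = "up" if change >= 0 else "down"
    return (f"{neighborhood} logged {this_week} police reports this week, "
            f"{direction} {abs(change):.0f} percent from a month ago.")

if __name__ == "__main__":
    # Numbers standing in for an EveryBlock-style police-report feed.
    print(crime_blurb("Logan Square", this_week=14, four_weeks_ago=10))
```

A real system would pull its counts from structured data feeds and vary its language, but the core move is the same: data in, sentence out.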

I understand the potential utility of such a program, which could provide “hyperlocal” news without the cost of journalists (or, at least, with a lot fewer journalists than daily newspapers have employed). It’s just not something that excites me, professionally or as a reader.

But Hamilton also talks of an emerging trend that quickens the heart of even a grumpy old investigative reporter like me. Let’s call it data mining in the public interest.

When The New York Times reported late in 2002 that the Defense Advanced Research Projects Agency was developing a terrorist tracking system known as Total Information Awareness and led by Iran-Contra affair alumnus John Poindexter, privacy advocates were horrified. The horror: Government terror trackers planned to vacuum the huge trove of digital information now available on the Internet and other electronic systems into an enormous database and then use computer algorithms — so-called data-mining strategies — and human analysts to identify previously unnoticed patterns and associations that signaled terrorist planning.

The obvious potential for political abuse (not to mention Poindexter’s ethically troubled history) led Congress to cut off funding for the program and shut down the Pentagon’s Information Awareness Office in September 2003. And to this day, the term data mining maintains its authoritarian/Big Brother/Dr. Strangelove connotations in many left and libertarian quarters.

But the cross-referencing of information contained in multiple databases to find patterns of possible wrongdoing is not an inherently dastardly activity. It can, in fact, enhance government accountability. If you don’t believe me, take a look at the Web site for the Sunlight Foundation, which has founded or funded a dizzying array of projects aimed at revealing “the interplay of money, lobbying, influence and government in Washington in ways never before possible.”

Because it supports so many fascinating ways of tracking the performance of politicians and government agencies, I can only hint at how inventive Sunlight’s agenda actually is. But here’s one entertaining hint: The foundation’s Party Time site tells you where Washington’s politicos are partying (and, very often, raking in campaign contributions). I clicked in a few weeks ago to see that the Build America PAC, a committee that seems to give primarily to Democratic causes and candidates, was cordially inviting people to join Congressman Gregory Meeks, D-N.Y., at the fourth annual Las Vegas weekend retreat at Bellagio early in December. For a suggested contribution of $5,000 for a co-host and $2,500 for a sponsor, a retreat-goer could get a private reception with Congressman Meeks and a “Special Guest,” not to mention a discounted room rate at Bellagio. I know these particulars because Party Time includes PDFs of the invitations to political partying.

Now, click over to Fortune 535, a Sunlight Foundation attempt to chart the wealth of members of Congress using the vague, flawed financial disclosure system they have set up for themselves. There are all sorts of reasons that the estimated average net worth of U.S. Sen. Orrin Hatch, R-Utah, may have nearly tripled (from $1.2 million to $3.4 million) between 1995 and 2006. All of those reasons may be perfectly legitimate. Still, it’s nice to be able to track the general financial fortunes of congresspeople through the years, via Fortune 535, at a click of the mouse. It used to take a trip to Washington and hours of work (at least) to gather the financial disclosures of just one congressman.

Party Time and Fortune 535 are actually some of the simpler offerings that the Sunlight Foundation funds. For a more complex glimpse of the future of government accountability, go to Sunlight-supported watchdog.net (“The Good Government Site With Teeth”) and enter the name “Pelosi,” just as an example. With a click of the mouse, you’ll get the House speaker’s campaign contribution totals, earmark activity and legislative record, as well as a listing of how her votes align with the stances of different interest groups. It’s a wonderful and important cross-layering of databased information from the Federal Election Commission, Project Vote Smart, GovTrack, Taxpayers for Common Sense, MAPLight.org and other groups aiming to make government information more easily available to the public. And it is, truly, easy.

Bill Allison, a senior fellow at the Sunlight Foundation and a veteran investigative reporter and editor, summarizes the nonprofit’s aim as “one-click” government transparency, to be achieved by funding online technology that does some of what investigative reporters always have done: gather records and cross-check them against one another, in hopes of finding signs or patterns of problems. Allison has had a distinguished career, from his work as an investigative reporter at The Philadelphia Inquirer to his investigative duties at the Center for Public Integrity, where he co-authored The Cheating of America with legendary center founder Charles Lewis. Before he came to the Sunlight Foundation, Allison says, the notion that computer algorithms could do a significant part of what investigative reporters have always done seemed “far-fetched.”

But there’s nothing far-fetched about the use of data-mining techniques in the pursuit of patterns. Law firms already use data “chewers” to parse the thousands of pages of information they receive in the discovery phase of legal actions, Allison notes; the programs look for key phrases and terms, sort the probative wheat from the chaff and, in the process, “learn” to be smarter in their further searches.
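As a rough, hypothetical illustration of the principle (and not of any particular e-discovery product), a few lines of code are enough to scan a pile of documents for key phrases and rank them so the likeliest leads surface first. The phrases and documents below are invented.

```python
# Rough sketch of the "chewing" Allison describes: scan documents for key
# phrases and rank them so the likeliest leads surface first. Phrases and
# documents are invented; real e-discovery tools are far more elaborate.

KEY_PHRASES = ["no-bid contract", "campaign contribution", "shred the records"]

def score(text: str) -> int:
    """Count key-phrase occurrences in a document, case-insensitively."""
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in KEY_PHRASES)

def rank(documents: dict) -> list:
    """Sort documents from most to fewest key-phrase hits."""
    return sorted(documents.items(), key=lambda item: score(item[1]), reverse=True)

if __name__ == "__main__":
    docs = {
        "memo_041.txt": "Re: the no-bid contract discussed at the retreat ...",
        "memo_042.txt": "Lunch menu for the quarterly staff meeting.",
    }
    for name, text in rank(docs):
        print(name, score(text))
```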

Now, in the post-Google Age, Allison sees the possibility that computer algorithms can sort through the huge amounts of databased information available on the Internet, providing public interest reporters with sets of potential story leads they otherwise might never have found. The programs could only enhance, not replace, the reporter, who would still have to cultivate the human sources and provide the context and verification needed for quality journalism. But the data-mining programs could make the reporters more efficient — and, perhaps, less appealing targets for media company bean counters looking for someone to lay off. “I think that this is much more a tool to inform reporters,” Allison says, “so they can do their jobs better.”

Investigative reporters have long used computers to sort and search databases in pursuit of their stories. Investigative Reporters and Editors and its National Institute for Computer-Assisted Reporting, for example, hold regular computer-assisted reporting training sessions around the country. And the country’s major journalism schools all deal in some way with computer-enhanced journalism. The emerging academic/professional field of computational journalism, however, might be thought of as a step beyond computer-assisted reporting, an attempt to combine the fields of information technology and journalism and thereby respond to the enormous changes in information availability and quality wrought by the digital revolution.

I would be remiss to write about computational journalism and not mention Irfan Essa, a professor in the School of Interactive Computing at the Georgia Institute of Technology’s College of Computing, who teaches a class in computational journalism and is often credited with coining the term. He says both journalism and information technology are concerned, as disciplines, with information quality and reliability, and he views the new field as a way to bring technologists and journalists together so they can create new computing tools that further the traditional aims of journalism. In the end, such collaboration may even wind up spawning a new participant in the public conversation.

“We’re talking about a new breed of people,” Essa says, “who are midway between technologists and journalists.”

Essa was an organizer of the first “Symposium on Computation and Journalism,” a conference held at Georgia Tech early last year that was sponsored by entities ranging from the National Science Foundation to Google and Yahoo to CNN and Wired. The conference included representatives from the relevant fields — traditional media (notably The Washington Post and The New York Times), digital media, social media, academia and the programming community — all trying to understand how the world of information has changed and what the change means for journalism.

So far, the attempt to bring together technologists and journalists has been going “amazingly well,” Essa says. But the field is so new that those involved are still drawing a road map for the discipline and deciding what questions it will attempt to answer through research.

At Duke, Hamilton’s forays into computational journalism have included programs to plumb the U.S. Environmental Protection Agency’s Toxics Release Inventory database, which contains information on chemical emissions by industry and the government. He says he used statistical and mathematical methods to tease out possible inaccuracies or outright lies in companies’ reports of toxic releases.

For instance, his programs searched for violations of Benford’s law, the mathematical rule holding (accurately but counterintuitively) that in many lists of numbers from the real world, the first digit will be “1” about 30 percent of the time and “9” less than 5 percent of the time. Another of the programs looked for toxic release reports that were the same from year to year, on the theory that it is extremely unlikely for any industrial plant to emit precisely the same amount of toxins many years running.
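For the curious, here is a simplified, hypothetical sketch of both screens. Benford's law says the leading digit d should appear with probability log10(1 + 1/d), which works out to about 30.1 percent for 1 and 4.6 percent for 9; the second check simply flags facilities whose reported releases never change. The emission figures and plant names below are invented, and this is neither Hamilton's actual code nor real Toxics Release Inventory data.

```python
import math
from collections import Counter

# Simplified sketch of the two screens Hamilton describes, run over an
# invented list of emission figures. Not his code, not real EPA data.

def benford_expected(digit: int) -> float:
    """Benford's law: P(d) = log10(1 + 1/d); P(1) is ~30.1%, P(9) ~4.6%."""
    return math.log10(1 + 1 / digit)

def leading_digit(value: float) -> int:
    """First significant digit of a positive number."""
    return int(f"{value:e}"[0])

def first_digit_shares(values):
    """Observed share of each leading digit 1-9 among positive values."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def unchanged_reports(reports):
    """Flag facilities whose reported releases are identical every year."""
    return [plant for plant, yearly in reports.items() if len(set(yearly)) == 1]

if __name__ == "__main__":
    emissions = [1200, 1875, 930, 14500, 2200, 1100, 760, 1320, 9800, 150]
    for d, share in first_digit_shares(emissions).items():
        print(f"digit {d}: observed {share:.0%}, Benford expects {benford_expected(d):.1%}")
    print(unchanged_reports({"Plant A": [500, 500, 500], "Plant B": [410, 385, 462]}))
```

A real screen would compare observed and expected digit shares statistically (with a chi-squared test, say) rather than by eyeball, but the principle is the one Hamilton describes: the anomaly becomes the virtual tip.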

“That’s sort of the supplement (for investigative reporting), where you see the pattern,” Hamilton says. “It’s like a virtual tip.”

After he fills the endowed chair for the Knight Professor of the Practice of Journalism and Public Policy Studies, Hamilton hopes the new professor can help him grow an academic field that provides generations of new tools for the investigative journalist and public interest-minded citizen. The investigative algorithms could be based in part on a sort of reverse engineering, taking advantage of experience with previous investigative stories and corruption cases and looking for combinations of data that have, in the past, been connected to politicians or institutions that were incompetent or venal. “The whole idea is that we would be doing research and development in a scalable, open-source way,” he says. “We would try to promote tools that journalists and others could use.”

It is dangerous to generalize about a class of people who enjoy revealing ugly truths under large headlines. I can, however, say that in my experience, investigative reporters are the type of people who think looking through 10,000 or 20,000 pieces of paper could be — if not exactly fun — challenging and even exciting.

Investigative reporters also can have finely honed senses of outrage. Unlike Capt. Renault, they are not ironically shocked to discover that gambling is going on at Rick’s Café Américain; they actually are shocked — shocked! — and determined to tell the world all the shocking facts, whether the world cares or not.

I’ve also found many investigative reporters to have substantial egos hidden away beneath their outrage. They believe themselves to be engaged in heroic quests that will end in fame, fortune and acknowledgment of their selfless pursuit of the public interest. As a result, older investigative journalists can be a grouchy lot. (Just ask my wife.)

So, to summarize: Investigative journalists are often strident, self-righteous, boring as a phone book, absurdly egotistical and grouchy. The good ones are also national treasures. During my lifetime alone, they have helped save American democracy from two presidents who seemed determined to destroy the checks and balances that support it. Without investigative reporters and the news organizations that pay their freight, we would not know about the National Security Agency’s illegal electronic surveillance program (thank you, Eric Lichtblau, James Risen and The New York Times), the administration’s plans to attack Iran (bravo, Seymour Hersh and The New Yorker), the Walter Reed Army Medical Center’s shameful treatment of wounded Iraq war veterans (cheers, Dana Priest, Anne Hull and The Washington Post), and the extent of systematic mistreatment of detainees during the so-called War on Terror (salute, Washington bureau of Knight Ridder and, subsequently, the McClatchy newspapers).

Just as important, without investigative reportage, an astonishing array of major local and regional problems and corruptions would go unrevealed and unaddressed every year. There is no person more important to the general civic health — and yet more consistently under-rewarded in financial and social status terms — than the quality investigative reporter at a local news organization.

I like my Facebook feed and all those funny YouTube clips; I’m glad my RSS reader tells me whenever new fake news pops up on theonion.com. But The Onion won’t save the Republic from the next president who thinks that whatever he does is, by definition, legal. Perhaps a new breed of investigative reporter — practiced in the seduction of Deep Throat sources and armed with algorithms that parse 20,000 documents quicker than you can say, “Follow the money” — will. At the least, a new and more efficient class of investigative reporters might still be on the payroll, the next time we need one to tell us the story we really ought to know, even if it isn’t sexy enough to make the cover of the Rolling Stone.
