For a scholar of ancient languages, Gregory Crane has blazed a very contemporary career path. Crane received his Ph.D. in classical philology at Harvard University before joining the university as an assistant professor. He has published books on Thucydides, widely regarded as one of the world’s first historians, and articles on Hellenistic poetry. He’s also obsessed with algorithmic analysis and has spent decades building an online database that’s changing how we view the texts of the past and, in the process, scholarship itself.
In the 1980s, when computer use was far from mainstream, particularly in academia, Crane created a Unix-based text-retrieval system for the Thesaurus Linguae Graecae, also known as the Treasury of Greek Language. The TLG is a corpus—the linguistics term for a body of writing—that encompasses all of the classical Greek text we know exists, ranging roughly from the third century B.C.E. in Greece to Constantinople in 1453. “Ninety percent of everything that’s been written in the last 100 years was about 10 million words of [classical] text,” Crane says.
Since 1985, Crane has also been a part of the Perseus Project, first as co-director and now as editor-in-chief. The organization, which might sound like something a Bond villain would come up with for a missile that destroys the moon, is actually an open-source online library for those same classical texts. Its stated mission is making “the full record for humanity as intellectually accessible as possible to every human being,” according to its website. Digital technology is making the work of breaking down ancient languages that much easier, and that much more public.
“All of a sudden we have access to thousands of years of stuff,” Crane says. Packing the entire TLG corpus into a database at the Perseus Project, and turning the words into searchable data, has made it possible to find new connections between languages. “If you have every word in Greek analyzed, you can answer questions like, ‘What English translation does this Sanskrit word correspond to?’” Crane says. “You can start to analyze parallel texts and do semantic analysis that you couldn’t do before.”
With the help of optical character recognition (OCR) software, which turns printed letters into zeroes and ones, digitizing text is simple. But on its own, that data isn’t quite as valuable as it could be. “The problem with OCR-generated text is that there is not much metadata,” Crane says. All those words need to be tagged and collated before they can be properly studied. To solve that problem, Perseus has taken on a decidedly unacademic openness to the outside world. “We’re building an environment to support a global audience to work directly with the source material: Chinese, Greek, whatever,” he says.
In other words, the database has become participatory. Classics students as well as the wider public are invited to contribute translations, definitions, citations, and text corrections for when OCR fails. Hundreds of thousands of users have already accessed the database. In return, the utility of Perseus’ information increases for everyone using it.
“The challenge is to be able to identify the structure of the collection, to analyze the text and find patterns over time or space,” Crane says. This means that the Perseus Project is able to track how ancient languages shifted and evolved not just across centuries text by text, but from place to place—“what’s going on in Germany versus Italy,” according to Crane.
Rather than generalizing historical grammar, “You want to be able to say, 37 percent of the time in the 17th century and 12 percent of the time in the 18th century, this happens,” Crane says. It’s bringing a technological edge to a field that has been driven by traditional scholarship. “Often what happens is that if you look at the data behind statements in the [academic] literature there’s either a lot of data that’s ignored or there’s very little,” the professor says.
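To make the kind of statement Crane describes concrete, here is a minimal sketch of how such per-century percentages could be computed. It is not Perseus code: the function name, the input format (a year paired with a flag saying whether a given grammatical feature appears), and the sample data are all invented for illustration.

```python
from collections import defaultdict


def feature_rate_by_century(records):
    """Return {century: percent of attestations showing the feature}.

    `records` is a hypothetical input format: an iterable of
    (year, has_feature) pairs, one pair per attested usage.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, has_feature in records:
        # Years 1601-1700 map to the 17th century, 1701-1800 to the 18th.
        century = (year - 1) // 100 + 1
        totals[century] += 1
        if has_feature:
            hits[century] += 1
    return {c: 100.0 * hits[c] / totals[c] for c in sorted(totals)}


# Invented sample data, not drawn from any real corpus.
sample = [
    (1620, True), (1650, False), (1688, True), (1699, False),
    (1710, False), (1745, False), (1770, True), (1790, False),
]
rates = feature_rate_by_century(sample)
for century, pct in rates.items():
    print(f"century {century}: {pct:.0f} percent")
```

The point of the sketch is only that every percentage is backed by an explicit count of attestations, which is what lets a claim in the literature be checked against the data behind it.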
Besides quantifiable scholarly benefits, the cooperative database has also lent a transparency to the rather obscure pursuit of understanding ancient languages. “It challenges us to think about other people, not just specialists—how to articulate many of the tasks that we perform in ways that other people can understand,” Crane says.
The burgeoning field of Digital Humanities provides the larger trend behind Crane’s work with Perseus. In fact, in 2010, Crane was given the Google Digital Humanities award for his ongoing research. It’s not hard to see why the technology company might have an affinity for the project. It’s building a kind of online search tool for antiquity rather than the present.
But where Digital Humanities is often devoted to figuring out how we use new technology to create culture, dissecting the behavior of ordinary users as if they were cells under a microscope, Crane’s work is more about using technology to enliven a cultural arena that we had perhaps thought exhausted long ago.
Perseus isn’t just about breaking down ancient languages; it’s about our shared history, and how we can use the Internet to make that past more collectively accessible than ever before. “As a humanist, you’re really supposed to address human intellectual life,” Crane says. “It’s a chance to re-think the relationship between what we do as professional scholars and what other people are doing.”