When the FedEx package arrived from California, its contents were so small as to be invisible. One of the British scientists, a mathematician, worried they’d been sent empty test tubes. His colleague, a molecular biologist, corrected him, turning a vial upside down, tilting it in the light. There, in the bottom, like a thimbleful of dust, were the data: tiny strands of DNA.
Stored in their paired nucleotides were five files, 739 kilobytes of information: a color photograph of the European Bioinformatics Institute, where the researchers worked; an MP3 excerpt of Dr. King’s “I Have a Dream” speech; the complete text of Shakespeare’s 154 sonnets; a PDF of Watson and Crick’s seminal paper describing the double-helix; and a copy of the cipher used to encode the data.
Now, the biologists had to stitch the photo, the speech, and the sonnets back together again. “We sequenced the DNA, we read the sequences, we decoded the information in them,” said Nick Goldman, a geneticist at the U.K.-based EBI. “And indeed it had worked!”
Though it took two weeks and thousands of dollars to sequence all the base pairs, Goldman’s biological “hard drive” revealed its contents almost flawlessly. The strands didn’t hold much data (739 kilobytes is roughly a thousandth of what an everyday CD-ROM holds), but the experiment proved that the technique worked. And it could scale massively.
“DNA is a very, very dense piece of information storage,” said Ewan Birney, a member of the EBI team. “It’s very light. It’s very small. The coding scheme that we used would work to a zettabyte level” (a trillion gigabytes, or half the total data being stored by all the world’s companies today).
Goldman and Birney dreamed up the helix-as-hard-drive idea, which appears this week in Nature, over a pint in a Hamburg pub. They were discussing how to store the data their institute collects on something better than magnetic tapes (which degrade) and hard disks (which hog electricity and require constant upkeep). “This is a very real problem at the EBI, because the databases we have to look after are growing exponentially but, sadly, our budgets aren’t,” Goldman said.
Nature, they realized, worked out a simple, compact way of storing information more than three billion years ago. “So over a second beer, we started to write on napkins and sketch out some details of how that might be made to work.”
Goldman and Birney used a cipher to convert bits (0s and 1s) into nucleotides (Gs, Cs, As, and Ts), with redundancies to minimize the risk of error during synthesis and sequencing. Agilent Technologies, in Santa Clara, California, created the strands and mailed them in a plain cardboard box back to Europe. The biologists discarded the strands with transcription errors and reconstructed the original files. Every sonnet was still there.
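To make the idea of such a cipher concrete, here is a minimal sketch in Python. It is a simplified illustration, not the team's actual scheme: it assumes each byte is written as six base-3 digits, and that each digit picks one of the three nucleotides that differ from the previous one, so the same letter never appears twice in a row (repeated letters are a common source of sequencing mistakes). The function names are hypothetical, and the redundancy described above is left out.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729, enough to cover all 256 byte values


def byte_to_trits(value: int) -> list[int]:
    """Write one byte as six base-3 digits, most significant first."""
    trits = []
    for _ in range(TRITS_PER_BYTE):
        trits.append(value % 3)
        value //= 3
    return trits[::-1]


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string with no two identical letters in a row."""
    prev = "A"  # assumed starting point; not part of the output
    out = []
    for byte in data:
        for trit in byte_to_trits(byte):
            # Only the three nucleotides that differ from the previous one
            # are allowed; the base-3 digit selects among them.
            choices = [n for n in BASES if n != prev]
            prev = choices[trit]
            out.append(prev)
    return "".join(out)


def decode(dna: str) -> bytes:
    """Invert encode(): recover the base-3 digits, then the bytes."""
    prev = "A"
    trits = []
    for nucleotide in dna:
        choices = [n for n in BASES if n != prev]
        trits.append(choices.index(nucleotide))
        prev = nucleotide
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    message = b"Shall I compare thee to a summer's day?"
    strand = encode(message)
    print(strand)
    print(decode(strand) == message)
```

In this sketch every byte costs six nucleotides. A real system would also split the long string into short, overlapping, indexed fragments so that a damaged strand can be thrown away and its contents recovered from its neighbors.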
Birney admits that the technology is still “breathtakingly expensive.” (Agilent donated its services, which alone were worth tens of thousands of dollars.) But the costs are entirely up-front. “One of the great properties of DNA is that you don’t need any electricity to store it,” he said. “If you keep it cold, dry, and dark, DNA lasts for a very long time. And we know that because we routinely sequence woolly mammoths’ DNA.”
While the Nature trial was more proof-of-concept than proof-of-marketability, in the paper, Birney and Goldman explore the “switchover point” at which synthesizing DNA makes more economic sense than storing reams of magnetic tape or running a server farm.
At current costs, that switchover point is 600 to 5,000 years away. (Birney jokes that, if only early molecular biologists had encoded the Library of Alexandria and buried the strands in a cave in Finland, none of that ancient knowledge would now be lost.) To reach a more realistic switchover point of, say, 50 years, the cost of DNA synthesis would need to come down by a factor of 100.
Birney notes that we’ve seen such a reduction in the last decade alone. “Anything that you want to store, we could store. Really the only limit for this is the expense of doing the synthesis, and we believe that will come down in the future.” Given that most of us can remember saving files, not too long ago, on 3.5-inch “floppy” disks, that future’s not difficult to imagine.
For now, “the cloud” is the future of data storage. Can “the helix” be far behind?