How to store the world’s data on DNA
January 24, 2013

Storage cost for DNA vs. tape (credit: Nick Goldman et al./Nature)
Researchers at the EMBL-European Bioinformatics Institute (EMBL-EBI) have created a way to store data in the form of DNA, a material that can survive for tens of thousands of years.
The new method, published in the journal Nature, makes it possible to store at least 100 million hours of high-definition video in about a cup of DNA.
There is a lot of digital information in the world — about three zettabytes’ worth (that’s 3000 billion billion bytes) — and the constant influx of new digital content poses a real challenge for archivists.
Hard disks are expensive and require a constant supply of electricity, while even the best “no-power” archiving materials such as magnetic tape degrade within a decade. This is a growing problem in the life sciences, where massive volumes of data — including DNA sequences — make up the fabric of the scientific record.
“We already know that DNA is a robust way to store information because we can extract it from woolly mammoth bones, which date back tens of thousands of years, and make sense of it,” explains Nick Goldman of EMBL-EBI. “It’s also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy.”
How to write DNA

Digital information encoding in DNA (credit: Nick Goldman et al./Nature)
Reading DNA is fairly straightforward, but writing it has until now been a major hurdle to making DNA storage a reality. There are two challenges. First, current methods can only manufacture DNA in short strings. Second, both writing and reading DNA are prone to errors, particularly when the same DNA letter is repeated.
Goldman and co-author Ewan Birney, Associate Director of EMBL-EBI, set out to create a code that overcomes both problems.
“We knew we needed to make a code using only short strings of DNA, and to do it in such a way that creating a run of the same letter would be impossible. So we figured, let’s break up the code into lots of overlapping fragments going in both directions, with indexing information showing where each fragment belongs in the overall code, and make a coding scheme that doesn’t allow repeats. That way, you would have to have the same error on four different fragments for it to fail — and that would be very rare.”
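To make the “no repeats” idea concrete, here is a minimal Python sketch, not the researchers’ actual code. It assumes a simplified pipeline: each byte is written as six base-3 digits (trits), and each trit then picks one of the three letters that differ from the letter before it, so the same letter can never appear twice in a row. The published scheme used a base-3 Huffman code rather than this fixed six-digit conversion; the function names and the starting letter “A” are hypothetical choices for illustration.

```python
# Hypothetical sketch of a "no repeated letters" DNA code (simplified; not the published scheme).
NUCLEOTIDES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so six base-3 digits can hold any byte value


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits (trits), most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Inverse of bytes_to_trits: regroup every six trits into one byte."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for t in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def trits_to_dna(trits: list[int], previous: str = "A") -> str:
    """Each trit picks one of the three letters that differ from the previous
    letter, so no letter is ever repeated. 'previous' is an assumed starting
    context that encoder and decoder must share."""
    bases = []
    for t in trits:
        previous = NUCLEOTIDES[(NUCLEOTIDES.index(previous) + 1 + t) % 4]
        bases.append(previous)
    return "".join(bases)


def dna_to_trits(dna: str, previous: str = "A") -> list[int]:
    """Inverse of trits_to_dna; a repeated letter can only come from an error."""
    trits = []
    for base in dna:
        step = (NUCLEOTIDES.index(base) - NUCLEOTIDES.index(previous)) % 4
        if step == 0:
            raise ValueError("repeated letter: not a valid encoding")
        trits.append(step - 1)
        previous = base
    return trits


if __name__ == "__main__":
    message = b"I Have a Dream"
    dna = trits_to_dna(bytes_to_trits(message))
    assert all(a != b for a, b in zip(dna, dna[1:]))  # no letter repeats
    assert trits_to_bytes(dna_to_trits(dna)) == message
```

Because a valid sequence changes letter at every step, the decoder can reject any pair of identical neighbouring letters outright, turning the error-prone repeated-letter case into one that is easy to detect.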
The new method requires synthesizing DNA from the encoded information. Enter Agilent Technologies, Inc., a California-based company that volunteered its services. Birney and Goldman sent Agilent encoded versions of five files: an .mp3 of Martin Luther King’s “I Have a Dream” speech, a .jpg photo of EMBL-EBI, a .pdf of Watson and Crick’s seminal paper “Molecular structure of nucleic acids,” a .txt file of all of Shakespeare’s sonnets, and a file that describes the encoding.
“We downloaded the files from the Web and used them to synthesize hundreds of thousands of pieces of DNA — the result looks like a tiny piece of dust,” explains Emily Leproust of Agilent. Agilent mailed the sample to EMBL-EBI, where the researchers were able to sequence the DNA and decode the files without errors.
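The overlapping fragments described earlier are what let a decoder shrug off individual errors. The sketch below is a hypothetical illustration, not the published pipeline: it cuts a sequence into fragments of 100 letters that start every 25 letters, so every letter after the first 75 is carried by four fragments, and it reverse-complements every other fragment to mimic fragments “going in both directions.” Each fragment is tagged with its start position as a stand-in for the real indexing information, and its orientation is inferred from that index; the sizes, the tagging and the orientation rule are all assumptions.

```python
# Hypothetical sketch of overlapping, bidirectional fragments with majority-vote decoding.
from collections import Counter

FRAGMENT_LENGTH = 100  # assumed fragment size, in letters
STEP = 25              # assumed offset between fragment starts: 100 / 25 = 4 overlapping copies
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Read the opposite DNA strand: complement each letter, then reverse."""
    return seq.translate(COMPLEMENT)[::-1]


def make_fragments(seq: str) -> list[tuple[int, str]]:
    """Cut seq into overlapping fragments tagged with their start index.
    Every other fragment is stored reverse-complemented ("both directions")."""
    fragments = []
    for n, start in enumerate(range(0, len(seq), STEP)):
        piece = seq[start:start + FRAGMENT_LENGTH]
        if n % 2 == 1:
            piece = reverse_complement(piece)
        fragments.append((start, piece))
    return fragments


def reassemble(fragments: list[tuple[int, str]], length: int) -> str:
    """Rebuild the sequence by majority vote over every fragment covering each
    position. Orientation is inferred from the index (odd-numbered = flipped)."""
    votes = [Counter() for _ in range(length)]
    for start, piece in fragments:
        if (start // STEP) % 2 == 1:
            piece = reverse_complement(piece)
        for offset, base in enumerate(piece):
            votes[start + offset][base] += 1
    return "".join(v.most_common(1)[0][0] for v in votes)


if __name__ == "__main__":
    original = "ACGT" * 50  # 200 letters
    fragments = make_fragments(original)
    start, piece = fragments[3]  # a reverse-complemented fragment
    wrong = "A" if piece[10] != "A" else "C"
    fragments[3] = (start, piece[:10] + wrong + piece[11:])  # inject one error
    assert reassemble(fragments, len(original)) == original
```

A single wrong letter in one of four overlapping copies is outvoted three to one; only the same error repeated across fragments would survive, which is the rare case the researchers were counting on.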
“We’ve created a code that’s error-tolerant using a molecular form we know will last in the right conditions for 10,000 years, or possibly longer,” says Goldman. “As long as someone knows what the code is, you will be able to read it back if you have a machine that can read DNA.”
Although many practical problems remain to be solved, the inherent density and longevity of DNA make it an attractive storage medium. The researchers’ next step is to perfect the coding scheme and work through those practical issues, paving the way for a commercially viable DNA storage model.
Comments (12)
by godot
Isn’t it clear that human beings are just at the edge of being obsolete as information processing elements? How can you even dream that something as limited as a human brain will be used to review, watch, or otherwise assimilate data in 10,000 years???
by infomagician
Because other things do not desire to do so.
by Cybernettr
Isn’t it funny, with all this scientific and technological progress, that the way nature does it is still the best?
by Dr.Pratt
As an aside, I wonder if another race could have stored information in our DNA 100K years ago and is just now retrieving it. That would make us “containers” of information we know not of.
by Mike
Decode using the Golden Ratio or base 108 to ancient Sanskrit and get the plans for a working Vimana… let’s see the comments now! lol
by Whittaker
Read
http://en.wikipedia.org/wiki/The_Chase_(Star_Trek:_The_Next_Generation)
We would need more than one species to do this though.
by Mishka
You nailed it, pal. But it’s not only that Earth’s structures (any of them, not only DNA) serve as data; Earth’s total dynamics (interactions, transformations) are the calculations. We can look at the totality of the “Gaia process” as a superset analogous to the genotypical and phenotypical expressions.
by Mishka
BTW, Douglas Adams mentioned it in his Hitchhiker’s Guide to the Galaxy long before scientists thought of using it as a biological “microfilm” :-)
by high carbfoods
A FREE database for the whole world: that is a monumental achievement. Second, storing info on DNA-like material for thousands of years for archival purposes presents even more extraordinary challenges. Plus, how are we in the future going to make sense of this ZETTABYTE database? NOW, I am having trouble watching 300 channels!
by Marcos Marin
“100 million hours of high-definition video”
SI for dummies.
by Amit
Heh heh!
by Bob
I understood it perfectly (doh!).