How to store a book in DNA
August 17, 2012

Information density (log10 of bits/mm3) versus current bits actually achieved (credit: George M. Church, Yuan Gao, Sriram Kosuri/Science)
Harvard geneticist George Church’s next book, Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, doesn’t hit the shelves until Oct. 2, but it has already passed an enviable benchmark: 70 billion copies, roughly triple the combined total for the top 100 books of all time.
That’s because Church, the Robert Winthrop Professor of Genetics at Harvard Medical School and a founding core faculty member of the Wyss Institute for Biomedical Engineering at Harvard University, and his team encoded the book in DNA, which they then read and copied.
Biology’s databank, DNA has long tantalized researchers with its potential as a storage medium: fantastically dense, stable, energy efficient and proven to work over a timespan of some 3.5 billion years.
While not the first project to demonstrate the potential of DNA storage, Church’s team married next-generation sequencing technology with a novel strategy to encode 1,000 times the largest amount of data previously stored in DNA.
The researchers used binary code to preserve the text, images and formatting of the book at a density of 5.5 petabits per cubic millimeter (1 petabit is 1 million gigabits). “The information density and scale compare favorably with other experimental storage methods from biology and physics,” said Sri Kosuri, a senior scientist at the Wyss Institute and senior author on the paper. The team also included Yuan Gao, a former Wyss postdoc who is now an associate professor of biomedical engineering at Johns Hopkins University.
And where some experimental media — like quantum holography — require incredibly cold temperatures and tremendous energy, DNA is stable at room temperature. “You can drop it wherever you want, in the desert or your backyard, and it will be there 400,000 years later,” Church said.
Reading and writing in DNA is slower than in other media, however, which makes it better suited for archival storage of massive amounts of data, rather than for quick retrieval or data processing. “Imagine that you had really cheap video recorders everywhere,” Church said. “Just paint walls with video recorders. And for the most part they just record and no one ever goes to them. But if something really good or really bad happens you want to go and scrape the wall and see what you got. So something that’s molecular is so much more energy efficient and compact that you can consider applications that were impossible before.”
About four grams of DNA theoretically could store the digital data humankind creates in one year.
Although other projects have encoded data in the DNA of living bacteria, the Church team used commercial DNA microchips to create standalone DNA. “We purposefully avoided living cells,” Church said. “In an organism, your message is a tiny fraction of the whole cell, so there’s a lot of wasted space. But more importantly, almost as soon as a DNA goes into a cell, if that DNA doesn’t earn its keep, if it isn’t evolutionarily advantageous, the cell will start mutating it, and eventually the cell will completely delete it.”
In another departure, the team rejected “shotgun sequencing,” which reassembles long DNA sequences by identifying overlaps in short strands. Instead, they took their cue from information technology and encoded the book in 96-bit data blocks, each with a 19-bit address to guide reassembly. Including JPEG images and HTML formatting, the code for the book required 54,898 of these data blocks, each a unique DNA sequence. “We wanted to illustrate how the modern world is really full of zeroes and ones, not As through Zs alone,” Kosuri said.
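The addressed-block idea can be illustrated with a short Python sketch. Only the 96-bit block size and the 19-bit address come from the description above; everything else is an assumption made for illustration. In particular, the one-bit-per-base mapping (0 as A or C, 1 as G or T), the rule of alternating between the two candidate bases by position, the zero padding of the last block, and the function names `encode` and `decode` are hypothetical and are not presented as the team's actual procedure.

```python
from typing import List

DATA_BITS = 96     # payload bits per block (from the article)
ADDRESS_BITS = 19  # address bits per block (from the article)

# Assumed mapping, not stated in the article: one bit per base.
ZERO_BASES = "AC"  # either base stands for a 0 bit
ONE_BASES = "GT"   # either base stands for a 1 bit


def bits_to_dna(bits: str) -> str:
    """Map each bit to a base, picking the candidate by position parity
    so that two neighbouring bases are never identical (a simplification)."""
    return "".join(
        (ONE_BASES if bit == "1" else ZERO_BASES)[i % 2]
        for i, bit in enumerate(bits)
    )


def dna_to_bits(dna: str) -> str:
    """Invert the mapping: A or C reads as 0, G or T reads as 1."""
    return "".join("0" if base in ZERO_BASES else "1" for base in dna)


def encode(data: bytes) -> List[str]:
    """Split data into 96-bit blocks and prefix each with a 19-bit address."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % DATA_BITS)  # zero-pad the final block
    n_blocks = len(bits) // DATA_BITS
    if n_blocks > 2 ** ADDRESS_BITS:
        raise ValueError("too many blocks for a 19-bit address")
    strands = []
    for index in range(n_blocks):
        address = f"{index:0{ADDRESS_BITS}b}"
        payload = bits[index * DATA_BITS:(index + 1) * DATA_BITS]
        strands.append(bits_to_dna(address + payload))
    return strands


def decode(strands: List[str], n_bytes: int) -> bytes:
    """Reassemble the data from strands given in any order."""
    blocks = {}
    for strand in strands:
        bits = dna_to_bits(strand)
        blocks[int(bits[:ADDRESS_BITS], 2)] = bits[ADDRESS_BITS:]
    payload = "".join(blocks[address] for address in sorted(blocks))
    payload = payload[:n_bytes * 8]  # drop the zero padding
    return bytes(int(payload[i:i + 8], 2) for i in range(0, len(payload), 8))


if __name__ == "__main__":
    message = b"How to store a book in DNA"
    strands = encode(message)
    # Reading the strands back in reverse order still recovers the message,
    # because each strand carries its own address.
    print(decode(list(reversed(strands)), len(message)))
```

Because every strand carries its own address, the reader does not need to find overlaps between strands, which is the contrast with shotgun sequencing drawn above. As a sanity check on the figures quoted: 19 address bits allow 2^19 = 524,288 distinct blocks, well above the 54,898 used, and 54,898 blocks of 96 bits hold 5,270,208 bits, or roughly 659 kilobytes.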
The team discussed including a DNA copy with each print edition of Regenesis. But in the book, Church and his co-author, the science writer Ed Regis, argue for careful supervision of synthetic biology and the policing of its products and tools. Practicing what they preach, the authors decided against a DNA insert — at least until there has been far more discussion of the safety, security and ethics of using DNA this way. “Maybe the next book,” Church said.
This work was supported by the U.S. Office of Naval Research (N000141010144), Agilent Technologies and the Wyss Institute.
Comments (5)
by Barbara
Very cool that an entire book could be encoded in DNA. Does anyone know what device was used to read and copy the DNA-encoded book? If the authors HAD included a DNA insert with their book, exactly how would anyone read that DNA?
by Phillfrog
Not going to be much use to a future civilisation as they’d have to rediscover this technology in order to decode it. Would be better to have the book as the only full size thing and all other information compressed in this way. Pretty much the inverse. All bad joking aside, this is amazing.
I remember when I was 17 and took a bus to a nearby village to buy a 100 megabyte hard drive for about £100. You can now pick up a 1 terabyte hard drive from PC World for about the same price. Then there’s this tech, which is at least 1 trillion times as dense. That’s not bad progress in 16 years!
by asiwel
Have to laugh. This is hilarious! What a thing to do. 70 billion copies of your synthetic biology book floating around in a test tube. (Well maybe in a chip.) Certainly appears probably to validate most of the scientific points discussed in the book narrative, I’d say.
by arch1
It would be a nice touch if the authors could assure readers that the book in their hands had been printed based entirely on DNA-encoded information (I can’t tell from the summary whether this in fact will be the case).
by Ian Clarke
I think this is pretty mind-blowing! It’s a shame the read/write capabilities are slower than with standard media (it would be interesting to know just how much slower), but who knows, maybe even this shortcoming can be resolved eventually? This has got to have a big impact on our future tech.