How to store data error-free for millions of years

February 16, 2015

ETH researchers have found a way to store information error-free in the form of DNA, potentially preserving it for millions of years: they encapsulate the information-bearing segments of DNA in silica (glass) and use an error-correcting information-encoding scheme.

Scrolls thousands of years old provide us with a glimpse into long-forgotten cultures and the knowledge of our ancestors. In this digital era, in contrast, a large part of our knowledge is located on servers and hard drives, which may not survive 50 years, let alone thousands of years. So researchers are searching for new ways to store large volumes of data over the long term.

Recently, 300,000-year-old mitochondrial DNA from bears and humans has been sequenced. DNA has also been utilized as a coding language, for applications in forensics, product tagging, and DNA computing, the researchers note.

Two years ago, researchers demonstrated that data could be compactly saved and reread in the form of DNA. The time between “writing” the information — synthesis of the corresponding coding sequence of the DNA — and reading (sequencing) the data was very short.

Synthetic fossils

Over the longer term, DNA can change significantly as it reacts chemically with the environment, which presents an obstacle to long-term storage. However, genetic material found in fossilized bones several hundred thousand years old can be isolated and analyzed, since it has been encapsulated and protected.

“Similar to these bones, we wanted to protect the information-bearing DNA with a synthetic ‘fossil’ shell,” explains Robert Grass, a lecturer at ETH Zurich’s Department of Chemistry and Applied Biosciences.

Digital information is encoded to DNA and encapsulated within silica spheres. Upon release of the DNA from the spheres by fluoride chemistry, the DNA is read by Illumina sequencing and decoded to recover the original information, even if errors were introduced during the procedures. (credit: Robert N. Grass et al./ANIE)

To do that, his team encapsulated the DNA in silica spheres with a diameter of about 150 nanometers. The researchers encoded Switzerland’s Federal Charter of 1291 and The Methods of Mechanical Theorems by Archimedes in the DNA.
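As a rough illustration of what "encoding a text in DNA" can mean, the sketch below maps every two bits of a byte to one of the four bases. This simple mapping is an assumption made for illustration only; the article does not describe the researchers' actual mapping, and the function names `bytes_to_dna` and `dna_to_bytes` are hypothetical.

```python
# Illustrative 2-bits-per-base mapping (an assumption, not the researchers' scheme).
BASES = "ACGT"  # A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna; the strand length must be a multiple of four."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


message = "Federal Charter of 1291".encode("utf-8")
strand = bytes_to_dna(message)
assert dna_to_bytes(strand) == message  # round trip recovers the text
```

A real scheme would also have to split the data into short, indexed segments and add redundancy, since sequencing and storage introduce errors.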

Simulating data degradation

To simulate the degradation of the information-bearing DNA over a long period of time, researchers stored it at a temperature of between 60 and 70 degrees Celsius for up to a month. Such high temperatures replicate the chemical degradation that takes place over hundreds of years within a few weeks.
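The link between a few weeks of heat and centuries of storage can be sketched with the Arrhenius relation, under the assumption that DNA decay follows it. The paper's abstract states that one week at 70 °C is thermally equivalent to 2000 years in central Europe; the 10 °C mean temperature used below for "central Europe" is an assumed value, not a figure from the article, and the function names are hypothetical.

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)


def acceleration_factor(ea_j_per_mol, t_storage_c, t_aging_c):
    """Arrhenius ratio of decay rates: rate at aging temp / rate at storage temp."""
    t_storage = t_storage_c + 273.15
    t_aging = t_aging_c + 273.15
    return math.exp(ea_j_per_mol / R * (1 / t_storage - 1 / t_aging))


def implied_activation_energy(factor, t_storage_c, t_aging_c):
    """Activation energy (J/mol) that would produce the given acceleration factor."""
    t_storage = t_storage_c + 273.15
    t_aging = t_aging_c + 273.15
    return R * math.log(factor) / (1 / t_storage - 1 / t_aging)


weeks_per_year = 365.25 / 7
factor = 2000 * weeks_per_year  # 2000 years versus one week, from the abstract

# ASSUMPTION: 10 degrees Celsius as the mean storage temperature in central Europe.
ea = implied_activation_energy(factor, 10.0, 70.0)
print(f"implied activation energy: {ea / 1000:.0f} kJ/mol")
```

Under these assumptions the implied activation energy comes out at roughly 155 kJ/mol; a different assumed storage temperature would shift that value.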

That allowed the researchers to compare DNA storage in a sheath of silica glass with other common storage methods: on impregnated filter paper and in a biopolymer. The DNA encapsulated in the glass shell turned out to be particularly robust. By using a fluoride solution, it could be easily separated from the silica glass, and the information read from it.

Since encapsulation in silica is roughly comparable to encapsulation in bones, the researchers could draw on prehistoric information about the long-term stability of encapsulated DNA and from this calculate a prognosis.

When stored at low temperatures, such as those found in the Svalbard Global Seed Vault (minus 18 degrees Celsius), DNA-encoded information can survive more than a million years, the researchers suggest. In contrast, data projected on microfilm can be preserved only for an estimated 500 years.

Retrieving lost data

Recovering the original information from the DNA (credit: Robert N. Grass et al./Angew.Chem.Int.)

But it’s not enough to simply store the information over long periods of time without substantial damage; it must also be possible to read the data back without errors. Thanks to significant technological advancements in DNA sequencing, reading stored data is now affordable and will become even more cost-effective in the future. These technologies, however, are not error-free.

To deal with this, Reinhard Heckel from ETH Zurich’s Communication Technology Laboratory developed an error-correction scheme based on Reed-Solomon codes, similar to those used in the transmission of data over long distances, for example in radio communication with spacecraft.

The key is redundant information attached to the actual data, explains Heckel. “In order to define a parabola, you basically need only three points. We added a further two in case one gets lost or is shifted.” The DNA-encoded data is indeed more complex, but in principle the researchers’ redundant ‘back-up’ functions in the same manner. Even when stored in adverse conditions, the information saved for the test (Switzerland’s Federal Charter and Archimedes’ text) could be retrieved error-free.
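Heckel's parabola analogy can be made concrete with a small sketch. The three "data" values below are hypothetical coefficients of a parabola; the parabola is sampled at five points, so any three surviving points restore the other two, and a single shifted point can be detected and repaired. This is a toy over ordinary fractions, not the finite-field Reed-Solomon code the researchers used, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def evaluate(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x using Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result


def interpolate(points, x):
    """Lagrange interpolation: value at x of the polynomial through `points`."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


def repair(points):
    """Correct at most one wrong y-value by finding the parabola that
    agrees with all but (at most) one of the given points."""
    for trio in combinations(points, 3):
        fitted = [(x, interpolate(trio, x)) for x, _ in points]
        agree = sum(f == p for f, p in zip(fitted, points))
        if agree >= len(points) - 1:
            return fitted
    raise ValueError("more than one point is corrupted")


data = [7, -3, 2]  # hypothetical message: y = 7 - 3x + 2x^2
codeword = [(x, evaluate(data, x)) for x in range(5)]  # 3 needed + 2 redundant

# Lost points: any three survivors reproduce all five.
for survivors in combinations(codeword, 3):
    assert all(interpolate(survivors, x) == y for x, y in codeword)

# Shifted point: one corrupted value is found and corrected.
damaged = list(codeword)
damaged[2] = (damaged[2][0], damaged[2][1] + 5)
assert repair(damaged) == codeword
```

With two redundant points, the toy code tolerates two lost points or one shifted point, which matches the trade-off Heckel describes.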

Their data further predict that digital information could be stored encapsulated in silica at the Global Seed Vault (at –18 °C) for more than 2 million years.

What kind of information would Grass save for millions of years? The documents in Unesco’s Memory of the World Programme, he says, and Wikipedia: “Many entries are described in detail, others less so. This probably provides a good overview of what our society knows, what occupies it and to what extent.”


Abstract of Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes

Information, such as text printed on paper or images projected onto microfilm, can survive for over 500 years. However, the storage of digital information for time frames exceeding 50 years is challenging. Here we show that digital information can be stored on DNA and recovered without errors for considerably longer time frames. To allow for the perfect recovery of the information, we encapsulate the DNA in an inorganic matrix, and employ error-correcting codes to correct storage-related errors. Specifically, we translated 83 kB of information to 4991 DNA segments, each 158 nucleotides long, which were encapsulated in silica. Accelerated aging experiments were performed to measure DNA decay kinetics, which show that data can be archived on DNA for millennia under a wide range of conditions. The original information could be recovered error free, even after treating the DNA in silica at 70 °C for one week. This is thermally equivalent to storing information on DNA in central Europe for 2000 years.
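The figures in the abstract allow a quick back-of-the-envelope estimate of the net information density. The sketch below assumes that 1 kB means 1000 bytes and that all 158 nucleotides of each segment count toward the total; neither detail is stated in the article.

```python
import math

segments = 4991                # from the abstract
nucleotides_per_segment = 158  # from the abstract
payload_bytes = 83 * 1000      # ASSUMPTION: 1 kB = 1000 bytes

total_nucleotides = segments * nucleotides_per_segment
net_bits_per_nucleotide = payload_bytes * 8 / total_nucleotides

# Four possible bases give an upper bound of log2(4) = 2 bits per nucleotide.
raw_limit = math.log2(4)
overhead_fraction = 1 - net_bits_per_nucleotide / raw_limit

print(f"total nucleotides: {total_nucleotides}")
print(f"net density: {net_bits_per_nucleotide:.2f} bits per nucleotide")
print(f"share not carrying payload: {overhead_fraction:.0%}")
```

Under these assumptions the net density is well below the 2-bit upper bound. The gap corresponds to whatever does not carry payload, such as indexing, error-correction redundancy, or fixed flanking sequences; the article does not break this down.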