Scientists discover hidden code in DNA

December 16, 2013

DNA structure (credit: MIT)

University of Washington scientists have discovered a second code hiding within DNA. It contains information that changes how scientists read the instructions contained in DNA and interpret mutations to make sense of health and disease.

Some scientists are skeptical about “hype” regarding the announcement.

Since the genetic code was deciphered in the 1960s, scientists have assumed that it was used exclusively to write information about proteins, said the UW scientists, who were “stunned to discover that genomes use the genetic code to write two separate languages.”

One describes how proteins are made, and the other instructs the cell on how genes are controlled. One language is written on top of the other, which is why the second language remained hidden for so long.

“For over 40 years we have assumed that DNA changes affecting the genetic code solely impact how proteins are made,” said Dr. John Stamatoyannopoulos, UW associate professor of genome sciences and of medicine. “Now we know that this basic assumption about reading the human genome missed half of the picture. These new findings highlight that DNA is an incredibly powerful information storage device, which nature has fully exploited in unexpected ways.”

Dual meanings

The genetic code uses a 64-letter alphabet, in which each “letter” is a three-nucleotide unit called a codon. The UW team discovered that some codons, which they called “duons,” can have two meanings: one related to protein sequence and one related to gene control. These two meanings seem to have evolved in concert with each other. The gene control instructions appear to help stabilize certain beneficial features of proteins and how they are made.
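To make the idea of a dual reading concrete, here is a minimal Python sketch. It first enumerates the 64 possible codons, then reads one short coding sequence two ways: as a series of codons, and as a string that may contain a regulatory binding motif. The sequence, the motif, and the function name `duon_codons` are hypothetical placeholders chosen for illustration; they are not taken from the study and are not meant to represent any real transcription factor site.

```python
from itertools import product

BASES = "ACGT"
# Every three-letter combination of the four bases: 4 ** 3 = 64 codons.
CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def duon_codons(cds, motif):
    """Return indices of codons in `cds` that overlap an occurrence of `motif`.

    `cds` is assumed to start in frame (position 0 is the first base of the
    first codon). `motif` is an exact-match string, which is a simplification:
    real binding sites are usually described by weighted patterns.
    """
    hits = set()
    start = cds.find(motif)
    while start != -1:
        end = start + len(motif)  # exclusive end of the motif match
        # Every codon touched by positions start .. end - 1 counts as dual-use.
        hits.update(range(start // 3, (end - 1) // 3 + 1))
        start = cds.find(motif, start + 1)
    return sorted(hits)


# Hypothetical toy inputs (assumptions, not data from the paper).
cds = "ATGGCTGACTCAGGATAA"
motif = "CTGACT"

codons_in_cds = [cds[i:i + 3] for i in range(0, len(cds), 3)]
dual_use = duon_codons(cds, motif)

print(len(CODONS))                       # number of possible codons
print(codons_in_cds)                     # protein-coding reading
print([codons_in_cds[i] for i in dual_use])  # codons that also sit in the motif
```

In this toy case the same bases contribute both to the codon sequence and to the motif match, which is the sense in which a codon can carry two meanings at once.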

The discovery of duons has major implications for how scientists and physicians interpret a patient’s genome and will open new doors to the diagnosis and treatment of disease.

“The fact that the genetic code can simultaneously write two kinds of information means that many DNA changes that appear to alter protein sequences may actually cause disease by disrupting gene control programs or even both mechanisms simultaneously,” said Stamatoyannopoulos.

Grants from the National Institutes of Health and National Institute of Diabetes and Digestive and Kidney Diseases funded the research. Researchers at Benaroya Research Institute and Twist Bioscience were also involved.

Abstract of Science paper (A. B. Stergachis et al.)

Genomes contain both a genetic code specifying amino acids and a regulatory code specifying transcription factor (TF) recognition sequences. We used genomic deoxyribonuclease I footprinting to map nucleotide resolution TF occupancy across the human exome in 81 diverse cell types. We found that ~15% of human codons are dual-use codons (“duons”) that simultaneously specify both amino acids and TF recognition sites. Duons are highly conserved and have shaped protein evolution, and TF-imposed constraint appears to be a major driver of codon usage bias. Conversely, the regulatory code has been selectively depleted of TFs that recognize stop codons. More than 17% of single-nucleotide variants within duons directly alter TF binding. Pervasive dual encoding of amino acid and regulatory information appears to be a fundamental feature of genome evolution.
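The fraction of dual-use codons reported in the abstract can be thought of as a simple overlap count between codon positions and footprinted intervals. The Python sketch below illustrates that bookkeeping for a single coding sequence. It is not the authors' pipeline: the function name `duon_fraction`, the half-open coordinate convention, and the example intervals are all assumptions made for illustration, and the real analysis worked on genomic coordinates across many cell types.

```python
def duon_fraction(cds_length, footprints):
    """Fraction of codons overlapped by at least one footprint.

    `cds_length` is the coding sequence length in bases, assumed to start in
    frame. `footprints` is a list of (start, end) pairs in 0-based, half-open
    coordinates relative to the start of the coding sequence (an assumed
    convention for this sketch).
    """
    n_codons = cds_length // 3
    if n_codons == 0:
        return 0.0
    covered = set()
    for start, end in footprints:
        # Clip each footprint to the complete codons of the coding sequence.
        start = max(start, 0)
        end = min(end, n_codons * 3)
        if start >= end:
            continue
        covered.update(range(start // 3, (end - 1) // 3 + 1))
    return len(covered) / n_codons


# Hypothetical example: a 60-base coding sequence (20 codons) with two
# made-up footprints. The first touches codons 1 to 3, the second codon 10.
fraction = duon_fraction(60, [(4, 10), (30, 33)])
print(f"{fraction:.0%} of codons overlap a footprint")
```

A codon is counted once even if several footprints cover it, so overlapping footprints do not inflate the fraction.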

Abstract of Science paper (Robert J. Weatheritt, M. Madan Babu)

Despite redundancy in the genetic code (1), the choice of codons used is highly biased in some proteins, suggesting that additional constraints operate in certain protein-coding regions of the genome. This suggests that the preference for particular codons, and therefore amino acids in specific regions of the protein, is often determined by factors unrelated to protein structure or function (2, 3). On page 1367 in this issue, Stergachis et al. (4) reveal that transcription factors bind within protein-coding regions (in addition to nearby noncoding regions) in a large number of human genes. Thus, a transcription factor “binding code” may influence codon choice and, consequently, protein evolution. This “binding” code joins other “regulatory” codes that govern chromatin organization (3), enhancers (5, 6), mRNA structure (7), mRNA splicing (3), microRNA target sites (6, 8), translational efficiency (9), and cotranslational folding (10), all of which have been proposed to constrain codon choice, and thus protein evolution (see the figure).