New open-source encyclopedia catalogs the human genome

April 20, 2011

ENCODE is a massive database cataloging many of the functional elements of the entire collection of human genes (credit: National Institutes of Health)

ENCODE (Encyclopedia of DNA Elements), a massive database cataloging the human genome’s functional elements, including genes, RNA transcripts, and other products, has been created by an international team of researchers, with principal investigators at Penn State University and HudsonAlpha Institute for Biotechnology.

ENCODE is being made available as an open resource to the scientific community, classrooms, science writers, and the public. It provides an overview of the team’s ongoing efforts to interpret the human genome sequence, as well as a user’s guide for accessing the vast amounts of data and resources produced so far by the project.

ENCODE comes on the heels of the now-complete Human Genome Project, the 13-year effort to identify all of the approximately 20,000 to 25,000 genes in human DNA. That project also used open-source data sharing to further scientific discovery and public understanding of science.

Scientists with the ENCODE Project are applying up to 20 different tests in 108 commonly used cell lines to compile the data. The ENCODE User’s Guide also explains how to apply the data to interpret the human genome.

The ENCODE project adds data such as where RNA is produced from our DNA, where proteins bind to DNA, and where parts of our DNA are augmented by additional chemical markers. These proteins and chemical additions are keys to understanding how different cells within our bodies are interpreting the language of DNA.

The researchers say that the ENCODE data can be immediately useful in interpreting associations between disease and DNA sequences that can vary from person to person (single nucleotide polymorphisms). For example, scientists know that DNA variants located upstream of a gene called MYC are associated with multiple cancers, but until recently the mechanism behind this association was a mystery.
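As a rough illustration of that kind of interpretation, the sketch below checks whether a variant's position falls inside an annotated functional element. It is a minimal sketch only: the element list, chromosome labels, coordinates, and SNP names are made-up placeholders, not ENCODE data, and the function name is hypothetical rather than part of any ENCODE tool.

```python
from typing import List, NamedTuple


class Element(NamedTuple):
    """A hypothetical annotated region (half-open interval, 0-based)."""
    chrom: str
    start: int  # inclusive
    end: int    # exclusive
    kind: str   # e.g. "protein binding site"


class Snp(NamedTuple):
    """A hypothetical single nucleotide polymorphism."""
    name: str
    chrom: str
    pos: int  # 0-based position


def overlapping_elements(snp: Snp, elements: List[Element]) -> List[Element]:
    """Return every element on the SNP's chromosome that contains its position."""
    return [
        e for e in elements
        if e.chrom == snp.chrom and e.start <= snp.pos < e.end
    ]


# Made-up annotations and variants, for illustration only.
elements = [
    Element("chrA", 100, 200, "protein binding site"),
    Element("chrA", 500, 650, "RNA transcript"),
    Element("chrB", 100, 200, "chemical modification"),
]
snps = [
    Snp("snp_1", "chrA", 150),
    Snp("snp_2", "chrA", 300),
    Snp("snp_3", "chrB", 199),
]

for snp in snps:
    hits = overlapping_elements(snp, elements)
    labels = ", ".join(e.kind for e in hits) or "no annotated element"
    print(f"{snp.name}: {labels}")
```

A linear scan like this is fine for a handful of regions; genome-scale annotation sets would normally call for an interval index instead.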

“Very few proteins and other DNA products differ in any fundamental way between humans and chimps,” says principal investigator Ross Hardison. “The important difference between us and our close cousins lies in gene expression — the basic level at which genes give rise to traits such as eye color, height, and susceptibility to a particular disease. ENCODE is helping to map the very proteins involved in gene regulation and gene expression. The User’s Guide not only explains how to find the data, but also explains how to apply the data to interpret the human genome.”

Ref.: Peter B. Becker, Academic Editor, A User’s Guide to the Encyclopedia of DNA Elements (ENCODE), PLoS Biology (open access), April 19, 2011