Study of protein folds adds to evidence that viruses are alive and ancient

Scientists estimate there are more than a million viral species, but fewer than 4,900 viruses have been identified and sequenced
October 1, 2015

The diverse physical attributes, genome sizes and lifestyles of viruses make them difficult to classify. A new study uses protein folds as evidence that viruses are living entities that belong on their own branch of the tree of life. (credit: Julie McMahon)

Viruses are actually living entities that share a long evolutionary history with cells, researchers report in a study that traces viral evolution back to a time when neither viruses nor cells existed in the forms recognized today.

The new findings appear in an open-access paper in the journal Science Advances.

Some scientists have argued that viruses are nonliving entities, bits of DNA and RNA shed by cellular life. They point to the fact that viruses are not able to replicate outside of host cells, and rely on cells’ protein-building machinery to function. But much evidence supports the idea that viruses are not that different from other living entities, said University of Illinois crop sciences and Carl R. Woese Institute for Genomic Biology professor Gustavo Caetano-Anollés, who led a new analysis with graduate student Arshan Nasir.

“Many organisms require other organisms to live, including bacteria that live inside cells, and fungi that engage in obligate parasitic relationships — they rely on their hosts to complete their lifecycle,” he said. “And this is what viruses do.”

The discovery of the giant mimiviruses in the early 2000s challenged traditional ideas about the nature of viruses, Caetano-Anollés said. “These giant viruses were not the tiny Ebola virus, which has only seven genes. These are massive in size and massive in genomic repertoire,” he said. “Some are as big physically and with genomes that are as big or bigger than bacteria that are parasitic.”

Some giant viruses also have genes for proteins that are essential to translation, the process by which cells read gene sequences to build proteins, Caetano-Anollés said. The lack of translational machinery in viruses was once cited as a justification for classifying them as nonliving, he said.

“This is no more,” Caetano-Anollés said. “Viruses now merit a place in the tree of life. Obviously, there is much more to viruses than we once thought.”

A new virus taxonomy

Caetano-Anollés is the co-author of a report by the International Committee on Taxonomy of Viruses that recognized seven orders of viruses, based on their shapes and sizes, genetic structure and means of reproducing.

“Under this classification, viral families belonging to the same order have likely diverged from a common ancestral virus,” the authors wrote. “However, only 26 (of 104) viral families have been assigned to an order, and the evolutionary relationships of most of them remain unclear.”

Part of the confusion stems from the abundance and diversity of viruses. Fewer than 4,900 viruses have been identified and sequenced so far, even though scientists estimate there are more than a million viral species. Many viruses are tiny, significantly smaller than bacteria or other microbes, and contain only a handful of genes. Others, like the recently discovered mimiviruses, are huge, with genomes bigger than those of some bacteria.

The new study focused on the vast repertoire of protein structures, called “folds,” that are encoded in the genomes of all cells and viruses. Folds are the structural building blocks of proteins, giving them their complex, three-dimensional shapes. By comparing fold structures across different branches of the tree of life, researchers can reconstruct the evolutionary histories of the folds and of the organisms whose genomes code for them.

A new study analyzes the distinct, three-dimensional “fold” structures found in proteins. Pictured here are folds found in viruses. (credit: Arshan Nasir)

The researchers chose to analyze protein folds because the sequences that encode viral genomes are subject to rapid change; their high mutation rates can obscure deep evolutionary signals, Caetano-Anollés said. Protein folds are better markers of ancient events because their three-dimensional structures can be maintained even as the sequences that code for them begin to change.

Shared protein folds between cells and viruses

The researchers analyzed all of the known folds in 5,080 organisms representing every branch of the tree of life, including 3,460 viruses. Using advanced bioinformatics methods, they identified 442 protein folds that are shared between cells and viruses, and 66 that are unique to viruses.

“This tells you that you can build a tree of life, because you’ve found a multitude of features in viruses that have all the properties that cells have,” Caetano-Anollés said. “Viruses also have unique components besides the components that are shared with cells.”

In fact, the analysis revealed genetic sequences in viruses that are unlike anything seen in cells, Caetano-Anollés said. This contradicts the hypothesis that viruses captured all of their genetic material from cells. This and other findings also support the idea that viruses are “creators of novelty,” he said.
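The tally of folds shared between cells and viruses, and of folds found only in viruses, amounts to a set comparison. The sketch below is a toy illustration of that idea, not the researchers' actual pipeline: the function name `partition_folds` and the fold identifiers are hypothetical placeholders, and the article does not describe the bioinformatics methods in enough detail to reproduce them.

```python
# Toy illustration only: fold identifiers are invented, not data from the study.

def partition_folds(cell_folds, virus_folds):
    """Split two fold inventories into shared, virus-only and cell-only sets."""
    shared = cell_folds & virus_folds       # folds present in both groups
    virus_only = virus_folds - cell_folds   # folds seen only in viruses
    cell_only = cell_folds - virus_folds    # folds seen only in cells
    return shared, virus_only, cell_only


# Hypothetical inventories standing in for folds detected in each group.
cell_folds = {"fold_A", "fold_B", "fold_C", "fold_D"}
virus_folds = {"fold_B", "fold_C", "fold_X"}

shared, virus_only, cell_only = partition_folds(cell_folds, virus_folds)

print("shared:", sorted(shared))
print("virus only:", sorted(virus_only))
print("cell only:", sorted(cell_only))
```

With real data, each set would hold the folds detected across all genomes in a group, and the sizes of the first two results would correspond to the kinds of counts the study reports.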

Drawing on protein-fold data available in online databases, Nasir and Caetano-Anollés used computational methods to build trees of life that included viruses.

The data suggest “that viruses originated from multiple ancient cells … and co-existed with the ancestors of modern cells,” the researchers wrote. These ancient cells likely contained segmented RNA genomes, Caetano-Anollés said.

The data also suggest that at some point in their evolutionary history, not long after modern cellular life emerged, most viruses gained the ability to encapsulate themselves in protein coats that protected their genetic payloads, enabling them to spend part of their lifecycle outside of host cells and spread, Caetano-Anollés said. The protein folds that are unique to viruses include those that form these viral “capsids.”

“These capsids became more and more sophisticated with time, allowing viruses to become infectious to cells that had previously resisted them,” Nasir said. “This is the hallmark of parasitism.”


Abstract of A phylogenomic data-driven exploration of viral origins and evolution

The origin of viruses remains mysterious because of their diverse and patchy molecular and functional makeup. Although numerous hypotheses have attempted to explain viral origins, none is backed by substantive data. We take full advantage of the wealth of available protein structural and functional data to explore the evolution of the proteomic makeup of thousands of cells and viruses. Despite the extremely reduced nature of viral proteomes, we established an ancient origin of the “viral supergroup” and the existence of widespread episodes of horizontal transfer of genetic information. Viruses harboring different replicon types and infecting distantly related hosts shared many metabolic and informational protein structural domains of ancient origin that were also widespread in cellular proteomes. Phylogenomic analysis uncovered a universal tree of life and revealed that modern viruses reduced from multiple ancient cells that harbored segmented RNA genomes and coexisted with the ancestors of modern cells. The model for the origin and evolution of viruses and cells is backed by strong genomic and structural evidence and can be reconciled with existing models of viral evolution if one considers viruses to have originated from ancient cells and not from modern counterparts.