Crowdsourced ‘EteRNA’ RNA designs outperform computer algorithms

Carnegie Mellon and Stanford project combines global online design challenge with lab experiments
January 28, 2014

An RNA design produced by a player of the online EteRNA design game (credit: CMU)

An enthusiastic group of non-experts, working through an online interface and receiving feedback from lab experiments, has produced designs for RNA molecules that are consistently more successful than those generated by the best computerized design algorithms, researchers at Carnegie Mellon University and Stanford University report.

The researchers then gathered some of the best design rules and practices developed by players of the online EteRNA design challenge and, using machine learning, distilled them into their own automated design algorithm, EteRNABot, which also bested prior design algorithms. Though this improved computer design tool is faster than humans, the designs it generates still don’t match the quality of those produced by the online community, which now has more than 130,000 members.

The research is published in the Proceedings of the National Academy of Sciences (open access).


Quality of the RNA designs: “amazing”

“The quality of the designs produced by the online EteRNA community is just amazing and far beyond what any of us anticipated when we began this project three years ago,” said Adrien Treuille, an assistant professor of computer science and robotics at Carnegie Mellon, who leads the project with Rhiju Das, an assistant professor of biochemistry at Stanford, and Jeehyung Lee, a Ph.D. student in computer science at Carnegie Mellon.

“This wouldn’t be possible if EteRNA members were just spitting out designs using online simulation tools,” Treuille continued. “By actually synthesizing the most promising designs in Das’ lab at Stanford, we’re giving our community feedback about what works and doesn’t work in the physical world. And, as a result, these non-experts are providing us insight into RNA design that is significantly advancing the science.”

RNA, or ribonucleic acid, is one of the three macromolecules essential for life, along with DNA and proteins. Long recognized as a messenger for genetic information, RNA may also play a much broader role in regulating cells. Understanding RNA design could be useful for treating or controlling diseases such as HIV, for creating RNA-based sensors, or even for building computers out of RNA.

In the work reported this week, the researchers tested how well the EteRNA community, EteRNABot, and two state-of-the-art RNA design algorithms could generate designs that would cause RNA strands to fold themselves into specific target shapes. The computers could generate a design in less than a minute, while most people took one or two days; synthesizing the molecules to measure how well the designs worked took about a month for each round of designs, so the entire experiment lasted about a year.

In the end, Lee said, the designs produced by humans had a 99 percent likelihood of being superior to those of the prior computer algorithms, while EteRNABot produced designs with a 95 percent likelihood of besting the prior algorithms.
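The article does not say how the 99 percent and 95 percent figures were calculated. One common way to express how likely one group of designs is to beat another is the probability of superiority: pair every score from the first group with every score from the second and count the fraction of pairings the first group wins, with ties counting as half. This equals the Mann-Whitney U statistic divided by the number of pairings. The Python sketch below computes it for made-up scores; the function name and the numbers are hypothetical, and this is an illustration of that statistic, not necessarily the analysis the researchers used.

```python
from itertools import product


def probability_of_superiority(a: list[float], b: list[float]) -> float:
    """Return the fraction of (a, b) pairings in which the a-score is higher.

    Ties count as half a win, which makes the result equal to the
    Mann-Whitney U statistic for group a divided by len(a) * len(b).
    """
    if not a or not b:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for x, y in product(a, b):
        if x > y:
            wins += 1.0
        elif x == y:
            wins += 0.5
    return wins / (len(a) * len(b))


# Hypothetical scores for illustration only; these are not data from the study.
community_scores = [0.9, 0.8]
algorithm_scores = [0.5, 0.85]
print(probability_of_superiority(community_scores, algorithm_scores))
```

With these invented scores the first group wins three of the four pairings, so the function returns 0.75. A value near 0.99 would mean the first group's designs beat the second group's in almost every head-to-head comparison.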

“The quality of the community’s designs is so good that even if you generated thousands of designs with computer algorithms, you’d never find one as good as the community’s,” Lee said.

Players just “recognize patterns”

When the project began, players were asked to design RNA that folded into specific shapes selected by the Das lab. Thanks to technological breakthroughs that now enable Das and his team to synthesize a thousand design sequences each month instead of the original 30, EteRNA has become an open research project to which researchers from labs around the world can submit design challenges.

Though EteRNA players may not be scientifically trained, they nevertheless have instincts that, when bolstered by the lab experiments, can lead to new insights. “Most players didn’t have tactical insights on RNA designs,” Lee said. “They would just recognize patterns — visual patterns.”

“Scientifically, not all of these rules initially seemed to make sense, but people who were following them did better,” he noted.

One design rule discovered by the players involves “capping.” An RNA molecule is a long chain of nucleotides that folds back on itself, and the stretches that pair up form helices, or “stacks.” Usually the easiest way to create a stack that won’t rip itself apart once the molecule is synthesized is to fill it with guanine-cytosine (GC) pairs, which bind more strongly than adenine-uracil (AU) pairs. But too many GC pairs can cause the molecule to fold into unexpected shapes. “It’s like doing origami with a cardboard box,” as one player put it.

Lee said the players found a solution by putting the GC pairs only at the end of the stack — “capping” — and filling the rest of the stack with adenine-uracil pairs.
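To make the capping rule concrete, here is a minimal Python sketch under stated assumptions: the target shape is written in dot-bracket notation (a common text format for RNA secondary structure, in which matching parentheses mark paired bases and dots mark unpaired ones), “the end of the stack” is read as both ends of each helix, and unpaired bases are simply set to A. The function names are hypothetical; this illustrates the player heuristic and is not EteRNA’s or EteRNABot’s actual code.

```python
# Hypothetical sketch of the players' "capping" heuristic. Assumptions: the
# target shape is in dot-bracket notation, "the end of the stack" means both
# ends of each helix, and unpaired bases are set to A.


def pair_table(structure: str) -> dict[int, int]:
    """Map each paired position to its partner by matching parentheses."""
    open_positions: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            j = open_positions.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if open_positions:
        raise ValueError("unmatched '(' in structure")
    return pairs


def helices(pairs: dict[int, int]) -> list[list[tuple[int, int]]]:
    """Group base pairs (i < j) into helices ("stacks") of consecutive pairs."""
    stacks: list[list[tuple[int, int]]] = []
    for i, j in sorted((i, j) for i, j in pairs.items() if i < j):
        # (i, j) extends the current stack if it sits directly inside (i - 1, j + 1).
        if stacks and stacks[-1][-1] == (i - 1, j + 1):
            stacks[-1].append((i, j))
        else:
            stacks.append([(i, j)])
    return stacks


def capped_design(structure: str) -> str:
    """Put G-C pairs at both ends of every helix and A-U pairs in between."""
    sequence = ["A"] * len(structure)
    for stack in helices(pair_table(structure)):
        for k, (i, j) in enumerate(stack):
            is_cap = k == 0 or k == len(stack) - 1
            sequence[i], sequence[j] = ("G", "C") if is_cap else ("A", "U")
    return "".join(sequence)


print(capped_design("((((....))))"))  # GAAGAAAACUUC
```

For a single four-pair hairpin, G-C pairs close both ends of the stem and A-U pairs fill the middle. A real design would still need to be checked with a folding model and, as the EteRNA players learned, in the lab.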

The team is now looking at expanding its design regimen to include three-dimensional designs. They are also developing a template that researchers in other fields can use to turn scientific projects into online challenges.

EteRNA receives financial support from the National Science Foundation, the National Research Foundation of Korea, Google, and the W.M. Keck Foundation.

Abstract of the Proceedings of the National Academy of Sciences paper

Self-assembling RNA molecules present compelling substrates for the rational interrogation and control of living systems. However, imperfect in silico models — even at the secondary structure level — hinder the design of new RNAs that function properly when synthesized. Here, we present a unique and potentially general approach to such empirical problems: the Massive Open Laboratory. The EteRNA project connects 37,000 enthusiasts to RNA design puzzles through an online interface. Uniquely, EteRNA participants not only manipulate simulated molecules but also control a remote experimental pipeline for high-throughput RNA synthesis and structure mapping. We show herein that the EteRNA community leveraged dozens of cycles of continuous wet laboratory feedback to learn strategies for solving in vitro RNA design problems on which automated methods fail. The top strategies — including several previously unrecognized negative design rules — were distilled by machine learning into an algorithm, EteRNABot. Over a rigorous 1-y testing phase, both the EteRNA community and EteRNABot significantly outperformed prior algorithms in a dozen RNA secondary structure design tests, including the creation of dendrimer-like structures and scaffolds for small molecule sensors. These results show that an online community can carry out large-scale experiments, hypothesis generation, and algorithm design to create practical advances in empirical science.