GPU-based server farm slashes analysis time for DNA sequencing data batches from days to hours

December 15, 2011

BGI's NVIDIA Tesla GPU-based server farm (credit: BGI)

BGI, the world’s largest genomics institute, has slashed the time to analyze batches of DNA sequencing data from nearly four days to just six hours using an NVIDIA Tesla GPU-based server farm.

The speedup is considered a critical step toward affordably determining the sequence of chemical building blocks that make up a DNA molecule. This is key for the genomics industry to achieve its target of the $1,000 genome, the point at which genomics can be used in clinical diagnostic tests as a practical component of patient care.

“We are drowning in the genome data that our high-throughput sequencing machines create every day,” said Dr. Bingqiang Wang, head of high-performance computing at BGI. “GPU acceleration of our genome analysis applications enables our scientists to crunch through data and gain insights into bacteria, plants and humans faster than was ever possible. It offers the potential for researchers and healthcare professionals to identify highly effective and affordable individualized medicines and treatments.”

BGI researchers and collaborators have developed three genome data analysis applications that are accelerated by NVIDIA Tesla GPUs:

  • SOAP3 aligner: Aligns short reads from the sequencing machine against existing reference genome sequences. Through GPU acceleration, the SOAP3 aligner can find all three-mismatch alignments in tens of seconds per one million reads, instead of tens of minutes without GPU acceleration. This means that sequencing and assembling of individual genomes for comparison to those previously sequenced and studied can be performed quickly to understand potential future disease states and treatments.
  • GSNP (SNP detection): A GPU-accelerated version of the widely used SOAPsnp software, which detects single nucleotide polymorphisms (SNPs), positions in a genome where a single nucleotide varies. These SNP variations can be used to study how individuals develop diseases differently and respond to bacteria, viruses and medicines.
  • GAMA (high-resolution genotyping tool): Finds the distribution of the occurrence or frequency of particular gene variants, such as those associated with eye color or a propensity for prostate cancer, in a set of genes.
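
To make the idea of mismatch-tolerant read alignment concrete, here is a minimal Python sketch. It is not SOAP3's method and does not use a GPU: it simply slides a short read along a reference string and reports every offset where the two differ in at most three positions. The function names, the toy reference and the toy read are all hypothetical, chosen only for illustration.

```python
def hamming_within(a, b, limit):
    """Count mismatches between two equal-length strings.

    Returns the count, or None as soon as it exceeds `limit`.
    """
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > limit:
                return None
    return mismatches


def align_read(reference, read, max_mismatches=3):
    """Brute-force alignment (illustrative only, not SOAP3's algorithm).

    Returns a list of (offset, mismatch_count) for every offset in
    `reference` where `read` matches with at most `max_mismatches`
    substitutions. Insertions and deletions are not considered.
    """
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        count = hamming_within(window, read, max_mismatches)
        if count is not None:
            hits.append((pos, count))
    return hits


# Hypothetical toy data, not taken from any real genome.
reference = "ACGTACGTTAGCCGATACGTAGGCTA"
read = "TAGCCGTT"  # differs from reference[8:16] by one substitution

for offset, count in align_read(reference, read):
    print(offset, count)
```

In this toy case the read is reported at offset 8 with one mismatch, among any other offsets that fall within the three-mismatch limit. The brute-force scan does work proportional to the reference length times the read length for every read, which is why practical aligners rely on indexed references and, as in the work described here, parallel hardware.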

BGI does groundbreaking work in sequencing the genomes of a wide range of life forms, from plants and E. coli to the giant panda, to develop better medicines, improve healthcare and develop genetically enhanced food. BGI’s sequencing output is expected to soon surpass the equivalent of 700,000 human genomes per year, a dramatic increase over initial efforts, which took 13 years to sequence a single genome.

Tesla GPUs are massively parallel accelerators based on the NVIDIA CUDA parallel computing architecture.

See also: NVIDIA opens up CUDA platform, helping accelerate the path to exascale computing