Storage system dramatically speeds access to ‘big data’

February 3, 2014

(Credit: Sang-Woo Jun et al.)

MIT researchers have developed a storage system for big-data analytics that can dramatically reduce the time it takes to access information by using a network of flash storage devices.

Currently, information tends to be stored on multiple hard disks on a number of machines across an Ethernet network.

With the new flash-based storage system, data in a large dataset can typically be randomly accessed in microseconds. That’s about 1,000 times faster than the typical four to 12 milliseconds with hard disks, according to Sang-Woo Jun, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT.
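The quoted speedup can be sanity-checked with a quick calculation. The hard-disk figures (4 to 12 ms) come from the article; the 10 µs flash latency below is an assumed order-of-magnitude value, since the article only says "microseconds":

```python
# Rough check of the claimed ~1,000x speedup.
# Disk latencies are from the article; the flash latency is an
# assumed order-of-magnitude value (the article says "microseconds").

hdd_latency_s = (4e-3, 12e-3)   # 4 to 12 milliseconds per random access
flash_latency_s = 10e-6         # assumed ~10 microseconds per random access

speedups = [h / flash_latency_s for h in hdd_latency_s]
print(f"speedup range: {speedups[0]:.0f}x to {speedups[1]:.0f}x")
# -> speedup range: 400x to 1200x, consistent with "about 1,000 times faster"
```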

Flash storage is also nonvolatile, meaning it retains the information it holds even when the computer is switched off.

BlueDBM top level system diagram: multiple storage nodes are connected using high speed serial links, forming an inter-controller network (credit: Sang-Woo Jun et al.)

Blue Database Machine

In the storage system the researchers designed, known as BlueDBM (Blue Database Machine), each flash device is connected to a field-programmable gate array (FPGA) chip to create an individual node. The FPGAs are used to control the flash device; they can also perform processing operations on the data itself to avoid having to move the data.
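The near-data processing idea can be sketched in plain Python. This is a toy software model, not BlueDBM's actual interface: `StorageNode`, `query`, and the predicate are invented names. The point it illustrates is that each node evaluates a filter locally, so only matching records, rather than the whole dataset, ever cross the network to the host:

```python
# Toy model of near-data processing: each storage node filters its own
# contents before anything is sent to the host. All names are illustrative.

class StorageNode:
    def __init__(self, records):
        self.records = records  # stands in for the data on this node's flash

    def query(self, predicate):
        # In BlueDBM the FPGA controller applies the filter in hardware;
        # here we simply run it in software "at the node".
        return [r for r in self.records if predicate(r)]

# Host side: fan the query out to every node and merge only the matches.
nodes = [StorageNode(range(i * 100, (i + 1) * 100)) for i in range(4)]
matches = [r for node in nodes for r in node.query(lambda r: r % 97 == 0)]
print(matches)  # -> [0, 97, 194, 291, 388]
```

Only the five matching records travel to the host; the other 395 stay on the nodes, which is the data movement the FPGA-side processing is meant to avoid.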

What’s more, FPGA chips can be linked together using a high-performance serial network with very low latency, or time delay, allowing information from any of the nodes to be accessed with only a few microseconds of added overhead.

“So if we connect all of our machines using this network, it means any node can access data from any other node with very little performance degradation, [and] it will feel as if the remote data were sitting here locally,” Jun says. Using multiple nodes allows the team to get the same bandwidth and performance from their storage network as far more expensive machines.

The team is building a 16-node prototype network, in which each node will operate at 3 gigabytes per second, with a capacity of 16 to 32 terabytes.
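If each node sustains its stated rate, the prototype's aggregate bandwidth follows directly from the article's figures (this assumes the per-node rates add up without contention, which the article does not state):

```python
# Aggregate throughput implied by the prototype figures, assuming the
# per-node rates sum without contention (an assumption, not a claim
# from the article).
nodes = 16
per_node_gb_per_s = 3
print(nodes * per_node_gb_per_s)  # -> 48 (GB/s across the network)
```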

Simulating all the particles of the universe

Using the new hardware, CSAIL graduate student Ming Liu is also building a database system designed for use in big-data analytics. The system will use the FPGA chips to perform computation on the data as it is accessed by the host computer, to speed up the process of analyzing the information.

“If we’re fast enough, if we add the right number of nodes to give us enough bandwidth, we can analyze high-volume scientific data at around 30 frames per second, allowing us to answer user queries at very low latencies, making the system seem real-time,” he says. “That would give us an interactive database.”

As an example of the type of information the system could be used on, the team has been working with data from a simulation of the universe generated by researchers at the University of Washington. The simulation contains data on all the particles in the universe, across different points in time.

“Scientists need to query this rather enormous dataset to track which particles are interacting with which other particles, but running those kinds of queries is time-consuming,” Jun says. “We hope to provide a real-time interface that scientists can use to look at the information more easily.”
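A minimal sketch of the kind of interaction query described: given particle positions at one timestep, find the pairs that lie within some cutoff distance of each other. The data layout, particle IDs, and cutoff below are invented for illustration and do not reflect the Washington dataset's actual schema:

```python
# Toy "interaction" query: which particles are within a cutoff distance
# of each other at a given timestep? Data and cutoff are invented.
from itertools import combinations
import math

particles = {            # id -> (x, y, z) position at one timestep
    1: (0.0, 0.0, 0.0),
    2: (0.5, 0.0, 0.0),
    3: (5.0, 5.0, 5.0),
}

def interacting_pairs(positions, cutoff):
    """Return all pairs of particle IDs separated by at most `cutoff`."""
    return [
        (a, b)
        for (a, pa), (b, pb) in combinations(positions.items(), 2)
        if math.dist(pa, pb) <= cutoff
    ]

print(interacting_pairs(particles, cutoff=1.0))  # -> [(1, 2)]
```

In a system like BlueDBM, the distance filter would be pushed down to the storage nodes, so only the nearby pairs, not every particle's position, would be returned to the querying scientist.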

The system will be presented in February at the International Symposium on Field-Programmable Gate Arrays in Monterey, Calif.

Abstract of International Symposium on Field-Programmable Gate Arrays paper

For many “Big Data” applications, the limiting factor in performance is often the transportation of large amounts of data from hard disks to where it can be processed, i.e., DRAM. In this paper we examine an architecture for a scalable distributed flash store which aims to overcome this limitation in two ways. First, the architecture provides high-performance, high-capacity, scalable random-access storage. It achieves high throughput by sharing large numbers of flash chips across a low-latency, chip-to-chip backplane network managed by the flash controllers. The additional latency for remote data access via this network is negligible as compared to flash access time. Second, it permits some computation near the data via an FPGA-based programmable flash controller. The controller is located in the datapath between the storage and the host, and provides hardware acceleration for applications without any additional latency. We have constructed a small-scale prototype whose network bandwidth scales directly with the number of nodes, and where the average latency for user software to access the flash store is less than 70 μs, including 3.5 μs of network overhead.