IBM scientists say radical new ‘in-memory’ computing architecture will speed up computers by 200 times

New architecture to enable ultra-dense, low-power, massively-parallel computing systems optimized for AI
October 25, 2017

(Left) Schematic of conventional von Neumann computer architecture, where the memory and computing units are physically separated. To perform a computational operation and to store the result in the same memory location, data is shuttled back and forth between the memory and the processing unit. (Right) An alternative architecture where the computational operation is performed in the same memory location. (credit: IBM Research)

IBM Research announced Tuesday (Oct. 24, 2017) that its scientists have developed the first “in-memory computing” or “computational memory” computer system architecture, which is expected to yield 200x improvements in computer speed and energy efficiency — enabling ultra-dense, low-power, massively parallel computing systems.

Their concept is to use one device (such as phase change memory or PCM*) for both storing and processing information. That design would replace the conventional “von Neumann” computer architecture, used in standard desktop computers, laptops, and cellphones, which splits computation and memory into two different devices. The split requires moving data back and forth between memory and the computing unit, which makes these computers slower and less energy-efficient.

The researchers used PCM devices made from a germanium antimony telluride alloy, which is stacked and sandwiched between two electrodes. When the scientists apply a tiny electric current to the material, they heat it, which alters its state from amorphous (with a disordered atomic arrangement) to crystalline (with an ordered atomic configuration). The IBM researchers have used the crystallization dynamics to perform computation in memory. (credit: IBM Research)

Especially useful in AI applications

The researchers believe this new prototype technology will enable ultra-dense, low-power, and massively parallel computing systems that are especially useful for AI applications. The researchers tested the new architecture using an unsupervised machine-learning algorithm running on one million phase change memory (PCM) devices, successfully finding temporal correlations in unknown data streams.

“This is an important step forward in our research of the physics of AI, which explores new hardware materials, devices and architectures,” says Evangelos Eleftheriou, PhD, an IBM Fellow and co-author of an open-access paper in the peer-reviewed journal Nature Communications. “As the CMOS scaling laws break down because of technological limits, a radical departure from the processor-memory dichotomy is needed to circumvent the limitations of today’s computers.”

“Memory has so far been viewed as a place where we merely store information,” said Abu Sebastian, PhD, exploratory memory and cognitive technologies scientist at IBM Research and lead author of the paper. “But in this work, we conclusively show how we can exploit the physics of these memory devices to also perform a rather high-level computational primitive. The result of the computation is also stored in the memory devices, and in this sense the concept is loosely inspired by how the brain computes.” Sebastian also leads a European Research Council funded project on this topic.

* To demonstrate the technology, the authors chose two time-based examples and compared their results with traditional machine-learning methods such as k-means clustering:

  • Simulated Data: one million binary (0 or 1) random processes organized on a 2D grid based on a 1000 x 1000 pixel, black and white, profile drawing of famed British mathematician Alan Turing. The IBM scientists then made all the pixels blink on and off at the same rate, but the black pixels turned on and off in a weakly correlated manner. This means that when a black pixel blinks, there is a slightly higher probability that another black pixel will also blink. Each random process was assigned to one of a million PCM devices, and a simple learning algorithm was implemented. With each blink, the PCM array learned, and the PCM devices corresponding to the correlated processes went to a high conductance state. In this way, the conductance map of the PCM devices recreates the drawing of Alan Turing.
  • Real-World Data: actual rainfall data, collected over a period of six months from 270 weather stations across the USA at one-hour intervals. For each station, an hour was labelled “1” if it rained during that hour and “0” if it did not. Classical k-means clustering and the in-memory computing approach agreed on the classification of 245 out of the 270 weather stations. In-memory computing classified 12 stations as uncorrelated that had been marked correlated by the k-means clustering approach. Similarly, the in-memory computing approach classified 13 stations as correlated that had been marked uncorrelated by k-means clustering.
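
The simulated-data experiment above can be illustrated with a small software sketch. This is not IBM's implementation: the function name, the parameter values, and the way the weak correlation is generated (correlated processes occasionally copy a shared hidden bit) are assumptions made for illustration, and each PCM device is modeled as a plain accumulator rather than with real crystallization dynamics. The update rule, where every process that is "on" receives an increment proportional to the total number of processes that are on at that moment, is one simple learning rule of the kind the article describes.

```python
import numpy as np

def simulate_correlation_detection(n_processes=400, n_correlated=100,
                                   n_steps=2000, rate=0.1, coupling=0.5, seed=0):
    """Toy model: one accumulator ("conductance") per binary random process.

    All names and default values are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)

    # The first n_correlated processes play the role of the "black pixels".
    is_correlated = np.zeros(n_processes, dtype=bool)
    is_correlated[:n_correlated] = True

    conductance = np.zeros(n_processes)
    for _ in range(n_steps):
        # Hidden shared bit that correlated processes sometimes copy.
        shared = rng.random() < rate
        # Every process blinks independently at the same rate by default.
        x = rng.random(n_processes) < rate
        # With probability `coupling`, a correlated process copies the shared
        # bit instead. Its overall blink rate is still `rate`.
        follow = is_correlated & (rng.random(n_processes) < coupling)
        x[follow] = shared

        # Number of processes that are on at this time step.
        momentum = x.sum()
        # Each process that is on gets an increment proportional to that count
        # (a stand-in for applying a crystallizing pulse to its device).
        conductance[x] += momentum

    return conductance, is_correlated


conductance, is_correlated = simulate_correlation_detection()
print("mean conductance, correlated:  ", conductance[is_correlated].mean())
print("mean conductance, uncorrelated:", conductance[~is_correlated].mean())
```

Because correlated processes tend to be on together, they are on more often when the total count is high, so their accumulators grow faster even though every process blinks at the same rate. Thresholding the final values then separates the two groups, which is the software analogue of reading the drawing out of the conductance map.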


Abstract of Temporal correlation detection using computational phase-change memory

Conventional computers based on the von Neumann architecture perform computation by repeatedly transferring data between their physically separated processing and memory units. As computation becomes increasingly data centric and the scalability limits in terms of performance and power are being reached, alternative computing paradigms with collocated computation and storage are actively being sought. A fascinating such approach is that of computational memory where the physics of nanoscale memory devices are used to perform certain computational tasks within the memory unit in a non-von Neumann manner. We present an experimental demonstration using one million phase change memory devices organized to perform a high-level computational primitive by exploiting the crystallization dynamics. Its result is imprinted in the conductance states of the memory devices. The results of using such a computational memory for processing real-world data sets show that this co-existence of computation and storage at the nanometer scale could enable ultra-dense, low-power, and massively-parallel computing systems.