What happens in the brain when we learn

Findings could enhance teaching methods and lead to treatments for cognitive problems
October 28, 2015

Isolated cells in the visual cortex of a mouse (credit: Alfredo Kirkwood/JHU)

A Johns Hopkins University-led research team has proven a working theory that explains what happens in the brain when we learn, as described in the current issue of the journal Neuron.

More than a century ago, Pavlov found that dogs fed after hearing a bell eventually began to salivate at the sound of the bell alone. The team investigated how Pavlov’s dogs (in “classical conditioning”) managed to associate a brief stimulus with a delayed reward to create knowledge. For decades, scientists had a working theory of how it happened, but the team is now the first to demonstrate it experimentally.

“If you’re trying to train a dog to sit, the initial neural stimuli, the command, is gone almost instantly — it lasts as long as the word sit,” said neuroscientist Alfredo Kirkwood, a professor with the university’s Zanvyl Krieger Mind/Brain Institute. “Before the reward comes, the dog’s brain has already turned to other things. The mystery was, ‘How does the brain link an action that’s over in a fraction of a second with a reward that doesn’t come until much later?’ ”

Eligibility traces

The working theory, which Kirkwood’s team has now validated experimentally, is that invisible “synaptic eligibility traces” effectively tag the synapses activated by the stimulus so that the learning can be cemented when a reward arrives. The reward is signaled by a neuromodulator* (a neurochemical) that floods the dog’s brain with “good feelings.” Though the brain has long since processed the “sit” command, the eligibility traces in the synapses respond to the neuromodulator, prompting a lasting synaptic change, a.k.a. “learning.”
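For readers who think in code, the tagging idea can be sketched as a toy model. This is a minimal illustration, not the researchers' model: the exponential decay, the function name weight_change, and every numeric value (the 2-second time constant, the learning rate) are assumptions chosen only to show the principle.

```python
import math


def weight_change(delay_s, trace_tau_s=2.0, reward=1.0, learning_rate=0.1):
    """Toy synaptic weight change for a reward arriving delay_s seconds after a stimulus.

    All parameters are hypothetical; none are taken from the study.
    """
    # The stimulus sets the eligibility trace to 1.0; it then fades
    # (assumed here to decay exponentially with time constant trace_tau_s).
    trace = math.exp(-delay_s / trace_tau_s)
    # The trace alone changes nothing. A lasting change happens only when
    # the reward signal arrives while some of the trace is still left.
    return learning_rate * reward * trace


if __name__ == "__main__":
    for delay in (0.0, 1.0, 5.0):
        print(f"reward after {delay:.0f} s -> weight change {weight_change(delay):.4f}")
```

In this sketch, a reward that arrives sooner finds more of the trace remaining and so produces a larger change, while a stimulus with no reward (reward=0.0) produces no change at all.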

The team was able to prove the eligibility-traces theory by isolating cells in the visual cortex of a mouse. When they stimulated the axon of one cell with an electrical impulse, they sparked a response in another cell. By doing this repeatedly, they mimicked the synaptic response between two cells as they process a stimulus and create an eligibility trace.

When the researchers later flooded the cells with neuromodulators, simulating the arrival of a delayed reward, the response between the cells strengthened (“long-term potentiation”) or weakened (“long-term depression”), showing that the cells had “learned” and were able to do so because of the eligibility trace.
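The strengthening-or-weakening outcome can also be sketched as a toy model with two separate traces, one for potentiation and one for depression, in line with the "distinct traces for LTP and LTD" in the paper's title. Everything here is an illustrative assumption: the Synapse class, the boolean flag standing in for the pattern of activity, the time constants, and the gain are hypothetical and are not values from the study.

```python
import math
from dataclasses import dataclass

# Assumed decay time constants in seconds; illustrative only.
LTP_TAU_S = 5.0
LTD_TAU_S = 2.0


@dataclass
class Synapse:
    weight: float = 1.0
    ltp_trace: float = 0.0  # silent tag that can become strengthening
    ltd_trace: float = 0.0  # silent tag that can become weakening


def stimulate(syn, potentiating):
    """Set one trace. Which activity pattern sets which trace is reduced to a flag."""
    if potentiating:
        syn.ltp_trace = 1.0
    else:
        syn.ltd_trace = 1.0


def decay(syn, dt_s):
    """Let both traces fade for dt_s seconds; the weight itself is untouched."""
    syn.ltp_trace *= math.exp(-dt_s / LTP_TAU_S)
    syn.ltd_trace *= math.exp(-dt_s / LTD_TAU_S)


def apply_neuromodulator(syn, converts, gain=0.2):
    """Convert whatever remains of one trace into a lasting weight change."""
    if converts == "ltp":
        syn.weight += gain * syn.ltp_trace
        syn.ltp_trace = 0.0
    elif converts == "ltd":
        syn.weight -= gain * syn.ltd_trace
        syn.ltd_trace = 0.0
    else:
        raise ValueError("converts must be 'ltp' or 'ltd'")


if __name__ == "__main__":
    syn = Synapse()
    stimulate(syn, potentiating=True)
    decay(syn, dt_s=1.0)            # the delay before the "reward"
    apply_neuromodulator(syn, "ltp")
    print(f"weight after delayed reward: {syn.weight:.3f}")
```

The design point of the sketch is that stimulation by itself leaves the weight unchanged, and a neuromodulator has an effect only if the matching trace is still present when it arrives.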

“This is the basis of how we learn things through reward,” Kirkwood said, “a fundamental aspect of learning.”

In addition to a greater understanding of the mechanics of learning, these findings could enhance teaching methods and lead to treatments for cognitive problems, the researchers suggest.

Scientists at the University of Texas at Houston and the University of California, Davis were also involved in the research, which was supported by grants from JHU’s Science of Learning Institute and the National Institutes of Health.

* The neuromodulators tested were norepinephrine, serotonin, dopamine, and acetylcholine, all of which have been implicated in cortical plasticity (ability to grow and form new connections to other neurons).

Abstract of Distinct Eligibility Traces for LTP and LTD in Cortical Synapses

In reward-based learning, synaptic modifications depend on a brief stimulus and a temporally delayed reward, which poses the question of how synaptic activity patterns associate with a delayed reward. A theoretical solution to this so-called distal reward problem has been the notion of activity-generated “synaptic eligibility traces,” silent and transient synaptic tags that can be converted into long-term changes in synaptic strength by reward-linked neuromodulators. Here we report the first experimental demonstration of eligibility traces in cortical synapses. We demonstrate the Hebbian induction of distinct traces for LTP and LTD and their subsequent timing-dependent transformation into lasting changes by specific monoaminergic receptors anchored to postsynaptic proteins. Notably, the temporal properties of these transient traces allow stable learning in a recurrent neural network that accurately predicts the timing of the reward, further validating the induction and transformation of eligibility traces for LTP and LTD as a plausible synaptic substrate for reward-based learning.