Thursday, December 13, 2007

Basal Ganglia: action selection, error prediction and reinforcement learning

The December edition of the Dana Foundation's online brain journal, Cerebrum, has a very informative and interesting piece on the role of the basal ganglia in response selection, error prediction and reinforcement learning.

The article contains a primer on basic basal ganglia functions and pathways.

The basal ganglia are a collection of interconnected areas deep below the cerebral cortex. They receive information from the frontal cortex about behavior that is being planned for a particular situation. In turn, the basal ganglia affect activity in the frontal cortex through a series of neural projections that ultimately go back up to the same cortical areas from which they received the initial input. This circuit enables the basal ganglia to transform and amplify the pattern of neural firing in the frontal cortex that is associated with adaptive, or appropriate, behaviors, while suppressing those that are less adaptive. The neurotransmitter dopamine plays a critical role in the basal ganglia in determining, as a result of experience, which plans are adaptive and which are not.

Evidence from several lines of research supports this understanding of the role of basal ganglia and dopamine as major players in learning and selecting adaptive behaviors. In rats, the more a behavior is ingrained, the more its neural representations in the basal ganglia are strengthened and honed. Rats depleted of basal ganglia dopamine show profound deficits in acquiring new behaviors that lead to a reward. Experiments pioneered by Wolfram Schultz, M.D., Ph.D., at the University of Cambridge have shown that dopamine neurons fire in bursts when a monkey receives an unexpected juice reward. Conversely, when an expected reward is not delivered, these dopamine cells actually cease firing altogether, that is, their firing rates “dip” below what is normal. These dopamine bursts and dips are thought to drive changes in the strength of synaptic connections—the neural mechanism for learning—in the basal ganglia so that actions are reinforced (in the case of dopamine bursts) or punished (in the case of dopamine dips).
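
As an aside, this burst-and-dip pattern is essentially a reward prediction error signal: the dopamine response tracks the difference between the reward received and the reward expected. A minimal sketch in Python (the learning rate and variable names are my own illustrative choices, not anything from the article):

    # Toy reward prediction error (RPE) signal, in the spirit of
    # temporal-difference accounts of dopamine firing. Purely illustrative.

    def update_value(expected, reward, learning_rate=0.1):
        """Return the prediction error and the updated expectation."""
        prediction_error = reward - expected      # positive = "burst", negative = "dip"
        expected = expected + learning_rate * prediction_error
        return prediction_error, expected

    # An unexpected juice reward produces a large positive error (a burst).
    delta, expected = update_value(expected=0.0, reward=1.0)
    print(delta)   # 1.0

    # Once the reward is fully expected, omitting it produces a dip.
    delta, expected = update_value(expected=1.0, reward=0.0)
    print(delta)   # -1.0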

In particular, the article discusses the role of dopaminergic receptors in the Go and NoGo pathways, which are involved in positive and negative reinforcement learning respectively.

Building on a large body of earlier theoretical work, my colleagues and I developed a series of computational models that explore the role of the basal ganglia when people select motor and cognitive actions. We have been focusing on how dopamine signals in the basal ganglia, which occur as a result of positive and negative outcomes of decisions (that is, rewards and punishments), drive learning. This learning is made possible by two main types of dopamine receptors, D1 and D2, which are associated with two separate neural pathways through the basal ganglia. When the “Go” pathway is active, it facilitates an action directed by the frontal cortex, such as touching your pinkies together. But when the opposing “NoGo” pathway is more active, the action is suppressed. These Go and NoGo pathways compete with each other when the brain selects among multiple possible actions, so that an adaptive action can be facilitated while at the same time competing actions are suppressed. This functionality can allow you to touch your pinkies together, not perform another potential action (such as scratching an itch on your neck), or to concentrate on a math problem instead of daydreaming.

But how does the Go/NoGo system know which action is most adaptive? One answer, we think (and as you might have guessed), is dopamine. During unexpected rewards, dopamine bursts drive increased activity and changes in synaptic plasticity (learning) in the Go pathway. When a given action is rewarded in a particular environmental context, the associated Go neurons learn to become more active the next time that same context is encountered. This process depends on the D1 dopamine receptor, which is highly concentrated in the Go pathway. Conversely, when desired rewards are not received, the resulting dips in dopamine support increases in synaptic plasticity in the NoGo pathway (a process that depends on dopamine D2 receptors concentrated in that pathway). Consequently, these nonrewarding actions will be more likely to be suppressed in the future.
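
To make the Go/NoGo mechanics concrete, here is a minimal sketch in Python of how such a system might learn. This is my own simplification of the ideas described above, not Frank and colleagues' actual neural network model: positive outcomes strengthen a D1-like Go weight for the chosen action, negative outcomes strengthen a D2-like NoGo weight, and the action with the largest Go-minus-NoGo difference tends to win.

    import random

    class GoNoGoLearner:
        """Toy Go/NoGo learner; weights and learning rates are illustrative."""

        def __init__(self, n_actions, lr_go=0.1, lr_nogo=0.1):
            self.go = [0.0] * n_actions     # D1-like "Go" strengths
            self.nogo = [0.0] * n_actions   # D2-like "NoGo" strengths
            self.lr_go = lr_go
            self.lr_nogo = lr_nogo

        def choose(self, epsilon=0.1):
            # Occasionally explore; otherwise pick the action whose Go
            # activity most exceeds its NoGo activity.
            if random.random() < epsilon:
                return random.randrange(len(self.go))
            net = [g - n for g, n in zip(self.go, self.nogo)]
            best = max(net)
            return random.choice([i for i, v in enumerate(net) if v == best])

        def learn(self, action, outcome):
            # outcome > 0 stands in for a dopamine burst: strengthen Go.
            # outcome < 0 stands in for a dopamine dip: strengthen NoGo.
            if outcome > 0:
                self.go[action] += self.lr_go * outcome
            else:
                self.nogo[action] += self.lr_nogo * (-outcome)

Keeping separate learning rates for the Go and NoGo weights is deliberate: they stand in for how strongly dopamine bursts and dips can drive plasticity in the two pathways, which becomes relevant below.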

It then goes on to consider different types of learners: positive learners, who have a more active Go system, and negative learners, who have a more active NoGo system.

This theoretical framework, which integrates anatomical, physiological, and psychological data into a single coherent model, can go a long way in explaining changes in learning, memory, and decision making as a function of changes in basal ganglia dopamine. In particular, this model makes a key, previously untested, prediction that greater amounts of dopamine (via D1 receptors) support learning from positive feedback, whereas decreases in dopamine (via D2 receptors) support learning from negative feedback.

They then experimentally manipulated dopamine levels and verified their predictions. The experiment involved a simple game in which two symbols, say A and B, were consistently presented together as a pair (alongside other pairs, say 'CD'), with subjects required to choose one of them. After each choice, the subject was given feedback indicating reward or punishment. This feedback was only probabilistically related to the choice: choosing 'A' led to positive feedback 80% of the time, while choosing 'B' led to negative feedback 80% of the time. Thus, although subjects would implicitly learn to choose A and reject B, the rule would not be learned explicitly. Now comes the interesting part: the choose-A strategy reflects positive learning, while the avoid-B strategy reflects negative learning. When, in a test phase, the symbols A and B were paired with new symbols, say E and F respectively, subjects should still have been equally inclined to follow the choose-A and avoid-B strategies. Yet administering dopamine-affecting drugs had dramatic effects.
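
To make the task concrete, here is a toy simulation of that probabilistic selection game, reusing the GoNoGoLearner sketch above. The 80/20 feedback schedule comes from the article; the trial count and everything else are assumptions of mine:

    def run_training(learner, trials=200):
        """Train on the AB pair: A is rewarded 80% of the time, B only 20%."""
        A, B = 0, 1
        for _ in range(trials):
            choice = learner.choose()
            p_reward = 0.8 if choice == A else 0.2
            outcome = 1.0 if random.random() < p_reward else -1.0
            learner.learn(choice, outcome)
        return learner

    learner = run_training(GoNoGoLearner(n_actions=2))
    print("Go(A):", learner.go[0], "NoGo(B):", learner.nogo[1])
    # Choosing A at test depends mostly on Go(A); avoiding B depends on NoGo(B).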

We found a striking effect of the different dopamine medications on this positive versus negative learning bias, consistent with predictions from our computer model of the learning process. While on placebo, participants performed equally well at choose-A and avoid-B test choices. But when their dopamine levels were increased, they were more successful at choosing the most positive symbol A and less successful at avoiding B. Conversely, lowered dopamine levels were associated with the opposite pattern: worse choose-A performance but more-reliable avoid-B choices. Thus the dopamine medications caused participants to learn more or less from positive versus negative outcomes of their decisions.
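
Within the toy model above, the drug manipulation can be mimicked crudely by biasing the two learning rates, treating a larger Go rate as a stand-in for raised dopamine and a larger NoGo rate as a stand-in for lowered dopamine. This is my shorthand, not the authors' procedure:

    def test_bias(learner):
        # Crude proxies for test performance: how strongly A is approached
        # and how strongly B is avoided.
        return {"choose-A": learner.go[0] - learner.nogo[0],
                "avoid-B": learner.nogo[1] - learner.go[1]}

    high_da = run_training(GoNoGoLearner(2, lr_go=0.15, lr_nogo=0.05))  # "positive learner"
    low_da = run_training(GoNoGoLearner(2, lr_go=0.05, lr_nogo=0.15))   # "negative learner"

    print("raised dopamine:", test_bias(high_da))
    print("lowered dopamine:", test_bias(low_da))

Under these assumptions, the first learner should end up with the stronger choose-A bias and the second with the stronger avoid-B bias, mirroring the pattern reported in the quoted passage.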

They then go on to apply these results to Parkinson's patients. Parkinson's patients have deficits in basal ganglia dopamine levels, especially affecting the NoGo pathway. The standard medication is L-DOPA, a dopamine precursor that acts by increasing dopamine in the basal ganglia. They hypothesized that people with untreated Parkinson's would be negative learners (less dopamine, so weaker bursts but intact dips), while those on medication would be positive learners (more dopamine, so stronger bursts but blunted dips).

To test this idea, we presented people with Parkinson’s disease with the same choose-A/avoid-B learning task once while they were on their regular dose of dopamine medication and another time while off it. Consistent with what we predicted, we found that, indeed, patients who were off the medication were relatively impaired at learning to choose the most positive stimulus A, but showed intact or even enhanced learning of avoid-B. Dopamine medication reversed this bias, improving choose-A performance but impairing avoid-B. This discovery supports the idea that medication prevents dopamine dips during negative feedback and impairs learning based on negative feedback.

This notion might explain why some medicated Parkinson’s patients develop pathological gambling behaviors, which could result from enhanced learning from gains together with an inability to learn from losses.

The above (gambling in those on dopaminergic medication) is something I have touched on earlier as well, in relation to psychosis and schizophrenia, where dopamine excess is suspected. In those cases, a consistently high dopamine level may predispose towards positive behavioral learning and positive cognitive learning. The latter may be the underlying manic loop, whereby only positively rewarded cognitions become salient, leading to a rosy picture of the universe, while negatively reinforced cognitions are not registered properly and are not learned or remembered.

They then go on to discuss other implications, such as in ADHD, wherein the overall noise in dopamine neuron firing may be higher, leading to both lowered positive and lowered negative learning (my conjecture, not the authors'), and in addiction.

Overall a very fascinating article indeed.
