Wednesday, April 08, 2009

Action-selection and Attention-allocation: a common problem and a common solution?

I have recently blogged a bit about action-selection and operant learning, emphasizing that the action one chooses, out of the many possible, is the one that maximizes the utility function associated with the set of possible actions; a quick read of the last few posts may help you appreciate where I am coming from.

To recap, whenever an organism decides to indulge in an act (an operant behavior), there are many possible actions from which it has to choose the most appropriate one. Each action leads to a possibly different outcome, and the organism may value the outcomes differentially. This valuation may be objective (how much the organism actually 'likes' the outcome once it happens) or subjective (how keenly the organism 'wants' the outcome to happen, independent of whether the outcome is pleasurable or not). Also, it is never guaranteed that the action will produce the desired/expected outcome; there is always some probability that the act may or may not result in it. On a macro level, the organism may also lack the energy required to indulge in the act or to carry it through to completion. Mathematically, with each action one can associate a utility U = E x V, where U is the utility of the act; E is the expectancy that one will be able to carry out the act and, if so, that the act will result in the desired outcome; and V is the value (both subjective and objective) that one has assigned to the outcome. The problem of action-selection is then simply to compute the utility of each of the n candidate acts and to choose the action with maximum utility.
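The selection rule just described can be sketched in a few lines of Python. This is only an illustration of "pick the act with the largest E x V"; the `Action` class, the `select_action` function and the example acts and numbers are all made up for the sketch and do not come from any experiment.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """A candidate act (hypothetical example structure)."""
    name: str
    expectancy: float  # E: chance the act can be carried out and yields the outcome
    value: float       # V: value assigned to the outcome

    @property
    def utility(self) -> float:
        # U = E x V, as in the text
        return self.expectancy * self.value


def select_action(actions):
    """Return the action with maximum utility."""
    return max(actions, key=lambda a: a.utility)


# Invented numbers, purely for illustration.
actions = [
    Action("forage nearby", expectancy=0.9, value=2.0),
    Action("hunt large prey", expectancy=0.25, value=10.0),
    Action("rest", expectancy=1.0, value=1.0),
]

best = select_action(actions)
print(best.name, best.utility)
```

Note that the act with the lowest expectancy wins here because its outcome is valued so much more highly; neither E nor V alone decides the choice.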

Today I had an epiphany: doesn't the same logic apply to allocating attention to the various stimuli that bombard us? Assuming a spotlight view of attention, and assuming that attentional resources are limited, one is constantly faced with the problem of finding which stimuli in the world are salient and need to be attended to. Now, the leap I am making is that attention-allocation, just like choosing to act volitionally, is an operant process: not reactive, but pro-active. It may be unconscious, but it still involves volition and 'choosing'. Remember that even acts can be reactive, and thus there is room for reactive attention; but what I am proposing is that the majority of attention is pro-active: actively choosing between stimuli and focusing on one to try to better predict the world. We are basically prediction machines that want to predict beforehand the state of the world that is most relevant to us, and this we do by classical or Pavlovian conditioning. We try to associate stimuli (CS) with stimuli (UCS) or responses (UCR), and thus try to ascertain what the state of the world at time T will be, given that a stimulus (CS) has happened. Apart from being prediction machines, we are also agents that try to maximize rewards and minimize punishments by acting on this knowledge and interacting with the world. There are thousands of actions we can indulge in, but we choose wisely; there are thousands of stimuli in the external world, but we attend to salient features wisely.

Let me elaborate on the analogy. While selecting an action we maximize reward and minimize punishment; basically, we choose the action that maximizes the utility function. While choosing which stimulus to attend to we maximize our foreknowledge of the world and minimize surprises; basically, we choose the stimulus that maximizes the predictability function. We can even write an equivalent mathematical formula: P = E x R, where P is the increase in predictability due to attending to stimulus 1; E is the probability that stimulus 1 correctly leads to the prediction of stimulus 2; and R is the relevance of stimulus 2 (the information) to us. Thus the stimulus one attends to is the one that leads to the maximum gain in predictability. Also, just as the general energy level of the organism biases whether, and how much, the organism acts, there is a general arousal level of the organism that biases whether, and how much, it attends to stimuli.
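Here is a minimal sketch of the attention side of the analogy, under stated assumptions. The P = E x R score comes from the paragraph above; treating arousal as a simple multiplicative gain and adding a `threshold` below which nothing is attended is my own hypothetical way of modelling the arousal bias, and the stimuli and numbers are invented.

```python
def predictability_gain(expectancy, relevance):
    """P = E x R: gain in predictability from attending to a stimulus."""
    return expectancy * relevance


def allocate_attention(stimuli, arousal=1.0, threshold=0.5):
    """Pick the stimulus with the largest arousal-weighted P.

    stimuli: dict mapping name -> (E, R).
    arousal, threshold: hypothetical parameters; arousal scales every score,
    and nothing is attended if the best score stays below the threshold.
    """
    scored = {
        name: arousal * predictability_gain(e, r)
        for name, (e, r) in stimuli.items()
    }
    best = max(scored, key=scored.get)
    return best if scored[best] >= threshold else None


# Invented example stimuli: name -> (E, R).
stimuli = {
    "rustle in grass": (0.5, 8.0),
    "bird song": (0.9, 1.0),
    "own footsteps": (1.0, 0.1),
}

focus_alert = allocate_attention(stimuli, arousal=1.0)
focus_drowsy = allocate_attention(stimuli, arousal=0.1)
print(focus_alert, focus_drowsy)
```

In this toy version a perfectly predictive but irrelevant stimulus (one's own footsteps) loses out to a less reliable but highly relevant one, and at low arousal nothing crosses the threshold at all.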

So, what new insights do we gain from this formulation? The first insight comes from elaborating the analogy further. We know that the basal ganglia in particular, and dopamine in general, are involved in action-selection. Dopamine is also heavily involved in operant learning. We can predict that dopamine systems, and the same underlying mechanisms, may also be used for attention-allocation. Dopamine may be heavily involved in classical learning as well. Moreover, the basic computations and circuitry involved in allocating attention should be similar to those involved in action-selection. The two disciplines can learn from each other and use methods developed in one field for understanding and elaborating phenomena in the other. For example, we know that dopamine, while coding for reward-error/incentive salience, also codes for novelty and is heavily involved in novelty detection. Is novelty detection driven by the need to avoid surprises, especially while allocating attention to a novel stimulus?

What are some of the predictions we can make from this model? Just as there is abundant work on U = E x V in the decision-making and action-selection literature, we should be able to show the independent and interacting effects of Expectancy and Relevance on the attention-grabbing properties of a stimulus. The relevance of different stimuli can be manipulated by pairing them with a UCR/UCS that has different degrees of relevance. The expectancy can be manipulated through the strength of conditioning: more trials would mean that the association between the CS and UCS is stronger. The level of arousal may also bias the ability to attend to stimuli. I am sure that there is much for attention research to learn from the research on decision-making and action-selection, and the reverse would also be true. It may even be that attention-allocation is already conceptualized in the above terms; if so, I plead ignorance of this sub-field and would love to get a few pointers so that I can refine my thinking and framework.
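To make "independent and interacting effects" concrete, here is a small sketch of what a 2 x 2 design (weak vs. strong conditioning, low vs. high relevance) would predict. The levels and numbers are invented, and the additive model is included only as a hypothetical contrast case; the point is that P = E x R predicts a non-zero interaction, whereas a purely additive account predicts none.

```python
from itertools import product

# Hypothetical factor levels, chosen only for illustration.
E_LEVELS = {"weak conditioning": 0.25, "strong conditioning": 0.75}
R_LEVELS = {"low relevance": 1.0, "high relevance": 4.0}


def multiplicative(e, r):
    """Attention score under P = E x R."""
    return e * r


def additive(e, r):
    """Contrast case: E and R contribute independently."""
    return e + r


def interaction_contrast(model):
    """(Effect of relevance at high E) minus (effect of relevance at low E)."""
    e_lo, e_hi = E_LEVELS.values()
    r_lo, r_hi = R_LEVELS.values()
    effect_at_high_e = model(e_hi, r_hi) - model(e_hi, r_lo)
    effect_at_low_e = model(e_lo, r_hi) - model(e_lo, r_lo)
    return effect_at_high_e - effect_at_low_e


# Print the predicted score for each cell of the design.
for (e_name, e), (r_name, r) in product(E_LEVELS.items(), R_LEVELS.items()):
    print(f"{e_name:>20} / {r_name:<15} P = {multiplicative(e, r)}")

print("interaction (E x R):", interaction_contrast(multiplicative))
print("interaction (E + R):", interaction_contrast(additive))
```

In an actual experiment the same contrast would be computed on measured attention scores, with a non-zero value counting in favour of the multiplicative form.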

Also consider that there is already some literature implicating dopamine in attention; the fact that dopamine dysfunction in schizophrenia, ADHD, etc. has cognitive and attentional implications is an indication in itself. Also, the contextual salience of drug-related cues may be a powerful effect of dopamine-based classical conditioning and attention-allocation hijacking the normal dopamine pathways in addicted individuals.

Lastly, I got set on this direction while reading an article on the chaining of actions to get desired outcomes, and on how two different brain systems (a cognitive (prefrontal) high road based on model-based reinforcement learning, and an unconscious (dorsolateral striatal) low road based on model-free reinforcement learning) may be involved in deciding which action to select. I believe that the same conundrum would present itself when one turns to the attention-allocation problem, where stimuli are chained together and predict each other in succession; I would predict that there would be two roads involved here too! But that is a matter for a future post. For now, I would love some honest feedback on what value, if any, this new conceptualization adds to what we already know about attention allocation.


Friday, April 03, 2009

Low Mood and Risk Aversion: a poor State outcome?

Daniel Nettle has written an article in the Journal of Theoretical Biology about the evolution of low mood states. Before I get to his central thesis, let us review what he reviews:

Low mood describes a temporary emotional and physiological state in humans, typically characterised by fatigue, loss of motivation and interest, anhedonia (loss of pleasure in previously pleasurable activities), pessimism about future actions, locomotor retardation, and other symptoms such as crying.
...
This paper focuses on a central triad of symptoms which are common across many types of low mood, namely anhedonia, fatigue and pessimism. Theorists have argued that, whereas their opposites facilitate novel and risky behavioural projects, these symptoms function to reduce risk-taking. They do this, proximately, by making the potential payoffs seem insufficiently rewarding (anhedonia), the energy required seem too great (fatigue), or the probability of success seem insufficiently high (pessimism). An evolutionary hypothesis for why low mood has these features, then, is that it is adaptive to avoid risky behaviours when one is in a relatively poor current state, since one would not be able to bear the costs of unsuccessful risky endeavors at such times.

I would like to pause here and note how beautifully he has summed up the symptoms and key features of low mood. Taking the liberty of restating them in my own framework of Value x Expectancy, and of the distinction between the cognitive ('wanting') and behavioral ('liking') sides of things:
  • Anhedonia: behavioral inability to feel rewarded by previously pleasurable activities. Loss of 'liking' following the act. Less behavioral Value assigned.
  • Loss of motivation and interest: cognitive inability to look forward to or value previously desired activities. Loss of 'wanting' prior to the act. Less cognitive Value assigned.
  • Fatigue: behavioral inability to feel that one can achieve the desired outcome due to feelings that one does not have sufficient energy to carry the act to success. Less behavioral Expectancy assigned.
  • Pessimism: cognitive inability to look forward to or expect good things about the future or that good outcomes are possible. Less cognitive Expectancy assigned.
The reverse conglomeration is found in high mood: high wanting and liking, high energy and a positive outlook. Thus, I agree fully with Nettle that low mood and high mood are defined by these opposed features, and also that these features are powerful proximate mechanisms that determine the risk proneness of the individual: by subjectively manipulating the Value and Expectancy associated with an outcome, high and low mood mediate the risk proneness that an organism displays while assigning a utility to an action. Thus, this much is fairly settled: if the ultimate goal is to increase risk-prone behavior, then the organism should use the proximate mechanism of high mood; if the ultimate goal is to avoid risky behavior, then the organism should display low mood, which would proximately help it avoid risky behavior.

Now let me talk about Nettle's central thesis. It has previously been proposed in the literature that low mood (and thus risk-aversion) arises from being in a poor state, wherein one can avoid energy expenditure (and thus a worsening of the situation) by assuming a low profile. Nettle plays the devil's advocate and argues that an exactly opposite argument can be made: that an organism in a poor state needs to indulge in high-risk (and high-energy) activities to get out of the poor state. Thus, there is no a priori reason why one explanation should be more sound than the other. To find out when exactly high-risk behaviors pay off and when exactly low-risk behaviors are more optimal, he develops a model and uses some elementary mathematics to derive some conclusions. He, of course, bases his model on a preventive focus, whereby the organism tries to minimize the chance of getting into a state R, which is sub-threshold. He allows S(t) to be maximized under the constraint that one does not lose sight of R. I'll not go into the mathematics, but the results are simple. When there is a lot of difference between R (the dreaded state) and S (the current state), the organism adopts a risky behavioral profile. When R and S are close, it maintains low-risk behavior; however, when it is in dire circumstances (R and S are very close), risk proneness again rises to dramatic levels. To quote:

The model predicts that individuals in a good state will be prepared to take relatively large risks, but as their state deteriorates, the maximum riskiness of behaviour that they will choose declines until they become highly risk-averse. However, when their state becomes dire, there is a predicted abrupt shift towards being totally risk-prone. The switch to risk-proneness at the dire end of the state continuum is akin to that found near the point of starvation in the original optimal foraging model from which the current one is derived (Stephens, 1981). The graded shift towards greater preferred risk with improving state is novel to this model, and stems from the stipulation that if the probability of falling into the danger zone in the next time step is minimal, then the potential gain in S at the next time step should be maximised. However, a somewhat similar pattern of risk proneness in a very poor state, risk aversion in an intermediate state, and some risk proneness in a better state, is seen in an optimal-foraging model where the organism has not just to avoid the threshold of starvation, but also to try to attain the threshold of reproduction (McNamara et al., 1991). Thus, the qualitative pattern of results may emerge quite generally from models using different assumptions.
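The qualitative pattern in the quote can be reproduced with a toy calculation. To be clear, this is not Nettle's model or its mathematics; it is a much cruder sketch under my own assumptions: the next state is normally distributed around the current state plus a fixed negative `drift` (a hypothetical per-step cost), the organism picks a riskiness `sigma` from a small menu, "potential gain" is proxied by the size of `sigma`, and `tolerance` is an invented cutoff for an acceptable chance of falling below the threshold.

```python
import math


def prob_below_threshold(state, threshold, drift, sigma):
    """P(next state < threshold) if next state ~ Normal(state + drift, sigma)."""
    z = (threshold - (state + drift)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def preferred_risk(state, threshold=0.0, drift=-1.0,
                   sigmas=(0.5, 1.0, 2.0, 4.0, 8.0), tolerance=0.05):
    """Toy rule: stay clear of the danger zone first, then go for gain.

    If some options keep the chance of falling below the threshold within
    tolerance, take the riskiest of them; otherwise take whichever option
    minimises that chance. All parameter values are hypothetical.
    """
    risks = {s: prob_below_threshold(state, threshold, drift, s) for s in sigmas}
    safe = [s for s, p in risks.items() if p <= tolerance]
    if safe:
        return max(safe)
    return min(risks, key=risks.get)


for state in (20.0, 8.0, 4.0, 2.0, 0.5):
    print(f"state {state:>4}: preferred sigma = {preferred_risk(state)}")
```

With these made-up numbers the preferred riskiness falls as the state deteriorates and then jumps back to the maximum once the expected next state lies below the threshold, because at that point only a high-variance gamble offers much chance of escaping the danger zone.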

Nettle then extrapolates the clinical significance of this by proposing that 'agitated'/'excited' depression can be explained as the case where the organism is in dire straits and has thus become risk-prone. He also uses a similar logic for dysphoric mania, although I don't buy that. However, I agree that euphoric mania may just be the other extreme of high mood, with more risk proneness and goal achievement, while depression is the normal extreme of low mood, adverse circumstances and risk aversion. To me this model ties together certain things we know about life circumstances and the risk profile and mood tone of people, and contributes to deepening our understanding.
Nettle, D. (2009). An evolutionary model of low mood states. Journal of Theoretical Biology, 257 (1), 100-103. DOI: 10.1016/j.jtbi.2008.10.033
