DOPAMINERGIC MECHANISMS GUIDING PROBABILISTIC CHOICE

by

Colin Michael Stopper

B.S., The University of Connecticut, 2008
M.A., The University of British Columbia, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Psychology)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

April 2014

© Colin Michael Stopper, 2014

Abstract

Dopamine, acting via different modes of transmission, is involved in making cost-benefit decisions involving reward uncertainty. Phasic and tonic dopamine contribute to different behavioral strategies by influencing the connectivity and strength of inputs within the cortico-limbic-striatal circuit. Chapter 1 introduces the construct of risk-based decision-making and approaches to studying it before delving into how tonic and phasic dopamine are involved in encoding reward uncertainty and reward prediction error. This chapter also describes the nucleus accumbens (NAc) as a corticolimbic interface innervated by dopamine and the lateral habenula (LHb) as a modulator of dopamine and source of negative reward prediction error.

Chapter 2 examines how receptor-selective dopaminergic drugs infused into the NAc influence risk-based decision-making. The D2 receptor, mainly influenced by tonic dopamine, was not involved in risky choice. Stimulation of the D1 receptor, presumably by phasic dopamine, optimized decision-making while blockade of this receptor made animals risk-averse. Chapter 3 demonstrates that the LHb is critical for expressing subjective choice preferences. Inactivation of the LHb induced indifference in animals trained on probabilistic and delay discounting tasks. LHb inactivation on a simpler reward magnitude discrimination task and inactivation of closely adjacent areas were without effect, highlighting both the behavioral and anatomical specificity of this effect. Chapter 4 reveals that temporally and behaviorally specific phasic dopamine changes are critical for probabilistic choice biases. Ventral tegmental area stimulation following a risky loss increased risky choice while stimulation of the LHb, or intermediary rostromedial tegmental area, following a risky win decreased risky choice. Chapter 5 integrates these findings with previous literature and proposes ideas for how phasic and tonic dopamine acting in the cortico-limbic-striatal circuit are used to bias choices involving reward uncertainty.

Preface

Chapters 2 through 4 are based on work conducted in the Neural Circuits and Cognition Laboratory by Dr. S.B. Floresco and [Colin Michael Stopper]. I worked in collaboration with Dr. S.B. Floresco to design the experiments. Along with undergraduate research assistants, I performed the experiments and collected the data. I analyzed the data and wrote the manuscripts with assistance from Dr. S.B. Floresco.

- A version of Chapter 2 has been published. [Stopper, C.M.], Khayambashi, S., and Floresco, S.B. (2013) Receptor-specific modulation of risk-based decision making by nucleus accumbens dopamine. Neuropsychopharmacology 38:715-728. doi:10.1038/npp.2012.240 http://www.nature.com/npp/journal/v38/n5/full/npp2012240a.html
  - I conducted the testing with technical assistance from S. Khayambashi and wrote the manuscript with assistance from S.B. Floresco.
- A version of Chapter 3 has been published. [Stopper, C.M.] and Floresco, S.B. (2014) What's better for me? Fundamental role for lateral habenula in promoting subjective decision biases. Nature Neuroscience 17:33-35. doi:10.1038/nn.3587 http://www.nature.com/neuro/journal/v17/n1/full/nn.3587.html
  - I conducted the testing with technical assistance from undergraduate research assistants and wrote the manuscript with assistance from S.B. Floresco.
- A version of Chapter 4 is under review for publication. [Stopper, C.M.], Tse, M.T., Montes, D.R., Wiedman, C.R., and Floresco, S.B. Multiple phasic dopamine signals are necessary for risk-based decision-making.
  - I conducted the behavioral testing with technical assistance from D.R. Montes and C.R. Wiedman. I performed the electrophysiological experiments with assistance and guidance from M.T. Tse. I wrote the manuscript with assistance from S.B. Floresco.

This research was approved by the UBC Animal Care Committee.
- Certificate: "Modeling Deficits in Cognition in Rats: Behavioural and Electrophysiological Analyses" (A10-0197)
- Project title: "Dopaminergic circuits and risky decision making"

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Chapter 1: Introduction
  1.1 Models of Risk-Reward Decision-Making
    1.1.1 Human Studies
    1.1.2 Animal Studies
  1.2 Neural Basis of Risk/Reward Decision-Making
    1.2.1 Human Studies
    1.2.2 Animal Studies
  1.3 Dopamine Signaling and Decision-Making
  1.4 Different Modes of Dopamine Transmission and Reward Prediction Error
  1.5 Lateral Habenula Modulation of Midbrain Dopamine Systems
  1.6 Summary and Objectives
Chapter 2: Receptor-Specific Modulation of Risk-Based Decision-Making by Nucleus Accumbens Dopamine
  2.1 Introduction
  2.2 Methods
    2.2.1 Animals
    2.2.2 Apparatus
    2.2.3 Lever Pressing Training/Side Bias Testing
    2.2.4 Decision-Making Tasks
      2.2.4.1 Probabilistic Discounting
      2.2.4.2 Reward Magnitude Discrimination
    2.2.5 Surgery
    2.2.6 Drugs and Microinfusion Protocol
    2.2.7 Histology
    2.2.8 Data Analysis
      2.2.8.1 Win-Stay/Lose-Shift Analyses
  2.3 Results
    2.3.1 Blockade of NAc D1 and D2 Receptors
      2.3.1.1 D1 Receptor Blockade
      2.3.1.2 D2 Receptor Blockade
    2.3.2 Stimulation of NAc D1, D2, and D3 Receptors
      2.3.2.1 D1 Receptor Stimulation
      2.3.2.2 D2/D3 Stimulation (Quinpirole)
      2.3.2.3 Preferential D2 Receptor Stimulation (Bromocriptine)
      2.3.2.4 Preferential D3 Stimulation (PD 128 907)
    2.3.3 Reward Magnitude Discrimination
  2.4 Discussion
    2.4.1 NAc D1 Receptors and Risk-Based Decision-Making
    2.4.2 NAc D2 Receptors and Risk-Based Decision-Making
    2.4.3 Summary and Conclusions
Chapter 3: Fundamental Role for the Lateral Habenula in Promoting Subjective Decision Biases
  3.1 Introduction
  3.2 Methods
    3.2.1 Experimental Subjects and Apparatus
    3.2.2 Behavioral Tasks
      3.2.2.1 Probabilistic Discounting
      3.2.2.2 Probabilistic Choice with Fixed Reward Probabilities
      3.2.2.3 Delay Discounting
      3.2.2.4 Reward Magnitude Discrimination
      3.2.2.5 Devaluation Tests
    3.2.3 Surgery and Microinfusion Protocol
    3.2.4 Histology
    3.2.5 Data Analysis
  3.3 Results
    3.3.1 LHb Inactivation during Probabilistic Choice
    3.3.2 Inactivation of Regions Adjacent to LHb
    3.3.3 Inactivation of LHb Efferents
    3.3.4 Inactivation of the LHb during Delay Discounting
    3.3.5 Inactivation of the LHb during Reward Magnitude Discrimination
    3.3.6 Analysis of Relative Value Representation
    3.3.7 Reinforcer and Response Devaluation during Reward Magnitude Discrimination
  3.4 Discussion
    3.4.1 Objective vs. Subjective Reinforcement and Avoidance vs. Preference
    3.4.2 The LHb Modulates a Variety of Phasic Events to Influence Choice Biases
    3.4.3 Summary and Conclusions
Chapter 4: Multiple Phasic Dopamine Signals Exert a Direct Influence on Risk-Based Decision-Making
  4.1 Introduction
  4.2 Methods
    4.2.1 Neurophysiological Studies
      4.2.1.1 Experimental Subjects
      4.2.1.2 Surgery, Extracellular Recordings, and Cell-Searching Procedures
      4.2.1.3 Histology
      4.2.1.4 Data Analysis
    4.2.2 Behavioral Studies
      4.2.2.1 Experimental Subjects
      4.2.2.2 Operant Apparatus
      4.2.2.3 Lever Press Training
      4.2.2.4 Probabilistic Discounting
      4.2.2.5 Reward Magnitude Discrimination
      4.2.2.6 Stereotaxic Surgery and Brain Stimulation Protocol
      4.2.2.7 Histology
      4.2.2.8 Data Analysis
  4.3 Results
    4.3.1 Neurophysiological Studies
    4.3.2 Behavioral Studies
      4.3.2.1 LHb Stimulation
      4.3.2.2 RMTg Stimulation
      4.3.2.3 VTA Stimulation
  4.4 Discussion
Chapter 5: Discussion
  5.1 Mesoaccumbens Dopamine and Risk-Based Decision-Making
    5.1.1 The D1 Receptor in Risk-Based Decision-Making
  5.2 The LHb and Dopamine Modulation
    5.2.1 The LHb and Multiple Phasic Signals
  5.3 Phasic Dopamine in Risk-Based Decision-Making
  5.4 Tonic Dopamine in Risk-Based Decision-Making
  5.5 Experimental Strengths
  5.6 Limitations
  5.7 Future Directions & Applications
  5.8 Conclusions
References

List of Tables

Table 1. Mean (±SEM) number of risky choices for each probability block separated by 'wins' and 'losses'.
Table 2. Mean (±SEM) locomotor activity, response latencies, and omissions during probabilistic discounting or reward magnitude discrimination.
Table 3. Locomotion, choice latencies, and trial omission data.
Table 4. Mean (±SEM) locomotor activity, response latencies, and omissions.

List of Figures

Figure 1. Probabilistic discounting task design.
Figure 2. Schematic sections of the rat brain showing location of acceptable infusions in the NAc.
Figure 3. Blockade of D1, but not D2, receptors in NAc reduces risky choice.
Figure 4. Stimulation of D1, but not D2, receptors in NAc modifies risky choice.
Figure 5. Stimulation of D3 receptors in the NAc reduces preference for larger, uncertain rewards.
Figure 6. Dopaminergic manipulations that decrease risky choice do not impair reward magnitude discrimination.
Figure 7. Inactivation of the LHb abolishes choice biases during probabilistic discounting.
Figure 8. Individual data from all rats tested on the probabilistic discounting task following LHb inactivation and control treatments.
Figure 9. Inactivation of regions adjacent to the LHb does not affect decision-making.
Figure 10. RMTg, but not dorsal raphe, inactivation alters probabilistic choice.
Figure 11. LHb inactivation makes animals indifferent in choices between large, delayed and small, immediate rewards.
Figure 12. LHb inactivation does not reduce preference for large, certain rewards.
Figure 13. Comparison of forced choice latencies during cost/benefit decision-making versus reward magnitude discrimination.
Figure 14. Choice behavior during reward magnitude discrimination is under goal-directed control.
Figure 15. VTA dopamine neuron firing is influenced by LHb train stimulation.
Figure 16. LHb train stimulation inhibits or attenuates evoked dopamine neuron firing.
Figure 17. LHb stimulation immediately following rewarded outcomes biases risky choice.
Figure 18. Uncertainty and precise timing are necessary for outcome-contingent stimulation to bias choice.
Figure 19. LHb stimulation immediately before choices makes animals averse to advantageous probabilistic reward.
Figure 20. RMTg stimulation after risky wins decreases risky choice.
Figure 21. VTA stimulation following risky losses increases risky choice.
Figure 22. Schematic of the role of dopamine in the neural circuit regulating risk/reward decision-making.

List of Abbreviations

ACC  Anterior cingulate cortex
ANOVA  Analysis of variance
AP  Anteroposterior
BLA  Basolateral amygdala
CGT  Cambridge Gambling Task
dlPFC  Dorsolateral prefrontal cortex
DMSO  Dimethyl sulfoxide
DV  Dorsoventral
IGT  Iowa Gambling Task
ITI  Inter-trial interval
LDTg  Laterodorsal tegmental nucleus
LHb  Lateral habenula
lOFC  Lateral orbitofrontal cortex
ML  Mediolateral
mOFC  Medial orbitofrontal cortex
mPFC  Medial prefrontal cortex
NAc  Nucleus accumbens
nRPE  Negative reward prediction error
OFC  Orbitofrontal cortex
PFC  Prefrontal cortex
PPTg  Pedunculopontine tegmental nucleus
PSTH  Peri-stimulus time histogram
rGT  Rat Gambling Task
RMTg  Rostromedial tegmental nucleus
RPE  Reward prediction error
TDE  Temporal difference error
tVTA  Tail of the ventral tegmental area
VP  Ventral pallidum
VTA  Ventral tegmental area

Acknowledgements

First, I would like to thank my supervisor, Dr. Stan Floresco. Stan combines a vast scientific knowledge base with a passion for research and mentoring. With tireless energy and enthusiasm, he always makes himself available to help his students. His guidance has been critical for developing my experimental design, writing, and presenting abilities. Through his commitment, Stan ensures that his students leave his lab with the tools necessary for success both within and beyond academic science.

I am grateful to my committee members, Dr. Stephanie Borgland and Dr. Colleen Brenner, for their time, energy, and commitment to the quality of this thesis and my development as a scientist. Their expertise has provided me with unique ideas and broadened my perspective of my research.

I would like to thank my funding sources, UBC and the psychology department. I am thankful to my external examiner Dr. Paul Shepard, university examiners Dr. Kiran Soma and Dr. Lynn Raymond, and examination chair Dr. Neil Cashman, for their time and valuable critique of this thesis.

Members of the Floresco lab, both past and present, have been instrumental to my success.
These individuals have provided unlimited moral and technical support throughout my time in this lab: Meagan Auger, Courtney Bryce, Anna Cantor, Andrea Ching, Takeshi Enomoto, Dave Evans, Gemma Floresco, Sarvin Ghods-Sharifi, Emily Green, Davis Kelly, Shahin Khayambashi, Josh Larkin, Dave Montes, Lauren Ogilvie, Patrick Piantadosi, Naghmeh Shafiei, Jen St. Onge, Shelley Su, Maric Tse, and Candice Wiedman. I am also grateful to our animal care personnel, Alice Chan, Anne Cheng, and Lucille Hoover.

I am thankful for all the support from my family and friends. In particular, I am thankful for my parents, who have been incredibly supportive despite the struggles they and the rest of my family have endured over the past few years. Finally, I am grateful for 27 years of guidance from my sister, who, through her struggles, taught me to live with courage, determination, and passion.

Dedication

For Kate

Chapter 1: Introduction

Macroeconomic theory has long relied on assumptions of normative behavior by individuals to understand and predict how societies handle their resources. While these assumptions have provided useful models, the models are flawed because human decision-making is often irrational. To address these inaccuracies, psychologists Daniel Kahneman and Amos Tversky crafted clever studies to gather empirical data on how humans actually make decisions. Their findings, which formed the foundation of the burgeoning field of behavioral economics, revealed striking biases contradicting the widely held belief, embodied in expected utility theory, that humans are essentially rational in their decision-making. Tversky and Kahneman (1974) identified a number of ways people diverge from rational behavior when faced with choices under uncertainty, observations that informed their alternative model, prospect theory. Prospect theory states that probabilistic outcomes are not treated uniformly. The value function differs for gains and losses; people tend to be more sensitive to losses than to gains. And whereas expected utility theory proposed a direct relationship between probability and decision weight, prospect theory demonstrates that people overweight low probabilities and underweight higher probabilities.
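These two departures from expected utility theory can be made concrete with prospect theory's standard functional forms. The sketch below is a minimal illustration using the parameter estimates commonly cited from Tversky and Kahneman's later cumulative prospect theory work (α ≈ 0.88, λ ≈ 2.25, γ ≈ 0.61); these particular forms and numbers are assumptions for illustration, not values taken from this thesis.

```python
# Minimal sketch of prospect theory's value and probability-weighting
# functions. Functional forms and parameters are the commonly cited
# estimates from Tversky and Kahneman's 1992 cumulative prospect theory
# paper, assumed here for illustration only.

ALPHA = 0.88   # curvature of the value function
LAMBDA = 2.25  # loss aversion: losses weigh ~2.25x as much as gains
GAMMA = 0.61   # curvature of the probability-weighting function

def value(x):
    """Subjective value: concave for gains, convex and steeper for losses."""
    return x ** ALPHA if x >= 0 else -LAMBDA * (-x) ** ALPHA

def weight(p):
    """Decision weight: overweights small p, underweights large p."""
    return p ** GAMMA / (p ** GAMMA + (1 - p) ** GAMMA) ** (1 / GAMMA)

print(value(100), value(-100))     # ~57.5 vs ~-129.5: losses loom larger
print(weight(0.01), weight(0.99))  # ~0.055 vs ~0.91: a 1% chance is
                                   # overweighted, a 99% chance underweighted
```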
Inevitably, the acceptance that normative choice biases deviate systematically from rationality provoked a desire to understand the biological basis of our flaws. While such imperfections now seem counterproductive, these biases must once have been adaptive. How the brain evolved to manage complex cost-benefit evaluations involving uncertainty remained largely unknown, but as neuroimaging techniques were pioneered, so was the field of neuroeconomics (Glimcher & Rustichini, 2004). These tools allowed researchers to visualize the distributed networks involved in different forms of decision-making. Different pieces of the circuitry appeared to handle the different sub-tasks necessary for a decision, such as valuation, action-selection, and outcome evaluation (Rangel, Camerer, & Montague, 2008). Simultaneously, work with animal subjects was shedding light on the neurochemical processes encoding uncertainty and guiding decision-making (Floresco, St. Onge, Ghods-Sharifi, & Winstanley, 2008).

Continued research on the neural basis of uncertainty and probabilistic decision-making offers insights not only into normative behavior but also provides a means for understanding and treating various psychiatric conditions, including substance abuse, pathological gambling, schizophrenia, and depression. Encoding of a reward prediction error signal by midbrain dopamine neurons has been a cornerstone of neuroeconomic research for more than a decade (Glimcher, 2011; Schultz, Dayan, & Montague, 1997). However, less is known about how modulation of this signal by structures upstream and downstream of midbrain dopamine neurons influences the ability of dopamine to encode reward uncertainty. This thesis aims to address these challenges and, by doing so, to understand the role of dopamine in risk-based decision-making within the framework of a more comprehensive neural circuit.

1.1 Models of Risk-Reward Decision-Making

1.1.1 Human Studies

In an effort to better understand how decisions are made under risk and uncertainty, laboratory-based decision tasks have been developed (Bechara, Damasio, Damasio, & Anderson, 1994; Lejuez et al., 2002; Rogers et al., 1999). This work was pioneered by Antonio Damasio and colleagues using the Iowa Gambling Task (IGT) (Bechara et al., 1994). During performance of the IGT, a subject is given access to four decks of cards. Two of the decks yield larger immediate rewards that are dwarfed by subsequent larger losses; these two decks are deemed the disadvantageous decks. The other two, advantageous, decks deliver smaller initial rewards accompanied by minimal losses. Bechara et al. (1994) determined that healthy subjects exhibit physiological responses as the "bad" decks become disadvantageous, followed by a shift toward selection of the advantageous decks. The IGT was groundbreaking in its time as an objective laboratory measure for detecting risk-based decision-making impairments in patients with orbitofrontal damage (Buelow & Suhr, 2009). Clinicians had previously observed real-world decision-making deficits in these patients that eluded detection with laboratory-based cognitive tasks (Bechara et al., 1994).

However, risk-based decision-making is a complex construct (Buelow & Suhr, 2009; Llewellyn, 2008). The IGT is largely dependent on learning (Buelow & Suhr, 2009). Subjects begin with seemingly straightforward decisions while the disadvantageous decks are yielding larger rewards and they remain unaware of future losses. Once the rational subject has learned of the losses, she shifts to the other decks and remains there for the remainder of the task. Another feature of the IGT is the ambiguity of its contingencies: subjects are given no advance instructions regarding the relative action-outcome contingencies of their options. These features are not necessarily characteristic of realistic gambling situations, since many gamblers know their probability of winning or losing from the outset, and individual decisions are often independent of one another, not governed by a simple rule dictating when one should switch strategies.

The Cambridge Gambling Task (CGT) was developed, in part, to address the shortcomings of the IGT (Rogers et al., 1999). Subjects decide between two options that vary in reward magnitude and probability, and the larger prospect is always associated with a smaller probability of reward. If the subject guesses correctly, he wins the sum of money associated with that option; if not, he loses it. Thus, there is no single clear strategy for this task, and probabilities of reward and loss are randomized across trials. Also, subjects are well informed about the contingencies before each decision.

1.1.2 Animal Studies

Animal studies are increasingly employed to study decision-making, as they afford greater experimental control. Some animal models aim to emulate particular human tasks. One such task is the rat gambling task (rGT) (Zeeb, Robbins, & Winstanley, 2009). Rats are trained to nosepoke in five-hole operant chambers to select between options differing in reward and punishment. Sucrose pellets are used for reward and "time outs" for punishment; after each selection, the animal receives either the reward or the punishment. Options with the potential for greater reward are associated with a greater magnitude and probability of punishment. Because the session length is fixed, long time outs markedly decrease the total amount of reward that can be earned. Animals learn to bias choice toward the option with the greatest long-term utility, selecting it on nearly 60% of trials. Like the IGT, the rGT demonstrates that rats temper their choices for large, costly rewards, instead preferring more moderate rewards associated with only modest cost. One difference between the rGT and the IGT is that subjects performing the IGT must learn the best strategy within a single session, whereas rats performing the rGT receive many sessions of training. During the IGT, the decks are stacked such that the advantageous decks switch during the course of a session; conversely, optimal performance on the rGT requires rats to maintain the same choice bias throughout a session. Interestingly, when wins and losses are randomized on the IGT, subjects learn the advantageous strategy early and maintain it (Fellows & Farah, 2005).
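The long-term utility logic of the rGT can be illustrated with a little reward-rate arithmetic under a fixed session length. The payoffs, win probabilities, and time-out durations below are hypothetical placeholders (the actual rGT parameters are not given in this excerpt); the point is only that frequent, long time outs swamp a larger per-win payoff.

```python
# Why long time outs make tempting options disadvantageous when session
# length is fixed. All payoffs, probabilities, and time-out durations are
# hypothetical placeholders, not the published rGT parameters.

SESSION_S = 1800.0   # fixed 30-minute session
TRIAL_S = 5.0        # assumed time to complete one trial

def pellets_per_session(pellets, p_win, timeout_s):
    ev_pellets = p_win * pellets                 # expected pellets per choice
    ev_time = TRIAL_S + (1 - p_win) * timeout_s  # expected seconds per choice
    return SESSION_S * ev_pellets / ev_time

# Modest option: rare, short time outs. Tempting option: double the payoff
# but frequent, long time outs. Expected pellets per choice are equal (1.6).
print(pellets_per_session(2, 0.8, 10))  # ~411 pellets over the session
print(pellets_per_session(4, 0.4, 40))  # ~99 pellets over the session
```

Under a schedule like this, persistently choosing the larger payoff earns far less over the whole session; that session-wide return is the long-term utility that rats performing the rGT learn to track.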
Our lab has taken a different approach to studying risk-based decision-making in rats, utilizing a probabilistic discounting task (St. Onge & Floresco, 2009). During choice trials, rats select between a small, certain reward (1 sucrose pellet) and a large, uncertain one (4 or 0 pellets). The probability of obtaining the large reward shifts across a session in a reliable and predictable manner. Animals learn to discount the large reward appropriately, selecting it less frequently as the probability of reward decreases. This task bears obvious similarities to probabilistic discounting tasks used in humans (Andrade & Petry, 2012; Madden, Petry, & Johnson, 2009) but also resembles human gambling tasks. Just as with the IGT, rats performing the discounting task must shift away from the larger reward as it becomes more costly. For the probabilistic discounting task, animals receive many training sessions, and each shift in probability begins with a series of forced-choice trials. As a result, by the time any experimental manipulations are administered, animals are making informed decisions. Similarly, choices on the CGT are preceded by explicit information on the magnitude and probability of the two potential rewards. This discounting task, along with other animal and human models of decision-making, can be used to elucidate the contributions of various brain regions to risk-based decision-making.
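For the probabilistic discounting task itself, "appropriate" discounting has a simple expected-value reading. The sketch below assumes the descending block probabilities of 100%, 50%, 25%, and 12.5% used in related published versions of the task (St. Onge & Floresco, 2009); the text above states only that the odds shift predictably across the session, so the exact block values are an assumption here.

```python
# Expected value of each lever across probability blocks in a probabilistic
# discounting session. Block probabilities are assumed from related
# published work (St. Onge & Floresco, 2009), not stated in this excerpt.

SMALL_CERTAIN = 1  # pellet, delivered with certainty
LARGE_RISKY = 4    # pellets, delivered with the current block's probability

for p in (1.0, 0.5, 0.25, 0.125):
    ev_risky = p * LARGE_RISKY
    if ev_risky > SMALL_CERTAIN:
        verdict = "risky lever advantageous"
    elif ev_risky < SMALL_CERTAIN:
        verdict = "certain lever advantageous"
    else:
        verdict = "levers equivalent"
    print(f"p(large) = {p:6.1%}: EV = {ev_risky:.2f} vs 1.00 -> {verdict}")

# 100% and 50% favor the large/risky lever (EV 4.00 and 2.00), 25% is the
# indifference point (EV 1.00), and 12.5% favors the small/certain lever
# (EV 0.50); hence well-trained rats shift choice across the session.
```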
1.2 Neural Basis of Risk/Reward Decision-Making

1.2.1 Human Studies

Bechara et al. (1994) used the IGT to probe the contribution of the orbitofrontal cortex (OFC) to risk-based decision-making. Patients with damage to the OFC do not properly shift their choices upon incurring losses on the disadvantageous decks, and their lack of physiological changes suggests they are not even unconsciously registering their poor decisions (Bechara et al., 1994; Bechara, 1997). Conversely, OFC-damaged patients performing the CGT display normal decision-making and can even demonstrate risk-averse behavior (Manes et al., 2002; Rogers et al., 1999). OFC deficits on the IGT may therefore stem more from problems with reversal learning than from risky decision-making per se (Fellows & Farah, 2005; Fellows, 2007). Other sectors of the prefrontal cortex (PFC) are involved in this type of decision-making as well. Patients with lateral frontal damage prefer risky options, and their impairment increases as a function of lesion size (Clark, Manes, Antoun, Sahakian, & Robbins, 2003). Unlike with the OFC, impaired IGT performance in patients with dorsolateral PFC (dlPFC) damage is not a result of reversal learning deficits (Fellows & Farah, 2005).

Neuroimaging work has increasingly been used to complement lesion studies, exploring how various brain regions contribute to decision-making in healthy subjects. These studies have dissociated the contributions of cortical subregions, including the OFC, medial PFC (mPFC), and anterior cingulate cortex (ACC), and allow researchers to view changes in brain activity as humans anticipate, decide, and process outcomes during risk/reward assessments (Blair et al., 2006; Marsh, Blair, Vythilingam, Busis, & Blair, 2007; Rogers et al., 2004). Neuroimaging also carries the advantage of being able to study rarely damaged subcortical structures, including the striatum. Knutson, Adams, Fong, and Hommer (2001) discovered that the nucleus accumbens (NAc) is activated by the anticipation of rewards. When subjects performed a risk-based decision-making task, NAc activation increased during deliberation preceding choice of a larger, uncertain reward (Kuhnen & Knutson, 2005; Matthews, Simmons, Lane, & Paulus, 2004). This increased activation for large, risky rewards seems to track reward regardless of cost: Knutson, Rick, Wimmer, Prelec, and Loewenstein (2007) observed that NAc activation tracked product preference but was not influenced by price. Further research indicates that NAc responses to rewards are influenced by the presence of losses, and that responses under unexpected reward conditions are influenced by prior exposure to certain reward conditions (Cooper, Hollon, Wimmer, & Knutson, 2009), suggesting that NAc activity is strongly shaped by context and relative value.

1.2.2 Animal Studies

Animal studies have been used to explore this neural circuit more closely. Mobini et al. (2002) sought to discern the contribution of the OFC to risk-based decision-making by training rats to choose between a small, certain and a large, uncertain reward, with reward contingencies made explicit through forced-choice trials. Lesions of the OFC made rats risk-averse, specifically during trials when a low probability of the larger reward made it the disadvantageous option. In contrast to the explicit probabilities used by Mobini et al. (2002), a study by Pais-Vieira, Lima, and Galhardo (2007) presented rats with decisions involving ambiguous probabilities. In this situation, which bears more similarity to the ambiguity of the IGT (Bechara et al., 1994), OFC-lesioned animals increasingly selected the riskier option as the session progressed.
Zeeb and Winstanley (2011) sought to determine the effects of task learning on the OFC's contribution to risky choice. Rats were trained on the rGT before or after OFC lesions. OFC-lesioned animals were initially impaired at learning the task but eventually acquired an optimal decision strategy, and lesions of the OFC made after task acquisition did not influence performance. These findings suggest why the OFC is needed for the learning-intensive and ambiguous IGT (Bechara et al., 1994) but not for the more straightforward CGT (Rogers et al., 1999).

To determine the contributions of various prefrontal cortical regions, St. Onge and Floresco (2010) used reversible inactivation during the probabilistic discounting task. Inactivation of the OFC, ACC, or insular cortex was without effect, though OFC inactivation increased response latencies when the large, uncertain reward was not the more advantageous option. Inactivation of the prelimbic mPFC influenced risky choice, though the direction of this effect depended on the task variant: risky choice increased when the probability of the large reward decreased across a session but decreased when the probability increased over the session. Thus, the role of the mPFC in probabilistic discounting is to process changing reward contingencies to inform the value representations needed for optimal decision-making.

Anatomical studies suggest a division of the OFC into medial (mOFC) and lateral (lOFC) components (Hoover & Vertes, 2011; Price, 2007). Most human lesion studies do not discriminate between damage to these subregions (Bechara et al., 1994; Bechara, 1997; Fellows & Farah, 2005; Manes et al., 2002; Rogers et al., 1999). However, Clark et al. (2008) found that mOFC lesions increased risky choice while lOFC lesions had no effect on decision-making. In light of this, the null effect of OFC inactivation on probabilistic choice (St. Onge & Floresco, 2010) is unsurprising, given that this study specifically targeted the lateral division. In an analogous study, selective inactivation of the medial portion increased risky choice, regardless of the progression of odds (ascending or descending) (Stopper, Green, & Floresco, 2014).

Using the probabilistic discounting task, Ghods-Sharifi, St. Onge, and Floresco (2009) determined that the basolateral amygdala (BLA) is important for risky choice. BLA inactivation decreased risky choice, specifically when the large reward was most uncertain. Conversely, on the rGT, BLA lesions after task acquisition caused an increase in risky choice (Zeeb & Winstanley, 2011). The BLA is reciprocally connected with both the mPFC and the OFC (Kita & Kitai, 1990; McDonald, 1987; Schoenbaum, Chiba, & Gallagher, 2000). Since these cortical regions play disparate roles in risk-based decision-making (St. Onge & Floresco, 2010; Zeeb & Winstanley, 2011), it seems plausible that the BLA contributes differently to performance on the rGT versus probabilistic discounting. Functional disconnection experiments have shown that OFC-BLA connections are essential for rGT performance (Zeeb & Winstanley, 2013), whereas mPFC-BLA communication is needed for probabilistic discounting (St. Onge, Stopper, Zahm, & Floresco, 2012).

Probabilistic discounting has also been used to understand the role of the NAc in risk-based decision-making. Cardinal and Howes (2005) used a probabilistic discounting task and lesioned the NAc following task acquisition.
After extensive retraining, the lesioned animals developed risk-averse tendencies, discounting the large, uncertain reward more than sham animals. As a follow-up, I examined how acute manipulation of more specific subregions of the NAc would influence risk-based decision-making (Stopper & Floresco, 2011). Animals well trained on the probabilistic discounting task received inactivation of the entire NAc or of the core or shell subregion alone. Consistent with the study by Cardinal and Howes (2005), inactivation of the NAc decreased risky choice and increased response latencies. The shell alone was responsible for the effect on choice, while only the core was responsible for the speed of responses. A separate group of animals was trained on a reward magnitude discrimination task, during which they chose between a large and a small reward that were both 100% certain. NAc inactivation slightly, but significantly, decreased an otherwise overwhelming preference for the large, certain reward. These results indicate that the NAc is necessary for biasing choice toward larger rewards and becomes increasingly critical when a cost is associated with the larger reward. The NAc and other nodes of this circuit are connected with the dopamine system, suggesting that dopamine is likely important for these processes.

1.3 Dopamine Signaling and Decision-Making

Since its discovery more than 50 years ago (Carlsson, Lindqvist, Magnusson, & Waldeck, 1958), the conceptualization of the role of dopamine in reward has undergone sweeping paradigm shifts (Salamone, Correa, Mingote, & Weber, 2005). Once thought of simply as "reward" neurons (Wise, 1978), dopamine neurons in the ventral tegmental area (VTA) are now more aptly understood to be involved in reinforcement learning and incentive salience, rather than reward per se (Salamone & Correa, 2012). A more accurate analysis of dopamine reveals its involvement in behavioral activation, motivation, and instrumental learning (Salamone, 1988, 1992; Salamone & Correa, 2012).

Salamone and colleagues pioneered the study of dopamine in decision-making, demonstrating its fundamental role in effort-based decision-making using two tasks: a modified T-maze and a concurrent food choice operant procedure. In the cost/benefit T-maze procedure, animals choose between a cost-free small reward and a large reward that can only be obtained by climbing a barrier. Systemic dopamine blockade and NAc dopamine depletion decreased preference for the larger reward, but only when the barrier was present (Cousins, Atherton, Turner, & Salamone, 1996; Salamone, Cousins, & Bucher, 1994). In the operant concurrent food choice procedure, animals choose either to complete a modest lever-press requirement to earn sucrose pellets or to eat freely available, yet less palatable, lab chow. Systemic dopamine blockade and NAc dopamine depletion decreased operant responding for sucrose and increased consumption of the free lab chow (Cousins & Salamone, 1994; Salamone et al., 1991, 1996). Thus, dopamine (particularly in the NAc) is needed to express preferences for larger rewards when physical effort is required to obtain them. Similar findings were observed when either a D1- or a D2-selective antagonist was given systemically or directly into the NAc, demonstrating that dopamine acts on both receptors to motivate animals to exert effort to obtain preferred rewards (Cousins, Wei, & Salamone, 1994; Nowend, Arizzi, Carlson, & Salamone, 2001).
Completing lever-press requirements takes time, raising the possibility that dopamine blockade alters choice by making animals sensitive to the delay, rather than the effort, associated with the costly option. To address this concern, Floresco, Tse, and Ghods-Sharifi (2008) trained animals on an effort discounting task with comparable delays attached to the small, certain option. A non-selective dopamine antagonist still decreased choice of the large, costly reward, indicating that dopamine is critical for effort-based decision-making.

Dopamine transmission has also been linked to risky decision-making (see Figure 22). Chronic amphetamine abusers exhibit impaired decision-making on the CGT (Rogers et al., 1999), and pathological gambling has emerged in patients receiving dopamine agonist therapy for Parkinson's disease and restless legs syndrome (Gallagher, O'Sullivan, Evans, Lees, & Schrag, 2007; Quickfall & Suchowersky, 2007). Zeeb et al. (2009) investigated the role of dopamine in risk-based decision-making using the rGT. Systemic amphetamine administration impaired decision-making; low doses of a D2 antagonist improved decision-making, while a D1 antagonist had no effect on choice. Using probabilistic discounting, St. Onge and Floresco (2009) set out to characterize the role of dopamine in risk-based decision-making more thoroughly. Beginning with amphetamine, they demonstrated that a general boost in dopamine transmission increases risky choice. This overall effect was not receptor-selective, as agonists of both the D1 and the D2 receptor independently increased risky choice as well. Both D1 and D2 antagonists attenuated the effects of amphetamine on risky choice, and each antagonist given in isolation decreased risky choice from baseline.

Dopamine plays a less straightforward role in regulating impulsivity. Amphetamine, which boosts dopamine transmission, is used in low doses to treat attention deficit hyperactivity disorder. Consistent with these clinical effects, most animal studies have found that appropriate doses of amphetamine increase choice of larger, delayed rewards (Floresco, Tse, et al., 2008; van Gaalen, van Koten, Schoffelmeer, & Vanderschuren, 2006; Wade, de Wit, & Richards, 2000; Winstanley, Theobald, Dalley, & Robbins, 2005). Others, however, have observed increased impulsivity, likely owing to differences in the doses and experimental protocols used (Cardinal, Robbins, & Everitt, 2000; Evenden & Ryan, 1996). The effects of amphetamine on delay-based decision-making are dopamine-dependent, since D2 receptor blockade attenuates the effects of amphetamine on delay discounting (van Gaalen et al., 2006). D1 antagonism does not attenuate the effect of amphetamine but, given in isolation, increases impulsive choice (van Gaalen et al., 2006). Unlike with effort discounting, dopamine depletion in the NAc did not influence delay discounting, nor did it attenuate the impulsivity caused by amphetamine (Winstanley et al., 2005). Instead, dopamine depletion in the OFC decreased impulsive choice (Kheramin et al., 2004). Bari and Robbins (2013) suggest that the role of dopamine in impulsivity may be due primarily to influences on error monitoring and performance adjustment. These processes are largely governed by the mPFC (Floresco, 2013), notably during probabilistic discounting (St. Onge & Floresco, 2010), suggesting that dopamine in the mPFC may be particularly important for risk-based decision-making.
St. Onge, Abhari, and Floresco (2011) investigated the local effects of dopaminergic drugs in the mPFC, given this region's importance for probabilistic choice. As with systemic administration, intra-mPFC infusion of a D1 antagonist decreased risky choice; D2 blockade, however, increased risky choice. Stimulation of D1 receptors in the mPFC did not influence choice, whereas the D2 agonist produced a true impairment in risky decision-making, flattening the discounting curve: compared with control treatment, the D2 agonist decreased choice of the large reward when its probability was 100% and increased risky choice when the large reward became unlikely. These data indicate that while dopamine transmission in the mPFC is certainly important for risk-based decision-making, dopamine must also act in other brain regions to influence choice.

1.4 Different Modes of Dopamine Transmission and Reward Prediction Error

Temporally and behaviorally relevant bursts of phasic dopamine transmission are critical for assigning salience to rewards, facilitating the use of proper stimulus-outcome and response-outcome contingencies (Grace et al., 2007). Electrophysiological recordings in non-human primates have been used to understand the role of individual midbrain dopamine neurons in encoding reward uncertainty. Wolfram Schultz and colleagues (1997) discovered that while midbrain dopamine neurons initially encode the receipt of reward, their pattern of activation changes in a manner consistent with reinforcement learning theories. As repeated presentation of a cue becomes reliably predictive of subsequent reward, phasic bursts of dopamine neuron firing gradually shift from the reward to the cue. The same neurons are similarly sensitive to negative outcomes: if a cue-predicted reward is not delivered, dopamine neurons exhibit a phasic dip in firing. This set of findings led to the idea that midbrain dopamine neurons encode a reward prediction error (RPE) signal. Fiorillo, Tobler, and Schultz (2003) quantified uncertainty by delivering cue-predicted rewards with discrete probabilities. The dopamine response to predictive cues is greatest when the reward is most likely to be delivered (100%). However, the sustained activity of these neurons, from the time of the cue until the time of potential reward delivery, is greatest when reward is maximally uncertain (50%). This sustained response follows an inverted-U-shaped function, being smallest when the outcome is maximally certain (0 or 100%).
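One compact formalization of this RPE account is the temporal-difference error of reinforcement learning, δ = r + γV(s') − V(s) (Schultz, Dayan, & Montague, 1997). The sketch below is a minimal single-cue reading of that account; the learning rate and trial count are arbitrary choices for illustration, not a model fit from this thesis.

```python
# Minimal temporal-difference sketch of the reward prediction error:
#   delta = r + gamma * V(next state) - V(current state)
# The learning rate and trial count are arbitrary illustration choices.

alpha, gamma = 0.2, 1.0  # learning rate; no within-trial discounting
V_cue = 0.0              # learned value of the reward-predictive cue
REWARD = 1.0

for _ in range(50):
    # Early in training V_cue ~ 0, so the positive error (the burst)
    # occurs at reward delivery; as V_cue approaches 1, the error at
    # reward vanishes and the burst has effectively migrated to the cue.
    delta = REWARD + gamma * 0.0 - V_cue  # terminal state has value 0
    V_cue += alpha * delta

# After learning, omitting the reward yields a negative error: the
# phasic dip in firing described above.
print(V_cue, 0.0 - V_cue)  # ~1.0 and ~-1.0

# Fiorillo et al.'s inverted-U also has a simple statistical analogue:
# the variance of a reward delivered with probability p is p * (1 - p),
# maximal at p = 0.5 and zero at p = 0 or 1, mirroring the sustained
# activity that peaks under maximal uncertainty.
```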
The ability of dopamine to convey a teaching signal is enhanced by contrasting modes of transmission (Grace, Floresco, Goto, & Lodge, 2007). The phasic signal, which carries the reward prediction error, is brief and spatially restricted (Grace, 1991). Conversely, tonic dopamine transmission occurs over a long timescale and is spatially diffuse. Perturbations in tonic and phasic dopamine activity have been proposed to contribute to various psychiatric conditions (Grace, 1991, 1993, 1995, 2000); in fact, the concept of differing modes of dopamine transmission was initially introduced to explain the dopaminergic dysfunction observed in schizophrenia (Grace, 1991).

The distinction between tonic and phasic dopamine is unique to the striatum, since this is the only region with enough of the high-affinity dopamine transporter to rapidly remove dopamine from the extrasynaptic space (Floresco, West, Ash, Moore, & Grace, 2003). D1 and D2 receptors have different affinities for dopamine, allowing them to be differentially activated by phasic and tonic transmission. The D2 receptor has a high affinity for dopamine and is readily activated by the low extrasynaptic concentrations produced by tonic release. In the NAc, presynaptic D2 receptors gate PFC afferents: stimulation of NAc D2 receptors by tonic dopamine inhibits these PFC inputs (Goto & Grace, 2005a; O'Donnell & Grace, 1994; West & Grace, 2002). Input from the mPFC permits animals to sample options and try new strategies in order to adjust optimally to new contingencies (Goto & Grace, 2005b). With its low affinity for dopamine, the D1 receptor is activated only by high dopamine concentrations, as may occur when tonic levels are elevated or, in particular, following phasic bursts. Activation of NAc D1 receptors by phasic dopamine potentiates hippocampal inputs (West & Grace, 2002), and these hippocampal inputs promote perseverance on successful strategies (Goto & Grace, 2005b).
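This affinity argument can be made semi-quantitative with a one-site receptor occupancy curve, occupancy = C / (C + Kd). The sketch below uses round-number affinities and dopamine concentrations chosen purely to illustrate the regime difference (a high-affinity D2 receptor in the tens of nanomolar, a low-affinity D1 receptor in the micromolar range); none of these values comes from this thesis.

```python
# One-site occupancy sketch of why tonic dopamine engages D2 but not D1
# receptors: fraction bound = C / (C + Kd). All Kd and concentration
# values are round-number assumptions for illustration only.

KD_D2 = 25.0    # nM; high-affinity receptor (assumed)
KD_D1 = 1000.0  # nM; low-affinity receptor (assumed)

def occupancy(conc_nm, kd_nm):
    return conc_nm / (conc_nm + kd_nm)

conditions = {"tonic": 20.0, "phasic peak": 1000.0}  # nM, assumed levels

for label, c in conditions.items():
    print(f"{label:>11}: D2 ~{occupancy(c, KD_D2):.0%}, "
          f"D1 ~{occupancy(c, KD_D1):.0%}")

# tonic:       D2 ~44%, D1 ~2%  (tonic dopamine speaks mainly to D2)
# phasic peak: D2 ~98%, D1 ~50% (only phasic transients recruit D1)
```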
Fast-scan cyclic voltammetry is a commonly used technique for measuring subsecond changes in phasic dopamine concentration. Day, Jones, Wightman, and Carelli (2010) measured phasic dopamine in the NAc while animals performed a simple choice task between two rewards of equal magnitude but differing cost. One option delivered a sucrose reward following a single lever press; selection of the other option yielded the same reward magnitude but required increased effort or delay. On free choices, animals preferred the lower-cost option. Phasic dopamine release was greater for forced choices of the lower-cost option than of the higher-cost option, indicating that cue-evoked dopamine encodes relative value. On free-choice trials, cue-evoked dopamine release resembled that observed during forced choices of the preferred option, indicating that dopamine signals the preferred option regardless of the choice ultimately made. Response execution and reward delivery did not influence phasic dopamine release. Gan, Walton, and Phillips (2010) conducted a similar study but held response cost constant between choices that differed in reward magnitude; on forced choices, cue-evoked phasic dopamine increases were greater for the larger reward.

Wanat, Kuhnen, and Phillips (2010) aimed to gain a more detailed understanding of the contributions of effort and delay to phasic dopamine release in the NAc. Animals were trained on either a fixed ratio (FR) task, for which the effort requirement remained constant, or a progressive ratio (PR) task, for which the effort requirement systematically escalated across a session. Phasic dopamine release increased across the progressive ratio session but not during the fixed ratio session. Rather than assume that the increase in dopamine was due to escalating effort, they also tracked dopamine release in a yoked group that had to wait equivalent delays for reward without performing the PR effort requirement. These animals likewise displayed increased phasic dopamine release during the session, indicating that delays, rather than effort, modulate dopamine release to rewards.

Recently, researchers have begun to examine phasic dopamine responses occurring when animals choose between rewards of differing probability and magnitude (Sugam, Day, Wightman, & Carelli, 2012). Using fast-scan cyclic voltammetry in the NAc, Sugam et al. (2012) presented animals with choices between a small, certain and a large, uncertain reward. Phasic dopamine responses to the free-choice cue were identical to those evoked by the cue for a forced choice of the animal's preferred option, while a forced choice of the non-preferred option elicited a smaller burst. The choice ultimately selected had no bearing on the phasic dopamine response to the cue. Thus, as with other types of cost (Day et al., 2010), phasic dopamine tracked the availability of the preferred option under uncertainty, regardless of the eventual choice. In addition, phasic dopamine activity closely tracked the outcomes of choices during risk/reward decision-making: delivery of rewards elicited phasic bursts, which were largest when the reward was large and uncertain, while reward omission during risky losses caused a slight phasic dip. Outcome-related phasic dopamine activity during uncertain choice thus followed an RPE model.

This collection of findings demonstrates that phasic dopamine is influenced by various aspects of delay-, effort-, and risk-based decision-making (Day et al., 2010; Gan et al., 2010; Sugam et al., 2012; Wanat et al., 2010). This signal is important for assigning relative value to options differing in reward magnitude or cost (Day et al., 2010; Gan et al., 2010). However, the correlative nature of voltammetric and electrophysiological studies allows us only to speculate on how phasic dopamine ultimately drives choice behavior. Although the different dopamine receptors are differentially sensitive to tonic and phasic dopamine, the studies completed thus far give little insight into the specific contribution of phasic dopamine to risk-based decision-making (St. Onge et al., 2011; St. Onge & Floresco, 2009); systemic and intra-mPFC administration of dopaminergic drugs is unhelpful in this regard, since the phasic/tonic distinction is unique to the striatum (Floresco et al., 2003). Most electrophysiological and voltammetric studies have focused on phasic bursts, which are known to be caused by excitatory input dependent on NMDA receptor stimulation of VTA neurons (Chergui et al., 1994; Floresco et al., 2003; Grace & Bunney, 1984a; Lodge & Grace, 2006a; Lokwan, Overton, Berry, & Clark, 1999). However, prior to recent discoveries regarding the lateral habenula (LHb) (Hikosaka, Sesack, Lecourtier, & Shepard, 2008), the afferent regulatory mechanism causing phasic dips remained unknown. These phasic dips, which occur in response to punishment and reward omission, are important, yet they have been overlooked in comparison with the phasic bursts that accompany unexpected reward.

1.5 Lateral Habenula Modulation of Midbrain Dopamine Systems

While interest in LHb modulation of VTA dopamine neurons is quite recent, general interest in the afferent modulation of midbrain dopamine activity is not new (Grace et al., 2007). Such studies were inspired by the discovery that dopamine neurons operate in three different firing modes: they can be inactive, display spontaneous patterns of tonic firing, or switch into a burst-firing mode (Grace & Bunney, 1983). Further research revealed that different inputs to the VTA regulate the activity state of these neurons.
Input from the ventral pallidum (VP), an output structure of the basal ganglia, provides inhibition to VTA dopamine neurons, keeping them in their silent state (Floresco et al., 2003).  On average, only ~50% of dopamine neurons are active (Bunney & Grace, 1978; Grace & Bunney, 1984a).  Lifting this inhibition allows these neurons to fire tonically, increasing the total number of active cells (Floresco et al., 2003).  Potentiation of hippocampal inputs allows the NAc to inhibit the VP, resulting in this necessary disinhibition (Lodge & Grace, 2006a).  In their tonic state, dopamine neurons fire irregular action potentials at 2-10 Hz (Grace & Bunney, 1983).  It is only once these cells have been activated tonically that they can transition to phasic bursting.  Stimulation of NMDA receptors allows the switch to burst firing (Chergui et al., 1994; Floresco et al., 2003; Grace & Bunney, 1984b); however, this glutamatergic stimulation must be extrinsic.  Glutamate released from the mPFC or the pedunculopontine tegmental nucleus (PPTg) can provide the necessary stimulation of NMDA receptors in the VTA to transition tonically firing dopamine neurons to a phasic bursting mode (Floresco et al., 2003; Lodge & Grace, 2006a; Lokwan et al., 1999).

Though VP activation can keep VTA dopamine neurons tonically inactive, until recently a brain structure that could phasically inhibit these cells had yet to be identified.  Christoph, Leonzio, and Wilcox (1986) discovered that stimulation of the LHb causes a brief, but complete, cessation of dopamine neuron firing.  The behavioral relevance of this phenomenon was given credence when electrophysiological recordings in monkeys determined that LHb neurons are phasically excited by cues predictive of reward omission and inhibited by reward-predictive cues (Matsumoto & Hikosaka, 2007).  Not only do these neurons display the opposite profile of dopamine neurons, but their responses precede those of dopamine neurons, supporting the notion that the LHb sends an inhibitory signal to the VTA.  LHb neurons are nearly all glutamatergic (Kiss, Csáki, Bokor, K. Kocsis, & B. Kocsis, 2002), leading to the belief that they synapse onto GABAergic interneurons in the VTA that inhibit nearby dopamine neurons (Hikosaka et al., 2008).  Ji and Shepard (2007) discovered that the ability of LHb stimulation to inhibit dopamine neuron firing depends on a GABA-mediated mechanism.

Recent neuroanatomical studies revealed that the tail of the VTA (tVTA) has properties distinct from the VTA proper (Bourdy & Barrot, 2012; Jhou, Geisler, Marinelli, Degarmo, & Zahm, 2009).  For example, cFos in this tail region, now referred to as the rostromedial tegmental nucleus (RMTg), was increased in response to aversive stimuli (Jhou, Fields, Baxter, Saper, & Holland, 2009).  The RMTg, immediately posterodorsal to the VTA, is composed almost entirely of GABAergic neurons (Perrotti et al., 2005); these neurons receive input from LHb neurons and project to VTA dopamine neurons (Jhou, Geisler, et al., 2009; Kaufling, Veinante, Pawlowski, Freund-Mercier, & Barrot, 2009).

The LHb appears to have heterogeneous effects on downstream targets of dopamine.  Lecourtier, Defrancesco, and Moghaddam (2008) used microdialysis to measure dopamine concentrations in efferent target regions following excitation or inhibition of the LHb or VTA.  Electrical stimulation of the VTA increased dopamine concentration in the PFC and NAc, but the increase in the PFC was greater, with a more rapid onset.
Conversely, electrical or chemical stimulation of the LHb decreased dopamine concentration only in the NAc, with no influence on dopamine concentration in the PFC.  Chemical downregulation of LHb activity resulted in a large and sustained increase in NAc dopamine.  This LHb downregulation increased PFC dopamine for only a single sample before it dropped significantly below baseline levels.  These results suggest that tonic manipulation of the LHb exerts a greater effect on tonic dopamine levels in the NAc than in the PFC.

Lammel et al. (2012) used a newer approach, optogenetics, to examine these projections as well.  By injecting retrograde or anterograde viruses into various brain regions, they could examine which afferent and efferent projections overlapped in which parts of the VTA.  The two inputs they chose were the laterodorsal tegmental nucleus (LDTg), necessary for gating burst firing, and the LHb, with its ability to inhibit these neurons.  As in the study by Lecourtier et al. (2008), the efferents examined were the mPFC and NAc.  The lateral VTA, which projected to the NAc, received most of its afferent input from the LDTg.  Conversely, the medial VTA, which projected mainly to the mPFC, received most of its afferent input from the LHb.  Another subset of LHb cells selectively projected to GABAergic neurons in the RMTg.  Optogenetic stimulation of only those LHb neurons that synapse onto mPFC-projecting VTA cells produced excitatory post-synaptic currents (EPSCs) in all mPFC-projecting VTA neurons.  While VTA cells receiving LHb input are less likely to project to the NAc, stimulation of these LHb inputs produced inhibitory post-synaptic currents in the NAc ~60% of the time, presumably via feed-forward inhibition.

The results of Lecourtier et al. (2008), showing that LHb stimulation decreases dopamine concentration only in the NAc and that inhibition of the LHb eventually results in decreased dopamine concentration in the PFC, are consistent with the findings of Lammel et al. (2012).  That the former study found a stronger effect on dopamine in the NAc may indicate that LHb stimulation modulates tonic dopamine release through its limited connections to the NAc.  Conversely, the more prevalent mPFC projections may rely on higher dopamine concentrations.  Lammel et al. (2012) demonstrated that blockade of mPFC D1 receptors, known to be activated by high dopamine concentrations, blocked EPSCs caused by LHb stimulation.

Interest in the LHb has developed mainly out of a desire to understand the source of phasic dips in the firing of dopamine neurons.  The result has been a focus on the phasic increases in LHb neurons that accompany reward omission or aversive outcomes (Matsumoto & Hikosaka, 2007, 2009).  Recent experiments have emphasized only this aspect of LHb neuron firing to propose that the LHb functions as an aversion or "anti-reward" center (Stamatakis & Stuber, 2012).  However, it is important to note that LHb neurons also display phasic dips in response to reward (Bromberg-Martin & Hikosaka, 2011; Matsumoto & Hikosaka, 2007, 2009).  When investigating how the LHb influences risky choice, one should recognize that the LHb integrates both positive and negative aspects of reward uncertainty.  In general, a comprehensive understanding of how phasic dopamine regulates risky decision-making must account for the nuanced and complex ways this signal is modulated.

1.6 Summary and Objectives

This thesis aims to understand important subcortical contributions to risk-based decision-making.
Much of the research on the involvement of dopamine in risky choice has focused on the midbrain dopamine neurons themselves, in the ventral tegmental area.  By examining the roles of the nucleus accumbens and lateral habenula, I provide a greater context for the actions of dopamine, showing how it is utilized downstream and guided by upstream influences to inform risk-based decision-making.  Whereas much of the foundation for these studies comes from correlative work linking this circuit to reward uncertainty, the current methodologies (local drug microinfusion and electrical stimulation) actively manipulate this circuitry to determine how choice behavior is influenced.  Risk-based decision-making, a higher-order cognitive construct, is largely believed to be driven by cortical influence, while the nucleus accumbens and lateral habenula are thought to be responsible for simpler learning about reward and aversion, respectively.  Here, we demonstrate their important and complex contributions to risk-based decision-making.  The following data chapters address the specific aims of this investigation:

1. Examine the receptor-selective role of NAc dopamine in risk-based decision-making.  These experiments will use locally infused receptor-selective dopamine agonists and antagonists to determine precisely how dopamine is utilized to modulate probabilistic choice.  These results will inform the relative contributions of tonic and phasic dopamine in directing corticolimbic inputs to the NAc during risk-based decision-making.

2. Determine the contribution of the lateral habenula to probabilistic choice.  Temporary inactivation of the lateral habenula during the probabilistic discounting task and various control experiments will elucidate how the LHb influences decisions involving subjective value judgments.

3. Identify the effects of temporally specific stimulation of the LHb-RMTg-VTA circuit on choice preference.  Brief, precisely timed electrical stimulation of these brain regions at different points in the probabilistic discounting task will generate an understanding of how overriding phasic dopamine bursts and dips can bias risky choice.  Moreover, these experiments will characterize the degree to which phasic dopamine and reward prediction error signaling ultimately influence behavior.

Chapter 2: Receptor-Specific Modulation of Risk-Based Decision-Making by Nucleus Accumbens Dopamine

2.1 Introduction

Impairments in cost–benefit decision-making requiring evaluations of potential risks and rewards have been associated with an array of psychiatric disorders characterized by dysfunction of mesocorticolimbic dopamine circuitry.  As such, there has been growing interest in clarifying the relationship of the dopamine system to risk-based decision-making in both healthy individuals and clinical populations.  Decreased levels of dopamine, in combination with increased levels of its metabolites, have been observed in the cerebrospinal fluid of pathological gamblers, indicative of increased dopamine transmission (Bergh, Eklund, Södersten, & Nordin, 1997).  In addition, dopaminergic drugs administered to humans alter risk-based decision-making.  Acute treatment with amphetamine enhances gambling urges in pathological gamblers (Zack & Poulos, 2004), and treatment with dopamine agonists has been reported to produce pathological gambling tendencies in patients with Parkinson's disease and restless legs syndrome (Dang, Cunnington, & Swieca, 2011; Gallagher et al., 2007; Quickfall & Suchowersky, 2007).
Studies using animal models of decision-making have further advanced our understanding of how dopamine and its different receptor subtypes modulate risk-based decision-making.  For example, administration of amphetamine or of D1 or D2 receptor agonists increased risky choice in rats performing a probabilistic discounting task.  Conversely, D1 or D2 antagonists reduced risky choice and blocked the effects of amphetamine (St. Onge & Floresco, 2009).  In contrast, D3 receptor stimulation reduced preference for larger, risky rewards, whereas blockade of these receptors alone was without effect, as were manipulations of D4 receptors.

Recent work in our laboratory has begun to investigate the specific terminal regions through which dopamine may exert its effects on decision-making.  In a recent study, St. Onge et al. (2011) reported differential effects of D1 and D2 receptor manipulations in the medial prefrontal cortex on probabilistic choice.  D1 receptor blockade induced risk aversion by increasing sensitivity to negative temporal difference error (TDE) feedback, whereas D2 blockade increased risky choice.

The nucleus accumbens (NAc) is another critical efferent target of midbrain dopamine neurons that has been implicated in reward and reinforcement learning processes.  Neuroimaging data indicate that, in the absence of choice, the NAc is preferentially activated by cues predicting financial gains compared with losses (Knutson, Adams, et al., 2001; Knutson, Fong, Adams, Varner, & Hommer, 2001).  On risk-taking tasks, NAc activation precedes risky choices (Kuhnen & Knutson, 2005; Matthews et al., 2004) and accompanies anticipation of larger or more preferred rewards (Knutson et al., 2007), independent of cost.  Cools, Lewis, Clark, Barker, and Robbins (2007) examined NAc activation during probabilistic learning in Parkinson's patients on or off L-DOPA medication.  They reported that the NAc was activated during reversal learning in patients off medication, but L-DOPA treatment attenuated this effect.  Animal studies complement these findings and have further clarified the specific contribution of the NAc to these types of processes.  Lesions or inactivation of the NAc in rats disrupt decision-making, reducing reward sensitivity and preference for riskier options, particularly when these options have greater long-term utility (Cardinal & Howes, 2005; Stopper & Floresco, 2011).

Although a broader understanding of the functional role of the NAc in decision-making is emerging, it is unclear how mesoaccumbens dopamine may modulate these functions, and which specific receptor mechanisms underlie these actions.  NAc D1 and D2 receptors have been shown to contribute differentially to other executive functions regulated by the prefrontal cortex.  NAc D1 receptors facilitate complex strategy shifting but not simple reversal learning, whereas blockade of D2 receptors increases response times without affecting performance on these tasks.  In contrast, stimulation of NAc D2 receptors induces more global deficits in behavioral flexibility (Haluk & Floresco, 2009).  Blockade of D1 or D2 receptors in the NAc has also been reported to impair attentional accuracy (Pezze, Dalley, & Robbins, 2007).  In addition, we have recently shown that dynamic fluctuations in NAc dopamine release during decision-making appear to encode integrated signals about reward rates, uncertainty, and choice, reflecting the implementation of decision policies (St. Onge, Ahn, Phillips, & Floresco, 2012).
Yet, the manner in which the activity of different dopamine receptors in the NAc may modify cost–benefit assessments about potential risks and rewards remains to be addressed experimentally.  Accordingly, this study was conducted to explore how dopamine receptors in the NAc modulate risk-based decision-making, assessed with a probabilistic discounting procedure.  In doing so, we locally administered different dopamine receptor-selective agonists and antagonists into the NAc, selecting compounds that have been shown to alter this aspect of decision-making when administered systemically (St. Onge & Floresco, 2009).

2.2 Methods

2.2.1 Animals

Male Long Evans rats (Charles River Laboratories, Montreal, Canada) weighing 250–300 g at the beginning of training were used.  On arrival, rats were given 1 week to acclimatize to the colony.  They were then food restricted to 85–90% of their free-feeding weight for 1 week before behavioral training and given ad libitum access to water for the duration of the experiment.  Feeding occurred in the rats' home cages at the end of the experimental day, and body weights were monitored daily.  All testing was conducted in accordance with the Canadian Council on Animal Care and approved by the Animal Care Committee of the University of British Columbia.

2.2.2 Apparatus

Behavioral testing was conducted in twenty operant chambers (30.5 × 24 × 21 cm; Med Associates, St Albans, VT, USA) enclosed in sound-attenuating boxes.  The boxes were equipped with a fan that provided ventilation and masked extraneous noise.  Each chamber was fitted with two retractable levers, one located on each side of a central food receptacle where food reinforcement (45 mg; Bioserv, Frenchtown, NJ, USA) was delivered by a pellet dispenser.  The chambers were illuminated by a single 100-mA house light located in the top center of the wall opposite the levers.  Four infrared photobeams were mounted on the sides of each chamber, and another photobeam was located in the food receptacle.  Locomotor activity was indexed by the number of photobeam breaks that occurred during a session.  All experimental data were recorded by personal computers connected to the chambers through an interface.

2.2.3 Lever Pressing Training/Side Bias Testing

Before training on the full task, rats received 5–7 days of lever press training, in a manner identical to that used by St. Onge and Floresco (2009) (as adapted from Cardinal et al., 2000).  Briefly, rats were initially trained to press each of the two levers on a fixed ratio (FR)-1 schedule, and then received retractable lever training (90 trials per session), requiring them to press one of the two levers within 10 s of its insertion for reinforcement delivered with a 50% probability.  This procedure familiarized them with the association of lever pressing with food reward delivery, as well as with the probabilistic nature of the subsequent discounting task.

Immediately after the last day of retractable lever training, rats that were to be trained on the discounting task were tested for their side bias, using procedures we have described elsewhere (Floresco, Tse, et al., 2008; Haluk & Floresco, 2009).  This procedure was instituted because pilot studies in our laboratory revealed that accounting for rats' innate side bias when designating the lever associated with the larger reward considerably reduced the number of training sessions required to observe prominent discounting by groups of rats.  This session resembled pretraining, except that both levers were inserted into the chamber simultaneously.  On the first trial, a food pellet was delivered after responding on either lever.  Upon subsequent lever insertion, food was delivered only if the rat responded on the lever opposite to the one chosen initially.  If the rat chose the same lever as the initial choice, no food was delivered, and the house light was extinguished.  This continued until the rat chose the lever opposite to the one chosen initially.  After choosing both levers, a new trial commenced.  Thus, a single trial of the side bias procedure consisted of at least one response on each lever.  Rats received 7 such trials, and typically required 13–15 responses to complete side bias testing.  The lever (right or left) that a rat responded on first during the initial choice of a trial was recorded and counted toward its side bias.  If the total numbers of responses on the left and right levers were comparable, the lever that a rat chose initially on four or more of the seven trials was considered its side bias.  However, if a rat made a disproportionate number of responses on one lever over the entire session (ie, >2 : 1 ratio for the total number of presses), that lever was considered its side bias.  On the following day, rats commenced training on the decision-making task.
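The side-bias rule can be stated compactly in code (a schematic sketch written for this chapter; the function and variable names are illustrative and do not come from the actual task software):

    def side_bias(first_choices, total_left, total_right):
        """Classify a rat's side bias from the side-bias session.

        first_choices -- list of 'L'/'R' initial choices, one per trial (7 trials)
        total_left, total_right -- total presses on each lever over the session
        """
        # A disproportionate number of total presses (>2:1) defines the bias outright.
        if total_left > 2 * total_right:
            return 'L'
        if total_right > 2 * total_left:
            return 'R'
        # Otherwise, the lever chosen first on 4 or more of the 7 trials is the bias.
        return 'L' if first_choices.count('L') >= 4 else 'R'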
This session resembled pretraining, except that both levers were inserted into the chamber simultaneously.  On the first trial, a food pellet was delivered after responding on either lever.  Upon subsequent lever insertion, food was delivered only if the rat responded on the lever opposite to the one chosen initially.  If the rat chose the same lever as the initial choice, no food was delivered, and the house light was extinguished.  This continued until the rat chose the lever opposite to the one chosen initially.  After choosing both levers, a new trial commenced.  Thus, a single trial of the side bias procedure consisted of at least one response on each lever.  Rats received 7 such trials, and typically required 13–15 responses to complete side bias testing.  The lever (right or left) that a rat responded on first during the initial choice of a trial was recorded and counted toward its side bias.  If the total number of responses on the left and right lever were comparable, the lever that a rat chose initially four or more times over seven total trials was considered its side bias.  However, if a rat made a disproportionate number of responses on one lever over the entire session (ie, >2 : 1 ratio for the total number of presses), that lever was considered its side bias.  On the following day, rats commenced training on the decision-making task. 28  2.2.4 Decision-Making Tasks 2.2.4.1 Probabilistic Discounting The primary task used in these studies was the probabilistic discounting procedure that has also been described previously (St. Onge & Floresco, 2009), which was originally modified from that described by Cardinal and Howes (2005) (Figure 1).  Rats received daily sessions consisting of 72 trials, separated into four blocks of 18 trials.  The entire session took 48 min to complete, and the animals were trained 5–7 days per week.  A session began in darkness with both levers retracted (the intertrial state).  A trial began every 40 s with the illumination of the house light and the insertion of one or both levers into the chamber.  One lever was designated the large/risky lever, the other the small/certain lever, which remained consistent throughout training.  For each rat, the large/risky lever was set to be opposite of its side bias.  If the rat did not respond within 10 s of lever presentation, the chamber was reset to the intertrial state until the next trial (omission).  When a lever was chosen, both levers retracted.  Choice of the small/certain lever always delivered one pellet with 100% probability; choice of the large/risky lever delivered four pellets but with a particular probability.  After a response was made and food delivered, the house light remained on for another 4 s, after which the chamber reverted back to the intertrial state until the next trial.  Multiple pellets were delivered 0.5 s apart.  The four blocks consisted of eight forced-choice trials where only one lever was presented (four trials for each lever, randomized in pairs) permitting animals to learn the amount of food associated with each lever press and the respective probability of receiving reinforcement over each block.  This was followed by 10 free-choice trials, where both levers were presented and the animal had to decide whether to choose the small/certain or the large/risky lever.  The probability of obtaining four pellets after pressing the large/risky lever was varied systematically across the 29   Figure 1. Probabilistic discounting task design. 
Rats were trained on the task until, as a group, they (1) chose the large/risky lever during the first trial block (100% probability) on at least 80% of successive trials, (2) chose the large/risky lever during the final trial block (12.5% probability) on fewer than 60% of successive trials, and (3) demonstrated stable baseline levels of choice.  Infusions were administered after a group of rats displayed stable patterns of choice for 3 consecutive days, assessed using a procedure similar to that described by St. Onge and Floresco (2010).  In brief, data from three consecutive sessions were analyzed with a repeated-measures ANOVA with two within-subjects factors (day and trial block).  If the effect of block was significant at the p<0.05 level but there was no main effect of day or day × block interaction (at the p>0.1 level), animals were judged to have achieved stable baseline levels of choice behavior.
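This stability check is straightforward to express in code (a sketch under stated assumptions: choice data arranged in long format with one row per rat, day, and block; AnovaRM is the repeated-measures ANOVA from the statsmodels package, and the column names are illustrative):

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    def is_stable(df: pd.DataFrame) -> bool:
        """df columns: 'rat', 'day' (1-3), 'block' (1-4),
        'pct_risky' (percent choice of the large/risky lever)."""
        table = AnovaRM(df, depvar='pct_risky', subject='rat',
                        within=['day', 'block']).fit().anova_table
        p = table['Pr > F']
        # Stable: a significant effect of block, but no effect of day
        # and no day x block interaction.
        return (p.loc['block'] < 0.05
                and p.loc['day'] > 0.1
                and p.loc['day:block'] > 0.1)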
2.2.4.2 Reward Magnitude Discrimination

As we have done in other studies (Ghods-Sharifi et al., 2009; St. Onge, Ahn, et al., 2012), we determined a priori that if any drug treatment reduced preference for the large/risky option, we would assess how the most effective dose of that compound altered reward magnitude discrimination.  This was done to confirm whether or not the reduced preference for the risky option was due to a general reduction in preference for larger rewards.  Separate groups of animals were trained and tested on an abbreviated task consisting of 48 trials divided into 4 blocks, each composed of 2 forced- and 10 free-choice trials.  As with the discounting task, choices were between a large (four pellets) option and a small (one pellet) option.  However, the probability of reinforcement for both options was held constant at 100% across blocks.

2.2.5 Surgery

Rats were trained on the task until they displayed stable levels of choice, after which they were provided food ad libitum for 1–3 days and were then subjected to surgery.  Rats were anesthetized with 100 mg/kg ketamine hydrochloride and 7 mg/kg xylazine and implanted with bilateral 23 gauge stainless steel guide cannulae aimed at the NAc.  Rats received implants aimed at the central portion of the NAc, along the core/shell border, to target both subregions (flat skull: anteroposterior (AP) = +1.5 mm from bregma; mediolateral (ML) = ±1.4 mm; dorsoventral (DV) = −5.9 mm from dura).  Our previous work has shown that microinfusions aimed at the core/shell border produce the combined behavioral effects observed following inactivation of either subregion individually, with inactivation of the NAc shell specifically reducing risky choice (Stopper & Floresco, 2011).  Guide cannulae were held in place with stainless steel screws and dental acrylic.  Thirty gauge obdurators flush with the end of the guide cannulae remained in place until the infusions were made.  Rats were given at least 7 days to recover from surgery before testing.  During this period, they were handled at least 5 min each day and were food restricted to 85% of their free-feeding weight.

2.2.6 Drugs and Microinfusion Protocol

Following recovery from surgery, rats were retrained on the task for at least 5 days, until the group displayed stable levels of choice behavior for 3 consecutive days.  Two to three days before their first microinfusion test day, obdurators were removed and a mock infusion procedure was conducted: stainless steel injectors were placed in the guide cannulae for 2 min, but no infusion was administered.  The day after displaying stable discounting, the group received its first microinfusion test day.

A within-subjects design was used for all experiments.  Drugs or vehicle were infused at a volume of 0.5 μl per hemisphere.  This volume has been used in numerous studies that have assessed the effects of infusing dopamine agonists or antagonists into the NAc on a variety of cognitive functions and reward-related behaviors (Bari & Pierce, 2005; Besson et al., 2010; Haluk & Floresco, 2009; Nowend et al., 2001; Pattij, Janssen, Vanderschuren, Schoffelmeer, & van Gaalen, 2007; Pezze et al., 2007).  Furthermore, 0.5 μl infusions of D1, D2, or D3 antagonists at doses similar to the ones used in this study have been reported to induce dissociable effects on behavior when infused into the shell vs. core region of the NAc, which are separated by ~1.5 mm (Bari & Pierce, 2005; Besson et al., 2010; Pattij et al., 2007).  As our infusions were targeted at the central NAc, it is likely that the effects reported here are due primarily to actions on dopamine receptors residing within the NAc.

The following dopaminergic agents were selected because they have been shown either to interfere with decision-making when administered systemically (St. Onge & Floresco, 2009) or to disrupt other executive functions when infused into the NAc (ie, quinpirole; Haluk & Floresco, 2009).  The following dopamine antagonists and doses (per hemisphere) were used: the D1 antagonist R-(+)-SCH 23390 hydrochloride (0.1 and 1 μg; Sigma-Aldrich) and the D2 antagonist eticlopride hydrochloride (0.1 and 1 μg; Sigma-Aldrich).  The D1 agonist used was SKF 81297 (0.2 and 2 μg; Tocris Bioscience).  To stimulate the D2-like family of receptors, our initial studies used quinpirole (1 and 10 μg; Sigma-Aldrich), which stimulates both D2 and D3 receptors.  Functional assays have shown that quinpirole is approximately three times more selective for D3 vs. D2 receptors (Sautel et al., 1995).  Subsequent experiments used agonists that display more preferential affinities for a specific receptor subtype.  Bromocriptine (1 and 10 μg; Sigma-Aldrich) was used as a D2-preferring agonist, as it is 10 times as active at the D2 receptor compared with the D3 and D4 receptors (Sautel et al., 1995).  PD 128907 (1.5 and 3 μg; Tocris Bioscience) was used as a D3-preferring agonist.  In comparison with quinpirole, PD 128907 has a substantially higher selectivity for D3 relative to D2 receptors (>50 times; Bristow et al., 1996; Sautel et al., 1995).  We did not test the effects of a D3 antagonist because previous work has shown that systemic blockade of these receptors alone does not reliably affect decision-making (St. Onge & Floresco, 2009).  All drugs were dissolved in physiological saline with sonication and protected from light, with the exception of bromocriptine, which was first dissolved in dimethyl sulfoxide (DMSO) and then diluted with saline in a 50 : 50 ratio; this DMSO/saline solution was also used as the vehicle treatment for the bromocriptine experiment.
Infusions were administered bilaterally via 30 gauge injection cannulae that protruded 0.8 mm past the end of the guide cannulae, at a rate of 0.4 μl/min by a microsyringe pump.  Injection cannulae were left in place for an additional 1 min to allow for diffusion.  Each rat remained in its home cage for an additional 10 min (or 20 min for bromocriptine infusions) before behavioral testing.

On separate test days, rats trained on the discounting task received counterbalanced infusions of vehicle and two doses of one drug.  For rats that received saline infusions on their first test, efforts were made to match control performance across the different drug groups.  Test days were separated by a baseline training day on which no infusion was administered.  If, for any individual rat, choice of the large/risky lever deviated by >15% from its preinfusion baseline during this first baseline day, it received an additional day of training before the next infusion test.  Only three infusions were administered, to minimize the mechanical damage that can occur with repeated infusions.  As such, doses for each compound were carefully selected from previous studies that have shown them to be effective at altering behavior when infused into the NAc.  Whenever possible, these doses were taken from studies that focused on the effects of these drugs on prefrontal cortex-mediated cognitive functions or reward-related behavior.  For example, intra-NAc infusions of SCH 23390 (1 μg) or quinpirole (1 or 10 μg) impair set shifting (Haluk & Floresco, 2009).  Infusions of 0.1 μg eticlopride disrupt social partner preference (Gingrich, Liu, Cascio, Wang, & Insel, 2000), whereas a 1 μg dose increased response latencies and trial omissions and blocked amphetamine-induced increases in premature responses on a five-choice serial reaction time task (Pattij et al., 2007).  A 0.2 μg dose of SKF 81297 caused slight, nonsignificant improvements in strategy shifting (Haluk & Floresco, 2009), whereas infusions of 1–3 μg promoted reinstatement of cocaine seeking (Bachtell, Whisler, Karanian, & Self, 2005; Schmidt, Anderson, & Pierce, 2006).  There have been no studies assessing the effects of intra-NAc infusions of PD 128907 on cognition; however, infusions of 1.5–3 μg of this drug into the NAc reduce spontaneous and dopamine-induced locomotor activity (Ouagazzal & Creese, 2000).  In addition, infusions of 10 μg bromocriptine into the NAc have been reported to potentiate dopamine-induced locomotor activity (Jenkins & Jackson, 1986).  Most of the above-mentioned studies have shown these drugs to be behaviorally active for 30–60 min, which is within the time frame of the behavioral tests used here.

2.2.7 Histology

After completion of testing, rats were euthanized in a carbon dioxide chamber.  Brains were removed and fixed in a 4% formalin solution.  The brains were frozen and sliced into 50 μm sections before being mounted and stained with Cresyl Violet.  Placements were verified with reference to the neuroanatomical atlas of Paxinos and Watson (2005) (see Figure 2).  All of the placements resided within the main boundaries of the NAc, clustering around the border of the core and shell subregions.  None of the placements encroached on the ventral portion of the islands of Calleja; this is particularly relevant for the studies with the D3 agonist, as labeling for these receptors is considerably higher in this region than in the NAc (Bouthenet et al., 1991; Levant, 1998).

Figure 2. Schematic sections of the rat brain showing the locations of acceptable infusions in the NAc.  Numbers correspond to mm from bregma.
2.2.8 Data Analysis

The primary dependent measure of interest was the proportion of choices directed toward the large reward lever for each block of free-choice trials, factoring out trial omissions.  For each block, this was calculated by dividing the number of choices of the large reward lever by the total number of successful trials (ie, those where the rat made a choice).  Choice data were analyzed using two-way within-subjects ANOVAs, with treatment and trial block as the two within-subjects factors.  The effect of trial block was always significant (p<0.001) for the probabilistic discounting task and will not be reported further.  Response latencies, locomotor activity (ie, photobeam breaks), and the number of trial omissions were analyzed with one-way repeated-measures ANOVAs.

2.2.8.1 Win-Stay/Lose-Shift Analyses

Whenever we observed a significant main effect of a drug treatment on probabilistic discounting, we conducted a supplementary analysis to further clarify whether changes in choice biases were due to alterations in sensitivity to reward (win-stay performance) or to negative TDE feedback (lose-shift performance) (Bari, Eagle, Mar, Robinson, & Robbins, 2009; St. Onge et al., 2011; St. Onge, Ahn, et al., 2012; Stopper & Floresco, 2011).  Animals' choices during the task were analyzed according to the outcome of each preceding trial (reward or non-reward) and expressed as a ratio.  The proportion of win-stay trials was calculated from the number of times a rat chose the large/risky lever after choosing the risky option on the preceding trial and obtaining the large reward (a win), divided by the total number of free-choice trials on which the rat obtained the larger reward.  Conversely, lose-shift performance was calculated from the number of times a rat shifted choice to the small/certain lever after choosing the risky option on the preceding trial and not being rewarded (a loss), divided by the total number of free-choice trials resulting in a loss.  This analysis was conducted for all trials across the four blocks.  We could not conduct a block-by-block analysis of these data because there were many instances where rats either did not select the large/risky lever or did not obtain the large reward at all during the latter blocks.  Changes in win-stay performance were used as an index of reward sensitivity, whereas changes in lose-shift performance served as an index of negative TDE feedback sensitivity.

The win-stay/lose-shift supplementary analyses were conducted to obtain more detailed information about the specific processes affected by dopamine receptor manipulations that may have caused an overall change in choice bias.  For example, reduced preference for the large/risky option induced by a particular dose of a drug may have been associated with either a reduced tendency to select the risky option after obtaining the large reward on the previous trial (ie, reduced win-stay behavior), or an increased tendency to select the certain option after selecting risky on the preceding trial and not receiving a reward (ie, increased lose-shift behavior).  Thus, an overall decrease in risky choice could only arise from a unidirectional change in one of these measures (eg, an overall decrease in risky choice could not be caused by decreased lose-shift behavior).
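These ratio definitions translate directly into code (a schematic sketch for this chapter; the trial records and field names are illustrative, not taken from the actual analysis scripts):

    def ws_ls_ratios(trials):
        """Compute win-stay and lose-shift ratios from ordered free-choice trials.

        trials -- list of dicts with keys 'choice' ('risky' or 'certain')
                  and 'rewarded' (True if the large reward was delivered)
        """
        wins = stays = losses = shifts = 0
        for prev, cur in zip(trials, trials[1:]):
            if prev['choice'] != 'risky':
                continue  # only trials following a risky choice are scored
            if prev['rewarded']:                    # a risky 'win'
                wins += 1
                stays += int(cur['choice'] == 'risky')
            else:                                   # a risky 'loss'
                losses += 1
                shifts += int(cur['choice'] == 'certain')
        win_stay = stays / wins if wins else float('nan')
        lose_shift = shifts / losses if losses else float('nan')
        return win_stay, lose_shift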
Furthermore, we were only interested in conducting these analyses for treatments that caused an overall change in risky choice in the primary analysis.  Thus, when these analyses were conducted, we compared win-stay and lose-shift ratios observed after vehicle treatment with those observed following treatment with a specific drug dose that caused an overall change in choice preference.  To this end, we used separate one-tailed dependent variable t-tests when only one dose caused an overall change in choice behavior, or one-way ANOVAs when both doses were effective at altering choice.  The raw values from which win-stay/lose-shift ratios were calculated are presented in Table 1.   2.3 Results 2.3.1 Blockade of NAc D1 and D2 Receptors 2.3.1.1 D1 Receptor Blockade Rats in this group were trained on the probabilistic discounting task for an average of 21 days before being implanted with guide cannulae in the NAc, retrained on the task, and receiving counterbalanced microinfusions.  A total of 13 rats with acceptable placements were included in the data analysis.  Analysis of the choice data revealed a significant main effect of treatment (F(2, 24) = 4.37, p<0.05) but no treatment × block interaction (F(6, 72) = 0.33, n.s.).  Multiple comparisons parsing out the main effect of treatment confirmed that across all blocks, the high dose of SCH 23 390 (1 μg) significantly decreased preference for the large/risky lever compared with both saline (Tukey’s test, p<0.05) and the low dose (0.1 μg; p<0.05), whereas the low dose produced no reliable change in choice behavior (Figure 3a).  D1 blockade significantly increased response latencies (F(2, 24) = 3.16, p=0.05; Table 2), and decreased locomotor counts (F(2, 24)    39  Table 1. Mean (±SEM) number of risky choices for each probability block separated by ‘wins’ and ‘losses’. ‘Stays’ are choices on the large/risky option immediately subsequent to a rewarded risky choice; ‘shifts’ are those trials for which the small/certain option was selected following an unrewarded risky choice.   Wins Stays Losses Shifts SCH 23 390     1 µg     100% 7.9 (0.5) 7.1 (0.7)   50% 2.6 (0.5) 1.8 (0.5) 2.9 (0.5) 0.9 (0.2) 25% 0.9 (0.3) 0.9 (0.3) 3.7 (0.7) 1.2 (0.2) 12.5% 0.4 (0.2) 0.2 (0.1) 2.9 (0.8) 1.2 (0.3) Saline     100% 8.3 (0.3) 7.5 (0.6)   50% 3.7 (0.6) 3.2 (0.6) 3.3 (0.5) 0.5 (0.3) 25% 1.2 (0.3) 0.8 (0.3) 4.8 (0.6) 1.3 (0.2) 12.5% 0.5 (0.1) 0.2 (0.1) 4.2 (0.7) 1.7 (0.4)      PD 128 907     1.5 µg     100% 6.8 (0.5) 5.9 (0.7)   50% 2.3 (0.4) 1.9 (0.5) 3.3 (0.8) 0.5 (0.3) 25% 1.6 (0.2) 1.2 (0.3) 3.4 (0.4) 1.3 (0.2) 12.5% 0.5 (0.2) 0.2 (0.1) 3.9 (0.7) 1.6 (0.3) Saline     100% 8.3 (0.3) 7.8 (0.5)   50% 3.9 (0.4) 3.4 (0.5) 3.4 (0.3) 0.8 (0.3) 25% 1.0 (0.2) 0.6 (0.2) 5.1 (0.5) 1.7 (0.3) 12.5% 0.4 (0.2) 0.3 (0.2) 4.7 (0.6) 2.0 (0.4)    40   Figure 3. Blockade of D1, but not D2, receptors in NAc reduces risky choice. a. Percentage choice of the large/risky option following two doses of SCH 23 390 or saline into the NAc across four blocks of free-choice trials.  Choice data are plotted as a function of probability block.  Symbols represent mean +SEM.   p<0.05 of average of choice across blocks for the 1.0µg dose condition vs. saline.  b. Win-stay/lose-shift ratios observed after treatment with the 1 µg dose of SCH 23 390 and vehicle saline treatments.  Win-stay values are displayed as the proportion of choices on the large/risky lever following a rewarded risky choice on the preceding trial.  
Lose-shift values are displayed as the proportion of choices on the small/certain lever following unrewarded risky choice on the preceding trial.  SCH 23 390 selectively augmented loss sensitivity, increasing the tendency to select the small/certain option after a non-rewarded risky choice.  c. Choice data for animals receiving infusions of two doses of the D2 antagonist eticlopride or saline.  41  Table 2. Mean (±SEM) locomotor activity, response latencies, and omissions during probabilistic discounting or reward magnitude discrimination. *p<0.05 vs. saline.   Locomotor activity (beam breaks/min) Response latency (s) Omissions (no. of trials per session) Probabilistic discounting    Antagonists    SCH 23 390    1 µg 32.9 (5.2)* 1.2 (0.1)* 3.2 (1.6) 0.1 µg 37.4 (4.9) 1.0 (0.2) 2.1 (1.2) Saline  42.5 (5.0) 0.9 (0.1) 0.5 (0.3) Eticlopride    1 µg 24.6 (2.2)* 0.7 (0.1) 1.1 (0.9) 0.1 µg 30.8 (2.6) 0.8 (0.1) 0.1 (0.1) Saline  29.4 (2.9) 0.6 (0.1) 1.0 (0.5) Agonists    SKF 81 297    2 µg 36.8 (3.2) 0.6 (0.1) 0.7 (0.3) 0.2 µg 36.1 (3.3) 0.7 (0.1) 0.3 (0.2) Saline  34.5 (3.9) 0.6 (0.1) 0.5 (0.3) Quinpirole    10 µg 43.1 (5.7 ) 0.6 (0.0) 0.7 (0.4) 1 µg 36.1 (3.7) 0.6 (0.1) 0.2 (0.1) Saline  37.7 (5.2) 0.6 (0.1) 0.8 (0.4) Bromocriptine    10 µg 36.9 (3.9) 0.9 (0.6) 1.5 (0.8) 1 µg 38.0 (3.1) 0.9 (0.3) 2.8 (1.4) Vehicle  35.3 (4.1) 0.9 (0.2) 1.5 (0.9) PD 128 907    3 µg 35.2 (4.5) 0.8 (0.1) 0.8 (0.4) 1.5 µg 28.9 (2.9) 0.9 (0.1) 0.6 (0.3) Saline  35.6 (4.9) 0.8 (0.1) 0.3 (0.1) Reward Magnitude    SCH 23 390 (1µg) 27.0 (3.9) 0.9 (0.1) 2.3 (2.3) Saline  33.6 (4.1) 0.8 (0.1) 0.1 (0.1) PD 128 907 (1.5µg) 47.5 (4.7) 0.9 (0.1) 0.2 (0.2) Saline 44.9 (2.8) 0.8 (0.1) 0.2 (0.2)     42  = 8.35, p<0.005; Table 2).  The high dose of SCH 23 390 also slightly increased trial omissions, but this effect only approached statistical significance (F(2, 24) = 3.29, p=0.06; Table 2). We further analyzed the proportion of ‘win-stay’ and ‘lose-shift’ trials to determine whether the decrease in risky choice induced by the 1 μg dose of SCH 23 390 could be attributed to altered reward or negative TDE feedback sensitivity, respectively.  This analysis revealed that risk aversion induced by the 1 μg dose of SCH 23 390 was not due to decreased reward sensitivity, as win-stay tendencies were unaltered (t(12) = 0.77, n.s.; Figure 3b, left).  In contrast, analysis of lose-shift tendencies revealed that this dose increases negative TDE feedback sensitivity, (t(12) = 1.95, p<0.05, one-tailed; Figure 3b, right).  Thus, following D1 receptor blockade, rats were more likely to shift their response selection toward the safe option after an unrewarded risky choice.   2.3.1.2 D2 Receptor Blockade A total of eight rats with acceptable placements were included in the data analysis.  This group was trained for 25 days, after which they displayed stable discounting behavior.  In stark contrast to the effects of D1 receptor antagonism, blockade of D2 receptors in the NAc did not affect risky choice.  Analysis of the choice data revealed no significant main effect of treatment (F(2, 14) = 0.03, n.s.; Figure 3c) and no treatment × block interaction (F(6, 42) = 0.27, n.s.).  Response latencies tended to be longer following D2 blockade, but this effect was not statistically significant (F(2, 14) = 1.99, n.s.; Table 2).  Trial omissions did not differ across treatments (F(2, 14) = 1.16, n.s.; Table 2).  However, these doses were behaviorally active, as they did significantly decrease locomotor counts (F(2, 14) = 5.41, p<0.05; Table 2).  
Collectively, these data show that blockade of D1, but not D2, receptors in the NAc altered probabilistic discounting, reducing preference for larger, uncertain rewards.

2.3.2 Stimulation of NAc D1, D2, and D3 Receptors

2.3.2.1 D1 Receptor Stimulation

A total of 11 rats with acceptable placements were included in the data analysis.  These rats required an average of 23 days of training before displaying stable discounting.  As displayed in Figure 4a, infusions of the 2 μg dose of SKF 81297 induced a particularly interesting profile of choice.  Specifically, D1 receptor stimulation optimized the discounting curve, so that animals tended to choose the risky option more often when it was of greater utility, and less often when this option was of lesser long-term relative value.  Analysis of the choice data revealed no significant main effect of treatment with SKF 81297 (F(2, 20) = 0.07, n.s.), but did show a significant treatment × block interaction (F(6, 60) = 3.43, p<0.01; Figure 4a).  Subsequent simple main effects analyses of this interaction, comparing choice behavior within each probability block, revealed that the 2 μg dose increased choice of the large/risky lever compared with saline during the 50% block and decreased risky choice during the 12.5% block (p<0.05).

Infusions of the 2 μg dose of SKF 81297 increased win-stay tendencies, although this difference was not statistically significant (saline = 0.86±0.06; SKF = 0.93±0.02; t(10) = 1.11, n.s.).  Similarly, lose-shift ratios were decreased by SKF 81297, but again this was not a statistically reliable effect (saline = 0.40±0.09; SKF = 0.31±0.05; t(10) = 0.66, n.s.).  Response latencies were unaffected (F(2, 20) = 0.38, n.s.; Table 2), as were trial omissions (F(2, 20) = 0.95, n.s.; Table 2) and locomotor counts (F(2, 20) = 0.45, n.s.; Table 2).  Thus, stimulation of D1 receptors in the NAc 'improved' decision-making and optimized choice behavior, in that choice biases toward the large/risky or small/certain reward were enhanced during periods when these options had greater long-term utility.

Figure 4. Stimulation of D1, but not D2, receptors in the NAc modifies risky choice.  All conventions are the same as in Figure 3.  a. Infusions of the D1 agonist SKF 81297 increased choice of the risky lever in blocks where the probability of obtaining the large/risky reward was high (50%) and decreased risky choice when this option was disadvantageous (12.5%).  p<0.05 for the treatment × trial block interaction.  In contrast, neither infusions of the D2/D3 agonist quinpirole (b) nor the D2-selective agonist bromocriptine (c) affected risky choice.

2.3.2.2 D2/D3 Stimulation (Quinpirole)

Our initial studies investigating the effects of NAc D2 receptor stimulation on decision-making used the mixed D2/D3 receptor agonist quinpirole, at doses we have shown previously to markedly disrupt behavioral flexibility when infused into the NAc (Haluk & Floresco, 2009).  A total of 12 rats with acceptable placements were included in the data analysis.  These rats displayed stable discounting following an average of 28 days of training.  Somewhat surprisingly, infusions of quinpirole did not alter risky choice (main effect of treatment, F(2, 22) = 0.04, n.s.; treatment × block interaction, F(6, 66) = 0.35, n.s.; Figure 4b).  This treatment had no effect on response latencies (F(2, 22) = 0.24, n.s.; Table 2), trial omissions (F(2, 22) = 1.61, n.s.; Table 2), or locomotor counts (F(2, 22) = 1.92, n.s.; Table 2).
2.3.2.3 Preferential D2 Receptor Stimulation (Bromocriptine)

As mentioned above, quinpirole has comparable affinity for both D2 and D3 receptors, and previous studies have shown that more preferential stimulation of either of these receptors can induce opposing effects on risky choice in this assay (St. Onge & Floresco, 2009).  As such, we conducted additional experiments using agonists with higher relative affinities for either D2 or D3 receptors that have been shown to modify decision-making when administered systemically.  For D2 receptors, we used bromocriptine, at doses of 1 and 10 μg.  A total of 11 rats with acceptable placements were included in the data analysis.  Rats displayed stable discounting following an average of 22 days of training.  Similar to what was observed in the quinpirole experiment, infusions of bromocriptine into the NAc did not modify choice behavior at either dose tested (main effect and interaction F-values < 1.0, n.s.; Figure 4c).  Similarly, D2 stimulation had no effect on response latencies, trial omissions, or locomotor counts (all F-values < 2.6, n.s.; Table 2).  Thus, stimulation of NAc D2 receptors does not seem to interfere with risk-based decision-making assessed in this manner.

2.3.2.4 Preferential D3 Stimulation (PD 128907)

To test the effect of NAc D3 receptor stimulation, we used the agonist PD 128907, which has been reported to reduce risky choice when administered systemically (St. Onge & Floresco, 2009).  A total of 11 rats with acceptable placements were included in the data analysis.  Rats displayed stable discounting following 22 days of training.  In contrast to the effects of quinpirole and bromocriptine, preferential stimulation of D3 receptors reduced choice of the large/risky option.  Analysis of the choice data revealed a significant main effect of treatment (F(2, 20) = 3.42, p<0.05; Figure 5a) but no treatment × block interaction (F(6, 60) = 0.36, n.s.), indicating that this treatment caused a reduced preference for the large reward option that was comparable across all blocks.  Multiple comparisons further revealed that both doses of the drug reduced risky choice (p<0.05), although the lower (1.5 μg) dose produced an effect that was numerically larger than that of the 3 μg dose.  D3 stimulation had a marginal effect on locomotion (F(2, 20) = 3.33, p=0.056), owing to a significant decrease in locomotor counts following administration of the low (1.5 μg) dose compared with saline (Table 2).  PD 128907 had no effect on response latencies (F(2, 20) = 0.67, n.s.) or trial omissions (F(2, 20) = 0.54, n.s.; Table 2).

Figure 5. Stimulation of D3 receptors in the NAc reduces preference for larger, uncertain rewards.  All conventions are the same as in Figure 3.  a. Infusion of PD 128907 significantly decreased overall choice of the large/risky lever.  b. Analysis of win-stay and lose-shift behavior demonstrates that PD 128907 selectively decreased reward sensitivity, with this effect being more prominent at the 1.5 µg dose.  This decrease in win-stay tendencies indicates that PD 128907 reduced the tendency to maintain a preference for the large/risky lever after obtaining the larger reward on preceding trials.  p<0.05 vs. saline.

The reduced preference for the large/risky option induced by PD 128907 was attributable primarily to a reduction in reward sensitivity.  Analysis of win-stay tendencies revealed a significant effect of treatment (F(2, 20) = 4.05, p<0.05).
Thus, after receiving the larger reward following selection of the risky option, rats were less likely to select that option on a subsequent trial after treatment with PD 128907.  Multiple comparisons further showed that the effect of this D3 agonist displayed a biphasic dose-response function, wherein win-stay tendencies were significantly (p<0.05) reduced following treatment with the 1.5 μg dose, but not the 3 μg dose (Figure 5b, left).  In contrast, lose-shift tendencies were unaffected by these treatments (F(2, 20) = 0.03, n.s.; Figure 5b, right).  Thus, stimulation of D3 receptors in the NAc reduced the impact that larger, uncertain rewards exert on subsequent choice.

2.3.3 Reward Magnitude Discrimination

Both the D1 antagonist (SCH 23390) and the D3 agonist (PD 128907) shifted preference away from the option associated with the larger, but uncertain, reward.  To confirm whether or not this effect was attributable to a reduced preference for larger rewards or an inability to discriminate between differing amounts of reward, separate groups of animals, independent of those trained on the discounting task, were trained on a reward magnitude discrimination task.  After 9 days of training on the task, rats received infusions of saline and either SCH 23390 (1 μg, n = 7) or PD 128907 (1.5 μg, n = 6) on separate test days.  As displayed in Figure 6, neither D1 receptor blockade (F(1, 6) = 0.13, n.s.; Figure 6a) nor D3 receptor stimulation (F(1, 5) = 0.46, n.s.; Figure 6b) affected preference for the certain four-pellet option.  As such, these data indicate that the effects of these treatments on risk-based decision-making cannot be attributed to a reduced preference for larger rewards.

Figure 6. Dopaminergic manipulations that decrease risky choice do not impair reward magnitude discrimination.  Choice data on the reward magnitude discrimination task for SCH 23390 (a) and PD 128907 (b) compared against saline.  Animals chose between two certain rewards of differing magnitude (4 pellets vs. 1 pellet).  Data are divided into four blocks of 10 trials.  Neither drug treatment reduced preference for a larger, certain reward.

2.4 Discussion

The present data provide novel insight into the contribution of dopamine receptors in the NAc to risk-based decision-making, demonstrating that D1, but not D2, receptor activity exerts important modulatory control over choice between small, certain and larger, uncertain rewards.  Blockade of D1 receptors induced risk aversion and enhanced negative TDE feedback sensitivity, increasing the tendency to shift to the small/certain option following non-rewarded risky choices.  Conversely, stimulation of D1 receptors optimized decision-making biases, reflected by a sharpening of the discounting curve: the D1 agonist enhanced biases for the option that provided greater long-term utility as the likelihood of reward delivery changed across a session.  On the other hand, neither antagonism nor stimulation of NAc D2 receptors altered choice behavior.  However, stimulation of D3 receptors reduced preference for large/risky rewards, decreasing the likelihood of choosing the large/risky option following a risky win.  These results show that dopamine receptor subtypes within the NAc make distinct contributions to risky choice via differential effects on reward and negative TDE feedback sensitivity.
The probabilistic discounting task used here has been used by our laboratory to dissect the relative contributions of different regions of the prefrontal cortex, the amygdala, and the NAc to certain aspects of risk-based decision-making (Ghods-Sharifi et al., 2009; St. Onge & Floresco, 2010; Stopper & Floresco, 2011).  Thus, this study used the same assay so that we could directly compare the effects of NAc dopamine receptor manipulations with our previous findings.  In this task, rats learn over training to keep track of changes in the probability of obtaining the larger reward, which facilitates modifications in choice biases when the large/risky reward is of greater, equal, or lesser long-term utility relative to the small/certain option.  Previous work by our group has shown that rats display similar patterns of discounting on this task irrespective of whether the odds of obtaining the larger reward decrease or increase systematically over a session (St. Onge & Floresco, 2010).  Moreover, lesions of the NAc, or systemic dopamine receptor blockade, reduce preference for the large/risky option under each of these task conditions (Cardinal & Howes, 2005; St. Onge & Floresco, 2009), suggesting that the effects reported here are unlikely to depend on the manner in which reward probabilities change.  Interestingly, rats trained on a variant in which large/risky reward probabilities change in a more randomized manner show considerably less discounting, even with extended training, presumably because this variant is more difficult than versions in which the odds shift systematically (St. Onge, Chiu, & Floresco, 2010).  As such, we chose to use a more standard version of the task to maximize the possibility of observing significant shifts in choice biases.  Note that despite the 20–25 days of training rats required to display prominent and stable discounting behavior, it is unlikely that their choice patterns reflect habit-like patterns of choice.  On each training day, rats routinely sample both levers during each block of a session, but shift their bias gradually as reward probabilities change.  In addition, choice behavior can be influenced by satiety manipulations (St. Onge & Floresco, 2009), further arguing against the idea that performance on this task reflects habitual modes of responding.  Rather, choice behavior appears to be guided primarily by changes in action/outcome contingencies that signal variations in the likelihood of obtaining the larger reward.

2.4.1 NAc D1 Receptors and Risk-Based Decision-Making

Blockade of D1 receptors in the NAc reduced preference for larger, uncertain rewards.  These treatments also increased response latencies and decreased locomotor activity.  However, similar treatments did not alter choice behavior on a simpler reward magnitude discrimination, where rats chose between larger and smaller rewards, both delivered with 100% probability.  This latter finding indicates that the alterations in decision-making induced by D1 receptor antagonism in the NAc cannot easily be attributed to disruptions in preference for larger vs. smaller rewards, or to nonspecific impairments in motivational or motoric processes.

A detailed analysis of choice behavior on trials following those where animals received or did not receive the large/risky reward provides important insight into the underlying processes that were disrupted by D1 receptor blockade.
Under control conditions, rats chose the risky option on ~85% of trials after obtaining the larger reward on the preceding trial.  Conversely, on trials following a risky choice and loss, animals shifted to the small/certain option ~35% of the time.  SCH 23390 administered into the NAc did not alter win-stay tendencies, suggesting that the decrease in risky choice was not attributable to a reduction in reward sensitivity.  Instead, these manipulations selectively enhanced lose-shift tendencies, increasing the likelihood that rats would shift choice after a risky 'loss' and select the small/certain option at the subsequent opportunity.  It is interesting to note that NAc D1 receptor antagonism altered risky choice in a manner very similar to that induced by infusions of SCH 23390 into the prefrontal cortex (St. Onge et al., 2011).  Prefrontal D1 receptor blockade also increased risk aversion, specifically via an enhanced sensitivity to reward omissions.  These findings suggest that, under conditions involving reward uncertainty, D1 receptors in the NAc and prefrontal cortex share a similar function, mitigating the impact that reward omissions exert on subsequent choice preferences and facilitating biases toward potentially more profitable options despite their uncertainty.  In essence, D1 receptors aid in overcoming uncertainty costs and keeping the 'eye on the prize', maintaining choice biases even when a risky choice leads to reward omission.

In comparison with the effects of D1 receptor blockade, stimulation of these receptors in the NAc yielded a true 'improvement' in decision-making.  Infusion of SKF 81297 significantly sharpened the discounting curve; when the four-pellet option was more advantageous (50% block), animals selected it with greater frequency, whereas rats chose the risky option less during the 12.5% block, when the small/certain option had greater long-term utility.  Infusions of D1 agonists into the NAc have been reported to improve other aspects of cognition, such as attentional accuracy (Pezze et al., 2007).  With respect to the effects on probabilistic discounting, the observation that choice was shifted sometimes toward and sometimes away from the risky option suggests that NAc D1 stimulation may have augmented attention to both the delivery of uncertain rewards and reward omissions during different phases of the task.  A more detailed analysis of choice behavior showed that win-stay and lose-shift tendencies across the entire session did not differ following intra-NAc administration of SKF 81297 vs. control treatments.  The fact that choice biases shifted in opposing directions in different trial blocks following D1 receptor stimulation may be one explanation for the lack of an overall effect of SKF 81297 on these measures.  Thus, D1 receptor stimulation may have reduced negative TDE feedback sensitivity in the earlier portion of the task, when the large/risky option was more advantageous, but had the opposite effect in the last block.  The nature of the probabilistic discounting task used here did not permit an examination of these measures within individual trial blocks, as few losses occur in high-probability blocks and few wins occur in low-probability blocks.  However, these treatments did tend to increase win-stay tendencies while at the same time causing a slight reduction in negative TDE feedback sensitivity.
As such, the improvements in decision-making induced by NAc D1 receptor stimulation may be attributable to a relatively nonspecific refinement of the impact that both rewarded and non-rewarded outcomes exert on future decisions.

D1 receptor activity exerts important neuromodulatory control over excitatory inputs to the NAc. For example, D1 receptors can facilitate firing of NAc neurons driven by inputs from the basolateral amygdala (Floresco, Blaha, Yang, & Phillips, 2001). In this regard, functional interactions between the amygdala and the NAc are critically important in driving choice toward larger, uncertain rewards (St. Onge, Stopper, et al., 2012). Moreover, inactivation of the basolateral amygdala reduces preference for larger, probabilistic rewards by enhancing lose-shift tendencies (Ghods-Sharifi et al., 2009), in a manner similar to the effects of D1 receptor antagonism reported here. Therefore, activation of NAc D1 receptors (potentially via phasic increases in dopamine; Sugam et al., 2012) may promote choice of larger, riskier rewards by enhancing task-related activity driven by the basolateral amygdala. Short-term potentiation of amygdala inputs by NAc D1 receptors may attenuate the salience of losses by augmenting representations of recently rewarded choices, bridging the gap between rewarded and non-rewarded actions and thereby increasing the likelihood of selecting potentially more profitable options at subsequent opportunities.

2.4.2 NAc D2 Receptors and Risk-Based Decision-Making

Blockade of D2 receptors with eticlopride did not alter probabilistic discounting. This result was surprising, given that systemic treatment with this compound induced a marked decrease in preference for a large/risky option using a similar procedure (St. Onge & Floresco, 2009). Furthermore, intra-mPFC infusions of this D2 antagonist actually increased risky choice (St. Onge et al., 2011). It is unlikely that this lack of effect was due to insufficient dosing, as this drug displays high potency at D2 receptor sites, and the infusions were behaviorally active in that they reduced overall locomotor activity. It is possible that a higher dose of eticlopride may have altered choice behavior. However, intra-NAc infusions of a 10 μg dose of this compound have been reported to suppress instrumental responding for food, which could potentially confound interpretation of its effects on decision-making (Bari & Pierce, 2005). The finding that NAc D2 receptor blockade did not affect task performance, in combination with the pronounced effects of D1 receptor manipulations on decision-making, is in keeping with other studies reporting that NAc D1 receptors mediate response accuracy whereas D2 receptors play a greater role in motivational aspects of performance (Floresco & Phillips, 1999; Floresco, 2007; Haluk & Floresco, 2009; Pattij et al., 2007). The present data add to these findings, indicating that NAc D2 receptor activity does not appear to make a discernible contribution to probabilistic discounting. However, D2 receptors in the NAc have been shown to facilitate other forms of cost–benefit decision-making, specifically those related to evaluation of effort costs (Aberman, Ward, & Salamone, 1998; Cousins et al., 1994; Nowend et al., 2001; Salamone, Arizzi, Sandoval, Cervone, & Aberman, 2002).
This finding, in combination with the reduction in locomotion reported here, suggests that NAc D2 receptors may be more important for overcoming physical effort costs to obtain larger rewards, as opposed to costs related to reward uncertainty.

Infusions of D2 receptor agonists also failed to alter risky choice. An initial experiment used quinpirole, which displays comparable affinity for the D2, D3, and D4 receptors. Previous studies have shown that intracranial infusion of quinpirole within the dose ranges used here impairs probabilistic choice when administered into the prefrontal cortex (St. Onge et al., 2011), and disrupts behavioral flexibility, as evidenced by impairments in set-shifting and reversal learning when infused into the NAc (Haluk & Floresco, 2009). A subsequent experiment used the agonist bromocriptine, which has a greater affinity for the D2 receptor than for the D3 and D4 receptors, and which increases risky choice on this task when administered systemically (St. Onge & Floresco, 2009). Yet, neither of these treatments interfered with the ability to modify choice biases in response to changes in reward probability. This lack of effect contrasts with the above-mentioned observations that stimulation of NAc D2 receptors markedly impairs flexible responding in situations requiring shifts between or within different discrimination strategies (Goto & Grace, 2005b; Haluk & Floresco, 2009). When comparing these disparate findings, an important consideration is that classical tests of behavioral flexibility require shifting behavior between responses that either result in reward delivery or do not. On the other hand, the probabilistic discounting task used here required animals to choose between smaller/certain and larger/probabilistic rewards. Therefore, it appears that excessive activation of NAc D2 receptors impedes modifications in behavior most prominently when shifting between actions that are reinforced in a deterministic (all-or-none) manner, as opposed to those associated with delivery of uncertain or probabilistic rewards.

With regard to the anatomical distribution of different dopamine receptors within the ventral striatum, there is evidence to suggest that NAc outputs may be organized in a manner similar to the well-documented direct and indirect output pathways of the dorsal striatum. Thus, D1-expressing neurons in the NAc send a direct output to the substantia nigra pars reticulata, whereas neurons inhibited by D2 receptor activity preferentially project to the ventral pallidum and subthalamic nucleus (Nicola, 2007). The present data suggest that mesoaccumbens dopamine may modulate risk-based decision-making primarily by acting on D1 receptors residing on neurons in the direct output pathway.

Despite the lack of effect of D2 receptor manipulations in the NAc on risky choice, the fact remains that systemic treatments with D2 agonists or antagonists can increase or decrease preferences for larger, probabilistic rewards (St. Onge & Floresco, 2009). Given that infusions of D2 drugs into the prefrontal cortex alter decision-making in a manner qualitatively different from that observed following systemic treatment, it is likely that D2 receptors in other brain regions are critical for modulating probabilistic decisions. One obvious candidate is the basolateral amygdala.
Electrophysiological studies have shown that stimulation of D2 receptors in the basolateral amygdala potentiates sensory (non-limbic) cortical inputs to the basolateral amygdala, overshadowing D1-driven prefrontal cortical inputs to this nucleus (Grace & Rosenkranz, 2002). These findings suggest that D2 stimulation within the basolateral amygdala may suppress prefrontal cortical inputs to this nucleus, which would be expected to increase risky choice (St. Onge, Ahn, et al., 2012). In addition, blockade of D2 receptors in the basolateral amygdala attenuates cue-induced reinstatement of cocaine seeking, suggesting that these receptors facilitate reward-directed behavior (Berglind, Case, Parker, Fuchs, & See, 2006). As the contribution of dopamine transmission within the basolateral amygdala to decision-making has been virtually unexplored, future studies on this topic should provide additional insight into these issues.

In contrast to the lack of effect of D2 receptor ligands, intra-NAc infusion of the D3-preferring agonist PD 128907 decreased preference for the large/risky option in a manner similar to that induced by systemic treatment with this drug (St. Onge & Floresco, 2009). The reduced preference for the larger reward induced by the lower dose of the D3 agonist was apparent during the first (100%) block, when there was no risk associated with this option. Thus, it may be argued that excessive D3 receptor stimulation may not have affected probabilistic discounting per se, but instead may have disrupted self-control or attentional processes, causing rats not to choose the large reward lever. Note, however, that infusions of PD 128907 into the NAc in a separate group of rats did not affect performance on a reward magnitude discrimination, demonstrating that these treatments do not always reduce preference for larger rewards. Interestingly, infusions of quinpirole, which stimulates both D2 and D3 receptors with relatively little selectivity (Sautel et al., 1995), did not affect risky choice. Thus, it appears that increased D3 activity within the NAc may dampen the impact that larger, uncertain rewards exert over subsequent choice behavior, whereas concurrent activation of D2 receptors may counter this effect. This notion is consistent with our observation that the lower dose of PD 128907 caused a significant decrease in win-stay performance whereas the higher dose did not, possibly owing to a loss of selectivity.

Systemic or local administration of D3 agonists decreases NAc dopamine efflux (Pugsley et al., 1995; Roberts, Cummins, Gnoffo, & Kew, 2006), which has led to the suggestion that these receptors may serve as autoreceptors on dopamine terminals. Therefore, a parsimonious explanation for the D3 receptor-mediated decrease in risky choice observed in this experiment may be that their activation reduced NAc dopamine transmission, which in turn may have diminished dopaminergic tone on D1 receptors. However, it is important to highlight that the reduced preference for larger, uncertain rewards induced by D3 receptor stimulation was qualitatively different from that induced by D1 receptor blockade. PD 128907 decreased win-stay tendencies, whereas SCH 23390 increased negative feedback sensitivity. Furthermore, each of these receptors modulates the electrophysiological properties of NAc neurons via dissociable mechanisms.
For instance, postsynaptic D1 receptor stimulation potentiates synaptic NMDA-mediated responses (Harvey & Lacey, 1997; Nicola, Surmeier, & Malenka, 2000), whereas presynaptic D1 receptors may depress excitatory and inhibitory synaptic transmission (Nicola & Malenka, 1997; Nicola et al., 2000; Pennartz, Groenewegen, & Lopes da Silva, 1994). In comparison, D3 receptor activity suppresses inhibitory synaptic transmission by decreasing the availability of GABA receptors in the NAc (Chen, Kittler, Moss, & Yan, 2006). In light of these findings, it is plausible that the reduction in risky choice caused by excessive activation of NAc D3 receptors was the result of a combination of neurophysiological alterations that may include, but are not limited to, reductions in dopamine transmission.

Dopamine-agonist therapies prescribed for the treatment of Parkinson's disease have been documented to induce a variety of impulse control disorders, including pathological gambling (Ahlskog, 2011; Lader, 2008). Some of these drugs, such as pramipexole, have a higher affinity for D3 vs. D2 receptors, which has led to the conjecture that they may impair decision-making and increase gambling behavior via actions on D3 receptors. However, the fact that stimulation of D3 receptors systemically or locally within the NAc actually decreases risky choice suggests that this is not the primary mechanism through which dopaminergic therapies promote the emergence of impulse control disorders. Rather, as systemic treatment with D2-preferring agonists increases risky choice (St. Onge & Floresco, 2009), it would appear that these side effects of dopamine-agonist therapies may occur through actions on D2 receptors residing in cortical and/or limbic brain regions beyond the NAc.

2.4.3 Summary and Conclusions

The findings of this study provide novel insight into the mechanisms through which dopamine transmission in the ventral striatum can refine cost–benefit evaluations requiring risk–reward judgments. By revealing risk aversion caused by D1 receptor blockade and optimization of decision-making following D1 stimulation, these results suggest that normal levels of NAc D1 activity serve to shift decision biases toward or away from larger, uncertain rewards so as to maximize the amount of reward that can be obtained in the long term. While these effects may be mediated in part by fluctuations in tonic dopamine in the NAc (St. Onge, Ahn, et al., 2012), they are likely primarily due to influences on a variety of important phasic signals that occur during these types of choices (Sugam et al., 2012). On the other hand, NAc D2 receptor activity does not appear to make a discernible contribution to these functions, whereas excessive D3 receptor activity blunts the impact that larger rewards exert over decision biases. Additional studies of the mechanisms through which NAc dopamine may regulate these functions will expand our understanding of how dopamine transmission within this nucleus relates to both normal neuroeconomic processing and aberrant decision-making associated with a variety of disorders linked to dysfunction within this system.

Chapter 3: Fundamental Role for the Lateral Habenula in Promoting Subjective Decision Biases

3.1 Introduction

When choosing between rewards that differ in terms of their relative value, subjective impressions of which option may be "better" can be colored by certain costs (e.g., effort, delays, uncertainty) that diminish the subjective value of objectively larger rewards.
Decisions of this kind are facilitated by different nodes within the mesocorticolimbic dopamine system (Floresco, St. Onge, et al., 2008). Dopamine neurons exhibit brief, short-latency bursts or dips in response to unexpected reward or reward omission, respectively (Schultz et al., 1997). These phasic events, known as reward prediction errors, are sensitive to subjective value (Schultz, 2013). As such, phasic dopamine responds to a variety of events as a function of relative reward magnitude, uncertainty, and delay (Fiorillo, Newsome, & Schultz, 2008; Fiorillo et al., 2003; Tobler, Fiorillo, & Schultz, 2005). Recent studies have highlighted the LHb as a critical nucleus within this circuitry that acts as a "brake" on dopamine activity via disynaptic pathways through the rostromedial tegmental nucleus (RMTg) (Barrot et al., 2012; Hikosaka et al., 2008; Jhou, Geisler, et al., 2009). LHb neurons encode negative reward prediction errors in a manner opposite to dopamine neurons, exhibiting increased phasic firing in expectation of, or after, aversive events (e.g., punishments, omission of expected rewards) and reduced activity after positive outcomes (Bromberg-Martin & Hikosaka, 2011; Matsumoto & Hikosaka, 2007, 2009). LHb stimulation promotes conditioned avoidance and reduces reward-related responding, suggesting that this nucleus conveys an anti-reward/aversive signal (Lammel et al., 2012; Stamatakis & Stuber, 2012). Yet, LHb neurons also encode rewards of dissimilar magnitude, displaying phasic increases/decreases in firing in anticipation, or after receipt, of smaller/larger rewards (Bromberg-Martin & Hikosaka, 2011). This differential reward encoding may aid in biasing decisions toward subjectively superior rewards and away from inferior ones. However, how LHb signals may influence decision biases and volitional choice behavior is unknown.

Here, we challenge the emerging view that the LHb, via the RMTg, simply conveys an aversive or "anti-reward" signal. We investigated the contribution of the LHb to different forms of cost/benefit decision-making mediated by dopamine circuitry (St. Onge & Floresco, 2009; St. Onge, Stopper, et al., 2012). Probabilistic choice and delay discounting tasks were employed to determine the role of the LHb specifically in subjective choice. These experiments, along with important behavioral and anatomical control experiments and analyses, demonstrate a unique role for the LHb in promoting subjective choice biases. These findings suggest a more complex role for the LHb than once believed and provide important insights into how the regulation of phasic dopamine signals influences decision-making.

3.2 Methods

3.2.1 Experimental Subjects and Apparatus

Male Long Evans rats (Charles River Laboratories, Montreal, Canada) weighing 250-300 g (60-70 days old) at the start of the experiment were single-housed and given access to food and water ad libitum. The colony was maintained on a 12 h light/dark cycle, with lights turned on at 7:00 AM. Rats were food restricted to no more than 85-90% of free-feeding weight beginning 1 week before training. Feeding occurred in the rats' home cages at the end of the experimental day, and body weights were monitored daily. Animals were trained and tested between 9:00 AM and 5:00 PM; individual rats were trained and tested at a consistent time each day. All testing was in accordance with the Canadian Council on Animal Care.
Testing occurred in operant chambers (Med Associates, St Albans, VT, USA) that were fitted with two retractable levers on either side of central food receptacles where reinforcement (45 mg pellets; Bioserv, Frenchtown, NJ, USA) was delivered by a dispenser, as described previously (St. Onge and Floresco, 2010).  No statistical methods were used to predetermine sample sizes.  3.2.2 Behavioral Tasks Rats were initially trained to press retractable levers within 10 s of their insertion into the chamber over a period of 5-7 days (St. Onge & Floresco, 2009; St. Onge, Stopper, et al., 2012) , after which they were trained on one of four decision-making tasks.  3.2.2.1 Probabilistic Discounting Risk-based decision-making was assessed with a probabilistic discounting task described previously (St. Onge & Floresco, 2009; St. Onge, Stopper, et al., 2012).  Rats received daily training sessions 6-7 days/week, consisting of 72 trials, separated into 4 blocks of 18 trials.  Each 48-min session began in darkness with both levers retracted (the intertrial state).  Trials began every 40 s with houselight illumination and, 3 s later, insertion of one or both levers.  One lever was designated the large/risky lever, the other the small/certain lever, which remained consistent throughout training (counterbalanced left/right).  No response within 10 s of lever insertion reset the chamber to the intertrial state until the next trial (omission).  Any choice retracted both levers.  Choice of the small/certain lever always delivered one pellet with 100% probability; choice of the large/risky lever delivered 4 pellets but with a probability that changed across the four trial blocks.  Blocks were comprised of 8 forced-choice trials (4 trials for each lever, 64  randomized in pairs), followed by 10 free-choice trials, where both levers were presented.  The probability of obtaining 4 pellets after selecting the large/risky option varied across blocks.  Separate group of rats were trained on variants where reward probabilities systematically decreased (100%, 50%, 25%, 12.5%) or increased (12.5%, 25%, 50%, 100%) across blocks.  For each forced- and free-choice trial within a particular block, the probability of receiving the large reward was drawn from a random number generating function (Med-PC) with a set probability distribution (i.e.; 100, 50, 25 or 12.5%). Therefore, on any given session, the probabilities in each block may have varied, but on average across training days, the actual probability experienced by the rat approximated the set value within a block.  Latencies to choose were also recorded.  Rats were trained until, as a group, they (1) chose the large/risky lever during the 100% probability block on at ~90% of trials, and (2) demonstrated stable baseline levels of choice, assessed using an ANOVA analysis described previously (St. Onge & Floresco, 2009; St. Onge, Stopper, et al., 2012).  Data from three consecutive sessions were analyzed with a two-way repeated-measures ANOVA with Day and Trial Block as factors.  If there was no main effect of Day or Day X Trial Bock interaction (at p>0.1 level), performance of the group was deemed stable.  3.2.2.2 Probabilistic Choice with Fixed Reward Probabilities Training on this task was very similar to the probabilistic discounting task, except that the probability of obtaining the larger 4-pellet reward was set at 40%, and remained constant over one block of 20 free-choice trials that were preceded by 20 forced-choice trials.  
3.2.2.3 Delay Discounting

This task was similar to the probabilistic discounting task in a number of respects, but with some key differences. Daily sessions consisted of 48 trials, separated into 4 blocks of 12 trials (2 forced- followed by 10 free-choice trials per block; 56-min session). Trials began every 70 s with houselight illumination and insertion of one or both levers. One lever was designated the small/immediate lever, which, when pressed, always delivered one pellet immediately. Selection of the other, large/delayed lever delivered four pellets after a delay that increased systematically over the four blocks: initially 0 s, then 15, 30, and 45 s. No explicit cues were presented during the delay period; the houselight was extinguished and then re-illuminated upon reward delivery.

3.2.2.4 Reward Magnitude Discrimination

This task was used to confirm whether any reduced preference for larger, costly rewards was due to a general reduction in preference for larger rewards or to some other form of non-specific motivational or discrimination deficit. Rats were trained and tested on a task consisting of 48 trials divided into 4 blocks, each consisting of 2 forced- and 10 free-choice trials. As with the discounting tasks, choices were between a large, 4-pellet reward and a smaller, 1-pellet reward, both of which were delivered immediately with 100% certainty after a choice.

3.2.2.5 Devaluation Tests

A separate behavioral experiment was conducted in intact animals to assess whether performance during the reward magnitude discrimination was under habitual or goal-directed control. A separate group of rats was trained for 9 days on a reward magnitude discrimination in an identical manner to those that received LHb inactivation. On day 10 of training, rats received a reinforcer devaluation test. One hour prior to the test session, rats received ad libitum access to the sweetened reward pellets in their home cages. If responding on this task had become habitual, reinforcer devaluation by pre-feeding should not influence performance during the test; conversely, if choice was goal-directed, the bias toward the large reward should be diminished during this test. Following the devaluation test, rats were retrained for two additional days on the task under standard food restriction, after which they again selected the large reward on nearly every free-choice trial. On the following day, rats received a response devaluation test during which responding on the large reward lever no longer delivered reward (although selecting the other lever still yielded 1 reward pellet).

3.2.3 Surgery and Microinfusion Protocol

Rats were trained on the discounting tasks until they displayed stable levels of choice (20-25 days), after which they were fed ad libitum for 1-3 days and subjected to surgery. Those trained on the other tasks were implanted prior to training. Rats were anaesthetized with 100 mg/kg ketamine and 7 mg/kg xylazine and implanted with bilateral 23 gauge stainless-steel cannulae aimed at the LHb (flat skull: AP = -3.8 mm (bregma); ML = +/-0.8 mm; DV = -4.5 mm (dura)). Separate anatomical control groups were implanted with cannulae at sites either 0.5-1.0 mm dorsal or 1 mm ventral to the LHb site.
Separate groups of rats to be trained on the fixed probabilistic choice task were implanted with bilateral cannulae in the RMTg (flat skull at 10° laterally: AP = -6.8 mm (bregma); ML = +/-0.7 mm; DV = -7.4 mm (dura)) or a unilateral cannula in the DR (flat skull with cannula at 20° laterally: AP = -7.6 mm (bregma); ML = 0.0; DV = -5.2 mm (dura)). Cannulae were held in place with stainless steel screws and dental acrylic and plugged with obdurators that remained in place until the infusions were made. Rats were given ~7 days to recover from surgery before testing, during which they were again food restricted.

Training was re-initiated on the respective task for at least 5 days until the group displayed stable levels of choice behavior for 3 consecutive days. One to two days before the first microinfusion test day, obdurators were removed and a mock infusion procedure was conducted. The day after displaying stable discounting, the group received its first microinfusion test day. A within-subjects design was used for all experiments. Reversible inactivation of the LHb was achieved by infusion of a combination of the GABA agonists baclofen and muscimol using procedures described previously (St. Onge, Stopper, et al., 2012) (50 ng each in 0.2 μl, delivered over 45 s). Injection cannulae were left in place for 1 min to allow for diffusion. Rats remained in their home cage for an additional 10 min before behavioral testing. On the first infusion test day, half of the rats in each group received control treatments (saline); the remainder received baclofen/muscimol. The next day, rats received a baseline training day (no infusion). If, for any individual rat, choice of the large/risky lever deviated by >15% from its preinfusion baseline, it received an additional day of training before the next test. On the following day, a second, counterbalanced infusion was given.

3.2.4 Histology

Rats were euthanized, and brains were removed, fixed in 4% formalin for ≥24 hrs, frozen, sliced in 50 μm sections, and stained with Cresyl Violet. Placements were verified with reference to Paxinos and Watson (2005). Based on previous autoradiographical, metabolic, neurophysiological, and behavioral measures (Allen et al., 2008; Arikan et al., 2002; Floresco, McLaughlin, & Haluk, 2008; Martin & Ghez, 1999), the effective functional spread of inactivation induced by 0.2 μl infusions of 50 ng of GABA agonists would be expected to be between 0.5 and 1 mm in radius from the center of the infusion. Placements were deemed to be within the LHb only if the majority of the gliosis from the infusions resided within the clearly defined anatomical boundaries of this nucleus. Rats whose placements resided outside this region, either because of direct targeting or missed placements, were allocated to separate dorsal (hippocampus), medial (third ventricle), or ventral (thalamic) neuroanatomical control groups, which resided beyond the estimated effective functional spread of our inactivation treatments. Data from these groups were analyzed separately.

3.2.5 Data Analysis

The primary dependent measure of interest was the proportion of choices directed toward the large reward lever (i.e., large/risky or large/delayed) for each block of free-choice trials, factoring in trial omissions. For each block, this was calculated by dividing the number of choices of the large reward lever by the total number of successfully completed trials.
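As a worked example, a block with 7 large-reward choices, 2 small-reward choices, and 1 omission scores 7/9 ≈ 78%, not 7/10 = 70%. A minimal sketch of this computation (illustrative only, not the original analysis code):

    def percent_large_choice(n_large, n_small):
        # Omitted trials are excluded: only successfully completed
        # free-choice trials enter the denominator.
        completed = n_large + n_small
        if completed == 0:
            return float("nan")  # every free-choice trial in the block was omitted
        return 100.0 * n_large / completed

    print(percent_large_choice(7, 2))  # 77.78, from a 10-trial block with 1 omission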
For the probabilistic discounting experiment, choice data were analyzed using three-way, between/within-subjects ANOVAs, with treatment and probability block as two within-subjects factors and task variant (i.e., reward probabilities decreasing or increasing over blocks) as a between-subjects factor. Thus, in this analysis, the proportion of choices of the large/risky option across the four levels of trial block was analyzed irrespective of the order in which the blocks were presented. For the delay discounting and reward magnitude experiments, choice data were analyzed with two-way repeated-measures ANOVAs, with treatment and trial block as factors. Choice data from the fixed probability experiments were analyzed with paired-sample two-tailed t-tests. Response latencies (the time elapsed between lever insertion and subsequent choice) and the number of trial omissions (i.e., trials where rats did not respond within 10 s) were likewise analyzed with paired-sample two-tailed t-tests. Data distribution was assumed to be normal, but this was not formally tested. The use of automated operant procedures eliminated the need for experimenters to be blind to treatment.

Additional analyses were conducted on the latencies to make a response during forced-choice trials of the different tasks, to explore why LHb inactivation affected choice during the no-cost blocks of the discounting tasks but not the magnitude task. The rationale was that animals trained on the reward magnitude discrimination learn that the relative value of the larger reward is always higher than that of the smaller reward, whereas those trained on the discounting tasks consistently experience changes in the relative value of the large reward option over a session and learn that the large reward lever is not always the best option available. To test this hypothesis, we analyzed response latencies to select the large and small rewards during all of the forced-choice trials for rats trained on the reward magnitude discrimination, and compared them to the large and small reward forced-choice latencies displayed by rats performing the discounting tasks during the 100%/0-s delay (i.e., no-cost) blocks. If well-trained animals perceived the larger reward as considerably "better" than the smaller one, they should display faster response latencies when forced to choose the larger vs. the smaller reward. On the other hand, if the relative value of the two rewards is perceived as more comparable (even in the 100% or 0 s delay blocks), the difference in response latencies when forced to select one option or the other should be diminished.
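The thesis does not state which statistics package was used; purely as an illustration under that caveat, the paired and one-sample comparisons described above could be run in Python with SciPy as follows (the per-rat values are hypothetical placeholders, not data from the experiment):

    import numpy as np
    from scipy import stats

    # Hypothetical per-rat percentage choice of the large/risky lever.
    control = np.array([78.0, 65.0, 81.0, 59.0, 72.0, 68.0, 74.0])
    inactivation = np.array([52.0, 49.0, 55.0, 47.0, 51.0, 46.0, 50.0])

    t_paired, p_paired = stats.ttest_rel(inactivation, control)  # paired, two-tailed
    t_chance, p_chance = stats.ttest_1samp(inactivation, 50.0)   # vs. chance (50%)

    print(f"paired: t({len(control) - 1}) = {t_paired:.2f}, p = {p_paired:.4f}")
    print(f"vs. 50%: t({len(inactivation) - 1}) = {t_chance:.2f}, p = {p_chance:.4f}")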
3.3 Results

3.3.1 LHb Inactivation during Probabilistic Choice

Under control conditions, animals discounted the large/risky reward appropriately, regardless of the progression of reward probabilities across the session (Figure 7a-c). LHb inactivation completely abolished any discernible choice bias, inducing random patterns of responding (Figure 8). Averaged across subjects, this yielded a choice profile reflective of rats selecting both options with equal frequency, with choice behavior not differing from chance (50%) (inactivation x block interaction: F(3, 45) = 6.69, p=0.001; average risky choice vs. 50%: t(15) = 1.44, n.s.; Figure 7a). The probability progression was included as a between-subjects factor in the analysis and was non-significant, showing that this effect was apparent irrespective of whether reward probabilities decreased (n = 9) or increased (n = 7) over time (F(1, 14) = 0, n.s.; Figure 7b,c). Inactivation increased locomotion (control = 34.7+/-3.8 beam breaks/min, inactivation = 46.6+/-5.2; t(15) = 2.64, p<0.05; Table 3), suggesting that the LHb exerts a tonic inhibitory influence over midbrain dopamine activity. LHb inactivation also increased the latency to make a choice (control = 0.7+/-0.1 s, inactivation = 1.3+/-0.2 s; t(15) = 2.53, p<0.05; Table 3) and the number of trials where no choice was made (control = 0.9+/-0.5, inactivation = 6.25+/-1.5; t(15) = 3.22, p<0.01; Table 3). Moreover, this shift to indifference was apparent during periods where subjects showed a prominent bias for either the large/risky or the small/certain option. The effect of LHb inactivation was analyzed in a subset of animals (n = 5) that preferred the small/certain lever during the 12.5% block. This group demonstrated an increase in choice of the large/risky lever toward 50% on the 12.5% block (t(4) = 2.83, p<0.05; Figure 7d) and showed an overall choice pattern of indifference (F(3, 12) = 7.36, p=0.005; Figure 7d). We also observed an identical effect in a separate group (n=7) trained on a simpler task wherein the odds of obtaining a larger reward remained constant at 40% throughout the session (F(1, 6) = 25.36, p<0.005; Figure 7e). Animals in this group preferred the large/risky option under control conditions but defaulted to indifference during LHb inactivation, indicating that promotion of choice biases by the LHb is not restricted to situations where reward probabilities are volatile, unlike other nodes within dopamine decision circuitry (St. Onge & Floresco, 2010).

Figure 7. Inactivation of the LHb abolishes choice biases during probabilistic discounting. For all graphs, error bars represent S.E.M. a. Percentage choice of the large/risky option across the 4 probability blocks for all rats trained on two variants of the probabilistic discounting task. LHb inactivation (n=16) abolished probabilistic discounting (treatment x block interaction, F(3, 45) = 6.69, p=0.001), causing rats to randomly select both options with equal frequency (t(15) vs. 50% = 1.44, n.s., dashed line). Choice after LHb inactivation did not vary across blocks (F(3, 45) = 0.42, n.s.), resulting in a profile indicative of complete indifference. *p<0.05 vs. control during a particular probability block. Data from subsets of rats trained on task variants where reward probabilities decreased (n=9) or increased (n=7) over a session are presented in (b) and (c). LHb inactivation induced a comparable disruption in decision-making in both groups (all effects of task variant, Fs < 2.1, n.s.). d. A separate analysis was performed on data obtained from a subset of animals tested on the probabilistic discounting task that displayed a strong bias towards the small/certain option during the 12.5% block (i.e., a bias away from the large/risky option) (n=5). LHb inactivation in this subset completely abolished any bias towards either option (treatment x block interaction, F(3, 12) = 7.36, p=0.005), in a manner similar to the effects observed in the entire group. Moreover, LHb inactivation decreased choice of the large/risky option during the 100-50% blocks but, at the same time, increased risky choice during the 12.5% block towards 50%. *p<0.05 versus control at a specific probability block. e. Data from a separate experiment where the probability of obtaining the large, uncertain reward remained constant (40%) across a session. Under these conditions, rats (n=7) chose the risky option on ~80% of trials following control treatments, but again, choice dropped to chance levels (50%) after LHb inactivation (F(1, 6) = 25.36, p<0.005).
Figure 8. Individual data from all rats tested on the probabilistic discounting task following LHb inactivation and control treatments. For clarity, the data have been separated based on treatment and the specific task variant. Under control conditions, all rats shifted their choice bias away from or toward the large/risky option in a relatively consistent manner as the odds of obtaining the larger reward decreased or increased over the test session (white circles represent the group means +/- s.e.m.). In contrast, LHb inactivation caused rats to respond in a haphazard manner across blocks, so that when averaged across subjects (grey squares), choice behavior of the group did not differ from chance.

Table 3. Locomotion, choice latencies, and trial omission data. Values represent means +/- s.e.m. *p<0.05, **p<0.01 vs. control

                                  Locomotion           Response      Omissions
                                  (beam breaks/min)    latency (s)   (trials/session)
  LHb
    Probabilistic discounting
      Control                     34.7 (3.8)           0.7 (0.1)     0.9 (0.5)
      Inactivation                46.6 (5.2) *         1.3 (0.2) *   6.3 (1.5) **
    Fixed probability risk
      Control                     48.2 (5.9)           0.5 (0.1)     0.1 (0.1)
      Inactivation                73.4 (12.3) *        1.2 (0.2) *   2.4 (2.1)
    Delay discounting
      Control                     22.1 (3.8)           1.0 (0.1)     1.2 (0.7)
      Inactivation                34.2 (5.3)           1.6 (0.6)     3.3 (1.5)
    Reward magnitude
      Control                     33.1 (6.4)           1.2 (0.2)     2.0 (2.0)
      Inactivation                39.3 (10.9)          1.0 (0.3)     0.4 (0.2)
  RMTg
    Fixed probability risk
      Control                     37.8 (4.1)           1.7 (0.4)     2.5 (0.7)
      Inactivation                53.0 (8.7)           2.8 (0.8)     11.7 (6.3)

3.3.2 Inactivation of Regions Adjacent to LHb

The small size of the LHb and its proximity to other important brain sites necessitated an investigation into the anatomical specificity of the effect of LHb inactivation on choice (Figure 9a). In a group of animals intended for the LHb experiment, off-target placements resulted in one of the injectors infusing into the ventricle (n = 8). In separate groups of animals, infusions were directed dorsally, into the hippocampus (n = 8), or ventrally, into the thalamus (n = 11) (Figure 9a). Neither inactivation of the hippocampus (inactivation: F(1, 7) = 0.79, n.s.; inactivation x block: F(3, 21) = 0.48, n.s.; Figure 9b) nor infusion into the ventricle (inactivation: F(1, 7) = 0.19, n.s.; inactivation x block: F(3, 21) = 0.04, n.s.; Figure 9c) influenced decision-making compared to control conditions. Inactivation of the thalamus significantly increased the number of trial omissions (control = 5.5+/-2.4, inactivation = 33.5+/-8.0; t(10) = 3.49, p<0.01), indicating that the mediodorsal thalamus is necessary for basic task engagement. Since most of this group completed too few trials to analyze choice behavior, a small subset (n = 5) that completed more than half of the trials was used for this analysis, which revealed that inactivation of the thalamus did not influence risky choice (inactivation: F(1, 4) = 0.05, n.s.; inactivation x block: F(3, 12) = 0.07, n.s.; Figure 9d). Thus, the effects of LHb inactivation were anatomically specific, and the disruption of decision biases induced by inactivation treatments was attributable to suppression of neural activity circumscribed to the LHb but not adjacent regions.
Figure 9. Inactivation of regions adjacent to the LHb does not affect decision-making. a. Location of infusions residing within the LHb for all experiments, and control placements in the adjacent hippocampus, ventricle, or thalamus. Numbers correspond to mm from bregma. Rats with placements located dorsal to the LHb within the (b) hippocampus, (c) adjacent to the ventricle, or (d) ventral to the LHb in the thalamus showed no differences in choice on the probabilistic discounting task following either inactivation or control treatments (main effects of treatment, hippocampus: F(1, 7) = 0.79, n.s.; thalamus: F(1, 4) = 0.05, n.s.; ventricle: F(1, 7) = 0.19, n.s.).

3.3.3 Inactivation of LHb Efferents

In addition to sending direct projections to midbrain dopamine neurons that promote aversive behaviors (Lammel et al., 2012), the LHb projects to the RMTg (which in turn inhibits dopamine neurons, as well as other targets) and directly to the dorsal raphe (where serotonin neurons reside) (Lammel et al., 2012). To clarify which of these projection targets may interact with the LHb to promote probabilistic choice biases, separate groups of rats were trained on the fixed probabilistic choice task. RMTg inactivation increased locomotion (control = 37.8+/-4.1 beam breaks/min, inactivation = 53.0+/-8.7), although this effect was not significant (t(5) = 2.57, p=0.20), likely owing to the small number of subjects in this experiment. This trend suggests that the RMTg, like the LHb, exerts a tonic inhibitory influence over midbrain dopamine activity. Inactivation of the RMTg induced a choice profile resembling indifference, similar to LHb inactivation (t(5) = 3.17, p<0.05; Figure 10a,b). In contrast, dorsal raphe inactivation had no effect on choice (t(3) = 0.01, n.s.; Figure 10c,d). Thus, modification of probabilistic choice biases by the LHb appears to be mediated via projections to the RMTg, which may inhibit midbrain dopamine neurons, but not via direct projections to the dorsal raphe.

Figure 10. RMTg, but not dorsal raphe, inactivation alters probabilistic choice. a. RMTg inactivation (n=6) reduced preference for a large/risky reward to chance levels on a task where the probability of obtaining the large, uncertain reward remained constant (40%) across a session (t(5) = 3.17, p<0.05). b. Acceptable placements for RMTg infusions. c. Dorsal raphe inactivation had no effect on choice (n=4; t(3) = 0.01, n.s.). d. Acceptable placements for DR infusions. Numbers on histology figures correspond to mm from bregma.

3.3.4 Inactivation of the LHb during Delay Discounting

We next investigated whether the LHb is specifically involved in cost/benefit judgments entailing reward uncertainty or whether it plays a broader role in promoting biases during other decisions about rewards of different subjective value. To this end, we used a delay discounting task requiring rats (n=6) to choose either a small reward delivered immediately or a larger, delayed reward. Here, the subject is always guaranteed the larger reward; yet, delaying reward delivery after choice (0-45 s) diminishes its subjective value and shifts bias towards the small/immediate reward (Figure 11, circles).
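Although no discounting function is fit here, the standard hyperbolic form V = A/(1 + kD) illustrates how a delay D erodes the subjective value V of a reward of magnitude A; the discount rate k below is an arbitrary, hypothetical value chosen only to show a preference reversal across the delays used in this task:

    def hyperbolic_value(amount, delay_s, k=0.1):
        # Standard hyperbolic discounting: V = A / (1 + k * D); k is illustrative.
        return amount / (1.0 + k * delay_s)

    for delay in (0, 15, 30, 45):
        v = hyperbolic_value(4, delay)
        print(f"{delay:>2} s delay: large/delayed worth {v:.2f} vs. 1.00 immediate pellet")
    # 4.00, 1.60, 1.00, 0.73 -> the 4-pellet option loses its advantage as delay grows.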
In parallel to probabilistic discounting, LHb inactivation abolished delay discounting (inactivation x block: F(3, 15) = 3.99, p<0.05; Figure 11), as choice shifted to a point of indifference (t(5) vs. 50% = 0.37, n.s.). Therefore, the LHb appears to play a fundamental role in promoting biases in situations requiring choice between rewards that differ in their subjective value.

Figure 11. LHb inactivation makes animals indifferent in choices between large, delayed and small, immediate rewards. Percentage choice of the large/delayed reward across the 4 blocks of the delay discounting task, wherein rats chose between a small, immediate reward and a larger, delayed reward (n=6). LHb inactivation abolished choice preference (F(3, 15) = 3.99, p<0.05), resulting in indifference (t(5) vs. 50% = 0.37, n.s., dashed line). *p<0.05 vs. control during a particular block.

3.3.5 Inactivation of the LHb during Reward Magnitude Discrimination

A separate group (n = 5) was trained on a reward magnitude discrimination, choosing between 4 vs. 1 reward pellets, both delivered with 100% certainty. LHb inactivation did not alter preference for the larger reward, which, in this instance, clearly had greater objective value (F(1, 4) = 2.98, n.s.; Figure 12). Choice latencies (t(4) = 0.35, n.s.) and trial omissions (t(4) = 0.76, n.s.) were also unaffected (Table 3). Thus, it is unlikely that the profound disruptions in cost/benefit decision-making induced by LHb inactivation can be attributed to motivational or discrimination deficits. Instead, the LHb contributes selectively to the evaluation of rewards that differ in terms of their relative costs and subjective values, but not to simpler preferences for larger vs. smaller rewards of equal cost.

Figure 12. LHb inactivation does not reduce preference for large, certain rewards. Rats trained on a reward magnitude discrimination task chose between a large and a small reward, both delivered with 100% certainty (n=5). In contrast to the other experiments, LHb inactivation did not alter preference for larger, cost-free rewards (F(1, 4) = 2.98, n.s.).

3.3.6 Analysis of Relative Value Representation

The fact that LHb inactivation reduced preference for the larger reward during the 100%/0 s delay blocks of the discounting tasks but not on the reward magnitude discrimination likely reflects differences in the relative value representation of the larger vs. smaller rewards that emerge after experience with these two types of tasks. This notion was supported by an analysis of forced-choice response latencies on the larger and smaller reward levers (Figure 13). Rats tested on the magnitude task showed greater response latencies when forced to select the smaller vs. the larger reward after both treatments. In contrast, for the discounting tasks, this difference during the forced-choice trials in the 100% or 0 s blocks was significantly muted or non-existent (task x response lever interaction for control: F(2, 24) = 6.90, p<0.01; for LHb inactivation: F(2, 24) = 5.53, p<0.05). Thus, rats trained on the magnitude discrimination viewed the 1-pellet option as substantially inferior to the 4-pellet option, whereas for those trained on the discounting tasks, this discrepancy was not as apparent, similar to previous findings (St. Onge, Stopper, et al., 2012). This may explain why preference for the larger reward during the no-cost blocks of the discounting tasks was more susceptible to disruption following LHb inactivation.

Figure 13. Comparison of forced choice latencies during cost/benefit decision-making versus reward magnitude discrimination. We analyzed response latencies to select the large and small rewards on the forced-choice trials for rats trained on the reward magnitude discrimination and compared them to the large and small reward forced-choice latencies displayed by rats performing the discounting tasks during the 100%/0-s delay blocks. If the larger reward was perceived as considerably "better" than the smaller one, rats should display faster response latencies when forced to choose the larger reward. Conversely, if the two options were perceived as more comparable (even during the 100% or 0 s delay blocks), the difference in response latencies should be diminished. Displayed are response latencies to press the large (black bars) or small reward lever (grey bars) after saline infusions (left) and after inactivation of the LHb (right) for rats trained on the reward magnitude discrimination, probabilistic discounting, or delay discounting tasks (hatched bars represent the difference between latencies). Under control conditions, rats trained on the discounting tasks showed a smaller or no difference in latencies to press the larger vs. smaller reward lever, compared to the large difference in latencies displayed by rats trained on the simpler magnitude discrimination (task x reward lever interaction, F(2, 24) = 6.90, p<0.01). Furthermore, following LHb inactivation, there were no differences in latencies to respond on the larger vs. smaller reward lever on the discounting tasks, but rats trained on the magnitude task continued to display a prominent difference on this measure (task x lever interaction, F(2, 24) = 5.53, p<0.05). *p<0.05, ***p<0.001; n.s., not significant.
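The relative-value index underlying this comparison is simply the difference in mean forced-choice latencies between the two levers; a minimal, hypothetical sketch (illustrative names and numbers only):

    from statistics import mean

    def relative_value_index(small_latencies, large_latencies):
        # Mean latency (s) to the small lever minus mean latency to the large
        # lever: a large positive gap suggests the large reward is seen as
        # clearly "better"; near zero suggests the options are seen as comparable.
        return mean(small_latencies) - mean(large_latencies)

    # e.g., magnitude-task rat: slow to the 1-pellet lever -> large gap (~1.57)
    print(relative_value_index([2.4, 2.1, 2.6], [0.8, 0.9, 0.7]))
    # e.g., discounting-task rat in the no-cost block: similar latencies (~0.03)
    print(relative_value_index([1.0, 1.1, 0.9], [0.9, 1.0, 1.0]))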
3.3.7 Reinforcer and Response Devaluation during Reward Magnitude Discrimination

It could be argued that the lack of effect of LHb inactivation on reward magnitude discrimination was due to rats responding in a habitual manner on this task, relative to the discounting tasks, which were clearly goal-directed. To address this, we conducted a subsequent behavioral experiment to determine whether choice behavior under these conditions was sensitive to reinforcer or response devaluation. A separate group of rats well-trained on the reward magnitude discrimination was given a reinforcer devaluation test (see Methods), which caused a marked reduction in choice of the larger reward (treatment: F(1, 7) = 8.78, p<0.05; treatment x block: F(3, 21) = 4.72, p<0.05; Figure 14a,b).
Animals still displayed a decreased preference for the larger reward on completed trials (t(7) = 2.51, p<0.05; Figure 14c), even when factoring out the increase in trial omissions that followed reinforcer devaluation (t(7) = 2.82, p<0.05; Figure 14d). Reinforcer devaluation also increased response latencies (t(7) = 3.10, p<0.05; Figure 14e). An additional response devaluation test was conducted two days later, during which responses on the large reward lever did not deliver reward. This manipulation also decreased preference for the lever formerly associated with the larger reward (F(1, 7) = 16.00, p<0.01; Figure 14f). Thus, because choice on the reward magnitude task was altered following devaluation of either the reinforcer or the response contingency, animals appear to have maintained a representation of the relative value of the two options and to have responded in a goal-directed, as opposed to a habitual, manner. As such, the lack of effect of LHb inactivation on this task, combined with the profound disruption in decision-making on the discounting tasks, renders it unlikely that these differential effects are attributable to differences in the contribution of the LHb to goal-directed vs. habitual behavior. Instead, these data add further support to the notion that the LHb plays a selective role in promoting choice biases in situations when larger rewards are tainted by some form of cost, but not in expressing a more general preference for larger vs. smaller rewards.

Figure 14. Choice behavior during reward magnitude discrimination is under goal-directed control. A separate group of rats was trained on the reward magnitude discrimination task in a manner identical to the rats that received LHb inactivation. a. Choice data over the first 8 days of training show the emergence of a preference for the larger reward option. On day 10 of training, rats were given a reinforcer devaluation test; one hour prior to the test session, rats received ad libitum access to the sweetened reward pellets in their home cages. b. During the reinforcer devaluation test, rats made fewer choices of the large reward option, relative to baseline performance on the preceding day (main effect of treatment, F(1, 7) = 8.78, p<0.05; treatment x block interaction, F(3, 21) = 4.72, p<0.05). c. When factoring out trial omissions, devaluation reduced the proportion of completed trials where rats selected the large reward (t(7) = 2.51, p<0.05). d. Trial omissions (t(7) = 2.82, p<0.05) and (e) choice latencies (t(7) = 3.10, p<0.05) were increased following reinforcer devaluation. Following this first test, rats were retrained for two additional days on the task under standard food restriction, after which they again selected the large reward on nearly every free-choice trial. On the following day, rats received a response devaluation test during which responding on the large reward lever no longer delivered reward (although selecting the other lever still yielded 1 reward pellet). f. This response devaluation caused rats to make fewer choices of the lever formerly associated with the larger reward (main effect of treatment, F(1, 7) = 16.00, p<0.01). These results indicate that choice behavior during the reward magnitude discrimination is unlikely to be under automatic, habitual control. *p<0.05 vs. baseline.

3.4 Discussion

These data provide the first evidence that the LHb serves as more than a simple aversive or "anti-reward" center. Temporary inactivation of the LHb caused well-trained animals to make erratic or random decisions when confronted with choices between a large, uncertain reward and a small, certain reward. As a group, the pattern of choice defaulted to indifference, as animals chose each option about half of the time regardless of the probability of receiving the larger reward. This pattern of indifference was conserved even when the probability of the large reward remained fixed throughout the session. Inactivation of the RMTg likewise made animals indifferent, suggesting that the LHb influences choice, at least in part, by activating this disynaptic pathway involving the RMTg. The RMTg inhibits, among other targets, VTA dopamine neurons.
Inactivation of the dorsal raphe, another important projection target of the LHb, did not influence probabilistic choice, suggesting that serotonin may be less relevant for the influence of the LHb on probabilistic choice. Delay discounting was also abolished by LHb inactivation, indicating that this region is integral for decision-making when evaluating various subjective costs. Importantly, inactivating the LHb did not erode the strong bias that animals had for large, certain rewards on a reward magnitude discrimination task. That the LHb is not necessary for performance on this simple, but still goal-directed, task supports the notion that the LHb is critical only for subjective preferences.

3.4.1 Objective vs. Subjective Reinforcement and Avoidance vs. Preference

In addition to encoding reward omission (Matsumoto & Hikosaka, 2007), LHb neurons are also excited by aversive outcomes (Matsumoto & Hikosaka, 2009). Matsumoto and Hikosaka (2011) also demonstrated that stimulation of the LHb can actually influence behavior. In a simple saccade task, monkeys were required to saccade both left and right to receive reward, and saccades in only one direction were paired with LHb stimulation. Response latencies to saccade to the stimulated side were significantly longer than those for saccades in the non-stimulated direction, suggesting that this LHb stimulation was aversive. The RMTg, which receives dense excitatory afferent input from the LHb (Jhou, Geisler, et al., 2009), also encodes aversive stimuli (Jhou, Fields, et al., 2009). Most RMTg neurons encode a negative reward prediction error signal identical to that of LHb neurons (Hong, Jhou, Smith, Saleem, & Hikosaka, 2011), suggesting that it is via the LHb-RMTg pathway that organisms encode both aversion and reward omission.

Using optogenetic stimulation, Stamatakis and Stuber (2012) isolated the LHb-RMTg pathway, demonstrating its importance for both passive and active avoidance behaviors. To examine passive avoidance, mice were allowed to move freely in a two-sided chamber. Occupation of one side of the chamber was paired with continuous 60 Hz stimulation of the LHb-RMTg pathway. Animals not only avoided the stimulation-paired side compared to controls but remained conditioned to avoid that side up to 7 days after any stimulation. Mice also learned to actively avoid LHb-RMTg stimulation: given the opportunity to nosepoke to interrupt continuous LHb-RMTg stimulation, stimulated animals continued to nosepoke over the course of a session while controls rarely did, demonstrating that animals were motivated to remove this stimulation. Additionally, when LHb-RMTg stimulation was paired with positive reinforcement (a sucrose reward), animals were less likely to nosepoke for reward.

The current experiments differ substantially from those of Stamatakis and Stuber (2012), revealing a more complex role for the LHb beyond simple aversion. First, the experiments we employed allowed the animal to choose between two options that differ in subjective value. Stamatakis and Stuber (2012) used tasks for which the course of action was more straightforward. In the case of passive avoidance, the decision to remain on the non-stimulated side of the chamber is a simple one, since there is nothing inherently rewarding about the other side to offset the cost of aversive LHb-RMTg stimulation. In the case of active avoidance, the animal must incorporate only the physical cost (albeit minimal) of nosepoking to remove LHb-RMTg stimulation.
Nonetheless, these are objective choices, and these experiments reveal that animals are motivated to eliminate the negative consequences of LHb-RMTg stimulation. In comparison, the experiments of this chapter involved choices between two options, both associated with reward. The options differ subjectively: while one reward is of greater magnitude, the other is of lower cost.

The current experiments also differed from those of Stamatakis and Stuber (2012) in the lack of behavioral contingency attached to LHb inactivation. Stamatakis and Stuber (2012) attached stimulation (or relief from stimulation) to specific actions performed (or avoided) by subjects. Therefore, stimulation of the LHb-RMTg pathway in this context served as the negative reinforcement or punishment shaping instrumental action. This is also true of another experiment used to examine the ability of LHb stimulation to bias behavior (Matsumoto & Hikosaka, 2011). In the absence of other motivational factors, one can imagine that any reinforcer of negative valence, regardless of its potentially low salience, would noticeably influence behavior. Conversely, animals in our experiments were trained to perform instrumental actions to obtain primary reinforcement (food) in the absence of LHb stimulation. Rather than trying to understand how animals alter their behavior to influence their LHb activation, we sought to understand how LHb activation influences behavioral biases. This approach allowed us to examine how LHb manipulation, independent of behavior, influences the motivational valence and salience attached to subjective rewards.

3.4.2 The LHb Modulates a Variety of Phasic Events to Influence Choice Biases

The LHb is most noted for its encoding of reward omission and punishment (Matsumoto & Hikosaka, 2007, 2009). This signal is believed to be the source of inhibition for dopamine neurons, allowing for the phasic "dips" that accompany reward omission (Schultz et al., 1997). However, it is important to note that the LHb also encodes other phasic events. Just as dopamine neurons display "dips" in response to reward omission, firing of LHb neurons is attenuated by unexpected reward (Matsumoto & Hikosaka, 2007). In addition to excitatory input (Floresco et al., 2003; Lodge & Grace, 2006a; Lokwan et al., 1999), this signal may provide a useful disinhibition of dopamine neurons that allows for behaviorally appropriate bursts.

A detailed examination of phasic dopamine reveals that it is sensitive to subjective reward value (Fiorillo et al., 2008; Kobayashi & Schultz, 2008; Roesch, Calu, & Schoenbaum, 2007; Schultz, 2013). Dopamine neurons respond to reward magnitude in a relative manner (Tobler et al., 2005). In the same neuron, rewards differing tenfold in magnitude can elicit the same response, or, alternatively, the same reward magnitude can produce different neuronal responses under different expectations (Tobler et al., 2005). The LHb is responsive to relative reward value, thus contributing to this encoding of subjective value (Bromberg-Martin & Hikosaka, 2011). When animals perform a task in which some level of reward is guaranteed, LHb neurons still differentially encode outcomes: an unpredicted large reward still causes a phasic dip, whereas an unexpected small reward causes a phasic burst.
This example has important implications for the probabilistic choice and delay discounting tasks used for the current experiments, since a large reward, small reward, and no reward are all potential outcomes.  Dopamine neurons respond to both uncertainty and expected utility (Fiorillo et al., 2003).  In response to informative cues, dopamine neurons encode expected utility; stimuli predicting a higher probability of reward elicit a greater response.  Complementing these responses to conditioned stimuli, dopamine responses to the unconditioned stimulus (reward) are greatest when the reward is least expected.  In addition, dopamine neurons display a gradual response between the conditioned stimulus and the unconditioned stimulus.  Rather than tracking expected utility monotonically, this response co-varies with uncertainty.  This gradual ramping-up of firing is greatest when reward is most uncertain (50%) and smallest when the outcome is maximally certain (0% or 100%).  Inactivation of the LHb would likely interfere with both of these signals, disrupting dopamine neurons' responses to both expected utility and uncertainty.

Dopamine encodes expected utility via opposing phasic signals (Schultz et al., 1997).  Reward-predictive cues and receipt of unexpected reward elicit phasic bursts.  Omission of an expected reward results in phasic dips – very brief, but complete, cessations of firing.  LHb neurons, which indirectly inhibit dopamine neurons, display the opposite pattern; they exhibit phasic bursts to reward omission and phasic dips to unexpected reward receipt (Matsumoto & Hikosaka, 2007).  Inactivation of the LHb in the current study presumably altered both of these important signals.  Not only are LHb bursts blocked; phasic dips cannot occur while LHb neuron firing is constantly depressed.  By extension, phasic dopamine responses elicited by cues or outcomes, likely driven in part by phasic LHb signals, would be severely compromised.  When options differ in subjective value, these various phasic signals would be necessary to assign relative value.  Based on the data presented here, it appears that in the absence of these signals, choice behavior becomes random.  That LHb inactivation causes random or indifferent choice suggests that this region influences many, if not all, of the phasic events occurring during subjective decision-making.  While it is well-demonstrated that phasic dopamine responds to various subjective aspects of reward magnitude, uncertainty, and delay, the current experiments suggest that modulation of many, if not all, of these signals exerts a direct influence on decision-making.

3.4.3 Summary and Conclusions

These findings reveal a previously uncharacterized role for the LHb in reward-related processing, in that it is critical for promoting choice biases during evaluation of the subjective costs and relative benefits associated with different actions.  Disruption of LHb signal outflow rendered animals unable to display any sort of preference between larger, costly rewards and smaller, cheaper ones.  Rather, they behaved as if they had no idea which option might be better for them, defaulting to an inherently unbiased and random pattern of choice, but only when the relative value of the larger reward was tainted by some sort of cost (uncertainty or delays).
LHb stimulation induces avoidance behaviors and suppresses reward-related responding, and phasic increases in LHb neural firing encode aversive or disappointing events (Matsumoto & Hikosaka, 2007, 2009).  As such, an emerging consensus is that the LHb conveys some form of aversive or "anti-reward" signal (Lammel et al., 2012; Stamatakis & Stuber, 2012).  Our findings call for a refinement of this view.  Indeed, suppression of LHb activity did not enhance responding for larger rewards, but instead disrupted expression of a subjective preference for rewards of different value.  In this regard, it is important to note that LHb neurons encode both aversive and rewarding situations via dynamic and opposing changes in activity.  Thus, while phasic increases in firing encode aversive/non-rewarded expectations or events, or smaller rewards, these fast-firing LHb neurons also show reduced activity in response to rewarding stimuli (Bromberg-Martin & Hikosaka, 2011; Bromberg-Martin, Matsumoto, Hong, & Hikosaka, 2010; Matsumoto & Hikosaka, 2007).  Our findings indicate that suppressing these differential signals, encoding expectation or occurrence of negative/positive events, renders a decision-maker incapable of determining which option may be "better".  As such, it is apparent that the LHb does not merely serve as an "anti-reward" center; more properly, this nucleus may be viewed as a "preference" center, whereby integration of differential LHb reward/aversion signals sets a tone that is crucial for expression of preferences for one course of action over another.  Expression of these subjective preferences is likely achieved through subsequent integration of these dynamic signals by regions downstream of the LHb, including the RMTg and midbrain dopamine neurons (Bromberg-Martin, Matsumoto, & Hikosaka, 2010; Jhou, Fields, et al., 2009).  Indeed, the LHb exerts robust control over the firing of dopamine neurons (Christoph et al., 1986; Ji & Shepard, 2007), and like the LHb, mesolimbic dopamine circuitry plays a preferential role in biasing choice towards larger, costly rewards but not between larger and smaller rewards of equal cost (Salamone et al., 1994; St. Onge, Stopper, et al., 2012).  Collectively, these findings suggest that the LHb, working in collaboration with other nodes of dopamine decision circuitry, plays a fundamental role in helping an organism make up its mind when faced with ambiguous decisions regarding the costs and benefits of different actions.  Activity within this evolutionarily-conserved nucleus aids in biasing behavior from a point of indifference toward committing to choices that may yield outcomes perceived as more beneficial.  Further exploration of how the LHb facilitates these functions may provide insight into the pathophysiology underlying psychiatric disorders associated with aberrant reward processing and LHb dysfunction, such as depression (Meng et al., 2011; Sartorius & Henn, 2007).

Chapter 4: Multiple Phasic Dopamine Signals Exert a Direct Influence on Risk-Based Decision-Making

4.1 Introduction

Dopamine activity in striatal, cortical, and limbic regions is essential for various forms of cost-benefit decision-making (Floresco, St. Onge, et al., 2008).  Subjective decisions often involve choices between options associated with differential uncertainty of reward.  Recordings from dopamine neurons of monkeys receiving both predictable and unexpected rewards led to a model of how these neurons encode reward uncertainty (Fiorillo et al., 2003; Schultz et al., 1997).
Data in support of this theory demonstrate that midbrain dopamine neurons can encode RPEs: they initially fire to reward delivery but, through conditioning, transition to firing in response to predictive cues.  Bursts of dopamine neuron firing occur when a reward is delivered unexpectedly, and phasic dips in firing occur when an expected reward is not delivered.  Most investigations into the role of dopamine in decision-making under uncertainty have focused on the VTA (Fiorillo et al., 2003; Schultz et al., 1997; Schultz, 2010) or on important efferent structures receiving a strong dopaminergic signal.  Studies employing microdialysis, fast-scan cyclic voltammetry, and drug microinfusion have highlighted the importance of dopamine in the NAc (St. Onge, Ahn, et al., 2012; Stopper, Khayambashi, & Floresco, 2013; Sugam et al., 2012) and PFC (St. Onge et al., 2011; St. Onge, Ahn, et al., 2012), among other regions, in regulating risk-based decision-making.  St. Onge, Ahn et al. (2012) used microdialysis to measure changes in tonic dopamine concentration during probabilistic discounting.  In the NAc, tonic dopamine integrated information about long-term changes in reward uncertainty and the availability of preferred rewards.  Sugam et al. (2012) measured changes in phasic dopamine in the NAc while animals chose between a small-certain and a large-uncertain reward.  Compared to forced choices of the non-preferred reward, forced choices of the preferred reward and free choices elicited greater phasic dopamine increases.  This indicates that cue-evoked phasic dopamine signals the availability of the preferred reward, regardless of whether it is ultimately chosen.  Reward-evoked phasic dopamine encoded an RPE, with the largest increase accompanying uncertain large rewards.  Reward omission during risky losses caused a small dip in phasic dopamine.  Through direct manipulation of various phasic signals, the current experiments test whether these signals exert a direct influence over decision-making.

It is clear that fluctuations in both tonic and phasic dopamine signaling aid in guiding behavior during risk-reward judgments.  However, what remains unclear is how afferent regulation of the VTA influences how dopamine neurons encode uncertainty.  The PPTg (Floresco et al., 2003; Lodge & Grace, 2006a; Lokwan et al., 1999), LDTg (Lodge & Grace, 2006b), and mPFC (Overton, Tong, & Clark, 1996; Tong, Overton, & Clark, 1996) have all been implicated in driving excitatory activity of midbrain dopamine neurons and allowing a switch from tonic to burst-firing mode.  However, an afferent inhibitory circuit responsible for phasic dips in dopamine had, until recently, remained elusive.  The discovery that single-pulse stimulation of the LHb causes a brief inhibition of VTA dopamine neurons (Christoph et al., 1986; Ji & Shepard, 2007) introduced this structure as a key node that may serve to suppress midbrain dopamine neuron firing.  Recordings from LHb neurons in monkeys revealed that they exhibit a negative reward prediction error (nRPE), firing bursts in response to reward omission and exhibiting phasic dips in response to unexpected reward (Bromberg-Martin & Hikosaka, 2011; Matsumoto & Hikosaka, 2007, 2009).  These neurons also encode relative reward magnitude: when reward is associated with both options, phasic bursts are elicited by the smaller reward and phasic dips by the larger one (Bromberg-Martin & Hikosaka, 2011).
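Although the cited studies describe these signals verbally, the prediction error they report is conventionally formalized as the temporal-difference error of reinforcement learning.  The following is a standard textbook formulation offered for illustration, not an equation drawn from the sources above:

\[
\delta_t = r_t + \gamma \hat{V}(s_{t+1}) - \hat{V}(s_t)
\]

Here \(r_t\) is the reward received at time \(t\), \(\hat{V}(s)\) is the learned value estimate of state \(s\), and \(\gamma\) is a discount factor.  A positive \(\delta_t\) (an unexpected reward) corresponds to a phasic dopamine burst, a negative \(\delta_t\) (an omitted reward) to a phasic dip, and the LHb/RMTg nRPE described above behaves approximately as the sign-inverted signal, \(-\delta_t\).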
As the LHb is composed mainly of glutamatergic neurons (Kiss et al., 2002), the mechanism by which it inhibits VTA dopamine neurons was not known until the recent discovery of the RMTg (Jhou, Geisler, et al., 2009).  Situated posterior to the VTA, and alternatively referred to as the tail of the VTA (tVTA), the RMTg is a small nucleus comprised almost exclusively of GABAergic neurons that, among other connections, receive input from the LHb and project to VTA dopamine neurons.  Similar to the LHb, RMTg neurons are activated by aversive stimuli and inhibited by rewards or reward-predictive stimuli (Hong et al., 2011; Jhou, Fields, et al., 2009).

While the responses of VTA and LHb neurons to reward uncertainty have been well-characterized (Matsumoto & Hikosaka, 2007, 2009), the manner in which phasic bursts and dips of dopamine neural activity may aid in guiding action selection during risk/reward judgments remains unclear.  Pharmacological manipulations of dopamine transmission in terminal regions do not permit the precision needed to ascertain the specific contribution of phasic dopamine signaling to these processes, as they disrupt both phasic and tonic dopamine signaling non-selectively.  Alternatively, more precise manipulation of phasic events in these brain regions during risk-based decision-making can provide valuable insights into how these RPE and nRPE signals ultimately drive choice behavior.  As a means to study this, animals trained on a probabilistic, or risk-based, decision-making task were implanted with stimulating electrodes in the LHb or downstream in the RMTg or VTA.  Animals with electrodes in the LHb or RMTg received stimulation following different types of rewarded outcomes or prior to choices, whereas those with VTA electrodes were stimulated following unrewarded trials.  Thus, we were able to investigate whether these temporally-precise artificial disruptions could override natural phasic dopamine signals to bias choice behavior.

A functional contribution of the LHb to the nRPE was clarified by the discovery that LHb neurons inhibit spontaneous VTA dopamine neuron firing (Christoph et al., 1986; Ji & Shepard, 2007).  This signal is the likely driving force behind phasic dips in dopamine neuron firing.  However, it remains to be confirmed whether activation of the LHb can also suppress increased firing of VTA dopamine neurons driven by afferent excitatory input.  Midbrain dopamine neurons also receive excitatory input from a variety of cortical and subcortical sources that facilitate bursting (Grace et al., 2007).  Thus, in an ancillary study, we investigated whether increases in dopamine neural activity induced by stimulating excitatory inputs to the VTA could be occluded by activation of the LHb.  Specifically, we recorded from VTA dopamine neurons that were excited by stimulation of either the PPTg or mPFC.  Concurrent stimulation of the LHb during recordings from these neurons could then determine whether the inhibitory influence of LHb stimulation is dwarfed by excitatory input or whether it can indeed neutralize or abolish evoked activity.

4.2 Methods

4.2.1 Neurophysiological Studies

4.2.1.1 Experimental Subjects

Male Long Evans rats (Charles River Laboratories, Montreal, Canada) weighing 300-500 g at the time of the experiment were used.  Animals were group housed upon arrival for a minimum of one week before the experiment.  They were provided with food and water ad libitum.
4.2.1.2 Surgery, Extracellular Recordings, and Cell-Searching Procedures

Rats were anesthetized with chloral hydrate (400 mg/kg) and implanted with a jugular catheter for intravenous (IV) administration of chloral hydrate throughout the experiment.  The animal was then mounted in a stereotaxic frame with the incisor bar set at 3.3 mm.  Body temperature was maintained at 37°C with a temperature-controlled heating pad.  The scalp was incised and holes were drilled in the skull overlying the LHb, VTA, PPTg, and mPFC.  After each burr hole was drilled, the dura was resected.  Concentric bipolar stimulating electrodes (SND 1000; David Kopf Instruments) were implanted in the LHb using the following stereotaxic coordinates (flat skull): AP = -3.8 mm (bregma); ML = +0.8 mm; DV = -4.7 mm (cortex) (Paxinos & Watson, 2005).  Two additional stimulating electrodes were implanted to evoke burst firing of VTA dopamine neurons, placed in the PPTg (AP = -8.0 mm (bregma); ML = +1.8 mm; DV = -6.4 mm (cortex)) and mPFC (AP = +3.2 mm (bregma); ML = +0.7 mm; DV = -3.8 mm (cortex)).

Single-barrel extracellular recording microelectrodes were constructed from 2.0-mm-outer-diameter borosilicate capillary tubing using a vertical micropipette puller.  The tips of the electrodes were broken back against a glass rod to ~1 µm tip diameter and filled with 1 M NaCl containing 2% Pontamine sky blue dye.  The in vivo impedance of the microelectrodes ranged from 5 to 10 MΩ.  Recording electrodes were lowered into the VTA with a hydraulic microdrive (coordinates: 5.1-5.5 mm posterior to bregma, 0.6-1.0 mm lateral to the midline, 6.5-8.5 mm ventral to the brain surface).  The electrode signal was amplified and filtered (50-1000 Hz) using an X-Cell3+ microelectrode amplifier (FHC).  Action potential data were acquired, discriminated from noise, stored, and analyzed using Spike2 software (Cambridge Electronic Design) running on an Intel-based personal computer with a data acquisition board interface (micro 1401 mk II; Cambridge Electronic Design).

After the glass microelectrode had been lowered to the dorsal border of the VTA, a cell-searching procedure began.  Spontaneously active dopamine neurons were sampled by making up to 9 vertical passes of the electrode (200 µm apart) through the dopamine cell region with the low end of the filter set at 100 Hz.  Putative dopamine neurons were identified using established electrophysiological criteria, characterized by a long-duration action potential, often with a break between the initial segment and somatodendritic spike components (Grace & Bunney, 1983; Ungless & Grace, 2012).  Once a putative dopamine neuron was isolated, the low-end filter was reduced to 50 Hz to more accurately assess the waveform, and 30 s of spontaneous activity was recorded to document the cell's firing rate.

Once the baseline activity of the dopamine neuron had been recorded, the LHb received pulse-train stimulation by way of cathodal constant-current pulses (0.2 ms duration) through an Iso-Flex optical isolator via a Master-8 programmable pulse generator (A.M.P.I.) using the parameters noted below.  Pulse trains consisted of 20 or 4 pulses delivered at 100 Hz with a current of 700 µA.  We stimulated the LHb with enough pulse trains to generate peri-stimulus time histograms (PSTHs).  From these histograms, we could determine whether the individual neuron was inhibited by LHb train stimulation.
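As a concrete illustration of how such group PSTHs can be assembled, the sketch below follows the binning and normalization described in the Data Analysis section that follows (1 ms bins, spike counts pooled across sweeps and cells, then converted to instantaneous frequency).  The function and variable names are illustrative, not taken from the original analysis code:

```python
import numpy as np

def group_psth(spike_times_by_sweep, window_ms=(-50.0, 250.0), bin_ms=1.0):
    """Group peri-stimulus time histogram pooled over cells and sweeps.

    spike_times_by_sweep: list of 1-D arrays of spike times in ms, each
    aligned so that t = 0 is the onset of the LHb train.
    Returns (bin_edges_ms, rate_hz), where rate_hz is spikes per sweep
    per 1 ms bin divided by 0.001 s, i.e. instantaneous frequency in Hz.
    """
    edges = np.arange(window_ms[0], window_ms[1] + bin_ms, bin_ms)
    counts = np.zeros(len(edges) - 1)
    for sweep in spike_times_by_sweep:
        hist, _ = np.histogram(sweep, bins=edges)
        counts += hist
    rate_hz = counts / len(spike_times_by_sweep) / (bin_ms / 1000.0)
    return edges, rate_hz

# Baseline: mean of the 50 one-millisecond bins preceding the train (t < 0).
# edges, rate = group_psth(sweeps); baseline_hz = rate[:50].mean()
```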
When a dopamine neuron was inhibited by LHb train stimulation, we then assessed whether this stimulation could suppress increases in firing evoked by either the PPTg or mPFC, two sites that promote burst firing of VTA dopamine neurons.  Stimulation was delivered to one of these regions via single-pulse stimulation (700 µA for PPTg, 1000 µA for mPFC).  In cases where stimulation of either the PPTg or mPFC increased firing of the dopamine neuron, we then ran sweeps in which the LHb and the excitatory input were stimulated simultaneously, to determine whether LHb train stimulation (4 pulses, 100 Hz) was capable of preventing or occluding the evoked burst firing.  Stimulation of the PPTg or mPFC was timed to occur 10 ms after the final LHb pulse.

4.2.1.3 Histology

Iron deposits were made at stimulation sites by passing direct current (150 µA, 20 s) through the stimulating electrode.  In addition, direct current (30 µA, 45 min) was passed through the recording electrode to eject Pontamine sky blue dye into the recording site.  Brains were removed and fixed in formalin containing 0.1% potassium ferrocyanide for at least 24 h.  After completion of behavioral testing, rats were euthanized in a carbon dioxide chamber.  Brains were removed and fixed in a 4% formalin solution.  The brains were frozen and sliced in 50 μm sections before being mounted and stained with Cresyl Violet.  Placements were verified with reference to the neuroanatomical atlas of Paxinos and Watson (2005).

4.2.1.4 Data Analysis

The primary dependent measures of interest were firing rate and duration of inhibition.  Duration of inhibition was used as a measure of how LHb train stimulation suppressed dopamine neuron firing.  In addition to the length of inhibition, the onset of inhibition following the final pulse of the train was also recorded.  When a clear monosynaptic response to stimulation was observed, the change in firing probability was used to measure the ability of LHb stimulation to attenuate firing.  If the evoked response consisted of bursts or was possibly polysynaptic, firing rates were compared to determine the efficacy of LHb stimulation.  Group data were compiled and represented as average firing rate (Hz).  The data were divided into 1 ms bins.  For each bin, the total number of spikes for all cells across all sweeps for that condition was summed.  This total was divided by the total number of sweeps for all cells and represented as an instantaneous frequency in Hz (divided by 0.001 s).  The group peri-stimulus time histograms were plotted with each bar representing one of these 1 ms bins.  For each condition, the 50 ms prior to stimulation was displayed as the baseline.  The mean group baseline firing rate was calculated by averaging each of these 50 1-ms bins.

4.2.2 Behavioral Studies

4.2.2.1 Experimental Subjects

Male Long Evans rats (Charles River Laboratories, Montreal, Canada) weighing 250-300 g at the beginning of training were used.  On arrival, rats were given 1 week to acclimatize to the colony and were then food restricted to 85-90% of their free-feeding weight for an additional week before behavioral training, with ad libitum access to water for the duration of the experiment.  Feeding occurred in the rats' home cages at the end of the experimental day, and body weights were monitored daily.  All testing was in accordance with the Canadian Council on Animal Care and the Animal Care Committee of the University of British Columbia.
4.2.2.2 Operant Apparatus

Behavioral testing was conducted in twelve operant chambers (30.5 x 24 x 21 cm; Med Associates, St Albans, VT, USA) enclosed in sound-attenuating boxes.  The boxes were equipped with a fan that provided ventilation and masked extraneous noise.  Each chamber was fitted with two retractable levers, one located on each side of a central food receptacle where food reinforcement (45 mg; Bioserv, Frenchtown, NJ, USA) was delivered by a pellet dispenser.  The chambers were illuminated by a single 100-mA house light located in the top center of the wall opposite the levers.  Four infrared photobeams were mounted on the sides of each chamber, and another photobeam was located in the food receptacle.  Locomotor activity was indexed by the number of photobeam breaks that occurred during a session.  All experimental data were recorded by personal computers connected to the chambers through an interface.

4.2.2.3 Lever Press Training

Our initial training protocols were identical to those of St. Onge and Floresco (2009), as adapted from Cardinal and Howes (2005).  On the day before their first exposure to the operant chamber, rats were given approximately 25 food reward pellets in their home cage.  On the first day of training, 2-3 pellets were delivered into the food cup and crushed pellets were placed on a lever before the animal was placed in the chamber.  Rats were first trained under a fixed-ratio 1 schedule to a criterion of 60 pellets in 30 min, first for one lever, and then repeated for the other lever (counterbalanced left/right between subjects).  They were then trained on a simplified version of the full task.  These 90-trial sessions began with the levers retracted and the operant chamber in darkness.  Every 40 s, a trial was initiated with the illumination of the house light and the insertion of one of the two levers into the chamber, and the rat was required to press it within 10 s of its insertion.  Failure to press the lever resulted in its retraction, no food delivery, and the chamber reverting to darkness.  Responding on a lever delivered a pellet with 50% probability.  This procedure was used to familiarize the rats with the probabilistic nature of the full task.  In every pair of trials, the left and right levers were each presented once, and the order within the pair of trials was random.  Rats were trained for 3-5 days to a criterion of 80 or more successful trials (i.e., < 10 omissions).
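The contingencies of a single training trial can be summarized in a few lines of illustrative code.  This is a sketch of the protocol just described; the function name and its signature are hypothetical, not part of the original task software:

```python
import random

def simplified_training_trial(responded_within_10_s: bool) -> str:
    """One trial of the simplified 90-trial training session described above.

    The houselight is illuminated and a single lever is inserted; the rat
    has 10 s to press before the trial is scored as an omission.
    """
    if not responded_within_10_s:
        return "omission"  # lever retracts, chamber darkens, no food
    # A press is reinforced with 50% probability, familiarizing the rat
    # with the probabilistic nature of the full task.
    return "pellet" if random.random() < 0.5 else "no reward"
```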
4.2.2.4 Probabilistic Discounting

Risk-based decision-making was assessed with a probabilistic discounting task modified from a task described previously (Floresco & Whelan, 2009; Ghods-Sharifi et al., 2009).  Instead of the four different probability blocks used in the original task (100, 50, 25, and 12.5%), the current experiments used only two probabilities – 50 and 12.5%.  The 100% block was eliminated to avoid repeated and reliable LHb stimulation early in the task, which could associate the larger reward with aversion.  Rats received daily sessions consisting of 60 trials, separated into two blocks of 30 trials.  The free-choice portion of each probability block consisted of 20 free-choice trials – double the number of free-choice trials per block on the original four-probability task.  This increased the likelihood that animals would select each option more often per block, since in the experiments manipulating outcome-related signals, stimulation occurred only after a subset of choices and/or outcomes.  The entire session took 40 min to complete, and the animals were trained 5-7 days per week.  Since repeated stimulation can be detrimental, this modified version shortens the task without sacrificing free-choice trials.  A session began in darkness with both levers retracted (the intertrial state).  A trial began every 40 s with the illumination of the house light and the insertion of one or both levers into the chamber.  One lever was designated the large-uncertain lever, the other the small-certain lever, and these designations remained consistent throughout training (counterbalanced left/right).  If the rat did not respond within 10 s of lever presentation, the chamber was reset to the intertrial state until the next trial (omission).  When a lever was chosen, both levers retracted.  Choice of the small/certain lever always delivered one pellet with 100% probability; choice of the large/risky lever delivered four pellets, but with a particular probability.  After a response was made and food delivered, the house light remained on for another 4 s, after which the chamber reverted to the intertrial state until the next trial.  Multiple pellets were delivered 0.5 s apart.  Each of the two blocks consisted of ten forced-choice trials where only one lever was presented (five trials for each lever, randomized in pairs), permitting animals to learn the amount of food associated with each lever press and the respective probability of receiving reinforcement over each block.  This was followed by 20 free-choice trials, where both levers were presented and the animal chose between the small/certain and the large/risky lever.

The probability of obtaining four pellets after pressing the large/risky lever was varied systematically across the two blocks: 50% for the first block and 12.5% for the second block.  Thus, when the probability of obtaining the 4-pellet reward is 50%, this option is more advantageous.  At 12.5%, the small/certain lever is the more advantageous option in the long term (a worked example follows this section).

Stimulation experiments were conducted to determine whether manipulation of phasic responses biases choice by changing reward valuation.  In separate groups of animals, probe tests were administered to determine whether choice is biased by directly manipulating reward contingencies within a session.  These animals were well-trained on the standard task, and no stimulation was delivered during probe tests.  Three different reward probe tests were conducted.  For one probe condition, reward was always omitted when the risky option was selected, with the expectation that preference should shift away from the risky option during the session.  For the second test, the small-certain reward was omitted – this was presumed to increase risky choice.  The same animals were used for both of these probe tests.  Following LHb stimulation tests, these animals were retrained for 5 days.  Half received one probe test first, and the other half received the other probe test first.  After the first probe test, animals were retrained on the task for 3 days and then received the other probe test.  The third probe increased the large-reward probability to 100%, a manipulation expected to increase risky choice.  Animals receiving this probe test did not receive either of the other probe tests.
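To make "advantageous" concrete, the long-run payoff of each lever follows from simple expected-value arithmetic.  This worked example uses only the task parameters given above; it is an illustration, not an analysis reported with the experiments:

\[
\mathbb{E}[\text{large/risky}] = p \times 4 \text{ pellets}, \qquad
\mathbb{E}[\text{small/certain}] = 1.0 \times 1 \text{ pellet}
\]
\[
p = 0.5:\ \mathbb{E}[\text{large/risky}] = 2 \text{ pellets} > 1 \text{ pellet}; \qquad
p = 0.125:\ \mathbb{E}[\text{large/risky}] = 0.5 \text{ pellets} < 1 \text{ pellet}.
\]

Thus the risky lever maximizes long-run intake during the 50% block, whereas the certain lever does so during the 12.5% block.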
4.2.2.5 Reward Magnitude Discrimination

As we have done in other studies (Ghods-Sharifi et al., 2009; St. Onge, Stopper, et al., 2012), we determined a priori that if stimulation reduced preference for the large/risky option, we would assess how this manipulation altered reward magnitude discrimination.  This was done to confirm whether the reduced preference for the risky option was due to a general reduction in preference for larger rewards.  A separate group of animals was trained and tested on an abbreviated task consisting of 48 trials divided into four blocks, each with 2 forced- and 10 free-choice trials.  As with the discounting task, choices were between a large (four-pellet) option and a small (one-pellet) option.  However, the probability of reinforcement for both options was held constant at 100% across blocks.

4.2.2.6 Stereotaxic Surgery and Brain Stimulation Protocol

Rats were trained on their respective tasks until they displayed stable levels of choice, after which they were provided food ad libitum for 1-3 days and were then subjected to surgery.  Rats were anaesthetized with 100 mg/kg ketamine hydrochloride and 7 mg/kg xylazine and implanted with a unilateral bipolar stimulating electrode (0.15 mm diameter) within a concentric 26 G stainless steel guide cannula aimed at the LHb (flat skull: AP = -3.8 mm (bregma); ML = +/-0.8 mm; DV = -4.7 mm (cortex)), VTA (AP = -5.3 mm (bregma); ML = +/-0.8 mm; DV = -8.3 mm (cortex)), or RMTg (implanted at a 10° angle in the ML plane; AP = -7.0 mm (bregma); ML = +/-1.0 mm; DV = -7.6 mm (cortex)) using standard stereotaxic techniques.  Electrodes were held in place with stainless steel screws and dental acrylic.  Rats were given at least 7 days to recover from surgery before testing.  During this period, they were handled at least 5 min each day and were food restricted to 85% of their free-feeding weight.

Rats were subsequently trained on their respective task for at least 5 days, until the group displayed stable levels of choice behavior for 3 consecutive days.  One to two days before their first stimulation test day, rats were connected to the stimulation apparatus via a cable running from the headcap to a commutator above the operant chamber, and a mock stimulation procedure was conducted.  The animal was allowed to move freely while completing its operant session, but no stimulation was administered.  The day after displaying stable discounting, the group received its first stimulation test day.

A within-subjects design was used for all experiments.  Stimulation was achieved by delivering a 100 Hz train of cathodal constant-current pulses (0.3 ms duration, 200 µA amplitude) using a PHM-15X intracranial stimulation current stimulator (Med Associates, St. Albans, VT) interfaced with the Med-PC software via Z-pulses.  These stimulation parameters were used for two reasons.  First, similar patterns of stimulation have been shown to be effective at altering certain forms of reinforcement learning (Shumake, Ilango, Scheich, Wetzel, & Ohl, 2010).  Second, the 100 Hz bursts were used in an attempt to mimic the elevated firing rates that accompany reward-related bursts.  In response to reward omission, LHb and RMTg neurons can briefly fire well above 100 Hz, sometimes even reaching firing rates in excess of 150 Hz (Hong et al., 2011; Matsumoto & Hikosaka, 2007).  Although VTA neurons do not fire at 100 Hz, this frequency was used in the current experiments to override phasic suppression of dopamine neurons.
One group of animals implanted with LHb electrodes was given up to three types of outcome-contingent stimulation tests, presented in varying order.  "Risky win" stimulation occurred on trials when the animal selected the large-uncertain option and received four sucrose pellets.  Four trains, each lasting 200 ms, were delivered to coincide with delivery of each of the four reward pellets.  "Small-certain" stimulation occurred on all trials when the animal chose the small-certain option.  As only one pellet was delivered, a single 400 ms train was initiated in conjunction with pellet delivery.  To ascertain the temporal specificity of our outcome-contingent LHb stimulation, intertrial interval (ITI) stimulation was used as a temporal comparison to "risky win" stimulation.  ITI stimulation consisted of four 200 ms trains delivered 500 ms apart, only following a risky win.  However, instead of coinciding with pellet delivery, these trains were initiated randomly during the ITI, at least 6 s, but no more than 14 s, following the trial's conclusion.  A separate group of rats received stimulation prior to free-choice trials (pre-choice).  Here, stimulation was delivered immediately before each free-choice trial: 1 s after the houselight was illuminated, a single 400 ms train was delivered that terminated 600 ms before the extension of the levers.  Animals with RMTg electrodes received only "risky win" stimulation, in a manner similar to LHb stimulation.  Animals implanted with VTA electrodes were stimulated following risky losses.  Thus, on trials when animals selected the large/risky lever and did not receive reward, a 200 ms train was delivered immediately after the choice.  The timing of these stimulation conditions is summarized in the sketch below.
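The following structure restates the stimulation schedules compactly.  The dictionary keys and field names are invented for this sketch; all parameter values are taken from the description above:

```python
# Summary of the brain-stimulation conditions described in the text.
STIM_FREQ_HZ = 100      # pulse frequency within each train
PULSE_WIDTH_MS = 0.3    # cathodal constant-current pulses
CURRENT_UA = 200        # stimulation amplitude

stimulation_conditions = {
    "LHb_risky_win":     {"trains": 4, "train_ms": 200,
                          "timing": "with each of the 4 pellets"},
    "LHb_small_certain": {"trains": 1, "train_ms": 400,
                          "timing": "with single pellet delivery"},
    "LHb_ITI":           {"trains": 4, "train_ms": 200,
                          "timing": "6-14 s after a risky win, 500 ms apart"},
    "LHb_pre_choice":    {"trains": 1, "train_ms": 400,
                          "timing": "1 s after houselight, ends 600 ms before levers"},
    "RMTg_risky_win":    {"trains": 4, "train_ms": 200,
                          "timing": "with each of the 4 pellets"},
    "VTA_risky_loss":    {"trains": 1, "train_ms": 200,
                          "timing": "immediately after an unrewarded risky choice"},
}
```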
Stimulation test days were separated by 1-3 "mock" test days during which the animal's headcap was connected to the commutator during the training session, but no stimulation was delivered.  The mock day immediately preceding a stimulation test day was used as the baseline comparison for that stimulation test.  For rats receiving outcome-contingent stimulation of the LHb, each individual animal was given as many types of stimulation tests as possible.  For each type of stimulation test, animals received two such tests separated by 2-5 days of baseline training.  In most instances, the data from the two stimulation test days of the same type were averaged, as were the baseline data collected from each of the mock days preceding those stimulation days.  In some of these experiments, there was a considerable rate of attrition as a result of headcaps being damaged during testing sessions.  As such, not all animals received all stimulation tests, and some animals received only one instance of a particular type of test.

4.2.2.7 Histology

The same histology procedures as for the neurophysiological studies were followed.

4.2.2.8 Data Analysis

The primary dependent measure of interest was the proportion of choices directed towards the large-reward lever (i.e., large/risky or large/delayed) for each block of free-choice trials, factoring in trial omissions.  For each block, this was calculated by dividing the number of choices of the large-reward lever by the total number of successful trials.  Choice data were analyzed using two-way, within-subjects ANOVAs, with treatment and probability block as within-subjects factors.  For the probabilistic discounting task, the effect of trial block was always significant (p < 0.05) and will not be reported further.  Response latencies (the time elapsed between lever insertion and subsequent press) for both tasks were analyzed with a similar ANOVA model.  Locomotor activity (i.e., photobeam breaks) and the number of trial omissions were analyzed with one-way repeated-measures ANOVAs.

We conducted a supplementary analysis to further clarify whether changes in choice behavior were due to alterations in sensitivity to reward (win-stay performance) or to negative feedback (lose-shift performance) (Bari et al., 2009; St. Onge et al., 2011; St. Onge, Stopper, et al., 2012; Stopper & Floresco, 2011).  Animals' choices during the task were analyzed according to the outcome of each preceding trial (reward or non-reward) and expressed as a ratio.  The proportion of win-stay trials was calculated from the number of times a rat chose the large-uncertain lever after choosing the risky option on the preceding trial and obtaining the large reward (a win), divided by the total number of free-choice trials on which the rat obtained the larger reward.  Lose-shift performance was calculated from the number of times an animal shifted to the small-certain lever after choosing the risky option on the preceding trial and not receiving reward (a loss), divided by the total number of free-choice trials resulting in a loss.  This analysis was conducted for all trials across the trial blocks.  We could not conduct a block-by-block analysis of these data because there were many instances where rats did not obtain the large reward at all during the latter block.  Changes in win-stay performance were used as an index of reward sensitivity, whereas changes in lose-shift performance served as an index of negative feedback sensitivity.
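These two ratios map directly onto a short computation.  The following is a minimal sketch consistent with the definitions above, assuming free-choice trials are encoded as (choice, rewarded) pairs in session order; this encoding is an assumption for illustration, not the format used by the original analysis software:

```python
def win_stay_lose_shift(trials):
    """trials: list of (choice, rewarded) tuples for free-choice trials in
    session order; choice is 'risky' or 'certain', rewarded is True/False.
    Returns (win_stay, lose_shift) proportions as defined in the text."""
    wins = losses = win_stay = lose_shift = 0
    for (prev_choice, prev_rewarded), (next_choice, _) in zip(trials, trials[1:]):
        if prev_choice != 'risky':
            continue                          # both ratios condition on a prior risky choice
        if prev_rewarded:                     # a risky win
            wins += 1
            win_stay += (next_choice == 'risky')
        else:                                 # a risky loss
            losses += 1
            lose_shift += (next_choice == 'certain')
    return (win_stay / wins if wins else float('nan'),
            lose_shift / losses if losses else float('nan'))
```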
4.3 Results

4.3.1 Neurophysiological Studies

For the electrophysiology studies, animals anesthetized with chloral hydrate were implanted with a glass micropipette recording electrode in the VTA (Figure 15a) and a bipolar stainless steel stimulating electrode in the ipsilateral LHb (Figure 15b).  To search for spontaneously-active dopamine neurons, a microdrive was used to gradually lower the recording electrode through the dorsal-ventral axis of the VTA until a dopamine neuron was located (the electrophysiological properties used to identify a dopamine neuron were previously described by Grace and Bunney (1983)).  Allowing for direct comparison with the behavioral studies, extracellular recordings were taken from VTA dopamine neurons during the delivery of train stimulation (20 pulses) to the LHb at 100 Hz.

Figure 15. VTA dopamine neuron firing is influenced by LHb train stimulation. All histograms display an average of all cells included in each analysis.  a. Location of recording sites in the VTA.  b. Location of stimulation sites in the LHb.  c. In half of the responsive dopamine neurons, 100 Hz, 20-pulse train stimulation of the LHb inhibited firing.  In each of these cells, a complete cessation of firing occurred for at least 54 ms (mean inhibition = 84 ms).  d. In the other half of responsive dopamine neurons, this same stimulation protocol evoked a more complex response.  While these cells gradually became inhibited over the course of the 190 ms train, they were sharply excited at the end of the train, followed by a lengthy inhibition (mean inhibition = 336 ms).  e. Four-pulse train stimulation of the LHb provides reliable inhibition of dopamine neurons.  In 5 of 6 dopamine cells, 100 Hz, 4-pulse stimulation of the LHb inhibited these cells for a mean of 110 ms.

Ten dopamine neurons with stable spontaneous activity (mean firing rate = 5.5 Hz) were stimulated with 20-pulse, 100 Hz LHb trains.  Of these 10 cells, the majority (8) were noticeably affected by the LHb train.  Half (4) responded to the LHb train with a brief, but complete, cessation of firing beginning by the end of the train and lasting for at least 50 ms (mean inhibition = 84 ms; mean baseline (50 ms pre-train) firing rate = 6.3 +/-1.2 Hz; t(3) = 32.98, p<0.0001; Figure 15c).  However, the other half (4) of the responsive cells were excited by the LHb train (Figure 15d).  This unexpected effect may reflect rebound from prolonged inhibition, since these cells were gradually attenuated across the 190 ms train and were excited only at the conclusion of the train.  Additionally, these cells were inhibited for at least 150 ms (mean inhibition = 336 ms) following the initial excitation period.  As subsequent experiments aimed to determine whether LHb trains could attenuate evoked firing, we searched for a parameter for the LHb train that would reliably result in inhibition immediately after the train.  In 6 cells from 3 animals, 100 Hz trains with only 4 pulses were delivered.  This stimulation parameter inhibited spontaneous activity in five VTA dopamine neurons for ~100 ms (mean inhibition = 110 ms; mean baseline (50 ms pre-train) firing rate = 3.5 +/-0.9 Hz; t(4) = 3.95, p<0.05; Figure 15e).

In addition to the LHb stimulating electrode, some animals were also implanted with stimulating electrodes in the PPTg and mPFC.  Alternating single-pulse stimulation of the PPTg and mPFC was applied to find spontaneously-active dopamine neurons that exhibited evoked activity as well.  Seven cells from 4 animals (mean baseline firing rate = 3.7 Hz) were excited by single-pulse PPTg stimulation (mean evoked firing probability = 47%; Figure 16a,b).  Evoked firing was monosynaptic and occurred at latencies as short as 5 ms.  In 6 of these neurons, the evoked response was attenuated or entirely inhibited by the 4-pulse, 100 Hz LHb train (mean evoked firing probability = 7%; F(1, 5) = 69.51, p<0.001; Figure 16c).  Of these 6 responsive neurons, 4 were entirely inhibited.

Five different cells from 2 animals (mean baseline firing rate = 3.4 Hz) were excited by single-pulse mPFC stimulation (Figure 16d,e).  This excitation, while also monosynaptic, occurred at a slightly longer latency (16 ms) than PPTg excitation.  The evoked response was typically characterized by rapid double- or triple-spiking, making a simple analysis of evoked firing probability problematic.  By separating the post-stimulation period into 5 ms epochs, it was determined that the firing rate of these neurons was elevated for about 100 ms (9.2 Hz), with the greatest increase occurring within 30 ms of stimulation (17 Hz).  In 4 of these neurons, the evoked response (as well as any spontaneous activity) was completely inhibited by LHb train stimulation (mean inhibition = 96 ms; mean baseline (50 ms pre-train) firing rate = 10.8 +/-1.9 Hz; t(3) = 7.00, p<0.01; Figure 16f).  Thus, these data show that train stimulation of the LHb is not only effective at suppressing spontaneous dopamine neuron activity, but can also occlude firing evoked by excitatory inputs from either the PPTg or the mPFC.

Figure 16. LHb train stimulation inhibits or attenuates evoked dopamine neuron firing. All histograms display an average of all cells included in each analysis.  a. Location of stimulation sites in the PPTg.
b. Single-pulse stimulation of the PPTg monosynaptically evokes dopamine neuron firing (47% firing probability).  c. Stimulation of the LHb with 100 Hz, 4-pulse trains attenuates this evoked firing (7% firing probability).  d. Location of stimulation sites in the mPFC.  e. Single-pulse stimulation of the mPFC causes excitatory bursts of dopamine neuron firing.  f. 100 Hz, 4-pulse train stimulation of the LHb inhibits this firing (mean inhibition = 96 ms).

4.3.2 Behavioral Studies

4.3.2.1 LHb Stimulation

Rats receiving LHb stimulation were trained on the probabilistic discounting task for an average of 23 days before being implanted with an electrode in the LHb, retrained on the task, and given stimulation tests.  When possible, animals with LHb electrodes were given multiple types of stimulation tests, separated by baseline retraining days until stable patterns of responding were reestablished.  A total of 11 rats with acceptable placements (4 left, 7 right; Figure 17a) were stimulated following risky wins (rewarded selections of the large-uncertain option; Figure 17b).  Analysis of the choice data revealed a significant main effect of stimulation (F(1, 10) = 11.25, p<0.01) but no stimulation x trial block interaction (F(1, 10) = 1.63, n.s.).  Specifically, stimulation following a risky win decreased choice of the large-uncertain reward (Figure 17c).  Stimulation did not affect response latencies (F(1, 10) = 0.04, n.s.; Table 4), locomotion (F(1, 10) = 1.82, n.s.; Table 4), or trial omissions (F(1, 10) = 0.71, n.s.; Table 4).  Placements outside the LHb were analyzed to assess the anatomical specificity of this effect (Figure 17a, grey squares).  Stimulation on the hippocampal and thalamic borders after a risky win did not significantly bias risky choice (F(1, 2) = 0.32, n.s.; Figure 17c inset).

We further analyzed the proportion of "win-stay" and "lose-shift" trials to determine whether the decrease in risky choice induced by stimulation following a risky win could be attributed to altered reward or negative feedback sensitivity, respectively.  We hypothesized that "win-stay" tendencies would be impacted, as stimulation occurred only on those trials on which the animal received the uncertain reward.  Indeed, LHb stimulation following a risky win decreased win-stay tendencies (Control = 70 +/-5%; Stimulation = 39 +/-8%; F(1, 10) = 7.49, p<0.05; Figure 17d).
Interestingly, and somewhat unexpectedly, this same stimulation increased lose-shift tendencies (Control = 45 +/-5%; Stimulation = 58 +/-9%; F(1, 10) = 5.03, p<0.05).  Thus, stimulation of the LHb during reward receipt following a risky win made rats more risk-averse following any risky choice; whether they won or lost, rats shifted toward the small, certain option.

Figure 17. LHb stimulation immediately following rewarded outcomes biases risky choice. a. Placements of stimulation electrodes in the LHb.  Accurate placements are displayed as black circles, and inaccurate placements are displayed as grey squares.  b. For "risky win" stimulation, 20 pulses were delivered at 100 Hz coinciding with the delivery of each of the 4 pellets.  c. Stimulation of the LHb immediately following a risky win decreases choice of the large-uncertain option.  Stimulation just outside the LHb did not influence choice (inset).  d. LHb stimulation after a risky win decreases reward sensitivity and increases loss sensitivity.  This manipulation increased the likelihood of a safe choice after a risky loss (lose-shift) and also decreased the likelihood of a risky choice after a risky win (win-stay).  e. Omitting the large reward completely also decreases risky choice.  In a separate group of animals not receiving LHb stimulation, a probe test was given during which the large reward was always omitted on uncertain choices.  This manipulation biased choice similarly to when other animals received LHb stimulation after a risky win.  f. For "small, certain" stimulation, 40 pulses were delivered at 100 Hz coinciding with the single pellet.  g. Stimulation of the LHb immediately following a small-certain choice increases risky choice.  There was a significant stimulation x trial block interaction such that LHb stimulation with a small reward increased risky choice only when the probability of reward on the large-uncertain option was 12.5%.  h. Omitting the small reward also increases risky choice.  In a separate group of animals not receiving LHb stimulation, a probe test was given during which the small reward was never delivered.  This manipulation biased choice similarly to when other animals received LHb stimulation during receipt of a small reward.  For discounting graphs, probability blocks are divided into early (first 10) and late (last 10) free-choice trials.  (* p<0.05; ** p<0.01)

Table 4. Mean (±SEM) locomotor activity, response latencies, and omissions. *p<0.05, **p<0.01 vs. control

                          Locomotor activity    Response latency    Omissions
                          (beam breaks/min)     (s)                 (no. of trials per session)
LHb
  Risky win
    Control               41 (4)                1.0 (0.1)           1.3 (0.7)
    Stimulation           47 (7)                1.0 (0.2)           1.8 (0.5)
  Small reward
    Control               38 (3)                0.9 (0.2)           1.4 (1.0)
    Stimulation           42 (5)                1.4 (0.2) *         5.3 (2.2)
  ITI after risky win
    Control               32 (2)                1.1 (0.2)           2.8 (1.0)
    Stimulation           42 (3) **             1.3 (0.2)           4.8 (1.2)
  Reward magnitude
    Control               32 (3)                1.0 (0.1)           0.2 (0.1)
    Stimulation           42 (3) *              3.0 (0.4) **        2.4 (0.4) **
  Pre-choice
    Control               33 (5)                0.6 (0.1)           0.6 (0.2)
    Stimulation           53 (9) **             2.0 (0.3) **        3.7 (0.7) **
RMTg
  Risky win
    Control               19 (3)                1.5 (0.2)           5.4 (2.3)
    Stimulation           19 (3)                3.0 (0.8)           17.0 (9.2)
VTA
  Risky loss
    Control               32 (2)                0.9 (0.1)           1.6 (0.5)
    Stimulation           31 (3)                0.9 (0.3)           2.4 (1.9)

Ten animals previously used for the various stimulation experiments were retrained on the probabilistic discounting task for 5 days before receiving a probe test during which the large reward was never delivered.  This behavioral probe test did not involve any brain stimulation.  Omission of the large reward entirely during this probe test significantly decreased choice of this reward, in a manner similar to stimulation of the LHb following a risky win (F(1, 9) = 30.85, p<0.0005; Figure 17e).

As a comparison, 8 rats with acceptable placements (4 left, 4 right; Figure 17a) received stimulation of the LHb after receiving the small-certain reward (Figure 17f).  Analysis of the choice data revealed a significant main effect of stimulation (F(1, 7) = 5.75, p<0.05; Figure 17g), as stimulation during receipt of the small/certain reward increased risky choice.  As well, there was a significant stimulation x block interaction (F(1, 7) = 11.39, p<0.05); specifically, stimulation increased risky choice only during the 12.5% block (p<0.05).  Stimulation during a small/certain reward significantly increased response latencies (F(1, 7) = 5.80, p<0.05; Table 4) but had no effect on locomotion (F(1, 7) = 3.37, n.s.; Table 4) or trial omissions (F(1, 7) = 2.53, n.s.; Table 4).
Ten animals previously used for stimulation experiments were retrained on the probabilistic discounting task for 5 days before receiving a probe test during which the small reward was never delivered.  Just as stimulation of the LHb following receipt of a small reward had increased risky choice, omission of the small reward during this probe test significantly increased choice of the large/risky option (F(1, 9) = 5.19, p<0.05; Figure 17h).

Further analysis of outcome sensitivity for the stimulated animals revealed no significant effect on win-stay tendencies (Control = 52 +/-6%; Stimulation = 66 +/-8%; F(1, 7) = 2.21, n.s.) or on lose-shift tendencies (Control = 45 +/-6%; Stimulation = 41 +/-7%; F(1, 7) = 2.53, n.s.).  Thus, increased risk preference was not a result of altered sensitivity to one specific type of uncertain outcome.

To assess the importance of precise timing of stimulation, 8 rats with acceptable placements (5 left, 3 right; Figure 17a) were stimulated 6-14 s after risky wins (Figure 18a).  In contrast to the effects observed when stimulation occurred immediately following a risky win, delayed stimulation during the ITI after risky wins did not disrupt risky choice (F(1, 7) = 0.47, n.s.; Figure 18b).  This manipulation also did not produce a significant stimulation x block interaction (F(1, 7) = 1.47, n.s.).  Just as ITI stimulation did not bias choice, it had no effect on other measures related to choice, as response latencies (F(1, 7) = 0.60, n.s.; Table 4) and trial omissions (F(1, 7) = 2.80, n.s.; Table 4) were unaltered.  However, general locomotor activity was increased by ITI stimulation (F(1, 7) = 22.72, p<0.005; Table 4).

A group of 7 rats (4 left, 3 right; Figure 17a), separate from those trained on the probabilistic discounting task, was trained on the reward magnitude discrimination task.  For this simpler task, subjects chose between a small-certain option (1 pellet, 100%) and a large-certain option (4 pellets, 100%).  Unlike stimulation following uncertain large rewards, stimulation following reliable, cost-free large rewards did not disrupt choice biases (Figure 18c).  Animals had a clear preference for large, certain rewards, and stimulation during delivery of these rewards did not alter this preference (F(1, 6) = 3.98, n.s.; Figure 18d).

Figure 18. Uncertainty and precise timing are necessary for outcome-contingent stimulation to bias choice. a. LHb stimulation during the ITI was administered the same way as for a risky win, but rather than delivering each pulse train in conjunction with pellet delivery, stimulation was delayed.  The 4 trains, occurring at 0.5 s intervals, began after a randomized delay, 6-14 s following a risky win.  b. LHb stimulation delayed after a risky win does not decrease risky choice.  The lack of effect indicates that LHb stimulation must be precisely timed to the receipt of the unexpected reward.  c. A separate group of animals was trained on a reward magnitude discrimination task in which animals chose between a small (1 pellet) and large (4 pellet) reward that were both reliably delivered 100% of the time.  LHb stimulation on the reward magnitude task was the same as for "risky win" stimulation on the probabilistic discounting task, with trains delivered in conjunction with each of the 4 pellets.  d. LHb stimulation did not decrease preference for cost-free large rewards.  Stimulation of the LHb during delivery of expected large rewards did not decrease choice of this option.
Thus, stimulation of the LHb decreases preference only for unexpected large rewards, not for large rewards in general.

This manipulation did, however, have a pronounced effect on auxiliary measures.  Stimulation greatly increased hesitation to respond (F(1, 6) = 22.44, p<0.005; Table 4), trial omissions (F(1, 6) = 32.03, p<0.005; Table 4), and locomotor activity (F(1, 6) = 11.51, p<0.05; Table 4).  These results suggest that while choice itself was unaffected, the stimulation condition was behaviorally active.

To determine whether stimulation of the LHb before a choice was effective in biasing decision-making, 8 rats with acceptable placements (4 left, 4 right; Figure 17a) received stimulation immediately before all free choices (Figure 19a).  Stimulation significantly decreased risky choice (F(1, 6) = 9.50, p<0.05; Figure 19b).  There was a trend toward significance for the stimulation x trial block interaction (F(1, 6) = 4.77, p=0.07), a result of decreased risky choice only during the 50% block (p<0.0005).  This effect on choice was accompanied by a marked increase in response latencies (F(1, 6) = 14.22, p<0.01; Table 4), locomotion (F(1, 6) = 17.20, p<0.01; Table 4), and trial omissions (F(1, 6) = 17.39, p<0.01; Table 4).

Closer inspection of the individual data from this experiment revealed that under baseline conditions, 2 rats showed a strong preference for the risky option across the entire session.  The remaining 5 rats showed more optimal shifts in choice that tracked the utility of each option, selecting the 4-pellet option more during the 50% block and the small/certain option more when the large-reward odds were 12.5%.  Despite the relatively small number of subjects, we re-analyzed the choice data incorporating group (risky-preferring versus optimizers) as an additional between-subjects factor.  This analysis revealed a significant group x stimulation x block interaction (F(1, 5) = 13.37, p<0.05).  As displayed in Figure 19c, pre-choice LHb stimulation markedly reduced choice of the risky option across all trial blocks in risky-preferring rats (p<0.05).  In comparison, LHb stimulation in optimizer rats reduced risky choice relative to baseline conditions during the 50% block (p<0.05; Figure 19d).  During the 12.5% block, these rats selected the small/certain option more under baseline conditions.  Here, pre-choice LHb stimulation actually caused a slight increase in preference for the large/risky option (baseline = 22 +/-3%; stimulation = 29 +/-3%), although this effect was not statistically significant (p=0.128).

Figure 19. LHb stimulation immediately before choices makes animals averse to advantageous probabilistic reward. a. Pre-choice LHb stimulation consisted of a single train of 40 pulses delivered 1 s after the houselight was illuminated, 1 s before the levers were extended.  b. LHb stimulation before choice decreases risky choice.  Stimulation decreased risky choice in the 50% trial block, when the large-uncertain option was more advantageous.  c. In animals with a baseline preference for the large-uncertain option during the 12.5% block, stimulation decreased risky choice across both blocks (n = 2).  d. In the other animals (n = 5), stimulation tended to increase risky choice during the 12.5% block (p<0.15).  e. This manipulation specifically decreased reward sensitivity, as win-stay percentage decreased while lose-shift percentage was unchanged.  (* p<0.05)
Taken together, the findings from this experiment suggest that during risk/reward decision-making, suppression of pre-choice phasic increases in dopamine signaling (via LHb stimulation) interferes with action selection directed towards more preferred options, particularly when those options are associated with larger, uncertain rewards.

Stimulation prior to choice had a specific influence on reward sensitivity.  This manipulation decreased win-stay behavior (Control = 69 +/-5%; Stimulation = 36 +/-9%; F(1, 6) = 9.15, p<0.05; Figure 19e).  However, lose-shift behavior was unaffected by stimulation (Control = 51 +/-4%; Stimulation = 51 +/-7%; F(1, 6) = 0.00, n.s.), indicating that this manipulation did not influence negative feedback sensitivity.

4.3.2.2 RMTg Stimulation

LHb neurons project to VTA dopamine neurons directly or disynaptically, via the GABAergic RMTg.  To assess the possibility that the effects of LHb stimulation were attributable to inhibitory influences via the LHb-RMTg-VTA pathway, 5 rats (2 left, 3 right; Figure 20a) implanted with electrodes in the RMTg were trained and tested on the probabilistic discounting task.  Stimulation of the RMTg following a risky win (Figure 20b) significantly decreased risky choice (F(1, 4) = 9.52, p<0.05; Figure 20c).  While this manipulation influenced choice behavior, it did not influence response latencies (F(1, 4) = 5.69, n.s.; Table 4), trial omissions (F(1, 4) = 1.98, n.s.; Table 4), or locomotion (F(1, 4) = 0.05, n.s.; Table 4).  RMTg stimulation after a risky win did not influence lose-shift behavior (Control = 50 +/-9%; Stimulation = 56 +/-14%; F(1, 4) = 0.44, n.s.).  Win-stay behavior was decreased (Control = 53 +/-6%; Stimulation = 34 +/-16%), though this effect did not reach significance (F(1, 4) = 2.44, p=0.19), likely attributable to the low number of subjects in the analysis.

Figure 20. RMTg stimulation after risky wins decreases risky choice. a. Placements of RMTg stimulation electrodes.  Accurate placements are displayed as black circles, and inaccurate placements are displayed as grey squares.  b. Identically to the LHb, the RMTg was stimulated following a risky win, with pulse trains delivered in conjunction with each of the 4 pellets.  c. RMTg stimulation during delivery of unexpected rewards decreased risky choice.  (* p<0.05)

4.3.2.3 VTA Stimulation

The experiments described above demonstrate that stimulation of the LHb during risky wins biases choice away from the large-uncertain option.  The RMTg, while having various other afferent and efferent connections, consists of GABAergic neurons that receive excitatory input from the LHb and can inhibit VTA dopamine neurons.  The observation that RMTg stimulation produces the same effect as LHb stimulation supports the possibility that the effects of LHb stimulation are mediated, at least in part, by disynaptic inhibition of dopamine neurons.  It is well established that reward omissions result in phasic dips in dopamine neuron activity, both in simpler Pavlovian settings and in more complex forms of behavior involving risk/reward decision-making (Schultz, 1997; Sugam et al., 2012).  However, the manner in which these dips may affect choice behavior is still unclear.  Thus, it was of interest to determine whether artificial increases in phasic dopamine, via stimulation of the VTA expected to override the phasic dips linked to reward omissions, might shift biases toward the large-uncertain option following risky losses.
A total of 9 rats with acceptable placements (6 left, 3 right; Figure 21a) received stimulation of the VTA immediately following unrewarded choices of the large-uncertain option (Figure 21b).  Stimulation of the VTA immediately following a risky loss significantly increased risky choice (F(1, 8) = 9.19, p<0.05; Figure 21c), an effect that was independent of probability (stimulation x block interaction: F(1, 8) = 0.71, n.s.).  While this manipulation impacted choice behavior itself, it did not influence response latencies (F(1, 8) = 0.07, n.s.; Table 4), trial omissions (F(1, 8) = 0.25, n.s.; Table 4), or locomotion (F(1, 8) = 0.3, n.s.; Table 4).  These data suggest that temporally-precise phasic activation of dopamine neurons, which presumably would override phasic dips associated with reward omissions, shifts choice biases towards larger, uncertain rewards.  By extension, this suggests that phasic dips in dopamine neuron activity convey important short-term information about non-rewarded actions that in turn affects the subsequent direction of choice.

Figure 21. VTA stimulation following risky losses increases risky choice. a. Placements of stimulating electrodes in the VTA.  Accurate placements are displayed as black circles and inaccurate placements are displayed as grey squares.  b. In contrast to the LHb and RMTg, the VTA was stimulated following risky losses.  A single 20-pulse train was delivered immediately after animals chose risky and received no reward.  c. Stimulation of the VTA following risky losses increased overall choice of the large-uncertain option.  d. Reliable delivery of the large reward increases risky choice.  In a separate group of animals not receiving VTA stimulation, a probe test was given, during which the large reward was always delivered on uncertain choices.  This manipulation biased choice similarly to when other animals received VTA stimulation after a risky loss.  e. Stimulation specifically decreased loss sensitivity, as lose-shift percentage decreased while win-stay percentage was unchanged. (p<0.05)

Nine different animals previously used for stimulation experiments were retrained on the probabilistic discounting task for 5 days before receiving a probe test during which the large reward was always delivered (no brain stimulation occurred).  During this probe test, rats selected the risky option more, in a manner similar to that induced by stimulation of the VTA following a risky loss (F(1, 8) = 4.17, p=0.075; Figure 21d).  Subsequent analysis of stimulated animals for outcome sensitivity revealed a significant decrease in lose-shift tendencies (Control = 51 +/-8%; Stimulation = 37 +/-6%; F(1, 8) = 6.49, p<0.05; Figure 21e) but no effect on win-stay behavior (Control = 61 +/-10%; Stimulation = 66 +/-9%; F(1, 8) = 0.47, n.s.).  This profile suggests a selective decrease in negative TDE feedback sensitivity, a logical result given that stimulation occurred only when animals were denied reward.
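For clarity, the win-stay and lose-shift measures reported throughout these experiments follow the standard definitions: the percentage of risky wins followed by another risky choice, and the percentage of risky losses followed by a switch to the certain lever.  The sketch below illustrates this computation; the data format and function are illustrative assumptions, not the analysis code actually used here.

# Illustrative computation of win-stay / lose-shift percentages from a
# sequence of trials. Each trial records whether the risky lever was
# chosen and whether the large reward was delivered.
# (Hypothetical data format; not this thesis's actual analysis code.)

def win_stay_lose_shift(trials):
    """trials: list of (chose_risky, rewarded) boolean tuples."""
    win_stay, wins, lose_shift, losses = 0, 0, 0, 0
    for (chose_risky, rewarded), (next_risky, _) in zip(trials, trials[1:]):
        if not chose_risky:
            continue  # both measures are conditioned on risky choices
        if rewarded:                      # risky win
            wins += 1
            win_stay += next_risky        # stayed with the risky lever
        else:                             # risky loss
            losses += 1
            lose_shift += not next_risky  # shifted to the certain lever
    pct = lambda n, d: 100.0 * n / d if d else float("nan")
    return pct(win_stay, wins), pct(lose_shift, losses)

# Toy sequence: win -> stay, loss -> shift, loss -> stay
trials = [(True, True), (True, False), (False, False), (True, False), (True, True)]
print(win_stay_lose_shift(trials))  # (100.0, 50.0)

In this toy sequence, the single risky win is followed by another risky choice (win-stay = 100%), while only one of the two risky losses is followed by a switch to the certain lever (lose-shift = 50%).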
4.4 Discussion

Electrophysiological investigations revealed that pulse-train stimulation of the LHb inhibits or attenuates externally-evoked dopamine neuron firing.  This suggests that LHb stimulation could override the phasic bursting of VTA dopamine neurons that is known to occur in response to unexpected rewards.  We found that stimulation of the LHb after risky wins on a probabilistic discounting task decreases risky choice.

Schultz et al. (1997) discovered that phasic firing of dopamine neurons encodes a reward prediction error signal.  Monkeys received a juice reward which, through extended training, became predicted by a conditioned stimulus.  As the monkeys learned the predictive value of this cue, the phasic dopamine burst, initially elicited by the reward, shifted to the cue.  Subsequent violations of this expected cue-reward relationship resulted in phasic responses during the reward phase.  Delivery of an unexpected reward (one not preceded by the cue) caused a phasic burst.  Omission of an expected reward resulted in a phasic dip, a very brief but complete cessation of firing.

Expanding on these findings, Sugam et al. (2012) examined phasic dopamine changes during a risky decision-making task in which animals chose between a small-certain and a large-uncertain reward.  During forced-choice trials, phasic dopamine release to reward-predictive cues was greater for the preferred reward.  During free-choice trials, cue-evoked dopamine release signaled the availability of the preferred option.  During the reward phase, phasic dopamine encoded a reward prediction error signal.  Reward-evoked dopamine responses were greatest for risky wins and moderate for receipt of the small-certain reward.  Dopamine dipped slightly following risky losses.  The authors concluded that phasic dopamine encodes the subjective value of future rewards, possibly influencing future decisions.  The data from Chapter 2 indirectly support the idea that these phasic responses are necessary to guide choice (Stopper et al., 2013).  Manipulation of the D1 receptor in the NAc, known to be sensitive to phasic transmission, biased risky choice.  However, we know that both tonic and phasic dopamine signals track risk-based decision-making (St. Onge, Ahn, et al., 2012; Sugam et al., 2012).  Prior to the current experiments, we did not know how specific phasic bursts or dips guide risky choices.

We used LHb stimulation as a tool to override phasic dopamine bursts.  Previous studies indicate that single-pulse stimulation causes a brief, but complete, cessation of dopamine neuron firing (Christoph et al., 1986; Ji & Shepard, 2007).  The present study employed high-frequency trains in order to mimic the way these neurons would fire during reward-related bursts (Matsumoto & Hikosaka, 2007).  These high-frequency trains were capable of temporarily inhibiting dopamine neuron firing.  Our neurophysiology experiments also demonstrated that high-frequency LHb train stimulation not only inhibited spontaneous activity but, importantly, also suppressed burst-like firing evoked by activation of excitatory afferent inputs.  These afferent regions are critical for transitioning dopamine neurons into a burst-firing mode (Chergui et al., 1994; Floresco et al., 2003; Grace & Bunney, 1984a; Lodge & Grace, 2006a; Lokwan et al., 1999).  Presumably, these LHb trains would also inhibit naturally-occurring dopamine bursts caused by reward.

Outcome-contingent LHb stimulation was effective at biasing choice.  Stimulation after risky wins decreased risky choice.  Conversely, stimulation after small-certain rewards increased risky choice.  Non-stimulation reward omission probe tests produced choice biases similar to stimulation tests.  On a probe when the large reward was never delivered, risky choice decreased, similar to LHb stimulation after a risky win.
When the small-certain reward was always omitted, risky choice increased, similar to stimulation after receiving this reward.  This collection of results suggests that LHb stimulation may trick the animal into acting as if it has not received a reward.  Although there is a substantial interval between choices, stimulation received in conjunction with reward outcomes still biased subsequent choices, indicating that the influence of this signal persists for some time after these pathways are activated.  Furthermore, stimulation during the inter-trial interval after risky wins did not bias choice.  This finding indicates that stimulation must precisely override reward-evoked dopamine bursts.

A separate group of animals received stimulation following large rewards on a reward magnitude discrimination task.  There was a clear baseline preference for large-certain rewards, and LHb stimulation did not erode this bias.  This highlights the importance of phasic dopamine responses specifically for rewards that differ in subjective value.  Phasic inhibition of dopamine does not diminish preference for cost-free, objectively greater rewards.  This is consistent with previous findings showing that dopamine depletion or receptor blockade in the NAc does not impair reward magnitude discrimination (Salamone et al., 1994).  These results suggest that brief reward-evoked phasic bursts are important for biasing future choices between rewards of differing subjective value.

Direct stimulation of the VTA was used to override phasic dips elicited by reward omission.  When paired with risky losses, VTA stimulation increased risky choice.  In a non-stimulation probe test, delivering the large reward with 100% certainty also increased risky choice.  This pairing of results suggests that VTA stimulation after risky losses may fool the animal into acting as if it has won, when in reality it received no reward.  Therefore, phasic dips in dopamine seem to encode losses and bias future choices away from risky options.

Sugam et al. (2012) found that cue-evoked phasic dopamine responses signal the preferred reward on free choices.  To determine if this signal influences choice, we administered LHb stimulation trains immediately before all free choices.  This manipulation greatly reduced risky choice, particularly during the 50% block, when baseline preference was for the large-uncertain reward.  In a small subset of animals with a baseline preference for the large-uncertain reward on the 12.5% block, pre-choice stimulation caused a more pronounced decrease in risky choice on this trial block.  These data demonstrate that pre-choice phasic bursts are necessary to direct choice toward preferred larger rewards.

Tonic dopamine transmission is also modulated during risky decision-making.  St. Onge, Ahn, et al. (2012) used microdialysis to measure changes in tonic dopamine concentrations in the NAc during probabilistic discounting.  Regardless of the progression of probabilities within a session, tonic dopamine concentrations increased as a function of reward probability.  In a yoked task where animals received equivalent rewards without making choices, NAc dopamine tracked the amount of reward received by the animal.  However, dopamine was slightly higher during the choice task than the yoked task, specifically on the 50-12.5% blocks, when reward was uncertain.
Comparing the slope for dopamine concentration with the slopes for choice and food revealed that dopamine more accurately tracks choice behavior than the amount of food received.  On the choice task, dopamine concentration was greater during free choices than forced choices.  This echoes phasic findings indicating that dopamine tracks the potential availability of preferred rewards (Day et al., 2010; Gan et al., 2010; Sugam et al., 2012).  St. Onge, Ahn, et al. (2012) proposed a complex and multi-faceted role for tonic dopamine transmission in the NAc with regard to risky decision-making: this signal integrates reward uncertainty, reward preferences, active choice behavior, and changes in reward availability.

These data mesh with the current findings to show that phasic and tonic dopamine both encode important reward information, on different timescales.  Tonic dopamine maintains a stable and slowly-changing representation of average outcomes, while the phasic signal helps update preferences on a choice-by-choice basis.  The current data demonstrate that both pre-choice and outcome-contingent phasic responses are critical for risk-based decision-making.  Through precise timing, these brief signals exert a powerful influence over subsequent choices.  Via phasic signaling, individual reward prediction errors can rapidly update reward valuation.

Chapter 5: Discussion

This thesis examined the subcortical modulation and utilization of dopamine to guide risk-based decision-making.  Among the numerous terminal regions through which dopamine exerts its influence in the brain, one of its most prominent projections is to the nucleus accumbens (see Figure 22).  The importance of dopamine in the NAc stems not only from the relatively dense dopaminergic innervation of this region but also from the fact that striatal regions permit a prominent segregation between tonic and phasic modes of transmission (Grace et al., 2007).  The NAc directly influences goal-directed action, owing to its input from a variety of cortical and limbic regions that are pre- and post-synaptically regulated by dopamine (Grace et al., 2007).  For the first aim, selective dopamine agonists and antagonists were infused into the NAc.  Blockade of D1 receptors decreased risky choice on the probabilistic discounting task.  Stimulation of D1 receptors optimized risky choice: the D1 agonist increased choice of the larger reward when it was highly probable but decreased risky choice when it was disadvantageous.  Neither D2 stimulation nor blockade influenced risky choice.  As well, quinpirole, a compound with relatively similar affinity for D2 and D3 receptors, was without effect.  However, the agonist PD 128907, with more selective affinity for D3 receptors, decreased risky choice.  These results indicate that the effects of dopamine in the NAc on risk-based decision-making are mainly mediated via the D1 receptor.  This may have implications for the specific inputs to the NAc that are potentiated and for the importance of the tonic/phasic balance in mediating this behavior.

The lateral habenula regulates important midbrain regions that distribute various neurotransmitters, including dopamine, serotonin, norepinephrine, and acetylcholine (Lecourtier & Kelly, 2007).  Of these targets, its inhibition of dopamine neurons in the VTA has drawn considerable attention (Ji & Shepard, 2007; Matsumoto & Hikosaka, 2007).
This inhibition is responsible for LHb neurons transmitting a negative reward prediction error signal (Matsumoto & Hikosaka, 2007), which has caused some to view the LHb as an aversion or “anti-reward” center (Stamatakis & Stuber, 2012).  Despite a growing body of literature highlighting the encoding of reward omission by LHb neurons, these cells seem to encode a variety of nuanced phasic events (Bromberg-Martin & Hikosaka, 2011).  Only a few studies have looked at inactivation or lesion of the LHb (Lecourtier et al., 2008; Lecourtier, Neijt, & Kelly, 2004; Lecourtier & Kelly, 2005), and prior to the current experiments the contribution of the LHb to decisions involving reward uncertainty was not understood.  Using GABA agonists, I inactivated the LHb of animals well-trained on the probabilistic discounting task.  These animals, but not animals receiving inactivation of closely adjacent brain regions, became erratic in their decision-making.  As a group, LHb-inactivated animals became indifferent in choices between large-uncertain and small-certain rewards, regardless of the relative advantageousness of the options.  That LHb inactivation did not alter choice on a reward magnitude discrimination task suggests that the LHb is not required for simple objective choices between large and small rewards.  However, LHb inactivation also created an indifferent profile of choice in animals trained on a delay discounting procedure, indicating that the LHb is important in maintaining choice biases when larger rewards are associated with any subjective cost.  Thus, while others have asserted that the LHb is an “anti-reward” center, Chapter 3 proposes that the LHb may operate more as a “pro-preference” center.

Figure 22. Schematic of the role of dopamine in the neural circuit regulating risk/reward decision-making. The mOFC tempers urges toward risky rewards.  Activation of the mPFC allows animals to change their choice bias as the odds of winning change.  This tendency is influenced by changes in dopamine acting on D2 receptors, and exerts a powerful top-down influence over the BLA.  Afferent input from the BLA to the NAc directs preference toward large, risky rewards, particularly when they are advantageous long-term.  Stimulation of D1 receptors in the NAc (likely by phasic bursting) promotes this strategy.  The LHb may exert phasic inhibition of dopamine, allowing animals to express subjective preferences during risk/reward decision-making.  Arrow colors indicate the nature of these connections: white indicates excitatory (glutamate) transmission, green indicates dopamine transmission, and black represents disynaptic inhibition (via the RMTg).

The influence of the LHb on behavior in situations that present reward uncertainty may be attributable to its influence on phasic dopamine signaling (Matsumoto & Hikosaka, 2007).  As such, I was interested to determine if artificially manipulating phasic dopamine bursts and dips, with LHb or VTA stimulation, could bias reward preference.  By administering a brief pulse train to the VTA immediately following a risky loss, I determined that substituting a phasic burst of dopamine, when a dip would normally occur, biases choice toward the large-uncertain reward.  Likewise, stimulation of the LHb immediately following a risky win decreased risky choice, likely by inhibiting a phasic dopamine burst.  The RMTg, which receives excitatory input from the LHb, consists of GABAergic neurons that inhibit midbrain targets, including VTA dopamine neurons.
That RMTg stimulation had the same effect as LHb stimulation suggests that this effect is at least partially mediated via disynaptic inhibition of dopamine neurons.  Delayed LHb stimulation during the inter-trial interval following a risky win did not affect choice, highlighting the importance of timing for this phasic effect.  Electrophysiological findings further supported the ability of LHb stimulation to occlude dopamine bursts.  In addition to inhibiting spontaneous dopamine neuron firing, short LHb trains also attenuated or inhibited phasic bursts induced by PPTg or mPFC stimulation.

5.1 Mesoaccumbens Dopamine and Risk-Based Decision-Making

St. Onge and Floresco (2009) discovered an important role for dopamine in risk-based decision-making using systemic drug manipulations.  D1 and D2 receptors contributed similarly: blockade of either receptor made animals risk-averse, and stimulation of either receptor made animals riskier.  The experiments of Chapter 2 demonstrate that dopamine in the NAc modulates risk-based decision-making in a receptor-selective manner.  Neither stimulation nor blockade of D2 receptors influenced decision-making on a probabilistic discounting task, but risky choice was biased by manipulation of D1 receptors.  D1 antagonism decreased risky choice across all probability conditions.  Stimulation of D1 receptors optimized decision-making, steepening discounting to better account for changes in expected utility.

The unique effect of D1 stimulation in the NAc is likely a result of the way the NAc uses dopamine receptors to select particular afferent inputs (Grace et al., 2007).  In the NAc, D2 receptor stimulation attenuates mPFC input, and D2 blockade increases mPFC input (Goto & Grace, 2005b; O’Donnell & Grace, 1994; West & Grace, 2002).  Stimulation of D1 receptors potentiates hippocampal input (West & Grace, 2002).  Differential modulation of these inputs by dopamine is important for the learning and behavioral flexibility necessary for goal-directed action.  Goto and Grace (2005b) trained animals on a two-choice maze task to obtain reward by turning into the arm with a visual cue.  Once they acquired this strategy, the rule was shifted so that animals were rewarded for always turning the same way, regardless of which arm was visually cued.  In a disconnection design, the hippocampus or PFC was inactivated unilaterally, and a D1 antagonist, D2 agonist, or saline was infused into the contralateral NAc.  In the PFC-D2 disconnection, task switching was impaired but initial task acquisition was not.  Hippocampus-D1 rats were impaired at initial learning and subsequent task switching.  Other work has shown that NAc D1 receptors are not needed to learn a new strategy but are needed to maintain it (Haluk & Floresco, 2009).  Thus, PFC input to the NAc, suppressed by D2 receptor stimulation, mediates behavioral flexibility, and limbic input, via D1 receptors, promotes strategy learning and maintenance (Goto & Grace, 2005b; Grace et al., 2007).  These studies provide evidence that different dopamine receptors select for different inputs, and consequently, a particular behavior may be reliant on a particular dopamine receptor type.

5.1.1 The D1 Receptor in Risk-Based Decision-Making

The differential contribution of dopamine receptors in the NAc to risk-based decision-making is supported by functional disconnection studies involving the NAc and its cortical and limbic afferents (St. Onge, Stopper, et al., 2012).
The fact that neither stimulation nor blockade of NAc D2 receptors affected risky choice suggests that dopaminergic modulation of cortical input to the NAc does not play a prominent role in modifying risk/reward choice behavior.  In support of this idea, St. Onge, Stopper, et al. (2012) demonstrated that a functional disconnection of the mPFC and NAc does not influence risky choice.  This may seem puzzling, given that cortical information must eventually route to the NAc in order to influence action selection.  However, in that same study, by selectively disconnecting the ascending and descending cortico-amygdalar pathway, they determined that serial flow of information from the mPFC to the BLA, and subsequent transmission from the BLA to the NAc, is essential for risk-based decision-making.  BLA-NAc disconnection reduced risky choice, similar to inactivation of either region in isolation (Ghods-Sharifi et al., 2009; Stopper & Floresco, 2011), further supporting the particular importance of this connection.  Like hippocampal input, BLA input to the NAc is potentiated by D1 receptors (Floresco et al., 2001).  The fact that direct BLA, but not PFC, input to the NAc is critical for risk-based decision-making is in keeping with the finding that D1, but not D2, receptors in the NAc are crucial for this form of choice.

Mesoaccumbens dopamine is important for many different and overlapping processes (Salamone & Correa, 2012).  The functional and chemical heterogeneity of the accumbens allows it to integrate diverse information (Zahm, 1999).  The accumbens can be divided into subregions that show different dopamine receptor expression and influence behavior differently (Zahm, 1999).  Previous studies determined that the shell subregion of the NAc processes risky decisions (Stopper & Floresco, 2011).  It has been proposed that the NAc consists of neuronal ensembles that share functional and anatomical characteristics (Heimer et al., 1997).  Their common neurochemistry and connectivity may allow them to accomplish specific behavioral functions.  The shell region of the NAc in particular receives dense projections from the BLA (Zahm, 1999).  NAc shell inactivation, BLA inactivation, and BLA-NAc disconnection all decrease risky choice (Ghods-Sharifi et al., 2009; St. Onge, Stopper, et al., 2012; Stopper & Floresco, 2011).  Since D1 blockade in the NAc produced the same effect, and the D1 receptor facilitates BLA input to the NAc, this receptor may modulate the functioning of an important BLA-shell ensemble to guide risky choice.  Conversely, D1 stimulation, possibly by activating this ensemble, facilitates more optimal risky choice.
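One way to picture this receptor-specific selection of afferent inputs is as a simple gating scheme.  The sketch below is a purely illustrative toy constructed for this discussion (the linear form and all parameter values are arbitrary assumptions, not a model proposed in this thesis or in Grace et al., 2007): D2 tone gates cortical drive down, while D1 tone gates limbic drive up.

# Toy illustration of receptor-specific gating of NAc afferent inputs.
# The linear form and parameter values are arbitrary assumptions.

def nac_drive(pfc_input, limbic_input, d1_tone, d2_tone):
    """Net drive onto an NAc ensemble (arbitrary units).

    d1_tone (0-1) potentiates limbic (hippocampal/BLA) input;
    d2_tone (0-1) attenuates mPFC input.
    """
    return pfc_input * (1.0 - d2_tone) + limbic_input * (1.0 + d1_tone)

# D2 stimulation suppresses cortical throughput; D1 stimulation
# amplifies limbic throughput.
print(nac_drive(1.0, 1.0, d1_tone=0.0, d2_tone=0.8))  # 1.2: cortex gated out
print(nac_drive(1.0, 1.0, d1_tone=0.8, d2_tone=0.0))  # 2.8: limbic potentiated

On such a scheme, D1 manipulations would selectively alter the throughput of the proposed BLA-shell ensemble, consistent with D1, but not D2, drugs biasing risky choice.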
The experiments of subsequent chapters were used to more closely examine the contribution of the phasic dopamine signal to risk-based decision-making.

5.2 The LHb and Dopamine Modulation

As discussed previously, dopamine neurons in the VTA encode an RPE signal (Schultz et al., 1997).  Phasic bursts of dopamine are evoked by delivery of an unexpected reward or a cue that predicts reward, and these cells also display phasic dips, briefly ceasing to fire in response to omission of an expected reward.  The conditions necessary to trigger phasic bursts have been well characterized: stimulation of NMDA receptors in the VTA by afferent excitatory inputs (mPFC, PPTg, and LDTg) allows the transition to burst firing (Chergui et al., 1994; Floresco et al., 2003; Lodge & Grace, 2006a; Lodge & Grace, 2006b; Lokwan et al., 1999; Overton et al., 1996; Tong et al., 1996).  The inhibitory inputs necessary for phasic dips have not been as well characterized.  The VP provides inhibitory input to the VTA that keeps dopamine neurons in their silent (non-firing) state (Floresco et al., 2003).  Lifting this VP inhibition permits a proportion of “silent” dopamine neurons to fire tonically, increasing the overall activity of the dopamine cell population without impacting burst firing.  In comparison, Christoph et al. (1986) discovered that single-pulse stimulation of the LHb inhibits nearly all dopamine neurons.  Matsumoto and Hikosaka (2007) demonstrated that LHb neurons use this mechanism to encode an nRPE signal.  In contrast to dopamine neurons, these cells are inhibited by rewarding stimuli and excited by reward omission or punishment.  The LHb inhibits dopamine neurons by way of the GABAergic RMTg (Jhou, Fields, et al., 2009; Ji & Shepard, 2007), which also encodes an nRPE (Hong et al., 2011).

The data from Chapter 3 show that inactivation of the LHb causes a pronounced disruption in decision-making (Stopper & Floresco, 2014).  This effect is mediated by way of the RMTg, suggesting that the fundamental role of the LHb in decision-making depends on its downstream impact on dopamine neurons.  A parsimonious expectation was that LHb inactivation would increase risky choice by disinhibiting dopamine, since systemic stimulation of dopamine activity, via amphetamine or dopamine agonists, increases risky choice (St. Onge & Floresco, 2009).  Instead, LHb inactivation caused animals to make random and erratic choices, with average choice across the group defaulting to indifference.  Though LHb inactivation was almost certainly influencing tonic and phasic dopamine transmission, previous findings using microdialysis and electrophysiology suggest that the unique role of the LHb in risk-based decision-making relies primarily on phasic signals.  Lecourtier et al. (2008) used microdialysis to determine the impact of the LHb on tonic dopamine transmission in important efferent targets.  Inactivation of the LHb with an AMPA antagonist resulted in a sustained increase of dopamine in the NAc.  However, the effect on decision-making observed in Chapter 3 is unlikely to have resulted from a general increase in tonic NAc dopamine.  The data from Chapter 2 (Stopper et al., 2013) suggest that a general increase in NAc dopamine would have either no effect or a fundamentally different effect: D1 stimulation in the NAc actually optimized probabilistic discounting, and D2 stimulation had no effect.  The fact that LHb inactivation did not produce either of these effects, but instead caused a complete disruption of choice biases, suggests that the contribution of this nucleus to decision-making goes beyond regulation of tonic dopamine levels and may be more related to its powerful influence over phasic dopamine signaling.

5.2.1 The LHb and Multiple Phasic Signals

LHb neurons burst fire in response to reward omission, causing a phasic dip in dopamine neurons (Matsumoto & Hikosaka, 2007).  However, LHb neurons also display phasic dips themselves in response to unexpected reward and reward-related cues (Matsumoto & Hikosaka, 2007).
This disinhibitory signal likely allows dopamine neurons to be excited by other afferents that promote bursting (Chergui et al., 1994; Floresco et al., 2003; Lodge & Grace, 2006a; Lodge & Grace, 2006b; Lokwan et al., 1999; Overton et al., 1996; Tong et al., 1996).  LHb inactivation suppresses all of these phasic signals: with activity completely suppressed, LHb bursts and dips are both abolished.  If one considers that inactivating the LHb suppresses different, and often opposing, phasic signals, and assumes that phasic dopamine signaling plays a key role in facilitating relative value judgments, it is logical that choice would become erratic and indifferent.  With no phasic signals to follow, animals would not be able to assign value to outcomes.  They would therefore behave as if the two options were of arbitrary relative value.  With this in mind, the experiments of Chapter 4 sought to understand the importance of each of the phasic signals individually during probabilistic choice.

5.3 Phasic Dopamine in Risk-Based Decision-Making

Sugam et al. (2012) characterized the various phasic dopamine responses occurring during risk-based decision-making.  Animals selected between a small-certain and a large-uncertain reward of equal expected utility.  Responses to reward outcomes followed an RPE model: phasic dopamine dipped in response to reward omission but showed bursts when reward was delivered, and these bursts were greater if the reward was larger and unexpected.  On forced choices, the cue indicating the beginning of a trial elicited a larger dopamine response when the preferred reward was cued.  Cues preceding free-choice trials elicited the same response as those preceding forced choices of the preferred option, and this phasic response did not predict the subsequent choice made by the animal.  Thus, pre-choice phasic dopamine bursts signal the availability of reward, particularly if it is preferred.

The results of Chapter 4 demonstrate that each of the major phasic signals is important in influencing choice biases.  Electrical stimulation was used to override these phasic bursts or dips.  Brief high-frequency stimulation of the LHb attenuated or inhibited firing of VTA dopamine neurons evoked by mPFC or PPTg stimulation.  Thus, LHb stimulation was used to attenuate phasic dopamine bursts during probabilistic choice.  Likewise, VTA stimulation was used to replace naturally-occurring phasic dips with bursts.  Stimulation of the VTA after a risky loss increased risky choice by decreasing sensitivity to negative outcomes.  Different types of phasic bursts also proved to be important, as evidenced by the effects of LHb stimulation.  Stimulation after a risky win decreased risky choice.  Stimulation of the RMTg produced the same effect, suggesting that the effects of LHb stimulation may be at least partially mediated by disynaptic inhibition of midbrain targets that include VTA dopamine neurons.  Though likely not as pronounced, one would expect a small dopamine burst to accompany receipt of a small-certain reward (Sugam et al., 2012).  LHb stimulation after a small reward drove choice away from the small-certain option, increasing risky choice.  Phasic dopamine increases also occur at the beginning of choice trials (Sugam et al., 2012).  Attenuating this signal decreased risky choice, particularly when the large reward was more advantageous.
Although the magnitude of pre-choice phasic dopamine responses does not correlate with the choice made (Sugam et al., 2012), the findings of Chapter 4 show that this signal is still quite important for making these types of choices.  Stimulation during the inter-trial interval was without effect, highlighting that these manipulations must accompany brief and temporally-precise phasic events to influence choice.  Uncertainty is a driving force behind these effects; LHb stimulation accompanying large-certain rewards does not influence choice.  These findings demonstrate the importance of a variety of phasic dopamine signals for risk-based decision-making.  Though its contribution is different, the tonic signal also provides important information for these decisions.

5.4 Tonic Dopamine in Risk-Based Decision-Making

St. Onge, Ahn, et al. (2012) used microdialysis to measure changes in tonic dopamine during probabilistic discounting.  In the PFC, tonic dopamine concentration correlated with the amount of food animals received.  PFC dopamine concentrations were highest during blocks with a higher probability of the larger reward.  However, in a yoked task, during which animals received equivalent reward amounts without any decision-making, equivalent changes in PFC dopamine were observed.  Tonic dopamine in the NAc encoded more complex information.  NAc dopamine was also correlated with food consumption.  However, dopamine concentration during decision-making was elevated relative to the yoked condition during periods of the task when the likelihood of obtaining the large reward after a choice was uncertain.  During the decision-making task, tonic dopamine concentration increased during free-choice trials compared to forced-choice trials.  Thus, tonic NAc dopamine integrates information regarding reward, uncertainty, and choice during risk-based decision-making.

Since both tonic and phasic signals convey important information, the question remains as to why this system has two modes of transmission.  Phasic dopamine allows information gathered from a single outcome to rapidly update an animal’s framework for subsequent decisions.  Tonic dopamine allows animals to register more slowly changing contextual information, such as average reward rates and the availability of free choices.  This complementary system may be viewed as a system of checks and balances.  Phasic dopamine allows the choice model to be updated in real time, using individual outcomes to signal changes in reward contingencies.  Tonic dopamine ensures that individual outcomes are not overemphasized, so that average expected utility can be used to rationally guide choices.  The combination of both systems allows animals to be adaptive, but not short-sighted, during goal-directed action.
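This two-timescale idea can be made concrete with a toy update rule (an illustrative sketch constructed for this discussion, not a model tested in this thesis; the learning rates are arbitrary assumptions): a fast, phasic-like value estimate that jumps with each outcome, and a slow, tonic-like estimate that drifts toward the average payoff.

# Toy two-timescale value tracker: a fast, phasic-like estimate updates
# trial by trial, while a slow, tonic-like average tracks overall reward
# rate. Learning rates are arbitrary illustrative choices.

def update(value, outcome, rate):
    return value + rate * (outcome - value)  # delta-rule update

fast, slow = 0.0, 0.0          # phasic-like and tonic-like estimates
outcomes = [4, 0, 0, 4, 0, 4]  # e.g., pellets from successive risky choices

for r in outcomes:
    fast = update(fast, r, rate=0.7)   # jumps with each win or loss
    slow = update(slow, r, rate=0.05)  # drifts toward the average payoff
    print(f"outcome={r}  fast={fast:.2f}  slow={slow:.2f}")

After a single risky loss, the fast estimate collapses while the slow estimate barely moves, one way to picture how the system can react to individual outcomes without abandoning its estimate of long-run utility.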
5.5 Experimental Strengths

By actively manipulating brain activity, the experiments of this thesis allow for a more precise understanding of how the dopamine system impacts decision-making.  Early experiments that demonstrated the role of dopamine in RPE and uncertainty used Pavlovian conditioning methods with animals passively receiving reward (Fiorillo et al., 2003; Schultz et al., 1997; Tobler et al., 2005).  Initial studies by St. Onge and Floresco (2009) demonstrated that systemic dopamine manipulations impact probabilistic choice.  However, these manipulations were anatomically non-selective and influenced both tonic and phasic transmission.  Tonic and phasic dopamine transmission have been shown to track aspects of risk-based decision-making, but these investigations used purely correlative microdialysis and voltammetry techniques (St. Onge, Ahn, et al., 2012; Sugam et al., 2012).  The experiments of Chapter 2 provide the first evidence that selective manipulations of dopamine in the NAc, where phasic/tonic compartmentalization exists, influence choice under uncertainty.

The LHb has a powerful inhibitory influence over the VTA (Christoph et al., 1986; Ji & Shepard, 2007), conveying an important nRPE signal (Matsumoto & Hikosaka, 2007, 2009) that promotes phasic dips in dopamine neuron firing.  Numerous studies have determined the mechanism of this influence over dopamine (Hong et al., 2011; Jhou, Geisler, et al., 2009; Ji & Shepard, 2007) and how dopamine target regions are differentially influenced by LHb manipulations (Lammel et al., 2012; Lecourtier et al., 2008).  While much has become known about its physiology, understanding of its role in behavior has been largely correlative.  Only a few studies have used inactivation or lesion of the LHb to examine its causal contribution to behavior (Lecourtier & Kelly, 2005; Lecourtier et al., 2004).  The experiments of Chapter 3 demonstrate, via active manipulations, that the LHb is necessary for decision-making.  These studies are further strengthened by crucial control experiments demonstrating that these effects are anatomically precise.  The identical influence of LHb inactivation on delay discounting indicates that the role of the LHb extends across different forms of decision-making involving subjective value.  However, the LHb is not needed for simple goal-directed action.  This study reveals a unique role for the LHb: while it is unnecessary for simple choice, it is imperative when subjective costs are added.

It has been known for some time that single-pulse LHb stimulation causes a brief, but complete, cessation of firing in spontaneously active dopamine neurons (Christoph et al., 1986; Ji & Shepard, 2007).  Moreover, LHb neurons naturally fire in high-frequency bursts in response to punishment or reward omission (Matsumoto & Hikosaka, 2007), which in turn is thought to suppress dopamine neuron activity.  The experiments in Chapter 4 were the first to test how activation of this pathway may affect volitional, reward-related choice behavior.  Precisely timed, brief high-frequency stimulation of the LHb was able to attenuate, and in some cases entirely inhibit, phasic dopamine bursts.  This experiment not only advances understanding of the physiological role of LHb neurons but also introduces perhaps the first technique to selectively inhibit individual phasic dopamine bursts.  Recent studies have demonstrated that phasic dopamine tracks important aspects of reward outcomes and predictive cues during risk-based decision-making (Sugam et al., 2012).  In the experiments of Chapter 4, stimulation coinciding with rewarded and unrewarded outcomes verified that the RPE signal observed by Sugam et al. (2012) is important for guiding subsequent choices.  Interestingly, manipulating the phasic signal occurring immediately before choices revealed an effect not discernible from those correlative experiments: although Sugam et al. (2012) found that pre-trial phasic responses did not correlate with the actual choice made, I found that pre-choice stimulation had a powerful influence on choice.
This finding is important: it shows that changes in dopamine observed with a correlative method like voltammetry do not map perfectly onto decision-making behavior.  One must actively manipulate these signals to gain an accurate and complete understanding of them.

5.6 Limitations

Risk-based decision-making is a complex construct (Llewellyn, 2008), and it is therefore impossible to accurately model all aspects of risk with a single task.  We used the probabilistic discounting task for a variety of practical reasons.  Risk-based decision-making can involve either uncertain (reward probabilities are explicit) or ambiguous (probabilities are unknown) choices (Llewellyn, 2008).  Extensive training and forced-choice trials make probabilities explicit and well-understood during free-choice trials of the probabilistic discounting task.  The aim of this thesis was to specifically understand the role of phasic dopamine in risk-based decision-making.  Dopamine is also critical for learning and novelty-seeking (Salamone & Correa, 2012), so to achieve this aim it was important to use a task that removed these confounds to isolate choice behavior.  The probabilistic discounting task requires rather straightforward decisions between two options, with the probability of one reward changing in a reliable manner.  This design was chosen to allow for the simplest binary choices between a small-certain and a large-uncertain reward.  Other tasks, like the rGT, involve choices between a greater number of options, each differing in reward magnitude, probability, and punishment.  These types of decisions may be more similar to many realistic decisions; however, the aim of this thesis was not to model particular disorders like pathological gambling.  Ideally, these studies would have used a task with randomized probability shifts, but previous experiments have shown that animals do not learn randomized variants of the discounting task as well (St. Onge et al., 2010).  Another potential limitation of the probabilistic discounting task is that, unlike many real-world decisions, it does not incorporate any distinct punishments.  Other groups have attempted to incorporate punishment into their risky decision-making tasks.  Some have combined an electrical shock with risky wins during probabilistic discounting (Simon, Gilbert, Mayse, Bizon, & Setlow, 2009).  The rGT associates a longer ‘time-out’ punishment with riskier losses.

The probabilistic discounting task is thus associated with a variety of shortcomings that preclude a complete analysis of risk-based decision-making.  Other tasks would assess more realistic types of risky decisions that involve ambiguity, randomness, impulsivity, and a greater number of options.  However, this straightforward task allows for simple choices that involve reward uncertainty.  Changes in reward probability require animals to consider changes in expected utility and update their value comparison between rewards.  The task allows animals to engage in decisions involving uncertainty, predictions, winning, and losing, all aspects important for assessing the role of phasic dopamine in risk-based decision-making.

5.7 Future Directions & Applications

Since the phasic/tonic dopamine distinction was first described (Grace, 1991), it has been used to understand the pathophysiology of schizophrenia (Grace, 1991, 1993) and substance abuse (Grace, 1995, 2000).
While these models do a thorough job of describing aberrant tonic/phasic functioning in these disorders, the ability of tonic/phasic manipulations to bias behavior has been tested in only a few instances (Grieder et al., 2012; Parker et al., 2010; Zweifel et al., 2009).  While the probabilistic discounting task does not model any particular disorder, it tests a behavior that is an important aspect of a variety of disorders, including schizophrenia, substance abuse, and pathological gambling.  While methods to manipulate specific phasic signals in humans are likely years away from use in the clinic, these experiments provide an important proof-of-concept that targeted phasic manipulations can alter cognitive performance.  In the future, it may be possible to pair targeted phasic manipulations with drug- or gambling-related cues to bias decision-making away from risky, detrimental choices.  In anticipation of these potential clinical applications, pre-clinical studies using comparable electrical or optogenetic stimulation methods may be used to examine whether manipulation of these signals alters gambling or substance abuse behaviors in rats.  A logical experiment would be to pair phasic manipulations with analogous cues and outcomes during the rGT.  Similar manipulations could be applied to rats performing drug self-administration procedures, pairing phasic manipulations with drug-related cues during drug-seeking, reinstatement, and extinction procedures.

5.8 Conclusions

The research of this thesis contributes to a circuit-based understanding of the neuroscience of risk-based decision-making.  Specifically, these experiments provide important context for the actions of dopamine in the cortico-limbic-striatal circuit in modulating probabilistic choice.  The NAc, which provides information about relative reward value, relies differentially on dopamine receptor subtypes, suggesting unique contributions for tonic and phasic dopamine.  The LHb, as an influential modulator of both tonic and phasic dopamine, is crucial for risk-based decision-making.  This critical contribution of the LHb and dopamine is driven by various temporally-precise phasic events accompanying cues and outcomes.

This thesis provides the first direct evidence that the phasic dopamine signal exerts a direct influence over decision-making involving uncertainty.  The field continues to investigate the intricacies and nuances of phasic dopamine reward prediction error, demonstrating that this signal, once believed to be rather simplistic, is implicated in model-based learning, subjective value, and neuronal plasticity (Schultz, 2013).  As we refine our view of the capabilities of the phasic dopamine prediction error signal, it is equally important that we explore ways that manipulation of this signal can be used to treat a range of behavioral pathologies.  The analyses in this thesis aim to pave the way for future investigations examining the clinical effectiveness of targeted phasic dopamine manipulations to treat disorders including substance abuse and schizophrenia.

References

Aberman, J. E., Ward, S. J., & Salamone, J. D. (1998). Effects of dopamine antagonists and accumbens dopamine depletions on time-constrained progressive-ratio performance. Pharmacology, Biochemistry, and Behavior, 61(4), 341–8. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/9802826
Physiology & Behavior, 104(1), 168–72. doi:10.1016/j.physbeh.2011.04.055 Allen, T. A., Narayanan, N. S., Kholodar-Smith, D. B., Zhao, Y., Laubach, M., & Brown, T. H. (2008). Imaging the spread of reversible brain inactivations using fluorescent muscimol. Journal of Neuroscience Methods, 171(1), 30–8. doi:10.1016/j.jneumeth.2008.01.033 Andrade, L. F., & Petry, N. M. (2012). Delay and probability discounting in pathological gamblers with and without a history of substance use problems. Psychopharmacology, 219(2), 491–9. doi:10.1007/s00213-011-2508-9 Arikan, R., Blake, N. M. J., Erinjeri, J. P., Woolsey, T. A., Giraud, L., & Highstein, S. M. (2002). A method to measure the effective spread of focally injected muscimol into the central nervous system with electrophysiology and light microscopy. Journal of Neuroscience Methods, 118(1), 51–7. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12191757 Bachtell, R. K., Whisler, K., Karanian, D., & Self, D. W. (2005). Effects of intra-nucleus accumbens shell administration of dopamine agonists and antagonists on cocaine-taking and cocaine-seeking behaviors in the rat. Psychopharmacology, 183(1), 41–53. doi:10.1007/s00213-005-0133-1 Bari, A. A., & Pierce, R. C. (2005). D1-like and D2 dopamine receptor antagonists administered into the shell subregion of the rat nucleus accumbens decrease cocaine, but not food, reinforcement. Neuroscience, 135(3), 959–68. doi:10.1016/j.neuroscience.2005.06.048 Bari, A., Eagle, D. M., Mar, A. C., Robinson, E. S. J., & Robbins, T. W. (2009). Dissociable effects of noradrenaline, dopamine, and serotonin uptake blockade on stop task performance in rats. Psychopharmacology, 205(2), 273–83. doi:10.1007/s00213-009-1537-0 Bari, A., & Robbins, T. W. (2013). Inhibition and impulsivity: behavioral and neural basis of response control. Progress in Neurobiology, 108, 44–79. doi:10.1016/j.pneurobio.2013.06.005 Barrot, M., Sesack, S. R., Georges, F., Pistis, M., Hong, S., & Jhou, T. C. (2012). Braking dopamine systems: a new GABA master structure for mesolimbic and nigrostriatal 152  functions. Journal of Neuroscience, 32(41), 14094–101. doi:10.1523/JNEUROSCI.3370-12.2012 Bechara, A. (1997). Deciding Advantageously Before Knowing the Advantageous Strategy. Science, 275(5304), 1293–1295. doi:10.1126/science.275.5304.1293 Bechara, A., Damasio, A. R., Damasio, H., & Anderson, S. W. (1994). Insensitivity to future consequences following damage to human prefrontal cortex. Cognition, 50(1-3), 7–15. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/8039375 Bergh, C., Eklund, T., Södersten, P., & Nordin, C. (1997). Altered dopamine function in pathological gambling. Psychological Medicine, 27(2), 473–475. doi:10.1017/S0033291796003789 Berglind, W. J., Case, J. M., Parker, M. P., Fuchs, R. A., & See, R. E. (2006). Dopamine D1 or D2 receptor antagonism within the basolateral amygdala differentially alters the acquisition of cocaine-cue associations necessary for cue-induced reinstatement of cocaine-seeking. Neuroscience, 137(2), 699–706. doi:10.1016/j.neuroscience.2005.08.064 Besson, M., Belin, D., McNamara, R., Theobald, D. E., Castel, A., Beckett, V. L., … Dalley, J. W. (2010). Dissociable control of impulsivity in rats by dopamine d2/3 receptors in the core and shell subregions of the nucleus accumbens. Neuropsychopharmacology, 35(2), 560–9. doi:10.1038/npp.2009.162 Blair, K., Marsh, A. A., Morton, J., Vythilingam, M., Jones, M., Mondillo, K., … Blair, J. R. (2006). 
Blair, K., Marsh, A. A., Morton, J., Vythilingam, M., Jones, M., Mondillo, K., … Blair, J. R. (2006). Choosing the lesser of two evils, the better of two goods: specifying the roles of ventromedial prefrontal cortex and dorsal anterior cingulate in object choice. Journal of Neuroscience, 26(44), 11379–86. doi:10.1523/JNEUROSCI.1640-06.2006

Bourdy, R., & Barrot, M. (2012). A new control center for dopaminergic systems: pulling the VTA by the tail. Trends in Neurosciences, 35(11), 681–90. doi:10.1016/j.tins.2012.06.007

Bouthenet, M. L., Souil, E., Martres, M. P., Sokoloff, P., Giros, B., & Schwartz, J. C. (1991). Localization of dopamine D3 receptor mRNA in the rat brain using in situ hybridization histochemistry: comparison with dopamine D2 receptor mRNA. Brain Research, 564(2), 203–19. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/1839781

Bristow, L. J., Cook, G. P., Gay, J. C., Kulagowski, J. J., Landon, L., Murray, F., … Hutson, P. H. (1996). The behavioural and neurochemical profile of the putative dopamine D3 receptor agonist, (+)-PD 128907, in the rat. Neuropharmacology, 35(3), 285–94. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/8783203

Bromberg-Martin, E. S., & Hikosaka, O. (2011). Lateral habenula neurons signal errors in the prediction of reward information. Nature Neuroscience, 14(9), 1209–16. doi:10.1038/nn.2902

Bromberg-Martin, E. S., Matsumoto, M., & Hikosaka, O. (2010). Dopamine in motivational control: rewarding, aversive, and alerting. Neuron, 68(5), 815–34. doi:10.1016/j.neuron.2010.11.022

Bromberg-Martin, E. S., Matsumoto, M., Hong, S., & Hikosaka, O. (2010). A pallidus-habenula-dopamine pathway signals inferred stimulus values. Journal of Neurophysiology, 1, 1068–1076. doi:10.1152/jn.00158.2010

Buelow, M. T., & Suhr, J. A. (2009). Construct validity of the Iowa Gambling Task. Neuropsychology Review, 19(1), 102–14. doi:10.1007/s11065-009-9083-4

Bunney, B. S., & Grace, A. A. (1978). Acute and chronic haloperidol treatment: comparison of effects on nigral dopaminergic cell activity. Life Sciences, 23, 1715–1728.

Cardinal, R. N., & Howes, N. J. (2005). Effects of lesions of the nucleus accumbens core on choice between small certain rewards and large uncertain rewards in rats. BMC Neuroscience, 6, 37. doi:10.1186/1471-2202-6-37

Cardinal, R. N., Robbins, T. W., & Everitt, B. J. (2000). The effects of d-amphetamine, chlordiazepoxide, α-flupenthixol and behavioural manipulations on choice of signalled and unsignalled delayed reinforcement in rats. Psychopharmacology, 152(4), 362–375. doi:10.1007/s002130000536

Carlsson, A., Lindqvist, M., Magnusson, T., & Waldeck, B. (1958). On the presence of 3-hydroxytyramine in brain. Science, 127(3296), 471.

Chen, G., Kittler, J. T., Moss, S. J., & Yan, Z. (2006). Dopamine D3 receptors regulate GABAA receptor function through a phospho-dependent endocytosis mechanism in nucleus accumbens. Journal of Neuroscience, 26(9), 2513–21. doi:10.1523/JNEUROSCI.4712-05.2006

Chergui, K., Akaoka, H., Charlety, P. J., Saunier, C. F., Buda, M., & Chouvet, G. (1994). Subthalamic nucleus modulates burst firing of nigral dopamine neurones via NMDA receptors. Neuroreport, 5(10), 1185–1188.

Christoph, G. R., Leonzio, R. J., & Wilcox, K. S. (1986). Stimulation of the lateral habenula inhibits dopamine-containing neurons in the substantia nigra and ventral tegmental area of the rat. Journal of Neuroscience, 6(3), 613–9. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/3958786
Clark, L., Bechara, A., Damasio, H., Aitken, M. R. F., Sahakian, B. J., & Robbins, T. W. (2008). Differential effects of insular and ventromedial prefrontal cortex lesions on risky decision-making. Brain, 131(Pt 5), 1311–22. doi:10.1093/brain/awn066

Clark, L., Manes, F., Antoun, N., Sahakian, B. J., & Robbins, T. W. (2003). The contributions of lesion laterality and lesion volume to decision-making impairment following frontal lobe damage. Neuropsychologia, 41(11), 1474–1483. doi:10.1016/S0028-3932(03)00081-2

Cools, R., Lewis, S. J. G., Clark, L., Barker, R. A., & Robbins, T. W. (2007). L-DOPA disrupts activity in the nucleus accumbens during reversal learning in Parkinson’s disease. Neuropsychopharmacology, 32(1), 180–9. doi:10.1038/sj.npp.1301153

Cooper, J. C., Hollon, N. G., Wimmer, G. E., & Knutson, B. (2009). Available alternative incentives modulate anticipatory nucleus accumbens activation. Social Cognitive and Affective Neuroscience, 4(4), 409–16. doi:10.1093/scan/nsp031

Cousins, M. S., Atherton, A., Turner, L., & Salamone, J. D. (1996). Nucleus accumbens dopamine depletions alter relative response allocation in a T-maze cost/benefit task. Behavioural Brain Research, 74(1-2), 189–197. doi:10.1016/0166-4328(95)00151-4

Cousins, M. S., & Salamone, J. D. (1994). Nucleus accumbens dopamine depletions in rats affect relative response allocation in a novel cost/benefit procedure. Pharmacology, Biochemistry, and Behavior, 49(1), 85–91. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/7816895

Cousins, M. S., Wei, W., & Salamone, J. D. (1994). Pharmacological characterization of performance on a concurrent lever pressing/feeding choice procedure: effects of dopamine antagonist, cholinomimetic, sedative and stimulant drugs. Psychopharmacology, 116(4), 529–537. doi:10.1007/BF02247489

Dang, D., Cunnington, D., & Swieca, J. (2011). The emergence of devastating impulse control disorders during dopamine agonist therapy of the restless legs syndrome. Clinical Neuropharmacology, 34(2), 66–70. doi:10.1097/WNF.0b013e31820d6699

Day, J. J., Jones, J. L., Wightman, R. M., & Carelli, R. M. (2010). Phasic nucleus accumbens dopamine release encodes effort- and delay-related costs. Biological Psychiatry, 68(3), 306–9. doi:10.1016/j.biopsych.2010.03.026

Evenden, J. L., & Ryan, C. N. (1996). The pharmacology of impulsive behaviour in rats: the effects of drugs on response choice with varying delays of reinforcement. Psychopharmacology, 128(2), 161–70. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/8956377

Fellows, L. K. (2007). Advances in understanding ventromedial prefrontal function: the accountant joins the executive. Neurology, 68(13), 991–5. doi:10.1212/01.wnl.0000257835.46290.57

Fellows, L. K., & Farah, M. J. (2005). Different underlying impairments in decision-making following ventromedial and dorsolateral frontal lobe damage in humans. Cerebral Cortex, 15(1), 58–63. doi:10.1093/cercor/bhh108

Fiorillo, C. D., Newsome, W. T., & Schultz, W. (2008). The temporal precision of reward prediction in dopamine neurons. Nature Neuroscience, 11(8), 966–973. doi:10.1038/nn.2159

Fiorillo, C. D., Tobler, P. N., & Schultz, W. (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299(5614), 1898–902. doi:10.1126/science.1077349

Floresco, S. B. (2007). Dopaminergic regulation of limbic-striatal interplay. Journal of Psychiatry & Neuroscience, 32(6), 400–411.

Floresco, S. B. (2013). Prefrontal dopamine and behavioral flexibility: shifting from an “inverted-U” toward a family of functions. Frontiers in Neuroscience, 7(April), 62. doi:10.3389/fnins.2013.00062
Floresco, S. B., Blaha, C. D., Yang, C. R., & Phillips, A. G. (2001). Dopamine D1 and NMDA receptors mediate potentiation of basolateral amygdala-evoked firing of nucleus accumbens neurons. Journal of Neuroscience, 21(16), 6370–6. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11487660

Floresco, S. B., McLaughlin, R. J., & Haluk, D. M. (2008). Opposing roles for the nucleus accumbens core and shell in cue-induced reinstatement of food-seeking behavior. Neuroscience, 154(3), 877–84. doi:10.1016/j.neuroscience.2008.04.004

Floresco, S. B., & Phillips, A. G. (1999). Dopamine and hippocampal input to the nucleus accumbens play an essential role in the search for food in an unpredictable environment. Psychobiology, 27, 227–286.

Floresco, S. B., St. Onge, J. R., Ghods-Sharifi, S., & Winstanley, C. A. (2008). Cortico-limbic-striatal circuits subserving different forms of cost-benefit decision making. Cognitive, Affective & Behavioral Neuroscience, 8(4), 375–89. doi:10.3758/CABN.8.4.375

Floresco, S. B., Tse, M. T. L., & Ghods-Sharifi, S. (2008). Dopaminergic and glutamatergic regulation of effort- and delay-based decision making. Neuropsychopharmacology, 33(8), 1966–79. doi:10.1038/sj.npp.1301565

Floresco, S. B., West, A. R., Ash, B., Moore, H., & Grace, A. A. (2003). Afferent modulation of dopamine neuron firing differentially regulates tonic and phasic dopamine transmission. Nature Neuroscience, 6(9), 968–73. doi:10.1038/nn1103

Floresco, S. B., & Whelan, J. M. (2009). Perturbations in different forms of cost/benefit decision making induced by repeated amphetamine exposure. Psychopharmacology, 205(2), 189–201. doi:10.1007/s00213-009-1529-0

Gallagher, D. A., O’Sullivan, S. S., Evans, A. H., Lees, A. J., & Schrag, A. (2007). Pathological gambling in Parkinson’s disease: risk factors and differences from dopamine dysregulation. An analysis of published case series. Movement Disorders, 22(12), 1757–63. doi:10.1002/mds.21611

Gan, J. O., Walton, M. E., & Phillips, P. E. M. (2010). Dissociable cost and benefit encoding of future rewards by mesolimbic dopamine. Nature Neuroscience, 13(1), 25–7. doi:10.1038/nn.2460

Ghods-Sharifi, S., St Onge, J. R., & Floresco, S. B. (2009). Fundamental contribution by the basolateral amygdala to different forms of decision making. Journal of Neuroscience, 29(16), 5251–9. doi:10.1523/JNEUROSCI.0315-09.2009

Gingrich, B., Liu, Y., Cascio, C., Wang, Z., & Insel, T. R. (2000). Dopamine D2 receptors in the nucleus accumbens are important for social attachment in female prairie voles (Microtus ochrogaster). Behavioral Neuroscience, 114(1), 173–183. doi:10.1037//0735-7044.114.1.173

Glimcher, P. W. (2011). Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis. Proceedings of the National Academy of Sciences of the United States of America, 108(3), 15647–15654. doi:10.1073/pnas.1115170108

Glimcher, P. W., & Rustichini, A. (2004). Neuroeconomics: the consilience of brain and decision. Science, 306(5695), 447–52. doi:10.1126/science.1102566

Goto, Y., & Grace, A. A. (2005a). Dopamine-dependent interactions between limbic and prefrontal cortical plasticity in the nucleus accumbens: disruption by cocaine sensitization. Neuron, 47(2), 255–66. doi:10.1016/j.neuron.2005.06.017

Goto, Y., & Grace, A. A. (2005b). Dopaminergic modulation of limbic and cortical drive of nucleus accumbens in goal-directed behavior. Nature Neuroscience, 8(6), 805–12. doi:10.1038/nn1471
Grace, A. A. (1993). Cortical regulation of subcortical dopamine systems and its possible relevance to schizophrenia. Journal of Neural Transmission, 91, 111–134.
Grace, A. A. (1995). The tonic/phasic model of dopamine system regulation: its relevance for understanding how stimulant abuse can alter basal ganglia function. Drug and Alcohol Dependence, 37(2), 111–29. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/7758401
Grace, A. A. (2000). The tonic/phasic model of dopamine system regulation and its implications for understanding alcohol and psychostimulant craving. Addiction, 95(Suppl. 2), S119–28. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11002907
Grace, A. A., & Bunney, B. S. (1983). Intracellular and extracellular electrophysiology of nigral dopaminergic neurons–1. Identification and characterization. Neuroscience, 10(2), 301–315.
Grace, A. A., & Bunney, B. S. (1984). The control of firing pattern in nigral dopamine neurons: burst firing. Journal of Neuroscience, 4(11), 2877–2890.
Grace, A. A., & Bunney, B. S. (1984). The control of firing pattern in nigral dopamine neurons: single spike firing. Journal of Neuroscience, 4(11), 2866–2876.
Grace, A. A., Floresco, S. B., Goto, Y., & Lodge, D. J. (2007). Regulation of firing of dopaminergic neurons and control of goal-directed behaviors. Trends in Neurosciences, 30(5), 220–7. doi:10.1016/j.tins.2007.03.003
Grace, A. A., & Rosenkranz, J. A. (2002). Regulation of conditioned responses of basolateral amygdala neurons. Physiology & Behavior, 77(4-5), 489–93. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12526988
Grieder, T. E., George, O., Tan, H., George, S. R., Le Foll, B., Laviolette, S. R., & van der Kooy, D. (2012). Phasic D1 and tonic D2 dopamine receptor signaling double dissociate the motivational effects of acute nicotine and chronic nicotine withdrawal. Proceedings of the National Academy of Sciences of the United States of America, 109(8), 3101–6. doi:10.1073/pnas.1114422109
Haluk, D. M., & Floresco, S. B. (2009). Ventral striatal dopamine modulation of different forms of behavioral flexibility. Neuropsychopharmacology, 34(8), 2041–2052. doi:10.1038/npp.2009.21
Harvey, J., & Lacey, M. G. (1997). A postsynaptic interaction between dopamine D1 and NMDA receptors promotes presynaptic inhibition in the rat nucleus accumbens via adenosine release. Journal of Neuroscience, 17(14), 5271–80. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/9204911
Heimer, L., Alheid, G. F., de Olmos, J. S., Groenewegen, H. J., Haber, S. N., Harlan, R. E., & Zahm, D. S. (1997). The accumbens: beyond the core-shell dichotomy. Journal of Neuropsychiatry and Clinical Neurosciences, 9(3), 354–81. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/9276840
Hikosaka, O., Sesack, S. R., Lecourtier, L., & Shepard, P. D. (2008). Habenula: crossroad between the basal ganglia and the limbic system. Journal of Neuroscience, 28(46), 11825–9. doi:10.1523/JNEUROSCI.3463-08.2008
Hong, S., Jhou, T. C., Smith, M., Saleem, K. S., & Hikosaka, O. (2011). Negative reward signals from the lateral habenula to dopamine neurons are mediated by rostromedial tegmental nucleus in primates. Journal of Neuroscience, 31(32), 11457–71. doi:10.1523/JNEUROSCI.1384-11.2011
Hoover, W. B., & Vertes, R. P. (2011). Projections of the medial orbital and ventral orbital cortex in the rat. Journal of Comparative Neurology, 519(18), 3766–801. doi:10.1002/cne.22733
Jenkins, O. F., & Jackson, D. M. (1986). Bromocriptine enhances the behavioural effects of apomorphine and dopamine after systemic or intracerebral injection in rats. Neuropharmacology, 25(11), 1243–9. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/3796797
Jhou, T. C., Fields, H. L., Baxter, M. G., Saper, C. B., & Holland, P. C. (2009). The rostromedial tegmental nucleus (RMTg), a GABAergic afferent to midbrain dopamine neurons, encodes aversive stimuli and inhibits motor responses. Neuron, 61(5), 786–800. doi:10.1016/j.neuron.2009.02.001
Jhou, T. C., Geisler, S., Marinelli, M., Degarmo, B. A., & Zahm, D. S. (2009). The mesopontine rostromedial tegmental nucleus: A structure targeted by the lateral habenula that projects to the ventral tegmental area of Tsai and substantia nigra compacta. Journal of Comparative Neurology, 513(6), 566–96. doi:10.1002/cne.21891
Ji, H., & Shepard, P. D. (2007). Lateral habenula stimulation inhibits rat midbrain dopamine neurons through a GABA(A) receptor-mediated mechanism. Journal of Neuroscience, 27(26), 6923–30. doi:10.1523/JNEUROSCI.0958-07.2007
Kaufling, J., Veinante, P., Pawlowski, S. A., Freund-Mercier, M.-J., & Barrot, M. (2009). Afferents to the GABAergic tail of the ventral tegmental area in the rat. Journal of Comparative Neurology, 513(6), 597–621. doi:10.1002/cne.21983
Kheramin, S., Body, S., Ho, M.-Y., Velázquez-Martinez, D. N., Bradshaw, C. M., Szabadi, E., … Anderson, I. M. (2004). Effects of orbital prefrontal cortex dopamine depletion on inter-temporal choice: a quantitative analysis. Psychopharmacology, 175(2), 206–14. doi:10.1007/s00213-004-1813-y
Kiss, J., Csáki, A., Bokor, H., Kocsis, K., & Kocsis, B. (2002). Possible glutamatergic/aspartatergic projections to the supramammillary nucleus and their origins in the rat studied by selective [(3)H]D-aspartate labelling and immunocytochemistry. Neuroscience, 111(3), 671–91. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12031353
Kita, H., & Kitai, S. T. (1990). Amygdaloid projections to the frontal cortex and the striatum in the rat. Journal of Comparative Neurology, 298(1), 40–49.
Knutson, B., Adams, C. M., Fong, G. W., & Hommer, D. (2001). Anticipation of increasing monetary reward selectively recruits nucleus accumbens. Journal of Neuroscience, 21(16), RC159. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11459880
Knutson, B., Fong, G. W., Adams, C. M., Varner, J. L., & Hommer, D. (2001). Dissociation of reward anticipation and outcome with event-related fMRI. Neuroreport, 12(17), 3683–7. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11726774
Knutson, B., Rick, S., Wimmer, G. E., Prelec, D., & Loewenstein, G. (2007). Neural predictors of purchases. Neuron, 53(1), 147–56. doi:10.1016/j.neuron.2006.11.010
Kobayashi, S., & Schultz, W. (2008). Influence of reward delays on responses of dopamine neurons. Journal of Neuroscience, 28(31), 7837–46. doi:10.1523/JNEUROSCI.1600-08.2008
Kuhnen, C. M., & Knutson, B. (2005). The neural basis of financial risk taking. Neuron, 47(5), 763–70. doi:10.1016/j.neuron.2005.08.008
Lader, M. (2008). Antiparkinsonian medication and pathological gambling. CNS Drugs, 22(5), 407–16. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/18399709
Lammel, S., Lim, B. K., Ran, C., Huang, K. W., Betley, M. J., Tye, K. M., … Malenka, R. C. (2012). Input-specific control of reward and aversion in the ventral tegmental area. Nature, 491(7423), 212–7. doi:10.1038/nature11527
Lecourtier, L., Defrancesco, A., & Moghaddam, B. (2008). Differential tonic influence of lateral habenula on prefrontal cortex and nucleus accumbens dopamine release. European Journal of Neuroscience, 27(7), 1755–62. doi:10.1111/j.1460-9568.2008.06130.x
Lecourtier, L., & Kelly, P. H. (2005). Bilateral lesions of the habenula induce attentional disturbances in rats. Neuropsychopharmacology, 30(3), 484–96. doi:10.1038/sj.npp.1300595
Lecourtier, L., & Kelly, P. H. (2007). A conductor hidden in the orchestra? Role of the habenular complex in monoamine transmission and cognition. Neuroscience and Biobehavioral Reviews, 31(5), 658–72. doi:10.1016/j.neubiorev.2007.01.004
Lecourtier, L., Neijt, H. C., & Kelly, P. H. (2004). Habenula lesions cause impaired cognitive performance in rats: implications for schizophrenia. European Journal of Neuroscience, 19, 2551–2560. doi:10.1111/j.1460-9568.2004.03356.x
Lejuez, C. W., Read, J. P., Kahler, C. W., Richards, J. B., Ramsey, S. E., Stuart, G. L., … Brown, R. A. (2002). Evaluation of a behavioral measure of risk taking: The Balloon Analogue Risk Task (BART). Journal of Experimental Psychology: Applied, 8(2), 75–84. doi:10.1037//1076-898X.8.2.75
Levant, B. (1998). Differential distribution of D3 dopamine receptors in the brains of several mammalian species. Brain Research, 800(2), 269–74. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/9685676
Llewellyn, D. J. (2008). The psychology of risk taking: toward the integration of psychometric and neuropsychological paradigms. American Journal of Psychology, 121(3), 363–76. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/18792715
Lodge, D. J., & Grace, A. A. (2006). The hippocampus modulates dopamine neuron responsivity by regulating the intensity of phasic neuron activation. Neuropsychopharmacology, 31(7), 1356–61. doi:10.1038/sj.npp.1300963
Lodge, D. J., & Grace, A. A. (2006). The laterodorsal tegmentum is essential for burst firing of ventral tegmental area dopamine neurons. Proceedings of the National Academy of Sciences of the United States of America, 103(13), 5167–72. doi:10.1073/pnas.0510715103
Lokwan, S. J., Overton, P. G., Berry, M. S., & Clark, D. (1999). Stimulation of the pedunculopontine tegmental nucleus in the rat produces burst firing in A9 dopaminergic neurons. Neuroscience, 92(1), 245–54. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/10392847
Madden, G. J., Petry, N. M., & Johnson, P. S. (2009). Pathological gamblers discount probabilistic rewards less steeply than matched controls. Experimental and Clinical Psychopharmacology, 17(5), 283–90. doi:10.1037/a0016806
Manes, F., Sahakian, B., Clark, L., Rogers, R., Antoun, N., Aitken, M., & Robbins, T. (2002). Decision-making processes following damage to the prefrontal cortex. Brain, 125(Pt 3), 624–39. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11872618
Marsh, A. A., Blair, K. S., Vythilingam, M., Busis, S., & Blair, R. J. R. (2007). Response options and expectations of reward in decision-making: the differential roles of dorsal and rostral anterior cingulate cortex. NeuroImage, 35(2), 979–88. doi:10.1016/j.neuroimage.2006.11.044
Martin, J. H., & Ghez, C. (1999). Pharmacological inactivation in the analysis of the central control of movement. Journal of Neuroscience Methods, 86(2), 145–59. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/10065983
Matsumoto, M., & Hikosaka, O. (2007). Lateral habenula as a source of negative reward signals in dopamine neurons. Nature, 447(7148), 1111–5. doi:10.1038/nature05860
Matsumoto, M., & Hikosaka, O. (2009). Representation of negative motivational value in the primate lateral habenula. Nature Neuroscience, 12(1), 77–84. doi:10.1038/nn.2233
Matsumoto, M., & Hikosaka, O. (2011). Electrical stimulation of the primate lateral habenula suppresses saccadic eye movement through a learning mechanism. PLoS ONE, 6(10), e26701. doi:10.1371/journal.pone.0026701
Matthews, S. C., Simmons, A. N., Lane, S. D., & Paulus, M. P. (2004). Selective activation of the nucleus accumbens during risk-taking decision making. Neuroreport, 15(13), 2123–7. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/15486494
McDonald, A. J. (1987). Organization of amygdaloid projections to the mediodorsal thalamus and prefrontal cortex: a fluorescence retrograde transport study in the rat. Journal of Comparative Neurology, 262(1), 46–58.
Meng, H., Wang, Y., Huang, M., Lin, W., Wang, S., & Zhang, B. (2011). Chronic deep brain stimulation of the lateral habenula nucleus in a rat model of depression. Brain Research, 1422, 32–8. doi:10.1016/j.brainres.2011.08.041
Mobini, S., Body, S., Ho, M.-Y., Bradshaw, C. M., Szabadi, E., Deakin, J. F. W., & Anderson, I. M. (2002). Effects of lesions of the orbitofrontal cortex on sensitivity to delayed and probabilistic reinforcement. Psychopharmacology, 160(3), 290–8. doi:10.1007/s00213-001-0983-0
Nicola, S. M. (2007). The nucleus accumbens as part of a basal ganglia action selection circuit. Psychopharmacology, 191(3), 521–50. doi:10.1007/s00213-006-0510-4
Nicola, S. M., & Malenka, R. C. (1997). Dopamine depresses excitatory and inhibitory synaptic transmission by distinct mechanisms in the nucleus accumbens. Journal of Neuroscience, 17(15), 5697–710. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/9221769
Nicola, S. M., Surmeier, D. J., & Malenka, R. C. (2000). Dopaminergic modulation of neuronal excitability in the striatum and nucleus accumbens. Annual Review of Neuroscience, 23, 185–215.
Nowend, K. L., Arizzi, M., Carlson, B. B., & Salamone, J. D. (2001). D1 or D2 antagonism in nucleus accumbens core or dorsomedial shell suppresses lever pressing for food but leads to compensatory increases in chow consumption. Pharmacology, Biochemistry, and Behavior, 69(3-4), 373–82. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11509194
O’Donnell, P., & Grace, A. A. (1994). Tonic D2-mediated attenuation of cortical excitation in nucleus accumbens neurons recorded in vitro. Brain Research, 634(1), 105–12. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/8156380
Ouagazzal, A. M., & Creese, I. (2000). Intra-accumbens infusion of D(3) receptor agonists reduces spontaneous and dopamine-induced locomotion. Pharmacology, Biochemistry, and Behavior, 67(3), 637–45. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11164096
Overton, P. G., Tong, Z. Y., & Clark, D. (1996). A pharmacological analysis of the burst events induced in midbrain dopaminergic neurons by electrical stimulation of the prefrontal cortex in the rat. Journal of Neural Transmission, 103, 523–540.
Pais-Vieira, M., Lima, D., & Galhardo, V. (2007). Orbitofrontal cortex lesions disrupt risk assessment in a novel serial decision-making task for rats. Neuroscience, 145(1), 225–31. doi:10.1016/j.neuroscience.2006.11.058
Parker, J. G., Zweifel, L. S., Clark, J. J., Evans, S. B., Phillips, P. E. M., & Palmiter, R. D. (2010). Absence of NMDA receptors in dopamine neurons attenuates dopamine release but not conditioned approach during Pavlovian conditioning. Proceedings of the National Academy of Sciences of the United States of America, 107(30), 13491–6. doi:10.1073/pnas.1007827107
Pattij, T., Janssen, M. C. W., Vanderschuren, L. J. M. J., Schoffelmeer, A. N. M., & van Gaalen, M. M. (2007). Involvement of dopamine D1 and D2 receptors in the nucleus accumbens core and shell in inhibitory response control. Psychopharmacology, 191(3), 587–98. doi:10.1007/s00213-006-0533-x
Paxinos, G., & Watson, C. (2005). The Rat Brain in Stereotaxic Coordinates (4th ed.). San Diego (CA): Academic Press.
Pennartz, C. M., Groenewegen, H. J., & Lopes da Silva, F. H. (1994). The nucleus accumbens as a complex of functionally distinct neuronal ensembles: an integration of behavioural, electrophysiological and anatomical data. Progress in Neurobiology, 42(6), 719–61. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/7938546
Perrotti, L. I., Bolaños, C. A., Choi, K.-H., Russo, S. J., Edwards, S., Ulery, P. G., … Barrot, M. (2005). DeltaFosB accumulates in a GABAergic cell population in the posterior tail of the ventral tegmental area after psychostimulant treatment. European Journal of Neuroscience, 21(10), 2817–24. doi:10.1111/j.1460-9568.2005.04110.x
Pezze, M.-A., Dalley, J. W., & Robbins, T. W. (2007). Differential roles of dopamine D1 and D2 receptors in the nucleus accumbens in attentional performance on the five-choice serial reaction time task. Neuropsychopharmacology, 32(2), 273–83. doi:10.1038/sj.npp.1301073
Price, J. L. (2007). Definition of the orbital cortex in relation to specific connections with limbic and visceral structures and other cortical regions. Annals of the New York Academy of Sciences, 1121, 54–71. doi:10.1196/annals.1401.008
Pugsley, T. A., Davis, M. D., Akunne, H. C., Mackenzie, A. G., Georgic, L. M., Demattos, S. B., … Groningen, A. W. (1995). Neurochemical and functional characterization of the preferentially selective dopamine D3 agonist PD 128907. Journal of Pharmacology and Experimental Therapeutics, 275(3), 1355–1366.
Quickfall, J., & Suchowersky, O. (2007). Pathological gambling associated with dopamine agonist use in restless legs syndrome. Parkinsonism & Related Disorders, 13(8), 535–6. doi:10.1016/j.parkreldis.2006.10.001
Rangel, A., Camerer, C., & Montague, P. R. (2008). A framework for studying the neurobiology of value-based decision making. Nature Reviews Neuroscience, 9(7), 545–56. doi:10.1038/nrn2357
Roberts, C., Cummins, R., Gnoffo, Z., & Kew, J. N. C. (2006). Dopamine D3 receptor modulation of dopamine efflux in the rat nucleus accumbens. European Journal of Pharmacology, 534(1-3), 108–14. doi:10.1016/j.ejphar.2006.01.014
Roesch, M. R., Calu, D. J., & Schoenbaum, G. (2007). Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nature Neuroscience, 10(12), 1615–24. doi:10.1038/nn2013
Rogers, R. D., Everitt, B. J., Baldacchino, A., Blackshaw, A. J., Swainson, R., Wynne, K., … Robbins, T. W. (1999). Dissociable deficits in the decision-making cognition of chronic amphetamine abusers, opiate abusers, patients with focal damage to prefrontal cortex, and tryptophan-depleted normal volunteers: evidence for monoaminergic mechanisms. Neuropsychopharmacology, 20(4), 322–39. doi:10.1016/S0893-133X(98)00091-8
Rogers, R. D., Ramnani, N., Mackay, C., Wilson, J. L., Jezzard, P., Carter, C. S., & Smith, S. M. (2004). Distinct portions of anterior cingulate cortex and medial prefrontal cortex are activated by reward processing in separable phases of decision-making cognition. Biological Psychiatry, 55(6), 594–602. doi:10.1016/j.biopsych.2003.11.012
Salamone, J. D. (1988). Dopaminergic modulation of activational aspects of motivation: effects of haloperidol on schedule induced activity, feeding and foraging in rats. Psychobiology, 16, 196–206.
Salamone, J. D. (1992). Complex motor and sensorimotor functions of striatal and accumbens dopamine: involvement in instrumental behavior processes. Psychopharmacology, 107(2-3), 160–174. doi:10.1007/BF02245133
Salamone, J. D., Arizzi, M. N., Sandoval, M. D., Cervone, K. M., & Aberman, J. E. (2002). Dopamine antagonists alter response allocation but do not suppress appetite for food in rats: contrast between the effects of SKF 83566, raclopride, and fenfluramine on a concurrent choice task. Psychopharmacology, 160(4), 371–80. doi:10.1007/s00213-001-0994-x
Salamone, J. D., & Correa, M. (2012). The mysterious motivational functions of mesolimbic dopamine. Neuron, 76(3), 470–85. doi:10.1016/j.neuron.2012.10.021
Salamone, J. D., Correa, M., Mingote, S. M., & Weber, S. M. (2005). Beyond the reward hypothesis: alternative functions of nucleus accumbens dopamine. Current Opinion in Pharmacology, 5(1), 34–41. doi:10.1016/j.coph.2004.09.004
Salamone, J. D., Cousins, M. S., & Bucher, S. (1994). Anhedonia or anergia? Effects of haloperidol and nucleus accumbens dopamine depletion on instrumental response selection in a T-maze cost/benefit procedure. Behavioural Brain Research, 65(2), 221–9. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/7718155
Salamone, J. D., Cousins, M. S., Maio, C., Champion, M., Turski, T., & Kovach, J. (1996). Different behavioral effects of haloperidol, clozapine and thioridazine in a concurrent lever pressing and feeding procedure. Psychopharmacology, 125(2), 105–12. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/8783383
Salamone, J. D., Steinpreis, R. E., McCullough, L. D., Smith, P., Grebel, D., & Mahan, K. (1991). Haloperidol and nucleus accumbens dopamine depletion suppress lever pressing for food but increase free food consumption in a novel food choice procedure. Psychopharmacology, 104(4), 515–521. doi:10.1007/BF02245659
Sartorius, A., & Henn, F. A. (2007). Deep brain stimulation of the lateral habenula in treatment resistant major depression. Medical Hypotheses, 69(6), 1305–8. doi:10.1016/j.mehy.2007.03.021
Sautel, F., Griffon, N., Lévesque, D., Pilon, C., Schwartz, J. C., & Sokoloff, P. (1995). A functional test identifies dopamine agonists selective for D3 versus D2 receptors. Neuroreport, 6(2), 329–332.
Schmidt, H. D., Anderson, S. M., & Pierce, R. C. (2006). Stimulation of D1-like or D2 dopamine receptors in the shell, but not the core, of the nucleus accumbens reinstates cocaine-seeking behaviour in the rat. European Journal of Neuroscience, 23(1), 219–28. doi:10.1111/j.1460-9568.2005.04524.x
Schoenbaum, G., Chiba, A. A., & Gallagher, M. (2000). Changes in functional connectivity in orbitofrontal cortex and basolateral amygdala during learning and reversal training. Journal of Neuroscience, 20(13), 5179–89. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/10864975
Schultz, W. (2010). Dopamine signals for reward value and risk: basic and recent data. Behavioral and Brain Functions, 6, 24. doi:10.1186/1744-9081-6-24
Schultz, W. (2013). Updating dopamine reward signals. Current Opinion in Neurobiology, 23(2), 229–38. doi:10.1016/j.conb.2012.11.012
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599. doi:10.1126/science.275.5306.1593
Shumake, J., Ilango, A., Scheich, H., Wetzel, W., & Ohl, F. W. (2010). Differential neuromodulation of acquisition and retrieval of avoidance learning by the lateral habenula and ventral tegmental area. Journal of Neuroscience, 30(17), 5876–83. doi:10.1523/JNEUROSCI.3604-09.2010
Simon, N. W., Gilbert, R. J., Mayse, J. D., Bizon, J. L., & Setlow, B. (2009). Balancing risk and reward: a rat model of risky decision making. Neuropsychopharmacology, 34(10), 2208–17. doi:10.1038/npp.2009.48
St. Onge, J. R., Abhari, H., & Floresco, S. B. (2011). Dissociable contributions by prefrontal D1 and D2 receptors to risk-based decision making. Journal of Neuroscience, 31(23), 8625–33. doi:10.1523/JNEUROSCI.1020-11.2011
St. Onge, J. R., Ahn, S., Phillips, A. G., & Floresco, S. B. (2012). Dynamic fluctuations in dopamine efflux in the prefrontal cortex and nucleus accumbens during risk-based decision making. Journal of Neuroscience, 32(47), 16880–91. doi:10.1523/JNEUROSCI.3807-12.2012
St. Onge, J. R., Chiu, Y. C., & Floresco, S. B. (2010). Differential effects of dopaminergic manipulations on risky choice. Psychopharmacology, 211(2), 209–21. doi:10.1007/s00213-010-1883-y
St. Onge, J. R., & Floresco, S. B. (2009). Dopaminergic modulation of risk-based decision making. Neuropsychopharmacology, 34(3), 681–97. doi:10.1038/npp.2008.121
St. Onge, J. R., & Floresco, S. B. (2010). Prefrontal cortical contribution to risk-based decision making. Cerebral Cortex, 20(8), 1816–28. doi:10.1093/cercor/bhp250
St. Onge, J. R., Stopper, C. M., Zahm, D. S., & Floresco, S. B. (2012). Separate prefrontal-subcortical circuits mediate different components of risk-based decision making. Journal of Neuroscience, 32(8), 2886–99. doi:10.1523/JNEUROSCI.5625-11.2012
Stamatakis, A. M., & Stuber, G. D. (2012). Activation of lateral habenula inputs to the ventral midbrain promotes behavioral avoidance. Nature Neuroscience, 15(8), 1105–7. doi:10.1038/nn.3145
Stopper, C. M., & Floresco, S. B. (2011). Contributions of the nucleus accumbens and its subregions to different aspects of risk-based decision making. Cognitive, Affective & Behavioral Neuroscience, 11(1), 97–112. doi:10.3758/s13415-010-0015-9
Stopper, C. M., & Floresco, S. B. (2014). What’s better for me? Fundamental role for lateral habenula in promoting subjective decision biases. Nature Neuroscience, 17(1), 33–5. doi:10.1038/nn.3587
Stopper, C. M., Green, E. B., & Floresco, S. B. (2014). Selective involvement by the medial orbitofrontal cortex in biasing risky, but not impulsive, choice. Cerebral Cortex, 24(1), 154–62. doi:10.1093/cercor/bhs297
Stopper, C. M., Khayambashi, S., & Floresco, S. B. (2013). Receptor-specific modulation of risk-based decision making by nucleus accumbens dopamine. Neuropsychopharmacology, 38(5), 715–28. doi:10.1038/npp.2012.240
Sugam, J. A., Day, J. J., Wightman, R. M., & Carelli, R. M. (2012). Phasic nucleus accumbens dopamine encodes risk-based decision-making behavior. Biological Psychiatry, 71(3), 199–205. doi:10.1016/j.biopsych.2011.09.029
Tobler, P. N., Fiorillo, C. D., & Schultz, W. (2005). Adaptive coding of reward value by dopamine neurons. Science, 307(5715), 1642–5. doi:10.1126/science.1105370
Tong, Z. Y., Overton, P. G., & Clark, D. (1996). Stimulation of the prefrontal cortex in the rat induces patterns of activity in midbrain dopaminergic neurons which resemble natural burst events. Synapse, 22, 195–208.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–31. doi:10.1126/science.185.4157.1124
Ungless, M. A., & Grace, A. A. (2012). Are you or aren’t you? Challenges associated with physiologically identifying dopamine neurons. Trends in Neurosciences, 35(7), 422–30. doi:10.1016/j.tins.2012.02.003
Van Gaalen, M. M., van Koten, R., Schoffelmeer, A. N. M., & Vanderschuren, L. J. M. J. (2006). Critical involvement of dopaminergic neurotransmission in impulsive decision making. Biological Psychiatry, 60(1), 66–73. doi:10.1016/j.biopsych.2005.06.005
Wade, T. R., de Wit, H., & Richards, J. B. (2000). Effects of dopaminergic drugs on delayed reward as a measure of impulsive behavior in rats. Psychopharmacology, 150(1), 90–101. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/10867981
Wanat, M. J., Kuhnen, C. M., & Phillips, P. E. M. (2010). Delays conferred by escalating costs modulate dopamine release to rewards but not their predictors. Journal of Neuroscience, 30(36), 12020–12027. doi:10.1523/JNEUROSCI.2691-10.2010
West, A. R., & Grace, A. A. (2002). Opposite influences of endogenous dopamine D1 and D2 receptor activation on activity states and electrophysiological properties of striatal neurons: studies combining in vivo intracellular recordings and reverse microdialysis. Journal of Neuroscience, 22(1), 294–304. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11756513
Winstanley, C. A., Theobald, D. E. H., Dalley, J. W., & Robbins, T. W. (2005). Interactions between serotonin and dopamine in the control of impulsive choice in rats: therapeutic implications for impulse control disorders. Neuropsychopharmacology, 30(4), 669–82. doi:10.1038/sj.npp.1300610
Wise, R. A. (1978). Catecholamine theories of reward: a critical review. Brain Research, 152, 215–247.
Zack, M., & Poulos, C. X. (2004). Amphetamine primes motivation to gamble and gambling-related semantic networks in problem gamblers. Neuropsychopharmacology, 29(1), 195–207. doi:10.1038/sj.npp.1300333
Zahm, D. S. (1999). Functional-anatomical implications of the nucleus accumbens core and shell subterritories. Annals of the New York Academy of Sciences, 877, 113–28. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/10415646
Zeeb, F. D., Robbins, T. W., & Winstanley, C. A. (2009). Serotonergic and dopaminergic modulation of gambling behavior as assessed using a novel rat gambling task. Neuropsychopharmacology, 34(10), 2329–43. doi:10.1038/npp.2009.62
Zeeb, F. D., & Winstanley, C. A. (2011). Lesions of the basolateral amygdala and orbitofrontal cortex differentially affect acquisition and performance of a rodent gambling task. Journal of Neuroscience, 31(6), 2197–204. doi:10.1523/JNEUROSCI.5597-10.2011
Zeeb, F. D., & Winstanley, C. A. (2013). Functional disconnection of the orbitofrontal cortex and basolateral amygdala impairs acquisition of a rat gambling task and disrupts animals’ ability to alter decision-making behavior after reinforcer devaluation. Journal of Neuroscience, 33(15), 6434–43. doi:10.1523/JNEUROSCI.3971-12.2013
Zweifel, L. S., Parker, J. G., Lobb, C. J., Rainwater, A., Wall, V. Z., Fadok, J. P., … Palmiter, R. D. (2009). Disruption of NMDAR-dependent burst firing by dopamine neurons provides selective assessment of phasic dopamine-dependent behavior. Proceedings of the National Academy of Sciences of the United States of America, 106(18), 7281–8. doi:10.1073/pnas.0813415106
