An Animal Model of Human Gambling Based on Pigeon Suboptimal Choice

Thomas R Zentall

Department of Psychology, University of Kentucky, Lexington, KY

*Corresponding Author: Thomas R Zentall
Department of Psychology
University of Kentucky, Lexington, KY
Tel: (859) 257-4076
E-mail: zentall@uky.edu

Received Date: 16/02/2017; Accepted Date: 03/04/2017; Published Date: 14/04/2017

Visit for more related articles at Research & Reviews: Neuroscience

Abstract

When humans gamble, they choose an outcome that has high value but a low probability of occurrence over a more favourable high-probability outcome of lower value (not gambling). Similarly, pigeons show a preference for an alternative that occasionally provides a signal for reinforcement over a more optimal alternative that always provides a signal for a lower probability of reinforcement. Two mechanisms appear to be responsible for this suboptimal behaviour: the signal for non-reinforcement (losing) appears to result in little or no inhibition, and the probability of the signal for reinforcement is relatively unimportant. Human gambling behaviour appears to be controlled by similar mechanisms. Also, we have found that, as with human gambling, pigeons that are more motivated to win choose less optimally. Furthermore, pigeons exposed to an enriched environment choose more optimally than those normally housed. As in humans, individual differences in impulsivity among pigeons predict their attraction to the suboptimal alternative. These findings may have implications for the treatment of humans who have problems with gambling behaviour.

Keywords

Gambling, Optimal choice, Conditioned reinforcer, Conditioned inhibition, Pigeons

Introduction

It is well known that humans often make bad economic decisions or suboptimal choices [1], for example, when they engage in commercial gambling (games relying solely on chance like playing slot machines or buying lottery tickets). They do so supposedly because gambling is thought to be a form of entertainment [2]. The entertainment value appears to be related to the prospect of winning a large sum of money but in commercial gambling the expected value is negative (almost always a loss).

In this regard, humans are thought to be different from other animals because according to behavioral ecologists, animals have been selected by evolution to forage for food optimally [3,4]. Entertainment should not be a consideration. Furthermore, experimental psychologists have argued that even in somewhat unnatural laboratory contexts, such as laboratory learning tasks, if one allows for sufficient experience with the procedure and the conditions of reinforcement are discriminable, animals should be sensitive to the probability of reinforcement associated with their choices. That is, they should learn to choose optimally [5]. If this analysis is correct, animals should learn to choose optimally under conditions similar to those of commercial gambling tasks.

In contrast to prevailing thought, we and others have found that animals often choose suboptimally, just as humans do. The implication of these findings is that the mechanism that supports suboptimal gambling may be similar in humans and other animals: although winning has great positive value, losing does not have the negative value that it ought to have to inhibit the suboptimal response. Evidence will show that although winning is the motivation that drives suboptimal choice, surprisingly, the probability of winning appears to play little role in the choice. In this review, I will first demonstrate conditions under which suboptimal choice occurs in pigeons. I will then attempt to identify the mechanism involved in the suboptimal choice. I will then examine the relation between this behavior in pigeons and gambling in humans to conclude that similar processes are very likely involved. Finally, I will examine several subject-related demographics that we have found affect suboptimal choice in pigeons and are also thought to affect the tendency to engage in similar behavior by humans.

The Added Value of Information

Our research on suboptimal choice began with a somewhat different question. Would pigeons prefer an alternative that provides information (sometimes a cue that signals reinforcement, “good news,” sometimes a cue that indicates the absence of reinforcement, “bad news”) over an alternative that provides no news, when the probability of reinforcement is the same (50%) for both alternatives? The answer was clear: pigeons preferred an alternative that provided a cue that signaled reinforcement and a different cue that signaled the absence of reinforcement over an alternative that provided an ambiguous cue signaling that reinforcement might occur [6] (Figure 1). This finding is consistent with information theory [7], which proposes that any stimulus that reduces ambiguity will provide valued information. According to information theory, however, the amount of information provided by cues that signal the outcome should be maximum when the outcome is most ambiguous (i.e., when either outcome has a 50% chance of occurrence). If a positive outcome is more likely to occur or less likely to occur, less information should be provided by the cues. When the probability of reinforcement was increased equally for both alternatives (87.5%), the preference for the good-news/bad-news alternative decreased. Inconsistent with information theory, however, when the probability of reinforcement was decreased equally for both alternatives (12.5% reinforcement for both alternatives), the preference for the good-news/bad-news alternative actually increased.

Figure 1: Roper and Zentall’s design. Choice of one alternative was followed by a stimulus (e.g., red) 50% of the time always followed by reinforcement or a different stimulus (e.g., green) 50% of the time never followed by reinforcement. Choice of the other alternative (e.g., right) was followed by either of two stimuli (blue or yellow) both of which were followed by reinforcement 50% of the time. Spatial location and colors were counterbalanced over subjects.
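The information-theoretic prediction described above can be illustrated with a short calculation. The sketch below assumes that "amount of information" is measured by Shannon's binary entropy; the source cites information theory [7] but does not name a formula, so the function `binary_entropy` is an illustrative choice rather than the measure used in the original work.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of an outcome that occurs with probability p.

    Assumption: this is one standard way to quantify how much uncertainty
    a fully predictive cue would resolve; it is not taken from the source.
    """
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome leaves nothing to resolve
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# The three reinforcement probabilities described for Roper and Zentall's study.
for p in (0.125, 0.5, 0.875):
    print(f"p(reinforcement) = {p:.3f}: {binary_entropy(p):.3f} bits resolved")
```

Under this measure the cues are most informative at 50% and equally (and less) informative at 12.5% and 87.5%, so an account based on information alone would predict a weaker preference at both extremes, not the stronger preference reported at 12.5%.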

Furthermore, when the response requirement for the good-news/bad-news alternative increased, pigeons still preferred it, in spite of the fact that the increased response requirement increased effort and delayed the scheduled reinforcement. This finding suggested to us that pigeons could be induced to choose suboptimally when it meant obtaining less food.

The Suboptimal Choice Experiment

To test this idea, pigeons were asked if they would prefer the informative good-news/bad-news alternative over the uninformative alternative even if the choice was suboptimal (resulted in less food). We found that when pigeons were provided with a choice between a 20% chance of signaled reinforcement and a 50% chance of unsignaled reinforcement, the informative alternative was strongly preferred (almost 100%) over the optimal 50% reinforcement alternative [8,9] (Figure 2).

Figure 2: Design of the Stagner and Zentall experiment. Pigeons choose between two alternatives. Choice of one alternative (e.g., left) is followed sometimes (20% of the time) by a stimulus (e.g., red) that is always followed by reinforcement or at other times (80% of the time) by a different stimulus (e.g., green) that is never followed by reinforcement. Choice of the other alternative (i.e., right) is followed by blue or yellow, each of which is followed by reinforcement 50% of the time. Spatial location and colors were counterbalanced.

Avoidance of Optimal Alternative Ambiguity?

In the experiment, perhaps it was not so much the desirability of the signal for good-news that attracted the pigeons to the suboptimal alternative but the aversiveness of the optimal alternative ambiguity associated with 50% reinforcement [8]. To test this hypothesis, the magnitude of reinforcement was manipulated rather than its probability [10]. Pigeons were given a choice between one alternative that provided a signal for 10 pellets of food 20% of the time or a signal for no food 80% of the time and another alternative that provided two signals that each predicted 3 pellets of food (Figure 3). In this case, in spite of the fact that both alternatives provided a reliable signal for food, pigeons preferred the alternative that provided a “jackpot” of 10 pellets (but only on 20% of the trials) over a consistent 3 pellets.

Figure 3: The Zentall and Stagner design. Pigeons chose between a vertical and a horizontal line. Choice of one alternative was followed either by a stimulus (e.g., red) on 20% of the trials that was always followed by 10 pellets of reinforcement or by a different stimulus (e.g., green) on 80% of the trials followed by the absence of reinforcement. Choice of the other alternative was followed by blue or yellow stimuli followed by 3 pellets of reinforcement. Colors and spatial location were counterbalanced.

The results of these experiments suggested that the stimulus signaling the absence of food (the S- stimulus) was ineffective in inhibiting choice of the suboptimal alternative. Instead, it was the predictive value of the food-signaling stimulus (the S+) that determined choice. Thus, it follows that the frequency of the S+ stimulus played little role in choice of the suboptimal alternative. The magnitude of reinforcement manipulation indicates the generality of the suboptimal choice phenomenon. It is also more like human gambling behavior, in which humans wager a certain amount of money (the optimal alternative) in the hope of gaining a larger amount [11,12] (for similar results with monkeys).

The Role of the Frequency of the Suboptimal S+

In spite of the fact that the S- stimulus appeared 80% of the time in both experiments, it did not seem to inhibit choice of the suboptimal alternative. Rather it appears that it was the predictive value of the S+ stimulus not its probability of occurrence that determined choice. To test this hypothesis more directly, we pitted the probability of reinforcement against the signaling value of the stimuli that followed the choice [13]. In this experiment, pigeons chose between one alternative that provided an S+ stimulus on 20% of the trials or an S- stimulus on 80% of the trials and a second alternative that provided an S+ on 50% of the trials or an S- on 50% of the trials (Figure 4). In keeping with the hypothesis that the frequency of the S+ stimuli played little role in choice between the two alternatives, the pigeons were indifferent between them [14,15].

Figure 4: Design of Stagner et al. Pigeons chose between two alternatives that were distinguished by discriminative stimuli (a vertical or a horizontal line). Choice of one alternative was followed either by a stimulus (e.g., red) on 20% of the trials that was always followed by reinforcement or by a different stimulus (e.g., green) on 80% of the trials that was never followed by reinforcement. Choice of the other alternative was followed either by a stimulus (e.g., blue) on 50% of the trials that was always followed by reinforcement or by a different stimulus (e.g., yellow) on 50% of the trials that was never followed by reinforcement. Spatial location and colors were counterbalanced.

The stimulus value hypothesis can account for the results of the preference for “information” experiment reported by Roper and Zentall in which as the probability of reinforcement decreased, the preference for the discriminative stimulus alternative actually increased [6]. The stimulus value hypothesis suggests that the probability of the occurrence of the signal for reinforcement is of relatively little importance, only its predictive value. In their experiment, as the probability of reinforcement was reduced from 87.5% to 50% to 12.5% the predictive value of the signal for reinforcement remained the same (100%). However, the predictive value of the signal for reinforcement that followed choice of the other alternative decreased from 87.5% to 50% to 12.5%. Thus, as the probability of the occurrence of the signal for reinforcement decreased, the relative predictive value of that signal increased.
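The arithmetic behind this argument can be made explicit. The sketch below assumes that "relative predictive value" can be expressed as a simple ratio of the two signals' predictive values; the source describes the relation only verbally, so the function `relative_predictive_value` and the ratio form are illustrative assumptions.

```python
def relative_predictive_value(p_reinforcement: float) -> float:
    """Ratio of the informative S+ value (always 100%) to the value of the
    ambiguous signal, which predicts food with probability p_reinforcement.

    Assumption: a ratio is used purely for illustration.
    """
    informative_splus = 1.0  # the good-news cue always predicts food
    return informative_splus / p_reinforcement


# Overall reinforcement probabilities described for the Roper and Zentall study.
for p in (0.875, 0.5, 0.125):
    print(f"overall p = {p:.3f}: informative S+ is worth "
          f"{relative_predictive_value(p):.2f}x the ambiguous signal")
```

As the overall probability of reinforcement falls, the ratio grows, which matches the direction of the preference change that the stimulus value hypothesis is meant to explain.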

As further evidence of the relative unimportance of the frequency of the signal for reinforcement, Vasconcelos, Monteiro, and Kacelnik repeated, with starlings as subjects, the Stagner and Zentall experiment in which pigeons were given a choice between a signaled 20% reinforcement and an unsignaled 50% reinforcement, and they found similar results [8,16]. They then reduced the frequency of the S+ stimulus that followed choice of the suboptimal alternative from 20% to 10%, then 5%, and finally 0%. Reducing the frequency to 10% resulted in no reduction in choice of the suboptimal alternative. Reducing the frequency to 5% resulted in a small drop in preference for the suboptimal alternative, to about 70%, but it was not until the S+ frequency dropped to 0% that the starlings showed a clear preference for the optimal alternative.

A recent experiment by Smith and Zentall tested the stimulus value hypothesis under conditions involving a choice between signaled 50% reinforcement and signaled 100% reinforcement [17] (Figure 5). Although the optimal alternative always provided food and provided it twice as often as the suboptimal alternative, because both S+ stimuli predicted reinforcement equally, as predicted by the stimulus value hypothesis, the pigeons were indifferent between the two alternatives. Earlier research suggested that pigeons often showed a strong preference for the suboptimal alternative, but that research [18,19] always used alternatives that differed only spatially, and when pigeons were indifferent between two alternatives they often adopted a spatial preference. Smith and Zentall, however, used a visual discrimination for which the discriminative stimuli changed location randomly from trial to trial, so if the pigeons developed a spatial preference it would not show up as a preference for either alternative [17]. Consistent with this hypothesis, in each of the previous experiments some pigeons showed a strong suboptimal preference, whereas others showed a strong optimal preference [20]. Thus, the large preference for the suboptimal alternative demonstrated by some subjects with this design was likely influenced by a strong spatial preference unrelated to the stimuli or reinforcement contingencies that followed.

Figure 5: Smith and Zentall design. Pigeons chose between two alternatives. Choice of one alternative (e.g., the plus) was followed by a red stimulus 50% of the time always followed by reinforcement or a green stimulus 50% of the time that was never followed by reinforcement. Choice of the other alternative (i.e., the circle) was followed by a yellow stimulus followed by reinforcement 100% of the time.

It is unlikely, however, that all of the preference for the suboptimal alternative found when the initial-link discrimination was spatial can be attributed to spatial preferences unrelated to actual preferences for the suboptimal alternative. First, in most of the reported research in which a spatial discrimination was used, more pigeons chose suboptimally than chose optimally.

Second, Belke and Spetch reversed the contingencies associated with the initial stimuli for the three (of the eight) pigeons that showed a strong preference for the suboptimal alternative and found that the spatial preferences reversed as well [18]. Third, when a 5-s gap was inserted between the offset of the initial stimulus and the suboptimal conditioned reinforcer, it resulted in choice of the optimal alternative. However, when a similar gap was inserted between the offset of the initial stimulus and the optimal conditioned reinforcer, it had little effect on the preference for the suboptimal alternative. Furthermore, it is likely that variables other than the value of the terminal link stimulus affect choice of the suboptimal alternative. For example, when the terminal links are signaled, Dunn and Spetch found that short duration initial links produced suboptimal choice but longer ones did not. They found suboptimal choice only when choice involved a single peck. When it took more time to get to the terminal link (from variable interval 10 s to variable interval 80 s), the pigeons preferred the optimal alternative. On the other hand, in general, the duration of the terminal link did not affect the preference for the signaled terminal links [19], although Spetch found a preference for the optimal alternative when the terminal link duration was quite short.

The Conditioned Reinforcer Value

The research described raises questions about what is responsible for the suboptimal choice effect. In most research, the delay to reinforcement on each trial following choice of the alternative is carefully controlled. That is, the outcome that was scheduled to occur following each initial choice always occurs a fixed time following choice [21]. A fixed delay was used because it is well known that any differential delay to reinforcement can result in considerable discounting of the reinforcer. The fact that the frequency of the signal for reinforcement is relatively inconsequential in the preference for the suboptimal alternative suggests that the probability of reinforcement associated with each initial link (the primary reinforcing value) is relatively unimportant and choice is determined by the secondary (or conditioned) reinforcing value of the stimulus that follows the choice [22]. Although all primary reinforcers occur at the same time after choice, the conditioned reinforcers typically follow choice immediately. Thus, there is little discounting of the conditioned reinforcers whereas there is considerable discounting of the primary reinforcers. This may explain why the predictive value of the conditioned reinforcers determines choice. If this interpretation is correct, it offers a possible explanation for the suboptimal choice. That is, when pigeons appear to prefer 20% reinforcement over 50% reinforcement they are really showing a preference for a reliable signal for reinforcement (100%) associated with choice of the suboptimal alternative over the less reliable signal for reinforcement (50%) associated with choice of the optimal alternative. Similarly, when pigeons appear to prefer an average of 2 pellets over 3 pellets they are actually showing a preference for 10 pellets over 3 pellets. Furthermore, when the S+ stimuli that follow choice have equal predictive value, this hypothesis can account for indifference between the two alternatives, independent of the frequency of those stimuli [13,17,23].
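The contrast between what an optimal chooser would track (expected food per trial) and what the stimulus value hypothesis says pigeons track (the value of the best terminal stimulus) can be sketched as follows. The class `Alternative`, its field names, and the function `predicted_by_stimulus_value` are hypothetical names introduced for illustration; the probabilities and pellet counts are those given in the text for the designs in Figures 2 to 5, and the unsignaled alternative is modeled, as an assumption, as a terminal stimulus that always appears and predicts food 50% of the time.

```python
from dataclasses import dataclass


@dataclass
class Alternative:
    name: str
    p_splus: float             # how often the best terminal stimulus appears
    p_food_given_splus: float  # how well that stimulus predicts food
    pellets: float = 1.0       # reinforcer magnitude on reinforced trials

    def expected_food(self) -> float:
        """Primary reinforcement per trial (what an optimal chooser maximizes)."""
        return self.p_splus * self.p_food_given_splus * self.pellets

    def splus_value(self) -> float:
        """Value of the S+ itself, ignoring how often it appears."""
        return self.p_food_given_splus * self.pellets


def predicted_by_stimulus_value(a: Alternative, b: Alternative) -> str:
    """Preference predicted if only the S+ value matters (illustrative rule)."""
    va, vb = a.splus_value(), b.splus_value()
    if abs(va - vb) < 1e-9:
        return "indifferent"
    return a.name if va > vb else b.name


designs = {
    "Figure 2": (Alternative("suboptimal", 0.2, 1.0), Alternative("optimal", 1.0, 0.5)),
    "Figure 3": (Alternative("suboptimal", 0.2, 1.0, 10), Alternative("optimal", 1.0, 1.0, 3)),
    "Figure 4": (Alternative("suboptimal", 0.2, 1.0), Alternative("optimal", 0.5, 1.0)),
    "Figure 5": (Alternative("suboptimal", 0.5, 1.0), Alternative("optimal", 1.0, 1.0)),
}

for label, (sub, opt) in designs.items():
    print(f"{label}: expected food {sub.expected_food():.2f} vs {opt.expected_food():.2f}; "
          f"stimulus value predicts: {predicted_by_stimulus_value(sub, opt)}")
```

In this sketch the expected food always favors the optimal alternative, while the S+ value rule reproduces the pattern described in the text: a suboptimal preference for the Figure 2 and Figure 3 designs and indifference for the Figure 4 and Figure 5 designs.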

Alternatives to the Stimulus Value Hypothesis

Mazur proposed that preference for initial-link stimuli is determined by the conditioned reinforcers that follow and the value of the conditioned reinforcers is inversely related to the total time spent in their presence prior to primary reinforcement. In the case of 20% signaled reinforcement vs. 50% unsignaled reinforcement (Figure 2), the suboptimal alternative would be preferred because reinforcement would follow the unsignaled conditioned reinforcer only 50% of the time, whereas reinforcement would follow the signaled conditioned reinforcer 100% of the time [24]. In the case of 50% signaled reinforcement vs. 100% reinforcement (Figure 5) both conditioned reinforcers are followed by reinforcement 100% of the time. Thus, pigeons should be indifferent between them. This theory is very similar to the stimulus value hypothesis.

A somewhat different interpretation of suboptimal choice by pigeons was proposed by Stagner and Zentall. They proposed that the preference for 20% signaled reinforcement over 50% unsignaled reinforcement resulted from the change in expected value from the initial link (20% reinforcement) to the signal for reinforcement (100% reinforcement), a change that should produce strong positive contrast (but little or no negative contrast between the initial link 20% reinforcement and the stimulus that signals 0% reinforcement). Most important, for the 50% reinforcement alternative, there should be no contrast between the initial link 50% reinforcement and the terminal link stimulus that signals 50% reinforcement. The contrast account could also explain the indifference between the suboptimal alternative and the optimal alternative reported by Smith and Zentall when the choice was between 50% signaled reinforcement and 100% reinforcement, because the positive contrast that occurred following choice of the suboptimal alternative upon presentation of the S+ stimulus (50% expected, 100% received) would be reduced by the negative contrast found upon presentation of the S- stimulus (50% expected, 0% obtained).

To account for the results found with both designs, McDevitt et al. suggested that preference for the suboptimal alternative could be explained by the reduction in delay to reinforcement signaled by the appearance of the S+ that follows the suboptimal choice (similar to the contrast account) [25]. They called this the Signal for Good News hypothesis. According to this hypothesis, in the case of 20% signaled reinforcement vs. 50% unsignaled reinforcement, there would be a large reduction in the delay to reinforcement signaled by appearance of the suboptimal S+, whereas there would be no reduction in the delay to reinforcement signaled by appearance of the optimal S+. Thus, the suboptimal alternative would be preferred. But what about the case of 50% signaled reinforcement vs. 100% reinforcement? In this case as well, there would be a large reduction in the delay to reinforcement signaled by appearance of the suboptimal S+, whereas there would be no reduction in the delay to reinforcement signaled by appearance of the optimal S+. But, as noted, no reliable preference was found. To account for the absence of a preference with that design, McDevitt et al. proposed that there was also an effect of the difference in primary reinforcement between the two alternatives, and it was assumed that in this case the two effects must cancel out. The difference in delay reduction and primary reinforcement between the two alternatives can also account for the absence of preference between 25% signaled reinforcement (suboptimal) vs. 75% unsignaled reinforcement (optimal) and 50% signaled reinforcement (suboptimal) vs. 75% unsignaled reinforcement (optimal) found by Zentall et al. [26]. In that case, there would be greater delay reduction (or contrast) associated with the appearance of the signal for reinforcement following choice of the 25% reinforcement alternative than following choice of the 50% reinforcement alternative, but the greater primary reinforcement associated with the 50% reinforcement alternative would make the 25% and 50% signaled reinforcement alternatives comparable. The problem with this hypothesis is that with two opposing preference-inducing mechanisms, the theory can account for almost any outcome.

Is there Inhibition Associated with the Stimulus that Predicts the Absence of Food?

If the frequency of the signal for reinforcement has little effect on the choice between the two alternatives, it suggests that the signal for the absence of reinforcement is associated with little inhibition. Laude et al. tested this hypothesis more directly by using the magnitude of reinforcement design (Figure 3) developed by Zentall and Stagner [27,28]. To assess the development of inhibition, Laude et al. tested for inhibition using a compound cue test, either early in training or late in training, after preference for the suboptimal alternative had stabilized [29]. The compound cue test assesses inhibition by presenting a presumed inhibitory stimulus together with a known S+ and noting the decrease in responding to the compound. Results indicated that early in training there was a significant reduction in responding to the S+ stimulus when the S- was presented in compound with it (i.e., there was significant inhibition) but not late in training. Thus, paradoxically, what started out as significant inhibition early in training dissipated with further training. Stagner, Laude, and Zentall showed that this effect did not result from the ability of the pigeons to turn away from the S- stimulus when it appeared [10].

A theory based on the absence of conditioned inhibition to losses has also been proposed to account for human gambling. Breen and Zuckerman, for example, reported that habitual gamblers, compared with occasional gamblers, attend to their infrequent wins but much less to their considerably more frequent losses [30]. Similarly, problem gamblers are less sensitive to aversive conditioning, which should also serve to inhibit behavior [31].

The results of these pigeon experiments are consistent with human gambling research that has found that conditioned reinforcers play an important role for problem gamblers, whereas conditioned inhibitors exert very little control over their decisions to gamble [32-36]. Furthermore, problem gambling in humans is clinically recognized as an impulse control disorder in which people show impaired behavioral inhibition and a failure to consider the long-term consequences of the decisions they make [37]. Thus, much like pigeons, problem gamblers have a strong attraction to the signal for a large or highly probable reward without regard for the general suboptimality of their choice.

The Bias for Certainty over Uncertainty

The Allais paradox or the certainty effect has shown that humans show paradoxical choice behavior [38,39]. For example, given a choice between a 100% chance of earning $5 or an 80% chance of earning $10, most people choose the certain $5, although the average return on the 80% chance of earning $10 is higher ($8). But if one reduces both of the probabilities by one half (i.e., a choice between a 50% chance of earning $5 and a 40% chance of earning $10), the opposite preference is typically found. According to expected utility theory, the results of the second choice should be the same as the first choice but they are not [40]. Subjects often report that they prefer the certain $5 because they would be especially disappointed if they chose the 80% chance of $10 and lost, whereas they prefer the 40% chance of obtaining $10 because they might almost as easily have lost had they chosen the 50% chance of obtaining $5.
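The expected returns in this example can be worked through directly. The following sketch uses only the probabilities and dollar amounts given above; the helper `expected_value` is a hypothetical name introduced for illustration.

```python
import math


def expected_value(probability: float, amount: float) -> float:
    """Average return of a gamble that pays `amount` with `probability`."""
    return probability * amount


# First choice: certain $5 versus an 80% chance of $10.
certain_5 = expected_value(1.0, 5)
risky_10 = expected_value(0.8, 10)

# Second choice: both probabilities halved.
halved_5 = expected_value(0.5, 5)
halved_10 = expected_value(0.4, 10)

print(f"First choice:  ${certain_5:.2f} vs ${risky_10:.2f}")
print(f"Second choice: ${halved_5:.2f} vs ${halved_10:.2f}")

# Halving both probabilities leaves the ratio between the options unchanged,
# so a chooser guided only by average return should rank them the same way.
print(math.isclose(risky_10 / certain_5, halved_10 / halved_5))
```

In both choices the $10 option has the higher average return by the same proportion, which is why a reversal of preference between the two choices is treated as paradoxical.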

If avoiding the possibility of a loss is why humans choose suboptimally, it could also explain why pigeons choose the alternative that provides the conditioned reinforcer that predicts 100% reinforcement over the alternative that provides a conditioned reinforcer that predicts 50% reinforcement. To test this possibility, we conducted an experiment much like that of Stagner and Zentall in which the reinforcement associated with all of the conditioned reinforcers was reduced by 20% [8]. That is, the stimulus that predicted reinforcement occurred on only 20% of the trials; however, on those trials, reinforcement occurred only 80% of the time. Thus, reinforcement was no longer certain. Once again, however, the pigeons showed a strong preference for the suboptimal alternative [41]. Thus, uncertainty associated with the conditioned reinforcer that followed choice of the suboptimal alternative did not deter the pigeons from choosing suboptimally. It may be that if the percentage of reinforcement associated with the low probability, high payoff stimulus had been reduced still further, the pigeons would have reversed their preference and chosen optimally. Certainty, however, does not appear to be the mechanism responsible for suboptimal choice in the experiment in which magnitude of reinforcement was manipulated, because the conditioned reinforcers associated with both alternatives predicted reinforcement 100% of the time [28]. One difference between what we did with pigeons and the procedures used with humans is that the pigeons experienced the probabilities whereas the humans are told what they are. Harman and Gonzalez found that when humans choose based on experience rather than being told the probabilities of the outcomes they are more likely to choose optimally [42].

The Immediacy of the Terminal Link Stimuli

According to the stimulus value hypothesis, indifference between the optimal and suboptimal alternatives results from the similar value of the terminal link stimuli that predict reinforcement (both predict reinforcement 100% of the time). However, there may be some differences between the two conditioned reinforcers. McDevitt et al. gave pigeons a choice between 50% signaled reinforcement and 100% reinforcement [21]. When they inserted a dark 5-s gap prior to the onset of the S+ stimulus that followed choice of the suboptimal alternative it resulted in a large reduction in the preference for that alternative. That is reasonable because delaying the onset of the conditioned reinforcer diminishes its effectiveness. However, when a similar gap was inserted prior to the onset of the S+ stimulus that followed choice of the optimal alternative, it had little effect on the preference for the suboptimal alternative. McDevitt et al. reasoned that the resolution of uncertainty enhances the value of the stimulus that resolves it. Although the S+ stimulus that follows the suboptimal alternative resolves uncertainty, the S+ stimulus that follows the optimal alternative does not (the expected probability of reinforcement does not change) [21]. However, according to this hypothesis, Smith and Zentall should have found a preference for the suboptimal alternative but instead they found indifference [17].

The Relation between Suboptimal Choice and Impulsivity

It is well accepted that the rate at which rewards are discounted with increasing delay is a measure of the impulsivity of the organism [43]. If delay discounting is the mechanism responsible for the suboptimal choice, one would expect to see a correlation between the slope of the discounting function in a delay discounting task and the development of a preference for the suboptimal alternative in the suboptimal choice task. Laude et al. fit pigeons’ delay discounting data to the hyperbolic function [V = A/(1 + kD)] in which V is the value of the reinforcer, A is a measure of the magnitude of reinforcement, D is the delay between the choice response and reinforcement, and k is a free parameter that determines the rate at which V decreases with increases in D, or it can be described as the slope of the discounting function [44]. They then trained pigeons on the suboptimal choice task using the Zentall and Stagner procedure involving a choice between a 20% chance of obtaining 10 pellets and a 100% chance of obtaining 3 pellets [28]. A significant positive correlation (r=0.84) was found when suboptimal choices for each pigeon were compared with the mean k value from the discounting task for each pigeon. That is, choice of the suboptimal alternative and the slope of the delay discounting function were highly related. Thus, although all reinforcers on a trial were equally delayed, the S+ stimuli that signaled their appearance bridged that delay to the extent that they were valid predictors of reinforcement or its magnitude, and their ability to bridge the delay determined the preference for the suboptimal alternative.
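To make the role of k concrete, the hyperbolic function V = A/(1 + kD) can be evaluated for a shallow and a steep discounter. The k values, the magnitude, and the delays below are hypothetical numbers chosen for illustration; they are not parameters reported by Laude et al.

```python
def hyperbolic_value(magnitude: float, delay: float, k: float) -> float:
    """Discounted value V = A / (1 + k * D) of a delayed reinforcer."""
    return magnitude / (1.0 + k * delay)


# Hypothetical discounting rates: a larger k means steeper discounting,
# which the text treats as a measure of greater impulsivity.
shallow_k, steep_k = 0.1, 1.0
magnitude = 10.0  # e.g., pellets

for delay in (0, 5, 10, 20):  # seconds, illustrative
    shallow = hyperbolic_value(magnitude, delay, shallow_k)
    steep = hyperbolic_value(magnitude, delay, steep_k)
    print(f"D = {delay:>2} s: V(k={shallow_k}) = {shallow:.2f}, "
          f"V(k={steep_k}) = {steep:.2f}")
```

With no delay both discounters value the reinforcer at its full magnitude; as the delay grows, the value falls much faster for the larger k, which is the sense in which k describes the slope of the discounting function.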

Human Performance on the Suboptimal Choice Task

Although there are differences between the procedures involved in human gambling decisions and the procedures used with pigeons, we hypothesize that the underlying processes may be quite similar. Molet et al. tested this proposition with a modified version of the suboptimal choice task used by Zentall and Stagner in which pigeons preferred the suboptimal choice of a 20% chance of obtaining 10 pellets over a 100% chance of obtaining 3 pellets [28,45]. The human experiment involved a video game in which subjects chose between two planetary systems, each involving two planets that were distinguished by their color. Each planet was being invaded by aliens and the subjects were to move the mouse over the invaders and click to fire at their space ships. The purpose of the video game was to keep the subjects attentive during the 10 s between choice and the end of each trial. If they chose the suboptimal alternative, 20% of the time they were sent to a planet where they could obtain 9 - 11 points or 80% of the time to a planet where they could not obtain any points. If they chose the optimal alternative, they were always sent to one of two planets where they could obtain 2 - 4 points. Thus, choice of the suboptimal alternative provided an average of 2 points per trial whereas choice of the optimal alternative provided an average of 3 points per trial. Subjects were instructed to try to obtain as many points as they could. It was found that humans who reported that they regularly engaged in commercial gambling chose the suboptimal alternative significantly more than non-gamblers. These results suggest that mechanisms found to be involved in suboptimal choice by pigeons may also be applicable to human gambling.

Task Differences

When humans gamble, the choice can be thought of as a go/no-go decision because humans can choose to gamble with money that they have or they can refrain from gambling. Pigeons, however, choose between an optimal and a suboptimal outcome, both of which involve obtaining resources that they do not already have. This procedural difference should make it more likely that humans would not gamble: not only do they have a choice between a probabilistic and a sure outcome, but the sure outcome is immediate (money in their pocket), whereas the probabilistic outcome is delayed by the time it takes to gamble and learn about the outcome. This may help explain why only a small percentage of humans are problem gamblers. In fact, if the suboptimal outcome is delayed for pigeons, relative to the optimal outcome, we have found that the pigeons begin to choose optimally [41].

Because humans, unlike pigeons, choose to gamble with money they have, their losses are money they must give up, rather than the absence of reinforcement. This distinction may be important because Kahneman and Tversky have found that although gains that are certain are preferred over proportionally larger gains that are probabilistic (the certainty effect), losses that are certain are avoided in favor of proportionally larger losses that are probabilistic (the reflection effect) [46]. That is, there is a stronger bias to win back losses than to obtain gains, an effect that typically encourages gamblers to keep gambling when they lose [45]. Although it would be difficult to create a task in which pigeons, like humans, can choose to gamble with a reinforcer that they already have, as already noted, self-reported gamblers were found to be more likely to choose suboptimally than self-reported non-gamblers [45]. Thus, the go/no-go choice provided by commercial gambling and the two-alternative choice provided by our analog task appear to be comparable and the difference does not appear to be responsible for the suboptimal choice by pigeons. Furthermore, although one might view human gambling losses as the loss of an investment, pigeons’ suboptimal choice represents a real opportunity cost. The major difference is that humans can gamble only until they have no more money (although, of course, many gamblers then borrow money), whereas the pigeons can gamble indefinitely.

The Role of Conditioned Reinforcers in Human Gambling

The suboptimal choice task that we have used with pigeons involves the appearance of conditioned reinforcers following choice but prior to the appearance of the outcome. The results of a simple thought experiment suggest that conditioned reinforcers are also present when humans engage in commercial gambling. The three reels on a slot machine, for example, can be thought of as conditioned reinforcers. The question is whether people would engage in gambling if the reels on the slot machine were obscured. That is, if the only outcome of inserting money in the machine were either nothing or money falling into the coin tray, gambling might be much less likely. A similar argument can be made for other games of chance (e.g., roulette and blackjack). Thus, although there may be some procedural differences between the pigeon suboptimal choice task and human commercial gambling, the important elements of the two are actually quite similar.

The Near-Hit Effect: When Humans and Pigeons Differ

One way in which pigeons appear to differ from humans in their preference for the suboptimal alternative is in the effect of outcomes that indicate a loss but appear to come close to winning, a near hit (sometimes paradoxically referred to as a near miss). A near hit outcome can best be described using a three-reel slot machine. A winning outcome consists of lining up three of the same symbols, one on each reel (e.g., three cherries). Any mixture of different symbols represents a loss but not all losses are considered equal by human subjects. For example, two reels with cherries followed by a reel with a bell represent a loss that many gamblers judge to be closer to winning than when the reel with the bell comes between the two reels with cherries [47]. When MacLin et al. gave subjects a choice among three machines that gave near hit trials 15%, 30%, or 45% of the time, the subjects preferred the machine that gave near hit trials most often [48]. Griffiths proposed that near hits encourage further game play because even though subjects are still losing, they feel that they must be doing something right [49]. Langer proposed that the near hit outcomes give gamblers the illusion of control [50]. That is, getting close to winning suggests that there may be skill involved in this game of chance. In games involving skill, such as shooting basketballs, near hits can provide feedback on how to modify behavior to be more successful in the future but in games of chance, such feedback has little effect on the likelihood of future winning.

Although it has been proposed that rats, like humans, show a preference for near hit trials, the effect is actually quite different because with three successive lights signaling a win (111), the rats responded just as much to any two lights, irrespective of their order (110, 101, and 011). For humans, on the other hand, 110 would be considered a near hit, whereas 101 and 011 would be considered clear losses [15].

Recently, Stagner et al. asked if pigeons preferred near hit trials over clear loss trials when the probability of reinforcement was equated (Figure 6). Not only did they find that pigeons preferred a clear loss over a mixture of clear loss and near hit trials but in a follow-up experiment they also found that the later in the trial that the near hit occurred, the more they avoided the alternative with the near hit trials. Thus, as already noted, the preference for near hit outcomes by humans may result from a generalization from the large number of skill tasks in which humans often engage.

Figure 6: Design of Stagner et al. Pigeons chose between two alternatives. Choice of one alternative (e.g., the plus) resulted in a red stimulus 50% of the time always followed by reinforcement, or a red stimulus 25% of the time that after 5 s changed to green that was never followed by reinforcement (the near hit outcome), or a green stimulus 25% of the time that was never followed by reinforcement (the clear loss outcome). Choice of the other alternative (e.g., the circle) resulted in either a blue stimulus 50% of the time always followed by reinforcement or a yellow stimulus 50% of the time never followed by reinforcement (the clear loss outcome). Thus, both alternatives were associated with an equal probability of reinforcement.
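The equal overall probability of reinforcement in this design can be verified with a brief calculation. The Python sketch below encodes the outcome probabilities given in the Figure 6 caption; the variable and function names are hypothetical and serve only to illustrate the arithmetic.

```python
# Each entry: (probability of the outcome, probability of reinforcement given it)
near_hit_alternative = [
    (0.50, 1.0),  # red throughout: always reinforced
    (0.25, 0.0),  # red changing to green after 5 s: near hit, never reinforced
    (0.25, 0.0),  # green: clear loss, never reinforced
]
clear_loss_alternative = [
    (0.50, 1.0),  # blue: always reinforced
    (0.50, 0.0),  # yellow: clear loss, never reinforced
]


def reinforcement_probability(outcomes):
    """Overall probability of reinforcement for one choice alternative."""
    return sum(p_outcome * p_reinforced for p_outcome, p_reinforced in outcomes)


p_near_hit = reinforcement_probability(near_hit_alternative)
p_clear_loss = reinforcement_probability(clear_loss_alternative)
print(p_near_hit, p_clear_loss)
```

Because both alternatives pay off on half of the trials, any preference between them must reflect the stimuli that follow choice rather than the overall rate of reinforcement.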

The Demographics of Suboptimal Choice and Human Gambling

The relation between suboptimal choice and level of food restriction

Although humans often describe gambling as a form of entertainment, the fact that people of lower socio-economic status tend to gamble proportionally more than those of higher socio-economic status (Lyk-Jensen) suggests that entertainment is not the primary motivation for gambling [51,52]. If the suboptimal choice by pigeons is a good analog of human gambling, then one might expect that the level of pigeons’ food restriction would be related to their degree of suboptimal choice. Consistent with this hypothesis, Laude et al. found that pigeons that were normally food restricted showed the typical suboptimal choice, whereas those that were minimally food restricted tended to choose optimally and thus, paradoxically, they obtained more food [53,54]. Such a finding might be considered consistent with risk sensitive foraging, in which birds on a negative energy budget may be inclined to be more risk prone if a fixed option is not sufficient for the animal to survive, but as Kacelnik and Bateson have noted, relatively large birds like pigeons trained under the present conditions are not likely to be on a negative energy budget [55]. In any case, the view of human gambling as a form of investment is analogous to the pigeons’ choice as opportunity cost [16,56].

The relation between housing and suboptimal choice

Research with rats suggests that several extra-experimental environmental factors such as social and nonsocial enrichment can affect a rat’s tendency to self-administer drugs [57]. Rats that are housed in an enriched environment (a large cage with other rats and novel objects) are significantly less likely to self-administer drugs than rats that are individually (normally) housed. The mechanism responsible for the reduced self-administration of drugs by environmental enrichment has been hypothesized to be a reduction in impulsive behavior [58]. A similar mechanism has been suggested to be involved in the reduced effectiveness of conditioned reinforcers [59]. Impulsivity has also been implicated in human gambling behavior and there is evidence that similar physiological mechanisms underlie compulsive gambling and drug addiction [60,61].

Pattison et al. attempted to determine the effect of housing conditions on suboptimal choice by giving one group of pigeons experience in an enriched environment (a large cage with four other pigeons for 4 h a day), while the control pigeons remained in their normal one-to-a-cage housing [62]. When they exposed the pigeons from both groups to the gambling-like suboptimal choice task, they found that the enriched pigeons were much slower to learn to choose the suboptimal alternative. Thus, enriched housing appears to retard the development of suboptimal choice, even when exposure is limited to a relatively short 4 h a day. This finding has implications for the treatment of humans who are problem gamblers. It suggests that one might be able to reduce the attraction of gambling by exposing human gamblers to an environment that is socially and physically enriched.

Conclusion

The suboptimal choice task provides a reasonable analog to human commercial gambling. The mechanism responsible for this suboptimal behavior appears to be the relative lack of effectiveness of non-reinforcement in reducing the likelihood that pigeons will choose the suboptimal alternative, even when the non-reinforcement occurs on almost every trial [44,50,61]. Furthermore, the probability of reinforcement associated with the choice appears to be relatively unimportant. Instead, the predictive values of the stimuli that follow that choice appear to be the primary determinant of the initial preference. Similarly, for most humans who gamble, it is the potential reward rather than the odds of winning that influences the tendency to gamble. It may also be that positive contrast between the expected probability of reinforcement and that obtained with the appearance of the S+ stimulus following choice of the suboptimal alternative plays a role as well [62].

References

Kahneman D and Tversky A. Subjective probability: A judgment of representativeness. Cogn Psychol. 1972;3:430-454.
Dandurand L and Ralenkotter R. An investigation of entertainment proneness and its relationship to gambling behavior: The Las Vegas experience. J Travel Res. 1985;23:12-16.
Pyke GH, et al. Optimal foraging: A selective review of theory and tests. Q Rev Biol. 1977;52:137–154.
Stephens DW and Krebs JR. Foraging theory. Princeton University Press, Princeton, NJ, 1986.
Thorndike EL. Animal intelligence. Macmillan, New York. 1911.
Roper KL and Zentall TR. Observing behavior in pigeons: The effect of reinforcement probability and response cost using a symmetrical choice procedure. Learn Motiv. 1999;30:201-220.
Brillouin L. Science and Information Theory. Dover Publications, New York. 1956.
Stagner JP and Zentall TR. Suboptimal choice behavior by pigeons. Psychon Bull Rev. 2010;17:412-416.
Gipson CD, et al. Preference for 50% reinforcement over 75% reinforcement by pigeons. Learn Behav. 2009;37:289-298.
Stagner JP, et al. Sub-optimal choice in pigeons does not depend on avoidance of the stimulus associated with the absence of reinforcement. Learn Motiv. 2011;42:282-287.
Blanchard TC, et al. Orbitofrontal cortex uses distinct codes for different choice attributes in decisions motivated by curiosity. Neuron. 2015;85:602-614.
Bromberg-Martin ES and Hikosaka O. Midbrain dopamine neurons signal preference for advance information about upcoming rewards. Neuron. 2009;63:119-126.
Stagner JP, et al. Pigeons prefer discriminative stimuli independently of the overall probability of reinforcement and of the number of presentations of the conditioned reinforcer. J Exp Psychol Anim Behav. 2012;38:446–452.
Pisklak JM, et al. When good pigeons make bad decisions: Choice with probabilistic delays and outcomes. J Exp Anal Behav. 2015;104:241–251.
Winstanley CA, et al. Dopamine modulates reward expectancy during performance of a slot machine task in rats: evidence for a “near-miss” effect. Neuropsychopharmacology. 2011;36:913–925.
Vasconcelos M, et al. Irrational choice and the value of information. Scientific Reports. 2015;5:13874.
Smith AP and Zentall TR. Suboptimal choice in pigeons: Choice is primarily based on the value of the conditioned reinforcer rather than overall reinforcement rate. J Exp Psychol Anim B. 2016;42:212-220.
Belke TW and Spetch ML. Choice between reliable and unreliable reinforcement alternatives revisited: Preference for unreliable reinforcement. J Exp Anal Behav. 1994;62:353-366.
Belke ML, et al. Suboptimal choice in a percentage-reinforcement procedure: Effects of signal condition and terminal link length. J Exp Anal Behav. 1990;53:219-234.
Spetch ML, et al. Determinants of pigeons’ choice between certain and probabilistic outcomes. Anim Learn Behav. 1994;22:239–251.
McDevitt MA, et al. Contiguity and conditioned reinforcement in probabilistic choice. J Exp Anal Behav. 1997;68:317–327.
Spetch ML, et al. Suboptimal choice in a percentage-reinforcement procedure: Effects of signal condition and terminal link length. J Exp Anal Behav. 1990;53:219-234.
Laude JR, et al. Suboptimal choice by pigeons may result from the diminishing effect of nonreinforcement. J Exp Psychol Anim Learn Cogn. 2014;40:12-21.
Mazur JE. Choice with certain and uncertain reinforcers in an adjusting delay procedure. J Exp Anal Behav. 1996;66:63-73.
McDevitt MA, et al. When good news leads to bad choices. J Exp Anal Behav. 2016;105:23-40.
Zentall TR, et al. Suboptimal choice by pigeons: Evidence that the value of the conditioned reinforcer rather than its frequency determines choice. Psychol Rec. 2015;65:223-229.
Laude JR, et al. Impulsivity affects suboptimal gambling-like choice by pigeons. J Exp Psychol Anim Learn Cogn. 2014;40:2-11.
Zentall TR and Stagner J. Maladaptive choice behavior by pigeons: An animal analogue and possible mechanism for gambling (sub-optimal human decision-making behavior). Proc Biol Sci. 2011;278:1203–1208.
Hearst E, et al. Inhibition and the stimulus control of operant behavior. J Exp Anal Behav. 1970;14:373–409.
Breen RB and Zuckerman M. 'Chasing' in gambling behavior: Personality and cognitive determinants. Pers Individ Dif. 1999;27:1097-1111.
Brunborg GS, et al. The relationship between aversive conditioning and risk-avoidance in gambling. J Gambl Stud. 2003;26:545-559.
Crockford DN, et al. Cue-induced brain activity in pathological gamblers. Biol Psychiatry. 2005;58:787–795.
Field M and Cox WM. Attentional bias in addictive behaviors: A review of its development, causes, and consequences. Drug Alcohol Depend. 2008;97:1–20.
Franken IHA, et al. Neuropsychological evidence for abnormal cognitive processing of drug cues in heroin dependence. Psychopharmacology. 2008;170:205–212.
Holst RJ, et al. Why gamblers fail to win: A review of cognitive and neuroimaging findings in pathological gambling. Neurosci Biobehav Rev. 2010;34:87–107.
Tversky A and Kahneman D. Judgment under uncertainty: Heuristics and biases. Science. 1974;185:1124–1131.
American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (5th ed.). American Psychiatric Association, Arlington, VA. 2013.
Allais M. The Rational Man's Behavior in the Face of Risk: Criticism of the Postulates and Axioms of the American School. Econometrica. 1953;21:503-546.
Shafir S, et al. Perceptual accuracy and conflicting effects of certainty on risk-taking behaviour. Nature. 2008;453:917-921.
von Neumann J and Morgenstern O. The theory of games and economic behavior. Princeton University Press. 1944.
Zentall TR and Stagner J. Sub-optimal choice by pigeons: Failure to support the Allais paradox. Learn Motiv. 2011;42:245–254.
Jason LH, et al. Allais from experience: choice consistency, rare events, and common consequences in repeated decisions. J Behav Decis Mak. 2015;28:369–381.
Odum AL. Delay discounting: I’m a k you’re a k. J Exp Anal Behav. 2011;96:427–439.
Mazur JE. An adjusting procedure for studying delayed reinforcement. In: Commons ML, Mazur JE, Nevin JA, Rachlin H (Eds.). Quantitative analyses of behavior: The effect of delay and of intervening events on reinforcement value. Erlbaum. 1987;5:55–73.
Molet M, et al. Decision-making by humans as assessed by a choice task: Do humans, like pigeons, show suboptimal choice? Learn Behav. 2012;40:439-447.
Kahneman D and Tversky A. Prospect theory: An analysis of decision under risk. Econometrica. 1979;47:263-291.
MacLin OH, et al. Using a computer simulation of three slot gambling machines to investigate a gambler’s preference among varying densities of near-miss alternatives. Behav Res Methods. 2007;39:237-241.
Reid RL. The psychology of the near miss. J Gambl Behavior. 1986;2:32–39.
Griffiths M. Gambling technologies: Prospects for problem gambling. J Gambl Stud. 1999;15:265-283.
Langer EJ. The illusion of control. J Pers Soc Psychol. 1975;32:311-328.
Lyk-Jensen SV. New evidence from the grey area: Danish results for at-risk gambling. J Gambl Stud. 2010;26:455-467.
Worthington AC. Implicit finance in gambling expenditures: Australian evidence on socioeconomic and demographic tax. Public Finance Rev. 2001;29:326-342.
Zentall TR and Wasserman. Oxford handbook of comparative cognition. Oxford University Press, New York. J InorgBiochem. 2012;153:1–12.
Laude JR, et al. Hungry pigeons make suboptimal choices, less hungry pigeons do not. Psychon Bull Rev. 2012;19:884–891.
Kacelnik A and Bateson M. Risky theories: The effects of variance on foraging decisions. American Zoologist. 1996;36:402-434.
McNamara JM and Houston AI. Risk sensitive foraging: a review of the theory. Bull Math Biol. 1992;54:355-378.
Stairs DJ and Bardo MT. Neurobehavioral effects of environmental enrichment and drug abuse vulnerability. Pharmacol Biochem Behav. 2009;92:377-382.
Perry JL and Carroll ME. The role of impulsive behavior in drug abuse. Psychopharmacology. 2008;200:1-26.
Jones GH, et al. Increased sensitivity to amphetamine and reward-related stimuli following social isolation in rats: possible disruption of dopamine-dependent mechanisms of the nucleus accumbens. Psychopharmacology. 1990;3:364-372.
Steel Z and Blaszczynski A. Impulsivity, personality disorders and pathological gambling severity. Addiction. 1998;93:895-905.
Potenza MN. The neurobiology of pathological gambling and drug addiction: an overview and new findings. Phil Trans R Soc B. 2008;363:3181-3189.
Pattison KF, et al. Social enrichment affects suboptimal, risky, gambling-like choice by pigeons. Anim Cogn. 2013;16:429-434.