Overview: In the four major sections of this chapter, we examine the links between the topics of generalization and discrimination, and attention and conceptual knowledge. The first section starts with the traditional topics of generalization and discrimination. The Hull and Spence Algebraic Summation Theory approach to generalization is examined, and contrasted with Sutherland and Mackintosh's Attentional Approach, based on Krechevsky's notion of Hypothesis Learning. We will see much evidence favoring a claim we have already met in earlier chapters: Attention plays a central role in learning. The second section follows up on this theme by examining attentional processes in humans. Filter and Capacity Models of Attention will be presented, and the section will close with an account of Attention and Automaticity. In the third section, we will look at categorization in humans, starting with a description of how Hypothesis Theory (H Theory) applies to single-feature categories, and moving on to more complex categories. We will examine the evidence for and against the claim that humans have well-defined rather than fuzzy (or probabilistic) categories. Finally, the last section will present experiments on categorization in animals.
Another possibility is that generalization is an innate feature of an animal's perceptual apparatus. We saw one such mechanism by which this may occur when we discussed Pavlov's theory of generalization: He believed stimulus centers in the brain were organized topographically on a principle of similarity, so that objects that resemble one another would excite regions in the cortex that were near one another. On this account, spill-over of excitation or activation from one center to neighboring centers would yield a generalization gradient. Like Pavlov, Hull believed that generalization was innate, although being a good behaviorist, he did not discuss internal causes (such as brain topography) underlying this principle.
A third possibility is that an animal responds to a complex bundle of stimuli, rather than a single object, so that the response becomes a probabilistic function of the number of stimuli present that were part of the original bundle. We saw something like this approach when we discussed Guthrie's theory. If we regard this bundle as a list of features or attributes, then we can claim that the more of this list we find in a given situation, the more probable the response becomes. If we train a pigeon to peck at the letter R for example, we might expect some pecking to the letter P because it shares some of the same attributes (a vertical line; a half-oval closed to the left), but less or no pecking to V (which shares only a line slanting down to the right, although the slanted line for R is a half line rather than a full line).
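The feature-list idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature sets for each letter are made up for the example (R's half-length slanted stroke is treated as a different feature from V's full-length strokes), and the function name `feature_overlap` is hypothetical rather than part of any theory discussed here.

```python
def feature_overlap(trained, test):
    """Proportion of the trained stimulus's features that also appear in the test stimulus."""
    trained, test = set(trained), set(test)
    return len(trained & test) / len(trained)

# Hypothetical feature lists (assumed for illustration only).
features = {
    "R": {"vertical line", "half-oval", "half slanted line"},
    "P": {"vertical line", "half-oval"},
    "V": {"full slanted line (down-right)", "full slanted line (down-left)"},
}

for letter in ("R", "P", "V"):
    share = feature_overlap(features["R"], features[letter])
    print(letter, round(share, 2))
```

In this sketch P shares two of R's three features and V shares none. Counting the half and full slanted lines as a partial match would give V a small nonzero value, which is in keeping with "less or no pecking."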
Finally, perhaps generalization gradients inform us about animals' categories and hypotheses about their world. Human categories tend to exhibit something very like a generalization gradient in that we judge some things to be better examples of the category than others. Perhaps we are teaching animals categories, so that their generalization gradients let us know which objects they regard as good examples of the category (i.e., having high generalization), and which they regard as poor examples (i.e., having low or intermediate generalization).
These are the issues that will concern us in this chapter. We start with two very different approaches to explaining generalization. One, Algebraic Summation Theory, arises from the work of Hull and Spence, and was mentioned earlier when we discussed effective reaction potential. This approach claims that generalization is innate and need reflect absolutely nothing about an animal's categories or concepts. Rather, it claims that there is a basic principle of similarity that governs habit strength much as there is a basic principle of gravity that governs planetary orbits. Once a habit is formed, this principle states that physically similar stimuli will acquire the ability to trigger the habit in proportion to the degree of their similarity. The other, Attentional Hypothesis Testing, arises originally from the work of Krechevsky and of Lashley. It claims that generalization is a measure of the success of discrimination learning, which in turn is based on a hypothesis-testing approach: The animal selects certain stimulus values to attend, and generalization gradients in part reflect its knowledge of and familiarity with stimulus differences.
Before going on to these theories, however, let us start with some basics of generalization and discrimination.
A variant approach concerns the nature of the stimuli present. In most of the cases we have discussed so far, we have implicitly assumed novel stimuli that are presumably as complex as the original stimulus. Thus, in the Guttman and Kalish experiment discussed in Chapter 4, pigeons were trained with one color, and then tested with other colors. But we need not restrict ourselves to such an approach. Often, we teach an animal to respond to a complex stimulus, and then systematically change components of that stimulus (or present just the components) to assess what it was the animal responded to. So, we might present a face as a stimulus, and then change the eyes or the nose or the hairline or the mouth to see whether the animal was responding to the whole complex, or to just one of its elements. You saw an example of this earlier in Chapter 6, when we discussed the experiment by Welker and McAuley: Rats extinguished more rapidly when their experimental environment or mode of transportation changed than when the only difference involved removing the food they got for bar pressing.
Some rather interesting questions may be asked using this technique, as we can see from a further example by Povinelli and Eddy. They trained chimpanzees to obtain a reward by holding a hand out in front of an experimenter. They subsequently presented the animal with two experimenters, only one of whom was looking at the chimp. The question they asked was whether chimpanzees would realize that it made sense to 'beg' only in front of the person who could see them. (That is, do chimpanzees understand that other entities can see what they are doing only if the eyes are oriented towards them?) They found that chimps begged randomly, and appeared not to know that the person looking away could not see their hand gesture. In subsequent generalization tests, chimps chose at random between someone covering the head with a bucket and someone holding a bucket at the side; someone with a blindfold over the eyes and someone with a blindfold over the mouth, etc. In fact, the only condition in which the chimps were able to discriminate correctly without further training involved a person facing the chimp versus one facing away. If chimpanzees are aware that we, like they, pick up information through our eyes, they failed to use that information in choosing a likely person to beg a reward from.
As the last two examples illustrate, generalization tests may be used to answer a variety of questions concerning what an animal pays attention to, and what its beliefs may be. That is a very far cry indeed from the earlier tests we looked at that used generalization to assess stimulus similarity. Nevertheless, many experiments still explore the issue of stimulus similarity and its likely influence on responding. For those experiments, several issues become particularly important. One is how to measure stimulus similarity, and a second is how to measure response similarity.
Concerning stimulus similarity, do we look at physical similarity or psychological similarity? Things that may be physically similar will not necessarily have the same level of psychological or perceptual similarity. Knowing which of these to track is certainly important, if we wish to come up with strong models of generalization. In the realm of music, for example, two notes one octave apart strike people as more similar than notes less than an octave apart, even though physically they represent a bigger difference in frequency.
And finally, concerning response similarity, a host of issues arise, the most compelling of which involves whether to track absolute or relative generalization gradients. In an absolute gradient, we directly measure some physical quality of the response such as the number of responses per unit time, the strength of a response, its latency, etc. In a relative gradient, on the other hand, we measure relative proportions. Thus, if we train a pigeon to peck at, say, a red light, we could compare the numbers of pecks per ten-minute session with red and orange lights if we want to use absolute gradients, or we can ask what percentage of the pecks are given to red (and what to orange) if we wish to look at relative gradients. Which we use turns out to be important, as different results are sometimes obtained.
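A small Python sketch may help make the two measures concrete. The peck counts below are invented for illustration (they are not data from any study cited here), and the function name `relative_gradient` is a hypothetical helper.

```python
def relative_gradient(absolute_counts):
    """Convert absolute response counts per stimulus into percentages of total responding."""
    total = sum(absolute_counts.values())
    if total == 0:
        raise ValueError("no responses recorded")
    return {stimulus: 100.0 * count / total
            for stimulus, count in absolute_counts.items()}

# Hypothetical pecks per ten-minute session for two pigeons (illustrative numbers only).
bird_a = {"red": 150, "orange": 50}
bird_b = {"red": 30, "orange": 10}

print(relative_gradient(bird_a))
print(relative_gradient(bird_b))
```

The two hypothetical birds differ fivefold in absolute responding, yet each gives the same share of its pecks to red. This is one way in which the choice of measure can lead to different conclusions.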
Below, we will generally restrict ourselves to findings concerning absolute gradients that, for the sake of convenience, are measured on the basis of physical similarity of the underlying stimulus dimension or dimensions. As you read about generalization findings elsewhere, keep in mind that these distinctions are important, and be aware of the choices the researchers made in reporting their findings.
In a gradient of excitation, three features are characteristic: (1) the greatest responding (the peak) normally occurs with the trained stimulus; (2) the gradient typically exhibits a relatively smooth drop-off of responding as novel stimuli become more different (so that the gradient is shaped a bit like a tent or a bell-shaped curve); and (3) the gradient is symmetrical (the left half of the curve looks like a mirror image of the right half). These features may be seen in the generalization gradients Guttman and Kalish found (see Chapter 4, Figure 1). Gradients of inhibition display similar characteristics except that instead of a peak, of course, we look for a valley: the point at which the least responding occurs. A gradient of inhibition would normally look like a gradient of excitation turned upside down.
In gradients of excitation, the width and the height of the curve tell us something about the extent of generalization. A bell-shaped curve that extends over more stimuli (i.e., that is wider) indicates greater generalization: The animal is obviously responding to more stimuli. At the same time, a steep curve indicates greater discernment: With steep curves, animals are responding strongly to some stimuli, but weakly to others (as opposed to responding moderately strongly to all stimuli). All things being equal, less generalization is typically associated with steeper, narrower curves. Figure 1 illustrates this by showing a wide, shallow curve and a narrow, steep curve. In each case, we have assumed a pigeon that is emitting 250 responses. But as you can see, most of these responses in the case of the narrow curve are being given to just a few stimuli. For purposes of comparison, Figure 2 graphs exactly the same findings using a relative rather than an absolute generalization gradient. Relative gradients tend to smooth out somewhat the differences found in absolute gradients. Thus, the low generalization curve in Figure 2 no longer looks quite as steep or narrow as it did in Figure 1.
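The contrast between a wide, shallow gradient and a narrow, steep one can be sketched in Python. Everything specific here is an assumption made for illustration: the bell (Gaussian) shape, the nine stimulus values, the two width settings, and the helper names `gradient` and `central_share`. Only the total of 250 responses comes from the text, and the numbers do not reproduce the figures.

```python
import math

def gradient(stimuli, trained, width, total_responses=250):
    """Spread total_responses over the stimuli using a bell-shaped (Gaussian) weighting."""
    weights = [math.exp(-((s - trained) ** 2) / (2 * width ** 2)) for s in stimuli]
    scale = total_responses / sum(weights)
    return [w * scale for w in weights]

def central_share(responses):
    """Fraction of responding given to the trained stimulus and its two nearest neighbors."""
    mid = len(responses) // 2
    return sum(responses[mid - 1: mid + 2]) / sum(responses)

stimuli = list(range(-4, 5))  # nine hypothetical stimulus values; 0 is the trained stimulus
wide = gradient(stimuli, trained=0, width=3.0)    # high generalization
narrow = gradient(stimuli, trained=0, width=1.0)  # low generalization

print("wide:  ", [round(r) for r in wide], round(central_share(wide), 2))
print("narrow:", [round(r) for r in narrow], round(central_share(narrow), 2))
```

Both hypothetical pigeons emit the same 250 responses, but the narrow gradient has the higher peak and concentrates most of its responding on the few stimuli nearest the trained one.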
Although we have discussed several features that are normally found with generalization gradients, you ought to be aware that there are occasionally clear exceptions to these characteristics. In particular, as you will see below, discrimination training in which the animal is reinforced in the presence of one stimulus but not another will sometimes change a curve's symmetry, and result in a peak at a different spot (a phenomenon termed peak shift). In normal generalization tests in which novel stimuli are presented centered around the original reinforced stimulus, the curve will tend to be 'squashed up' on the side containing the S-, and the new peak will tend to occur on the other side. More about this below, as it has been the subject of much theorizing.
Jenkins and Harrison conducted a famous study relevant to this issue. One group of pigeons (the discrimination learning group) was explicitly trained to discriminate between two sounds: They were reinforced for pecking in the presence of a 1000 Hz tone, but not a 950 Hz tone. A second group simply had approach or appetitive learning, in which only the 1000 Hz tone was presented. In this latter group, the pigeons were placed in a Skinner box, but their pecking was prevented at certain periods of time because the lights would be off. When the lights came on, so would a 1000 Hz tone, and in these periods, pecking at a key resulted in reinforcement.
Both groups, of course, were subsequently exposed to a variety of sounds ranging from 300 Hz to 3500 Hz. What happened was that the discrimination learning group (the one exposed to two sounds) showed the tent-shaped gradient, but the approach learning group did not: They pecked equally regardless of the sound in the background. Jenkins and Harrison's results thus fit in with Lashley and Wade's learning theory, rather than Hull's innateness claim.
Unfortunately, if you compare Jenkins and Harrison's experiment with Guttman and Kalish's study, you will notice a contradiction. Guttman and Kalish's pigeons also constituted an approach learning group, since they were reinforced in the presence of a given color, and had no discrimination training in which another color failed to yield reinforcement. Nevertheless, Guttman and Kalish's pigeons showed the classic tent-shaped gradient, instead of the flat-line gradient of Jenkins and Harrison's approach learning group. So why the contradiction?
A number of theorists investigated one possible explanation for this contradiction. On this explanation, pigeons and other animals have had plenty of experience learning to tell colors apart (particularly insofar as we know that belongingness effects in the taste aversions paradigm seem to involve visual appearance for birds) long before they get to an experiment such as that run by Guttman and Kalish. So, perhaps we ought to see what happens when we prevent animals from having experience with colors.
And that is what Riley and Leuin did: They raised birds in monochromatic environments. In one experiment, their subjects were in an environment where filters allowed only light of a certain wavelength (589 nanometers) to get through. Ten days after hatching, the birds were trained to peck at a disk lit up at this wavelength. Then a week later, the birds were given a generalization test involving a disk lit up at 569 nanometers, and another disk at 550 nanometers. Quail, ducks, goslings, pigeons, etc., all pecked more to the 569 nanometer disk than to the 550 nanometer disk, consistent with a tent-shaped gradient. Thus, in this and many similar experiments (some involving fitting birds with filtering contact lenses!), non-flat gradients to color occurred even when there had been no prior exposure to color differences. Such a finding fits in with the claim that graded (i.e., tapering or tent-shaped) generalization is innate.
But that leaves us with the problem of accounting for the flat-line gradients Jenkins and Harrison found in their study. Kerr, Ostapoff, and Rubel repeated Riley and Leuin's method, except that they raised birds in single-sound environments before testing for generalization to different sounds. And they verified the Jenkins and Harrison results: These animals displayed flat gradients in the absence of learning to discriminate.
And to complicate matters further, flat gradients may be found with color under certain circumstances. Peterson also raised ducklings in monochromatic environments (589 nanometers), and compared their responding on a generalization test to ducklings raised in a normal, color-filled world. In this study, unlike the Riley and Leuin study, the generalization gradients for the monochromatic group were relatively flat, especially when compared to very impressive and steep gradients for the control group. However, Peterson used a much wider range of colors on the generalization test, and a careful examination of these data suggests that there may have been some reduced responding to the colors that Riley and Leuin tested, consistent with their claim.
Thus, graded gradients may have multiple causes. They may sometimes arise from innate factors, and may at other times reflect experience. But in any case, whatever their initial causes, there is no doubt that perceptual experience and further training may influence the shape of a gradient. We turn next to a brief discussion of the effects of factors such as these.
Group    Amount of Training (50-minute sessions)
1        2
2        4
3        7
4        14
Following training, each group was put through the same generalization test in which they were exposed to 9 disks that differed in orientation of the line. The gradient for Group 1 was quite broad, and showed only slight curvature around the original upright line. But, as training increased, the gradient became considerably steeper. In Group 4, for example, there was more than a three-fold increase in responding from a horizontal to a vertical line, but in Group 1, the increase was only about 50%.
If we define training in terms of number of reinforced responses, then the same relationship holds when we compare different partial reinforcement schedules. As the interval or ratio decreases, generalization tends to steepen. But decreased schedules, of course, yield more reinforced trials, all things being equal. We saw in Chapter 6 that one set of theories of partial reinforcement effects claimed they were due to generalization: Schedules with high ratios or intervals more nearly resemble extinction than schedules with low ratios or intervals. Here we find some further evidence supporting such a claim.
DELAY OF TESTING. Another factor that will influence the shape of the generalization gradient concerns the delay between training and the generalization test. Thomas and his colleagues have conducted a number of experiments on this relationship, and they generally find that as this interval increases, generalization increases, as well (i.e., the gradient broadens rather than narrows). Although the following result does not always occur (as may be seen by comparing Thomas & Lopez's results against those of Thomas et al.), there is some evidence that a broader gradient occurs because the animal increases its responding to other novel stimuli (rather than decreases its responding to the original stimulus with which it was trained). Such a result would be consistent with forgetting some of the features of the training stimulus that distinguish it from surrounding stimuli. For purposes of comparison, consider how your ability to tell Guthrie and Hull apart might differ right before versus two weeks after a test on their theories. With the passage of time, loss of information results in a memory becoming less distinct, and so, more easily confused with something else that has some overlap of features.
(Similar results occur with training involving an aversive outcome and gradients of inhibition. We concentrate here on reinforcement-based training.)
DIFFERENTIATION. Is differential reinforcement (in which one stimulus is preferentially marked due to an outcome) necessary for generalization to decrease? The answer appears to be "no." In fact, several studies suggest that simple exposure to stimuli prior to discrimination training will steepen gradients. This is in part an outcome of work done by E. Gibson involving perceptual learning. Similar in some respects to observational learning, perceptual learning involves the notion that organisms learn from their perceptual experiences. Reinforcement is not needed in such an approach; the learning occurs because of its adaptational value: An animal that learns about the perceptual constancies in its environment is better able to survive. Thus, the theory assumes attention to perceptual features in the absence of specific rewards.
One aspect of this theory relevant to our current concerns involves stimulus differentiation. An animal that is more familiar with certain stimuli ought to be able to discriminate them more easily (assuming non-trivial stimuli). Because of perceptual learning, the animal ought to pick up more and more of the features that characterize stimuli and make them different. As a common-sense example, think of the many new faces you saw in this class on its first day. After meeting someone for the first time, people sometimes have difficulty recognizing that person on a subsequent occasion; the face seems moderately familiar, but not familiar enough to trigger a confident "hello." But as you repeatedly encounter these same individuals, their faces become both more familiar and more distinct, and so, easier to recognize in other contexts.
The Peterson experiment mentioned above may be taken as one example consistent with the claims of stimulus differentiation. Another comes from Gibson, Walk, and Tighe. They set up a design with rats as follows:
Group           Phase 1                        Phase 2
experimental    triangles & circles in cage    discrimination
control         no triangles & circles         discrimination
In this experiment, the experimental group was exposed to the triangles and circles for 90 days before being put through discrimination training in which it had to respond to one of these, but not the other. Gibson et al. found faster learning of the discrimination in this group, presumably because it had already learned something about the differences between these symbols (i.e., reduced generalization).
Although there is much evidence for stimulus differentiation, we ought to note two caveats. First, differentiation tends to be more successful if non-differential reinforcement is present. That is, hanging these symbols above where rats eat or drink is a good idea, presumably because the food (associated equally with both symbols) helps call attention to the surrounding context. It acts as a motivator for the animal to note the features of its environment. And second, stimulus differentiation in the design presented above seems to require that the context in Phase 2 differ from the context in Phase 1. If you think back to the chapters on classical conditioning, you will see one reason why this may be so. Earlier, we found that pre-exposing a stimulus led to latent inhibition. Thus, while pre-exposure can result in the animal learning about the perceptual features of a given stimulus, it may also result in the animal learning that this stimulus ought to be irrelevant for important biological consequences, since it has predicted no such consequences so far, in this particular environment. But a move to a different environment allows us to assess the extent to which differentiation has occurred when the stimulus's predictive value is not negated.
DISCRIMINATION TRAINING. We have already noted above that discrimination training (involving differential reinforcement whereby some stimulus or stimuli fail to be paired with an outcome) will change the shape of a gradient. The gradient may become non-symmetric, and may exhibit peak shift. It will also steepen. A good example of this is the Jenkins and Harrison experiment mentioned earlier. They actually ran several different discrimination groups. In one, pigeons had to discriminate between a 1000 Hz and a 950 Hz tone. In the other, pigeons had to discriminate between a (1000 Hz) tone present and a tone absent condition. A much narrower gradient was found in the condition involving two tones.
The Jenkins and Harrison finding regarding type of discrimination training fits in nicely with some of the discussion of stimulus differentiation. When the animal discriminates between presence and absence of a tone, that becomes the most salient perceptual feature. Hence, other tones are similar to the reinforced tone in representing a distinctive departure from the non-reinforcing silence. But when the animal has to distinguish among tone frequencies, then a different level of perceptual feature comes into play. Such findings exhibit aspects of relational learning (see below): learning in which the animal compares stimuli with one another to find a distinguishing feature or characteristic that will enable telling them apart. Which stimuli are presented along with a training stimulus may help determine which features are first found or attended.
As peak shift is an important test case for Hull's and Spence's theory, however, we defer further discussion of discrimination training effects to the section on algebraic summation theory.
DEPRIVATION. Another finding that we discussed earlier ought to be recalled. Animals that have been deprived of an outcome tend to exhibit steeper generalization gradients. Deprivation is a good motivator for paying attention to the various stimuli, and trying to 'get things right.'
STIMULUS FACTORS. Finally, as the Guttman and Kalish experiment makes clear (see Figure 1 in Chapter 4), training with different stimuli can result in different gradients. How steep or narrow a gradient is depends in part on an animal's perceptual systems. We are not equally sensitive to all physical differences. Thus, the effects of stimulus factors are in part bound up with issues of psychological or psychophysical similarity, and with features of a given stimulus that may make it inherently more salient than other stimuli.
Normally, the animal will initially respond to both stimuli, and thus discover that one of these consistently fails to yield the desired outcome. Over the course of training, then, we will see responding first rise to both stimuli, and then drop off to the S-. This is normal discrimination training. In a variant of this procedure termed errorless discrimination training, however, we typically start with the S- being so reduced in intensity compared to the S+ that it is virtually non-salient. Over the course of a large number of trials, we gradually increase the intensity of the S- until it equals that of the S+. If this is done slowly enough, then we may discover that our animal has never actually made a false response to the S- (hence the name errorless).
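The fading procedure used in errorless discrimination training can be sketched as a simple schedule. This is a minimal sketch under assumptions that do not come from the text: the linear ramp, the starting intensity of 5% of the S+, the 20 trials, and the function name `fading_schedule` are all hypothetical choices, and actual experiments may fade the S- in other ways.

```python
def fading_schedule(n_trials, start=0.05, end=1.0):
    """S- intensity on each trial, as a fraction of S+ intensity, rising linearly."""
    if n_trials < 2:
        raise ValueError("need at least two trials")
    step = (end - start) / (n_trials - 1)
    return [start + i * step for i in range(n_trials)]

# Hypothetical 20-trial fade from a barely noticeable S- to one as intense as the S+.
schedule = fading_schedule(n_trials=20)
print([round(x, 2) for x in schedule])
```

Raising `n_trials` makes each step smaller, which corresponds to introducing the S- more slowly.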
An alternative approach is to present just one stimulus at a time. In this case, we have a successive discrimination. Successive discrimination techniques open up several different possibilities, however. If the successive discrimination is like the simultaneous discrimination in that only one stimulus is associated with a reinforcer, then we have what is termed a go-no go situation. In this case, you go (respond) when the correct stimulus (the S+) is present, but do not go (withhold a response) when any other stimulus is present. In successive discriminations, however, we also open up the possibility of requiring several different responses, depending on which stimulus is present. A triangle, for example, could be used to signal pressing a left-hand button, whereas a circle could be used to signal pressing a right-hand button. In the simplest version of this situation, a reinforcer can be obtained on each trial (assuming you are using a continuous reinforcement schedule) so long as the animal knows which response to make to which stimulus. This type of set-up is referred to as a choice situation. Unlike the simultaneous condition or the go-no go situation, each stimulus in the choice situation could, in principle, be associated with the same degree of excitation or inhibition. Or put another way, in a choice situation, the stimulus acts as an occasion setter indicating which response is appropriate.
There is a certain looseness about the word "stimulus" in the above presentations. Successive discrimination is also possible when multiple stimuli are present, so long as those multiple stimuli together act as a stimulus complex that signals a single response rather than separate responses or a choice between them. Thus, whether more than one stimulus is present is not really the distinguishing characteristic of successive versus simultaneous discriminations. Rather, successive discriminations involve selecting an appropriate response to make to the given stimulus ensemble, whereas simultaneous discriminations involve choosing which of several stimulus ensembles to respond to.
To illustrate the point about a stimulus ensemble, let us consider yet another type of discrimination, conditional discrimination, involving successive discrimination. In conditional discrimination, at least two stimuli (or stimulus dimensions) are typically present. The reaction to one will depend on the presence of the other. For example, in an experiment by Nissen, chimps were trained to discriminate between a large square and a small square. Only one of these yielded a reinforcer, but which one that was depended on a second stimulus characteristic. Because this was a successive discrimination procedure, each trial involved just one of the four complex stimuli below:
Stimulus              Outcome
large black square    reinforcement
small black square    no reinforcement
large white square    no reinforcement
small white square    reinforcement
In this example, color was critical in signaling whether the large square or the small square called for the go response: If the squares were black, then large was the S+ and small the S-; but if the squares were white, this assignment reversed. So, color in this case was the occasion setter informing the animal about the meaning of size.
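The contingency in this design can be written out as a small rule in Python. The sketch only restates the four stimulus-outcome pairings given above; the names `S_PLUS_SIZE` and `is_reinforced` are hypothetical, and nothing here is meant as a model of how the chimps actually solved the problem.

```python
# Which size is the S+ depends on color (the occasion setter).
S_PLUS_SIZE = {"black": "large", "white": "small"}

def is_reinforced(color, size):
    """Return True if responding to a square of this color and size is reinforced."""
    return S_PLUS_SIZE[color] == size

for color in ("black", "white"):
    for size in ("large", "small"):
        outcome = "reinforcement" if is_reinforced(color, size) else "no reinforcement"
        print(size, color, "square ->", outcome)

# Neither dimension predicts the outcome by itself: each size is reinforced with one color only.
for size in ("large", "small"):
    print(size, [is_reinforced(color, size) for color in ("black", "white")])
```

Because each size and each color is reinforced on exactly half of the trial types, the discrimination cannot be solved by attending to size alone or to color alone.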
In Nissen's experiment, color and size were combined in the same stimulus as different dimensions of that stimulus. One could argue that these actually represent four different stimuli with four different associative links (though see below for evidence that animals do learn about individual dimensions such as size or color). However, conditional discrimination does not depend on such a set-up. More commonly in conditional learning, separate stimuli are presented. Thus, we could have conducted the Nissen experiment in the following way:
Stimulus 1      Stimulus 2      Outcome
1000 Hz tone    large square    reinforcement
1000 Hz tone    small square    no reinforcement
900 Hz tone     large square    no reinforcement
900 Hz tone     small square    reinforcement
In this design, unlike Nissen's, we now have an animal make different responses in the presence of the same physical object. That is, what the animal ought to do when faced with the second stimulus (a large or a small square) will be conditional on what the first stimulus was (the 1000 Hz or 900 Hz tone). Rats, pigeons, and chimps can learn these types of conditional discriminations.
A final way of looking at the ease and success of discrimination training involves transfer tasks. In a transfer task, we ask whether or how prior discrimination training on one problem affects later discrimination training with another problem. Sometimes the problems are very similar. Indeed, a favorite type of problem involves a reversal shift, in which the animal learns to do the exact opposite of what it did earlier. So, if it was trained to respond to a square but not a triangle, in a reversal shift, it would have to respond to the triangle, but not the square. Sometimes the problems are very different. Does training on the dimension of color in one situation help a color problem involving very different stimuli, and very different responses, for example? And sometimes the training involves combining problems. In a technique called acquired distinctiveness of cues, for example, we try to speed up the process of discrimination by compounding the two stimuli we want the animal to discriminate with very different stimuli that we know from earlier training are easily distinguished.
Transfer training per se isn't really a method of discrimination training, but it has sometimes been used as such. Thus, we might try to teach the animal a pattern of responding (a bit similar to Hulse's studies in Chapter 6 that looked at sensitivity to patterns in partial reinforcement schedules) by alternating a series of problems in a certain way. We may, for example, try to teach the animal to systematically switch to the other stimulus after every successful response, a discrimination procedure that requires multiple problems and a transfer set-up. Such work on learning sets (also called learning-to-learn) has been done by Harlow, and we will examine it more closely below.
In any case, let's look at some of the factors that influence how easily discriminations are learned.
From the point of view of Gibson's Perceptual Learning Theory, the more different two stimuli are, the easier it should be to find a perceptual dimension or feature that helps separate them. As a sample experiment, we might consider the results on training relative numerosity. In this paradigm, subjects typically choose between two displays that contain a number of items. Their discrimination may involve deciding which display has the smaller number of items, for example. Typically, as the number of items in the displays becomes more similar, the problem increases in difficulty, as evidenced by training time, time to choose, and number of mistakes made in choosing. Kraemer, for example, finds that pigeons have more trouble with displays having similar numbers of items, and in the human literature, a well-known finding is that people generally take longer to decide which of two single-digit numbers (the numbers 1 through 9) is larger (see, e.g., Parkman) when the numbers are close together in value (5 and 6 will be a more difficult pair than 4 and 7, for example).
Closely related to stimulus similarity is the notion of feature or cue salience. In a discrimination involving complex stimuli in which some feature or cue has to be attended to in order to discriminate them, discrimination will occur more rapidly if the cue is salient or dominant. For example, if we try to teach humans to discriminate two types of artificial flowers, a problem involving colors or leaf shapes will be learned sooner than a problem involving the angles of branches going off of the main stem (Trabasso). The dimensions of color and leaf shape are dominant, salient dimensions for us in categorizing flowers (perhaps because these are important dimensions in discriminating real flowers). As such, they are the features we will first be drawn to; they will likely serve as our first hypotheses regarding how these artificial flowers are to be distinguished.
EXPERIENCE. Prior experience will also have a profound effect on learning, as may be evident from the example above regarding artificial flowers. You will see in a later section below that animals and humans tend to first pay attention to a dimension that has worked in the past. So, discrimination learning can be profoundly influenced by past experience, if the current problem seems at all similar to one encountered earlier. In this case, we say that there is transfer of training. If the transfer speeds the learning of the new problem, it is referred to as positive transfer (or facilitation). If it slows the learning of the new problem, we call it negative transfer (or interference).
The theoretical issue we will face here is whether such transfer can be accounted for solely through a principle of generalization.
In a sense, the rest of this major section will be devoted to the effects of experience. There are a number of findings here. For example, consistent with observational learning and perceptual learning, animals that observe other animals making a discrimination will learn that discrimination more rapidly (the Kohn and Dennis study mentioned in Chapter 5). Also, stimulus differentiation will speed up discrimination learning, as we saw in the Gibson, Walk, and Tighe study. However, for now, we will mention an additional finding: the easy-to-hard effect. Basically, this finding states that positive transfer can occur when a later discrimination involves a harder problem on the same dimension. An experiment by Marsh can serve to illustrate this. Here is the design:
Group      Problem 1                         Problem 2
1          easy brightness discrimination    hard color discrimination
2          easy color discrimination         hard color discrimination
control    (none)                            hard color discrimination
By comparing Groups 1 and 2 with our control group, we can assess the effects of prior discrimination learning. We will find reasonably good positive transfer in Group 2, but not Group 1.
DIFFERENTIAL ATTENTION TO S+. In a typical discrimination problem, we can ask whether the animal focuses more on the stimulus that works (the S+), or on the stimulus that doesn't (the S-). Is it more important to avoid the frustration of not getting a reward than it is to actually get the reward? Transfer studies can help us decide this issue, as well. Thus, in one experiment by Hearst, two groups of pigeons were taught an initial discrimination. For Group 1, the two stimuli were an empty circle and a circle with a small vertical line in it. The circle with the small line served as the S+ in this group. But for Group 2, the stimuli were an empty circle and a circle with a long vertical line in it, and this group had the empty circle as the S+.
Both groups were subsequently given a second discrimination problem in which the two stimuli were the circles with the lines in them. In this second problem, the small-line circle was the S+ and the large-line circle was the S-. What you should note about this procedure is that each of our two groups had been exposed to one of these stimuli before. Group 1 had learned about the small-line circle since that stimulus was also its S+, and Group 2 had learned about the large-line circle (which was Group 2's S-). In theory, then, each group could have had the same experience with the consequences of a given stimulus, which might have been expected to make the discrimination learning easy in Problem 2 because of positive transfer. The design for this study is given below (and I have used uppercase words to show you that each group in Problem 2 was being exposed to one stimulus-response association identical to what it had received in Problem 1):
Problem    Group    S+              S-
1          1        SMALL line      empty circle
1          2        empty circle    LARGE line
2          1        SMALL line      large line
2          2        small line      LARGE line
Hearst found that Group 1 learned the second problem more rapidly. There is a strong suggestion here that the animals learned more about the differentiated features of the S+ in initial acquisition.
We can point to several studies with humans that yield the same conclusion. Suppose we give people a discrimination in which they receive pairs of stimuli such as:
X T
There are actually quite a few features that may be relevant to forming the proper discrimination in this case. For example, will the correct reinforced answer be large letters? If so, our subject ought to choose the symbol on the left. Will it be Xs? Again, the choice then is to pick the object on the left. Will it be black? That also corresponds to choosing the left-hand object. And finally, perhaps our subject has the very simple idea that only things on the left result in reinforcement. Thus, as you can see from this example, four different features or hypotheses would lead our subject to choose the left-hand symbol. By the same token, four other features (small, right, orange, T) will lead to a decision to respond to the object on the right (if you are viewing this in black and white, the orange color appears as a very light grey). The task of the subject is thus to determine which of these eight possible features is the one that constitutes the S+.
In this type of experiment, Levine and others have found that how fast the person solves the problem depends on the relative proportion of positive versus negative feedback. Positive feedback (being told "yes" when you choose the correct object) will result in faster solutions than negative feedback (being told "no" when you pick the wrong object). Thus, in a sense, positive feedback is more informative; we are more successful in picking up information about the positive object.
That may strike you as a no-brainer. But if you go back to the example above and think about it, you will quickly realize that it doesn't matter whether we tell you "yes" or "no!" Choosing either object and getting feedback about it will be equally informative from a logical point of view. So, if you choose the X above and are told "yes," you know (1) that it's either X or black or large or left, and (2) that it can't be T or orange or small or right. But if you choose T above, instead, and are told "no," then you know exactly the same thing! In other words, right and wrong choices ought to be equally helpful in solving the problem. That they're not argues for a bias towards learning about the S+.
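The logical equivalence of the two kinds of feedback can be checked with a small sketch. It is only an illustration of the argument above, not a model from Levine's work: the eight feature names come from the example pair, while the function name `surviving_hypotheses` and the set representation are assumptions made here for convenience.

```python
# Features of each stimulus in the example pair (assumed encoding):
# a large black X on the left, a small orange T on the right.
LEFT_STIMULUS = {"X", "black", "large", "left"}
RIGHT_STIMULUS = {"T", "orange", "small", "right"}
ALL_HYPOTHESES = LEFT_STIMULUS | RIGHT_STIMULUS


def surviving_hypotheses(chosen, feedback):
    """Return the hypotheses still logically possible after one trial.

    chosen   -- set of features belonging to the stimulus that was picked
    feedback -- "yes" if the choice was correct, "no" otherwise
    """
    if feedback == "yes":
        # The correct feature must be one the chosen stimulus has.
        return ALL_HYPOTHESES & chosen
    # The correct feature cannot be one the chosen stimulus has.
    return ALL_HYPOTHESES - chosen


after_yes_to_x = surviving_hypotheses(LEFT_STIMULUS, "yes")
after_no_to_t = surviving_hypotheses(RIGHT_STIMULUS, "no")

print(sorted(after_yes_to_x))           # ['X', 'black', 'large', 'left']
print(after_yes_to_x == after_no_to_t)  # True
```

Both trials leave exactly the same four hypotheses standing, which is why a purely logical learner should profit equally from "yes" and "no."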
In another example, Craik and Tulving had people see a word and answer "yes" or "no" to a simple question about it (Is the word typed in blue? Does it rhyme with "weight"? Is it a type of fish? etc.). Later, people were given a surprise memory test in which they were asked to recognize the words they had seen in the earlier part of the experiment. Consistent with our bias towards positive instances, Craik and Tulving found that the positive words (the ones people responded "yes" to) were better remembered than the negative words (the ones they responded "no" to).
REINFORCEMENT DIFFERENTIATION. Finally (to reiterate a point brought up in Chapter 5), in successive choice discriminations, learning will be enhanced to the extent that the reinforcers associated with the different choices also differ. That is, if we want our animals to make different responses when seeing a circle and a triangle, then we would be wise to provide them with different reinforcers such as food for the response to the circle, and drink for the response to the triangle. Peterson's group has done a lot of work on this issue (see the related Peterson experiment on acquired stimulus equivalence discussed towards the end of Chapter 5), but Trapold gets the credit for bringing everyone's attention to this effect. You may wish to consult the discussion on p. 154 (including the findings of Carlson and Wielkiewicz) in which an explanation of this finding is given in terms of expectancies in memory that are less likely to be confused with one another.
There are several additional points to note about the theory. It claims that both inhibition and excitation grow continuously, so that each additional trial ought to have an effect on habit strength. The theory is thus sometimes referred to as Continuity Theory, to contrast it with some of the other theories that claim sudden or insight-based learning. And it also has nothing to say about the attentional capacities or mechanisms of an organism and how these might relate to learning. It is thus a non-attentional theory. It claims that learning, whether excitatory or inhibitory, involves an association with a specific, physical stimulus. It is thus (no surprise here!) a behaviorist and an associational theory. And it relies on physical similarity of stimuli to explain generalization gradients. By virtue of being a behaviorist, associational theory, it has to make such a claim. This is a result of the positivist, peripheralist approach to learning: Only observable features or events may be included in the theory's description of what goes on.
As you already know, Hull had claimed that generalization was innate, a claim that failed to hold up in a lot of the later work inspired by Lashley and Wade's claim. But whether generalization is learned or innate ought not to be a major sticking point for assessing algebraic summation theory. Even if generalization gradients change over the course of learning, so long as we can still figure out how much excitation and how much inhibition there is, the theory claims that we ought to be able to calculate the strength or probability of a response to a given stimulus.
With this as background, let's look at some of its successful predictions.
Display:      1    2    3    4    5    6    7    8    9   10
                                 S+
Licks:        0    5   15   40   50   40   15    5    0    0
Let us now take a second group and inhibit licking to the 6-circle display by associating licking in the presence of that display with something unpleasant. Again, we will train these animals sufficiently to inhibit licking by 50 responses below baseline (i.e., 50 fewer licks than our animal would normally take). We will represent these decreases in responding by a negative sign. In this case, an idealized generalization gradient (a gradient of inhibition, of course) might involve the following values:
Display:      1    2    3    4    5    6    7    8    9   10
                                      S-
Licks:        0    0   -5  -15  -40  -50  -40  -15   -5    0
Finally, let us take a third group and run them through discrimination training in which one stimulus (the 5-circle stimulus) is associated with reinforcement, and another stimulus (the 6-circle stimulus) is associated with the unpleasant outcome (which in this case could even be lack of an expected reinforcement). In other words, in Group 3 we combine the training procedures of Groups 1 and 2 above. According to the postulate of algebraic summation, we need simply combine the generalized inhibition and excitation to see what will happen. So, adding the respective values above ought to yield the following predicted generalization gradient for our discrimination group:
Display:      1    2    3    4    5    6    7    8    9   10
                                 S+   S-
Licks:        0    5   10   25   10  -10  -25  -10   -5    0
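The summation for the discrimination group can be reproduced with a short sketch. The lick values are the idealized numbers used in this section; the function names `algebraic_summation` and `peak_display` are hypothetical helpers introduced only for illustration, not part of Hull's or Spence's formal system.

```python
# Idealized gradients from the text, indexed by display (1 to 10 circles).
DISPLAYS = list(range(1, 11))
EXCITATION = [0, 5, 15, 40, 50, 40, 15, 5, 0, 0]        # S+ = 5-circle display
INHIBITION = [0, 0, -5, -15, -40, -50, -40, -15, -5, 0]  # S- = 6-circle display


def algebraic_summation(excitation, inhibition):
    """Add excitation and inhibition display by display."""
    return [e + i for e, i in zip(excitation, inhibition)]


def peak_display(gradient):
    """Return the display with the highest net response strength."""
    return DISPLAYS[gradient.index(max(gradient))]


net = algebraic_summation(EXCITATION, INHIBITION)

print(net)                       # [0, 5, 10, 25, 10, -10, -25, -10, -5, 0]
print(peak_display(EXCITATION))  # 5
print(peak_display(net))         # 4
```

With these numbers the strongest net responding falls on the 4-circle display rather than on the 5-circle display that was actually reinforced.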
What do the generalization gradients look like for Groups 1 and 3? Looking
just at excitatory responding (i.e., the gradients of excitation
tracking responding above the baseline level), we should find something
like the gradients presented in Figure 3. In this figure, the gradient
that is presented in a solid color corresponds to the condition in which
discrimination was taught (i.e., in which there was both excitation
and inhibition), whereas the other gradient corresponds to what would happen
with just excitation present. The discriminative gradient exhibits three
important features that are predicted by algebraic summation theory. The
first is a phenomenon we mentioned earlier called peak shift: The
discriminative gradient's peak is no longer at the original S+. Thus, the
peak is now at the 4-circle display in the solid gradient, even though
we reinforced our animals for responding to the 5-circle display. The reason,
of course, is that there has been much generalization of inhibition from
the 6-circle display to the 5-circle display, canceling out a lot of its
excitation.
Second, the solid gradient is no longer symmetrical. It is squashed up on the side where the S- is, due again to inhibition generalizing to the stimuli on that side.
And third, the peak has shifted to the side opposite from the S-.
There is actually a fourth prediction associated with algebraic summation, although it is not graphed in Figure 3: Generally speaking, the closer together the S+ and the S- are, the greater ought to be the peak shift.
So, do these results occur as predicted by the theory? Figure 4 presents
part of a famous experiment by Hanson on this question. Hanson taught
pigeons to peck at a certain wavelength (550 nanometers) for a reinforcement.
In the absence of discrimination training (i.e., when the 550 nanometer
key was the only key present during training), the generalization gradient
looked like the dark blue-line gradient in Figure 4. But when a second
group of pigeons was given discrimination training in which an S- of 590
nanometers was also present, then Hanson obtained the type of generalization
seen in the solid gradient. The peak for that gradient has clearly shifted
to the side opposite from 590 nanometers (the S-), and is now somewhere
around 540 nanometers. Hanson also tested several other groups with different
S- wavelengths closer to the S+; they all displayed peak shift. And although
the asymmetry is a bit hard to see in this example, it is obviously present
in some of the other gradients. Thus, the initial results don't appear
to be far off what algebraic summation would predict (although there is
a discrepant result here which we'll talk about later: Can you spot it?).
We've mentioned earlier that particularly strong tests of theories occur when people can get them to make unusual predictions; in that case, researchers place more emphasis on the results, and gain greater confidence in the theory. In fact, such a prediction is available from algebraic summation theory. It is this: A procedure that results in successful discrimination learning in the absence of any responding to the S- should result in no peak shift! The reason, from the point of view of Hull's and Spence's approach, is quite simple. In order for inhibition to occur, the animal must make a non-reinforced response (recall, for example, the relevance of the Seward and Levy experiment on latent extinction here). So, no response, no inhibition. And if there is no inhibition, then there can be no generalization of inhibition that causes the peak to change, or the gradient to become asymmetric.
But you already know such a procedure from our discussion of discrimination techniques earlier in this chapter: errorless discrimination training. To remind you, this technique involves starting with a non-salient S- that will not attract a response, and gradually increasing its salience until it finally matches the S+. Terrace has experimented with this technique, and reports that it is possible to train a discrimination in the absence of any responding to the S-. More to the point, Terrace reports failing to obtain a peak shift. The theory accordingly seems to correctly predict that peak shift depends on inhibition.
What if we can train a discrimination through the normal procedure (unsuccessful responding to S-) but somehow diminish the inhibition? The theory would also predict lessened or no peak shift in this instance. A clever study that seems to do this was conducted by Lyons, Klipec, and Steinsultz. Their study doesn't quite fit within the Hull-Spence framework, but it is certainly suggestive. They concentrated on the emotional aspects of inhibition, arguing that animals tend to avoid things that are emotionally frustrating or unpleasant, and connecting peak shift to this emotional component. So, to dampen the negative emotionality associated with making a wrong response, they had animals undergo discrimination training while under the influence of a tranquilizer (chlorpromazine). Although the animals made responses to the S- in the course of learning the discrimination, they did not display a peak shift.
The phenomenon of peak shift is also of interest because it is theoretically capable of providing an explanation for a study on relational learning by Köhler conducted on both birds and chimps. In this study, Köhler started by teaching a brightness discrimination. Animals had to learn to pick the brighter light. Then, for the generalization test, he presented the animals with a choice between the light they were trained on, and yet a brighter light. To illustrate this schematically, let us use levels of grey as our stimuli. A corresponding experimental design would then be something like the following:
Training:               dark grey (S-)    medium grey (S+)
Generalization test:    medium grey       light grey
In this design, the animal might do several different things on the generalization test. Standard behaviorist theory would seem to predict that it ought to respond most to the same stimulus it was trained with, as that is where the greatest habit strength is to be found. Recall that associations form to individual stimuli in the standard approach. But Köhler argued instead that the animal was basing its response on comparing the two stimuli to determine their relationship to one another. During training, the comparison results in the animal responding to the lighter stimulus. Thus, during the generalization test, the animal should also compare the generalization stimuli, and respond to the lighter. If it does so, however, it will pick the novel stimulus (the light grey) over the formerly reinforced stimulus (the medium grey). And that is what Köhler's subjects did.
This type of finding is sometimes referred to as a transposition. For Köhler, who was a Gestalt psychologist, the pattern of stimuli was what was important, not their individual identities. Transpositions preserve the same abstract pattern. So, when a melody is played in two different keys, for example, we still recognize it as the same melody. But, the finding seems to present a problem for associational theorists who claim learning involves forging associations to single stimuli, and who do not wish to posit a mechanism by which animals may process abstract information different from the actual present physical features.
Köhler's view is an alternative to the algebraic summation account of what learning involves. But for the moment, do note that peak shift can explain his findings without the need to posit abstract information, or processing some sort of relationship between stimuli that animals can discover. Thus, according to algebraic summation theory, the dark grey stimulus (the S-) will cause the peak to move away from the S+ to the light grey stimulus. We can verify this by playing with the same numbers we used for the earlier demonstration of peak shift:
Stimulus:               dark grey (S-)    medium grey (S+)    light grey
Excitation gradient:    40                50                  40
Inhibition gradient:    -50               -40                 -15
Algebraic summation:    -10               10                  25
And as you can see from this example, there will be more net excitation to the light grey than to any other stimulus, so it should receive most of the responding.
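As a rough sketch of how this account turns net strengths into a choice, the following code picks whichever available stimulus has the greater sum of excitation and inhibition. The numbers are the ones from the example above; the decision rule in `predicted_choice` (always take the stimulus with the larger net value) is a simplifying assumption made here, not a claim about how Spence formalized choice.

```python
# Excitation and inhibition for the three grey stimuli in the example.
EXCITATION = {"dark grey": 40, "medium grey": 50, "light grey": 40}
INHIBITION = {"dark grey": -50, "medium grey": -40, "light grey": -15}


def net_strength(stimulus):
    """Algebraic sum of excitation and inhibition for one stimulus."""
    return EXCITATION[stimulus] + INHIBITION[stimulus]


def predicted_choice(options):
    """Assumed rule: respond to the option with the greatest net strength."""
    return max(options, key=net_strength)


training_choice = predicted_choice(["dark grey", "medium grey"])
test_choice = predicted_choice(["medium grey", "light grey"])

print(training_choice)  # medium grey
print(test_choice)      # light grey
```

The same absolute, stimulus-by-stimulus values yield the trained response during training and the apparently "relational" response on the generalization test.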
Peak shift phenomena aren't the only successful predictions of the theory, but they are the most dramatic. There are other predictions as well. We will close out this section with two more. Both concern reversal shifts, transfer paradigms in which the go-no go decisions are reversed in a second discrimination problem. The first is quite straightforward, and plays off of the finding that learning curves typically start off with a relatively flat section in which nothing systematic appears to be happening. This section is termed the presolution period, because there is as yet no evidence in the animal's behavior that it has changed its responding. However, Hull believed that there was a critical value of net excitation that was required to trigger a response. Until effective reactive potential reached this threshold, no systematic responding would be seen (accounting for the presolution period). That did not mean that no association was forming; the habit strength was certainly changing as the animal made random responses, some of which were correct, and others incorrect. On this account, reversing the stimuli in the presolution period ought to slow learning, since the habit strengths in the presolution period would be the opposite of what was required after the switch.
Let's make this example a bit more concrete by providing an experimental design:
Group           Presolution Period            Further Training On
experimental    S+ = triangle, S- = circle    S+ = circle, S- = triangle
control         S+ = circle, S- = triangle    S+ = circle, S- = triangle
For the experimental group, the triangle in the presolution period ought to gain some excitatory habit strength, and the circle ought to gain some inhibitory habit strength. But these are the exact opposite associations from what will be required later, since we want our animals at the end to respond to the circle, and not the triangle. Consistent with Hull's theory, a number of people (including Sutherland and Mackintosh, who are definitely not Hullians!) find that presolution reversals cause negative transfer: The control group shows faster acquisition. Consistent with our earlier discussion of the learning-performance distinction, the lack of systematic performance in the presolution period cannot be taken as evidence that no learning is going on.
The second study is a bit more interesting. Suppose we train our animals on two discrimination problems. For our problems, we choose stimuli that differ on two dimensions (color and character), and we assign the outcomes as follows:
      Problem 1    Problem 2
S+    #####        #####
S-    +++++        +++++
In this case, we have four different associations, according to a stimulus-response model of the sort discussed by Hull and Spence. Schematically, we can list these habits as:
Stimulus ##### ----> Approach Response
Stimulus ##### ----> Approach Response
Stimulus +++++ ----> Avoidance Response
Stimulus +++++ ----> Avoidance Response
Given that our animals have
learned this first set of problems, let us now transfer them to a new set
that involves the same stimuli, but different mappings with the
responses. There are in fact two different types of mappings we might
try to put our animals through. In a reversal shift, we simply switch
the responses, so that each S- becomes the new S+, and vice versa. But
in a non-reversal shift, we keep the responses for one of the problems
identical while switching the responses for the other. The two new problems
can be diagramed this way:
Type of Shift          Problem 1    Problem 2
Reversal        S+     +++++        +++++
                S-     #####        #####
Non-Reversal    S+     #####        +++++
                S-     +++++        #####
The associations that have to be present to solve this second set of discrimination problems are as follows:
Shift            Association
Reversal:        Stimulus ##### ----> Avoidance Response
                 Stimulus ##### ----> Avoidance Response
                 Stimulus +++++ ----> Approach Response
                 Stimulus +++++ ----> Approach Response
Non-Reversal:    Stimulus ##### ----> Approach Response
                 Stimulus +++++ ----> Approach Response
                 Stimulus ##### ----> Avoidance Response
                 Stimulus +++++ ----> Avoidance Response
The question that numerous investigators asked was, which type of shift would be easier to learn? If you compare these associations, you will see that there appears to be an obvious answer: All of the associations are different from the original associations in reversal shifts, but only two associations are different in non-reversal shifts. Thus, according to models like Hull and Spence's, non-reversal shifts ought to be easier. Kendler and Kendler did a number of experiments using this type of set-up with quite a few species of animals (including relatively young humans), and found precisely that result: Non-reversal shifts resulted in positive transfer, compared to reversal shifts.
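The counting argument can be made explicit with a short sketch. Each stimulus is coded as a (character, shade) pair. The shade labels are an assumption: the diagrams here show only the characters, and "dark" and "light" are inferred from the remark that the non-reversal rule amounts to "the dark things give reinforcement." The dictionary layout and function names are likewise illustrative only.

```python
# Original stimulus-response associations (shade labels are assumed).
ORIGINAL = {
    ("#", "dark"): "approach",   # Problem 1, S+
    ("+", "light"): "avoid",     # Problem 1, S-
    ("#", "light"): "approach",  # Problem 2, S+
    ("+", "dark"): "avoid",      # Problem 2, S-
}
PROBLEM_2_STIMULI = {("#", "light"), ("+", "dark")}


def flip(response):
    """Swap approach and avoidance."""
    return "avoid" if response == "approach" else "approach"


# Reversal shift: every S+ becomes an S-, and vice versa.
REVERSAL = {stim: flip(resp) for stim, resp in ORIGINAL.items()}

# Non-reversal shift: Problem 1 is unchanged, Problem 2 is switched.
NON_REVERSAL = {
    stim: flip(resp) if stim in PROBLEM_2_STIMULI else resp
    for stim, resp in ORIGINAL.items()
}


def changed_associations(old, new):
    """Count the stimuli whose required response differs after the shift."""
    return sum(1 for stim in old if old[stim] != new[stim])


print(changed_associations(ORIGINAL, REVERSAL))      # 4
print(changed_associations(ORIGINAL, NON_REVERSAL))  # 2
```

On a pure stimulus-response count, then, the reversal shift requires relearning twice as many associations as the non-reversal shift.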
You might be tempted to ask why this finding is important. It actually attempts to contrast a theory such as algebraic summation with a more cognitive, relational theory like Köhler's. Thus, Köhler would claim that the animal in initial learning (the very first two problems) is learning to notice shapes of things: It discovers that pound signs (#) are correct, and that plus signs (+) are not. A reversal shift still involves the same abstract difference, so a Köhlerian approach might be tempted to claim that reversal shifts ought to be easier than giving the animal a new abstract relationship (color: the dark things give reinforcement) that has to be learned. But of course, the Kendlers find that doesn't happen. Thus, we seem to have some competitive hypothesis testing that disconfirms a relational approach, and further supports a behaviorist claim that Köhler's results with relational learning must really have been due to algebraic summation causing peak shift.
At least, in animals and very young humans. Because, as a matter of fact, the Kendlers found that reversal shifts were solved more rapidly by older kids and adults. So, they obtained evidence of some major differences in the forms of learning humans and animals undergo. Is human and animal learning really all that different? We turn next to some problems with algebraic summation.
Let us start with the phenomenon of peak shift. There are actually a number of embarrassments for the algebraic summation theory here. Some of these surround the finding of peak shift during normal discrimination training, and others concern what is happening during so-called errorless discrimination training.
Terrace and others had claimed that errorless discrimination training resulted in canceling peak shift due to lack of inhibition (negative emotion in particular, in Terrace's case). Both of these claims have been challenged by others' findings. Several experiments, for example, have claimed a peak shift after errorless training, contrary to Terrace. Also, the question of inhibition and the question of what constitutes "errorless" responding have been raised. Even if pigeons do not ever actually peck the S- key in errorless discrimination, they do often orient towards it, bob towards it, and in other ways, make movements with respect to it that indicate something like a proto-response. Can we safely exclude such acts when we claim that animals make no "errors" in this paradigm? More to the point, Rilling finds that animals will learn a response whose only reinforcer is removal of the S- in an errorless discrimination task. The message of this finding is that there must be some inhibitory or negative emotional quality associated with the S-, since, otherwise, its removal ought not to act as a negative reinforcer.
What about normal discrimination training? The number of reinforced and non-reinforced trials in discrimination training ought to be the major determiner of excitatory and inhibitory associative strength, according to the Hull-Spence approach. However, it turns out that whether we get peak shift also depends on the order in which the S+ and S- trials are given. If the S- trials are blocked together rather than mixed up with the S+ trials, there is little or no evidence of peak shift, even though both conditions are given so that the animal has the same exposure to S+ and S-.
An even more interesting and damaging finding is that peak shift does not always move to the side opposite from the S-. If we present more novel stimuli on the S- side during the generalization test than on the S+ side, then the peak shifts to the S- side. Thomas's group has conducted a fair number of these experiments, in accord with an adaptation-level theory of generalization that Thomas has proposed (though be cautioned that many of these studies used human subjects who, as we have already seen in the work of the Kendlers, may not operate quite in the same way as other species). According to this theory, the subject is acquiring information during both acquisition and generalization about the average stimulus presented in an experiment (excluding stimuli deliberately associated with non-reinforcement). Responding will consequently tend to adapt to that average or central tendency (keep this in mind: It will become of particular relevance when we discuss human categories below!). One prediction of this theory is that normal peak shift occurs because taking out the S- means there are more stimuli on the S+ side (so the average slightly favors that side).
Unlike the algebraic summation theory, Thomas's theory claims that peak shift ought to be found even when there is no discrimination training! As a test of this idea, consider a study by Thomas and Jones. Their human subjects were all given reinforced training with a 525 nanometer stimulus. For the generalization test, people got different ranges of test stimuli, only one of which had 525 nanometers in the middle. In this experiment, the peak shifted with the range. If the test stimuli were generally lower than 525, then the peak shifted to a lower value. And if they were higher, then it shifted to a higher value.
We earlier saw that presolution reversals resulted in negative transfer, in accord with algebraic summation theory. A similar line of reasoning predicts that reversal shifts following massive training ought also to yield negative transfer. In fact, the more training there is with S+ and S-, the more interference there ought to be due to the incredibly strong habits that have been conditioned. However, people such as Reid and Mackintosh have shown that massively increased training paradoxically can have the reverse result: It can make learning a reversal shift easier! This finding, inconsistent with algebraic summation theory, is called the overlearning reversal effect.
The central tendency findings of Thomas and his group suggest that we may be doing relational processing after all. Maybe Köhler was right in claiming relational learning based on comparing a variety of stimuli. In fact, subsequent studies of relational learning have attempted to show such an effect when peak shift could not account for the results. Two experiments that are relevant here involve studies by Gonzalez, Gentry, and Bitterman, and by Lawrence and DeRivera.
Gonzalez et al. did a clever experiment in which they presented pigeons with three lit keys at once. Only the key with the middle color (550 nanometers), however, was the one that provided reinforcement. Thus, in this study, there were two S-'s (540 and 560 nanometers), one on either side of the S+. As you might expect, this resulted in a very narrow gradient when just colors centered around the S+ were presented for a generalization test. However, these experimenters subsequently gave their animals a generalization test involving different groups of three colors. The pigeons had apparently learned the relationship "central color" since their peak was generally the middle color, regardless of which three colors were used. Such a finding is not consistent with algebraic summation, since groups of colors above the 560 (or below the 540) nanometer key ought to show generalization primarily from the nearer negative stimulus. The algebraic summation theory in this case falsely predicts that the peak ought to be with the color furthest away from the S-, and not the middle color.
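The contrast between the two predictions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the excitatory and inhibitory gradients are taken to be Gaussian curves, and their heights and widths are made-up values chosen for the example, not parameters estimated by Hull, Spence, or Gonzalez et al. The function names are likewise hypothetical.

```python
import math

def gradient(x, center, height, width):
    """Assumed Gaussian generalization gradient around a trained stimulus."""
    return height * math.exp(-((x - center) ** 2) / (2 * width ** 2))

def net_strength(x, s_plus=550, s_minuses=(540, 560)):
    """Algebraic summation: excitation from the S+ minus inhibition from each S-."""
    excitation = gradient(x, s_plus, height=1.0, width=10.0)   # assumed parameters
    inhibition = sum(gradient(x, s, height=0.8, width=10.0)    # assumed parameters
                     for s in s_minuses)
    return excitation - inhibition

# A test triple lying entirely above the 560 nm S-
test_triple = [570, 580, 590]

# Summation theory: respond most to the color with the highest net strength
summation_choice = max(test_triple, key=net_strength)

# Relational account: respond to the middle color of whatever triple is shown
relational_choice = sorted(test_triple)[1]

print("summation predicts:", summation_choice)
print("relational predicts:", relational_choice)
```

With these assumed gradients, the nearer S- dominates above 560 nanometers, so net strength rises as the test color moves away from it. Summation therefore picks the most distant color, while the relational account picks the middle one, which is what the pigeons actually did.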
Lawrence and DeRivera, in contrast, used a different approach. In their experiment, rats had to take a left turn or a right turn depending on a complex stimulus consisting of two shades of grey. The stimulus on any trial was actually composed of any of seven different shades. In Figure 5, the top row presents stimuli similar to the ones the rats were trained with. Basically, a middle shade of grey (Shade 4) was always on the bottom. If the animal saw a lighter shade on top (as in the three stimuli on the left), then it was supposed to turn to the right. But if it saw a darker shade on top (as in the three stimuli on the right), then it had to turn to the left for its reward.
One could argue that because the middle shade is present for both left and right, it will be irrelevant to the animal. Alternatively, following relational learning, one could argue that the animal is learning a relation between the two shades (darker on top; lighter on top), and is responding on the basis of that relation. To test these ideas out, Lawrence and DeRivera gave their rats test trials involving stimuli such as the two cards on the bottom. For the card on the left, both shades have been equally associated with turning to the right (since they're light shades); for the card on the right, of course, both shades have been equally associated with turning to the left (since they're dark shades). But what Lawrence and DeRivera found was that rats presented with the left card turned left, but those presented with the other card turned right. These results are consistent with rats making a "darker than" or "lighter than" relational comparison, and responding on the basis of that decision. Unlike the earlier Köhler study, there was no S- here that could have accounted for these results in terms of inhibition, or of negative emotionality.
Finally, what about the Kendlers' finding that adult humans and non-human animals displayed different types of learning preferences for reversal and non-reversal shifts? Mackintosh and Little examined a slightly different paradigm involving what are called intradimensional (IDS, for short) and extradimensional (EDS) shifts. The procedure for these is actually quite similar to what the Kendlers did, except that we won't use the same identical stimuli in the transfer part of the experiment. Thus, we start out by training animals to make a discrimination on multidimensional stimuli. If you refer to Figure 6, note that Group 1 has two problems to solve: discriminating between a blue rectangle and a yellow circle, and discriminating between a blue circle and a yellow rectangle. If you look at which stimulus in each of these problems is labeled S+, you will see that the animal is being rewarded for learning a discrimination involving color: Blue is correct, and yellow is not. On the other hand, in Group 2, the correct discrimination is shape: These animals are being rewarded for choosing the rectangle over the circle, regardless of what color each is.
So, we now take both groups and move them to the new set of problems
presented in the bottom half of Figure 6. These involve very different
shapes than earlier, and very different colors (precluding generalization
as an account of the results). But if you look at the S+ assignments, the
dimension or relation of shape is still relevant: The animal needs
to respond to the plus signs and avoid the triangles. For the animals placed
on this set of problems, whether something is red or green is irrelevant.
For Group 2, the transfer problem is an intradimensional shift:
The dimension stays the same, though new stimuli are presented.
But for Group 1, the transfer problem is an extradimensional shift:
They need to move from responding to the dimension of color to responding
to the dimension of shape. Reversal shifts are a type of IDS, of course,
and non-reversal shifts may be regarded as a type of EDS. In this situation
where stimulus generalization or overlap of stimulus associations will
not predict performance, both animals and humans generally
find IDS easier to learn. This result is consistent with relational learning
rather than algebraic summation theory if we add that whatever relation
has proven important in the past will be first tried out on a current problem,
if at all possible. Thus, IDS problems support a claim of learning about
abstract dimensions such as color or shape, rather than physical values
such as yellow or triangular.
I earlier asked you to find
a discrepancy with algebraic summation theory's claims in Hanson's peak
shift data. Did you find it? In Figure 4, the new, shifted peak is much
higher than the original, unlike what we would expect to happen (see Figure
3, where the new peak is much smaller than the old peak). Such a finding
is called behavioral contrast. It is a common feature of peak
shift accompanying discrimination training. We may be able to get algebraic
summation theory to handle such a finding by talking about incentive
motivation (K), but clearly more is going on than was expressed
in the four simple postulates of algebraic summation theory. That doesn't
necessarily represent a strong disconfirmation of the theory. As a strategy,
many theorists first start out with an over-simplified theory, and push
it to see how far it will go in explaining phenomena. As it starts breaking
down, they add complexity to it. But the preference remains for simple
theories.
There are, of course, a number of major differences between this approach and that of Hull and Spence. We have already looked at some of these differences, particularly insofar as they involve Köhler's ideas on relational learning. But two additional ideas ought to be stressed at this point. One is that learning on this account will be intimately related to what the animal pays attention to. As in humans, animals are assumed to have a limited capacity for noticing or attending to things in their environments (indeed, presumably a smaller capacity than we do). Thus, only what the animal pays attention to may be coded as a hypothesis. As a simplifying assumption, most theorists start with a model in which animals attend to only a single dimension or feature. And second, learning, as in Guthrie's theory, should have an all-or-none quality (and recall that Voeks showed that all-or-none learning did occasionally occur!). Until the animal has selected the proper hypothesis, no contingent responding will be obtained. Once it has selected the proper hypothesis, however, its learning is essentially complete (although various factors may still influence performance). For this reason, this approach is sometimes labeled non-continuity theory, to contrast it with the gradualist, incremental continuity theories of Hull and Spence.
On their account, there are specialized attentional mechanisms called analyzers that process information about different stimulus dimensions. When a stimulus is presented, it presumably activates various analyzers that detect the stimulus's shape, orientation, size, color, etc. However, these analyzers don't all activate at the same levels. The salience of a given dimension will in part determine how strongly the analyzer kicks in. Thus, we have an account for how salience can influence discrimination learning in procedures such as errorless discrimination training. A very low level of salience will hardly activate an analyzer, so that its activity cannot dominate the animal's attention. But in addition to salience, experience will also affect the activity level of an analyzer. When an analyzer is consistently associated with reinforcement (or successful responding in general), its activity is boosted. Thus, what an animal pays attention to will depend on the relative activity of its analyzers.
The activity of an analyzer, of course, can be taken as mapping into the hypothesis an animal evaluates. When that hypothesis proves fruitful, that analyzer becomes even more dominant in its activity. But when the hypothesis fails to work, the analyzer's activity level is decreased, and another analyzer must then capture the animal's attention.
As you can see from this description, there is a very simple prediction to be made here: Whatever has worked in the past is what the animal will tend to try out in the present. Because something such as shape, for example, worked in the past, the shape analyzer is likely to be stronger than other analyzers. We already saw one verified prediction that fits this claim: the results with IDS and EDS (Mackintosh and Little). IDS involves the same analyzers (although different values of the dimension the analyzer is responding to), so that the animal being put through IDS training ought to have an advantage. It is already looking for specific hypotheses concerning shape.
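The analyzer account of the IDS advantage can be sketched as a toy simulation in Python. Everything numerical here is an assumption made for illustration: analyzer strengths are plain integers, each success or failure changes a strength by one unit, the animal always attends to the strongest analyzer (ties go to the first one listed), and a hypothesis about an irrelevant dimension is treated as simply failing. The function name run_phase is hypothetical.

```python
def run_phase(strengths, relevant, n_trials):
    """Run n_trials of discrimination training.

    Returns the updated analyzer strengths and the number of trials
    spent attending to a dimension other than the relevant one.
    """
    strengths = dict(strengths)  # copy so the caller's values are untouched
    wasted = 0
    for _ in range(n_trials):
        attended = max(strengths, key=strengths.get)  # winner-take-all attention
        if attended == relevant:
            strengths[attended] += 1   # hypothesis worked: analyzer is boosted
        else:
            strengths[attended] -= 1   # hypothesis failed: analyzer is weakened
            wasted += 1
    return strengths, wasted

naive = {"color": 5, "shape": 5}  # assumed equal starting strengths

# Original training: Group 1 on color, Group 2 on shape
group1, _ = run_phase(naive, relevant="color", n_trials=20)
group2, _ = run_phase(naive, relevant="shape", n_trials=20)

# Transfer problem: shape is relevant for both groups
_, eds_wasted = run_phase(group1, relevant="shape", n_trials=50)  # extradimensional
_, ids_wasted = run_phase(group2, relevant="shape", n_trials=50)  # intradimensional

print("trials on the wrong dimension, IDS:", ids_wasted)
print("trials on the wrong dimension, EDS:", eds_wasted)
```

In this sketch the IDS group starts the transfer problem with the shape analyzer already dominant, so it wastes no trials on color, whereas the EDS group must first let its strong color analyzer decay. The sketch covers only the first stage (selecting a dimension) and says nothing about learning which response goes with which value.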
And that brings us to the second stage. In this second stage, the animal has to focus on various features or values of the proper dimension, and has to acquire knowledge about which response it has to make. Presumably, by accidentally making a response (in the case of a naive animal first being put through training), it discovers that a consequence such as obtaining food occurs. So, hypothesis testing ensues in terms of developing an appropriate response, and connecting that response with the right stimulus. The Kohn and Dennis observational learning study discussed in Chapter 5 is relevant here. In one of their groups, rats observed other rats making a discriminative choice response to one of two stimuli. Observational learning in this case presumably activated the appropriate analyzers because of vicarious reinforcement. These animals started learning about both the correct stimulus and the correct response. But Kohn and Dennis played a nasty trick on this group: They had to learn a reversal shift! That is, they had to approach the stimulus they observed others avoiding. And as you may recall, this group showed negative transfer: They were the slowest to learn.
Thus, learning will be affected by factors in both the first and the second stage. Another group of rats in Kohn and Dennis's study who did not see the stimuli learned faster, even though they had to go through Stage 1 learning cold compared to our group above. So, as you can see from this type of analysis, paying attention to the proper dimension isn't the only important factor. If you form a specific but wrong hypothesis about a feature-response contingency, it can cause massive interference. Some such mechanism may account for why most animals (though primates are sometimes an exception!) find reversal shifts so difficult. In reversal and non-reversal shifts, there is a trade-off between Stage 1 and Stage 2 learning.
Anyway, with this as a short introduction to attentional hypothesis testing, let us look at some of the findings.
Now, let's look at several possibilities regarding learning. According to the Hull-Spence theory of learning, the most likely result is that the animal forms an association to the stimulus complex, the red background/white triangle stimulus. If that is the case, a generalization gradient ought to show some reduced but still strong responding to a stimulus that has just the red color (it is somewhat similar to the S+), or one that has just an illuminated white triangle (again, it is somewhat similar). Alternatively, maybe one of these was a lot more salient than the others, in which case we might find much more generalization to the more salient stimulus. Color might be more salient than subtleties of shape for pigeons, for example, so we might find significantly more generalization to color for all pigeons that go through this experiment.
But attentional theory makes a different prediction. First, on the assumption of limited attentional capacity, we would expect these birds to attend to one dimension, and not both (the assumption of selective attention). So, we would predict the animal responds to red or to triangle, but that the combination of the two is not terribly important. We can call this winner-take-all processing: The strongest analyzer (the winner) draws all of the attention, so that the animal effectively engages in hypothesis testing (looking at one thing at a time). And second, if there is no prior reason to believe that the shape and color analyzers have become differentially activated due to experience or salience, then by accident some pigeons may have their attention dominated by the shape analyzer, and others by the color analyzer.
So what happened? Reynolds reports on two pigeons who were given generalization tests following training on the discrimination task. The tests involved exposing the pigeons to one of four buttons: red, green, lit triangle, and lit circle. In terms of pecks per minute, one pigeon simply avoided red, treating it as equal to green and circle (which were parts of the S-). And the other pigeon showed a similar pattern, except that it avoided triangle. Put simply, the first pigeon's hypothesis was clearly that it was supposed to peck at triangles, and the second pigeon's hypothesis was that it was supposed to peck at red things. This study thus displayed strong evidence of selective attention.
Curiously enough, the study also showed more responding for both pigeons to the complex (the red and the triangle) than to whichever element that pigeon responded to on later generalization trials. The fact that each bird ignored one of these elements when it was by itself, however, may suggest that this element in the complex served as an occasion setter, much as contextual cues will help remind us (by priming memory) of what we are supposed to do. Thus, for the bird with the hypothesis red, the presence of the triangle may have acted as a retrieval cue that helped the bird remember what it had done in the past when faced with red things. A red key, of course, might also help to prime, but the priming will be stronger when all cues are present.
A second experiment we can discuss in this context involved a clever study by Lawrence. In this experiment, Lawrence used a complex transfer design that avoided both stimulus generalization and response generalization. In a first phase, rats learned to jump to a chamber, whereas in a second phase they learned to run a T-maze. Running and turning to the left or right is a completely different response than jumping to a chamber, so no response generalization would be expected: Animals in the second phase have to learn a completely new response. And by the same token, chambers and T-mazes represent quite different stimuli, precluding generalization on that basis. If you earlier jumped to a black chamber instead of a white chamber, how would this help you decide what to do when placed in a black rather than a white T-maze?
Since Lawrence's design was rather complex, let's diagram part of his experiment:
Group   Simultaneous Discrimination       Successive Discrimination
1       black vs. white chamber           black vs. white T-maze
2       large vs. small chamber           black vs. white T-maze
3       rough vs. smooth floor chamber    black vs. white T-maze
Here, in the first phase, the rat had to jump to one of two chambers from an elevated stand. This, of course, is a choice discrimination in which jumping to the wrong chamber results in lack of reinforcement. In fact, animals who jumped to the wrong chamber were unable to get into it, and fell into a supporting net beneath the chambers. From the point of view of algebraic summation theory, the two types of discrimination are so different that we ought to predict no generalization (i.e., no transfer) from one to the other.
But what about successive discrimination? In this transfer task, all of the animals had to learn that one color was associated with a turn to the left (because that is what resulted in reinforcement), and another was associated with a turn to the right. In fact, rather than the two T-mazes listed above, Lawrence actually put his animals in four mazes on various trials. Thus, there were two black mazes, one with a rough floor and the other with a smooth floor, and two white mazes, similarly distinguished by the roughness of the floor. The roughness of the floor was irrelevant to what the animal had to do to get food, however. Only color was contingent with location of the reinforcer.
You should be able to predict what happened in this latter task. Group 1's analyzer for color should have been strongly active, whereas Group 3's analyzer for texture should have been strongly active. Thus, placed in these mazes, these groups should have been initially attending to very different things about the mazes. Specifically, Group 1 should properly have been checking out color as its first hypothesis, and that should have speeded Stage 1 learning. But Group 3 should have checked out the wrong hypothesis first (that texture is somehow linked with what the animal is supposed to do), and that should have slowed learning. The results were in line with these predictions: Group 1 learned the maze problem more rapidly than Group 3.
And what about Group 2? Their analyzers for size are active coming into the maze problem, but since the mazes don't differ on size, that analyzer quickly becomes deactivated. Some of these animals would have subsequently checked out texture first, and others, color. Thus, averaging over the animals that chose the right hypothesis and those that chose the wrong hypothesis, we ought to find that Group 2's learning is somewhere in between Group 1 and Group 3. And that is what happened.
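The predicted ordering of Lawrence's three groups can also be sketched in Python. As before, this is a toy model under assumptions that are not part of Lawrence's report: analyzer strengths are integers, a pretrained analyzer starts at 25 and the others at 5, a failed hypothesis costs one unit, and an analyzer for a dimension that does not vary in the mazes (size) drops out immediately. The function name wasted_trials and all numbers are hypothetical.

```python
from statistics import mean

def wasted_trials(strengths, relevant, varying, max_trials=100):
    """Count trials spent on wrong hypotheses before attention settles
    on the relevant dimension of the transfer task."""
    # Analyzers for dimensions that do not vary in the new task drop out at once
    active = {dim: s for dim, s in strengths.items() if dim in varying}
    wasted = 0
    while wasted < max_trials:
        attended = max(active, key=active.get)  # strongest analyzer wins; ties go to the first listed
        if attended == relevant:
            return wasted
        active[attended] -= 1  # wrong hypothesis is weakened
        wasted += 1
    return wasted

maze_dimensions = {"color", "texture"}  # the mazes differ in color and floor texture only

# Group 1: pretrained on color. Group 3: pretrained on texture.
group1 = wasted_trials({"color": 25, "texture": 5, "size": 5}, "color", maze_dimensions)
group3 = wasted_trials({"texture": 25, "color": 5, "size": 5}, "color", maze_dimensions)

# Group 2: pretrained on size. Once size drops out, color and texture are tied,
# so half the animals are assumed to try color first and half texture first.
group2 = mean([
    wasted_trials({"size": 25, "color": 5, "texture": 5}, "color", maze_dimensions),
    wasted_trials({"size": 25, "texture": 5, "color": 5}, "color", maze_dimensions),
])

print("Group 1:", group1, "Group 2:", group2, "Group 3:", group3)
```

Under these assumptions Group 1 wastes the fewest trials, Group 3 the most, and Group 2 falls in between, matching the ordering described above. The absolute numbers mean nothing; only the ordering is of interest.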
Let us take one more series of studies that fit in with attentional hypothesis testing. This series involves work done by Harlow on learning sets. In these experiments, Harlow effectively asks the question of how quickly and efficiently animals can learn to evaluate hypotheses. Harlow sets up his experiment so that his animal subjects (typically monkeys, but the same results have been found in a number of other species) have to learn a long series of simultaneous discrimination problems. Once they have acquired one problem, they are moved to the next. However, one of the questions Harlow asked was whether his animals would be capable of a win-stay lose-shift strategy. That strategy is in some sense the backbone of a hypothesis testing approach: As long as what you are doing is correct, stick with it (win-stay); when it leads to a wrong answer, change it (lose-shift). What Harlow found was that his animals were ca