Chapter 3: Representation-Level & Behavioral-Level Theories of Classical Conditioning(1)

 
Overview: This chapter is arranged in four major sections. The first briefly reviews two simple contiguity models of classical conditioning (the models of Hull and Pavlov), and their problems. The second introduces the work of Rescorla. It presents his original contingency approach, what it was designed to handle, and why Rescorla abandoned it for the subsequent Rescorla-Wagner theory. The major successes of this latter, still influential theory are discussed, and its problems considered. The third discusses the Comparator Model Approach to classical conditioning. Two comparator models are briefly presented, along with their differences (in terms of processes and predictions) from the Rescorla-Wagner model. Finally, the fourth section briefly examines several memory-based models of conditioning. Mackintosh's Attentional Model is included here, but the emphasis in this section is on Wagner's Rehearsal Theory. Themes that will prove important in this section include the status of inhibition, and the adequacy of a behavioral-level approach that fails to include centralist, cognitive processes in accounting for learning.
 

I. Two Simple Behavioral-Level Theories

        In this section, we briefly reconsider Pavlov's and Hull's theories of classical conditioning. In some sense, Pavlov's theory was a budding physiological-level account of the process of learning, although the actual procedures that were required for forming an association could be described in terms of behavioral-level units. Later accounts of classical conditioning tended to ignore Pavlov's theoretical claims, and substituted a behavioral-level approach that avoided talk of cortical brain centers. The resulting S-S and S-R theories proved inadequate to handle a number of findings.
 

A. The Simple Contiguity Models

        To remind you, Pavlov believed that classical conditioning involved forming a cortical connection between brain centers. Hull, on the other hand, thought that the incoming stimulation caused by the conditioned stimulus would acquire the ability to excite the motor muscle movements that made up the unconditioned response. While different in certain important respects, these theories made several similar assumptions. Four in particular deserve mention.

        The first assumption is the assumption of effective stimuli. So long as we have a CS or a UCS that is picked up by the animal's sensory system, classical conditioning will be possible. This assumption is meant to exclude failures of conditioning that are due to stimuli that simply do not register for an organism. Failure to condition a CS involving a certain sound wavelength in subjects incapable of hearing that wavelength is of no theoretical significance. (Notice, however, that we cannot use conditioning failure to decide that our CS or our UCS must have been an ineffective stimulus, because that would make the requirement of effective stimuli a completely circular claim that effective stimuli are whatever works. So, we need other procedures to convince ourselves that an animal notices or otherwise reacts to a stimulus. If we can show that, then we have an effective stimulus, and we ought therefore to be able to demonstrate classical conditioning with it.)

        The second of these may be termed the assumption of independence of conditioned stimuli. According to this assumption, the formation of an association with one CS ought to be completely unaffected by any prior training with another CS, or by the concurrent presence of other CSs. (Within reasonable limits, of course: a CS that is a blinding flash of light will certainly make another visual CS difficult to see. Cases such as these are excluded to some extent by the assumption of effective stimuli: The blinding light makes other visual stimuli ineffective.) In theory, then, an association may form between a UCS (or a UCR, in Hull's model), and numerous other stimuli.

        The third assumption, of course, is the assumption of temporal contiguity. All that is needed for an association to form is the presence of effective stimuli (CS and UCS) at about the same time. (This assumption sometimes explicitly specifies that spatial contiguity of some sort is also required. While you are reading this, someone somewhere in the world is whistling. You don't form that connection, of course. But then, whistling half way across the globe is not an effective stimulus, because that stimulus does not register on your sensory system. Spatial contiguity is thus implied by the requirement for effective stimuli.)

        The fourth assumption is the assumption of continuity: An association has continuously varying levels of strength in between 0% (no association) and 100% (maximum association). Normally, the process of classical conditioning (in which there are several pairings of effective stimuli) results in increases in level on each additional trial until the maximum strength is achieved. Thus, acquiring an association may be regarded as a matter of degree.

        Despite these similarities, there are also significant differences. There are, of course, differences in what the CS is associated with. But more significantly, recall that Pavlov was a centralist who talked about the association forming in the brain. Hull was more of a peripheralist who talked about a link forming between stimulus and response. Both included a notion of inhibition in their systems, but the inhibition involved very different mechanisms located in very different places. For Hull, inhibition was associated with the making of a response. For Pavlov, in contrast, the CS center in the brain contained both an excitatory and an inhibitory mechanism. The inhibitory mechanism could become conditioned to 'turn off' or otherwise block the activation occurring as a result of the CS center being stimulated.

        In addition to the Pavlovian and Hullian models, we can also identify a behavioral-level version of the Pavlovian model that appealed to later behaviorists who didn't want to speculate about the physiological operations of the brain. In that model, two stimuli were associated, much as a stimulus and a response might be connected. Thus, some behaviorists came to treat classical conditioning as S-S learning, as opposed to the more traditional S-R approach. Regardless of which approach (behavioral versus physiological) was followed, however, theorists who did not adopt a Hullian account of classical conditioning tended to talk about learning as the CS substituting for the UCS: stimulus substitution theory.

        One more wrinkle: Behaviorists in general (and Skinner, in particular) were not too happy with the notion of inhibition. You can easily imagine why. Inhibition as applied in extinction and discrimination appears to involve an association between a physically present event (the CS) and a non-present event (the UCS). As pointed out earlier, that seems to violate temporal contiguity, although it can also be taken to violate the assumption of an effective stimulus (what is the stimulus in the case of the missing UCS?). Following an influential critique by Skinner that claimed that many effects supposedly due to inhibition could be accounted for in some other manner, behaviorist theorists generally stopped talking about inhibition. Within the context of classical conditioning, this meant an attempt to explain various phenomena as being due to competition of excitatory associations, an approach which has a modern-day descendant (although ironically, one with a distinctively cognitive flavor) in the comparator theories in Section III below.
 

B. Inadequacies

        We presented a number of inadequacies with these approaches in the previous chapter, so we need only briefly review them here. For the sake of convenience, we may group the inadequacies into the following categories: (1) problems with the assumption of contiguity; (2) problems with the assumption of effective stimuli; (3) problems with the independence of conditioned stimuli; (4) problems with a peripheral-level description of the events being associated; and (5) problems with a response-based description of learning. These are not always completely separable categories.

Problems With The Assumption Of Contiguity

        The assumption of contiguity really contains two claims: that temporal contiguity is necessary for an association to form, and that, given effective stimuli, it is sufficient (meaning that you need nothing other than temporal contiguity). Both claims seem at variance with the evidence. Thus, a number of studies show conditioning in the absence of temporal contiguity, contradicting the first claim. These include the work of Garcia, Ervin, and Koelling, along with that of Kalat and Rozin. In these studies, excitatory learning with delays of hours is possible (long-delay learning). Moreover, as Hinson and Siegel show, long-delay learning is not the exclusive province of taste aversions: They obtained inhibitory conditioning with a 10 sec delay between CS and UCS (not a huge delay, but certainly beyond the amount normally allowed for temporal contiguity). The claim of sufficiency also appears at variance with the results. Thus, some studies show failure of conditioning despite close temporal contiguity. Among these are Rescorla's demonstration that conditioning will not occur without contingency and Kamin's work on blocking and overshadowing (see also Rescorla and Wagner's blocking study). In blocking, to remind you, a stimulus that is paired with the UCS fails to bond with it because of the presence of another stimulus that has had more pairings. The UCS pre-exposure effect may also serve as an example of this problem. So, too, will Rescorla's demonstration that the explicitly unpaired procedure results in inhibition: The CS and the UCS have never been presented together in this situation, so the contiguity principle falsely predicts no learning.

Problems With The Assumption Of Effective Stimuli

        The assumption of effective stimuli implies that any effective CS and UCS should form a link, if paired. Thus, work showing that effective stimuli do not always bond is relevant here. Obviously, this work includes the studies in the paragraph above on blocking, since we know from control conditions and unblocking studies that the blocked stimulus under other circumstances would forge a strong link with the UCS. But in addition to that type of finding (which depends on the presence of other stimuli), we find conditions in which no blocking or overshadowing has taken place, but in which, nevertheless, conditioning fails to occur. The most striking examples of this involve the belongingness studies (Garcia and Koelling; Wilcoxon et al.) that find that certain combinations of CSs and UCSs simply do not work (though see Krane and Wagner). There are studies involving excitatory-appetitive conditioning that come to the same conclusion, however. Colavita, in a study with dogs, found no conditioned response to a tone when using a weak acid solution that triggers drooling. Finally, a definition of an effective stimulus in terms of physical energies hitting the sensory organs would be somewhat embarrassed by Kamin's study on CS salience, in which lack of noise proved to be an adequate CS.

Problems With The Independence Of Conditioned Stimuli

        A no-brainer, this one, but it deserves separate mention, as this was one of the major impetuses for the development of new models of classical conditioning (the other major impetus was the issue of inhibition). As the work by Kamin, by Rescorla and Wagner, and by Dickinson et al. on blocking, unblocking, and overshadowing showed, how strong a bond would form between a given CS and the UCS on any given trial depended on what other CSs were present at the time. The contextual conditioning accounting for the UCS pre-exposure effect fits in here, as well.

Problems With A Peripheral-Level Description

        In behaviorist S-S and S-R models, a stimulus directly associates with another event. However, devaluation studies such as Holland and Rescorla showed that a changed value for an event after learning would nevertheless influence that learning. That is interpreted by many people as strong evidence of conditioning at least involving mental level events or representations. Moreover, Light and Gantt's study with the temporarily paralyzed dogs concludes that an overt response need not be made for conditioning to take place, a claim at variance specifically with the S-R view.

Problems With A Response-Based Description

        For Pavlov (and later S-S behaviorists), classical conditioning was about exciting a response that resembled the UCR because the response was in fact being indirectly triggered through the CS-UCS link. For Hull, classical conditioning was about exciting a response that resembled the UCR because the response was being directly connected to the CS. In either case, strong resemblance between responses was expected. But as we know, there are studies that fail to show this result. Thus, in antagonistic or compensatory conditioning (Obrist et al.; Siegel), the CR may often be the opposite of the UCR. And as we know from studies like Holland's, different CSs associated with the same UCS may elicit very different CRs.

Other Issues

        There were many other issues, as well. For example, one-trial learning could be taken as negating the assumption of continuity. Although one-trial learning is the exception rather than the rule, it is found with the CER paradigm if the UCS is unpleasant enough, and it is most certainly obtained in the learned taste aversions paradigm. Latent inhibition and familiarity were also embarrassments: If classical conditioning is slowed or knocked out by familiar CSs, then how much of our learning could really be due to it? Certainly, for adult organisms, many stimuli in their environments are no longer novel, suggesting that most learning can occur only for juvenile organisms exposed to a still-unusual world (or for subjects brought into a laboratory and exposed to artificial stimuli such as saccharine-flavored water). And finally, a major issue swirled around the question of inhibition. How could procedures such as backwards conditioning or the explicitly unpaired procedure result in inhibition, as shown in Rescorla's work? In both of these, there were clear violations of the principles of temporal contiguity and effective stimuli (since the stimulus in both could be said to be absence of UCS).

        Thus, the time was right for a quite different approach.
 
 

II. The Rescorla-Wagner Model: A Neo-Contiguity Approach

        In the mid to late 60s, a new approach was spearheaded in large part by Kamin, Rescorla, and Wagner. This approach took on a much more cognitive flavor in its explanations of what learning was about. It was undoubtedly influenced by the work on memory and learning exploding in studies with humans (the 60s was the period of the cognitive revolution in experimental psychology), and by studies of animal working memory that were coming out in the late 60s and 70s. It culminated in a model, the Rescorla-Wagner model, that is still the mark against which other theories are assessed. Although that model itself turned out to have a number of problems as we will see, it has yet to be replaced by other models as precise in their predictions, and as capable of successfully predicting such a wide range of phenomena.
 

A. Rescorla's Contingency Principle

        In several articles beginning in the late 60s (and a very influential review in 1969), Rescorla resurrected the issue of inhibition in Pavlovian conditioning. One of his claims, in particular, upset the applecart. Many people up until then had used the explicitly unpaired procedure as a control condition for an experimental group that had the CS and the UCS paired together. The difference between the reactions to the CS in the two groups could thus be taken to assess the strength or success of learning. But Rescorla demonstrated that this comparison was potentially flawed, because the control group was not a neutral group showing no learning. Rather, they developed a type of learning called inhibition. Thus, Rescorla in his review article proposed that the summation and retardation tests be used to verify whether lack of a response was due to no learning, or to inhibitory learning. But more to the point, he also realized that a theory needed to be developed that could account for inhibitory learning when one or the other of two stimuli (the CS, the UCS) was absent. His first approach was contingency theory, the notion that animals are sensitive to the co-occurrence between events. Such sensitivity to co-occurrence or correlation enables use of one event to predict something about the other. On a contingency view, then, learning is about predicting.

        A good introduction to the notion of contingency may be found in the following quote by Rescorla (1968, p. 1):
 

The notion of contingency differs from that of pairing in that it includes not only what events are paired but also what events are not paired. As used here, contingency refers to the relative probability of occurrence of US in the presence of CS as contrasted with its probability in the absence of CS. The contingency notion suggests that, in fact, conditioning only occurs when these probabilities differ.

Thus, in this early approach, the probability of successful conditioning could be described in terms of a two-dimensional contingency space in which the dimensions corresponded to (1) the probability of the UCS following the CS (this probability varied from 0 to 1, of course), and (2) the probability of the UCS following absence of the CS (also varying from 0 to 1). (This contingency space represents a number of simplifying assumptions, most particularly the assumption that CS and UCS do appear on some occasions, though not necessarily together.) Let us term these two probabilities Probability 1 and Probability 2, respectively. This being the case, we may identify three hypothetical conditions. In the first condition (involving a positive contingency), Probability 1 (that UCS follows CS) is greater than Probability 2. This is precisely the circumstance under which Rescorla claimed that an excitatory CR should occur. The reason is that the CS is a better signal or predictor of the UCS than absence of CS is. In the second condition (involving zero contingency), Probability 1 is equal to Probability 2. As you may gather from the quote above, no learning having to do with the CS should occur. The CS doesn't matter here, because you get the UCS at the same rate regardless of whether the CS is there or not (a very different prediction than is made by contiguity theory, to remind you). And finally, in the third condition (involving negative contingency), in which Probability 1 is smaller than Probability 2, Rescorla sought an explanation for inhibitory conditioning: conditioning that would pass the summation and retardation tests. If excitatory conditioning is the process of preparing for an upcoming UCS, then inhibitory conditioning is the process of preparing for its absence. And in this condition, the CS does a better job of signaling the UCS's absence, since the UCS is present more often when the CS is not there.
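
        The three regions of the contingency space can be sketched in a few lines of Python. This is only an illustration of the comparison just described; the function name classify_contingency and its tolerance argument are assumptions introduced here, not part of Rescorla's account.

```python
def classify_contingency(p_ucs_given_cs, p_ucs_given_no_cs, tol=1e-9):
    """Return the type of conditioning predicted for one point in the contingency space.

    p_ucs_given_cs    -- Probability 1: chance that the UCS follows the CS
    p_ucs_given_no_cs -- Probability 2: chance that the UCS follows absence of the CS
    tol               -- hypothetical tolerance for treating the two as equal
    """
    for p in (p_ucs_given_cs, p_ucs_given_no_cs):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    difference = p_ucs_given_cs - p_ucs_given_no_cs
    if difference > tol:
        return "excitatory"   # positive contingency: the CS signals the UCS
    if difference < -tol:
        return "inhibitory"   # negative contingency: the CS signals UCS absence
    return "none"             # zero contingency: the CS is uninformative


print(classify_contingency(0.8, 0.2))  # positive contingency
print(classify_contingency(0.4, 0.4))  # zero contingency
print(classify_contingency(0.1, 0.6))  # negative contingency
```

        Note that the second call describes a case in which the CS and the UCS are paired on 40% of CS occasions, yet the contingency view predicts no learning about the CS.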

        The concept of contingency is important. Indeed, sometimes it is critical. Just how important it is may be seen from the hearings that were held to determine the cause of the Space Shuttle Challenger exploding. The explosion's cause was eventually tracked to a faulty seal or O-ring that allowed leakage of explosive rocket fuel. The O-ring had turned brittle on a fairly cool morning, so it no longer sealed properly. People had earlier concluded that O-ring failure had little to do with cold temperatures. The reason was that a check of O-ring problems in previous space shuttle flights showed no relationship with temperature: There were some failures both at high and at low temperatures. But this is like asking how often the UCS is there when the CS occurs. The UCS may be there 50% of the time, so that just looking at this figure seems to suggest the CS is uninformative. But if the UCS is never there when the CS is absent, then it becomes obvious that the CS does actually predict the UCS. And that is what happened in part with the O-ring problem. No one had looked at temperatures when O-rings didn't fail. If they had, it would suddenly have become clear that O-rings generally did not fail in warm temperatures; they were much more likely to fail in cold temperatures (see, for example, Gleick, 1992, pp. 427-428). Contingency requires tracking two probabilities.
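
        The point about tracking two probabilities can be made concrete with a small sketch. The counts below are hypothetical and chosen only to mirror the 50% example in the paragraph above; they are not data from any study or from the shuttle flights, and the function name is an assumption introduced here.

```python
def contingency_from_counts(cs_ucs, cs_no_ucs, no_cs_ucs, no_cs_no_ucs):
    """Return (Probability 1, Probability 2, difference) from a 2x2 table of counts.

    cs_ucs       -- occasions with the CS followed by the UCS
    cs_no_ucs    -- occasions with the CS and no UCS
    no_cs_ucs    -- occasions without the CS but with the UCS
    no_cs_no_ucs -- occasions with neither
    """
    cs_total = cs_ucs + cs_no_ucs
    no_cs_total = no_cs_ucs + no_cs_no_ucs
    if cs_total == 0 or no_cs_total == 0:
        raise ValueError("need observations both with and without the CS")
    p1 = cs_ucs / cs_total        # how often the UCS is there when the CS occurs
    p2 = no_cs_ucs / no_cs_total  # how often the UCS is there when the CS is absent
    return p1, p2, p1 - p2


# Hypothetical counts: the UCS follows the CS half the time,
# but never occurs when the CS is absent.
p1, p2, difference = contingency_from_counts(10, 10, 0, 20)
print(p1, p2, difference)
```

        Looking at the first probability alone (0.5) suggests an uninformative CS. Only the second row of the table, the occasions on which the CS was absent, reveals the positive contingency.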

        The notion of contingency seems to suggest to many students an active animal doing some pretty high level thinking. But in fact, such need not be a requirement of this approach. Among humans, for example, there is evidence that we are extraordinarily sensitive to situational frequency (how often some event occurred in a given context). According to Hasher and Zacks, such sensitivity may be automatic, requiring relatively little attention or high-level cognition on our part. An animal sensitive to frequency information (including co-occurrence frequency) may thus have a built-in mechanism informing it about contingency.

        Regardless of what process underlies the acquisition of contingency information, however, Rescorla gave up a strict contingency approach in 1972. There were a number of reasons for this. Second-order (or higher) conditioning, for example, ought to be inhibitory (instead of excitatory) on this account, since CS2 is never paired with the UCS. Also, many studies exist showing effects of temporal contiguity in addition to contingency. A study we mentioned in the last chapter by McAllister, for example, showed steadily worsening conditioning with increasing CS-UCS delays in the human eyeblink paradigm. And as is suggested by Hinson and Siegel's study involving 10 sec intervals resulting in inhibition, there is a complex interaction possible between contingency and contiguity. At what point is the interval so long that the UCS is signaled more by absence of the CS?

        Also, as we have seen, there is occasionally one-trial learning. The problem one-trial learning poses is that a contingency cannot yet have been formed, since the contingency requires tracking the relative probabilities of the UCS when the CS is there and when the CS is not! Moreover, how many trials are required for the animal to acquire a contingency? This approach suggests that the animal must first collect a fair amount of data, which would seem to delay learning significantly.

        In short, it became clear that contingency by itself would not work. But the notion of predictability that lay behind the original insight into contingency was important, and it was firmly implanted in a neo-contiguity model Rescorla developed with Wagner. The model included a contingency-like mechanism for calculating what should happen when a stimulus appeared without a UCS, although in this later model, that mechanism depended in turn on processing contiguity of a sort. It was the Rescorla-Wagner Model.
 

B. The Rescorla-Wagner Model

        You will recall the standard Rescorlian definition of classical conditioning given in the previous chapter, the learning of relations among events. Before getting to the details of the Rescorla-Wagner model, let us stop to think a bit more about what that definition means, and how it differs from standard accounts. In a 1978 chapter, Rescorla pointed out that there were really three distinctions of concern in this definition. First among these was that two events may have any of several relations. Specifically, they may be related because of temporal contiguity (the type of relationship important in Pavlov or Hull), but they may also be related because of contingency. Moreover, other relations may also exist, and are worth studying. Similarity is one, for example, and we certainly know from the Rizley and Rescorla study on higher-order conditioning that similarity will influence what happens.

        Second, in contrast to previous definitions that focused on what type of response was being acquired through which type of procedure (S-S or S-R pairing), Rescorla's definition focuses on "the underlying learning from which that response stems" (1978, p. 16, italics added). This embraces the learning-performance distinction: Performance (the behavior of the animal) presumably depends on learning, but is not the same as learning. Thus, we implicitly require a mechanism by which the animal's motives along with its knowledge combine to create a response. In this sense, unlike in the other, older definitions of classical conditioning, learning is not the acquisition of a new trigger for some old response. It involves knowledge from which an appropriate response might be selected (similar to Tolman's claims regarding instrumental conditioning: see the next several chapters). So, any change following classical conditioning may be taken as evidence of success. (When we discuss theories of operant and instrumental conditioning in a later chapter, you will see that Hull also makes a learning-performance distinction. His distinction, however, has to do with whether there is enough activation to trigger the response, and not with what response the animal may choose to emit.)

        Third, if the animal is learning about the relations between events, it is also learning about the events themselves. Thus, specific characteristics or qualities of those events should be capable of influencing behavior. Such a claim can account for the results of devaluation studies such as Holland and Rescorla, or for the fact that different CSs paired with the same UCS sometimes result in different CRs (as in Holland).

        It is important that you be aware of these points because, in discussing the Rescorla-Wagner model, we will be making some simplifications that are not readily consistent with these. In particular, we will temporarily ignore the learning-performance distinction, and use performance as our indirect measure of learning. So, below, when we ask how strongly a CS has conditioned, we will answer this question loosely by speaking of how strong the conditioned response has become. That, of course, assumes that we know what a good conditioned response is to track in a given conditioning situation, and that we have set up the situation so that our animal is motivated to perform in appropriate fashion.

The Delta Rule

        The Rescorla-Wagner model appeared in 1972. Guided by their interest in conceptualizing classical conditioning in terms of some cognitive notion of predictability, Rescorla and Wagner developed a model that essentially centers around the degree to which a UCS is processed (translating roughly into some measure of the strength of a conditioned response). Basically, a UCS will be processed (and learning will ensue) in proportion to how unpredictable or surprising the UCS is. An unsurprising UCS is one that needs no additional learning; the animal already knows about it. Accordingly, it is the surprising UCS that will activate the search for a predictor or signal. Surprisingness is thus one of the first characteristics we can point to in discussing when learning will occur.

        In addition to surprisingness, however, there is the issue of the informativeness of the CS. Put briefly, informativeness refers to the extent to which the CS acts as a good signal or predictor for a UCS. Informativeness itself comprises several different features. One of these is relative signal value: If there are several CSs present, is one more likely to be selected than another? We have briefly come across this issue in discussing blocking: A CS that is perfectly redundant with another already-attended CS will not be likely to condition. In thinking about signal value, you should wonder about contextual cues, which are generally present when the UCS occurs. What makes a CS a better signal than the background context? A second characteristic of informativeness encompasses the notion of contingency: Is the UCS more likely when the CS is there, or when it is absent?

        In any case, with this as background, the Rescorla-Wagner model was presented in terms of a mathematical formula that represented a claim about how surprisingness and CS informativeness would predict the formation of an association. That formula appears below (in slightly different form than was given in their paper, though):

$$\Delta V_i = \alpha_i \, \beta_j \, (\lambda_j - V_{\text{PresentCues}})$$

        In this formula, we designate an individual CS and an individual UCS by the subscripts i and j, respectively. The symbol V can be taken to represent the strength of an association (which also means that, in practice, we will allow it to represent the strength of a conditioned response). This strength may be positive (indicating excitation), negative (indicating inhibition), or zero (indicating no association). Thus, Vi is the strength of the association between CSi and UCSj, or -- loosely speaking -- the strength of its conditioned response. The triangle symbol (the delta) in front of the Vi is a mathematical symbol that means "change in." So, what this formula will attempt to do is inform us about how much change there will be in the conditioned response (or association) on any given trial. For that reason, it is sometimes referred to as the delta rule.

        What about the VPresentCues? As we have seen from compound conditioning (and in particular, from our discussion of overshadowing), several CSs may sometimes be present on a given trial. Thus, there may be several CSs predicting the UCS on that trial. The VPresentCues is the extent to which all the cues present on a given trial are already predicting the UCS! So, in measuring a change in association on a trial, we will need to know how surprising the UCS is. If it is adequately predicted, then it won't be surprising at all, and no learning should occur.

        An important point here: In compound conditioning where several cues or CSs are present, we are actually conditioning multiple CRs! The formula will have to be used separately for each present CS to determine what its CR will be. As a simplifying assumption, the overall reaction to the compound cues will be the sum of their individual conditioned responses.
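
        A single trial of the delta rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' own procedure: the function name delta_rule_trial, the cue names, and all numerical values are hypothetical, and the rate parameters (alpha, beta) and lambda are taken as given here, with their meaning explained in the paragraphs that follow.

```python
def delta_rule_trial(strengths, present, alphas, beta, lam):
    """Apply one conditioning trial and return the updated association strengths.

    strengths -- dict mapping each CS name to its current V
    present   -- names of the CSs that appear on this trial
    alphas    -- dict mapping each CS name to its rate parameter (alpha)
    beta      -- rate parameter for the UCS
    lam       -- lambda for this trial
    """
    # VPresentCues: how strongly all cues present on this trial already predict the UCS
    v_present = sum(strengths[cs] for cs in present)
    surprise = lam - v_present
    updated = dict(strengths)
    for cs in present:
        # The formula is applied separately to each CS that is present
        updated[cs] = strengths[cs] + alphas[cs] * beta * surprise
    return updated


# Hypothetical compound: a light and a tone always presented together
v = {"light": 0.0, "tone": 0.0}
alphas = {"light": 0.3, "tone": 0.1}
for _ in range(50):
    v = delta_rule_trial(v, ["light", "tone"], alphas, beta=1.0, lam=1.0)

print(v)                 # individual strengths
print(sum(v.values()))   # overall reaction to the compound
```

        With these made-up values the two strengths settle near 0.75 and 0.25: Both cues share a single pool of surprise, so the cue with the larger alpha ends up with the larger share, and their sum approaches lambda.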

        How do we know whether the UCS is adequately predicted? Presumably, each UCS has a potential for causing a certain amount of learning that is closely related to its biological importance or significance. More important events have higher potentials, and may thus support higher levels of processing. So long as the UCS is being processed, additional learning will occur. A UCS's maximum potential is given by lambdaj in the formula above. The value used on a given trial may vary from 0 to this maximum. The value will be 0 during extinction trials, when an expected UCS is not there. Its maximum potential will depend on the individual UCS, of course, but will be very closely connected with UCS intensity: Higher levels of UCS intensity should result in higher maximum potentials.

        We can speak of the maximum potential as the maximum association a UCS can form with a given CS. When that level is reached, the UCS is no longer surprising. In terms of the concepts we have introduced, the maximum level in simple conditioning may be regarded as identical to the asymptote of a learning curve (see the previous chapter), and the degree to which a UCS is surprising (and thus processed, causing additional learning) will be equal to the maximum level minus the extent to which the UCS is already predicted. Or in other words, the quantity below represents the surprisingness of the UCS:

$$\lambda_j - V_{\text{PresentCues}}$$

        Notice that when that quantity is equal to 0, there will be no learning. When the quantity is positive, there will be excitatory learning: The conditioned response should become stronger. But note also that it is theoretically possible that the quantity will be below 0. In this latter event, a conditioned response should decrease in strength and, in some interesting circumstances, may actually become negative, resulting in inhibitory learning.

        The final values in the formula, alpha_i and beta_j, refer to the noticeability of the CS and the UCS respectively (normally, their intensity, although we have seen that for a CS the important issue is how big a change from background levels it represents). These are referred to as rate parameters: They determine the rate at which asymptote is approached, and therefore determine the speed of learning. These values will always be positive for effective stimuli (the values are at 0 for non-effective stimuli, and any 0 value in the formula will result in no learning occurring). The higher these values are, the bigger the change you will see on any given conditioning trial. The alpha parameter has the same subscript letter as the delta-V, to indicate that we have to use the rate parameter for the specific CS whose change we want to track; and for a similar reason, the beta parameter has the same subscript as the lambda. VPresentCues has a different subscript because it depends on all the CSs present on that trial!
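        If it helps to see the pieces of the formula working together, here is a minimal sketch in Python. It assumes the formula has the standard Rescorla-Wagner form described in the last few paragraphs (the change equals the two rate parameters multiplied by the surprisingness of the UCS). The function name delta_v and the numbers plugged into it are illustrative choices only, not part of the model.

```python
def delta_v(alpha_i, beta_j, lambda_j, v_present_cues):
    """Change in associative strength for one CS on one trial.

    alpha_i        -- salience of the CS being tracked
    beta_j         -- rate parameter for the UCS
    lambda_j       -- the UCS's potential on this trial (0 if the UCS is absent)
    v_present_cues -- how strongly all cues present already predict the UCS
    """
    surprise = lambda_j - v_present_cues  # surprisingness of the UCS
    return alpha_i * beta_j * surprise


# Unpredicted UCS: positive surprise, so the change is excitatory.
print(delta_v(0.5, 1.0, 100, 0))    # 50.0
# Fully predicted UCS: no surprise, so no learning.
print(delta_v(0.5, 1.0, 100, 100))  # 0.0
# Over-predicted UCS: negative surprise, so the change is negative.
print(delta_v(0.5, 1.0, 100, 120))  # -10.0
# A rate parameter of 0 (a non-effective stimulus) means no learning.
print(delta_v(0.0, 1.0, 100, 0))    # 0.0
```

        The four calls correspond to the cases discussed above: a surprising UCS, a perfectly predicted UCS, a quantity below 0, and a rate parameter of 0.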

        Below, when we look at some of the predictions the model makes, we will use a simplifying (and wrong) assumption that beta_j is equal to 1, so that all we need to talk about is the salience of each of the CSs appearing in an experiment, and the asymptotic level of association of the UCS. That is, our calculations will be based on the following formula:

            $$\Delta V_i = \alpha_i \, (\lambda_j - V_{\mathrm{PresentCues}})$$
 

Some Successful Delta Rule Predictions

        The appeal of the delta formula presented above is that it can make a number of predictions, including some that are at first glance counterintuitive. In order to run through the calculations behind some of these, however, we will set up a chart or worksheet of values to be filled in. For the moment, we will assume a maximum of three CSs that might be conditioned with our UCS. Given that constraint, our worksheet will look as follows:
 
 
Table 1

          VpresentCues                  TOTAL V   CSs
          (fill in if present)
          CSA       CSB       CSC                 CSA         CSB         CSC
Initial                                           0           0           0
Trial 1
Trial 2
Trial 3
Trial 4
Trial 5


 

        On the left-hand part of the work sheet, we will track the strengths of the CRs for each of the CSs that appear on the current trial we are working with (Trial 1, Trial 2, Trial 3, etc.). In the middle, we will sum those CRs to obtain our Total V. As a quick approximation, the value for VpresentCues in the delta rule may be obtained by summing the separate CRs, and that will be the value for the middle column. You may then plug this value in the formula as the value for VpresentCues to obtain the results, which we will put in the columns on the right-hand part of the table. In those columns, we will have two values in each row for each conditioned stimulus: The value at the top of the row will be the figure we obtain from the formula, the change in CR strength. We will add that figure to the value directly above it to obtain what the new CR strength now is, and we will place that value at the bottom. That is why there are zeros in the very first row for our three CSs. We are assuming no conditioning, so their strengths are initially set to 0. When we add the changes, we obtain the new strengths. Hence, this table will allow you to track the newly forming associations using the following very simple rule:

            new CRi strength = previous CR i strength + change in CRi strength

        Let's start with a very simple example. We will assume a UCS with a maximum or asymptotic level of association being set to 100, and with the salience of the CSA being set to .5. (Salience, by the way, can never be over 1. In some sense, salience is how much attention the animal pays to a given stimulus or set of stimuli; it can't pay more than 100% of its attention!)

        In our first trial, when there has been no previous learning, the formula will give us the following change for our CS:

            change in CSA on Trial 1 = .5 ( 100 - 0) = 50

Note that VpresentCues is set to 0 in the formula above, because there has been no previous conditioning with this CS, so nothing is predicting the UCS on this trial. Hence, the change is equal to 50, and when we add that to the previous strength of this CS (which was 0), we get a new strength of 50 (indicated in bold type). That is how strong our conditioned response ought now to be. Filling in the table for this trial, we would put in the following figures:
 
 
Table 2a

          VpresentCues                  TOTAL V   CSs
          (fill in if present)
          CSA       CSB       CSC                 CSA         CSB         CSC
Initial                                           0           0           0
Trial 1   0                             0         + 50
                                                  = 50
Trial 2


 

        I won't always do this on the other examples below, but I have added in a plus sign and an equals sign so that you can see that under the column on the right labeled CSA, I am adding the change (50) to the previous strength listed right above it (0) in order to get the new strength. And now, before we go on to Trial 2, we can already fill in one more part of the table, since we know Trial 2 will involve pairing the UCS with CSA. So, doing this, the table will now show the following:
 
 
Table 2b

          VpresentCues                  TOTAL V   CSs
          (fill in if present)
          CSA       CSB       CSC                 CSA         CSB         CSC
Initial                                           0           0           0
Trial 1   0                             0         + 50
                                                  = 50
Trial 2   50                            50
 
 
        Looking at the Table above, we immediately see that our VpresentCues is 50, so we can now calculate what happens on Trial 2. The calculation will be:

            change in CSA on Trial 2 = .5 ( 100 - 50) = .5 (50) = 25

And filling in the table for this trial and preparing it for Trial 3 gives us the following:
 
 
Table 2c

          VpresentCues                  TOTAL V   CSs
          (fill in if present)
          CSA       CSB       CSC                 CSA         CSB         CSC
Initial                                           0           0           0
Trial 1   0                             0         + 50
                                                  = 50
Trial 2   50                            50        + 25
                                                  = 75
Trial 3   75                            75
 

        If you look at the strength of CRA, it is the old strength (50: the last bolded value in the appropriate column on Trial 1) plus the change you have calculated on Trial 2 (25), and that gives you 75 (the bolded value on Trial 2). I have also placed this 75 in the CSA column on the left-hand side of the table (under VpresentCues) because this is the extent to which the UCS will be predicted by that CS on Trial 3. I have also put it in the middle column (under Total V) because that is the extent to which all CSs present on Trial 3 are predicting the UCS. (Since there are no other CSs involved at this point, that is not a terribly interesting situation. But I do want to get you used to thinking that you need to sum values on the left-hand side to get a middle value that will then be plugged into the formula).

        Let's now go through the calculations for the next three trials:

            change in CSA on Trial 3 = .5 ( 100 - 75) = .5 (25) = 12.5
            change in CSA on Trial 4 = .5 ( 100 - 87.5) = .5 (12.5) = 6.25
            change in CSA on Trial 5 = .5 ( 100 - 93.75) = .5 (6.25) = 3.125

The table values (and you should verify this for yourself!) will then be:
 
 
Table 2d

          VpresentCues                  TOTAL V   CSs
          (fill in if present)
          CSA       CSB       CSC                 CSA         CSB         CSC
Initial                                           0           0           0
Trial 1   0                             0         + 50
                                                  = 50
Trial 2   50                            50        + 25
                                                  = 75
Trial 3   75                            75        + 12.5
                                                  = 87.5
Trial 4   87.5                          87.5      + 6.25
                                                  = 93.75
Trial 5   93.75                         93.75     + 3.125
                                                  = 96.875
 

And at the end of the fifth trial, as you can see from the table, we have a CR that is at 96.875. This is indeed very close to asymptote.
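        If you would rather let a computer do the bookkeeping, the worksheet procedure for a single CS can be sketched in a few lines of Python. This is only an illustration under the same simplifying assumptions used above (beta set to 1, a single CS, no prior conditioning); the function name acquisition_curve is a made-up label.

```python
def acquisition_curve(alpha, lam, n_trials, v_start=0.0):
    """Return the CR strength after each trial for a single CS.

    Each pass through the loop is one row of the worksheet: compute the
    change from the formula, then add it to the previous strength.
    """
    v = v_start
    strengths = []
    for _ in range(n_trials):
        change = alpha * (lam - v)  # beta is assumed to be 1
        v = v + change              # new strength = previous + change
        strengths.append(v)
    return strengths


print(acquisition_curve(alpha=0.5, lam=100, n_trials=5))
# [50.0, 75.0, 87.5, 93.75, 96.875]
```

        The printed values are the new strengths in the CSA column of Table 2d, one per trial.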

        If you were paying close attention, then you should have noticed that on each trial we conditioned half of the remaining association. That is because we had a rate parameter that was set to 50%. As you may gather from this, if our rate parameter had been set to 10%, then on each trial we would have conditioned only 10% of the remaining association. In that case, although the asymptote would have remained the same, the conditioned response would have been much weaker after Trial 5, simply because we would still be so far away from asymptote. So, taking a second group of animals with no prior conditioning and using a CS with a salience of .10, we should obtain the following table (again, be sure to calculate the values for yourself to check that you understand how to use the formula and the table):
 
 
Table 3

          VpresentCues                  TOTAL V   CSs
          (fill in if present)
          CSA       CSB       CSC                 CSA         CSB         CSC
Initial                                           0           0           0
Trial 1   0                             0         + 10
                                                  = 10
Trial 2   10                            10        + 9
                                                  = 19
Trial 3   19                            19        + 8.1
                                                  = 27.1
Trial 4   27.1                          27.1      + 7.29
                                                  = 34.39
Trial 5   34.39                         34.39     + 6.561
                                                  = 40.951
 

        You can see that the associative strength in this latter example is less than half way to asymptote.
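        The final values in Tables 2d and 3 can also be checked without stepping through every trial. As a sketch (assuming a single CS, beta set to 1, a starting strength of 0, and a lambda that does not change), each trial leaves a fraction (1 - alpha) of the remaining distance to asymptote, so the strength after n trials works out to:

```latex
V_n = \lambda \left[ 1 - (1 - \alpha)^n \right]

% Table 2d: \alpha = .5, \lambda = 100, n = 5
V_5 = 100 \left[ 1 - (.5)^5 \right] = 100 \, (1 - .03125) = 96.875

% Table 3: \alpha = .1, \lambda = 100, n = 5
V_5 = 100 \left[ 1 - (.9)^5 \right] = 100 \, (1 - .59049) = 40.951
```

        Both results match the last rows of the worksheets, and the formula makes it plain that the two curves share the same asymptote: As n grows, the term in brackets approaches 1 for any salience above 0.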

        Figure 1 graphs the two learning curves for these examples. Although it is not perfectly obvious from the figure, both lines will end at the same asymptote. The data that are plotted in this figure, of course, are the CR strengths that appear as the bottom bolded values in the CSA column on the right-hand side of the worksheet. So, in terms of the first few successful predictions, we can point out that the model predicts the correct general shape of a learning curve (that is, diminishing returns). Moreover, it also successfully predicts the CS salience effect: More salient CSs will generally exhibit stronger conditioned responses than less salient CSs after the same number of CS-UCS pairings. Any model that attempts to handle classical conditioning findings will have to predict these effects at the least.

        Another prediction that every aspiring theory must handle involves extinction. In extinction, the CR appears to decrease. Let's see how the Rescorla-Wagner model handles extinction. There may be a surprise here, given what you already know. We will assume a CS whose CR is at an asymptote of 100, and that has a salience of .5 (as in our first example, except that conditioning has now been completed). When we start presenting that CS without the UCS, the lambda of the UCS is set to 0. The calculations over five trials thus become:

            change in CSA on Trial 1 = .5 ( 0 - 100) = -50
            change in CSA on Trial 2 = .5 ( 0 - 50) = .5 (-50) = -25
            change in CSA on Trial 3 = .5 ( 0 - 25) = .5 (-25) = -12.5
            change in CSA on Trial 4 = .5 ( 0 - 12.5) = .5 (-12.5) = -6.25
            change in CSA on Trial 5 = .5 ( 0 - 6.25) = .5 (-6.25) = -3.125

The table values (and you should verify this for yourself!) will then be:
 
 
Table 4

          VpresentCues                  TOTAL V   CSs
          (fill in if present)
          CSA       CSB       CSC                 CSA         CSB         CSC
Initial                                           100         0           0
Trial 1   100                           100       - 50
                                                  = 50
Trial 2   50                            50        - 25
                                                  = 25
Trial 3   25                            25        - 12.5
                                                  = 12.5
Trial 4   12.5                          12.5      - 6.25
                                                  = 6.25
Trial 5   6.25                          6.25      - 3.125
                                                  = 3.125
 

        You should note that in the top row on the left-hand side of the table, we start out with a value of 100 for VpresentCues, since there has been some previous learning. By the end of the fifth trial, we have a CR that has very nearly disappeared. In fact, the asymptote in this case will be 0, so when the CS strength (or the CR) reaches 0, further learning will stop.
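        The extinction worksheet can be reproduced with the same loop used for acquisition; the only differences are the starting strength and the lambda. The sketch below assumes beta is 1 and that the CS starts exactly at its asymptote of 100; the variable names are illustrative.

```python
alpha = 0.5  # salience of the CS
lam = 0.0    # lambda is 0 because the expected UCS is absent
v = 100.0    # CR strength at the start of extinction

strengths = []
for trial in range(1, 6):
    change = alpha * (lam - v)  # negative for as long as v is above 0
    v = v + change
    strengths.append(v)
    print(f"Trial {trial}: change = {change}, new strength = {v}")

# The strength keeps shrinking toward 0 but never crosses below it.
print(all(s > 0 for s in strengths))  # True
```

        Because each change is only half of what remains, the strength approaches 0 without ever becoming negative, which is the point taken up in the next two paragraphs.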

        And that is the surprise. Pavlov insisted that extinction involved inhibition, but we don't see that in the Rescorla-Wagner formula. Instead, negative changes eat away at the former excitation until we are back to a 0 level. Consistent with this, Rescorla's 1969 article reported that the CS after extinction failed to pass the summation and retardation tests!

        Note too that in the Rescorla-Wagner model, extinction is exactly opposite to acquisition. The fact that we obtain negative changes on our extinction trials means that inhibition is building up. In this case, it continues to do so until the inhibition totally cancels out the excitation. But the conditioned response itself is never inhibited: That response never goes below zero, which is what we would need in order to obtain a CR capable of passing the summation and retardation tests. This is a very different view than is found in Pavlov, not only in terms of what is supposed to be happening in extinction, but also in terms of whether the same stimulus can have both inhibition and excitation attached to it, as Pavlov thought. The Rescorla-Wagner model does not allow both to occur as separate processes.

        We have so far looked at simple conditioning; let us now look at compound conditioning. In the next example, we will take two CSs, at saliences of .2 and .3, respectively, and pair them with a UCS whose maximum or asymptotic level is 100. As you work through this example, compare the results to the chart for the first example we worked through (Table 2d).

        Now, one point is important to keep in mind when we speak of compound conditioning: The two CSs are there at the same time, so it is impossible to tell which ought to condition first! Specifically, since both are present, we can think of them as engaging in a tug-of-war to capture whatever remaining association the UCS might have. Or to put this slightly differently, both condition at the exact same time. And that means that each will have exactly the same VpresentCues. Thus, if CSA has a salience of .2 and CSB has a salience of .3, then our first trial will involve calculating the following values:

            change in CSA on Trial 1 = .2 (100 - 0) = 20
            change in CSB on Trial 1 = .3 (100 - 0) = 30

        Let me stress again that on any trial, all CSs conditioned on that trial will use the same value for VpresentCues in their calculations! It is very easy to assume that you have to include the change for the first CS before calculating the second (an error I see students making all the time on exams), but you don't: The two CSs condition at the same time, which is why I have the same value (100 - 0, in this case) for how surprising the UCS is. Filling in the worksheet, we can add these values and prepare for Trial 2 (where they will both be present) as follows:
 
 
Table 5a

          VpresentCues                  TOTAL V   CSs
          (fill in if present)
          CSA       CSB       CSC                 CSA         CSB         CSC
Initial                                           0           0           0
Trial 1   0         0                   0         + 20        + 30
                                                  = 20        = 30
Trial 2   20        30                  50
 

        There are now some differences, if you compare this to Table 2b. We have moved the bolded CRs we obtained in Trial 1 to the appropriate columns under VpresentCues in Trial 2, and we have summed these to get the Total V, which we have placed in the middle column of the worksheet. This total will be the value we use for the next round of calculations.

        Here are the calculations for Trial 2:

            change in CSA on Trial 2 = .2 ( 100 - 50) = .2 (50) = 10
            change in CSB on Trial 2 = .3 ( 100 - 50) = .3 (50) = 15
 
There is another common trap a number of students fall into at this point: Instead of putting in Total V (50) in the formula, they use just the strengths for each individual CS. So, for the first, they subtract 20 from 100 to get 80 and then multiply this by 20%, and for the second, they subtract 30 from 100 to get 70 and multiply the 70 by 30%. Remember: The surprisingness of the UCS depends on how much all of the present cues are predicting it; you need to enter the total! Adding these values to the worksheet and preparing for Trial 3 gives us:
 
 
Table 5b

          VpresentCues                  TOTAL V   CSs
          (fill in if present)
          CSA       CSB       CSC                 CSA         CSB         CSC
Initial                                           0           0           0
Trial 1   0         0                   0         + 20        + 30
                                                  = 20        = 30
Trial 2   20        30                  50        + 10        + 15
                                                  = 30        = 45
Trial 3   30        45                  75
 
 
        And here are the remaining calculations for Trials 3 through 5:
 
            change in CSA on Trial 3 = .2 (100 - 75) = .2 (25) = 5
            change in CSB on Trial 3 = .3 (100 - 75) = .3 (25) = 7.5

            change in CSA on Trial 4 = .2 (100 - 87.5) = .2 (12.5) = 2.5
            change in CSB on Trial 4 = .3 (100 - 87.5) = .3 (12.5) = 3.75

            change in CSA on Trial 5 = .2 (100 - 93.75) = .2 (6.25) = 1.25
            change in CSB on Trial 5 = .3 (100 - 93.75) = .3 (6.25) = 1.875

        And the completed worksheet:
 
 
Table 5c

          VpresentCues                  TOTAL V   CSs
          (fill in if present)
          CSA       CSB       CSC                 CSA         CSB         CSC
Initial                                           0           0           0
Trial 1   0         0                   0         + 20        + 30
                                                  = 20        = 30
Trial 2   20        30                  50        + 10        + 15
                                                  = 30        = 45
Trial 3   30        45                  75        + 5         + 7.5
                                                  = 35        = 52.5
Trial 4   35        52.5                87.5      + 2.5       + 3.75
                                                  = 37.5      = 56.25
Trial 5   37.5      56.25               93.75     + 1.25      + 1.875
                                                  = 38.75     = 58.125
 
 
        If you compare the values in this table with those in Table 2d, you should notice two features: First, the Total V column has the exact same values in both tables. And second, if you add the current experiment's two CRs together on any given trial, you will obtain the same value as the CR in Table 2d on that same trial. The reason is that the same amount of conditioning is occurring in each table on each trial. Recall that how much conditioning occurs depends on the rate parameters. In Table 2d, there is a single rate parameter on each trial that is set to 50%, so 50% of the remaining association gets bound, to use the technical term. But in Table 5c, the rate parameters are 20% and 30%. When you add these together, they equal 50%, which is the same overall rate of learning as occurred in Table 2d.
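        The comparison between Table 5c and Table 2d can be checked with a short Python sketch of the whole worksheet. It assumes beta is 1 and that Total V is simply the sum of the strengths of the cues present; the function name run_schedule and the way a trial is written (a tuple naming the cues present) are illustrative conventions only. Notice that Total V is computed once, before any cue is updated, so every cue on a trial uses the same value.

```python
def run_schedule(saliences, schedule, lam=100.0):
    """Apply the delta rule over a list of trials.

    saliences -- dict mapping each cue name to its alpha
    schedule  -- list of tuples naming the cues present on each trial
    Returns the final strengths and the Total V used on each trial.
    """
    v = {cue: 0.0 for cue in saliences}
    total_v_log = []
    for present in schedule:
        total_v = sum(v[cue] for cue in present)  # middle column of the worksheet
        total_v_log.append(total_v)
        surprise = lam - total_v                  # shared by every cue present
        for cue in present:
            v[cue] += saliences[cue] * surprise
    return v, total_v_log


compound, compound_totals = run_schedule({"A": 0.2, "B": 0.3}, [("A", "B")] * 5)
single, single_totals = run_schedule({"A": 0.5}, [("A",)] * 5)

print({cue: round(s, 3) for cue, s in compound.items()})
# {'A': 38.75, 'B': 58.125}
print(round(compound["A"] + compound["B"], 3), single["A"])
# 96.875 96.875
```

        The two strengths in the compound add up to the single-cue strength, and the Total V logs for the two runs agree trial by trial (apart from tiny floating-point rounding), just as the tables show.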

        As this example illustrates, the Rescorla-Wagner model can handle compound conditioning, including the finding that the more salient CS will capture more of the association (one of the features of overshadowing). As salience is in part bound up with signal value, this model captures that aspect of signal value.

        But let's look at other characteristics of overshadowing. To do so, we will run an experiment involving five trials using a CSA with a salience set to .5 and a CS B whose salience has been set to .1. We will again use a value of 100 for the UCS's lambda. (This is an arbitrary value, by the way. Some students sometimes think that lambda is always 100; that is not so. Lambda depends on UCS intensity, so we can make it 250, 36, or whatever value we want. I am using 100 in this example to make the calculations a bit easier.) I won't provide the calculations here (you should attempt them yourself!), but here is the resulting worksheet for this example:
 
 
Table 6

          VpresentCues                  TOTAL V   CSs
          (fill in if present)
          CSA       CSB       CSC                 CSA         CSB         CSC
Initial                                           0           0           0
Trial 1   0         0                   0         + 50        + 10
                                                  = 50        = 10
Trial 2   50        10                  60        + 20        + 4
                                                  = 70        = 14
Trial 3   70        14                  84        + 8         + 1.6
                                                  = 78        = 15.6
Trial 4   78        15.6                93.6      + 3.2       + 0.64
                                                  = 81.2      = 16.24
Trial 5   81.2      16.24               97.44     + 1.28      + 0.256
                                                  = 82.48     = 16.496
 
 
         The values for CSA ought to be compared with those in Table 2d, and the values for CS B ought to be compared with those in Table 3. The reason is that our first CS has the same salience as the CS presented by itself in Table 2d, and our second CS has the same salience as the CS presented by itself in Table 3. So, by comparing these, we can see what happens to each CS when it is presented by itself, and when it is compounded with another CS.

        This comparison also appears in Figure 2 (although I do want you to compare the actual values in the tables!). In this figure, the left-hand panel shows the high-salience CS both when it is presented by itself (the thick blue line with the squares, if you have a color monitor), and when it is presented compounded with a weaker CS (in this case, its strength is tracked through the dotted orange line with the circles). Even though it is the stronger CS, in this case it has a weaker CR than when presented on its own. The reason, of course, is that the weaker CS is stealing away some of the association so that on each trial, there is less to condition, and thus, less to add to the strength of that particular association. The same thing happens, of course, to the weak CS's conditioned response graphed on the right-hand panel. You can see that there is a bigger difference for the weaker CS presented by itself and compounded than there is for the stronger CS: The lines are further apart! And the reason for that is that the stronger CS on each trial is capturing more of the remaining association, leaving correspondingly less and less association for our weak CS. Thus, consistent with empirical results, the Rescorla-Wagner model successfully predicts that the presence of another CS will cause overshadowing, whereby the conditioned response is not as strong as when a CS is presented by itself. The model also predicts that the degree of overshadowing should depend on the relative strengths or saliences of the cues in the compound.

        There is also one more prediction that ought to be pointed out: The values or strengths for both CSs on the first trial of compound conditioning were exactly equal to the simple conditioning values, which is why the lines in both the right-hand and left-hand panels of Figure 2 start at the same spot. Or put another way, overshadowing only starts on Trial 2. There is evidence consistent with this prediction, as well.
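        To put numbers on the overshadowing comparison, here is a small Python sketch that conditions each CS alone and then in the compound, using the saliences from Tables 2d, 3, and 6. It assumes beta is 1 and a lambda of 100; the function name train_together is an illustrative label.

```python
def train_together(saliences, n_trials, lam=100.0):
    """Pair every cue in `saliences` with the UCS on every trial.

    Returns a dict mapping each cue to its strength after each trial.
    """
    v = {cue: 0.0 for cue in saliences}
    history = {cue: [] for cue in saliences}
    for _ in range(n_trials):
        surprise = lam - sum(v.values())  # computed once, shared by all cues
        for cue, alpha in saliences.items():
            v[cue] += alpha * surprise
            history[cue].append(v[cue])
    return history


strong_alone = train_together({"A": 0.5}, 5)["A"]
weak_alone = train_together({"B": 0.1}, 5)["B"]
compound = train_together({"A": 0.5, "B": 0.1}, 5)

# Trial 1: compound strengths equal the stand-alone strengths.
print(compound["A"][0] == strong_alone[0], compound["B"][0] == weak_alone[0])
# True True

# Trial 5: how much each CS loses by being in the compound.
print(round(strong_alone[-1] - compound["A"][-1], 3))  # 14.395
print(round(weak_alone[-1] - compound["B"][-1], 3))    # 24.455
```

        The weak CS gives up more strength than the strong one does (about 24.5 versus about 14.4 after five trials), which is the larger gap between the lines in the right-hand panel of Figure 2.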

        You will recall from our discussion of Kamin's work on blocking that blocking is possible when a very weak CS is paired with a very strong CS. You can see something of the sort in Figure 2, as the dotted line on the right (our weak CS when it is in a compound with a strong CS) is clearly never going to get very strong (in fact, if you go back to Table 6, you will see that we are very close to asymptote, so that very little new learning can occur). A more dramatic example of blocking occurs when one CS is presented several trials before another. To demonstrate this phenomenon, we will take two cues that have equal saliences (both at .3), and present the first, CSA, for 4 trials. On the fifth and sixth trial, we will present the compound (that is, CS A together with CSB). We will again take a UCS with a lambda of 100. The design of this experiment is as follows:

                Trial                     CS paired with UCS

                1                                     CSA
                2                                     CSA
                3                                     CSA
                4                                     CSA
                5                                     CSA & CSB
                6                                     CSA & CSB

        You should work out the calculations for yourself to verify the numbers in the worksheet below (I will round to the nearest tenth, by the way):
 
 
Table 7

          VpresentCues                  TOTAL V   CSs
          (fill in if present)
          CSA       CSB       CSC                 CSA         CSB         CSC
Initial                                           0           0           0
Trial 1   0                             0         + 30
                                                  = 30
Trial 2   30                            30        + 21
                                                  = 51
Trial 3   51                            51        + 14.7
                                                  = 65.7
Trial 4   65.7                          65.7      + 10.3
                                                  = 76
Trial 5   76        0                   76        + 7.2       + 7.2
                                                  = 83.2      = 7.2
Trial 6   83.2      7.2                 90.4      + 2.9       + 2.9
                                                  = 86.1      = 10.1
 
 

        In this example (see also Figure 3), even though our two CSs are equally salient, the first captures the lion's share of the association. By the time the second is presented, there is too little association left to enable conditioning a strong response with it. And that is with only four prior trials involving just CSA. If we had presented more trials, of course, then the results for CSB would have been even more dramatic. Still, if you look at Trials 5 and 6 in Figure 3 and compare the two CRs, the results are dramatic enough: a much stronger conditioned response to the CS presented more often. The model thus correctly predicts blocking. In the current example, it is clearly sensitive to such issues as redundancy: A CS presented more often than another will have more opportunities to capture an association. (Though there can be interesting interactions when the CS presented less often is in fact significantly more salient. In this case, our two signal value mechanisms may compete against one another.)
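        The blocking design can be run through the same kind of Python sketch. It assumes beta is 1 and a lambda of 100, and writes each trial as a tuple of the cues present. The second run is a comparison group of my own devising (two compound trials with no pretraining of CSA); it is not part of the design above, and is included only to show how much the four earlier trials cost CSB.

```python
def run_schedule(saliences, schedule, lam=100.0):
    """Apply the delta rule trial by trial and return the final strengths."""
    v = {cue: 0.0 for cue in saliences}
    for present in schedule:
        # Total V is computed before any cue on this trial is updated.
        surprise = lam - sum(v[cue] for cue in present)
        for cue in present:
            v[cue] += saliences[cue] * surprise
    return v


saliences = {"A": 0.3, "B": 0.3}

# Four trials with CSA alone, then two trials with the compound (Table 7).
blocking = run_schedule(saliences, [("A",)] * 4 + [("A", "B")] * 2)
# Hypothetical comparison: the two compound trials with no pretraining.
control = run_schedule(saliences, [("A", "B")] * 2)

print({cue: round(s, 1) for cue, s in blocking.items()})  # {'A': 86.1, 'B': 10.1}
print({cue: round(s, 1) for cue, s in control.items()})   # {'A': 42.0, 'B': 42.0}
```

        With pretraining, CSB ends at about 10; without it, the same two compound trials would have left CSB at 42.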

        What about situations such as backwards conditioning or the explicitly unpaired procedure, in which we obtain inhibition (as verified through the summation and retardation tests)? To account for these, we need to add in one more idea. That is that contextual cues present when the UCS is there will also be weakly conditioned to yield an excitatory response (recall the study by Hinson and the control study by Balaz et al. mentioned in the previous chapter). So, when the UCS is absent but a new CS (say, salience of .5) is present, then VpresentCues will be positive due to the context predicting the UCS. Let us pick a weak value such as 20 for VpresentCues. In that case, the first few trials with our new CS (the one that should become inhibitory) will be as follows:

            change in new CS on its 1st Trial = .5 ( 0 - 20) = .5 (-20) = -10
            change in new CS on its 2nd Trial = .5 ( 0 - 10) = .5 (-10) = -5

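So, as you can see, this stimulus is actually obtaining an increasingly negative value (-10 after its first trial, -15 after its second). Note that on the second trial, VpresentCues is the sum of the context's 20 and the new CS's -10, which is why 10 appears in the formula; for simplicity, this treats the context's strength as staying fixed at 20. The model has a mechanism for predicting inhibition in the proper circumstances. Namely, in a situation in which there is a negative contingency, slight residual excitation from the general context will result in VpresentCues being larger than lambda (0 when the UCS is absent), generating negative strengths.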
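        Here is the same calculation as a short Python sketch, carried one trial further. It assumes beta is 1 and, as a simplification, that the context's strength stays fixed at 20 rather than changing from trial to trial; the variable names are illustrative.

```python
context_v = 20.0  # weak excitation carried by the context (held fixed here)
alpha = 0.5       # salience of the new CS
lam = 0.0         # the UCS is absent on these trials
v_cs = 0.0        # the new CS starts with no strength

changes = []
for trial in range(1, 4):
    v_present = context_v + v_cs      # context plus the new CS
    change = alpha * (lam - v_present)
    v_cs = v_cs + change
    changes.append(change)
    print(f"Trial {trial}: VpresentCues = {v_present}, "
          f"change = {change}, new strength = {v_cs}")

print(changes)  # [-10.0, -5.0, -2.5]
print(v_cs)     # -17.5
```

        Under these assumptions, the CS's strength heads toward -20, the value at which it exactly cancels the context's excitation and the absent UCS is no longer surprising.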

        There are a number of other successful predictions that may be generated from this model (see, for example, the review article by Miller, Barnet, & Grahame, 1995), but let's concentrate on just a few more that involve this notion of VpresentCues being greater than lambda. Obviously, one situation in which this occurs involves absence of the UCS, as in extinction or backwards conditioning. But that need not be the only circumstance in which it occurs. Suppose, instead, that we condition a CS with a salience of .5 (and a UCS with a lambda of 100) for three trials and then, on the fourth trial, reduce the intensity of the UCS from 100 to 77.5, according to the following experimental design:

            Trial         CS-UCS Pairings                 UCS Lambda

            1                 CSA & UCS                             100
            2                 CSA & UCS                             100
            3                 CSA & UCS                             100
            4                 CSA & UCS                             77.5

        In this case, our calculations will be:

            change in CSA on Trial 1 = .5 (100 - 0) = 50
            change in CSA on Trial 2 = .5 ( 100 - 50) = .5 (50) = 25
            change in CSA on Trial 3 = .5 ( 100 - 75) = .5 (25) = 12.5
            change in CSA on Trial 4 = .5 (77.5 - 87.5) = .5 (-10) = -5

        Note particularly what happens on Trial 4: The value for lambda is now the new value of 77.5 (rather than the old value of 100), and since this is less than the VpresentCues, a bit of inhibition starts building up, causing the conditioned response to decrease from 87.5 in Trial 3 to 82.5 in Trial 4. Reducing the lambda establishes a new asymptote, and learning will now be driven towards that new asymptote. Thus, if the amount of association that had been bound up until that change was over the amount in the new asymptote, additional learning will involve decreases in the strengths of the conditioned responses. That result has indeed been verified. Buildup of inhibition need not require an absent UCS!
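        A quick Python sketch of this design shows the reversal on Trial 4. It assumes beta is 1 and simply feeds the loop a different lambda on each trial; the variable names are illustrative.

```python
alpha = 0.5
lambdas = [100.0, 100.0, 100.0, 77.5]  # UCS intensity drops on Trial 4

v = 0.0
strengths = []
for lam in lambdas:
    change = alpha * (lam - v)  # negative once v exceeds the current lambda
    v = v + change
    strengths.append(v)

print(strengths)  # [50.0, 75.0, 87.5, 82.5]
```

        If you extend the list with more 77.5 entries, the strength keeps falling toward the new asymptote of 77.5 and settles there.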

        Another demonstration of a similar result comes from a series of studies by Kremer. Kremer takes two CSs and pairs each singly with the UCS, until each is at or near asymptote. He then presents the CSs together in a compound. If you think about what should happen on this latter trial, you should conclude that inhibition will start building up. Let's take an example to demonstrate this. This time, we will use a maximum asymptotic value of 200, and two CSs that are each at a salience of .4. We will pair each 4 times with the UCS, and then present them together. So, to keep you straight, here is the design:

                Trial                     CS paired with UCS

                1                                         CSA
                2                                         CSB
                3                                         CSA
                4                                         CSA
                5                                         CSB
                6                                         CSB
                7                                         CSB
                8                                         CSA
                9                                 CSA & CSB

        In this design, the two CSs have been presented in a random order. For our purposes at the moment, order will not prove important for discussing what happens. Again, you would be wise to run through the calculations on your own; assuming you have, they should match those presented in the following worksheet (rounded to the nearest tenth):
 
 
Table 8

          VpresentCues                  TOTAL V   CSs
          (fill in if present)
          CSA       CSB       CSC                 CSA         CSB         CSC
Initial                                           0           0           0
Trial 1   0                             0         + 80
                                                  = 80
Trial 2             0                   0                     + 80
                                                              = 80
Trial 3   80                            80        + 48
                                                  = 128
Trial 4   128                           128       + 28.8
                                                  = 156.8
Trial 5             80                  80                    + 48
                                                              = 128
Trial 6             128                 128                   + 28.8
                                                              = 156.8
Trial 7             156.8               156.8                 + 17.3
                                                              = 174.1
Trial 8   156.8                         156.8     + 17.3
                                                  = 174.1
Trial 9   174.1     174.1               348.2     - 59.3      - 59.3
                                                  = 114.8     = 114.8
 
 
        Here, on Trial 9, both CSs were predicting the UCS, since both were present. Thus, we had to sum their strengths, and that resulted in a huge Total V, one well past the UCS's lambda. So, on that trial, inhibition started developing: Both CRs weakened.

        Keeping track of the VpresentCues in the above worksheet may have been a bit difficult for you. As a guide, when I came to a given Trial, I first looked to see which CS was predicting on that trial. I then checked that CS's column on the right-hand side for the last entry, and moved that value into the VpresentCues column. So, for example, the design on Trial 8 called for CSA to be paired with the UCS. On the right-hand columns, the last value for this CS was 156.8 (calculated on Trial 4), and that was therefore the appropriate value to plug in (and to sum into the middle column).
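        If the bookkeeping in Table 8 is hard to follow by hand, a short Python sketch can do it for you. It assumes beta is 1, and it writes each trial as a string naming the cues present (so "AB" means the compound). The function name run_schedule and this way of writing the schedule are illustrative conventions only.

```python
def run_schedule(saliences, schedule, lam):
    """Apply the delta rule and log every trial of the worksheet."""
    v = {cue: 0.0 for cue in saliences}
    log = []
    for trial, present in enumerate(schedule, start=1):
        # Look up the latest strength of each cue present and sum them.
        total_v = sum(v[cue] for cue in present)
        for cue in present:
            v[cue] += saliences[cue] * (lam - total_v)
        log.append((trial, present, total_v, dict(v)))
    return log


schedule = ["A", "B", "A", "A", "B", "B", "B", "A", "AB"]
log = run_schedule({"A": 0.4, "B": 0.4}, schedule, lam=200.0)

for trial, present, total_v, strengths in log:
    rounded = {cue: round(s, 1) for cue, s in strengths.items()}
    print(trial, present, round(total_v, 1), rounded)
# On Trial 9, Total V is about 348.2 and both strengths end near 114.8.
```

        Because the sketch always looks up each cue's latest strength, it takes care of the step described above of finding the last entry in that CS's column.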

        One more prediction based on Kremer's work, and then we're done with this section. On the last trial of the previous experiment, instead of just presenting CSA and CSB, let us also add in a new, third CS (of equal salience) to be paired in a triple compound with the UCS. What should happen?

        One of the strengths of mathematizing a model (particularly complex models) is that it can sometimes generate an unusual prediction. Based on what we've learned so far, a seemingly natural and commonsensical prediction would be that nothing should happen with the new CS: It is perfectly redundant with the other two CSs that are doing a good job of signaling the UCS. Alternatively, perhaps it might gain a bit of excitation by being paired with the UCS, assuming that blocking won't be perfect. But in fact, the delta rule predicts a radically different result. Let's go through the calculations for this new Trial 9 to see why:

            change in CSA on Trial 9 = .4 (200 - 348.2) = -59.3
            change in CSB on Trial 9 = .4 (200 - 348.2) = -59.3
            change in CSC on Trial 9 = .4 (200 - 348.2) = -59.3

        Not surprisingly, the change is identical for our three equally salient CSs. But to see how strong the conditioned responses are, we need to add the change to the previous strength. And if we do that, we obtain the following:

            CRA after Trial 9 = old strength (174.1) + change (-59.3) = 114.8
            CRB after Trial 9 = old strength (174.1) + change (-59.3) = 114.8
            CRC after Trial 9 = old strength (0) + change (-59.3) = -59.3

        So here is the surprise: The new CS should become inhibitory! And that is exactly what Kremer found. The reason, of course, is the same one mentioned above: If the predictors together sum to a value above asymptote, then each predictor will undergo negative change until the total has settled back down to the asymptotic level. And that means that a new predictor will go into negative values at the same time that the old predictors weaken, but stay positive.
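The Trial 9 arithmetic for the triple compound can be checked with a few lines of Python. This is only a minimal sketch of the delta rule as used in the worked example; the function name `delta_rule_trial` and the dictionary layout are illustrative assumptions, not part of the original model's notation.

```python
# Minimal sketch of the delta rule for a single compound trial.
# Function and variable names are illustrative assumptions.

def delta_rule_trial(strengths, saliences, lam):
    """Return (changes, new_strengths) for the cues present on one trial.

    strengths: dict of cue name -> current associative strength
    saliences: dict of cue name -> salience of that cue
    lam:       asymptote (lambda) supported by the UCS
    """
    total_v = sum(strengths.values())  # summed prediction of all present cues
    changes = {cue: saliences[cue] * (lam - total_v) for cue in strengths}
    new_strengths = {cue: strengths[cue] + changes[cue] for cue in strengths}
    return changes, new_strengths


# Values from the worked example: three equally salient CSs, lambda = 200.
before = {"CSA": 174.1, "CSB": 174.1, "CSC": 0.0}
salience = {"CSA": 0.4, "CSB": 0.4, "CSC": 0.4}

changes, after = delta_rule_trial(before, salience, lam=200)
for cue in before:
    print(f"{cue}: change {changes[cue]:+.1f}, new strength {after[cue]:+.1f}")
```

Because every present cue shares the same error term (lambda minus Total V), the new cue's change has the same sign as the old cues' changes, and starting from zero it has nowhere to go but negative.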

        This is, all in all, a very impressive set of verified predictions. Based on them, you can see the appeal of a model that may be captured in a fairly simple formula. But as we will see in the next section, there is an equally impressive number of failures. Thus the model has been more successful than Pavlov's or Hull's models in terms of the number of findings it has accounted for, but it has not been successful enough.

Results At Variance With Rescorla-Wagner

        Let's start with extinction. If extinction is the process of reducing a CR to a zero asymptote, then we need to worry about why we get such findings as spontaneous recovery, disinhibition, and savings on relearning. Within the scope of the original Rescorla-Wagner model, there is no good explanation for these results. The model is also at variance with later work suggesting that extinction may really involve components of inhibition.

        A sample study of that later work involves a recent article by Rescorla. He trained rats to move a lever or pull a chain for different reinforcers (one of these responses was reinforced with a pellet of food, the other with a sip of sucrose). These two reinforcers then became UCSs in classical conditioning, with one UCS being paired with a light, and the other with a noise. These CSs ought to become conditioned excitors (much as CSs associated with an aversive UCS become conditioned suppressors), and indeed, Rescorla found that during later extinction of lever moving or chain pulling, the presence of a CS associated with the same outcome resulted in initially slower extinction. So, if a rat had learned to pull a chain for sucrose, presenting the CS associated with sucrose during chain pulling extinction caused more responding in the initial sessions of extinction, although by the end of extinction training, the same level of extinction was eventually reached by all groups. That so far ought not to surprise you: Both the chain and the CS have been associated with sucrose, so the sucrose is being signaled quite strongly at the start of extinction.

        Following extinction, Rescorla retrained his animals on these responses, but used the opposite reinforcers: If chain pulling had been reinforced with sucrose, it was now reinforced with a pellet. The animals relearned these quite rapidly. The question Rescorla sought to answer was what would happen when one or the other of these CSs occurred while the animal was making one or the other of these responses. Rescorla found that the CS that had caused more responding during extinction now acted as a conditioned suppressor, causing less responding in the situation where the animal was making the same response. So, if the CS caused more chain pulling during extinction, it depressed chain pulling during later performance, even though the chain pulling was now associated with a different reinforcer than the one signaled by the CS. Thus, there was evidence both that a CS could become inhibitory, and that this inhibition was specific to the relationship between the CS and a particular response. Rescorla (1997, p. 249) concludes that:

The precise associative structure of these inhibitory associations remains to be fully worked out. We have described them in terms of simple binary associations between S and R. But it is equally possible that they involve a more elaborate hierarchical structure in which S signals that R will be followed by no event.
        Are you surprised that the processes involved in extinction are still a hot topic in the late 90s? Finding a satisfactory explanation of what happens during extinction is still a pressing goal.

        While on the topic of inhibition, note that the Rescorla-Wagner model claims inhibition ought to be the reciprocal of excitation. But inhibition normally takes much longer to develop than excitation, and at least in some cases (compensatory responses, for example) appears to involve a process in addition to excitation, rather than a direct canceling out of the excitation (a view consistent with the results of Rescorla's more recent work above). Moreover, one occasionally sees excitatory CRs in conditions where inhibitory CRs would have been expected. Normally, these occur in situations in which only a small number of conditioning trials have been given. Nevertheless, they fail to conform to the model. Why would a few trials of backwards conditioning, for example, result in excitation?

        Besides extinction, the delta rule provides little guidance, or the wrong guidance, on higher-order conditioning: If contiguity is used to establish the existence of a positive contingency, then why do we obtain results suggesting a positive contingency between the second CS and an absent UCS? This was a problem for the contingency approach, and it really wasn't solved by the Rescorla-Wagner model. That model, of course, did not explicitly deal with the relationship of stimulus similarity in a way that would account for the Rizley and Rescorla results (though it did deal with stimuli as consisting of component elements, so that generalization could be described in terms of the number of common elements conditioned to the UCS). Similarly, it wasn't geared towards handling the belongingness effects in the learned taste aversions paradigm, nor the interactions of belongingness with temporal contiguity found by Krane and Wagner.

        More seriously, the model failed to handle some compound conditioning results. Particularly prominent among these is potentiation, briefly described in the last chapter. The problem, of course, is that the model predicts overshadowing when two CSs are presented at the same time, but potentiation is actually the opposite: a weak CS becomes stronger in the presence of another, stronger CS. In addition, the model clearly predicts that overshadowing ought to occur starting with Trial 2. However, several studies report overshadowing with a single trial.

        Another problem in the area of compound conditioning concerned the phenomenon of unblocking. The Rescorla-Wagner model could correctly predict some unblocking results, but not all. Thus, in Kamin's unblocking experiment (discussed in the previous chapter), a light compounded with a tone unblocked when the UCS shock increased. To remind you of Kamin's study, the relevant conditions were as follows:

            Group                      Phase 1                 Phase 2

            blocking of light          tone & 1 mA shock       light, tone & 1 mA shock
            unblocking of light        tone & 1 mA shock       light, tone & 4 mA shock

This result is easy to explain, because the increase in shock means that lambda has increased. Thus, while the tone has captured most of the association so far, the model would correctly predict some new association being freed up that the tone and light could compete for. And if the light is more salient than the tone (see the previous discussion of this experiment), then it will now capture a fair chunk of those new resources. To see a mathematical illustration of this, let's assume that we start off with a UCS whose lambda is 100. Let's further assume that the tone has a salience of .2, and the light has a salience of .4. Let's also assume that increasing the shock to 4 mA effectively increases lambda to 200, and that our tone had a CR of 84 by the end of Phase 1. In those conditions (and you could have easily chosen other values to illustrate the same point), our first two trials of compound conditioning (call them Trials 61 and 62) in the unblocking group should give us the following:
 
 
Table 9a

              VpresentCues               TOTAL V      CSs
              (fill in if present):
              tone:       light:                      tone: 84            light: 0

Trial 61      84          0              84           + 23.2 = 107.2      + 46.4 = 46.4
Trial 62      107.2       46.4           153.6        + 9.3 = 116.5       + 18.6 = 65
 

In contrast, Trials 61 and 62 for the blocking group would have yielded:
 
 
Table 9a

              VpresentCues               TOTAL V      CSs
              (fill in if present):
              tone:       light:                      tone: 84            light: 0

Trial 61      84          0              84           + 3.2 = 87.2        + 6.4 = 6.4
Trial 62      87.2        6.4            93.6         + 1.3 = 88.5        + 2.6 = 9
 

The model thus correctly expects a much stronger CR for the new CS when UCS intensity increases in compound conditioning.
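The two worksheets can be reproduced with a short simulation. The sketch below assumes the same illustrative values used in the text (tone salience .2, light salience .4, tone strength 84 at the end of Phase 1, lambda of 100 for the 1 mA shock and 200 for the 4 mA shock); the function name `run_compound_trials` is a hypothetical helper, not part of the model.

```python
# Sketch: Trials 61 and 62 of compound conditioning for both groups.
# All parameter values are the illustrative ones assumed in the text.

def run_compound_trials(strengths, saliences, lam, n_trials):
    """Apply the delta rule to a compound of cues for n_trials trials.

    Returns a list with the strengths of every cue after each trial.
    """
    v = dict(strengths)
    history = []
    for _ in range(n_trials):
        total_v = sum(v.values())  # summed strength of all present cues
        v = {cue: v[cue] + saliences[cue] * (lam - total_v) for cue in v}
        history.append(dict(v))
    return history


start = {"tone": 84.0, "light": 0.0}
salience = {"tone": 0.2, "light": 0.4}

unblocking = run_compound_trials(start, salience, lam=200, n_trials=2)
blocking = run_compound_trials(start, salience, lam=100, n_trials=2)

for name, history in [("unblocking", unblocking), ("blocking", blocking)]:
    for trial, v in zip((61, 62), history):
        print(f"{name:10s} Trial {trial}: tone {v['tone']:.1f}, light {v['light']:.1f}")
```

Changing `lam`, the saliences, or the starting tone strength lets you confirm that the qualitative result (a larger CR to the light when the shock increases) does not depend on these particular numbers.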

        But how can this model account for the Dickinson, Hall, and Mackintosh experiment that involved the following setup (again, see Chapter 2 for more details):

            Group                     Phase 1                                       Phase 2

            light blocking         click, 2 shocks 8 sec apart         light & click, 2 shocks 8 sec apart
            light unblocking     click, 2 shocks 8 sec apart         light & click, only 1 shock

The problem is that the unblocking condition involves what appears to be a reduction in lambda (one shock instead of two), which should result in an even smaller response to the light than in the blocking group. (As an exercise, you should try to play around with some figures to convince yourself that this is the case, even if the reduction now results in lambda being below VpresentCues!)
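One way to play around with figures is sketched below. Every number here is a hypothetical assumption chosen only for illustration: a click strength of 84 at the end of Phase 1, a light salience of .4, a lambda of 100 for two shocks, and two possible reduced lambdas (90 and 50) for a single shock. None of these values comes from the Dickinson, Hall, and Mackintosh study itself.

```python
# Sketch: first compound trial when lambda is reduced in Phase 2.
# All values are hypothetical assumptions for illustration only.

def light_change(lam, v_click=84.0, v_light=0.0, salience_light=0.4):
    """Delta-rule change in the light's strength on one compound trial."""
    total_v = v_click + v_light  # both cues are present, so strengths sum
    return salience_light * (lam - total_v)


conditions = [
    ("blocking (two shocks, lambda 100)", 100),
    ("one shock, lambda reduced to 90", 90),
    ("one shock, lambda reduced to 50", 50),
]

for label, lam in conditions:
    print(f"{label}: change in light = {light_change(lam):+.1f}")
```

Under these assumptions, a reduced lambda that stays above Total V gives the light a smaller gain than in the blocking group, and a reduced lambda below Total V drives the light negative. Either way, the delta rule does not predict the stronger response to the light that defines unblocking.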

        Most seriously, perhaps, a number of people at the time became convinced that the model's failure to handle changes in the processing of the CS was its Achilles' Heel. Among the CS findings that could not easily be handled by the model were latent inhibition, learned irrelevance, and the finding that CS intensity or salience can apparently influence not only rate of learning, but also where the final asymptote will be (something that ought to be influenced just by the UCS in the Rescorla-Wagner model). We have already talked about latent inhibition; learned irrelevance had to do with Rescorla's claim that truly random control was the proper control for learning, rather than the explicitly unpaired procedure. In the truly random control, the CS and UCS are paired at random to yield a zero contingency, precisely the condition that the model claims ought to result in no learning. But Rescorla found that the CS in this condition (and in latent inhibition) retards later excitatory learning, although it fails to pass the summation test. Whatever is going on, then, it does not appear to involve true inhibition.

        There were other critiques and findings that did not fit, and you may wish to consult the Miller et al. review article for more examples and details. In addition, there have also been attempts to update the Rescorla-Wagner model in order to handle some of these problems (see, for example, Van Hamme & Wasserman, 1994).

        In any case, Wagner and his colleagues, in particular, developed a new generation of models to add in a component of CS processing along with UCS processing (as did other theorists such as Hall and Pearce). We will examine Wagner's model shortly. But first, let us briefly look at an alternative class of models (the comparator models) that attempted to account for findings in terms primarily of excitatory links.
 

III. The Comparator Approach

 
        The comparator approach involves a series of models that basically concern themselves with excitatory associations. As the name comparator implies, the animal will make its response on the basis of comparing a