Notes on Top-Down & Bottom-Up Processing


            So far, we have concentrated in this unit on what is called bottom-up processing (also known as data-driven processing). In bottom-up or data-driven processing, what happens is a little bit like the Pandaemonium model we looked at in class. That is, the outside data drive the system. Sensory inputs get registered in sensory memory, and are then processed at higher and higher levels until a match is finally found with something in long-term memory. So, again using Pandaemonium as an example, light hits the rods and cones in the retina (the image demons). At the next level up, these data are automatically analyzed in terms of features (the feature demons). At the next level above that, they partially activate many different patterns depending on the degree to which they resemble each pattern (the conceptual demons). And finally, the activations of these various patterns are compared to determine which pattern is likely to be the one currently in view (the decision demon). This sounds like a lot, but it happens rapidly, and normally at a level below conscious awareness. We know the result, but we are not aware of the activity at the intermediate levels.
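
            If it helps to see that upward flow laid out explicitly, here is a minimal sketch in Python. The feature sets and the scoring rule are invented for illustration; they are not Selfridge's actual codings.

# A toy bottom-up Pandaemonium: activation flows strictly upward,
# from features to patterns to a final decision.

PATTERNS = {                     # features each pattern demon listens for
    "R": {"vertical", "curve", "slant"},
    "P": {"vertical", "curve"},
    "B": {"vertical", "curve", "curve2"},
}

def recognize(input_features):
    # Conceptual demons: each pattern "yells" in proportion to how many
    # of its features are present in the input.
    activations = {letter: len(features & input_features)
                   for letter, features in PATTERNS.items()}
    # Decision demon: pick the loudest yell.
    return max(activations, key=activations.get)

print(recognize({"vertical", "curve", "slant"}))   # -> R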


            But there is another pathway that is very important in perception and pattern recognition. It involves what is called top-down processing (also known as conceptually-driven processing). In top-down processing, your beliefs, cognitions, and expectations partly drive the pattern recognition process. Some of this may be conscious, but some may also be unconscious. Basically, if you are expecting to come across a certain pattern, then you focus your attention on looking for evidence consistent with that pattern, rather than just automatically processing whatever is in view. This ties into the next unit on attention, but one way to conceive of it is to think of the pattern recognition process being fine-tuned to look primarily for the features of the pattern you’re interested in, and to squelch or attenuate anything else.


            To see how this might operate, let’s look at a modified version of the Pandaemonium model. In the original model, all activity traveled up to the decision demon, so there was only one direction in which activation flowed. Let’s now add in links or paths going down as well as up. So, the decision demon can send information or activation down to the conceptual demons, and they can send information or activation down to the feature demons. Now, suppose you are in an experiment where you are supposed to scan a large list of letters for the letter R. Here is how this would work: the decision demon would send some activation down to the R conceptual demon, and the R demon would send some activation down to the feature demons coding for the features of an R, such as a slanted line, a partial curve, etc. Since those features are partially activated, only a little additional activation from the sensory input may be needed to push them past threshold, so the pattern recognition process is faster at recognizing an R. But of course, there is also a cost associated with this: You are not getting as complete information from the outside world as you normally would, so misrecognitions and illusions become more probable. Or to put this more concretely, since the features of a P and an R are similar, if the R features are already partially activated, then seeing a P may be enough to send the R conceptual demon yelling and screaming that it has found an R.
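
            To make that tradeoff concrete, we can add a top-down link to the toy model above. The bias value below is arbitrary, chosen only to show the effect: expecting an R speeds up recognizing an R, but it can also turn a P into a false alarm.

# Top-down priming added to the toy model: expecting a letter
# pre-activates it, so less bottom-up evidence is needed, but an
# ambiguous input can now be misrecognized.

PATTERNS = {
    "R": {"vertical", "curve", "slant"},
    "P": {"vertical", "curve"},
}

def recognize(input_features, expected=None, bias=1.5):
    activations = {}
    for letter, features in PATTERNS.items():
        # Bottom-up: evidence for the pattern minus evidence against it.
        score = len(features & input_features) - len(features - input_features)
        if letter == expected:
            score += bias        # top-down pre-activation from above
        activations[letter] = score
    return max(activations, key=activations.get)

p_input = {"vertical", "curve"}            # the features a P actually shows
print(recognize(p_input))                  # -> P (no expectation: correct)
print(recognize(p_input, expected="R"))    # -> R (expecting R: misrecognition)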


            Now, this isn’t the only way for top-down processing to operate, but it certainly should give you a feel for what happens in these cases. Another way for top-down processing to operate is in terms of a filter that is tuned to selectively let some information through while blocking other information. In either case, though, we might expect speeded processing, and we might also expect an increase in errors. So, if you have arranged to meet a friend somewhere at a certain time, you are likely to recognize your friend faster than if you came upon him or her by accident at the same time, even though the perceptual features are identical in each case. But at the same time, you are also very likely in the former case to mistake someone or something else for your friend. For example, I have cats. Some are indoor cats, and others are outdoor cats, and we try to keep the outdoor cats in the backyard, rather than in the front yard by the street. So, when we’re driving to or away from the house, we tend to scan for cats for safety’s sake, and we often have to shush a cat back into the backyard. A week or so ago, my wife and I were walking at night through the neighborhood. When we came back to our street, and while we were still a fair distance away, she said, “There’s one of the cats out in front of the garage.” I saw it, too, a black shape with a tail outlined against the light from the garage, right at the edge of the light. Only, as we both realized shortly after, it wasn’t a cat at all, but a planter out in front. But there was something about one of the plants that had a configuration similar to a tail sticking up out of a hunched mass, and since we were looking for cat-like features, that was enough to momentarily fool us into mistaking an inanimate object for one of our cats.


            That, of course, is anecdotal, and doesn’t provide very strong proof of top-down processing, so let’s get to the experimental evidence. To give you a feel for this phenomenon, I will briefly present two findings. The first is called the phoneme restoration effect, and the second (which is also presented in your text) is called the word superiority effect.


            Let’s start with the phoneme restoration effect. This involves studies by Warren and Warren and their colleagues. A phoneme is a sound that is meaningful in a given language in the sense that if you change it, you may also change the meaning of the word it is a part of. Different languages have different phonemes. Japanese, for example, doesn’t make a distinction between the “r” and “l” sounds, whereas those are different phonemes for native English speakers. Part of the difficulty of learning to speak a foreign language well is being able to produce its phonemes, and indeed, even being able to hear them, which is sometimes difficult to do.


            In any case, in a preliminary study, the Warrens gave people a sentence such as the following:


“The state governors met with their respective legislatures convening in the capital city.”


They replaced the first s-phoneme in the word “legislatures” with a short cough. Twenty people were asked to listen to this and other sentences and indicate whether there was a missing sound, and if so, which sound was missing. Nineteen of these heard no missing sound in the sentence above. The 20th person did claim a missing sound, but identified a different sound than the s-sound. So, even though the sound wasn’t there, people apparently restored it - the phoneme restoration effect. Why? On a bottom-up only account, this should never have happened: You can only extract the features, sounds, and patterns that are actually present. But on a top-down account, people extracted enough information from the sound and meaning of the sentence to believe that they heard the word “legislatures,” even though they really didn’t. And a good thing, too: As linguists and psycholinguists are fond of pointing out, our speech is full of extraneous sounds and false starts. In short, when you get enough information to start building a representation of what you’re hearing, that representation takes over, and now guides you into expecting more confirming evidence, and that is the top-down part.


            One more experiment by the Warrens. They gave their next group the following sentence:


“It was found that the wheel was on the axle.”


Before people heard that sentence, however, the Warrens electronically snipped out the “wh” sound from “wheel” and replaced it with a burst of static. All people had to do was write down exactly what they heard. And of course, they restored the phoneme: Everyone wrote down “wheel.”


            Now, you might see a potential flaw in this and the previous experiment. When we examine the actual speech of someone, one word flows into another; there aren’t sharp boundaries or pauses between words. (Think of trying to distinguish “ice cream” and “I scream.”) In fact, that is one of the reasons why someone speaking a language we don’t understand typically sounds like he or she is speaking rapidly: we don’t know how to separate the speech stream in that language into words. But to get back to the potential problem, how do we know, in either this or the previous experiment, that the Warrens were successful in getting rid of all of the phoneme, if one word glides into another? Maybe there was enough of the s-sound in Experiment 1 (or the wh-sound in Experiment 2) to allow the appropriate pattern-recognition routines to do their thing bottom-up. To test this possibility, the Warrens changed the last word in the above sentence. They electronically removed “axle” and replaced it with “orange,” at which point people wrote down “peel” instead of “wheel.” And you can play this game with other words such as “shoe” and “table,” which results in people writing down “heel” and “meal.” So, it wasn’t that a bit of sound was left over; people were actually re-constructing what they believed they must have heard on the basis of the general context or meaning of the sentence. Note that this fits in with Gregory’s view: perception and pattern recognition are often re-constructive, based on hypotheses about an input that may be only partially confirmed.
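
            The logic of these restoration experiments is simple enough to sketch in a few lines of Python. The candidate list and the context table below are made up; they stand in for everything a real listener knows about English words and plausible sentences.

# Toy phoneme restoration: the input "*eel" (initial sound replaced by
# static) is consistent with several words; sentence context picks one.

CANDIDATES = ["wheel", "peel", "heel", "meal"]   # all match "*eel" acoustically

CONTEXT_FIT = {          # which candidate best fits the sentence's last word
    "axle": "wheel",
    "orange": "peel",
    "shoe": "heel",
    "table": "meal",
}

def restore(masked_word, context_word):
    # Bottom-up: keep only candidates consistent with the surviving sound.
    consistent = [w for w in CANDIDATES if w.endswith(masked_word.strip("*"))]
    # Top-down: let the sentence context choose among them.
    best = CONTEXT_FIT.get(context_word)
    return best if best in consistent else consistent[0]

print(restore("*eel", "axle"))     # -> wheel
print(restore("*eel", "orange"))   # -> peel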


            You’ll see that the word superiority effect is quite similar. In an early experiment, Reicher flashed either a single letter or a whole word on a screen. The subject’s task was to indicate whether a specific letter was present. What Reicher found was that people were faster at identifying the letter when it appeared in a word than when it appeared by itself (the word superiority effect or WSE). Now at first blush, this may seem a counter-intuitive finding. A common-sense model tells you that you first have to identify the letters before you can identify the word, so the reverse finding might be expected.


            There have been several explanations of the WSE over the years. One early explanation is that a word provides constraints on guessing. So, if you get the word “NURSE” flashed at you and are cued for what the second letter might have been, if you didn’t actually see the letter long enough to identify it, you might still guess “U” because there are only two possibilities, and “nurse” is a more familiar word in English than is “Norse.” In any case, you would not guess that the missing letter was a J. However, this doesn’t really provide much explanation for why people are likely to miss some information when they see a single letter on screen, but seem to get enough information when they get multiple letters forming a word.
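
            Even though it falls short as a full explanation, the guessing account is easy to state as a procedure. Here is a sketch; the lexicon and the familiarity counts are invented for illustration.

# Guessing constrained by the word: given the frame "N?RSE", only words
# that fit the frame are possible guesses, weighted by familiarity.

import re

LEXICON = {"NURSE": 500, "NORSE": 20, "HORSE": 400}   # made-up frequencies

def guess_letter(frame, position):
    pattern = re.compile(frame.replace("?", "."))     # "N?RSE" -> "N.RSE"
    # Keep only lexicon entries consistent with the letters actually seen.
    fits = {w: freq for w, freq in LEXICON.items() if pattern.fullmatch(w)}
    # Guess the cued letter from the most familiar fitting word.
    return max(fits, key=fits.get)[position]

print(guess_letter("N?RSE", 1))    # -> U ("nurse" outweighs "Norse")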


            A second explanation was provided by Neal Johnson. He argued that words become so familiar to us that we recognize them by their overall shapes, instead of having to read the individual letters. There is certainly some evidence for this: People who study eye movements during reading have discovered that a saccade is likely to skip right over a short but familiar word, even though the word is far enough away that we know information about the individual letters is not being picked up. If the word is not familiar, however, then it is more likely that the saccade will end with the eye focused somewhere in the middle of the word. So, on this account, extraordinarily rapid recognition of a word can take place. And because we know which letters spell that word, we can access information about the component letters faster than when we look at an individual letter on its own.


            Today, as your text points out, the favored model is something along the lines of the interactive activation model of McClelland and Rumelhart. It is very similar to the modified Pandaemonium model we talked about earlier. So, think of this model as a type of Pandaemonium in which features feed excitation or activation to the level of letters, and letters feed activation to the level of words. (B, A, and T are connected to the word “bat,” and B, A, and R are connected to the word “bar.”) But there are also links going down: So, if you see some features consistent with a three-letter word, and the features start the B and A letter demons yelling, they will start the BAR, BAT, and BAD word demons yelling. These, in turn, feed activation back down the pipe to R, T, and D, and these letter demons feed activation back to feature demons such as slant, horizontal line, etc. So, if just a slant is now seen for the third letter, it is enough to let BAR win the race over BAT and BAD.
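
            Here is a stripped-down sketch of that interactive loop in Python. The feature codings and the two weights are hand-picked for illustration; they are not McClelland and Rumelhart's actual parameters.

# Interactive activation in miniature: features excite letters, letters
# excite words, and words feed activation back down to their letters.

LETTER_FEATURES = {                  # invented feature codings
    "B": {"vertical", "curve", "curve2"},
    "A": {"slant_l", "slant_r", "horizontal"},
    "R": {"vertical", "curve", "slant"},
    "T": {"vertical", "horizontal"},
    "D": {"vertical", "curve"},
}
WORDS = {"BAR": ("B", "A", "R"), "BAT": ("B", "A", "T"), "BAD": ("B", "A", "D")}

def settle(stimulus, steps=5, up=0.10, down=0.05):
    # stimulus: one feature set per letter position
    letters = {(p, L): 0.0 for p in range(len(stimulus)) for L in LETTER_FEATURES}
    words = {w: 0.0 for w in WORDS}
    for _ in range(steps):
        for (p, L) in letters:                      # features excite letters
            letters[(p, L)] += up * len(LETTER_FEATURES[L] & stimulus[p])
        for w, spelling in WORDS.items():           # letters excite words
            words[w] += up * sum(letters[(p, L)] for p, L in enumerate(spelling))
        for w, spelling in WORDS.items():           # words excite their letters
            for p, L in enumerate(spelling):
                letters[(p, L)] += down * words[w]
    return max(words, key=words.get)

# Clear B and A, but only a slanted line visible in the third position:
print(settle([{"vertical", "curve", "curve2"},
              {"slant_l", "slant_r", "horizontal"},
              {"slant"}]))                          # -> BAR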


            Basically, on the interactive activation model, lower and higher levels coordinate with one another in reaching a best guess about what is there. The lower level might be saying something like “I know there’s a slanted line” while the higher level might be saying “look for a slant or a horizontal line or a big curve, but don’t bother with anything else.” So, you have a combination of both top-down and bottom-up processing. On this account, having the additional level of word demons constrains the possibilities at the feature and letter levels more quickly, so that you get faster identification.


            In addition to the word superiority effect, by the way, people have also found an object superiority effect, whereby component parts are recognized faster in objects than when they are presented on their own.


            So, are perception and pattern recognition mainly top-down or bottom-up? The answer is that we normally do both in combination. A very important place where this is seen is fluent reading: People who are good, rapid readers are anticipating to some extent what the next words or ideas ought to be (for example, “The majestic hawk swooped down and carried off the struggling _______.” is very likely to suggest a limited range of final words to you, even though none has been presented. “Rabbit,” maybe?). But at the same time, relying too much on top-down processing can be a disaster; you do need to do some reality checking to make sure the expectations you have are consistent with the data coming in through the senses. In short, we can moderate the relative contributions of one or the other in different situations, but we generally do both. And once we start to get some conception of a pattern through partial evidence from bottom-up processing, top-down processes are then likely to activate and help speed up the pattern recognition process.