In this paper the issue of drawing inferences about biological
cognitive systems on the basis of connectionist simulations is
addressed. In particular, the justification of inferences based
on connectionist models trained using the backpropagation learning
algorithm is examined. First it is noted that a justification
commonly found in the philosophical literature is inapplicable.
Then some general issues are raised about the relationships between
models and biological systems. A way of conceiving the role of
hidden units in connectionist networks is then introduced. This,
in combination with an assumption about the way evolution goes
about solving problems, is then used to suggest a means of justifying
inferences about biological systems based on connectionist research.
Contacts: István S. N. Berkeley Ph.D.
Philosophy,
The University of Louisiana at Lafayette
P.O. Drawer 43770
Lafayette
LA 70504-3770
USA
Tel: (318) 482-6807.
The Appeal of Connectionism
Modeling with connectionist systems is now an integral part of
Cognitive Science. One of the putatively appealing features of
connectionist models is that they are supposed to be, in some
sense, more biologically plausible, or brain-like, than models
which have their roots in what Haugeland (1985) terms the 'GOFAI'
(short for 'Good Old Fashioned Artificial Intelligence') tradition.
This justification of connectionism has taken strong hold in
the literature, especially the philosophical literature (see Clark
1989: p. 4, Bechtel and Abrahamsen 1991: p. 17, Churchland 1989:
p. 160, Dennett 1991: p. 239, Sterelny 1990: p. 175 and Cummins
1989: p. 155, for examples). However, over the years, it has become
increasingly clear that this justification is deeply suspect,
when applied to certain important sub-classes of connectionist
system.
Consider, for example, the case of the backpropagation learning
procedure. Although this learning procedure has been widely deployed,
it has been well known for some time that the procedure is highly
biologically implausible, for a variety of reasons (Grossberg
1987 and Quinlan 1991). This being the case, the justification
for employing a connectionist model which uses the backpropagation
learning procedure on a particular problem cannot be based upon
putative biological plausibility.
Although some network systems can be justified on biological
grounds, for example those which fall into the field often referred
to as 'neural computing' (Churchland and Sejnowski 1992: p. 14),
systems which employ backpropagation fall outside the scope of
this justification. The justification for backpropagation models
thus must come from other grounds. What, then, are the reasons
for believing that backpropagation trained models can tell us
anything about biological cognition? Sketching an outline of an
answer to this question will be the main purpose of this paper.
To begin with, it is worth briefly considering the relationship
between models and minds in general.
Back to Basics
As a minimal condition, any proposed model of some aspect of biological
cognition must have the appropriate input and output behaviors
to model the relevant aspect of cognition. That is to say, the
model had better be able to do more or less the same things as
the biological system which it putatively models. However, this
alone is not sufficient to justify a particular model as a basis
to draw inferences about biological cognitive function. This is
because there are many different ways to compute any particular
function.
Consider the example of multiplication. Whilst most of us are
taught in school to do multiplication using what is known as
the 'classical multiplication algorithm', pocket calculators go
about multiplying two numbers in an entirely different way. Calculators
use an algorithm known as 'multiplication a la Russe' (see
Brassard and Bratley 1988: p. 2). Of course, there is no straightforward
way for the user of a calculator to know that the machine is calculating
in a different manner, as both means of calculating give the same
results. A calculator uses this algorithm because it is much
easier and simpler to implement in digital circuits.
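The contrast can be made concrete with a short sketch in Python
(the function names are merely illustrative; the point is only that
two quite different procedures yield identical input and output
behavior):

    def classical_multiply(a, b):
        # Schoolbook method: sum the partial products of each digit
        # of b with a, shifted by that digit's place value.
        total = 0
        for place, digit in enumerate(reversed(str(b))):
            total += a * int(digit) * 10 ** place
        return total

    def russe_multiply(a, b):
        # Multiplication a la Russe: repeatedly halve one operand and
        # double the other, summing the doubled values whenever the
        # halved operand is odd (cf. Brassard and Bratley 1988).
        total = 0
        while a > 0:
            if a % 2 == 1:
                total += b
            a //= 2
            b *= 2
        return total

    # Both procedures agree on every input, despite working differently.
    assert classical_multiply(45, 19) == russe_multiply(45, 19) == 855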
However, it would be a great error for a researcher to think that
they could learn something about the human ability to multiply
numbers by studying a calculator. This is the reason why the mere
fact that a model can apparently duplicate input and output behaviors
of some aspect of cognitive functioning is not sufficient to justify
inferences about biological cognitive agents. However, such duplication
of input and output behavior constitutes a necessary condition
for such inferences to be drawn.
Pylyshyn (1984) draws a distinction which is helpful in this context.
It is the distinction between computational systems which are
strongly equivalent to biological systems and those which
are weakly equivalent to such systems. Systems or models which
are merely weakly equivalent to some aspect of biological cognitive
functioning may have the same input and output behaviors as some
biological system in the relevant respects, but they will
go about producing this behavior in a different way. One consequence
of this is that although the set of behaviors being studied may
be the same, emergent behaviors (i.e. those behaviors which the
model is not explicitly designed for, which come 'for free' so
to speak) will most likely be different. So, to continue the example
above, a calculator doing multiplication a la Russe may
be weakly equivalent to a human being doing classical multiplication,
but as noted, this does not provide much of a basis for inference
about human cognitive functioning. When a model or system is strongly
equivalent to some biological system, by contrast, the system
or model not only has the same input and output behaviors, but
also computes the function in the same way. That is to
say, if one system is strongly equivalent to another system, then
the two systems are computing exactly the same algorithm, in the
same way. Moreover, a consequence of this will be that strongly
equivalent systems will have the same emergent behaviors. It is
only in the case of systems or models which are strongly equivalent
to biological systems that adequately justified inferences about
the biological systems can be drawn directly on the basis of the
non-biological ones.
There are grounds for believing that the study of systems which
are merely weakly equivalent may nonetheless be illuminating,
in an indirect sense, about biological cognition. Dennett (1978:
p. 113), for example, has argued that it is possible to learn much
of significance to psychology and epistemology on the basis of
particular, though unrealistic (compared to natural cognitive
systems) models. Dennett's contention is that such models can
provide information about the general principles governing psychological
and epistemological (i.e. cognitive) systems. As it is highly
doubtful that many of the connectionist systems which have been
trained using the backpropagation learning procedure manage to
reach the standards required of strongly equivalent systems, it
is this potential utility of connectionist systems, as weakly
equivalent systems, which will initially be developed further
here. In order to do this, it is worth pausing briefly to consider
what a system trained using the backpropagation procedure has
to accomplish in order to reach convergence.
Problems and Properties
When a connectionist network is trained using backpropagation,
a number of input patterns are presented to the network, usually
after the weights of the network have been randomized. For each
pattern, the network will produce some response which is then
compared to a desired response. This enables changes to be made
to the weights between the layers of processing units in the network.
The changes are made such that the network will respond more closely
to the desired response the next time it receives the same input,
or set of inputs. Assuming that a network successfully learns
a problem, the network will at least produce the desired response
when presented with every input pattern in the training set.
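The procedure just described can be summarized in a minimal Python
sketch. The architecture, learning rate and sigmoid activation function
used here are illustrative assumptions rather than features of any
particular study:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def add_bias(x):
        # Append a constant input of 1, giving each layer a bias weight.
        return np.hstack([x, np.ones((x.shape[0], 1))])

    # Training set: XOR, a simple problem which is not linearly separable.
    inputs  = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    targets = np.array([[0], [1], [1], [0]], dtype=float)

    # The weights of the network are randomized before training begins.
    w_hidden = rng.uniform(-1, 1, size=(3, 4))  # 2 inputs + bias -> 4 hidden
    w_output = rng.uniform(-1, 1, size=(5, 1))  # 4 hidden + bias -> 1 output

    lr = 0.5
    for epoch in range(20000):
        # Forward pass: the network produces a response to each pattern.
        hidden = sigmoid(add_bias(inputs) @ w_hidden)
        output = sigmoid(add_bias(hidden) @ w_output)

        # The response is compared to the desired response ...
        error = targets - output

        # ... and the weights between the layers are changed so that the
        # next response to the same inputs lies closer to the target.
        delta_out = error * output * (1 - output)
        delta_hid = (delta_out @ w_output[:4].T) * hidden * (1 - hidden)
        w_output += lr * add_bias(hidden).T @ delta_out
        w_hidden += lr * add_bias(inputs).T @ delta_hid

    # After successful learning, the network produces (approximately)
    # the desired response for every pattern in the training set.
    print(np.round(output, 2))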
In order to learn a particular problem, a network has to find
regularities in the input data which will enable it to produce
the correct output for the problem being trained. The exact nature
and kind of regularities will depend upon the precise problem
being trained. However, for many interesting problems (those which
are non-linearly separable), finding the necessary regularities
requires some kind of recoding of the input information. In order
for a network to be able to do this, it requires hidden processing
units (Clark and Thornton, 1997). The role of the hidden units
is to recode the inputs so as to make it possible for the network
to solve the problem. The hidden units can thus be conceived of
as devices which serve to detect input properties that are
important to the solution.
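Continuing the sketch above, this recoding can be inspected directly.
Tabulating the hidden unit activations for each pattern shows how the
linearly inseparable XOR inputs are re-represented in a form from which
the output unit can solve the problem (the exact values will vary from
run to run):

    # Each hidden unit can be read as a detector of some input property;
    # together, their activations recode the inputs into a representation
    # from which the output unit can compute the correct response.
    hidden = sigmoid(add_bias(inputs) @ w_hidden)
    for pattern, activation in zip(inputs, hidden):
        print(pattern, "->", np.round(activation, 2))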
It also turns out that, for some tasks, we can know in advance,
a priori, some of the input properties a network will have to
become sensitive to in order to solve a particular problem set. Consider
the case of a network which has to learn to distinguish valid
from invalid instances of a set of simple arguments. If the training
set contains a range of connectives, then the network would have
to take into account the main connective of a problem, so as to
be able to distinguish, for example, an invalid instance of a
Modus Ponens inference from a valid instance of a Disjunctive
Syllogism inference. A network successfully trained on just this
kind of problem has been described by Bechtel and Abrahamsen (1991).
Subsequent analysis of a network which was successfully trained
upon Bechtel and Abrahamsen's problem set revealed that the network
had indeed learned to become sensitive to exactly this input property
(Berkeley et al. 1995).
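The general form of this style of analysis can be sketched as follows:
record each hidden unit's activation for every pattern in the training
set, then examine the resulting distributions for distinct clusters,
or 'bands', which may then be given an interpretation. The sketch below
is a generic illustration of the idea, assuming a trained network and
a reasonably large training set; it is not a reconstruction of the code
used in the original study:

    import matplotlib.pyplot as plt

    def plot_activation_densities(activations):
        # 'activations' is assumed to be a (patterns x hidden units)
        # array recorded from a trained network, for example:
        #   activations = sigmoid(add_bias(training_inputs) @ w_hidden)
        # Distinct bands in a unit's plot suggest that the unit has
        # become a detector of some definite input property.
        for unit in range(activations.shape[1]):
            plt.figure()
            plt.hist(activations[:, unit], bins=25, range=(0.0, 1.0))
            plt.title("Hidden unit %d activations over the training set" % unit)
            plt.xlabel("activation")
            plt.ylabel("number of patterns")
        plt.show()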
If we know that hidden units function as input property detectors
and, in some instances, we can even predict the kinds of properties
they will have to detect in order to solve certain problems,
then connectionist networks can be conceived of (in principle
at least) as offering the means to discover sets of properties
which, in combination, can solve particular problems. However,
given the discussion above of the potential strong/weak equivalence
relations which can hold between computational systems and biological
ones, there may be legitimate grounds for wondering exactly why
this conclusion should be taken as being of particular interest
to researchers interested in cognition. After all, why would there
be grounds for thinking that a network will find a solution to
a particular problem which is strongly equivalent, rather than
weakly equivalent?
On the one hand it has been argued that if we want to learn and
make justified inferences about biological cognition from computational
models, we really need to ensure that the models we draw inferences
from are strongly equivalent to the biological cognitive systems
of interest. On the other hand, it has been argued that we may
be able to draw inferences about the general class of systems
with certain apparently cognitive capacities, by studying only
weakly equivalent systems. It has also been argued that connectionist
networks, which are trained using the backpropagation training
procedure, offer a means of determining the sets of input properties
which are important to solving particular problems. Yet, as the
number of algorithms for solving particular problems is in principle
intractably large, it would seem likely that the properties discovered
by the hidden units of individual backpropagation networks could
reveal very little about the space of plausible algorithms in
general. This does not seem to be a happy conclusion for those
cognitive scientists who build and study backpropagation trained
networks. However, I now want to make a case that things may not
be as grim as they may at first appear.
Biology and Bias
In order for the argument to proceed further, it is necessary
to bring in another premise. Gould (1980: p. 26) cites François
Jacob as the source of the aphorism that "Nature is an excellent
tinkerer, not a divine artificer". The crucial point here
is that the evolutionary process is not one which produces perfect
solutions, in some sense, to particular problems. Rather, evolution
tends to develop solutions to problems which work, even if the
solutions themselves are sub-optimal from a design perspective.
Gould (1980) argues this point by citing the pseudo-thumb of the
Giant Panda, along with several other examples.
As there is a potential for some confusion at this point, it is
worth pausing briefly to consider two ways in which solutions
to problems can be evaluated. The first way to think of a solution
to a particular problem or set of problems is from what I term
(for want of a better term) a 'design' perspective. Suppose some
divine artificer were to wish to provide a Giant Panda with a
means of holding bamboo shoots. Such an artificer would presumably
be in a position to design a solution which would be as perfect
as possible, in terms of simplicity, efficiency, robustness and
so on, given the constraints of the problem at hand. Note though
that the artificer (being divine) would not be constrained with
respect to the materials from which the solution could be fashioned.
Perhaps in the instance of the Panda's Thumb, an extra digit would
be the best way of solving the problem. Compare this to the actual
solution developed to the problem through the evolutionary process.
In the case of the Giant Panda, the pseudo-thumb is actually created
by the extension of a bone in the wrist. Such a solution respects
the fact that there are only certain resources available from
which the additional functionality can be derived. However, it
may well be the case that such a solution is not as advantageous
as the option of simply adding an extra digit, and consequently
may be judged to be 'sub-optimal' when compared to a 'designed'
solution.
The relevance of this premise to the current issue is that it
suggests something about the kinds of models which are likely
to be strongly equivalent. Presumably, what is true of the evolution
of parts of the body is also likely to be true of the mind and
brain (cf. Cosmides and Tooby, 1994). If biological bodies are
quirky and sub-optimal in some respects when considered from a
design perspective, then it is not unreasonable to assume that
the structures which govern biological cognition are similarly
idiosyncratic. However, if this is the case, it would seem that
there is a very real problem which has to be faced by researchers
who are attempting to model cognition. The problem is to find
a way of generating models with the appropriate kinds of idiosyncrasies
(whatever they may be).
In practical terms, dealing with this problem is not too easy.
The reason for this is that the standard process of training which
researchers go through at the undergraduate and the graduate level
is antithetical to idiosyncrasy. When a person takes their first
class in programming, one of the first lessons learned is always
to try to find 'elegant' solutions to problems. Similarly, in a
logic class students are often penalized for deriving proofs which
are overly long or clumsy, even if the proofs themselves do not
contain any erroneous application of the rules of inference. Analogous
examples can be found easily enough in almost any discipline.
The point is that, through the formal process of education, most
researchers are conditioned to do the exact opposite of what seems
to be suggested by the evolutionary record. We are trained to favor well designed
solutions over sub-optimal, kludgy ones. Thus, it is difficult
(or at least highly counter-intuitive) to figure out ways of constructing
models of cognitive function which are idiosyncratic.
It should be made plain though that the claim here is not that
researchers cannot produce cognitive models of the appropriate
kind. It is just that doing so is not a straightforward or obvious
process. There is one exception to this though. If a researcher
leaves the selection of key features of a model to some mechanical
process, then the tendency to avoid certain types of solutions
to problems (i.e. 'messy' solutions) can be overcome. Provided
that a model meets the minimum requirement, that it performs in
a manner which is at least weakly equivalent to the biological
system which it is supposed to emulate, then the optimality of
the solution deployed is not initially an issue. The proposal
I wish to make here is that this situation may, in fact, end up
favoring the kinds of solutions to cognitive problems discovered
by connectionist networks trained using the backpropagation learning
procedure. This is because when the hidden units of a network
trained using backpropagation become sensitive to certain input
properties whilst learning to solve a problem set, there are
no prior constraints upon the selected set of input properties,
other than the fact that they must serve to solve the problem
at hand.
There is an immediate and obvious objection to this proposal: "Doesn't
this end up putting cognitive scientists who train connectionist
networks using backpropagation into a position of effectively
looking for a proverbial needle in a haystack, when it comes to
finding models which can be informative about biological cognition?"
The fact noted earlier, that there are potentially a very large
number of algorithms for computing a particular function, seems
to suggest that this will be the case. Although this objection
seems plausible at first, there are prima facie reasons
to believe that, in practice, it may not actually present as much
of a barrier to progress as it initially appears.
The first response to this objection is to note that it is by
no means clear that there actually will be a large number of algorithms
for computing a particular cognitive function. It may turn out
to be the case that there are comparatively few, or even just
one. This is ultimately a type of question which needs to be treated
empirically. For example, Berkeley et al. (1995) trained
a network to, amongst other things, determine validity for a set
of logic problems originally studied by Bechtel and Abrahamsen
(1991). The detailed analysis of this network revealed that the
network had developed 'rules' which were in many instances close
analogues to the classical rules of natural deduction (cf. Bergmann,
Moor and Nelson 1990). Perhaps the traditional rules of inference
are the only way of successfully determining validity. Similar
circumstances may arise with other aspects of cognition too.
The second response to the objection depends upon understanding
the role of hidden units in trained networks as functioning as
detectors of input properties which are needed to solve the particular
set of problems at hand. If it is determined empirically that all
networks which learn to solve a particular set of problems are
sensitive to some particular set of input properties, then there
may be grounds for hypothesizing that biological cognitive agents
are sensitive to the same properties. This is just the kind of
hypothesis which could be (at least in principle) verified by
conducting studies on biological subjects. The network methodology
would act as a means of generating hypotheses about the particular
function in question.
The third response to the objection is that there are a number
of performance criteria, such as the ability of networks to generalize
to new data, which could easily be deployed in order to determine
the effectiveness of the solution to a problem found by a network.
This too would offer a ready and easy means of determining which
algorithms were worthy of further study and which were not. In
addition, all researchers, be they interested in connectionist
modeling or modeling in other ways, have a duty to compare the
behaviors of their systems with the behaviors of biological systems,
before making claims about biological cognition on the basis of
their models. Moreover, the behaviors of the system should include
more than just the system's behavior on the task explicitly at
hand. That is to say, emergent behaviors of the systems should
also be considered and assessed. This, after all, is one of the
crucial (though regrettably, all too often overlooked) steps in
determining whether or not a model is strongly equivalent to biological
systems. This equivalence is required in order to justify direct
inferences based upon any kind of computational model. These kinds
of considerations would assist in determining which models provided
good evidence and which did not, thereby limiting the size of
the algorithmic space which needed to be investigated.
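The first of these criteria is easy to state concretely: train a
network on one portion of the available data and score it on patterns
it has never seen. The following is a minimal sketch, in which
'forward' stands for a trained network's input-output function and
the training and test sets are assumed to be disjoint:

    def accuracy(forward, inputs, targets):
        # Fraction of patterns on which the network's thresholded
        # response matches the desired response.
        predictions = (forward(inputs) > 0.5).astype(float)
        return float((predictions == targets).mean())

    # A solution worth further study should score well on patterns
    # withheld from training, not merely reproduce the training set:
    #   print(accuracy(forward, train_x, train_y))
    #   print(accuracy(forward, test_x, test_y))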
Of course, there is no guarantee that any of these responses would
be helpful in the case of all particular cognitive functions.
Whether or not this was the case is a matter which would have
to be determined empirically. The point which needs to be appreciated
here is that the objection is not fatal to the proposed research
strategy, until such time as some studies have been done and some
evidence collected. Moreover, adopting this methodological strategy
provides a viable and adequately justified role for connectionist
research using backpropagation.
Conclusion
In the above, I have attempted to sketch a means of justifying
connectionist research using the backpropagation learning procedure.
The case has briefly been made that we may be able to use models
of this kind, when weakly equivalent, to discover salient facts
about cognitive systems in general. It has also been suggested
that it is possible that connectionist systems may be able to
do more than this. However, throughout the argument, there has
been one assumption made which needs to be made explicit. This
is the assumption that there is some viable means of determining
exactly which input properties the hidden layer of processing
units becomes sensitive to, when a network has learned to solve
a problem. At the current time, this is a controversial and problematic
issue (see McCloskey 1991), which cannot be discussed further
here. However, assuming that connectionists can use backpropagation
networks to recover sets of properties which can solve particular
cognitive problems of interest, then it seems that they have some
justification for their methodology.
Bechtel, W. and Abrahamsen, A. (1991), Connectionism and the
Mind, Basil Blackwell (Cambridge, Mass.).
Bergmann, M., Moor, J. and Nelson, J. (1990), The Logic
Book, McGraw-Hill (New York).
Berkeley, I., Dawson, M., Medler, D., Schopflocher, D., and Hornsby,
L. (1995), "Density Plots of Hidden Unit Activations Reveal
Interpretable Bands", in Connection Science, 7, pp.
167-186.
Brassard, G. and Bratley, P. (1988), Algorithmics: Theory and
Practice, Prentice Hall (New York).
Churchland, P. M. (1989), The Neurocomputational Perspective:
The Nature of Mind and the Structure of Science, MIT Press (Cambridge,
Mass.).
Churchland, P. S. and Sejnowski, T. (1992), The Computational
Brain, MIT Press (Cambridge, MA).
Clark, A. (1989), Microcognition: Philosophy, Cognitive Science
and Parallel Distributed Processing, MIT Press (Cambridge,
Mass.).
Clark, A. and Thornton, C. (1997), "Trading Spaces: Computation,
representation, and the limits of uninformed learning", in
Behavioral and Brain Sciences, 20, pp.57-90.
Cosmides, L. and Tooby, J. (1994), "From Function to Structure:
The Role of Evolutionary Biology and Computational Theories in
Cognitive Neuroscience", in Gazzaniga (1994).
Cummins, R. (1989), Meaning and Mental Representation,
MIT Press (Cambridge, Mass.).
Dennett, D. (1978), Brainstorms: Philosophical Essays on Mind
and Psychology, Bradford Books (Montgomery, VT).
Dennett, D. (1991), Consciousness Explained, Little, Brown and
Co. (Boston, MA).
Dennett, D. (1995), Darwin's Dangerous Idea: Evolution and
the Meanings of Life, Simon & Schuster (New York).
Gazzaniga, M. (Ed.) (1994), The Cognitive Neurosciences,
MIT Press (Cambridge, MA).
Gould, S. (1980), The Panda's Thumb: More Reflections in Natural
History, Norton & Co. (New York).
Grossberg, S. (1987), "Competitive Learning: From Interactive
Activation to Adaptive Resonance", in Cognitive Science,
11, pp. 23-63.
Haugeland, J. (1985), Artificial Intelligence: The Very Idea,
MIT Press (Cambridge, Mass.).
McClelland, J., Rumelhart, D. and Hinton, G. (1986), "The
Appeal of Parallel Distributed Processing", in Rumelhart
et al. (1986: pp. 3-44).
McCloskey, M. (1991), "Networks and Theories: The Place of
Connectionism in Cognitive Science", in Psychological Science,
2, pp. 387-395.
Pylyshyn, Z. (1984), Computation and Cognition, MIT Press
(Cambridge, MA).
Quinlan, P. (1991), Connectionism and Psychology: A Psychological
Perspective on New Connectionist Research, University of Chicago
Press (Chicago, IL).
Rumelhart, D., McClelland, J. and The PDP Research Group (1986),
Parallel Distributed Processing: Explorations in the Microstructure
of Cognition, (2 Vols.), MIT Press (Cambridge, Mass.).
Sterelny, K. (1990), The Representational Theory of Mind,
Blackwell (Oxford).