Hearing Yourself Think:

natural language, inner speech and thought

by David Cole

Draft version 8-8-97

"Mantras were not viewed as the only means of expressing truth, however. Thought, which was defined as internalized speech, offered yet another aspect of truth. And if words and thoughts designated different aspects of truth, or reality, then there had to be an underlying unity behind all phenomena" (S. A. Nigosian 1994: World Faiths, p. 84)

The claim I will argue for is a bit less grand: namely that much thought is indeed internalized speech, and so there is an underlying unity for at least some higher cognitive capacities: overt language use (speaking and speech comprehension), and thinking abstractly. Here I will not set out the case for the thesis identifying thought with internalized speech -- that case depends on introspection and results of psychological experiments (summarized briefly below). In the following I want primarily to do two things: to reply to three objections to the inner speech account of thought, including two pressed by Jerry Fodor, and to speculate about why inner speech would be the form abstract thought takes.

First a bit more on the thesis: much human thought, especially abstract thought, thought about the past and future, planning and theorizing, takes place in inner speech. This inner speech is imagistic. Current evidence supports the view that this speech imaging consists both of auditory images of spoken natural language and of subvocalization, which includes kinesthetic imaging of speech production. The former involves processes used in speech comprehension, the latter processes used in speech production.

The thesis has several historical antecedents, but differs significantly from them: in philosophy, Descartes and the British Empiricists largely conceived of thought as imagistic -- but they conceived of thought almost exclusively in terms of visual imaging, and treated natural language simply as a medium of transmission of thought. Words named and encoded ideas, and ideas were not linguistic. Some of the behaviorists, notably Watson, hypothesized that much human thought was subvocalization. But Watson was vehemently opposed to images, and held imaging had no place in psychological explanation. Because of the more recent ascendancy of mentalese accounts of thought, on the one hand, and connectionism, with its focus on low-level subsymbolic neural activity and pattern recognition, on the other, the role of natural language in thought is not currently duly appreciated, at least among philosophers.

Arguments against natural language as the medium of thought

In The Language of Thought, Jerry Fodor considers the suggestion that natural language is the language of thought -- the language we think in.

He says: "The only thing wrong with this proposal is that it isn't possible to take it seriously." (Language of Thought (hereinafter, LOT) p. 56).

Against the supposition that we think in natural language Fodor offers two arguments. The first (LOT pp. 56-58) is just the consideration that beasts think. Animals that do not speak nevertheless think -- they learn, solve problems, understand salient facts about their environment. The second argument is an argument from learning language. This second is a type of transcendental argument.

Fodor considers the thought-is-inner-speech thesis presumably because he considers it the strongest and most natural rival to his own thesis, that thought takes place in the medium of an innate non-natural language, Mentalese. The thought as inner speech hypothesis satisfies many of Fodor's desiderata for an internal representation system -- first, it is one, and it is productive, infinitely extensible, etc. But Mentalese is quite different from natural language. We are never consciously aware of Mentalese in thinking; it is the vantage point from which we represent everything. So the evidence for Mentalese must be indirect. Fodor then needs to counter the view that we think in plain vanilla natural language -- a view with a certain appealing unity: thinking is talking to oneself, and speech is thinking out loud. The alternative Fodor advocates involves a linguistic dualism: thought is in mentalese, speaking consists of translation into public language. Language comprehension is the reverse process. We're all polyglots.

Let's consider Fodor's arguments in turn.

The argument from animals has an obvious limitation -- it is an argument from species other than our own. Nothing follows about humans without a claim about the universality of nonlinguistic representation systems. If subhuman animals can fly, and don't use airplanes, nothing much follows about how humans fly. Nevertheless, I am sympathetic to Fodor's point. I assume that human cognitive capacities are largely an overlay on capacities that are similar to those of other animals, especially those phylogenetically close.

Still, the point would only apply to _some_ thought, and most likely the least abstract. It is not yet clear just what the cognitive capabilities of lower animals are, but the bottom line is clear -- they are strikingly limited. Perhaps humans are not unique in using tools, but there is a very substantial gap between the tool using abilities of humans and the most proficient apes. So it may be that the abilities shared with animals account only for very concrete problem solving tied to current sensory stimulation -- not to abstract thought, conscious thought, or, to the point, the abilities that specifically underlie language use. In short, Fodor's first argument would count only against the claim that all thought, in any species, was in human natural language. No one is claiming that. The present claim is just that people do think in natural language -- some thought, especially abstract thought not tied directly to current perception, has as its medium natural language.

Fodor's second argument cuts deeper because it does support the thesis that very substantial prelinguistic abilities underlie all human linguistic ability and language acquisition. Fodor's thesis here (LOT pp. 58-64) is that language learners must form hypotheses about the meaning of terms in their acquired language. So they must have the wherewithal to represent the truth conditions of the terms of natural language. Thus Fodor defends the position that Augustine appears to take in the brief passage Wittgenstein uses as a foil in the beginning of the Philosophical Investigations.

Fodor repeatedly points to the situation with computers. Computers have a built-in machine "language": a set of operations that are performed whenever certain binary strings appear in the right place -- the syntactic (electrical) properties of the binary code cause certain events to occur such as addition, equality checking, conditional jumps to new sections of the binary code, and so forth. This is the native instruction set of the computer. The programming languages that modern computer programmers use are much higher level (and more mnemonic for humans); they are translated into the machine language of the computer by special programs, the compilers or interpreters. The same computer can run programs written in several different programming languages -- as long as it has a compiler that allows it to "understand" the programming language -- that is, translate it into its native machine code. Fodor appears to take this as the model for understanding human language -- natural spoken and written languages such as English are like Fortran and C++; they must be converted to a native language to be understood or used in thought, inference, etc.

In a computer, the machine language strictly sets the boundaries of the computational capabilities of the machine -- if a problem can't in principle be solved by a program written directly in the machine language of a particular computer, it can't be solved by any higher level language program running on that machine either. The higher level programming languages do not create any new computational capacities. Fodor has concluded the same applies to humans -- all of the cognitive capacities of the human are in the language of thought; all the conceptual capacities are implicit in the representational capacities of the innate code, mentalese.
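The relation between a higher level language and a native instruction set can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the four "native" instructions, the nested-tuple notation standing in for a higher level language, and the names run and compile_expr are hypothetical and correspond to no real machine or compiler. The point it shows is only that the translator adds convenience, not capacity: whatever the higher level notation computes is computed by the same four native operations.

```python
# Toy illustration only: a hypothetical four-instruction "machine language"
# and a tiny translator from a higher level notation into it.

def run(program):
    """Execute native instructions on a stack; return the top of the stack."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "EQ":
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a == b else 0)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack[-1]

# Mapping from higher level operators to native instructions (hypothetical).
HIGH_LEVEL_TO_NATIVE = {"+": "ADD", "*": "MUL", "==": "EQ"}

def compile_expr(expr):
    """Translate an int or a nested (operator, left, right) tuple into native code."""
    if isinstance(expr, int):
        return [("PUSH", expr)]
    operator, left, right = expr
    return (compile_expr(left)
            + compile_expr(right)
            + [(HIGH_LEVEL_TO_NATIVE[operator],)])

# "Is 2 + 3 * 4 equal to 14?" written in the higher level notation.
source = ("==", ("+", 2, ("*", 3, 4)), 14)
native = compile_expr(source)   # a flat list of native instructions
result = run(native)            # only the native operations ever execute
```

The higher level notation is easier for a human to write and read, but run never sees it; it sees only PUSH, ADD, MUL and EQ.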

At first blush, this seems an incredibly strong claim. Every concept that I have acquired, every concept that I will acquire, every possible concept that any human might ever acquire is implicit in the inborn conceptual set.

But let us stick with the machine analogy for a bit longer. As far as we know, machines can potentially produce all the overt linguistic behavior of humans, and so provide all the evidence of understanding, cognitive capacity, and inference potency of humans. At least, even critics of the cognitive capacity of machines, such as Searle, concede this, for the sake of argument (such critics don't think the programmed machine can have "genuine" understanding, but this is a nicety I won't pursue here). All that is needed for a computationalist or functionalist such as myself, and I think Fodor, is that the machine support the same counterfactuals describing behavior, overt inferences reflected in output, etc.

But a computer programmed to have overt cognitive capacity equivalent to humans will still have the same native capacity as it had before being programmed. And what is that capacity? On the order of about 100 syntactic operations. That is the entire native capacity for a machine that runs expert systems that deal with the niceties of blood diseases, petrogeology, etc.

A grammar for English, and a vocabulary larger than that of any human English speaker, along with orthography and lexical definitions, can sit in an off-the-shelf laptop computer. The reason is this: we can build representations, models, of just about anything, including the syntax and semantics of English, from a very small set of logical primitives. Recursion makes simple operations go a (very) long way.

So I think we can concede the nativist point, put this backhand way: Humans take linguistic strings as input and produce linguistic strings as output. They must have the native capacity to do that. They must have the native capacity to represent the rules by which those transformations take place -- that is, they must be able to produce the (possibly rule-governed) syntactic operations that convert linguistic input into output. But there is no evidence so far that this involves more than a handful of basic string manipulation operations -- just as in computers. All that must be represented is a series of syntactic steps that will transform string A into string B.

(The point is even stronger on connectionist models -- all learning is changing the weightings in an innate network.)

An argument against mentalese as the language of thought

As I understand the theory that invokes mentalese, we have innate ideas of, say, carburetor, justice, charisma, etc. -- or at least the concepts which by composition can form the latter. Now I don't suppose that any non-human animals can have any of these concepts -- but why not? Presumably on the mentalese account, there are some holes, as compared to humans, in their innate conceptual structure -- but what are the missing pieces? What innate concepts do humans have that dogs lack that prevent the latter from acquiring a concept of carburetor or binary number system? We need a much more complete account of conceptual composition than we have to make the mentalese position plausible.

As I understand the mentalese theory, mentalese is a representational system capable of supporting all human inference. Natural language is essentially a communication code. As mentioned above, this view of the relation of natural language to thought was shared by Descartes and Locke. All thoughts thinkable could be thought without the acquisition of natural language - although one would have to keep them to oneself.

But this view raises the following expectation: all thinking would be in unconscious mentalese, the native code. Language would then play only the role of communication medium, much as Morse code did for telegraphed messages. A module would translate say English into mentalese, one would think things through, then an output module would produce English code. Interference with those input and output modules would have no effect on thought itself.

But that's not the way things are. Both phenomenology and experiment are relevant here. If mentalese were the language of thought, natural language would not play a role in thought. Since the representational power of mentalese is at least as great as that of natural language, and it is native, the machine language, the role of natural language would be the same as it typically is in a computer program -- confined to the interface with the external world. But natural language appears to play an integral role in at least some thought. Interference with imaging or subvocalization interferes with cognition. And it appears to play a role as an acoustic image of natural language. Too much noise, and one can't hear oneself think. On the mentalese hypothesis, this is totally mysterious -- images are generally hard to process, especially compared to symbolic codes designed to support inference. And since on the Mentalese account, thought takes place in a distinct medium from public language, interfering with subvocalization, as by involving articulatory musculature in other tasks, should have no effect on memory, list recall, or other cognitive tasks. Interference with auditory imaging, as by requiring subjects to image a musical tune, should not affect cognition if it is carried out in non-imagistic Mentalese. But many experiments, as well as ordinary experience, display just such interference.

The argument here parallels an argument against dualism. Dualism allocates cognitive function to a non-physical entity. The physical acts only as input-output handler for Res Cogitans. But this flies in the face of myriad troubling empirical evidence -- alcohol does indeed affect input-output -- but it also essentially affects allegedly non-physical processes, such as inference. If function were allocated as the (Cartesian) dualist would have it, alcohol could only affect input-output coding, not thought itself (or personality, will, etc.). A Cartesian dualist would expect alcohol to be of use in a rape -- but not a seduction.

And alcohol is just one of a long series of embarrassments for anyone who seriously thinks thinking ain't in the head. Other drugs, cerebral hemorrhages, brain injuries etc. have effects that point to the essential role of the brain in all cognitive function.

Same for mentalese -- if thinking is in mentalese rather than natural language, then events that impair linguistic function or auditory imaging should not affect thinking. But they do.

Another problem with the Mentalese hypothesis is that it has the apparent consequence, in relegating natural language to a mere transmission medium inessential to thought, that pre-linguistic humans should be fully capable of, say, coming up with the theory of General Relativity, or a Mentalese equivalent of Hamlet, without any recourse to or experience with natural language or mathematical symbolism. They aren't.

Unconscious Thought

Someone might concede that natural language, in the form of conscious acoustic imagery, plays an essential role in conscious thought. But much thought is not conscious. Acoustic imagery can't play a role there -- unconscious thought, at least, is the domain of mentalese.

But this inference depends on the supposition that all images must be conscious, and so images cannot play a role in unconscious thought. The images that we are self-aware of having are ipso facto conscious, but the representations that underlie the conscious images need not be. That is, I might have an internal representation of the acoustic form of, say, the English "All men are created equal". If I am conscious of that occurrent representation, I experience the acoustic image. But I might have the image, the non-propositional internal representation, while not being conscious of having the representation. It would still be an acoustic representation or image in that it would encode acoustic properties of the sentence. And, to the point, these might be causally efficacious in producing the transformations that lead to the next thought, and finally my overt behavior. Thus unconscious thought processes might involve acoustic or subvocalization images.

A third objection to thinking in natural language: ambiguity

This objection is a point made somewhere, and in another context, by Zeno Vendler; no doubt others have noticed it as well. Thought isn't ambiguous; natural language often is. If I think "I better go to the bank", it will not be ambiguous what sort of bank I am considering - or even which bank, among all the financial institutions. If I say the same thing, it may well be ambiguous.

True enough. Spoken or written natural language is often ambiguous; thought isn't. If thought were acoustic images of natural language, what could disambiguate it? The answer, I think, lies in the fact that the images are embedded in an underlying production process. When I mutter "I had better go to the bank", the uttered "bank" is not ambiguous for me; I do not wonder where it is I say I should go. I am disposed to go to a particular financial institution, the home of my checking account, the one I was at last week, the one I go by 2 blocks before I get to the grocery, etc. I have all these associations with "the bank". And my internal tokening is causally connected in analogous ways with one bank rather than another -- just one bank is the bank I withdrew money from a week ago, and is the one I went by as I was 2 blocks from the grocery, etc. In sum, the ambiguity of natural language is an epistemic problem that may afflict auditors, but does not ordinarily afflict language producers -- whether their tokening is public or not. Whatever the correct account of speaker meaning determination is, it appears it can account for internal tokenings as well as public ones.

Whorf Hypothesis

The Whorf hypothesis is that natural language affects thought. I have never found the hypothesis congenial, yet it appears to be a direct consequence of the claim that natural language is actually the medium of (much) thought.

Some easing of the psychological tension is possible. For one thing, Whorf seems to have been preoccupied with the lexicon -- with whether a language had distinct words for different phenomena, e.g. types of snow, or not. But it seems unwise to think of the expressive power of languages in terms of lexicon count. What does it matter if users of one language distinguish powder snow from hardpack snow from fluffy snow from wet snow, and another language has a single word for each of these? Or one speaker distinguishes bluish green and another has the lexical item bluegreen. And these things change over time -- yesterday's self-contained underwater breathing apparatus is today's scuba.

At the same time, I now think there is clearly some form of Whorfian hypothesis that is correct. What you can say affects what you can think. Mastery of the vocabulary of, say, algebra makes possible thoughts that, at least in practice, are impossible to someone who has not learned this extension of natural language. Learning the vocabulary of data processing is necessary for entertaining certain thoughts about computers and computation. Thoughts about virtual machines and program verification etc. depend upon mastery of certain ways of speaking. And non-euclidean geometries and other public symbol systems make the General Theory of Relativity possible. (My view here is in line with but stronger than Michael Devitt and Kim Sterelny, 1984, Language and Reality, Chapter 10 "Linguistic Relativity". They still accord "ultimate priority" to thought (p. 174), and hold that extensions of language can facilitate, but are not essential for, any domain of thought.)

What these areas have in common is that there are no good alternatives -- anyone who wants to understand algebra, or data processing, learns essentially equivalent extensions of language. Some versions of the Whorf hypothesis concentrate too much on superficial features of language, like the number of distinct single words, or how tenses are marked. But this much truth remains: the absence of entire domains from a public language (or an idiolect) will be reflected by corresponding cognitive deficiencies.

Why do we do it?

If one thinks that at least some thinking takes place in natural language, and one thinks of this phenomenon, as I do, as acoustic (or kinesthetic) images, it is natural to wonder: why do we do it? The answer here is pure speculation, but I hope it has a certain appeal. In mastering natural language, one learns to respond appropriately to speech. One hears something, and responds with appropriate questions about the implications of what has been said, or with objections, or with concurrence. It's a cheap evolutionary trick to appropriate the social skill of responding usefully to spoken speech to one's own private use in thought. We harness our ability to respond to spoken language by simulating spoken language internally.

We could do this by "thinking out loud" -- producing the speech and, upon hearing it, producing a followup and so forth. And so it is with small children. At least some of their thought is completely in the public domain, consisting of running patter. It's an acquired skill, like learning how to read without moving one's lips, that completely internalizes thinking out loud. But the underlying processes remain the same -- speech comprehension and production, with an intervening auditory image rather than physical sound. This allows us to chain and build on our single step ability to produce the next gambit in a conversation.

We can't go further, presumably as a limitation of our ability to process speech. Just as we can't identify the 5th note in the Star Spangled Banner without humming/singing/playing/imaging the first 4, we can't identify the implications, say the appropriate response to an appropriate response to an appropriate response to "The President signed a bill cutting capital gains taxes" without going through the intervening steps -- the sentences in natural language. We have a limited capacity system (maybe only so many connectionist layers) that can produce a followup, such as an implication, of a remark in spoken natural language. We exploit this limited system by recursion, feeding its output back as input. Learning to think abstractly is largely learning tricks with words.
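The chaining just described can be put in a toy sketch, offered only as an illustration of the structure of the proposal and not as a model of any actual cognitive process. The table of follow-ups is invented for the example, and the names follow_up and think_through are hypothetical. The single-step function can do no more than produce one follow-up to one remark; longer trains of thought arise only from feeding each output back in as the next input (written here as a loop).

```python
# Hypothetical one-step table: each remark is paired with a single follow-up.
# The entries are invented for illustration and carry no empirical weight.
FOLLOW_UPS = {
    "The President signed a bill cutting capital gains taxes":
        "Investors will keep more of what they gain",
    "Investors will keep more of what they gain":
        "Some investors will sell assets they had been holding",
    "Some investors will sell assets they had been holding":
        "More assets will come onto the market",
}

def follow_up(remark):
    """The limited-capacity step: one follow-up to one remark, or None."""
    return FOLLOW_UPS.get(remark)

def think_through(remark, max_steps=10):
    """Chain single steps by feeding each output back in as the next input."""
    chain = [remark]
    for _ in range(max_steps):
        next_remark = follow_up(chain[-1])
        if next_remark is None:
            break  # nothing further to say; the train of thought ends
        chain.append(next_remark)
    return chain

chain = think_through("The President signed a bill cutting capital gains taxes")
```

Note that there is no shortcut to the last item in the chain: the only way to reach it is to pass through each intervening sentence, which is the analogue of having to image the first four notes to get the fifth.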

Empirical Implications

There are currently many exciting developments in brain imaging. One of the implications of the inner speech model of thought is that the same centers will be involved in abstract thought as are involved in spoken language production and comprehension. In particular, the centers involved in abstract thought should, on my account, be those involved in forced inner speech, such as rhyme detection or syllable sequence detection in a recalled phrase (what's the third syllable in the first line of the Gettysburg Address, or in Twinkle-Twinkle Little Star?). Thus the thesis has empirical implications that currently await test.

There are other less technologically dependent implications, and some of these have been explored. Most interesting are previously mentioned studies exploring the nature of inner speech through interference phenomena, and studies of thought in the deaf. See Reisberg.

Conclusion: Philosophical Implications

Philosophy in the Twentieth Century has prided itself on moving away from the naive imagistic theories of thought characteristic of the British empiricists. It has moved to other things: overt behavior, ordinary language, the private language of mentalese, the subsymbolic processes of connectionism. But I think a baby is in the chucked bathwater. Much important human thought is imagistic -- but it is auditory and kinesthetic rather than visual. And it is linguistic. We learn the language of thought at our mother's knee -- it's our mother tongue.