Draft of Review that appears in August 2004 issue of Minds and Machines.
http://www.kluweronline.com/issn/0924-6495/contents


Yael Ravin and Claudia Leacock (editors),
{Polysemy: Theoretical and Computational Approaches},
New York: Oxford University Press,
2000, 
x+227 pp., 
$74.00 (cloth), ISBN 0-19-823842-8;
$24.95 (paper), ISBN 0-19-925086-3

Most words in natural language have more than one possible
meaning. This seemingly simple observation leads to tremendous
challenges in theoretical and computational linguistics, as is clearly
shown in a volume of ten newly commissioned articles entitled
"Polysemy: Theoretical and Computational Approaches", edited by Yael
Ravin and Claudia Leacock.      

Words may be thought of as occupying a spectrum of meaning, with
synonyms at one end and homonyms at the other. Synonyms are different
word forms that refer to the same concept, as when {bank} and
{shore} refer to the side of a river. A homonym is a single word form
that refers to multiple distinct concepts. For example, {bank} is a
homonym when it refers to a financial institution or to the side of a
river. These are completely distinct concepts that happen to be
represented by the same string of characters. While there are
clear-cut synonyms and homonyms in natural language, most words lie
somewhere in between and are said to be polysemous. {bank} is
polysemous when it refers to a financial institution or a blood bank,
since these are related, but not identical, meanings. 

The scope of work represented in this volume is impressive. Chapters
2, 3, 4, and 6 focus on linguistically oriented studies of lexical
semantics. Chapters 5, 7, and 8 offer critiques of current
dictionaries and lexicography, and Chapters 9, 10, and 11 present
computational approaches to representing word meanings. Given this
wide variety, the editors have very wisely provided an extensive
introduction in Chapter 1. This is invaluable to any reader who is not
familiar with lexical semantics, lexicography, or computational
modeling (and it will be an unusual reader who is expert in all three).    

The difficulty in making precise distinctions in word meanings is
taken up in Chapter 2, "Aspects of the Micro-structure of Word
Meanings", by D. Alan Cruse. Cruse argues that words can not be
defined independent of their context, and that the set of
context-invariant semantic properties of words is insufficiently large
to act as an adequate foundation upon which to specify the meanings of
words. Rather than relying on discrete cutoff points to characterize
the semantics of a word, he advocates a continuum or gradient scale
that allows word meanings to be related more flexibly. This article
sets the stage for many that follow, since the difficulty of making
precise distinctions in word senses is what makes lexicography a
challenging enterprise, and motivates much work in computational
modeling of word meanings based on evidence found in large corpora.     

An unusual form of polysemy is discussed in Chapter 3,
"Autotroponymy", by Christiane Fellbaum. Autotroponymy occurs when the
sense of a polysemous verb cannot be predicted from a more general
use of that verb. Fellbaum introduces this idea via the verb {behave}.
When used without modification, as in {The children behaved}, it means
{behaved well}. However, the addition of an adverb changes the meaning
in ways the general sense does not predict, e.g., {The children
behaved well/badly/impossibly}. While there are only a small
number of such verbs, their behavior is irregular and resists
characterization by any of the well-known rules of regular
polysemy. However, Fellbaum shows that there are generalizing
principles that govern this phenomenon.     

Chapter 4, "Lexical Shadowing and Argument Closure",  by James
Pustejovsky, also focuses on verbs, and identifies behavior that the
author calls lexical shadowing. This occurs when a verb carries
implicit information about one of its arguments, information that
would otherwise be expressed in a nearby noun phrase. Pustejovsky
gives the example of the verb {butter}, which carries with it implicit
information about the substance being spread. However, he goes on to
show that lexical shadowing is not restricted to verb-noun cognates
like {butter}, and occurs much more widely than previously thought.

Chapter 6, "'The Garden Swarms with Bees' and the Fallacy of 'Argument
Alternation'", by David Dowty, deals with argument alternation, which 
occurs when {Mary wrote that book} is made passive, as in {The book was  
written by Mary}. Dowty argues that conventional practice is to treat  
these as different representations of the same underlying meaning.  
However, he shows that these alternations can represent significantly  
different semantics.   

While the early chapters of this volume focus on lexical semantics and
linguistics, dictionaries and the practice of lexicography come under
scrutiny in the middle chapters. Bringing lexical semantics and
lexicography together in one volume is a significant contribution of
the editors, since these disciplines do not intersect as often as one
might expect. This may seem curious since both are concerned with
words and their meanings. However, the underlying methodologies are
quite different. Research in lexical semantics tends to be built around 
introspectively created examples that exhibit particular phenomena in
very precise ways, while the distinctions in meanings described in a
dictionary are based on the observed use of words in naturally
occurring text.  

Lexicography has had an empirical flavor since long before the advent
of computers and online corpora. For example, the development of the
Oxford English Dictionary was made possible by the work of thousands
of volunteer readers, who recorded interesting usages of words on
small slips of paper that were then sent to Oxford. The editors
would then categorize these usages relative to their current
understanding of a word, and determine if the existing inventory of
senses was adequate or if it needed to be expanded or revised based on
this new evidence. The process of creating the OED is colorfully
described in Winchester (1998), while a more general and scholarly
account of lexicography can be found in Landau (1984). Given the
long tradition of corpus-based work in lexicography, one might expect
that the fruits of this labor, i.e., our modern day dictionaries,
would be held up as exemplars for the computational linguistics
community, which has only recently embraced corpus-based
methods. Interestingly enough, all three articles that address
dictionaries and lexicography are somewhat critical.      

Chapter 5, "Describing Polysemy", by Charles J. Fillmore and
B.T.S. Atkins, examines four widely used dictionaries and shows
that the verb {crawl} is defined in different ways in each, and that
there is only one sense upon which all four dictionaries agree. While
they acknowledge the corpus-based nature of modern lexicography, they
point out that there are empirically observed usages of {crawl} that
are not included in these dictionaries. They suggest that the
commercial pressures faced by lexicographers make inconsistencies in
dictionary sense inventories inevitable. Fillmore and Atkins propose a
corpus-based methodology that would allow "comprehensive and
internally consistent" descriptions of word meanings to be
developed. They evaluate their approach with a fine-grained
cross-linguistic matching of the senses of the English verb {crawl}
with the French verb {ramper}.  

Chapter 7, "Polysemy: A Problem of Definition", by Cliff Goddard, raises  
another concern, that of ambiguous and circular definitions in 
dictionaries. Goddard critiques definitions of {wrong}, {love} and {send}  
as found in various commercial dictionaries. He goes on to show how such  
flawed entries might be more clearly written using the Natural
Semantic Metalanguage (NSM), a set of semantic primitives developed by
Anna Wierzbicka over the last 25 years. See Wierzbicka (1996) for a
detailed exposition of NSM.

Chapter 8, "Lexical Representations for Sentence Processing", by
George A. Miller and Claudia Leacock, suggests that dictionaries are not 
adequate theories of lexical semantics since they lack information
about how a word should be used in context. The authors point to a
study by Miller and Gildea (1987) showing that when a child uses a
dictionary definition to construct a sentence with an unfamiliar word,
the result is often incorrect. Miller and Leacock propose that an
adequate representation of context must include both the local context
in which a word occurs and the more general topical context of the
surrounding text. The efficacy of topical features is shown via their
own computational experiments, which attained reasonable, though not
optimal, results. They also point to earlier work by Kaplan (1955) and
Choueka and Lusignan (1985) showing that human readers require only a
few words of surrounding context to determine the meaning of a word.
They suggest that a fruitful area of future work is the combination of
local and topical contextual information, and in fact, while this
volume was in preparation, that work came to fruition and resulted in
state-of-the-art accuracy over a small set of words. See Leacock et
al. (1998) for further details.
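
The flavor of this combination can be sketched briefly. In the Python
fragment below, words within two positions of the target contribute
local features, every word in the window contributes a topical
feature, and a naive Bayes learner is trained over both. The training
sentences, the feature encoding, and the choice of learner are
illustrative assumptions of mine, not the system described by Miller
and Leacock or by Leacock et al. (1998).

    # Toy illustration of combining local and topical context features.
    # The data and feature scheme are hypothetical, not the authors'.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    def featurize(tokens, position, local_width=2):
        """Mark words near the target as local; all words are topical."""
        feats = []
        for i, tok in enumerate(tokens):
            if i == position:
                continue
            if abs(i - position) <= local_width:
                feats.append('LOCAL_%d_%s' % (i - position, tok))
            feats.append('TOPIC_%s' % tok)
        return ' '.join(feats)

    # Hypothetical training data: (tokens, index of 'bank', sense label).
    train = [
        ('she deposited cash at the bank on main street'.split(), 5, 'finance'),
        ('they fished from the bank of the river'.split(), 4, 'river'),
    ]
    X = [featurize(toks, i) for toks, i, _ in train]
    y = [sense for _, _, sense in train]

    model = make_pipeline(CountVectorizer(token_pattern=r'\S+'),
                          MultinomialNB())
    model.fit(X, y)
    test = 'he opened an account at the bank'.split()
    print(model.predict([featurize(test, 6)]))   # expected: ['finance']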

Dictionaries are imperfect, and the points raised in the  preceding
chapters are all important. However, in their defense lexicographers
might argue that discriminating among senses and writing definitions
that capture fine shades of meaning is more art than science, and that it   
might simply not be possible to provide a single authoritative set of  
meanings for a word. The inherent challenge of the task is compounded by  
practical considerations such as the intended audience and time and space  
limitations. Kilgarriff (1997) provides a detailed discussion of these  
issues from the lexicographer's point of view, and raises still other  
concerns.  

This volume concludes with three chapters that discuss computational
approaches to representing and discovering word meanings. 

Chapter 9, "Large Vocabulary Word Sense Disambiguation", by Mark
Stevenson and Yorick Wilks, presents an algorithm for word sense
disambiguation that assigns senses to words based on a combination of
individually weak information sources. These include part-of-speech
tags of neighboring words, key words from a wide window of surrounding
context, and definitions and subject codes taken from the Longman
Dictionary of Contemporary English. This approach is based on the well
established AI paradigm of combining weak information sources to
achieve strong results. Stevenson and Wilks report over 90% accuracy
in assigning senses to the words in five Wall Street Journal
articles, which comprise approximately 1,700 total words.  
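
The general idea is easy to convey in a few lines of Python. In the
sketch below, each knowledge source assigns a score to every candidate
sense, and a weighted sum selects the winner. The senses, scores, and
fixed weights are invented stand-ins; Stevenson and Wilks learn how to
combine their sources from data rather than weighting them by hand.

    # Sketch of combining weak information sources for sense tagging.
    # All numbers below are invented for illustration.

    def combine(source_scores, weights):
        """source_scores: {source: {sense: score}}; return best sense."""
        totals = {}
        for source, scores in source_scores.items():
            for sense, score in scores.items():
                totals[sense] = totals.get(sense, 0.0) + weights[source] * score
        return max(totals, key=totals.get)

    # Three weak sources voting on the senses of 'bank' in one context.
    scores = {
        'part_of_speech':  {'finance': 0.5, 'river': 0.5},  # uninformative
        'window_keywords': {'finance': 0.8, 'river': 0.2},
        'subject_codes':   {'finance': 0.7, 'river': 0.3},
    }
    weights = {'part_of_speech': 0.2, 'window_keywords': 0.5,
               'subject_codes': 0.3}
    print(combine(scores, weights))   # -> finance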

Chapter 10, "Polysemy in a Broad-Coverage Natural Language Processing
System", by William Dolan et. al., describes the MindNet project that
has been under development at Microsoft Research since the early
1990's.  MindNet seeks to represent the meaning words based on the
context in which they occur, in order to avoid using discrete sense
inventories as found in dictionaries. This allows for shades of
meaning to be represented, and is very much influenced by the work of
D. Alan Cruse, the author of Chapter 2. To achieve this representation,
large corpora of text are analyzed to identify contexts that
discriminate among the senses of a word. In addition to statistical
information extracted from large corpora,  MindNet represents context
using linguistic information obtained via parsing machine readable
dictionaries. As such, MindNet is a hybrid statistical-symbolic
system, and ultimately treats the meaning of a word as the most
closely matching context in the MindNet database.   
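
The underlying intuition can be reduced to a simple sketch: the
meaning of a new occurrence is the label of the stored context it most
resembles. The bag-of-words vectors and the tiny two-entry 'database'
in the Python fragment below are stand-ins of mine; MindNet itself
stores rich linguistic structures obtained by parsing, not flat word
counts.

    # Sketch of 'meaning as the closest matching stored context'.
    # The contexts and labels are hypothetical.
    from collections import Counter
    from math import sqrt

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a if w in b)
        norm = sqrt(sum(v * v for v in a.values())) * \
               sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    # A tiny database of contexts in which 'bank' has been observed.
    database = {
        'deposit money in a savings account': 'finance',
        'the muddy edge of the river': 'geography',
    }
    vectors = {ctx: Counter(ctx.split()) for ctx in database}

    def meaning(context):
        """Return the label of the most similar stored context."""
        query = Counter(context.split())
        best = max(vectors, key=lambda ctx: cosine(query, vectors[ctx]))
        return database[best]

    print(meaning('she opened a savings account'))     # -> finance
    print(meaning('reeds grew along the river edge'))  # -> geography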

Chapter 11 "Disambiguation and Connectionism", by Hinrich Sch\"utze,
is also based on the idea that the meaning of a word depends on the
context in which it occurs. Specifically, it follows the hypothesis of
Miller and Charles (1991) that a sense of a word is "simply a group of
occurrences with similar contexts." Schütze
describes a method of context group discrimination that clusters word
meanings together by relying on contextual information that can
be extracted from a large corpus of text. He also shows how this
technique can be used to improve the performance of an information
retrieval application. This evaluation is noteworthy in that it shows
the effect of word sense discrimination on a real world task, whereas
much work in this area is evaluated as if disambiguation were the end
product. In fact, grouping related words together and assigning meaning
are potential building blocks for much larger applications, and
evaluations that show their impact on such tasks are particularly
valuable. 
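
Because context group discrimination is unsupervised, its core can be
approximated in a few lines: cluster the contexts in which an
ambiguous word occurs, and let each cluster stand for a sense. The
Python sketch below substitutes off-the-shelf tf-idf vectors and
k-means for the reduced co-occurrence vectors of Schütze's own
experiments, and the example contexts are invented.

    # Sketch of context group discrimination by clustering.
    # Tf-idf plus k-means stands in for Schutze's actual representation.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    # Hypothetical occurrences of 'bank' with surrounding context.
    contexts = [
        'deposit money at the bank before noon',
        'the bank pays interest on money deposits',
        'we walked along the grassy bank of the river',
        'fish swam near the grassy bank of the river',
    ]
    X = TfidfVectorizer().fit_transform(contexts)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    for label, ctx in zip(labels, contexts):
        # the money contexts and the river contexts should fall into
        # separate clusters, each standing for one sense of 'bank'
        print(label, ctx)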

To conclude, this is an exceptional volume that has much to recommend
it. The coverage is broad, and brings together leading researchers in
lexical semantics, lexicography, and computational modeling in a
single volume. Despite the wide scope, there is sufficient detail in
each contribution and in the introduction so that an interested reader
knows exactly where they can find additional information. "Polysemy:
Theoretical and Computational Approaches" is a worthwhile addition to
the library of anyone who works on problems that are affected by the
slippery nature of word meanings. 

{Department of Computer Science}                        TED PEDERSEN
{University of Minnesota, Duluth}
{Duluth, MN 55812, U.S.A.}

Choueka, Y. and Lusignan, S. (1985), 'Disambiguation by Short Contexts',
Computers and the Humanities 19, pp. 147-157

Kaplan, A. (1955), 'An experimental study of ambiguity and context',
Mechanical Translation 2, pp. 39-46

Kilgarriff, A. (1997), 'I don't believe in word senses', Computers and
the Humanities 31, pp. 91-113

Landau, S. (1984), Dictionaries: The Art and Craft of Lexicography,
Cambridge: Cambridge University Press

Leacock, C., Chodorow, M. and Miller, G. (1998), 'Using Corpus
Statistics and WordNet Relations for Sense Identification',
Computational Linguistics 24, pp. 147-165
                          
Miller, G. and Gildea, P. (1987), 'How children learn words', 
Scientific American 257, pp. 94-99

Miller, G. and Charles, W. (1991), 'Contextual correlates of
semantic similarity', Language and Cognitive Processes 6, pp. 1-28

Wierzbicka, A. (1996), Semantics: Primes and Universals, Oxford:
Oxford University Press

Winchester, S. (1998), The Professor and the Madman, New York:
HarperCollins Publishers