Draft of a review that appears in the August 2004 issue of Minds and Machines. http://www.kluweronline.com/issn/0924-6495/contents

Yael Ravin and Claudia Leacock (editors), {Polysemy: Theoretical and Computational Approaches}, New York: Oxford University Press, 2000, x+227 pp., $74.00 (cloth), ISBN 0-19-823842-8; $24.95 (paper), ISBN 0-19-925086-3.

Most words in natural language have more than one possible meaning. This seemingly simple observation leads to tremendous challenges in theoretical and computational linguistics, as is clearly shown in this volume of ten newly commissioned articles, "Polysemy: Theoretical and Computational Approaches", edited by Yael Ravin and Claudia Leacock.

Words may be thought of as occupying a spectrum of meaning, with synonyms at one end and homonyms at the other. Synonyms are different word forms that refer to the same concept, as when {bank} and {shore} both refer to the side of a river. A homonym is a single word form that refers to multiple distinct concepts. For example, {bank} is a homonym when it refers to a financial institution or to the side of a river; these are completely distinct concepts that happen to be represented by the same string of characters. While there are clear-cut synonyms and homonyms in natural language, most words lie somewhere in between and are said to be polysemous. {Bank} is polysemous when it refers to a financial institution or a blood bank, since these are related, but not identical, meanings.

The scope of work represented in this volume is impressive. Chapters 2, 3, 4, and 6 focus on linguistically oriented studies of lexical semantics; Chapters 5, 7, and 8 offer critiques of current dictionaries and lexicography; and Chapters 9, 10, and 11 present computational approaches to representing word meanings. Given this wide variety, the editors have very wisely provided an extensive introduction in Chapter 1. It is invaluable to any reader who is not familiar with lexical semantics, lexicography, or computational modeling (and it will be an unusual reader who is expert in all three).

The difficulty of making precise distinctions in word meanings is taken up in Chapter 2, "Aspects of the Micro-structure of Word Meanings", by D. Alan Cruse. Cruse argues that words cannot be defined independently of their context, and that the set of context-invariant semantic properties of words is too small to serve as an adequate foundation for specifying word meanings. Rather than relying on discrete cutoff points to characterize the semantics of a word, he advocates a continuum or gradient scale that allows word meanings to be related more flexibly. This chapter sets the stage for many that follow, since the difficulty of making precise sense distinctions is what makes lexicography a challenging enterprise, and motivates much of the work on computational modeling of word meanings from evidence found in large corpora.

An unusual form of polysemy is discussed in Chapter 3, "Autotroponymy", by Christiane Fellbaum. Autotroponymy occurs when a sense of a polysemous verb cannot be predicted from a more general use of that verb. Fellbaum introduces this idea via the verb {behave}: used on its own, as in {The children behaved}, it means {behaved well}, a specific sense that cannot be predicted from the more general uses with adverbs, as in {The children behaved well/badly/impossibly}.
While there are only a small number of such verbs, their behavior is irregular and resists characterization by any of the well-known rules of regular polysemy. Fellbaum shows, however, that there are generalizing principles that govern this phenomenon.

Chapter 4, "Lexical Shadowing and Argument Closure", by James Pustejovsky, also focuses on verbs, identifying behavior that the author calls lexical shadowing. This occurs when a verb carries with it implicit information that may also be expressed in a nearby noun phrase. Pustejovsky gives the example of the verb {butter}, which carries implicit information about the substance being spread. He goes on to show that lexical shadowing is not restricted to verb-noun cognates like {butter}, and occurs much more widely than previously thought.

Chapter 6, "'The Garden Swarms with Bees' and the Fallacy of 'Argument Alternation'", by David Dowty, deals with argument alternation, as when {Mary wrote that book} is made passive in {The book was written by Mary}. Dowty notes that conventional practice is to treat such pairs as different representations of the same underlying meaning; he shows, however, that these alternations can represent significantly different semantics.

While the early chapters of this volume focus on lexical semantics and linguistics, dictionaries and the practice of lexicography come under scrutiny in the middle. The inclusion of chapters on lexical semantics and lexicography in the same volume is a significant contribution of the editors, since these disciplines do not intersect as often as one might expect. This may seem curious, since both are concerned with words and their meanings, but the underlying methodologies are quite different. Research in lexical semantics tends to be built around introspectively created examples that exhibit particular phenomena in very precise ways, while the distinctions in meaning described in a dictionary are based on the observed use of words in naturally occurring text.

Lexicography has had an empirical flavor since long before the advent of computers and online corpora. For example, the development of the Oxford English Dictionary was made possible by the work of thousands of volunteer readers, who recorded interesting usages of words on small slips of paper that were then sent to Oxford. The editors would categorize these usages relative to their current understanding of a word, and determine whether the existing inventory of senses was adequate or needed to be expanded or revised in light of the new evidence. The process of creating the OED is colorfully described in Winchester (1998), while a more general and scholarly account of lexicography can be found in Landau (1984).

Given the long tradition of corpus-based work in lexicography, one might expect that the fruits of this labor, i.e., our modern-day dictionaries, would be held up as exemplars for the computational linguistics community, which has only recently embraced corpus-based methods. Interestingly enough, all three articles that address dictionaries and lexicography are somewhat critical. Chapter 5, "Describing Polysemy", by Charles J. Fillmore and B.T.S. Atkins, examines four widely used dictionaries and shows that the verb {crawl} is defined differently in each, with only one sense upon which all four dictionaries agree.
While they acknowledge the corpus-based nature of modern lexicography, they point out that there are empirically observed usages of {crawl} that are not included in these dictionaries, and they suggest that the commercial pressures faced by lexicographers make inconsistencies in dictionary sense inventories inevitable. Fillmore and Atkins propose a corpus-based methodology that would allow "comprehensive and internally consistent" descriptions of word meanings to be developed, and they evaluate their approach with a fine-grained cross-linguistic matching of the senses of the English verb {crawl} with those of the French verb {ramper}.

Chapter 7, "Polysemy: A Problem of Definition", by Cliff Goddard, raises another concern, that of ambiguous and circular definitions in dictionaries. Goddard critiques the definitions of {wrong}, {love}, and {send} found in various commercial dictionaries. He goes on to show how such flawed entries might be more clearly written using the Natural Semantic Metalanguage (NSM), a set of semantic primitives developed by Anna Wierzbicka over the last 25 years; see Wierzbicka (1996) for a detailed exposition of NSM.

Chapter 8, "Lexical Representations for Sentence Processing", by George A. Miller and Claudia Leacock, suggests that dictionaries are not adequate theories of lexical semantics, since they lack information about how a word should be used in context. The authors point to a study by Miller and Gildea (1987) showing that when a child uses a dictionary definition to construct a sentence with an unfamiliar word, the result is often incorrect. Miller and Leacock propose that an adequate representation of context must include both the local context in which a word occurs and the more general topical context of the surrounding text. The efficacy of topical features is shown via their own computational experiments, which attained reasonable, though not optimal, results. They also point to earlier work by Kaplan (1955) and Choueka and Lusignan (1985) showing that humans require only a few words of surrounding context to determine the meaning of a word. They suggest that a fruitful area of future work is the combination of local and topical contextual information; in fact, while this volume was in preparation that work came to fruition and resulted in state-of-the-art accuracy over a small set of words. See Leacock et al. (1998) for further details.

Dictionaries are imperfect, and the points raised in the preceding chapters are all important. In their defense, however, lexicographers might argue that discriminating among senses and writing definitions that capture fine shades of meaning is more art than science, and that it may simply not be possible to provide a single authoritative set of meanings for a word. The inherent challenge of the task is compounded by practical considerations such as the intended audience and limitations of time and space. Kilgarriff (1997) provides a detailed discussion of these issues from the lexicographer's point of view, and raises still other concerns.

This volume concludes with three chapters that discuss computational approaches to representing and discovering word meanings. Chapter 9, "Large Vocabulary Word Sense Disambiguation", by Mark Stevenson and Yorick Wilks, presents an algorithm for word sense disambiguation that assigns senses to words by combining individually weak information sources, an idea illustrated by the sketch below.
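To convey the flavor of such a combination, here is a minimal sketch in Python. It is my illustration rather than anything from the chapter: the cue functions, weights, and sense labels are invented for the example, and a real system would draw on much richer knowledge sources.

    from collections import defaultdict

    # Hypothetical weak cues for the word "bank"; each votes for a sense
    # or abstains by returning None. (Invented for illustration.)
    def collocation_cue(context):
        # In this toy setup, "bank" near "of" suggests the institution sense.
        return "institution" if "of" in context else None

    def keyword_cue(context):
        money_words = {"money", "loan", "deposit", "account"}
        river_words = {"river", "water", "shore", "fishing"}
        words = set(context)
        if words & money_words:
            return "institution"
        if words & river_words:
            return "river_side"
        return None

    CUES = [(collocation_cue, 1.0), (keyword_cue, 2.0)]  # (cue, weight) pairs

    def disambiguate(context):
        """Let each weak cue vote; return the sense with the highest weighted score."""
        scores = defaultdict(float)
        for cue, weight in CUES:
            sense = cue(context)
            if sense is not None:
                scores[sense] += weight
        return max(scores, key=scores.get) if scores else None

    print(disambiguate("I opened an account at the bank".split()))  # -> institution

Each cue alone is unreliable, but the weighted vote lets several unreliable sources reinforce one another, which is the essence of the weak-source paradigm the chapter builds on.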
The sources Stevenson and Wilks combine include part-of-speech tags of neighboring words, key words from a wide window of surrounding context, and definitions and subject codes taken from the Longman Dictionary of Contemporary English. Their approach rests on the well-established AI paradigm of combining weak information sources to achieve strong results. Stevenson and Wilks report over 90% accuracy in assigning senses to the words in five Wall Street Journal articles, which together contain approximately 1,700 words.

Chapter 10, "Polysemy in a Broad-Coverage Natural Language Processing System", by William Dolan et al., describes the MindNet project, under development at Microsoft Research since the early 1990s. MindNet seeks to represent the meaning of words based on the contexts in which they occur, in order to avoid the discrete sense inventories found in dictionaries. This allows shades of meaning to be represented, and is very much influenced by the work of D. Alan Cruse, the author of Chapter 2. To achieve this representation, large corpora of text are analyzed to identify contexts that discriminate among the senses of a word. In addition to statistical information extracted from large corpora, MindNet represents context using linguistic information obtained by parsing machine-readable dictionaries. As such, MindNet is a hybrid statistical-symbolic system, and it ultimately treats the meaning of a word as the most closely matching context in the MindNet database.

Chapter 11, "Disambiguation and Connectionism", by Hinrich Schütze, is also based on the idea that the meaning of a word depends on the context in which it occurs. Specifically, it follows the hypothesis of Miller and Charles (1991) that a sense of a word is "simply a group of occurrences with similar contexts." Schütze describes a method of context-group discrimination that clusters word meanings by relying on contextual information extracted from a large corpus of text. He also shows how this technique can be used to improve the performance of an information retrieval application. This evaluation is noteworthy in that it shows the effect of word sense discrimination on a real-world task, whereas much work in this area is evaluated as if disambiguation were the end product. In fact, grouping related words together and assigning meaning are potential building blocks for much larger applications, and evaluations that show their impact on such tasks are particularly valuable.

To conclude, this is an exceptional volume that has much to recommend it. The coverage is broad, bringing together leading researchers in lexical semantics, lexicography, and computational modeling in a single volume. Despite the wide scope, there is sufficient detail in each contribution, and in the introduction, that interested readers know exactly where they can find additional information. "Polysemy: Theoretical and Computational Approaches" is a worthwhile addition to the library of anyone who works on problems that are affected by the slippery nature of word meanings.

TED PEDERSEN
{Department of Computer Science}
{University of Minnesota, Duluth}
{Duluth, MN 55812, U.S.A.}

References

Choueka, Y. and Lusignan, S. (1985), 'Disambiguation by Short Contexts', Computers and the Humanities 19, pp. 147-157.
Kaplan, A. (1955), 'An experimental study of ambiguity and context', Mechanical Translation 2, pp. 39-46.
Kilgarriff, A. (1997), 'I don't believe in word senses', Computers and the Humanities 31, pp. 91-113.
Landau, S. (1984), Dictionaries: The Art and Craft of Lexicography, Cambridge: Cambridge University Press.
Leacock, C., Chodorow, M. and Miller, G. (1998), 'Using Corpus Statistics and WordNet Relations for Sense Identification', Computational Linguistics 24, pp. 147-165.
Miller, G. and Gildea, P. (1987), 'How children learn words', Scientific American 257, pp. 94-99.
Miller, G. and Charles, W. (1991), 'Contextual correlates of semantic similarity', Language and Cognitive Processes 6, pp. 1-28.
Wierzbicka, A. (1996), Semantics: Primes and Universals, Oxford: Oxford University Press.
Winchester, S. (1998), The Professor and the Madman, New York: HarperCollins Publishers.