Ngram Statistics Package Bibliography

Maintained by Ted Pedersen
(Last update: August 10, 2009)

If you have used the Ngram Statistics Package (or its predecessor, the Bigram Statistics Package) in work that has resulted in a paper, article, thesis, dissertation, technical report, ..., please let the world know about it by listing it here!

Add new entries online or by sending email to tpederse AT d umn edu

You might also want to check citations of NSP according to Google Scholar. Despite the availability of tools like this, we are still interested in maintaining this bibliography, so please do let us know about any of your NSP-related publications!


This bibliography lists publications that have used the Ngram Statistics Package, a Perl package that allows users to count Ngrams and measure the association between them in large corpora of text.

A

Anagnostou, N. K. and Weir, G. R. S. "Review of software applications for deriving collocations" In: ICT in the Analysis, Teaching and Learning of Languages, Preprints of the ICTATLL Workshop 2006 Glasgow, UK, August 2006. http://www.cis.strath.ac.uk/research/publications/papers/strath_cis_publication_1541.pdf

B

Baldwin, Timothy "Looking for Prepositional Verbs in Corpus Data" In: Proceedings of the 2nd ACL-SIGSEM Workshop on Linguistic Dimensions of Prepositions and their Use in Computational Linguistics Formalisms and Applications pp. 115-126, Colchester, UK, 2005. http://www.cs.mu.oz.au/~tim/pubs/sigsemprep2005.pdf

Banerjee, Satanjeev and Pedersen, Ted "The Design, Implementation, and Use of the Ngram Statistics Package" In: Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics pp. 370-381, Mexico City, February 2003. http://www.d.umn.edu/~tpederse/Pubs/cicling2003-2.pdf
Comments:
This is the best published description of NSP. Please use this as a reference to cite the package.

Bentivogli, Luisa and Pianta, Emanuele "Beyond Lexical Units: Enriching WordNets with Phrasets" In: Proceedings of the Research Note Sessions of the 10th Conference of the European Chapter of the Association for Computational Linguistics pp. 67-70, Budapest, Hungary, April 2003. http://tcc.itc.it/people/pianta/publications/eacl2003-phrasets.pdf

C

Calvo, Hiram and Gelbukh, Alexander and Kilgarriff, Adam "Distributional Thesaurus vs. WordNet : A Comparison of Backoff Techniques for Unsupervised PP Attachment" In: Proceedings of the Sixth International Conference on Intelligent Text Processing and Computational Linguistics (CICLING) pp. 177-188. Mexico City, February 2005. http://www.lexmasterclass.com/people/Publications/2005-CalvoGelbukhKilg-CICLING-PPattachThes.pdf

Costa, Luis Fernando "Esfinge - A Question Answering System in the Web using the Web" In: Proceedings of the Demonstration Session of the 11th Conference of the European Chapter of the Association for Computational Linguistics Trento, Italy, April 2006. http://acl.ldc.upenn.edu/E/E06/E06-2011.pdf

Costa, Luis "20th Century Esfinge (Sphinx) solving the riddles at CLEF 2005" In: Working Notes for the CLEF 2005 Workshop Vienna, Austria, September 2005. http://www.clef-campaign.org/2005/working_notes/workingnotes2005/costa05.pdf

Costa, Luis "First Evaluation of Esfinge - a Question Answering System for Portuguese" In: Working Notes for the CLEF 2004 Workshop September 15-17, Bath, UK, 2004. http://clef.isti.cnr.it/2004/working_notes/WorkingNotes2004/48a.pdf

F

Forbes-Riley, Kate and Litman, Diane J. "Using Bigrams to Identify Relationships Between Student Certainness States and Tutor Responses in a Spoken Dialogue Corpus" In: Proceedings of 6th SIGdial Workshop on Discourse and Dialogue Lisbon, Portugal, September 2005. http://www.cs.pitt.edu/~litman/p1423.pdf
Keywords: Intelligent Tutoring

G

Galicia-Haro, Sofia and Gelbukh, Alexander "Unsupervised Learning of P NP P Word Combinations" In: Proceedings of the Sixth International Conference on Intelligent Text Processing and Computational Linguistics (CICLING) Mexico City, February 2005. http://www.gelbukh.com/lab/Publications/2005/CICLing-2005-Word-Combinations.pdf

Ghayoomi, Masood and Assi, Seyyed Mostafa "Word Prediction in a Running Text : A Statistical Language Modeling for the Persian Language" In: Proceedings of the Australasian Language Technology Workshop 2005 pp. 57-63. Sydney, Australia, December 2005. http://www.alta.asn.au/events/altw2005/cdrom/pdf/ALTA200510.pdf

H

Hannah, William P. "Automated Music Genre Classification Based on Analyses of Web-Based Documents and Listeners' Organizational Schemes" Master's paper for M.S. in L.S. degree, University of North Carolina at Chapel Hill, School of Information and Library Science, May 2005. http://hdl.handle.net/1901/205

I

Inkpen, Diana and Hirst, Graeme "Building and Using a Lexical Knowledge Base of Near-Synonym Differences" Computational Linguistics 32(2): 223-262, June 2006. http://www.site.uottawa.ca/~diana/publications/InkpenHirst_cl.pdf
Keywords: Lexicography

Inkpen, Diana Zaiu and Hirst, Graeme "Acquiring Collocations for Lexical Choice between Near-Synonyms" In: Unsupervised Lexical Acquisition: Proceedings of the Workshop of the ACL Special Interest Group on the Lexicon (SIGLEX) pp. 67-76. Philadelphia, PA, July 2002. http://acl.ldc.upenn.edu/W/W02/W02-0909.pdf

J

Joshi, Mahesh "Kernel Methods for Word Sense Disambiguation and Abbreviation Expansion in the Medical Domain" Master of Science Thesis, Department of Computer Science, University of Minnesota, Duluth. August 2006. http://www.d.umn.edu/~tpederse/Pubs/mahesh-thesis.pdf
Keywords: Supervised Word Sense Disambiguation

Joshi, Mahesh and Pakhomov, Serguei and Pedersen, Ted and Maclin, Richard and Chute, Christopher "An End-to-end Supervised Target-Word Sense Disambiguation System" In: Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), Intelligent Systems Demonstrations Boston, MA, July 2006. http://www.d.umn.edu/~tpederse/Pubs/aaai06-mahesh-demo.pdf
Keywords: Supervised Word Sense Disambiguation
Comments: This describes the use of the WSDGate system, which includes an interface to NSP called NSPGate.

Joshi, Mahesh and Pedersen, Ted and Maclin, Richard "A Comparative Study of Support Vector Machines Applied to the Supervised Word Sense Disambiguation Problem in the Medical Domain" In: Proceedings of the Second Indian International Conference on Artificial Intelligence Pune, India, December 2005. http://www.d.umn.edu/~tpederse/Pubs/iicai05-joshi.pdf
Keywords: Supervised Word Sense Disambiguation
Comments:
The experiments in this paper were done with WSDShell, which uses NSP to identify features.

K

Kis, Balázs and Villada Moirón, Begoña and Bouma, Gosse and Bíró, Tamás and Pohl, Gábor and Ugray, Gábor and Nerbonne, John "Methods for the Extraction of Hungarian Multi-Word Lexemes" In: Bart Decadt, Veronique Hoste, Guy De Pauw (eds.) Proceedings of Computational Linguistics in the Netherlands 2003. pp. 47-62, Amsterdam: Rodopi, 2004. http://odur.let.rug.nl/~nerbonne/papers/CLIN2004-RuG-ML.pdf

Kis, Balázs and Villada, Begoña and Bouma, Gosse and Ugray, Gábor and Bíró, Tamás and Pohl, Gábor and Nerbonne, John "A New Approach to the Corpus-based Statistical Investigation of Hungarian Multi-Word Lexemes" In: Proceedings of the 4th International Conference on Language Resources and Evaluation pp. 1677-1681, Lisbon, Portugal, May 26-28, 2004. http://odur.let.rug.nl/~nerbonne/papers/LREC_HunMWLs.pdf

Koeva, Svetla "Multi-word Term Extraction for Bulgarian" In: Proceedings of the Workshop on Balto-Slavonic Natural Language Processing pp. 59-66, Prague, Czech Republic, June 2007. http://www.aclweb.org/anthology/W/W07/W07-1708

Kohli, Saiyam "Introducing an Object Oriented Design to the Ngram Statistics Package" Master of Science Project, Department of Computer Science, University of Minnesota, Duluth, July, 2006. http://www.d.umn.edu/~tpederse/Pubs/saiyam-report.pdf
Comments:
This report describes the rewrite of NSP that took place from versions 0.91 to 1.01.

Křen, Michal "Compilation of the Dictionary of Karel Čapek" In: Corpus Linguistics, Computer Tools, and Applications - State of the Art pp. 469-481, Peter Lang, 2008.

Křen, Michal "Kolokační míry a čeština: srovnání na datech Českého národního korpusu (Association measures applied on Czech: comparison based on the Czech National Corpus)" In: Kolokace pp. 223-248, 2006.

Kulkarni, Anagha and Pedersen, Ted "Name Discrimination and Email Clustering using Unsupervised Clustering and Labeling of Similar Contexts" In: Proceedings of the Second Indian International Conference on Artificial Intelligence Pune, India, December 2005. http://www.d.umn.edu/~tpederse/Pubs/iicai05-kulkarni.pdf
Keywords: Unsupervised Context Clustering
Comments: Describes how NSP is used to identify features for clustering and cluster labeling in the SenseClusters system.

M

Marom, Yuval and Zukerman, Ingrid "Automating Help-desk Responses : A Comparative Study of Information-gathering Approaches" In: Proceedings of the ACL Workshop on Task-Focused Summarization and Question Answering pp. 40-47. Sydney, Australia, July 2006. http://acl.ldc.upenn.edu/W/W06/W06-0706.pdf

McInnes, Bridget and Pedersen, Ted and Pakhomov, Serguei "Determining the Syntactic Structure of Medical Terms in Clinical Notes" In: Proceedings of BioNLP-2007 pp. 9-16, Prague, Czech Republic, June 2007. http://www.aclweb.org/anthology/W/W07/W07-1002

McInnes, Bridget T. "Extending the Log Likelihood Measure to Improve Collocation Identification" M.S. Thesis, Department of Computer Science, University of Minnesota, Duluth, December 2004. http://www.d.umn.edu/~tpederse/Pubs/bridget-thesis.pdf
Comments:
Describes the use of the log-likelihood measure for identifying significant N-grams, where N > 2.

McInnes, Bridget Thomson and Pakhomov, Serguei and Pedersen, Ted and Chute, Christopher "Incorporating Ngram Statistics in the Normalization of Clinical Notes" In: Proceedings of MEDINFO 2004 San Francisco, CA, September 2004. http://www.d.umn.edu/~tpederse/Pubs/mcinnes_Medinfo2004.pdf
Comments:
Describes the use of NSP for spell-checking in the medical domain.

Mohammad, Saif and Pedersen, Ted "Complementarity of Lexical and Simple Syntactic Features: The SyntaLex Approach to Senseval-3" In: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text Barcelona, Spain, July 2004. http://acl.ldc.upenn.edu/acl2004/senseval/pdf/mohammad.pdf
Keywords: Supervised Word Sense Disambiguation

Moissinac, Jean-Claude and Yvon, François and Ben Hazez, Slim "Automatic Indexing of Classes and Conferences" In: Proceedings of RIAO 2004 pp. 885-894, University of Avignon, France, April 2004. http://www.riao.org/sites/RIAO-2004/Proceedings-2004/papers/1430.pdf

O

Oberlander, Jon and Nowson, Scott "Whose thumb is it anyway? Classifying Author Personality from Weblog Text" In: Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions pp. 627-634. Sydney, Australia, July 2006. http://www.aclweb.org/anthology/P/P06/P06-2081.pdf

Oberlander, Jon and Gill, Alastair J. "Individual differences and implicit language: personality, parts-of-speech and pervasiveness" In: Proceedings of the 26th Annual Conference of the Cognitive Science Society pp. 1035-1040, Chicago, August 5-7, 2004. http://www.hcrc.ed.ac.uk/~jon/papers/idc/OberlanderGill04pos2.pdf

P

Patwardhan, Siddharth and Riloff, Ellen "Learning Domain-Specific Information Extraction Patterns from the Web" In: Proceedings of the ACL 2006 Workshop on Information Extraction Beyond the Document pp. 66-73. Sydney, Australia, July 2006. http://www.cs.utah.edu/~sidd/papers/PatwardhanR06.pdf

Pedersen, Ted "Machine Learning with Lexical Features : The Duluth Approach to Senseval-2" In: Proceedings of Senseval-2 : Second International Workshop on Evaluating Word Sense Disambiguation Systems Toulouse, France, July 2001. http://www.d.umn.edu/~tpederse/Pubs/senseval2.pdf
Keywords: Supervised Word Sense Disambiguation

Pedersen, Ted "A Decision Tree of Bigrams is an Accurate Predictor of Word Sense" In: Proceedings of the Second Annual Meeting of the North American Chapter of the Association for Computational Linguistics pp. 79-86, Pittsburgh, July 2001. http://www.d.umn.edu/~tpederse/Pubs/naacl01.pdf
Keywords: Supervised Word Sense Disambiguation
Comments:
This paper explains why we often utilize bigram features in supervised word sense disambiguation and in SenseClusters, our unsupervised clustering approach to word sense discrimination.

Pestian, John and Itert, Lukasz and Duch, Wlodzislaw "Development of a Pediatric Text-Corpus for Part-of-Speech Tagging" In: Intelligent Information Systems pp. 219-226. 2004. http://www.phys.uni.torun.pl/publications/kmk/04-PediatricCorpus.pdf

Preiss, Judita "Probabilistic word sense disambiguation" Computer Speech & Language Volume 18, Issue 3, Pages 319-337 July 2004 http://www.cl.cam.ac.uk/~jp233/publications/csl03.pdf
Keywords: Supervised Word Sense Disambiguation

Purandare, Amruta and Pedersen, Ted "Word Sense Discrimination by Clustering Similar Contexts" In: Proceedings of the Conference on Computational Natural Language Learning (CoNLL) Boston, MA, May 2004. http://www.d.umn.edu/~tpederse/Pubs/conll04-purandarep.pdf
Keywords:
Unsupervised Sense Discrimination
Comments:
SenseClusters is an unsupervised system for clustering similar contexts. It uses NSP to identify lexical features. Thus, almost every paper about SenseClusters contains discussion of NSP.

R

Riloff, Ellen and Patwardhan, Siddharth and Wiebe, Jan "Feature Subsumption for Opinion Analysis" In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP) pp. 440-448. Sydney, Australia, July 2006. http://acl.ldc.upenn.edu/W/W06/W06-1652.pdf

S

Singh, Inderjit "The Impact of Phrases on the Retrieval Effectiveness of Very Short Queries" Master of Science Thesis, Department of Computer Science, University of Minnesota, Duluth, August 2002. http://www.d.umn.edu/cs/thesis/inderjit_singh_ms.pdf

Slade, Benjamin "Split serpents and bitter blades: Reconstructing details of the PIE dragon-combat" In: Studies in the Linguistic Sciences: Illinois Working Papers pp. 1-57, 2009. https://www.ideals.uiuc.edu/handle/2142/13178

Stevenson, Suzanne and Fazly, Afsaneh and North, Ryan "Statistical Measures of the Semi-Productivity of Light Verb Constructions" In: Second ACL Workshop on Multiword Expressions : Integrating Processing pp. 1-8, Barcelona, Spain, July 2004. http://acl.ldc.upenn.edu/acl2004/mwe/pdf/stevenson.pdf

V

Varma, Nitin "Identifying Word Translations in Parallel Corpora Using Measures of Association" Master of Science Thesis, Department of Computer Science, University of Minnesota, Duluth, December 2004. http://www.d.umn.edu/~tpederse/Pubs/varma.pdf
Keywords: Machine Translation
Comments: Describes the use of NSP for identifying translations in parallel text.

Vechtomova, Olga "The Role of Multi-Word Units in Interactive Information Retrieval" In: Proceedings of the 27th European Conference on Information Retrieval pp. 403-420, Santiago de Compostela, Spain, March 2005. http://ovecht2.uwaterloo.ca/ecir05_vechtomova.pdf

Vechtomova, Olga and Karamuftuoglu, Murat "Approaches to High Accuracy Retrieval: Phrase-Based Search Experiments in the HARD track" In: Proceedings of the 12th Text Retrieval Conference (TREC) Gaithersburg, MD, November 2003. http://ovecht2.uwaterloo.ca/TREC2004.pdf

Verbree, A.T. and Rienks, R.J. and Heylen, D.K.J. "First Steps Towards the Automatic Construction of Argument-Diagrams from Real Discussions" In: Proceedings of the 1st International Conference on Computational Models of Argument pp. 183-194, Liverpool, UK, 2006. http://eprints.eemcs.utwente.nl/8290/01/comma2006.pdf

W

Walter, Stephan and Pinkal, Manfred "Automatic Extraction of Definitions from German Court Decisions" In: Proceedings of the ACL-2006 Workshop on Information Extraction Beyond The Document pp. 20-28. Sydney, Australia, July 2006. http://acl.ldc.upenn.edu/W/W06/W06-0203.pdf

Wilmsmann, Bjoern "Re-write of Text-NSP" Unpublished Manuscript, Ruhr-University, Bochum, Germany, February 12, 2007. http://topicalizer.com/files/TextNSP/Re-write_of_Text-NSP.pdf
Comments: Describes a re-design of Text-NSP version 1.03 to enhance object oriented design and performance

van der Wouden, Ton and Schuurman, Ineke and Schouppe, Machteld and Hoekstra, Heleen "Harvesting Dutch trees: Syntactic properties of Spoken Dutch" In: Tanja Gaustad et al (ed.), Computational Linguistics in the Netherlands 2002, pp. 129-141, Rodopi, Amsterdam-Atlanta, 2003. http://www.ccl.kuleuven.ac.be/Papers/clin2003c.pdf

van der Wouden, Ton "Collocaties en het probleem van de corpusgrootte" In: STDH-studiedag "Internet als bron" Meertens Instituut, November 16, 2001. http://www.meertens.knaw.nl/events/stdh2001/wouden.pdf
Comments:
In Dutch.

van der Wouden, Ton "Collocational behaviour of non content words" In: Proceedings of the ACL/EACL Workshop on Collocations Toulouse, France, August 2001. http://odur.let.rug.nl/~vdwouden/docs/collabs02.ps

Z

Zeng, Qing T. and Crowell, Jonathan "Semantic Classification of Consumer Health Content" In: Mednet 2006: 11th World Congress on Internet in Medicine Toronto, October 2006. http://www.mednetcongress.org/ocs/viewpaper.php?id=57