Complete List of Publications

This is a complete list of all my publications, arranged in reverse chronological order. The list includes single-author papers by graduate students I have advised, if the work was done at Duluth. I also provide a separate listing of all the PhD dissertations, Master's theses, and Master's projects I have supervised.

2025

DuluthNLP at SemEval-2025 Task 7 : TF-IDF with Optimized Vector Dimensions for Multilingual Fact-Checked Claim Retrieval (Syed & Pedersen) Appears in the Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval 2025), July 2025, pp. 712-717, Vienna, Austria.

2023

DuluthNLP at SemEval-2023 Task 12 : AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset (Akrah & Pedersen) Appears in the Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval 2023), July 2023, pp. 1697-1701, Toronto.

2022

NLPSharedTasks: A Corpus of Shared Task Overview Papers in Natural Language Processing Domains (Martin, Pedersen, and D'Souza) Appears in the Proceedings of the First Workshop on Information Extraction from Scientific Publications (WIESP), November 2022, virtual.
DuluthNLP at SemEval-2022 Task 7: Classifying Plausible Alternatives with Pre-trained ELECTRA (Akrah and Pedersen) Appears in the Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval 2022), July 2022, pp. 1062-1066, Seattle (virtual).

2021

Task 11 at SemEval-2021: NLPContributionGraph - Structuring Scholarly NLP Contributions for a Research Knowledge Graph (D'Souza, Auer, and Pedersen) Appears in the Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval 2021), August 2021, pp. 364-376, Bangkok (virtual).
Duluth at SemEval-2021 Task 11: Applying DeBERTa to Contributing Sentence Selection and Dependency Parsing for Entity Extraction (Martin and Pedersen) Appears in the Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval 2021), August 2021, pp. 490-501, Bangkok (virtual).
Proceedings of the Fifth Workshop on Teaching NLP (Jurgens, Kolhatkar, Li, Mieskes, and Pedersen, Editors), June 2021, Mexico City (virtual).

2020

Duluth at SemEval-2020 Task 7: Using Surprise as a Key to Unlock Humorous Headlines (Jin, Yin, Tang, and Pedersen) Appears in the Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020), December 2020, pp. 986-994, Barcelona (virtual).
Duluth at SemEval-2020 Task 12: Offensive Tweet Identification in English with Logistic Regression (Pedersen) Appears in the Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020), December 2020, pp. 1938-1946, Barcelona (virtual).

2019

Approaching Terminological Ambiguity in Cross-Disciplinary Communication as a Word Sense Induction Task. A Pilot Study (Mennes, Pedersen, and Lefever) Language Resources and Evaluation, 53, 889-917, Springer.
Duluth at SemEval-2019 Task 6: Lexical Approaches to Identify and Categorize Offensive Tweets (Pedersen) Appears in the Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval 2019), June 2019, pp. 593-599, Minneapolis, MN.
Duluth at SemEval-2019 Task 4: The Pioquinto Manterola Hyperpartisan News Detector (Sengupta and Pedersen) Appears in the Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval 2019), June 2019, pp. 949-953, Minneapolis, MN.

2018

Duluth UROP at SemEval-2018 Task 2: Multilingual Emoji Prediction with Ensemble Learning and Oversampling (Jin and Pedersen) Appears in the Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval 2018), June 2018, pp. 482-485, New Orleans, LA.
UMDSub at SemEval-2018 Task 2: Multilingual Emoji Prediction Multi-channel Convolutional Neural Network on Subword Embedding (Wang and Pedersen) Appears in the Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval 2018), June 2018, pp. 395-399, New Orleans, LA.
ALANIS at SemEval-2018 Task 3: A Feature Engineering Approach to Irony Detection in English Tweets (Swanberg, Mirza, Pedersen, and Wang) Appears in the Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval 2018), June 2018, pp. 507-511, New Orleans, LA.
UMDuluth-CS8761 at SemEval-2018 Task 9: Hypernym Discovery using Hearst Patterns, Co-occurrence frequencies and Word Embeddings (Hassan, Vallabhajosyula, and Pedersen) Appears in the Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval 2018), June 2018, pp. 914-918, New Orleans, LA.

2017

Improving Correlation with Human Judgments by Integrating Semantic Similarity with Second--Order Vectors (McInnes and Pedersen) Appears in the Proceedings of the 16th Workshop on Biomedical Natural Language Processing (BioNLP 2017), August 2017, pp. 107-116, Vancouver, BC.
Duluth at SemEval-2017 Task 6: Language Models in Humor Detection (Yan and Pedersen) Appears in the Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval 2017), August 2017, pp. 376-380, Vancouver, BC.
Duluth at SemEval-2017 Task 7: Puns Upon a Midnight Dreary, Lexical Semantics for the Weak and Weary (Pedersen) Appears in the Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval 2017), August 2017, pp. 407-411, Vancouver, BC.
Who's to say what's funny? A computer using Language Models and Deep Learning, That's Who! (Yan and Pedersen) Appears in the Proceedings of the Workshop on Women and Underrepresented Minorities in Natural Language Processing (WiNLP 2017), July 2017, pp. xx-xx, Vancouver, BC.

2016

Analysis of Anxious Word Usage on Online Health Forums (Rey-Villamizar, Shrestha, Sadeque, Bethard, Pedersen, Mukherjee, and Solorio) Appears in the Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis (Louhi 2016), November 2016, pp. 37-42, Austin, TX.
Why Do They Leave: Modeling Participation in Online Depression Forums (Sadeque, Pedersen, Solorio, Shrestha, Rey-Villamizar, and Bethard) Appears in the Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media (SocialNLP-2016), November 2016, pp. 14-19, Austin, TX.
UMNDuluth at SemEval-2016 Task 14: WordNet's Missing Lemmas (Rusert and Pedersen) Appears in the Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016), June 2016, pp. 1346-1350, San Diego, CA.
Duluth at SemEval 2016 Task 14: Extending Gloss Overlaps to Enrich Semantic Taxonomies (Pedersen) Appears in the Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016), June 2016, pp. 1328-1331, San Diego, CA.
Semi-supervised CLPsych 2016 Shared Task System Submission (Rey-Villamizar, Shrestha, Solorio, Sadeque, Bethard, and Pedersen) Appears in the Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology - From Linguistic Signal to Clinical Reality (CLPsych 2016), June 2016, pp. 171-175, San Diego, CA.
Age and Gender Prediction on Health Forum Data (Shrestha, Rey-Villamizar, Sadeque, Pedersen, Bethard, Solorio) Appears in the Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), May 2016, pp. 3394-3401, Portoroz, Slovenia.
Adam Kilgarriff's Legacy to Computational Linguistics and Beyond (Evans, Gelbukh, Grefenstette, Hanks, Jakubicek, McCarthy, Palmer, Pedersen, Rundell, Rychly, Sharoff, Tugwell) Appears in the Proceedings of the 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING 2016), April 2016, Konya, Turkey.

2015

Predicting Continued Participation in Online Health Forums (Sadeque, Solorio, Pedersen, Shrestha, Bethard) Appears in the Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis (Louhi 2015), September 2015, pp. 12-20, Lisbon, Portugal.
Screening Twitter Users for Depression and PTSD using Lexical Decision Lists (Pedersen) Appears in the Proceedings of the 2nd Computational Linguistics and Clinical Psychology Workshop - From Linguistic Signal to Clinical Reality (CLPsych 2015), June 2015, pp. 46-53, Denver, CO.
Duluth : Word Sense Discrimination in the Service of Lexicography (Pedersen) - Appears in the Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), June 2015, pp. 282-286, Denver, CO.
Evaluating Semantic Similarity and Relatedness over the Semantic Grouping of Clinical Term Pairs (McInnes and Pedersen) - Journal of Biomedical Informatics, 54, 329 - 336, April 2015.

2014

U-path : An undirected path-based measure of semantic similarity (McInnes, Pedersen, Liu, Melton, Pakhomov) - Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association, November 2014, pp. 882 - 891, Washington, DC.
Duluth: Measuring Cross-Level Semantic Similarity with First and Second Order Dictionary Overlaps (Pedersen) - Appears in the Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), in conjunction with the 25th International Conference on Computational Linguistics (COLING-2014), August 23-24, 2014, pp. 247 - 251, Dublin, Ireland.

2013

Evaluating Measures of Semantic Similarity and Relatedness to Disambiguate Terms in Biomedical Text (McInnes and Pedersen) - Journal of Biomedical Informatics, 46(6), 1116 - 1124, December 2013.
Offspring from Reproduction Problems: What Replication Failure Teaches Us (Fokkens, van Erp, Postma, Pedersen, Vossen, and Freire) - Appears in the Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, August 4-9, 2013, pp. 1691-1701, Sofia, Bulgaria. [acceptance rate 26%, nominated for best paper award] (presentation slides, video of ACL presentation, and related web site)
Duluth: Word Sense Induction Applied to Web Page Clustering (Pedersen) - Appears in the Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), in conjunction with the Second Joint Conference on Lexical and Computational Semantics (*SEM-2013), June 13-15, 2013, pp. 202-206, Atlanta, Georgia.
UMLS::Similarity: Measuring the Relatedness and Similarity of Biomedical Concepts (McInnes, Liu, Pedersen, Melton, and Pakhomov) - Appears in the Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 9-14, 2013, pp. 28-31, Atlanta, Georgia. (Demonstration System) [acceptance rate 53%]

2012

Using SemRep to Label Semantic Relations Extracted from Clinical Text (Liu, Bill, Fiszman, Rindflesch, Pedersen, Melton, and Pakhomov) - Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association, November 3-7, 2012, pp. 587 - 595, Chicago, IL.
Evaluating Semantic Relatedness and Similarity Measures with Standardized MedDRA Queries (Bill, Liu, McInnes, Melton, Pedersen, and Pakhomov) - Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association, November 3-7, 2012, pp. 43 - 50, Chicago, IL.
The Language Muse System : Linguistically Focused Instructional Authoring (Burstein, Shore, Sabatini, Moulder, Holtzman, and Pedersen) - Educational Testing Service Research Report ETS RR-12-21, October 2012, Princeton, NJ.
Duluth: Measuring Degrees of Relational Similarity with the Gloss Vector Measure of Semantic Relatedness (Pedersen) - Appears in First Joint Conference on Lexical and Computational Semantics (*SEM), June 6-7, 2012, pp. 497 - 501, Montreal, Canada.
Rule-based and Lightly Supervised Methods to Predict Emotions in Suicide Notes (Pedersen) - Biomedical Informatics Insights 2012:5 (Suppl. 1) pp. 185 - 193.
Semantic Relatedness Study Using Second Order Co-occurrence Vectors Computed from Biomedical Corpora, UMLS and WordNet (Liu, McInnes, Pedersen, Melton-Meaux, and Pakhomov) - Appears in the Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, January 28-30, 2012, pp. 363 - 371, Miami, FL
Measuring the Similarity and Relatedness of Concepts in the Medical Domain : IHI 2012 Tutorial Overview (Pedersen, Pakhomov, McInnes, and Liu) - Appears in the Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, January 28-30, 2012, p. 879, Miami, FL

2011

Knowledge-based Method for Determining the Meaning of Ambiguous Biomedical Terms Using Information Content Measures of Similarity (McInnes, Pedersen, Liu, Melton, and Pakhomov) - Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association, October 22-26, 2011, pp. 895 - 904, Washington, DC.
Using Second-order Vectors in a Knowledge-based Method for Acronym Disambiguation (McInnes, Pedersen, Liu, Pakhomov, and Melton) - Appears in the Proceedings of the Fifteenth Conference on Computational Natural Language Learning (CoNLL 2011), June 23-24, 2011, pp. 145 - 153, Portland, Oregon.
Identifying Collocations to Measure Compositionality : Shared Task System Description (Pedersen) - Appears in the Proceedings of Distributional Semantics and Compositionality (DiSCo 2011), an ACL HLT 2011 Workshop, June 24, 2011, pp. 33 - 37, Portland, Oregon.
The Ngram Statistics Package (Text::NSP) - A Flexible Tool for Identifying Ngrams, Collocations, and Word Associations (Pedersen, Banerjee, McInnes, Kohli, Joshi, and Liu) - Appears in the Proceedings of Multiword Expressions : from Parsing and Generation to the Real World (MWE 2011), an ACL HLT 2011 Workshop, June 23, 2011, pp. 131 - 133, Portland, Oregon. (Demonstration System)
Towards a Framework for Developing Semantic Relatedness Reference Standards (Pakhomov, Pedersen, McInnes, Melton, Ruggieri, and Chute) - Journal of Biomedical Informatics, 44(2), 251-265, April 2011. [Data]

2010

The Effect of Different Context Representations on Word Sense Discrimination in Biomedical Texts (Pedersen) - Appears in the Proceedings of the 1st ACM International Health Informatics Symposium, November 11 - 12, 2010, pp. 56 - 65, Arlington, VA. [acceptance rate 17%]
Semantic Similarity and Relatedness between Clinical Terms : An Experimental Study (Pakhomov, McInnes, Adam, Liu, Pedersen, and Melton) - Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association, November 13-17, 2010, pp. 572 - 576, Washington, DC. [acceptance rate 50%]
Towards Improving Synonym Options in a Text Modification Application (Burstein and Pedersen), University of Minnesota Supercomputing Institute Research Report UMSI 2010/165, November 2010.
Computational Approaches to Measuring the Similarity of Short Contexts : A Review of Applications and Methods (Pedersen), University of Minnesota Supercomputing Institute Research Report UMSI 2010/118, October 2010. (Also available from CMP-LG E-Print Archive as 0806.3787)
Duluth-WSI: SenseClusters Applied to the Sense Induction Task of SemEval-2 (Pedersen) - Appears in the Proceedings of the SemEval 2010 Workshop : the 5th International Workshop on Semantic Evaluations, July 15-16, 2010, pp. 363-366, Uppsala, Sweden
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas (Solorio and Pedersen, Editors), June 2010, Los Angeles, CA
Information Content Measures of Semantic Similarity Perform Better Without Sense-Tagged Text (Pedersen) - Appears in the Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2010), June 1-6, 2010, pp. 329-332, Los Angeles, CA [acceptance rate 35%]

2009

UMLS-Interface and UMLS-Similarity : Open Source Software for Measuring Paths and Semantic Similarity (McInnes, Pedersen, and Pakhomov) - Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association, Nov 14-18, 2009, pp. 431-435, San Francisco, CA. [acceptance rate 50%]
WordNet::SenseRelate::AllWords - A Broad Coverage Word Sense Tagger that Maximizes Semantic Relatedness (Pedersen and Kolhatkar) - Appears in the Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies 2009 Conference, June 1-3, 2009, pp. 17-20, Boulder, CO. (Demonstration System)
Improved Unsupervised Name Discrimination with Very Wide Bigrams and Automatic Cluster Stopping (Pedersen) - Appears in the Proceedings of the Tenth International Conference on Intelligent Text Processing and Computational Linguistics, March 1-7, 2009, pp. 294-305, Mexico City. [acceptance rate 26%]

2008

Learning High Precision Rules to Make Predictions of Morbidities in Discharge Summaries (Pedersen) - Appears in the Proceedings of the Second i2b2 Workshop on Challenges in Natural Language Processing for Clinical Data, Nov 7-8, 2008, Washington, DC.
Empiricism is Not a Matter of Faith (Pedersen), Computational Linguistics, Volume 34, Number 3, pp. 465-470, September 2008. [Journal Citation Reports Index Factor 2007: 2.367]
Name Discrimination and E-mail Clustering Using Unsupervised Clustering of Similar Concepts (Kulkarni and Pedersen), Journal of Intelligent Systems (Special Issue : Recent Advances in Knowledge-Based Systems and Their Applications), 17(1-3), 37-50, 2008.

2007

Using UMLS Concept Unique Identifiers (CUIs) for Word Sense Disambiguation in the Biomedical Domain (McInnes, Pedersen, and Carlis) - Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association, Nov 10-14, 2007, pp. 533-537, Chicago, IL. [acceptance rate 45%]
Measures of Semantic Similarity and Relatedness in the Biomedical Domain (Pedersen, Pakhomov, Patwardhan, and Chute), Journal of Biomedical Informatics, 40(3), 288-299, June 2007. [Journal Citation Reports Index Factor 2006: 2.346]
Determining the Syntactic Structure of Medical Terms in Clinical Notes (McInnes, Pedersen, and Pakhomov) - Appears in the Proceedings of BioNLP 2007, June 29, 2007, pp. 9-16, Prague, Czech Republic. [acceptance rate 29%] [ppt]
UMND1: Unsupervised Word Sense Disambiguation Using Contextual Semantic Relatedness (Patwardhan, Banerjee, and Pedersen) - Appears in the Proceedings of SemEval-2007: 4th International Workshop on Semantic Evaluations, June 23-24, 2007, pp. 390-393, Prague, Czech Republic.
UMND2 : SenseClusters Applied to the Sense Induction Task of Senseval-4 (Pedersen) - Appears in the Proceedings of SemEval-2007: 4th International Workshop on Semantic Evaluations, June 23-24, 2007, pp. 394-397, Prague, Czech Republic.
Unsupervised Discrimination of Person Names in Web Contexts (Pedersen and Kulkarni) - Appears in the Proceedings of the Eighth International Conference on Intelligent Text Processing and Computational Linguistics, pp. 299-310, February 18-24, 2007, Mexico City. [acceptance rate 29%] Download the data used in this paper (Kulkarni name corpus).
Discovering Identities in Web Contexts with Unsupervised Clustering (Pedersen and Kulkarni) - Appears in the Proceedings of the IJCAI-2007 Workshop on Analytics for Noisy Unstructured Text Data, pp. 23-30, January 8, 2007, Hyderabad, India. Download the data used in this paper (Kulkarni name corpus).

2006

Determining Smoker Status using Supervised and Unsupervised Learning with Lexical Features (Pedersen) - Appears in the Working Notes of the i2b2 Workshop on Challenges in Natural Language Processing for Clinical Data, Nov 10-11, 2006, Washington, DC.
A Comparative Study of Supervised Learning as Applied to Acronym Expansion in Clinical Reports (Joshi, Pakhomov, Pedersen, and Chute) - Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association, pp. 399-403, Nov 11-16, 2006, Washington, DC. [acceptance rate 41%] [ppt]
Unsupervised Context Discrimination and Automatic Cluster Stopping (Kulkarni and Pedersen), University of Minnesota Supercomputing Institute Research Report UMSI 2006/90, August 2006. [Note: This is Anagha's MS thesis, from July 2006.]
How many different "John Smiths", and who are they? (Kulkarni and Pedersen) - Appears in the Proceedings of the Twenty-First National Conference on Artificial Intelligence, pp. 1885-1886, July 19, 2006, Boston, MA. (Student Poster)
Kernel Methods for Word Sense Disambiguation and Acronym Expansion (Joshi, Pedersen, Maclin, and Pakhomov) - Appears in the Proceedings of the Twenty-First National Conference on Artificial Intelligence, pp. 1879-1880, July 19, 2006, Boston, MA. (Student Poster)
An End-to-End Supervised Target-Word Sense Disambiguation System (Joshi, Pakhomov, Pedersen, Maclin, and Chute) - Appears in the Proceedings of the Twenty-First National Conference on Artificial Intelligence, pp. 1941-1942, July 19, 2006, Boston, MA. (Intelligent System Demonstration)
Unsupervised Corpus Based Methods for WSD (Pedersen), In Agirre, E. and Edmonds, P. (Editors), Word Sense Disambiguation : Algorithms and Applications, June 2006, pp. 133-166, Springer.
Automatic Cluster Stopping with Criterion Functions and the Gap Statistic (Pedersen and Kulkarni), Appears in the Proceedings of the Demonstration Session of the Human Language Technology Conference and the Sixth Annual Meeting of the North American Chapter of the Association for Computational Linguistics, pp. 276-279, June 6, 2006, New York City.
Selecting the "Right" Number of Senses Based on Clustering Criterion Functions (Pedersen and Kulkarni), Appears in the Proceedings of the Posters and Demo Program of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics, pp. 111-114, April 5-7, 2006, Trento, Italy. [acceptance rate 40%]
Using WordNet Based Context Vectors to Estimate the Semantic Relatedness of Concepts (Patwardhan and Pedersen) - Appears in the Proceedings of the EACL 2006 Workshop Making Sense of Sense - Bringing Computational Linguistics and Psycholinguistics Together, pp. 1-8, April 4, 2006, Trento, Italy.
Improving Name Discrimination : A Language Salad Approach (Pedersen, Kulkarni, Angheluta, Kozareva, and Solorio) - Appears in the Proceedings of the EACL 2006 Workshop on Cross-Language Knowledge Induction, pp. 25-32, April 3, 2006, Trento, Italy. Download the Bulgarian, English, Spanish, and Romanian data used in this paper!
An Unsupervised Language Independent Method of Name Discrimination Using Second Order Co-occurrence Features (Pedersen, Kulkarni, Angheluta, Kozareva, and Solorio) - Appears in the Proceedings of the Seventh International Conference on Intelligent Text Processing and Computational Linguistics, pp. 208-222, February 19-25, 2006, Mexico City. [acceptance rate 30%] Download the Bulgarian, English, Spanish, and Romanian data and stoplists used in this paper.

2005

A Comparative Study of Support Vector Machines Applied to the Supervised Word Sense Disambiguation Problem in the Medical Domain (Joshi, Pedersen, and Maclin) - Appears in the Proceedings of the Second Indian International Conference on Artificial Intelligence, pp. 3449-3468, December 20-22, 2005, Pune, India. [acceptance rate 35%]
Name Discrimination and Email Clustering using Unsupervised Clustering and Labeling of Similar Contexts (Kulkarni and Pedersen) - Appears in the Proceedings of the Second Indian International Conference on Artificial Intelligence, pp. 703-722, December 20-22, 2005, Pune, India. [acceptance rate 35%] Download the data used in this paper.
Abbreviation and Acronym Disambiguation in Clinical Discourse (Pakhomov, Pedersen and Chute) - Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association, pp. 589-593, October 22-26, 2005, Washington, DC. [acceptance rate 37%]
Identifying Similar Words and Contexts in Natural Language with SenseClusters (Pedersen and Kulkarni) - Appears in the Proceedings of the Twentieth National Conference on Artificial Intelligence, pp. 1694-1695, July 12, 2005, Pittsburgh, PA. (Intelligent Systems Demonstration) Download the data used in this demo.
SenseRelate::TargetWord - A Generalized Framework for Word Sense Disambiguation (Patwardhan, Banerjee, and Pedersen) - Appears in the Proceedings of the Twentieth National Conference on Artificial Intelligence, pp. 1692-1693, July 12, 2005, Pittsburgh, PA. (Intelligent Systems Demonstration)
Proceedings of the ACL Interactive Poster and Demonstration Sessions (Nagata and Pedersen, Editors), June 2005, Ann Arbor, MI.
Proceedings of the ACL Workshop on Building and Using Parallel Texts (Koehn, Martin, Mihalcea, Monz, and Pedersen, Editors), June 2005, Ann Arbor, MI.
Word Alignment for Languages with Scarce Resources (Martin, Mihalcea, and Pedersen) - Appears in the Proceedings of the ACL Workshop on Building and Using Parallel Texts, pp. 65-74, June 29-30, 2005, Ann Arbor, MI.
Unsupervised Discrimination and Labeling of Ambiguous Names (Kulkarni) - Appears in the Proceedings of the Student Research Workshop of the 43rd Annual Meeting of the Association for Computational Linguistics. pp. 145-150, June 27, 2005, Ann Arbor, MI. [acceptance rate 28%] Download the data used in this paper.
SenseClusters: Unsupervised Clustering and Labeling of Similar Contexts (Kulkarni and Pedersen) - Appears in the Proceedings of the Demonstration and Interactive Poster Session of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 105-108, June 26, 2005, Ann Arbor, MI. [acceptance rate 55%] Download the data used in this paper.
SenseRelate::TargetWord - A Generalized Framework for Word Sense Disambiguation (Patwardhan, Banerjee, and Pedersen) - Appears in the Proceedings of the Demonstration and Interactive Poster Session of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 73-76, June 26, 2005, Ann Arbor, MI. [acceptance rate 55%]
Resolving Ambiguities in Biomedical Text with Unsupervised Clustering Approaches (Savova, Pedersen, Purandare and Kulkarni) - University of Minnesota Supercomputing Institute Research Report UMSI 2005/80 and CB Number 2005/21, May.
Measures of Semantic Similarity and Relatedness in the Medical Domain (Pedersen, Pakhomov, and Patwardhan) - University of Minnesota Digital Technology Center Research Report DTC 2005/12, May. [This is a preliminary version of the JBI 2007 article.]
Maximizing Semantic Relatedness to Perform Word Sense Disambiguation (Pedersen, Banerjee, and Patwardhan) - University of Minnesota Supercomputing Institute Research Report UMSI 2005/25, March.
Name Discrimination by Clustering Similar Contexts (Pedersen, Purandare, and Kulkarni) - Appears in the Proceedings of the Sixth International Conference on Intelligent Text Processing and Computational Linguistics, pp. 220-231, February 13-19, 2005, Mexico City. [acceptance rate 37%] Download the data used in this paper.

2004

Improving Word Sense Discrimination with Gloss Augmented Feature Vectors (Purandare and Pedersen) - Appears in the Proceedings of the Workshop on Lexical Resources for the Web and Word Sense Disambiguation, pp. 123-130, November 22, 2004, Puebla, Mexico.
Incorporating Ngram Statistics in the Normalization of Clinical Notes (McInnes, Pakhomov, Pedersen and Chute) - Appears in MEDINFO 2004 : Proceedings of the 11th World Congress on Medical Informatics, p. 1882, September 2004, San Francisco, CA. (Poster)
Word Sense Discrimination by Clustering Similar Contexts (Purandare and Pedersen), University of Minnesota Supercomputing Institute Research Report UMSI 2004/146, September 2004. [Note: This is Amruta's MS thesis, from August 2004.]
Polysemy: Theoretical and Computational Approaches. By Yael Ravin and Claudia Leacock. (Pedersen) - Appears in Minds and Machines, Volume 14, Number 3, pp. 419-423. (Book Review)
Discriminating Among Word Meanings by Identifying Similar Contexts (Purandare and Pedersen) - Appears in the Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), pp. 964-965, July 25-29, 2004, San Jose, CA (Student Abstract) [ppt]
SenseClusters - Finding Clusters that Represent Word Senses (Purandare and Pedersen) - Appears in the Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), pp. 1030-1031, July 25-29, 2004, San Jose, CA (Intelligent Systems Demonstration)
WordNet::Similarity - Measuring the Relatedness of Concepts (Pedersen, Patwardhan, and Michelizzi) - Appears in the Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), pp. 1024-1025, July 25-29, 2004, San Jose, CA (Intelligent Systems Demonstration)
The Senseval-3 Multilingual English-Hindi lexical sample task (Chklovski, Mihalcea, Pedersen, and Purandare) - Appears in the Proceedings of the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (Senseval-3), pp. 5-8, July 25-26, 2004, Barcelona, Spain.
The Duluth Lexical Sample Systems in Senseval-3 (Pedersen) - Appears in the Proceedings of the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (Senseval-3), pp. 203-208, July 25-26, 2004, Barcelona, Spain.
Complementarity of Lexical and Simple Syntactic Features: The Syntalex Approach to Senseval-3 (Mohammad and Pedersen) - Appears in the Proceedings of the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (Senseval-3), pp. 159-162, July 25-26, 2004, Barcelona, Spain.
Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces (Purandare and Pedersen) - Appears in the Proceedings of the Conference on Computational Natural Language Learning (CoNLL), pp. 41-48, May 6-7, 2004, Boston, MA. [acceptance rate 48%]
Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation (Mohammad and Pedersen) - Appears in the Proceedings of the Conference on Computational Natural Language Learning (CoNLL), pp. 25-32, May 6-7, 2004, Boston, MA. [acceptance rate 48%]
SenseClusters - Finding Clusters that Represent Word Senses (Purandare and Pedersen) - Appears in the Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-04), pp. 26-29, May 3-5, 2004, Boston, MA. (Demonstration System)
WordNet::Similarity - Measuring the Relatedness of Concepts (Pedersen, Patwardhan, and Michelizzi) - Appears in the Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-04), pp. 38-41, May 3-5, 2004, Boston, MA. (Demonstration System)

2003

Writing About Research, Or The Art of WAR (Pedersen) - unpublished manuscript, September 2003. Also available in html.

Extended Gloss Overlaps as a Measure of Semantic Relatedness (Banerjee and Pedersen) - Appears in the Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico. [acceptance rate 21%]
Also available in postscript.
An implementation of the extended gloss overlap measure is available in the WordNet::Similarity package

Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond (Mihalcea and Pedersen, Editors), May 2003, Edmonton, Canada

An Evaluation Exercise for Word Alignment (Mihalcea and Pedersen) - Appears in the Proceedings of the Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond, pp. 1-10, May 31, 2003, Edmonton, Canada.
Also available in postscript

The Duluth Word Alignment System (Thomson-McInnes and Pedersen) - Appears in the Proceedings of the Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond, pp. 40-43, May 31, 2003, Edmonton, Canada.
Also available in postscript

Discriminating Among Word Senses Using McQuitty's Similarity Analysis (Purandare) - Appears in the Proceedings of the Student Research Workshop at HLT-NAACL, pp. 19-24, May 30-31, 2003, Edmonton, Canada. [ppt]

Using Measures of Semantic Relatedness for Word Sense Disambiguation (Patwardhan, Banerjee and Pedersen) - Appears in the Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, pp. 241-257, February 17-21, 2003, Mexico City. [acceptance rate 46%]
Also available in postscript

Guaranteed Pre-tagging for the Brill Tagger (Mohammad and Pedersen) - Appears in the Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, pp. 148-157, February 17-21, 2003, Mexico City. [acceptance rate 46%]
Also available in postscript

The Design, Implementation, and Use of the Ngram Statistics Package (Banerjee and Pedersen) - Appears in the Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, pp. 370-381, February 17-21, 2003, Mexico City. [acceptance rate 46%]
Also available in postscript

2002

Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of Senseval-2 (Pedersen) - Appears in the Proceedings of the Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, pp. 40-46, July 11, 2002, Philadelphia.
Also available in postscript or from the Computation and Language E-Print Archive as #0205068
- Download the complete listing of all pairwise system comparisons. The paper shows only the top 15.
- Download the complete listing with the number of systems able to disambiguate each word correctly.

Evaluating the Effectiveness of Ensembles of Decision Trees in Disambiguating Senseval Lexical Samples (Pedersen) - Appears in the Proceedings of the Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, pp. 81-87, July 11, 2002, Philadelphia
Also available in postscript or from the Computation and Language E-Print Archive as #0205067

A Baseline Methodology for Word Sense Disambiguation (Pedersen) - Appears in the Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 126-135, February 17-23, 2002, Mexico City. [acceptance rate 52%]
Also available in postscript

An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet (Banerjee and Pedersen) - Appears in the Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City. [acceptance rate 52%]
Also available in postscript

Empirical Methods for Exploiting Parallel Texts, by I. Dan Melamed (Pedersen) - Appears in Computational Linguistics, Volume 28, Number 2, pp. 235-237. (Book Review)
Also available in postscript

2001

A Plagiarism Case Study (Pedersen) - unpublished manuscript, April 2001. Also available in html.

Machine Learning with Lexical Features: The Duluth Approach to Senseval-2 (Pedersen) - Appears in the Proceedings of SENSEVAL-2: Second International Workshop on Evaluating Word Sense Disambiguation Systems, pp. 139-144, July 5-6, 2001, Toulouse, France
Also available in postscript or from the Computation and Language E-Print Archive as #0205069

A Decision Tree of Bigrams is an Accurate Predictor of Word Sense (Pedersen) - Appears in the Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-01), pp. 79-86, June 2-7, 2001, Pittsburgh, PA. [acceptance rate 28%]
Also available in postscript or from the Computation and Language E-Print Archive as #0103026

Materials from the EM algorithm Panel Discussion at EMNLP-01, June 2001, in Pittsburgh PA.
The EM Algorithm : Selected Readings is a short literature review. I've also posted my slides as powerpoint or handouts from my short introduction to EM. It works through a simple example of EM for a multinomial distribution with hidden data.

Lexical Semantic Ambiguity Resolution with Bigram Based Decision Trees (Pedersen) - Appears in the Proceedings of the Second International Conference on Intelligent Text Processing and Computational Linguistics (CICLING-01), pp. 157-168, February 18-24, 2001, Mexico City. [acceptance rate 57%] [This is a preliminary version of the NAACL 2001 paper.]

2000

A Simple Approach to Building Ensembles of Naive Bayesian Classifiers for Word Sense Disambiguation (Pedersen) - Appears in the Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-00), pp. 63-69, May 1-3, 2000, Seattle, WA. [acceptance rate 26%]
Also available in postscript or from the Computation and Language E-Print Archive as #0005006

An Ensemble Approach to Corpus Based Word Sense Disambiguation (Pedersen) - Appears in the Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLING-00), pp. 205-218, February 13-18, 2000, Mexico City. [This is a preliminary version of the NAACL 2000 paper.]

One Jump Ahead: Challenging Human Supremacy in Checkers, by Jonathan Schaeffer (Pedersen) Appears in Intelligence, Volume 11, Issue 1, Spring 2000, pp. 56-57. (Book Review)

1999

Search Techniques for Learning Probabilistic Models of Word Sense Disambiguation (Pedersen) - Appears in the Working Notes of the AAAI Spring Symposium on Search Techniques for Problem Solving Under Uncertainty and Incomplete Information, pp. 107-112, March 22-24, 1999, Palo Alto, CA

Integrating Natural Language Subtasks with Bayesian Belief Networks (Pedersen) - Appears in the Proceedings of the Pacific Asian Conference on Expert Systems (PACES-99), pp. 1-6, Feb 11-12, 1999, Los Angeles, CA

The Balancing Act: Combining Symbolic and Statistical Approaches to Language, edited by Judith L. Klavans and Philip Resnik. (Pedersen) Appears in Intelligence, Volume 10, Issue 1, Spring 1999, pp. 41-43. (Book Review)

1998

Knowledge Lean Word Sense Disambiguation (Pedersen & Bruce) - Appears in the Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), pp. 800-805, July 28-30, 1998, Madison, WI [acceptance rate 30%]

Raw Corpus Word Sense Disambiguation (Pedersen) - Appears in the Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), p. 1198, July 28-30, 1998, Madison, WI (Student Poster)

Dependent Bigram Identification (Pedersen) - Appears in the Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), p. 1197, July 28-30, 1998, Madison, WI (Student Poster)

Learning Probabilistic Models of Word Sense Disambiguation (Pedersen) May 1998, Southern Methodist University, 195 pages (PhD Dissertation) (Also available from CMP-LG E-Print Archive as 0707.3972)

Naive Bayes as a Satisficing Model (Pedersen) Appears in the Working Notes of the AAAI Spring Symposium on Satisficing Models, pp. 60-67, March 1998, Palo Alto, CA.

1997

Distinguishing Word Senses in Untagged Text (Pedersen & Bruce) - Appears in the Proceedings of the Second Conference on Empirical Methods in Natural Language Processing (EMNLP-2), pp. 197-207, August 1-2, 1997, Providence, RI. [acceptance rate 35%] (Also available from CMP-LG E-Print Archive as #9706008)

A New Supervised Learning Algorithm for Word Sense Disambiguation (Pedersen & Bruce) - Appears in the Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97), pp. 604-609, July 27-31, 1997, Providence, RI. [acceptance rate 36%]

Naive Mixes for Word Sense Disambiguation (Pedersen) - Appears in the Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97), p. 841, July 27-31, 1997, Providence, RI (Student Poster)

Knowledge Lean Word Sense Disambiguation (Pedersen) - Appears in the Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97), p. 814, July 27-31, 1997, Providence, RI (Doctoral Consortium)

A Statistical Decision Making Method : A Case Study in Prepositional Phrase Attachment (Kayaalp, Pedersen, and Bruce) - Appears in the Proceedings of the Computational Natural Language Learning Workshop (CoNLL), pp. 33-42, July 11, 1997, Madrid, Spain

Sequential Model Selection for Word Sense Disambiguation (Pedersen, Bruce & Wiebe) - Appears in the Proceedings of the Fifth Conference on Applied Natural Language Processing (ANLP-97), pp. 388-395, April 1-3, 1997, Washington, DC. [acceptance rate 32%] (Also available from CMP-LG E-Print Archive as #9702008)

1996

Fishing for Exactness (Pedersen) - Appears in the Proceedings of the South-Central SAS Users Group Conference (SCSUG-96), pp. 188-200, Oct 27-29, 1996, Austin, TX (Also available from CMP-LG E-Print Archive as #9608010)

Significant Lexical Relationships (Pedersen, Kayaalp, & Bruce) - Appears in the Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), pp. 455-460, August 4-8, 1996, Portland, OR. [acceptance rate 30%]

The Measure of a Model (Bruce, Wiebe, & Pedersen) - Appears in the Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 101-112, May 17-18, 1996, Philadelphia, PA. [acceptance rate 30%] (Also available from CMP-LG E-Print Archive as #9604018)

1995

Lexical Acquisition via Constraint Solving (Pedersen & Chen) - Appears in the Working Notes of the AAAI Spring Symposium on Representation and Acquisition of Lexical Knowledge, pp. 118-122, March 27-29, 1995, Palo Alto, CA (Also available from CMP-LG E-Print Archive as #9502028)

By: Ted Pedersen - tpederse AT d umn edu