Research work

Master's Thesis at the University of Minnesota, Duluth

I work with Dr. Rich Maclin and Dr. Ted Pedersen on supervised machine learning methods for word sense disambiguation using support vector machines. In particular, my research currently focuses on developing special-purpose kernels for support vector machines applied to word sense disambiguation in medical text.

Word sense disambiguation is the problem of automatically assigning the correct meaning to a word that has multiple meanings, and it relies heavily on the context surrounding the ambiguous word. Disambiguation in medical text differs from disambiguation of general English because of the specialized domain of medicine and the nature of the text itself. General English text (such as newspaper text) tends to be well structured, whereas medical text (such as clinical notes written by physicians) is much less structured, in addition to being rich in domain-specific terminology. As a result, features such as part-of-speech tags, which are commonly used to disambiguate general English text, may not be useful for medical text. On the other hand, the specialized domain offers other features that draw on domain knowledge to distinguish among instances of an ambiguous term. My work aims to discover such similarities and differences in medical text and use them to design kernels for support vector machines. Kernel methods provide a natural framework for incorporating this kind of domain knowledge: the knowledge is encoded as a similarity (or distance) measure between two instances of the ambiguous term.
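To make the idea of a kernel as a domain-aware similarity measure concrete, here is a minimal Python sketch. It is not the kernel developed in this thesis. It assumes a hypothetical word-to-concept table (`CONCEPTS`, standing in for domain knowledge such as a medical concept dictionary) and combines a bag-of-words linear kernel over the context with a linear kernel over the concepts those words map to. Because both terms are inner products of explicit feature vectors, their non-negative weighted sum is still a valid kernel. All names (`CONCEPTS`, `domain_kernel`, `gram_matrix`, `weight`) are illustrative.

```python
from collections import Counter

# HYPOTHETICAL word -> concept map standing in for domain knowledge
# (e.g. entries from a medical concept dictionary); purely illustrative.
CONCEPTS = {
    "insulin": "C_DIABETES",
    "glucose": "C_DIABETES",
    "a1c": "C_DIABETES",
    "hearing": "C_EAR",
    "tympanic": "C_EAR",
}


def bag_of_words(context):
    """Count the lowercased tokens in a context window."""
    return Counter(context.lower().split())


def bag_of_concepts(context):
    """Map tokens to (hypothetical) domain concepts and count them."""
    return Counter(CONCEPTS[t] for t in context.lower().split() if t in CONCEPTS)


def dot(a, b):
    """Inner product of two sparse count vectors stored as Counters."""
    return sum(v * b[k] for k, v in a.items())


def domain_kernel(x, y, weight=1.0):
    """Linear kernel on words plus a weighted linear kernel on concepts.

    Each term is an inner product of explicit feature vectors, so each is
    positive semi-definite; a non-negative weighted sum of such kernels is
    also a valid kernel.
    """
    words = dot(bag_of_words(x), bag_of_words(y))
    concepts = dot(bag_of_concepts(x), bag_of_concepts(y))
    return words + weight * concepts


def gram_matrix(contexts, weight=1.0):
    """Pairwise kernel values, the form an SVM with a precomputed kernel expects."""
    return [[domain_kernel(a, b, weight) for b in contexts] for a in contexts]


contexts = ["started insulin today", "glucose remains high", "hearing loss noted"]
print(gram_matrix(contexts))  # → [[4.0, 1.0, 0.0], [1.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
```

The first two contexts share no words, yet they receive a similarity of 1.0 through the shared hypothetical concept, while the third stays at 0.0 with both. A matrix like this could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.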

For our experiments, we are using the U.S. National Library of Medicine (NLM) word sense disambiguation test collection (available to Dr. Pedersen) and a collection of medical abbreviations created by Dr. Hongfang Liu.


Summer Research / Internships


I was a Research Assistant to Dr. Ted Pedersen and worked on adding support for representations based on Latent Semantic Analysis (LSA) to the unsupervised clustering package SenseClusters, for use in feature clustering and LSA-based context clustering.
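For readers unfamiliar with LSA, the following Python/NumPy sketch shows the basic idea behind an LSA-based context representation: build a word-by-context count matrix, take its truncated singular value decomposition, and compare contexts by cosine similarity in the reduced space. It is a generic illustration, not the SenseClusters implementation; the function names and the choice of `k` are assumptions.

```python
import numpy as np


def lsa_context_vectors(contexts, k=2):
    """Project contexts into a k-dimensional latent space via truncated SVD.

    Builds a word-by-context count matrix A, factors A = U S V^T, and keeps
    the top k singular values; context j is represented by row j of V_k S_k.
    Requires k <= min(vocabulary size, number of contexts).
    """
    vocab = sorted({w for c in contexts for w in c.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(contexts)))
    for j, c in enumerate(contexts):
        for w in c.lower().split():
            A[index[w], j] += 1.0
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    # (S_k V_k^T)^T has one row per context and k columns.
    return (np.diag(s[:k]) @ Vt[:k, :]).T


def cosine(u, v):
    """Cosine similarity between two reduced context vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


contexts = ["bank river water", "river water flow", "bank loan money", "loan money interest"]
V = lsa_context_vectors(contexts, k=2)
print(cosine(V[0], V[1]), cosine(V[0], V[2]))
```

With these four toy contexts and `k=2`, the first two contexts come out much more similar to each other than the first and third, even though the first and third share the word "bank". A clustering algorithm would then operate on these reduced vectors rather than on the raw counts.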


I worked as a Research Intern with Dr. Serguei Pakhomov at the Mayo Clinic, Division of Biomedical Informatics. My work focused on the study of ambiguous acronyms in clinical notes and on supervised machine learning methods for automatically expanding such acronyms. The slides that I used for the final presentation of my summer work can be found here. A short report of my internship is also available.




Last updated: August 15, 2006