    Research Interests  

   Natural Language Processing & Data Mining

    Master's Thesis

   Identifying Sets of Related Words from the World Wide Web
   Thesis Proposal (.pdf)    Thesis Final Report (.pdf)

    Thesis Advisor

   Professor Ted Pedersen


   The overall goal of my thesis research is to use the World Wide Web as a source of information for identifying sets of
   words that are related in meaning. Existing methods identify related words in fixed, static corpora of text. However, the
   Web makes huge amounts of text available, so it is important to develop methods that can take advantage of it. The Web also
   poses a unique set of challenges, including its ever-changing state and the presence of repetitive, noisy, or low-quality
   data.

   We use the Google search engine to retrieve text from the Web. Google has released an API that lets a programmer access its
   content programmatically and retrieve the data in a more convenient form. We then process the retrieved data to find sets
   of related words.
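
   The processing step can be illustrated with a minimal sketch. It is not the method used in the thesis or in Google-Hack: it
   assumes the text for each search term has already been retrieved, and it simply keeps the words that show up in the text of
   more than one term. The names tokenize, related_words, and STOP_WORDS, and the sample snippets, are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "with"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def related_words(snippets_by_term, min_terms=2):
    """Return words found in the retrieved text of at least `min_terms` search terms.

    `snippets_by_term` maps each search term to the list of text snippets
    retrieved for it. Retrieval itself is outside the scope of this sketch.
    """
    search_terms = {term.lower() for term in snippets_by_term}
    term_count = Counter()
    for snippets in snippets_by_term.values():
        seen = set()
        for snippet in snippets:
            seen.update(tokenize(snippet))
        # Count each word once per search term, not once per occurrence.
        term_count.update(seen)
    return sorted(
        word
        for word, n in term_count.items()
        if n >= min_terms and word not in search_terms
    )


if __name__ == "__main__":
    # Made-up snippets standing in for text retrieved from the Web.
    sample = {
        "rifle": ["The rifle is a gun with a long barrel"],
        "pistol": ["A pistol is a small gun"],
    }
    print(related_words(sample))  # ['gun']
```

   Counting each word once per search term, rather than once per occurrence, is one simple way to limit the effect of the
   repetitive pages mentioned above.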

    Current Status

    Possible use of Google-Hack


    Google-Hack v0.13 released on CPAN! (02/23/05) (More info on Google-Hack)
    Google-Hack v0.13 released on Sourceforge! (02/23/05)
    Google-Hack Web Interface (try out the beta version)
    NLP Tools
    References / Papers read so far
    Programs I have implemented (These are fairly old programs that got me started with Google-Hack)
    Google API - CS 5761 (02/27/2004) (This is a presentation I gave for an undergraduate class on the Google API)
    Summary of Google Results (OLD)

