Thursday, September 12, 5:00pm
Campus Center 120

Unlabeling and Annotating: What I Did on my Sabbatical Vacation

Rich Maclin
Associate Professor
Department of Computer Science
University of Minnesota, Duluth

This talk will cover two of the research topics I pursued during my recent sabbatical.

In the first part of the talk I will discuss results from my poster with Mark Craven presented at ISMB 2002 entitled ``Automatically Extracting Keyphrases for Clusters of Genes.'' With the growing use and importance of high-throughput methods such as microarrays, one emerging problem is how to find commonalities among the clusters of genes that respond similarly in a high-throughput experiment. This task normally involves an extensive literature search by an expert. In this work we have developed a server, available on the web, that can assist in this task by analyzing the literature pertaining to a cluster of genes and finding keyphrases that seem to be characteristic of that cluster. Initial results from this work are promising, and this is an ongoing area of research.

In the second part of my talk I will discuss results from my KDD 2002 paper with Kristin Bennett and Ayhan Demiriz entitled ``Exploiting Unlabeled Data in Ensemble Methods.'' This work presents a general method for taking advantage of unlabeled data available for a classification task. A classification task generally involves learning a model of a classification problem (e.g., which of these patients will develop cancer) based on information we can measure from each instance (e.g., results of tests performed, family history data, etc.). Standard classification methods, however, usually require an expert to ``label'' a large number of examples as members or non-members of the class of interest. This process is time-consuming and in some cases impossible. Yet it is often the case that one can obtain large amounts of data that have not been labeled with a class.
We present a general technique, based on ensemble methods, for making use of such data. The resulting method won the NIPS 2001 contest for semi-supervised methods.