CS 5761 - Introduction to Natural Language Processing 
 Project Proposal due by 5pm Weds April 7 via webdrop. Please submit  
a pdf or ps file. 
 Objectives 
To outline the design and scope of your project in a formal written 
proposal. 
 Specification 
Your project will involve producing both a Perl implementation and a   
written report. There are two possible topics:
-  Conduct an analysis that shows whether or not the Voynich Manuscript
consists of human language.
-  Develop a method that uses information from Google to identify sets  
of related words (much like Google Sets). 
Within these topics you have considerable discretion as to how you 
proceed. If you would like to work in teams of two, that is possible. 
However, you must clearly define what each team member will be 
contributing. You should structure things so that each team member has a 
distinct role in the project.
 
If you have an alternative idea for a project, please let me know via 
email by Weds March 31. I am willing to consider such possibilities, but 
would like to discuss those with you before you proceed too far. If you 
would like to use the basic idea of the projects above and modify them in 
some significant way, then you should also send me an email note by March 
31 describing the general idea so we can discuss it.
By Weds April 7 you should have produced a project proposal. It should  
include the following:
-  Problem Description (1 page): Which of these problems are
you trying to solve? Describe the problem in general terms and what 
practical applications the techniques you are developing could have. You  
should provide at least two references to published papers (not just web  
sites) that discuss the same or a related problem. You should read and  
briefly summarize these papers. If you are working in a team of 2, you 
should each find and read 2 papers (for a total of 4). 
-  Overview of Solution/Approach (1 page): What is the general
approach you plan on taking? 
-  Voynich Project: describe the tests or techniques your analysis will
consist of, and why you think they will help to answer the question of
whether it is human language.
-  Related Words: describe your algorithm as a series of steps.
Provide an example that shows how it works. 
 If you are working in a team of two, clearly indicate which of you will  
handle which step. All steps in either project should be clearly 
assigned to one team member or the other.
-  Evaluation Plan (1-2 paragraphs): How will you show that your
solution is valid? Will you need to find or create "gold standard"  
data to use as a point of comparison? If so, where will you get that, or
how will you create it?
-  Voynich Project: You may want to consider carrying out your analysis
on text that is known to be human language, and on text that is known
not to be, and showing that your analysis produces the correct result in
each case. Text that is truly human language is easy to find (Project
Gutenberg, etc.), and you could possibly use your program from lab4 to
generate text based on a unigram model, which might not look like true
human language. (These are just ideas; you can proceed as you wish.)
-  Related Words: You may want to consider comparing the sets of words
your method produces with sets of words that are known to exist in a
thesaurus or other resource. You may want to consider using WordNet,
which provides sets of related words. It is installed on the csdev
machines and freely available for download. Just run "wn" or "man wn" to
find out more.
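
As one illustration of comparing a produced set of words against a "gold
standard" set, here is a minimal sketch. It is written in Python only for
brevity (your project implementation must still be in Perl), and the
function name score_against_gold and the example word lists are
hypothetical; they are not taken from WordNet or Google Sets. Precision,
recall and Jaccard overlap are just one reasonable choice of measures.

```python
def score_against_gold(proposed, gold):
    """Return (precision, recall, jaccard) for two collections of words.

    Words are compared case-insensitively as exact strings; no stemming
    or synonym matching is attempted in this sketch.
    """
    p = {w.lower() for w in proposed}
    g = {w.lower() for w in gold}
    overlap = p & g
    union = p | g
    precision = len(overlap) / len(p) if p else 0.0
    recall = len(overlap) / len(g) if g else 0.0
    jaccard = len(overlap) / len(union) if union else 0.0
    return precision, recall, jaccard


# Hypothetical example data, invented for illustration only.
proposed = ["red", "green", "blue", "apple"]
gold = ["red", "green", "blue", "yellow", "orange"]

precision, recall, jaccard = score_against_gold(proposed, gold)
print(f"precision={precision:.2f} recall={recall:.2f} jaccard={jaccard:.2f}")
```

Precision tells you how many of the words you proposed are in the gold
set, and recall tells you how much of the gold set you recovered. Your
evaluation plan should say which of these matters more for your method.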
 
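For the Voynich evaluation idea above, one possible sanity check is to
compute a statistic on real text and on a control text whose word order
carries no information. The sketch below is in Python only for brevity
(the project itself requires Perl). It assumes whitespace tokenization,
and it uses a random shuffle of the tokens as a simple stand-in for text
generated from a unigram model; it is not your lab4 program. All function
names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a Counter of event counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def unigram_entropy(tokens):
    """Entropy of the word frequency distribution; ignores word order."""
    return entropy(Counter(tokens))


def conditional_entropy(tokens):
    """Estimate H(next word | previous word) as H(pair) - H(previous)."""
    pairs = list(zip(tokens, tokens[1:]))
    joint = Counter(pairs)
    previous = Counter(first for first, _ in pairs)
    return entropy(joint) - entropy(previous)


def shuffled_control(tokens, seed=0):
    """Copy of the tokens in random order: same words, no word-order structure."""
    control = list(tokens)
    random.Random(seed).shuffle(control)
    return control


# Tiny invented sample; a real test needs far more text than this.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
control = shuffled_control(tokens)

for name, sample in (("original", tokens), ("shuffled", control)):
    print(name, unigram_entropy(sample), conditional_entropy(sample))
```

Shuffling leaves the unigram entropy unchanged, so any difference between
the two samples in the conditional entropy comes from word order. On small
samples this estimate is biased low, so compare texts of similar length.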
Your proposal should probably be 2-3 pages, and it should be well written
and carefully thought out. It will provide a road map for your project, so
the more you put into it, the more smoothly your project will go.
 
 
You will also present your project proposal in the lab on Thursday April 
8. There is no need to prepare a formal presentation; we can use your
written proposal as a point of reference while you describe things. 
 
If I have significant concerns about your topic or some aspect of your
proposal, I will let you know within a few days after you submit the
proposal. In that case I might request that you make some changes or
provide additional details.