-- CS 8761 -- Fall 2004 -- Dr. Ted Pedersen --

Readings are required, and should be completed before the lecture on
the indicated date.

TRY THIS/THESE are suggested programming activities and exercises that
give you a way to check your progress, and let you know what types of
questions I might ask on quizzes or exams. These will not be collected.

-------------------------------------------------------------------------
Week 1
(Wed, Sept 08) - First day of class, no readings
-------------------------------------------------------------------------
Week 2
(Mon, Sept 13) - READ: Chapter 1, Manning and Schutze
                 TRY THESE: Problems 1.3, 1.5, 1.7 (in Perl)
(Wed, Sept 15) - READ: Section 2.1, Manning and Schutze
                 TRY THESE: Problems 2.1, 2.2, 2.3, 2.4, 2.5
                   Extend the KWIC program to handle sentences, not
                   lines: display the sentence in which a word occurs
                   (not the line), showing up to N words/characters of
                   that sentence on either side of the word.
-------------------------------------------------------------------------
Week 3
(Mon, Sept 20) - READ: Peter Wiemer-Hastings visit papers (TBA)
                   http://lsa.colorado.edu/papers/dp1.LSAintro.pdf
                   http://reed.cs.depaul.edu/peterwh/papers/its98.pdf
                   http://reed.cs.depaul.edu/peterwh/papers/cogsci01.pdf
                 READ: Section 2.2, Manning and Schutze
                 TRY THESE: Problems 2.9, 2.10, 2.11, 2.12, 2.13, 2.14, 2.15
(Wed, Sept 22) - READ: Sections 8.5 and 15.4
                 TRY THIS: Write a Perl program that counts the word
                 types and tokens in any number of files named on the
                 command line. Display the word types and their counts
                 sorted by frequency, and compute the token/type ratio.
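A minimal sketch of the Sept 22 exercise, offered only as one way to check your own version. It assumes a "word" is a maximal run of letters, digits, or apostrophes, folded to lowercase; a different tokenization (for example, splitting on whitespace only) will give different counts. The subroutine names tally_line and totals are hypothetical.

```perl
#!/usr/bin/perl
# Count word types and tokens across the files named on the command line,
# print each type with its frequency (most frequent first), then report
# the token/type ratio.
# Assumption: a "word" is a maximal run of letters, digits, or apostrophes,
# lowercased, so "The" and "the" count as the same type.
use strict;
use warnings;

# Add every token found in $line to the frequency table %$counts.
# Returns the number of tokens seen in this line.
sub tally_line {
    my ($counts, $line) = @_;
    my $n = 0;
    for my $tok ($line =~ /[A-Za-z0-9']+/g) {
        $counts->{ lc($tok) }++;
        $n++;
    }
    return $n;
}

# Return (number of types, number of tokens) for a frequency table.
sub totals {
    my ($counts) = @_;
    my $tokens = 0;
    $tokens += $_ for values %$counts;
    return (scalar(keys %$counts), $tokens);
}

sub main {
    my %counts;
    # The <> operator reads every file listed in @ARGV, one line at a time.
    while (my $line = <>) {
        tally_line(\%counts, $line);
    }

    # Most frequent types first; ties broken alphabetically.
    for my $w (sort { $counts{$b} <=> $counts{$a} || $a cmp $b } keys %counts) {
        print "$w\t$counts{$w}\n";
    }

    my ($types, $tokens) = totals(\%counts);
    my $ratio = $types ? $tokens / $types : 0;
    printf "types: %d  tokens: %d  token/type ratio: %.3f\n",
        $types, $tokens, $ratio;
}

# Run only when file names are given, so the subroutines above can also
# be loaded and exercised on their own.
main() if @ARGV;
```

Run it as, e.g., perl count_words.pl a.txt b.txt (the script name is hypothetical). On a single file containing "The cat saw the dog." it lists the with count 2, then cat, dog, and saw with count 1 each, and reports 4 types, 5 tokens, and a token/type ratio of 5/4 = 1.25.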
-------------------------------------------------------------------------
Week 4
(Mon, Sept 27) - READ: Jill Burstein visit papers:
                   http://www.knowledge-technologies.com/presskit/KAT_IEEEdebate.pdf
(Wed, Sept 29) - READ: Jill Burstein visit papers:
                   http://www.ets.org/research/dload/IAEA.pdf
                   http://www.ets.org/research/dload/iaai03bursteinj.pdf
==================================================================
Week 5
(Mon, Oct 4)  - No new reading
(Wed, Oct 6)  - READ: Shannon, C. E. (1951). Prediction and entropy of
                printed English. Bell System Technical Journal, 30, 50-64.
                (Not online; I have hard copies for you.)
==================================================================
Week 6
(Mon, Oct 11) - READ: Chapters 3 & 4
                TRY THESE: Problems 3.1, 3.2, 3.3, 3.6, 3.12, 3.13
                TRY THIS: Problem 4.1
(Wed, Oct 13) - READ: Chapter 5
                TRY THIS: Ngram Statistics Package
                  http://www.d.umn.edu/~tpederse/nsp.html
==================================================================
Week 7
(Mon, Oct 18) - No new reading
(Wed, Oct 20) - No new reading
==================================================================
Week 8
(Mon, Oct 25) - No new reading; review for exam!
(Wed, Oct 27) - No new reading; review for exam!
==================================================================
Week 9
(Mon, Nov 01) - Mid-Term Exam
(Wed, Nov 03) - READ: Sections 10.1, 10.4
                READ: A Plagiarism Case Study
                TRY THIS: The Brill POS Tagger
                  http://research.microsoft.com/~brill/
==================================================================
Week 10
(Mon, Nov 08) - READ: Sections 6.1-6.2
(Wed, Nov 10) - READ: Sections 6.2-6.5
==================================================================
Week 11
(Mon, Nov 15) - No new reading
                Project ALPHA version due
(Wed, Nov 17) - No new reading
==================================================================
Week 12
(Mon, Nov 22) - READ: Sections 7.1-7.3
(Wed, Nov 24) - READ: Sections 7.4-7.6
==================================================================
Week 13
(Mon, Nov 29) - No new reading
(Wed, Dec 01) - No new reading
==================================================================
Week 14
(Mon, Dec 06) - No new reading
(Wed, Dec 08) - Project BETA version due
==================================================================
Week 15
(Mon, Dec 13) - No new reading
(Wed, Dec 15) - No new reading
(Sat, Dec 18) - Final Exam
==================================================================
(Wed, Dec 22) - Project FINAL version due