README
------

BrillPatch Version 0.1

Copyright (C) 2001-2002

Saif Mohammad, moha0149@d.umn.edu
Ted Pedersen, tpederse@d.umn.edu

University of Minnesota, Duluth

##################### LAST UPDATED: Feb, 2003 ######################

BrillPatch [Mohammad Pedersen 2002] is a patch to the Brill Tagger
[Brill 92] [Brill 94] that guarantees pre-tagging. It also enables
lexicalized contextual rules to be triggered by pre-tagged words.

1. Steps to apply patch:
=======================

1. Download BrillPatch (into a directory $PACKAGE, say) and unpack it.

   Set an environment variable to the directory $PACKAGE/BrillPatch.
   This is how I set it...

        setenv BRILLPATCH $PACKAGE/BrillPatch

   Add the BRILLPATCH directory to your PATH. This is how I did it...

        set path = ($BRILLPATCH $path)

2. Download the Brill Tagger. It is available at...

        http://www.cs.jhu.edu/~brill/RBT1_14.tar.Z

   The Makefile may need to be modified to use `gcc' instead of `cc'.

3. Run the `Bpatch' script, which has the following usage:

        Bpatch DIRECTORY

   DIRECTORY is the directory where the Brill Tagger was downloaded
   and unpacked. Thus DIRECTORY/RULE_BASED_TAGGER_V1.14/ has all the
   Brill Tagger files, including its README and Makefile.

   The script may be run from any directory, not only from the
   directory that contains it. Note that the script also compiles the
   patched Brill Tagger as part of its processing.

   `pretag' is the file used to apply the patch. The patch will be
   applied to...

        $BRILL/RULE_BASED_TAGGER_V1.14/Tagger_Code/final-state-tagger.c

   (here $BRILL stands for the DIRECTORY given to Bpatch). The
   original file is saved as final-state-tagger.c.orig

2. MOTIVATION FOR THE PATCH:
===========================

In its present form, if a word is pre-tagged with a certain part of
speech (X, say), the Brill Tagger treats X as the most likely tag for
that word. It applies this tag to the word by the end of its first
phase of tagging. During its second phase, however, it might change
the tag of the word based on its context.
Thus X may be changed to Y. This behavior is undesirable when we are
certain of the part of speech of a word, because the contextual rules
may force a different tag even though we pre-tagged the word.

With this patch applied, the Brill Tagger does not apply contextual
rules to a word that is pre-tagged. Hence, its tag remains the tag
with which it was pre-tagged (X). This correct tag will now also
influence the tags of the surrounding words, which is desirable.

The patch has one other advantage, described next.

The Brill Tagger does not apply lexicalized contextual rules if the
data is pre-tagged. Lexicalized contextual rules are rules that change
the tag of a word based on the surrounding words and possibly their
tags. This patch modifies the tagger so that these rules are applied
irrespective of whether the data is pre-tagged.

The text file `examples.txt' lists sentences that exemplify the cases
where lexicalized rules are triggered only after the patch has been
applied.

Further details on guaranteed pre-tagging and on applying lexicalized
rules while pre-tagging are available in the paper "Guaranteed
Pre-tagging for the Brill Tagger", which is to appear in the Fourth
International Conference on Intelligent Text Processing and
Computational Linguistics, in February 2003, in Mexico City. A copy of
the paper is distributed along with this package as the PostScript
file `cicling2003.ps'.

3. Copying:
==========

This suite of programs is free software; you can redistribute it
and/or modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307,
USA.

Note: The text of the GNU General Public License is provided in the
file GPL.txt that you should have received with this distribution.

4. References:
=============

1. [Mohammad Pedersen 2002] S. Mohammad and T. Pedersen. Guaranteed
   Pre-Tagging for the Brill Tagger. Appears in the Proceedings of the
   Fourth International Conference on Intelligent Text Processing and
   Computational Linguistics (CICLing2003).

2. [Brill 92] E. Brill. A Simple Rule-Based Part of Speech Tagger. In
   Proceedings of the Third Conference on Applied Computational
   Linguistics, Trento, Italy, 1992.

3. [Brill 94] E. Brill. Some Advances in Rule-Based Part of Speech
   Tagging. Proceedings of the 12th National Conference on Artificial
   Intelligence (AAAI-94), Seattle, WA, 1994.

#####################################################################