CS 8995 Corpus Based Natural Language Processing

Final Project: Empirical Methods for Multilingual Text

Stage 2 - Gold Standard Data, updated as of 4/16

Download the gold standard data from each team and run your sentence alignment program on it. Your sentence alignment program should be named teamname.pl (where teamname is your team's name, e.g., morelia or toluca), and your sentence alignment evaluation program should be named teamname-eval.pl. Remember to remove the alignment tags from the gold standard data before you feed it to your sentence aligner; the alignment tags should be used only by the evaluation program. For example, if you are team TOLUCA and you are running your aligner on the MORELIA gold standard data, your steps might look like this:
 
remove-align-tags.pl morelia.utx > morelia-notag.utx 
toluca.pl morelia-notag.utx > morelia-align.utx
toluca-eval.pl morelia.utx morelia-align.utx
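Because the same three steps are repeated for every team's gold standard data, they can be wrapped in a small loop. The following is a minimal sketch, not a required tool. It assumes that remove-align-tags.pl, your teamname.pl, and your teamname-eval.pl are executable and on your PATH, and that each team's gold standard file is named <team>.utx in the current directory, as in the example above. The function name align_all is hypothetical.

```shell
# Hypothetical helper (not part of the assignment): run your aligner and
# evaluator over the gold standard data of every team listed.
#   $1    = your team name (e.g. toluca); expects $1.pl and $1-eval.pl on PATH
#   $2... = teams whose gold standard files (<team>.utx) are in the current directory
align_all() {
  me=$1
  shift
  for team in "$@"; do
    # Strip the alignment tags so the aligner never sees the gold answers.
    remove-align-tags.pl "$team.utx" > "$team-notag.utx" || return 1
    # Align the untagged text with your own sentence aligner.
    "$me.pl" "$team-notag.utx" > "$team-align.utx" || return 1
    # Score your alignment against the original, still-tagged gold standard.
    "$me-eval.pl" "$team.utx" "$team-align.utx" || return 1
  done
}
```

For example, team TOLUCA could run `align_all toluca morelia toluca` after downloading the data. The function stops at the first failing step, so a missing file or a crashed aligner does not go on to produce misleading evaluation output.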
Send me the alignment results as described here.

by: Ted Pedersen - tpederse@d.umn.edu