CS 8995 Corpus Based Natural Language Processing
Assignment 4 Bitexts
Below are the bitexts collected by the members of CS 8995. Each bitext
pairs English with another language, including French, Spanish, Chinese,
and German.
I have reviewed these bitexts and in general they seem like reasonable
translations, although some of the data is rather messy. If you notice
any serious problems in this data, please let me know and I will ask the
creator to correct them.
I encourage you to use some of this data to experiment with
your sentence boundary detection and alignment programs. While you don't
have a gold standard to compare against, you can get an idea of how well
your approach is working simply by inspecting the output.
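
One possible way to make that inspection easier is sketched below in
Python. It assumes your alignment program's output has already been read
into a list of (English sentence, other-language sentence) pairs. The
function names and the ratio thresholds of 0.5 and 2.0 are illustrative
assumptions, not part of the assignment. The sketch prints each pair side
by side and flags pairs whose character lengths differ sharply, which is
often a sign of a boundary or alignment error.

```python
import textwrap


def flag_suspicious_pairs(pairs, low=0.5, high=2.0):
    """Return (index, ratio) for pairs whose target/source character-length
    ratio falls outside [low, high]. The thresholds are assumed defaults;
    empty sides are reported with a ratio of 0.0."""
    flagged = []
    for i, (src, tgt) in enumerate(pairs):
        if not src or not tgt:
            flagged.append((i, 0.0))
            continue
        ratio = len(tgt) / len(src)
        if ratio < low or ratio > high:
            flagged.append((i, ratio))
    return flagged


def format_pair(index, src, tgt, width=38):
    """Lay out one aligned pair in two columns for reading by eye."""
    left = textwrap.wrap(src, width) or [""]
    right = textwrap.wrap(tgt, width) or [""]
    rows = max(len(left), len(right))
    left += [""] * (rows - len(left))
    right += [""] * (rows - len(right))
    lines = [f"[{index}]"]
    for l, r in zip(left, right):
        lines.append(f"{l:<{width}} | {r}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical aligner output, made up for illustration only.
    pairs = [
        ("Hello there.", "Bonjour."),
        ("Yes.", "Oui, bien sûr, je suis tout à fait d'accord."),
    ]
    for i, (src, tgt) in enumerate(pairs):
        print(format_pair(i, src, tgt))
    for i, ratio in flag_suspicious_pairs(pairs):
        print(f"check pair {i}: length ratio {ratio:.2f}")
```

Character-length ratios differ a good deal across language pairs (Chinese
text is typically much shorter in characters than its English
translation), so the thresholds would need adjusting for each bitext.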