I have assigned grades for assignment 4. I've looked at all of your bitext files in varying degrees of detail, and in general I have not found any major problems. I did find a few minor problems, such as inaccurate language tags and questionable URLs in the article tags, but in general the data appears to me to be translated text. Thus, I have assigned full credit to all assignment 4 submissions. (Please read on, however: there is a condition under which your full credit will be maintained.)

I have posted all of the bitext files on the class web page. You can go directly to the data at:

http://www.d.umn.edu/~tpederse/Courses/CS8995-SPR01/Assign/bitext.html

I would encourage you to use this data for sentence boundary and alignment testing. While no gold standard is available, it may at least give you a sense of how your programs perform on real-world data. There are some novel language pairs available (languages that did not make it to the gold standard stage), most notably several examples of English-German bitext.

Here's the condition on full credit: if you notice a problem in any of the bitext as you work with it, please let me and the creator know. If such a problem is reported, I expect the creator to fix or replace the offending article(s) within 3 days. If this is not done, I will adjust the creator's assignment 4 score downward. (In general, the problems I am referring to are articles that are clearly not translations of one another.)

We will likely use some of this data in stage 3, so it is in your best interest to check some of it to make sure it is reasonable.

Please let me know if you have any questions.