Regarding assignment 4 grading: You were to collect 20 pairs of articles. I graded those at 1/2 point per pair, for an overall total of 10 points. You would lose a point for every 2 pairs of articles that were not translations of each other. My spot checking did not reveal anyone having more than minor problems in their corpus (mislabeled languages, faulty URLs, etc.), so no deductions were made.

However, if in using these corpora for experimental purposes you notice what appears to be a pair of articles that are clearly not translations of each other (based on size differential, obviously different cognates, proper names, etc.), then let me know via email at tpederse@d.umn.edu. Be specific: refer to the particular article pair in the corpus that appears to be faulty. I will notify the corpus creator of the problem, and they will have 3 days to fix it. Please note that this will not hurt the grade of the creator, since the only way the creator of the corpus can lose points is if they don't fix the problem!

NEW: if you are the first to report a faulty translation pair in a corpus, you will get 1/2 point of extra credit for every such article pair you find. I will need to agree with your judgement for the extra credit to be awarded.

This is not meant to turn you into a pack of informers. Rather, we simply want to make sure that our data is of the highest quality, so that we can use it for boundary detection, sentence alignment, and translation. Remember, there is no penalty to you if you fix whatever problems are reported. If you find problems in other people's corpora, then you can get a few extra points.
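
If you want a quick first pass on the "size differential" clue mentioned above, something like the following rough sketch may help. It is only an illustration, not part of the assignment: the function names, the use of character counts, and the cutoff of 2.0 are all assumptions chosen for the example, and any pair it flags still needs to be read by a person before you report it.

```python
def length_ratio(text_a: str, text_b: str) -> float:
    """Return the longer text's length divided by the shorter's (always >= 1.0).

    Lengths are measured in characters, which is an assumption of this sketch;
    word or sentence counts could be used instead.
    """
    len_a, len_b = len(text_a), len(text_b)
    shorter, longer = min(len_a, len_b), max(len_a, len_b)
    if shorter == 0:
        # An empty article can never be a translation of a non-empty one.
        return float("inf")
    return longer / shorter


def looks_suspicious(text_a: str, text_b: str, max_ratio: float = 2.0) -> bool:
    """Flag a pair whose sizes differ by more than max_ratio.

    The default cutoff of 2.0 is a hypothetical value, not a course rule;
    genuine translations between some language pairs may exceed it.
    """
    return length_ratio(text_a, text_b) > max_ratio
```

A flagged pair is only a candidate. Check cognates and proper names by eye before emailing, since legitimate translations can differ a good deal in length.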