Regarding assignment 4 grading: 

You were to collect 20 pairs of articles. I graded
those at 1/2 point per pair, for an overall total
of 10 points. You would lose a point for every 2
pairs of articles that were not translations of
each other. 
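
The arithmetic above can be sketched as follows.
This is only an illustration, not the script used
for grading; the function name and its parameters
are hypothetical. Losing a point for every 2 faulty
pairs is treated here as losing 1/2 point per
faulty pair.

```python
def assignment4_score(total_pairs=20, faulty_pairs=0, points_per_pair=0.5):
    """Hypothetical helper: points earned for a corpus of article pairs."""
    if not 0 <= faulty_pairs <= total_pairs:
        raise ValueError("faulty_pairs must be between 0 and total_pairs")
    # Only pairs that really are translations of each other earn credit,
    # which amounts to losing 1 point for every 2 faulty pairs.
    return (total_pairs - faulty_pairs) * points_per_pair


print(assignment4_score())                # 10.0
print(assignment4_score(faulty_pairs=4))  # 8.0
```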

My spot checking did not reveal anyone with more
than minor problems in their corpus (mislabeled
languages, faulty URLs, etc.), so no deductions
were made.

However, if in using these corpora for experimental
purposes you notice what appears to be a pair of articles
that are clearly not translations of each other
(based on size differential, obviously different
cognates, proper names, etc.) then let me know via
email at tpederse@d.umn.edu. Be specific, and refer to
the particular article in the corpus that appears
to be faulty. I will notify the corpus creator
of this problem and they will have 3 days to
fix the problem. Please note that this will not hurt
the grade of the creator, since the only way the
creator of the corpus can lose points is if they 
don't fix the problem!
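
One of the cues mentioned above, size differential,
is easy to check automatically. A minimal sketch,
assuming each article is available as a plain
string: the function name and the 2.0 ratio
threshold are hypothetical choices, not course
policy, and a flagged pair still needs a human
look before it is reported.

```python
def size_ratio_suspicious(text_a, text_b, max_ratio=2.0):
    """Hypothetical check: flag a pair whose lengths differ too much."""
    len_a, len_b = len(text_a), len(text_b)
    # An empty article cannot be a translation of a non-empty one,
    # and two empty articles are not a usable pair either.
    if min(len_a, len_b) == 0:
        return True
    # Compare the longer article to the shorter one, in characters.
    return max(len_a, len_b) / min(len_a, len_b) > max_ratio
```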

NEW: if you are the first to report a faulty 
translation pair in a corpus, you will get 1/2 
point of extra credit for every such article
pair you find. I will need to agree with your
judgement for the extra credit to be awarded. 

This is not meant to turn you into a pack of 
informers. Rather, we simply want to make sure 
that our data is of the highest quality, so that 
we can use it for boundary detection, sentence 
alignment, and translation. 

Remember, there is no penalty to you if you fix 
whatever problems are reported. If you find problems 
in other people's corpora, then you can get a few 
extra points.