This is a Perl module that measures the similarity of two files or two strings based on the number of overlapping (shared) words, scaled by the lengths of the files. It computes the F-Measure, the Dice Coefficient, the Cosine, and the Lesk measure.
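To make the scaling concrete, here is a minimal Python sketch of how such scores can be derived from a shared-word count. It is not the module's code or API. The function names (`tokenize`, `overlap_scores`), the lowercase word tokenizer, the bag-of-words overlap count, and the choice of which text anchors precision versus recall are all assumptions made for illustration. The Lesk measure is left out because it depends on how overlapping phrases are matched, which the description above does not specify.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: words are lowercase alphanumeric runs.
    return re.findall(r"\w+", text.lower())


def overlap_scores(text1, text2):
    """Return illustrative overlap-based similarity scores for two strings."""
    words1, words2 = tokenize(text1), tokenize(text2)
    len1, len2 = len(words1), len(words2)

    # Assumption: overlap is the multiset intersection of the two word bags.
    shared = sum((Counter(words1) & Counter(words2)).values())

    if shared == 0:
        # Covers empty inputs and texts with no words in common.
        return {"overlap": 0, "precision": 0.0, "recall": 0.0,
                "f_measure": 0.0, "dice": 0.0, "cosine": 0.0}

    precision = shared / len2  # assumed: share of text2 found in text1
    recall = shared / len1     # assumed: share of text1 found in text2
    return {
        "overlap": shared,
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "dice": 2 * shared / (len1 + len2),
        "cosine": shared / math.sqrt(len1 * len2),
    }


scores = overlap_scores("the cat sat on the mat", "the cat sat")
```

With precision and recall defined this way, the F-Measure and the Dice Coefficient reduce to the same value, 2 x overlap / (length1 + length2).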

Want to report a bug or submit a patch? Please do that here!

We have mailing lists for users, news, and developers.

Download the current version (v0.13, released October 8, 2015) from CPAN or SourceForge.


See the README and CHANGES files. Browse the current CVS version.

Text-Similarity Development Team


The development of Text-Similarity has been supported by a National Science Foundation Faculty Early Career Development (CAREER) Program award (#0092784, 2001-2007), by a Grant in Aid of Research, Artistry and Scholarship from the Graduate School of the University of Minnesota (2003-2004), and by the Digital Technology Initiative of the Digital Technology Center of the University of Minnesota (2004-2005).

By: Ted Pedersen - tpederse AT d umn edu