Text::Similarity

Text::Similarity is a Perl module that measures the similarity of two files or two strings based on the number of overlapping (shared) words, scaled by the lengths of the two inputs. It computes the F-measure, the Dice coefficient, the cosine, and the Lesk measure.
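
To illustrate what "overlapping words, scaled by length" can mean in practice, here is a minimal sketch in Python. It is not the module's code and does not use its API. The function name `overlap_scores`, the lowercasing and whitespace tokenization, and the choice to count shared words as a multiset intersection are all assumptions made for this example. The Lesk measure is left out because it depends on how multi-word overlaps are scored, which this page does not describe.

```python
from collections import Counter
from math import sqrt


def overlap_scores(text1: str, text2: str) -> dict:
    """Word-overlap similarity scores for two strings (illustrative only).

    Assumptions: tokens are lowercased, whitespace-separated words, and
    the overlap is the size of the multiset intersection of the two
    word lists. The real module may tokenize and count differently.
    """
    words1 = text1.lower().split()
    words2 = text2.lower().split()

    # Number of shared words, counting repeats at most as often as
    # they occur in both inputs.
    shared = sum((Counter(words1) & Counter(words2)).values())

    if shared == 0:
        # Also covers the case where either input is empty.
        return {"overlap": 0, "f_measure": 0.0, "dice": 0.0, "cosine": 0.0}

    precision = shared / len(words2)  # share of text2 that is matched
    recall = shared / len(words1)     # share of text1 that is matched

    return {
        "overlap": shared,
        "f_measure": 2 * precision * recall / (precision + recall),
        "dice": 2 * shared / (len(words1) + len(words2)),
        "cosine": shared / sqrt(len(words1) * len(words2)),
    }


if __name__ == "__main__":
    print(overlap_scores("the cat sat on the mat", "the cat lay on the rug"))
```

Under these definitions the F-measure and the Dice coefficient reduce to the same value, 2 * overlap / (length1 + length2), so the two scores only differ if precision and recall are defined some other way.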

Want to report a bug or submit a patch? Please do that here!

We have mailing lists for users and news and developers.

Download the current version (v0.13, released October 8, 2015) from CPAN or SourceForge.

Documentation

See the README and CHANGES files. Browse the current CVS version.

Text-Similarity Development Team

Ted Pedersen tpederse AT d umn edu
Siddharth Patwardhan sidd AT cs utah edu
Satanjeev Banerjee satanjeev AT cmu edu
Jason Michelizzi
Ying Liu liux0395 AT umn edu

Acknowledgments

The development of Text-Similarity has been supported by a National Science Foundation Faculty Early Career Development (CAREER) Program award (#0092784, 2001-2007), by a Grant in Aid of Research, Artistry and Scholarship from the Graduate School of the University of Minnesota (2003-2004), and by the Digital Technology Initiative of the Digital Technology Center of the University of Minnesota (2004-2005).

By: Ted Pedersen - tpederse AT d umn edu