Ngram Statistics Package (NSP)

NSP allows you to identify word and character Ngrams in large corpora and to rank them using standard tests of association such as Fisher's exact test, the log likelihood ratio, Pearson's chi-squared test, and the Dice coefficient. NSP has been designed so that users can add their own tests with minimal effort.
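To give a feel for what a test of association computes, here is a minimal Python sketch that scores word bigrams with the Dice coefficient and the log likelihood ratio. It is not NSP's code or interface (NSP itself is distributed as a Perl package); the function names `bigram_counts`, `dice`, and `log_likelihood` and the toy sentence are assumptions made up for illustration. The measures are computed from the usual 2x2 contingency table of a bigram: n11 is the count of the pair, n1p the count of the first word in the left position, np1 the count of the second word in the right position, and npp the total number of bigrams.

```python
import math
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs plus left/right marginal counts (hypothetical helper)."""
    pairs = Counter(zip(tokens, tokens[1:]))
    left, right = Counter(), Counter()
    for (w1, w2), n in pairs.items():
        left[w1] += n   # times w1 occurs as the first word of a bigram
        right[w2] += n  # times w2 occurs as the second word of a bigram
    return pairs, left, right, sum(pairs.values())


def dice(n11, n1p, np1):
    """Dice coefficient: 2 * joint count / sum of the two marginal counts."""
    return 2.0 * n11 / (n1p + np1)


def log_likelihood(n11, n1p, np1, npp):
    """Log likelihood ratio (G^2) over the four cells of the 2x2 table."""
    observed = [n11, n1p - n11, np1 - n11, npp - n1p - np1 + n11]
    expected = [
        n1p * np1 / npp,
        n1p * (npp - np1) / npp,
        (npp - n1p) * np1 / npp,
        (npp - n1p) * (npp - np1) / npp,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


if __name__ == "__main__":
    tokens = "new york is big and new york is old".split()
    pairs, left, right, total = bigram_counts(tokens)
    for (w1, w2), n11 in pairs.most_common():
        print(w1, w2, n11,
              round(dice(n11, left[w1], right[w2]), 4),
              round(log_likelihood(n11, left[w1], right[w2], total), 4))
```

Higher scores on either measure indicate that the two words occur together more often than their individual frequencies would suggest. On a corpus of realistic size the ranking produced by the two measures can differ considerably, which is one reason a package would offer several tests side by side.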

We have a mailing list designed to support NSP users.

If you would like to report a bug or request a feature, please do that here!

Download the Current Version (v1.31, released October 4, 2015) from CPAN or SourceForge


Bibliography (papers by users of the Ngram Statistics Package)


NSP Behind the Scenes

NSP has been used extensively in SenseClusters and in the Duluth word sense disambiguation systems for Senseval-2 and Senseval-3.

NSP Development Team


The development of the Ngram Statistics Package has been supported by a National Science Foundation Faculty Early Career Development (CAREER) Program award (#0092784, 2001-2007), and by a Grant in Aid of Research, Artistry and Scholarship from the Graduate School of the University of Minnesota (2000-2001).

By: Ted Pedersen - tpederse AT d umn edu