Ngram Statistics Package (NSP)

NSP allows you to identify word and character Ngrams that appear in large corpora, using standard tests of association such as Fisher's exact test, the log likelihood ratio, Pearson's chi-squared test, and the Dice coefficient. NSP is designed so that users can add their own tests with minimal effort.
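
To make the idea of a test of association concrete, here is a minimal sketch in Python of how two of the measures named above, the Dice coefficient and the log likelihood ratio, can be computed for word bigrams from a 2x2 contingency table. It is an illustration only and is not NSP's own code or interface: the function names (bigram_counts, dice, log_likelihood) and the count names (n11 for the joint count, n1p and np1 for the marginal counts, npp for the total number of bigrams) are assumptions chosen for this example.

```python
import math
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs, plus how often each word fills each slot."""
    pairs = Counter(zip(tokens, tokens[1:]))
    left = Counter()   # occurrences of a word as the first element of a bigram
    right = Counter()  # occurrences of a word as the second element of a bigram
    for (w1, w2), n in pairs.items():
        left[w1] += n
        right[w2] += n
    return pairs, left, right, sum(pairs.values())


def dice(n11, n1p, np1):
    """Dice coefficient: twice the joint count over the sum of the marginals."""
    return 2.0 * n11 / (n1p + np1)


def log_likelihood(n11, n1p, np1, npp):
    """Log likelihood ratio (G^2) for a 2x2 table given by its joint count,
    two marginal counts, and total."""
    observed = [n11, n1p - n11, np1 - n11, npp - n1p - np1 + n11]
    expected = [
        n1p * np1 / npp,
        n1p * (npp - np1) / npp,
        (npp - n1p) * np1 / npp,
        (npp - n1p) * (npp - np1) / npp,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


if __name__ == "__main__":
    tokens = "a b a b c".split()
    pairs, left, right, total = bigram_counts(tokens)
    for (w1, w2), n11 in pairs.items():
        d = dice(n11, left[w1], right[w2])
        g2 = log_likelihood(n11, left[w1], right[w2], total)
        print(w1, w2, n11, round(d, 4), round(g2, 4))
```

Higher scores suggest that the two words occur together more often than their individual frequencies alone would predict, so sorting bigrams by score gives a ranked list of candidate collocations.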

We have a mailing list designed to support NSP users.

Download the Current Version (v1.27, released February 16, 2013) from CPAN or SourceForge

Bibliography (papers by users of NSP)



NSP Behind the Scenes

NSP has been used extensively in SenseClusters and in the Duluth word sense disambiguation systems for Senseval-2 and Senseval-3.

NSP Development Team


The development of the Ngram Statistics Package has been supported by a National Science Foundation Faculty Early Career Development (CAREER) Program award (#0092784, 2001-2007), and by a Grant in Aid of Research, Artistry and Scholarship from the Graduate School of the University of Minnesota (2000-2001).

By: Ted Pedersen - tpederse AT d umn edu