Changes made in SenseClusters from version 0.93 to version 0.95

Ted Pedersen      tpederse@d.umn.edu
Anagha Kulkarni   kulka020@d.umn.edu
Mahesh Joshi      joshi031@d.umn.edu

1. Updated Toolkit/clusterstop/clusterstopping.pl : -Anagha

   - changed the default cluster-stopping measure from PK2 to PK3
   - changed the default crfun from h2 to i2
   - formatted the error messages and added details to them
   - added a check to catch "NaN" values generated by the crfuns on the
     expected/reference data (Gap Statistic)
   - added a check for negative delta values
   - updated and reorganized the documentation
   - now generates a PREFIX.gap file that contains the crfun values, the
     delta values and the predicted k
   - updated the logic for setting the default delta value
   - changed the output redirection for the vcluster and scluster calls
     from >& to >

2. Updated discriminate.pl : -Anagha

   - changed the default number of clusters from 10 to 2
   - modified the program logic to catch the exit status of
     clusterstopping.pl; if it has failed, discriminate.pl now outputs the
     reason for the failure from the *.predictions file (if present) and
     proceeds with the default number of clusters (2)
   - changed the calls to vcluster and scluster so that the -showtree
     option is used only if the number of clusters is greater than 1.
     (NOTE: The -showtree option provides an ASCII representation of the
     clustering solution. However, if the number of clusters is 1, this
     option generates quite a few error messages that are unrelated to
     SenseClusters functionality, so we currently do not use it when the
     number of clusters is 1. If CLUTO fixes this problem in the future,
     we can go back to using the -showtree option consistently.)
   - dendrograms are now generated by vcluster or scluster only if the
     number of clusters is greater than 1
   - updated and reorganized the documentation
   - added an error check to verify that the number of bigram features is
     not 0 before proceeding with the generation of co-occurrence features
   - removed the error check that prevented --scope_train from being used
     when neither the --training option nor the --split option was used
   - modified messages: added angle brackets around filenames and removed
     the periods following filenames or parameters
   - added an error check to verify that the specified training file exists

3. Updated Web/SC-cgi/callwrap.pl : -Anagha

   - now displays a message when SVD is not performed, or when
     cluster-stopping fails and the default number of clusters is used
     instead

4. Updated Demos directory : -Ted

   - reorganized the files and directories somewhat, and added new options
     to the demo scripts to reflect the functionality introduced in the
     package since the demos were last updated 2 years ago

5. Updated Toolkit/preprocess/sval2/maketarget.pl : -Anagha

   - added enclosing head tags to the regex generated by this script via
     the --head option

6. Updated default stoplist : -Ted

   - the former stoplist only removed lower case words. The new list
     includes stop words that begin with either an upper or a lower case
     letter. This change affects the web interface, the Demos, and the Docs.

7. Updated discriminate.pl : -Mahesh

   - added support for LSA context clustering using the
     "--context o2 --lsa" option combination
   - modified error messages
   - updated the POD and command-line help to cover LSA context clustering
   - incremented the internal version
   - now invokes nsp2regex.pl after wordvec.pl in the SenseClusters native
     order2 context clustering mode

8. Updated Toolkit/vector/order1vec.pl : -Mahesh

   - modified the output of the --clabel option to discard features that
     were not found even once in the test data
   - added the --transpose option, which outputs a feature-by-context
     matrix similar to the Latent Semantic Analysis (LSA) representation
   - added the --testregex TEST_REGEX option, which outputs only those
     regular expressions from the input FEATURE_REGEX file that matched at
     least once in the input SVAL2 file. This file is required as input to
     order2vec.pl in LSA context clustering mode.

9. Updated Toolkit/vector/order2vec.pl : -Mahesh

   - dropped the --token TOKEN_REGEX option and the FEATURES file argument
     from the command line; order2vec.pl now requires a command line of
     the form:

         order2vec.pl [options] SVAL2 WORDVEC FEATURE_REGEX

   - modified the regex that reads features from the features file to
     accept general ngrams rather than just unigrams
   - updated the POD and command-line help

10. Added new test cases in Testing/vector/order2vec/ : -Mahesh

   - added four test cases, one for each of four types of features, that
     test the LSA context clustering scenario in binary and non-binary mode

11. Updated web interface files in Web/SC-cgi : -Mahesh

   - modified index.cgi, first.cgi, second.cgi and callwrap.pl to support
     LSA context clustering

12. Updated Docs/HTML/discriminate.html : -Mahesh

   - updated to match the revised POD of discriminate.pl

13. Updated Docs/HTML/Toolkit_Docs/vector/order2vec.html : -Mahesh

   - updated to match the revised POD of order2vec.pl

14. Updated Docs/Flows/SenseClusters-ContextClustering.ai/pdf : -Mahesh

   - added the LSA context clustering flow

15. Updated Docs/Flows/SenseClusters-WordClustering.ai/pdf : -Mahesh

   - removed the obsolete kocos.pl call from the flow

16. Updated SC/Toolkit/clusterlabel/clusterlabeling.pl : -Anagha

   - now creates the temporary files with a time-stamp in their names

(Changelog-v0.93to0.95 Last Updated on August 7, 2006 by Anagha)
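The discriminate.pl changes for choosing the number of clusters (fall back to the default of 2 when clusterstopping.pl fails, and pass -showtree to CLUTO only when there is more than one cluster) can be sketched as below. discriminate.pl is written in Perl. This is only a minimal Python illustration of the decision logic and is not its code. The function names, the convention that a non-zero exit status means failure, and the matrix file name are all hypothetical. Reading the failure reason from the *.predictions file is omitted.

```python
from typing import List, Optional

# Default number of clusters after this release (changed from 10 to 2).
DEFAULT_NUM_CLUSTERS = 2


def choose_num_clusters(stop_exit_status: int, predicted_k: Optional[int]) -> int:
    """Use the cluster-stopping prediction only if clusterstopping.pl succeeded.

    Assumption (hypothetical): a non-zero exit status signals failure.
    """
    if stop_exit_status != 0 or predicted_k is None:
        return DEFAULT_NUM_CLUSTERS
    return predicted_k


def build_vcluster_argv(matrix_file: str, num_clusters: int) -> List[str]:
    """Build a vcluster command line, adding -showtree only when k > 1.

    matrix_file is a hypothetical input path. vcluster takes its options
    first, followed by the matrix file and the number of clusters.
    """
    options = ["-showtree"] if num_clusters > 1 else []
    return ["vcluster", *options, matrix_file, str(num_clusters)]


if __name__ == "__main__":
    # Simulate a failed cluster-stopping run: fall back to the default k.
    k = choose_num_clusters(stop_exit_status=1, predicted_k=None)
    print(build_vcluster_argv("contexts.vectors", k))
```

Keeping the k decision separate from command construction means the same -showtree rule would apply to both vcluster and scluster calls.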
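The two order1vec.pl output changes can be illustrated on a small count matrix. One change restricts the --clabel labels to features seen at least once in the test data. The other makes --transpose turn context-by-feature rows into feature-by-context rows, as in LSA. This is a minimal Python sketch of those ideas and is not the Perl implementation. The dense list-of-lists matrix and the function names are assumptions made for illustration.

```python
from typing import List

# Dense context-by-feature count matrix (assumed representation):
# one row per context, one column per feature.
Matrix = List[List[int]]


def seen_feature_labels(matrix: Matrix, labels: List[str]) -> List[str]:
    """Keep only the labels of features that occur in at least one context."""
    return [label for j, label in enumerate(labels)
            if any(row[j] for row in matrix)]


def transpose(matrix: Matrix) -> Matrix:
    """Turn a context-by-feature matrix into a feature-by-context matrix."""
    return [list(column) for column in zip(*matrix)]


if __name__ == "__main__":
    contexts = [[1, 0, 2],
                [0, 0, 1]]
    print(seen_feature_labels(contexts, ["line", "cord", "phone"]))
    print(transpose(contexts))
```

After transposition, each row describes one feature across all contexts. This is the orientation an LSA-style decomposition of the feature space expects.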