Academic Projects
1. Needles in Gigastack – MPI Parallel Programming with C (http://trpc.sourceforge.net)
Task: Find the top R interesting phrases of length m to n, where “interesting” is defined by the
“Term Frequency” × “Inverse Document Frequency” (TF-IDF) measure
Technology used: C, MPI. Platform: Blade Center with 1068 compute nodes and Linux operating system
Responsibilities: Analyzed the problem, designed the algorithms, and implemented the design
Achievements:
- Implemented a suffix array data structure to identify n-grams (terms of length n).
- Designed and implemented a method to identify and store unique n-grams without bloating the memory requirements exponentially.
- Designed and coded a novel approach for building parallel suffix arrays across multiple processing nodes. This approach scaled well when the work was allocated properly.
- The design eliminated the need for an expensive merge of suffix arrays at the manager node.
- Designed a novel approach to store unique n-gram strings in a binary tree by storing the superstring.
2. Review: Improved Regularized Singular Value Decomposition for Collaborative Filtering
Details about this project can be found here.