One way to view assignment 2 is as a revision of your view of how a word is formed. We have operated under the assumption that words are strings of alphabetic characters delimited by spaces. However, many languages in the world do not delimit words with spaces. So we can think of what we are doing as trying to develop a tool that can deal with languages where spaces do not always delimit words: sometimes they do, and other times they are embedded in a word and simply act like an alphabetic character. While assignment 1 limited us to a rather English-centric view of what a word consists of, assignment 2 forces us to broaden our perspective.

You can view assignment 1 as computing pointwise mutual information values for regular expressions that were made up strictly of string literals, such as /interest/ and /rate/. We are extending this notion in assignment 2 so that more powerful regular expressions can be used. **However, the implicit assumption in assignment 1 that each regular expression matches a word remains equally valid in assignment 2.**

For example, suppose the regexes are:

    /interest rates are /
    /\w+/

The matches in the following sentence are shown in parentheses:

    The (interest rates are ) (falling) sharply.

Now the question arises as to how to count these "bigrams". If you view each pattern that matches as a word, then counting works exactly as it did in assignment 1:

    W1                    W2
    The                   interest rates are
    interest rates are    falling
    falling               sharply

Crucially, note that 'interest' is never counted as a word, because 'interest rates are' is being treated as a single word. If it helps to visualize things, imagine that the embedded spaces are some other character (e.g. interest#rates#are). A similar analogy can be made for characters, but I will leave that for you to flesh out. This is simply an analogy meant to help you visualize the problem.
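As a rough illustration of the "each match is a word" idea, here is a minimal Python sketch. It is not a prescribed solution for the assignment: the names `PATTERNS`, `tokenize`, and `bigram_counts` are made up for this example, and it assumes that when more than one pattern could match at the same position, the pattern listed first wins.

```python
import re
from collections import Counter

# Hypothetical pattern list (an assumption for this sketch).
# Order matters: at any position, the first alternative that matches is used.
PATTERNS = [r"interest rates are ", r"\w+"]


def tokenize(text, patterns=PATTERNS):
    """Return the 'words' in text, where a word is whatever a pattern matches.

    Characters that no pattern matches (here, lone spaces and the period)
    are skipped and never become part of a word.
    """
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    return [m.group(0) for m in combined.finditer(text)]


def bigram_counts(tokens):
    """Count adjacent (W1, W2) pairs, exactly as one would for ordinary words."""
    return Counter(zip(tokens, tokens[1:]))


tokens = tokenize("The interest rates are falling sharply.")
bigrams = bigram_counts(tokens)

# tokens is ['The', 'interest rates are ', 'falling', 'sharply'];
# the trailing space belongs to the second token because the pattern includes it.
for (w1, w2), count in bigrams.items():
    # Show embedded spaces as '#' to mirror the interest#rates#are analogy.
    print(w1.replace(" ", "#"), "|", w2.replace(" ", "#"), "|", count)
```

Because the multi-word pattern consumes "interest rates are " as one unit, the string "interest" on its own never appears among the tokens, so it never contributes to any bigram count.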
If you already have a clear picture of what you need to do, then don't worry if you don't fully follow the above. There are other ways to visualize the problem that make just as much sense. Good luck!