Blinker_Data Usage



  • Key #1: I found the file FILE.DESCRIPTIONS, which was distributed with the Blinker_Data and explains the Blinker file types. According to this file, some files are used for statistics, some cover only open-class words, some contain the actual verses that were connected, and some contain the connections themselves.

  • Key #2: The file described in Key #1 also contains an equation that helped me map each connection file to its verse. The equation is based on the connection file names, which have the form samp##.SentPair# (where ## is a number 1-25 and # is a number 0-9). The verse that a connection file maps to is calculated as follows:
    verse_number = ((## - 1) * 10) + (# + 1) [## and # are defined above]

  • Key #3: Each verse file is named EN.sample.## or FR.sample.## (where ## is a number 1-25). Each file contains 10 verses, separated by \n characters. This, paired with the equation given above, helped me map each connection file to the proper verse.

  • Key #4: Each connection file contains 2 columns of numbers. The numbers are indexes of words in the verses, and the two numbers on a line identify words that are translations of each other. If a word has a null connection, its partner in the opposite column is a zero. Knowing this let me reuse the functions I had already implemented, which kept the code simpler, and transform the Blinker data into my format. There are some small differences between the two formats, but they are unavoidable and insignificant.

  • Key #5: The only files needed by my connection tool are the connection files and the verse files, whose names end in .SentPair# and .sample.## respectively. The other files are used for statistical purposes and are not necessary for this research project, with one exception: a file ending in .open lists the connections while ignoring the closed-class words. This may be beneficial because closed-class words are repeated many times, and connecting every occurrence may not be necessary.

  • Key #6: The way I chose to implement my connection tool does not match how the Blinker was used, so I needed to convert the Blinker data to my format. To do this, open a Blinker connection file (samp##.SentPair#), and my program will find the verses needed for that connection. The connections can then be saved in my format by supplying file names for the English verse, the French verse, and the connection file. This step is necessary because the connections were made one verse at a time, while each verse file contains 10 verses. My application does not work with a single verse taken from a longer file, so each verse must be saved in its own file.
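
The mapping described in Keys #2 through #4 can be sketched in Python as follows. This is only an illustration, not the code of my connection tool: all function names are hypothetical, and the sketch assumes that connection file # (0-9) corresponds to the verse at position # (counting from zero) inside sample file ##, which is how I read the equation in Key #2. It also assumes that the columns of a connection file are separated by whitespace. It works on text that has already been read from the files, so nothing is assumed about file encodings or about which column belongs to which language.

```python
import re


def parse_connection_name(name):
    """Split a name like 'samp3.SentPair4' into (sample, pair) = (3, 4)."""
    match = re.fullmatch(r"samp(\d{1,2})\.SentPair(\d)", name)
    if match is None:
        raise ValueError(f"not a Blinker connection file name: {name!r}")
    sample, pair = int(match.group(1)), int(match.group(2))
    if not 1 <= sample <= 25:
        raise ValueError(f"sample number out of range 1-25: {sample}")
    return sample, pair


def verse_number(sample, pair):
    """Equation from FILE.DESCRIPTIONS: ((## - 1) * 10) + (# + 1)."""
    return (sample - 1) * 10 + (pair + 1)


def pick_verse(sample_text, pair):
    """Return one verse from the text of an EN.sample.## or FR.sample.## file.

    Assumption: verses are separated by newlines and the verse for
    connection file # sits at zero-based position # in the file.
    """
    verses = sample_text.split("\n")
    return verses[pair]


def parse_connections(text):
    """Read the two columns of a connection file into (left, right) pairs.

    A zero in either position marks a null connection. Lines that do not
    have exactly two fields (for example blank lines) are skipped.
    """
    links = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        links.append((int(fields[0]), int(fields[1])))
    return links


def split_null_links(links):
    """Separate real word-to-word links from null connections."""
    aligned = [(a, b) for a, b in links if a != 0 and b != 0]
    nulls = [(a, b) for a, b in links if a == 0 or b == 0]
    return aligned, nulls
```

For example, samp3.SentPair4 gives sample 3 and pair 4, so verse_number(3, 4) is ((3 - 1) * 10) + (4 + 1) = 25, and the verse text would be taken from position 4 of EN.sample.3 and FR.sample.3 under the assumption stated above.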