simulacrum6 edited this page Mar 12, 2016 · 1 revision

#Comments on the code

This section clarifies parts of the code that the author considers potentially confusing or not immediately apparent.

######ReaderTrain.java (Reader)

In the data format provided by SemEval, each line contains a sentence followed by information about which word of that sentence was rated and what the outcome of that rating was. A sentence usually occurs on multiple lines, since multiple words in each sentence were rated.
The reader works line by line and aggregates all rated instances occurring in a single sentence under the same jCas. To achieve this, the reader's hasNext() method has two distinct functions:

  1. Determining whether the current line is the last of the file.
  2. Extracting the rating information from lines until the following line starts with a different sentence.

Afterwards, the jCas is constructed in the getNext() method, which attaches the rating information to the corresponding segment of the sentence.
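
The buffering described above can be sketched without any UIMA dependency. This is a minimal sketch and not the actual ReaderTrain.java: the class names SentenceGroupingReader, Rating and RatedSentence are hypothetical, and the line layout (tab-separated sentence, rated word, word position, rating) is an assumption about the data format.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the reader: groups consecutive lines that share a sentence.
class SentenceGroupingReader {

    // One rated word. Assumed columns after the sentence: word, position, rating.
    static final class Rating {
        final String word;
        final int position;
        final String label;

        Rating(String word, int position, String label) {
            this.word = word;
            this.position = position;
            this.label = label;
        }
    }

    // A sentence together with all ratings collected for it.
    static final class RatedSentence {
        final String sentence;
        final List<Rating> ratings;

        RatedSentence(String sentence, List<Rating> ratings) {
            this.sentence = sentence;
            this.ratings = ratings;
        }
    }

    private final Iterator<String> lines;
    private String lookahead;             // next unprocessed line, or null at end of input
    private String bufferedSentence;      // sentence collected by hasNext(), if any
    private List<Rating> bufferedRatings; // ratings collected for bufferedSentence

    SentenceGroupingReader(List<String> input) {
        this.lines = input.iterator();
        this.lookahead = lines.hasNext() ? lines.next() : null;
    }

    // Function 1: detect the end of the input.
    // Function 2: buffer ratings until the next line starts with a different sentence.
    boolean hasNext() {
        if (bufferedSentence != null) {
            return true; // already buffered; repeated calls do not consume more lines
        }
        if (lookahead == null) {
            return false;
        }
        bufferedSentence = lookahead.split("\t")[0];
        bufferedRatings = new ArrayList<>();
        while (lookahead != null) {
            String[] cols = lookahead.split("\t");
            if (!cols[0].equals(bufferedSentence)) {
                break; // first line of the next sentence stays in lookahead
            }
            bufferedRatings.add(new Rating(cols[1], Integer.parseInt(cols[2]), cols[3]));
            lookahead = lines.hasNext() ? lines.next() : null;
        }
        return true;
    }

    // Builds the result from the buffer; the real reader would fill a jCas here.
    RatedSentence getNext() {
        if (!hasNext()) {
            throw new NoSuchElementException("no more sentences");
        }
        RatedSentence result = new RatedSentence(bufferedSentence, bufferedRatings);
        bufferedSentence = null;
        bufferedRatings = null;
        return result;
    }
}
```

Keeping one line of lookahead is what lets the last line of a sentence be recognised: the first line of the following sentence is read but not consumed, so it starts the next buffer.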

######ReaderTrainTest.java

For some reason, the CAS objects generated in this unit test appear to contain only a single instance of each Gold Annotation instead of the entire list.
A simple run of the TestPipeline (see below), however, suggests that the Gold Annotation process is working as intended.

######ReaderLogicTest.java

Due to the aforementioned flaws of the data reader, this class was used as a simple model of the file reader's buffering algorithm. It helped to detect the flaws in that algorithm and, ultimately, to fix them.

######TestPipeline.java

This component was only used for quick testing of the functionality of different annotators. It is not part of the experiments in any way.

######PlayGround.java

This component was used to get console output for all annotators before unit tests were written for new annotators. It is not part of the experiments in any way.

######CharNGramAnnotator.java

This annotator was originally intended as a preprocessing step for affix feature extraction. It became obsolete, however, once DKPro TC was used.
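
To illustrate the idea, character n-grams and the affixes that can be read off them can be sketched as follows. This is a minimal sketch and not the annotator's actual code: the class CharNGramSketch and its methods are hypothetical names, and treating the first and last n characters of a token as its prefix and suffix is an assumption about what the affix features would have looked like.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not taken from CharNGramAnnotator.java.
class CharNGramSketch {

    // Returns all contiguous character sequences of length n in the token.
    static List<String> charNGrams(String token, int n) {
        List<String> grams = new ArrayList<>();
        if (n <= 0 || token.length() < n) {
            return grams; // no n-gram of that size fits into the token
        }
        for (int i = 0; i + n <= token.length(); i++) {
            grams.add(token.substring(i, i + n));
        }
        return grams;
    }

    // First n characters of the token, or the whole token if it is shorter.
    static String prefix(String token, int n) {
        return token.length() >= n ? token.substring(0, n) : token;
    }

    // Last n characters of the token, or the whole token if it is shorter.
    static String suffix(String token, int n) {
        return token.length() >= n ? token.substring(token.length() - n) : token;
    }
}
```

For the token "reader" and n = 3, charNGrams yields "rea", "ead", "ade" and "der"; the first and last of these coincide with the 3-character prefix and suffix.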

######SyllableCountAnnotator.java

Another annotator that became obsolete, primarily due to its poor performance in identifying the correct number of syllables in a word.

######de.unidue.langtech.testing.pp.misc

This package contains various classes that were either used for informal tests or became obsolete during the evolution of the project. They are not considered an integral part of the project.


#Third Party Resources used in the Project
The following sections list the frameworks and resources used in the project.

###Frameworks

###Language Processing Algorithms

####Preprocessing

####Feature Extraction

###Resources
