Home
The README should contain essentially all of the information needed to run the code. Additional information is available in the code's documentation. The following are FAQs:
The Folder= option in the configuration specifies where output is stored. By default, after every round of convergence the model prints a human-readable version of the generative distributions it uses. Lexical distributions additionally get a second version, with the suffix .lex.gz, that contains the conditional distributions. Other generated files include the grammar (Grammar.gz), the induced lexicon (Lexicon.gz), the serialized models (model#), and any output from testing (Test.#.#.JSON.gz).
Yes, this verbose printing can be turned off by setting the configuration flag printModelsVerbose=False.
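Because most of these outputs are gzip-compressed, a quick way to inspect one is to read its first few lines. The following is a minimal sketch, assuming the files are gzip-compressed UTF-8 text. The helper name `head_gz` is illustrative and not part of the project.

```python
import gzip
from itertools import islice


def head_gz(path, n=10):
    """Return the first n lines of a gzip-compressed text file.

    Assumption: the file is UTF-8 text; adjust the encoding if it is not.
    """
    # "rt" opens the archive in text mode, so lines come back as str.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]
```

For example, `head_gz("output/Grammar.gz")` would return the first ten lines of the grammar, where `output` stands for whatever directory Folder= points to.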
Load the Java and Maven modules (these can be added to your .bash_profile):
module load sun-jdk/1.8.0
module load apache-maven/3.0.5
If you have not registered your SSH keys with Bitbucket, set the terminal to ask for a password:
unset SSH_ASKPASS
CoNLL Shared Task
Index word lemma Coarse Fine Feats Head Label
1 Afirmó afirmar v vm num=s|per=3|mod=i|tmp=s 0 ROOT
NAACL Shared Task
Index word lemma Coarse Fine UNIVERSAL Feats Head Label
1 Afirmó afirmar v vm VERB num=s|per=3|mod=i|tmp=s 0 ROOT
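The two layouts differ only in the extra UNIVERSAL column, so one reader can handle both by counting fields. The following is a minimal sketch, assuming fields are separated by whitespace and contain no internal whitespace. The function and key names are illustrative and are not the project's own API.

```python
CONLL_COLUMNS = ["index", "word", "lemma", "coarse", "fine",
                 "feats", "head", "label"]
NAACL_COLUMNS = ["index", "word", "lemma", "coarse", "fine",
                 "universal", "feats", "head", "label"]


def parse_token(line):
    """Parse one token line in either layout into a dict."""
    fields = line.split()
    if len(fields) == len(CONLL_COLUMNS):
        columns = CONLL_COLUMNS
    elif len(fields) == len(NAACL_COLUMNS):
        columns = NAACL_COLUMNS
    else:
        raise ValueError(f"expected 8 or 9 columns, got {len(fields)}")
    token = dict(zip(columns, fields))
    token["index"] = int(token["index"])
    token["head"] = int(token["head"])
    # Feats is a "|"-separated list of key=value pairs, as in the examples.
    # Entries without "=" (for instance an empty placeholder) are skipped.
    token["feats"] = dict(
        pair.split("=", 1) for pair in token["feats"].split("|") if "=" in pair
    )
    return token
```

Applied to the CoNLL example line, this yields `head` 0, `label` "ROOT", and a `feats` dict with the keys `num`, `per`, `mod`, and `tmp`.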
Universal tagset mappings for some languages are available at www.YonatanBisk.com/Thesis
Tagset
https://github.com/ybisk/CCG-Induction/blob/master/src/main/resources/english.pos.map
| English | mapping | Tag Type |
|---|---|---|
| . | punct | Period |
| , | punct conj | Comma |
| CC | conj | Coordinating conjunction |
| JJ | Adjective | |
| VBD | verb | Verb, past tense |
| VBG | verb | Verb, gerund |
Roles are used by Induction to denote special restrictions
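As a rough illustration of how a tag can carry more than one role, the mapping can be held as a dictionary from tag to a set of roles. This is only a sketch: it covers just the table rows above whose mapping column lists role names, it does not read the actual english.pos.map file (whose on-disk format is not described here), and the names `TAG_ROLES` and `has_role` are hypothetical.

```python
# Tag -> roles, copied from the table rows that list role names.
TAG_ROLES = {
    ".": {"punct"},
    ",": {"punct", "conj"},  # a comma carries two roles
    "CC": {"conj"},
    "VBD": {"verb"},
    "VBG": {"verb"},
}


def has_role(tag, role):
    """Return True if the tag is mapped to the given role."""
    return role in TAG_ROLES.get(tag, set())
```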
CCGBank
PARG CCG-style dependencies
SRC TAR CAT Arg Index SRC word TAR word
<s> 3
2 0 S[frg]/NP 1 year Not
2 1 NP[nb]/N 1 year this
<\s>
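A reader for this layout can be sketched as follows. It assumes each sentence opens with a line starting with `<s` and ending in an integer, closes with `<\s>`, and that dependency lines carry the six whitespace-separated columns named in the header above. The function and key names are illustrative only.

```python
def parse_parg(lines):
    """Group PARG dependency lines into one dict per sentence."""
    sentences, current = [], None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<s"):
            # Keep the trailing integer as-is, without interpreting it.
            current = {"count": int(line.split()[-1]), "deps": []}
        elif line in ("<\\s>", "</s>"):
            sentences.append(current)
            current = None
        elif current is not None:
            # Only the first six columns are used; extras are ignored.
            src, tar, cat, arg, src_word, tar_word = line.split()[:6]
            current["deps"].append({
                "src": int(src), "tar": int(tar), "cat": cat,
                "arg": int(arg), "src_word": src_word, "tar_word": tar_word,
            })
    return sentences
```

For the example above, this produces one sentence with two dependencies, the first linking word 2 ("year") to word 0 ("Not") through argument slot 1 of S[frg]/NP.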
AUTO A bracketed parse (we assume these are collapsed to a single line):
(<T S[frg] 0 2>
(<T S[frg] 0 2>
(<L S[frg]/NP RB RB Not S[frg]/NP_158>)
(<T NP 1 2>
(<L NP[nb]/N DT DT this NP[nb]_165/N_165>)
(<L N NN NN year N>)
)
)
(<L . . . . .>)
)
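To get from a multi-line parse like the one above to the single-line form, and to pull out the words with their categories, something like the following sketch may help. It assumes leaf nodes have exactly five whitespace-free fields in the order seen in the example (category, two POS fields, word, annotated category). The names `collapse` and `leaves` are illustrative.

```python
import re

# A leaf looks like: (<L category POS POS word annotated-category>)
LEAF = re.compile(r"\(<L (\S+) (\S+) (\S+) (\S+) (\S+)>\)")


def collapse(lines):
    """Join a multi-line bracketed parse into a single line."""
    return " ".join(line.strip() for line in lines if line.strip())


def leaves(parse):
    """Return (word, category) pairs for every leaf, left to right."""
    return [(m.group(4), m.group(1)) for m in LEAF.finditer(parse)]
```

On the parse above, `leaves(collapse(lines))` gives the pairs for "Not", "this", "year", and ".", each with its lexical category.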
-Xmx20g -- Specifies that the heap can grow to 20 GB.
Should be set to a value less than the total machine memory.
-XX:+UseParallelGC -- JVM spawns parallel garbage collection threads
-XX:ParallelGCThreads=2 -- Specifies the number of parallel garbage collection threads.
-server -- Optimize loops, etc
-XX:+UseFastAccessorMethods -- Optimize
-XX:+AggressiveOpts -- Optimize
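The flags above can be assembled programmatically while enforcing the rule that the heap must stay below total machine memory. This is a minimal sketch: the function name `jvm_flags` and its parameters are hypothetical, and the caller is assumed to supply the machine memory in gigabytes.

```python
def jvm_flags(heap_gb, machine_gb, gc_threads=2):
    """Build the JVM flag list described above.

    Raises ValueError if the requested heap is not smaller than the
    machine's total memory (both given in gigabytes).
    """
    if heap_gb >= machine_gb:
        raise ValueError("heap size must be smaller than total machine memory")
    return [
        f"-Xmx{heap_gb}g",                      # maximum heap size
        "-XX:+UseParallelGC",                   # parallel garbage collection
        f"-XX:ParallelGCThreads={gc_threads}",  # number of GC threads
        "-server",
        "-XX:+UseFastAccessorMethods",
        "-XX:+AggressiveOpts",
    ]
```

The result can be placed between `java` and the rest of the command line, for example `" ".join(["java", *jvm_flags(20, 64)])`.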