The following outlines how to add support for a new part of speech (POS) to the FST, using the addition of proper nouns to the Ojibwe FST as an example.
- Create a config file in the morphological source.
  - For example: create `proper_nouns.json` in `OjibweMorph/config/`.
  - You also need to make sure this config file is given as a command-line argument to `csv2lexc.py`, but this was taken care of automatically by the `Makefile` in `OjibweMorph`, which generates the list of config files based on the contents of `config/`.
  - Check out the docs for more info on what the config values mean. Not all are necessary for basic functioning.
    - The following must be set and their values must be correct:
      - `morphology_source_path`
      - `regular_csv_files`
      - `lexical_database`
      - `morph_features` (assuming you want the analysis to contain any info!)
    - The following must be set, but their value can really be anything; just choose something specific to this POS:
      - `pos`
      - `root_lexicon`
      - `regular_lexc_file`
    - The following must be set, but you can just use the default/null value (described in the config docs):
      - `irregular_csv_files`
      - `irregular_lexc_file`
      - `missing_tag_marker`
      - `missing_form_marker`
      - `multichar_symbols`
      - `template_path`
- Add the root name to the template for building `root.lexc`.
  - For example: add `ProperNounRoot` to the end of `OjibweMorph/templates/root.lexc.j2`.
- Create a paradigm spreadsheet in the morphological source.
  - For example: create `PROPER_NOUN.csv` in one of the `OtherSpreadsheets/` folders in `OjibweMorph`.
  - There's more info in the docs, but basically there are six mandatory columns for this spreadsheet: `Paradigm`, `Class`, `Lemma`, `Stem`, `Form1Surface`, and `Form1Split`.
  - There has to be at least one row for every possible paradigm and class value.
    - For example, there are two possibilities for proper nouns: paradigm and class being `PersonName`, or paradigm and class being `PlaceName`. Therefore, there are two rows in `PROPER_NOUN.csv`.
- Not required, but if you want the FST to handle more than a few cases, you'll need a spreadsheet in the lexical source.
  - For example: add `PROPER_NOUNS.csv` to `OjibweLexicon/OPD/`.
  - As outlined in the docs, six columns are mandatory (`Lemma`, `Stem`, `Paradigm`, `Class`, `Translation`, and `Source`), but only four require real values; `Translation` and `Source` can be `NONE`.
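
The requirements listed above can be checked mechanically before running the build. The following is a minimal sketch, not part of `OjibweMorph` itself: the helper names (`missing_config_keys`, `missing_columns`, `paradigm_class_pairs`) are hypothetical, and the config values and spreadsheet cell values in the demo are placeholders. Only the key names, column names, and the `PersonName`/`PlaceName` values come from the steps above.

```python
import csv
import io

# Config keys that must be present, as listed in the config step above.
REQUIRED_CONFIG_KEYS = (
    "morphology_source_path", "regular_csv_files", "lexical_database",
    "morph_features", "pos", "root_lexicon", "regular_lexc_file",
    "irregular_csv_files", "irregular_lexc_file", "missing_tag_marker",
    "missing_form_marker", "multichar_symbols", "template_path",
)

# Mandatory columns for the paradigm and lexical spreadsheets.
PARADIGM_COLUMNS = ("Paradigm", "Class", "Lemma", "Stem", "Form1Surface", "Form1Split")
LEXICAL_COLUMNS = ("Lemma", "Stem", "Paradigm", "Class", "Translation", "Source")


def missing_config_keys(config):
    """Return the required keys that are absent from a parsed config dict."""
    return [key for key in REQUIRED_CONFIG_KEYS if key not in config]


def missing_columns(csv_text, required):
    """Return the required column names absent from the CSV header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [col for col in required if col not in header]


def paradigm_class_pairs(csv_text):
    """Return the set of (Paradigm, Class) pairs that have at least one row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["Paradigm"], row["Class"]) for row in reader}


if __name__ == "__main__":
    # Hypothetical partial config: only the three "anything goes" keys are set.
    config = {
        "pos": "ProperNoun",  # placeholder value
        "root_lexicon": "ProperNounRoot",
        "regular_lexc_file": "proper_nouns.lexc",  # placeholder value
    }
    print("Missing config keys:", missing_config_keys(config))

    # Placeholder paradigm spreadsheet with one row per (Paradigm, Class) pair.
    paradigm_csv = (
        "Paradigm,Class,Lemma,Stem,Form1Surface,Form1Split\n"
        "PersonName,PersonName,LEMMA,STEM,SURFACE,SPLIT\n"
        "PlaceName,PlaceName,LEMMA,STEM,SURFACE,SPLIT\n"
    )
    print("Missing paradigm columns:", missing_columns(paradigm_csv, PARADIGM_COLUMNS))
    print("Pairs covered:", sorted(paradigm_class_pairs(paradigm_csv)))
```

In practice you would load the real `proper_nouns.json` with `json.load` and read the real CSV files from disk. The sketch only checks that keys and columns are present; it does not check that the values are correct.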