We need to integrate the semanticizer into xtas somehow, to get rid of a dependency on a single UvA server, or for that matter any web service. My proposal is to rework the semanticizer's core algorithms (only the basic ones) to work in-memory as a small library.
marisa-trie or datrie can be used to store the n-gram tables, instead of Redis.
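
To make the proposal concrete, here is a minimal sketch of what an in-memory n-gram lookup could look like. It is not the semanticizer's actual algorithm: the table contents, the `ngrams` and `match` helper names, and the whitespace tokenization are all assumptions made for illustration. A plain `dict` stands in for the table; a marisa-trie or datrie structure keyed on the same strings could presumably replace it.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy table: n-gram (anchor text) -> list of (link target, count).
# In the real system this would be built from a Wikipedia dump.
NGRAM_TABLE: Dict[str, List[Tuple[str, int]]] = {
    "amsterdam": [("Amsterdam", 120)],
    "university of amsterdam": [("University_of_Amsterdam", 45)],
}


def ngrams(tokens: List[str], max_n: int) -> Iterator[Tuple[int, str]]:
    """Yield (start index, n-gram string) for every n-gram of up to max_n tokens."""
    for start in range(len(tokens)):
        for n in range(1, max_n + 1):
            if start + n > len(tokens):
                break
            yield start, " ".join(tokens[start:start + n])


def match(text: str,
          table: Dict[str, List[Tuple[str, int]]],
          max_n: int = 3) -> List[Tuple[int, str, List[Tuple[str, int]]]]:
    """Return every n-gram of the text that occurs in the table.

    Tokenization is naive (lowercase + whitespace split); this is an
    assumption, not necessarily what the semanticizer does.
    """
    tokens = text.lower().split()
    return [(start, gram, table[gram])
            for start, gram in ngrams(tokens, max_n)
            if gram in table]


if __name__ == "__main__":
    for start, gram, candidates in match(
            "The University of Amsterdam is in Amsterdam", NGRAM_TABLE):
        print(start, gram, candidates)
```

Because the lookup only needs exact-key membership and retrieval, swapping the `dict` for a trie should not change the matching code, only how the table is built and loaded.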
Ping @c-martinez @IsaacHaze @dodijk @graus.
@dodijk @graus, can you help identify the main parts? Is there a paper or other design document that outlines how the n-gram matching works? Which parts of a Wikipedia database dump do we need to parse?