Semanticizer integration #65

@larsmans

We need to integrate the semanticizer into xtas somehow, to get rid of a dependency on a single UvA server, or for that matter any web service. My proposal is to rework the semanticizer's core algorithms (only the basic ones) to work in-memory as a small library.

marisa-trie or datrie can be used to store the n-gram tables, instead of Redis.
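To make the idea concrete, here is a rough sketch of what an in-memory lookup could look like. It is not the semanticizer's actual algorithm: the table contents, the `semanticize` function name and the count-share scoring are all assumptions for illustration. A plain dict stands in for the n-gram table so the snippet runs without extra dependencies; a trie built with marisa-trie or datrie would take its place to keep memory use down.

```python
from collections import namedtuple

# Hypothetical result record: matched n-gram, candidate target, score.
Link = namedtuple("Link", ["ngram", "target", "score"])

# Toy stand-in for the n-gram table (made-up data, not from a real dump):
# anchor text -> {target article title: how often the anchor links there}.
ANCHOR_COUNTS = {
    "new york": {"New York City": 80, "New York (state)": 20},
    "york": {"York": 10},
    "amsterdam": {"Amsterdam": 50},
}


def ngrams(tokens, max_n):
    """Yield every word n-gram of length 1..max_n as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def semanticize(text, table=ANCHOR_COUNTS, max_n=3):
    """Return candidate links for text, best score first.

    The score is the share of an anchor's counts that go to a target;
    this is an assumed placeholder for whatever the real code computes.
    """
    tokens = text.lower().split()  # naive whitespace tokenization
    links = []
    for ngram in ngrams(tokens, max_n):
        targets = table.get(ngram)
        if not targets:
            continue
        total = float(sum(targets.values()))
        for target, count in targets.items():
            links.append(Link(ngram, target, count / total))
    return sorted(links, key=lambda link: -link.score)


if __name__ == "__main__":
    for link in semanticize("I flew from Amsterdam to New York"):
        print(link)
```

The whole lookup is a pure function over an in-memory table, so no web service or Redis instance is involved at query time.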

Ping @c-martinez @IsaacHaze @dodijk @graus.

@dodijk @graus, can you help identify the main parts? Is there a paper or other design document that outlines how the n-gram matching works? Which parts of a Wikipedia database dump do we need to parse?
