Phonetisaurus G2P inference, pure-Rust implementation. Ships as a Python package, a Rust crate (with a CLI), and a WASM module: same algorithm, four distribution targets.
# pypi
pip install sosap
# GitHub
pip install git+https://github.com/seanghay/sosap.git

from sosap import Model
model = Model("g2p.fst")
model.phoneticize("hello")
# => ['h', 'ΙΙ', 'l', 'oo']

from sosap import Model
model = Model("g2p.fst")
results = model.phoneticize_sampling("hello", nbest=4)
# => [['h', 'ΙΙ', 'l', 'oo'], ['h', 'ee', '.', 'l', 'oo'], ['h', 'ΙΙ', '.', 'l', 'oo'], ['h', 'ΙΙ', 'l', '.', 'l', 'ΙΙ']]
results = model.phoneticize_sampling("hello", nbest=4, beam=1000, threshold=99.0, pmass=99.0)

For full access to the underlying PhonetisaurusScript interface (per-arc weights, raw input/output labels, accumulate/pmass modes), use `model.phoneticize_paths(word, ...)`, which returns `PathData` objects with `.path_weight`, `.path_weights`, `.ilabels`, `.olabels`, and `.uniques`.
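
As a rough illustration of how the two calls above might be combined, the sketch below turns a word list into tab-separated lexicon lines. The helper name `lexicon_lines` and the output layout are assumptions made for this example, not part of sosap. It relies only on `phoneticize(word)` returning a list of phoneme strings and `phoneticize_sampling(word, nbest=...)` returning a list of such lists, as in the examples above.

```python
def lexicon_lines(model, words, nbest=1):
    """Yield 'word<TAB>ph1 ph2 ...' lines, one per distinct pronunciation.

    `model` is any object exposing phoneticize() and phoneticize_sampling()
    with the call shapes shown above (hypothetical helper, not a sosap API).
    """
    for word in words:
        if nbest == 1:
            # Single best pronunciation: wrap it so both branches yield a list of variants.
            variants = [model.phoneticize(word)]
        else:
            variants = model.phoneticize_sampling(word, nbest=nbest)

        seen = set()
        for phones in variants:
            pron = " ".join(phones)
            # Skip pronunciations that are identical once joined.
            if pron not in seen:
                seen.add(pron)
                yield f"{word}\t{pron}"


# Usage with a real model (requires sosap and a model file):
#   from sosap import Model
#   model = Model("g2p.fst")
#   for line in lexicon_lines(model, ["hello"], nbest=4):
#       print(line)
```

Taking the model as a parameter keeps the helper independent of how the model was loaded, so it can be exercised with a stand-in object in tests.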
- Rust crate at `rust/`: `cargo build --release` produces a CLI (`sosap <model.fst> <word>`) and a `rustfst`-compatible library.
- WebAssembly: `cd rust && wasm-pack build --target web --release --no-default-features --no-typescript` builds a browser-ready bundle in `rust/pkg/`. The `Model` class accepts the FST as raw bytes (`new Model(uint8Array, "")`).
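
Where installing the Python package is not an option, the CLI built above could be driven from a script instead. The following is a minimal sketch under stated assumptions: the helper names `sosap_command` and `run_sosap` are hypothetical, the binary is assumed to be reachable as `sosap` (or at a path you pass in), and the CLI's output format is not documented here, so stdout is returned unparsed.

```python
import subprocess


def sosap_command(model_path, word, binary="sosap"):
    """Build the argv for `sosap <model.fst> <word>` (hypothetical helper)."""
    return [binary, str(model_path), word]


def run_sosap(model_path, word, binary="sosap"):
    """Run the CLI and return its raw stdout without parsing it.

    Raises subprocess.CalledProcessError if the CLI exits with a non-zero status.
    """
    result = subprocess.run(
        sosap_command(model_path, word, binary),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Usage (assumes the binary has been built and is on PATH):
#   print(run_sosap("g2p.fst", "hello"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when a word contains spaces or shell metacharacters.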
MIT