seanghay/sosap

A pure-Rust implementation of Phonetisaurus G2P (grapheme-to-phoneme) inference. It ships as a Python package, a Rust crate (with a CLI), and a WASM module: the same algorithm across four distribution targets.

Install

# PyPI
pip install sosap

# GitHub
pip install git+https://github.com/seanghay/sosap.git

Phoneticize

from sosap import Model

model = Model("g2p.fst")
model.phoneticize("hello")
# => ['h', 'ɛɛ', 'l', 'oo']
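
To phoneticize a whole word list, a thin wrapper can collect the results into a pronunciation lexicon. The sketch below is only an illustration and not part of sosap: `build_lexicon` and the `Phoneticizer` protocol are hypothetical names, and the only assumption about the model is that `phoneticize(word)` returns a list of phoneme strings, as in the example above.

```python
from typing import Dict, Iterable, List, Protocol


class Phoneticizer(Protocol):
    """Anything exposing phoneticize(word) -> list of phoneme strings."""

    def phoneticize(self, word: str) -> List[str]: ...


def build_lexicon(model: Phoneticizer, words: Iterable[str]) -> Dict[str, str]:
    """Map each distinct word to its space-joined phoneme sequence."""
    lexicon: Dict[str, str] = {}
    for word in words:
        if word in lexicon:
            continue  # skip duplicates so each word is decoded only once
        lexicon[word] = " ".join(model.phoneticize(word))
    return lexicon
```

With a loaded model this would be called as `build_lexicon(Model("g2p.fst"), ["hello", "world"])`.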

N-best sampling

from sosap import Model

model = Model("g2p.fst")
results = model.phoneticize_sampling("hello", nbest=4)
# => [['h', 'ɛɛ', 'l', 'oo'], ['h', 'ee', '.', 'l', 'oo'], ['h', 'ɛɛ', '.', 'l', 'oo'], ['h', 'ɛɛ', 'l', '.', 'l', 'ɔɔ']]

results = model.phoneticize_sampling("hello", nbest=4, beam=1000, threshold=99.0, pmass=99.0)
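
Some n-best variants differ only in where the `'.'` token appears. If those count as the same pronunciation for your use case (an assumption, since the meaning of `'.'` depends on the model), the list can be collapsed after decoding. `unique_pronunciations` below is a hypothetical helper, not a sosap function; it works on the plain list-of-lists shape shown above.

```python
from typing import Iterable, List, Sequence, Tuple


def unique_pronunciations(
    results: Iterable[Sequence[str]],
    ignore: Tuple[str, ...] = (".",),
) -> List[List[str]]:
    """Drop ignored tokens, then keep the first occurrence of each variant."""
    seen = set()
    unique: List[List[str]] = []
    for phonemes in results:
        # Compare variants with the ignored tokens removed.
        key = tuple(p for p in phonemes if p not in ignore)
        if key not in seen:
            seen.add(key)
            unique.append(list(key))
    return unique
```

Applied to the four variants above, this leaves three distinct sequences, in their original rank order.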

For full access to the underlying PhonetisaurusScript interface (per-arc weights, raw input/output labels, accumulate/pmass modes), use model.phoneticize_paths(word, ...), which returns PathData objects exposing .path_weight, .path_weights, .ilabels, .olabels, and .uniques.
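
One use for the path weights is turning them into relative probabilities over the returned candidates. The sketch below assumes each `.path_weight` is a negative log probability (lower means more likely), which is the usual convention for Phonetisaurus models but is not stated here, so check it against your model. `path_posteriors` is a hypothetical helper that takes plain floats, not a sosap function.

```python
import math
from typing import Iterable, List


def path_posteriors(weights: Iterable[float]) -> List[float]:
    """Turn negative-log path weights into probabilities that sum to 1."""
    costs = list(weights)
    if not costs:
        return []
    best = min(costs)  # subtract the lowest cost for numerical stability
    scores = [math.exp(best - c) for c in costs]
    total = sum(scores)
    return [s / total for s in scores]
```

The input would be the `.path_weight` value of each PathData object returned by `model.phoneticize_paths`. The resulting probabilities are normalized over the returned paths only, not over everything the model could generate.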

Other targets

  • Rust crate at rust/: cargo build --release produces a CLI (sosap <model.fst> <word>) and a rustfst-compatible library.
  • WebAssembly: cd rust && wasm-pack build --target web --release --no-default-features --no-typescript builds a browser-ready bundle in rust/pkg/. The Model class accepts the FST as raw bytes (new Model(uint8Array, "")).

License

MIT

About

🗣️ sosap (សូរសព្ទ): Python/Rust binding for Phonetisaurus