Just tested with Chinese, blaze currently does not support it. Any plan on supporting custom tokenizer (e.g. Unicode code point tokenizer)? Thanks.
package main
import (
"fmt"
"github.com/wizenheimer/blaze"
)
func main() {
// Create a new inverted index
idx := blaze.NewInvertedIndex()
// Index some documents
idx.Index(1, "你好,世界")
// Search with BM25 ranking
matches := idx.RankBM25("好", 10)
// Print results
for _, match := range matches {
fmt.Printf("Document %d (score: %.2f)\n", match.DocID, match.Score)
}
}
Just tested with Chinese, blaze currently does not support it. Any plan on supporting custom tokenizer (e.g. Unicode code point tokenizer)? Thanks.