No CJK support? #4

Description

@movsb

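I just tested blaze with Chinese text, and it currently does not support it: searching for a single character that appears in an indexed document returns no matches. Are there any plans to support a custom tokenizer (e.g. a Unicode code point tokenizer)? Thanks.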

package main

import (
	"fmt"

	"github.com/wizenheimer/blaze"
)

func main() {
	// Create a new inverted index
	idx := blaze.NewInvertedIndex()

	// Index some documents
	idx.Index(1, "你好,世界")

	// Search with BM25 ranking
	matches := idx.RankBM25("好", 10)

	// Print results
	for _, match := range matches {
		fmt.Printf("Document %d (score: %.2f)\n", match.DocID, match.Score)
	}
}
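
For illustration, a code point tokenizer of the kind suggested above might look roughly like the sketch below. It is standalone Go built only on the standard library and does not use or assume any blaze API; `Tokenize` and `isCJK` are hypothetical names, and treating each Han, Hiragana, Katakana, or Hangul rune as its own token is just one simple strategy, not a claim about how blaze would implement it.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isCJK reports whether r belongs to the Han, Hiragana, Katakana,
// or Hangul scripts. (Hypothetical helper, not part of blaze.)
func isCJK(r rune) bool {
	return unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul)
}

// Tokenize emits every CJK rune as its own token and groups runs of
// other letters and digits into lowercased word tokens. Any other
// rune (punctuation, whitespace) acts as a separator.
func Tokenize(text string) []string {
	var tokens []string
	var word strings.Builder

	// flush appends the pending non-CJK word, if any, to tokens.
	flush := func() {
		if word.Len() > 0 {
			tokens = append(tokens, word.String())
			word.Reset()
		}
	}

	for _, r := range text {
		switch {
		case isCJK(r):
			flush()
			tokens = append(tokens, string(r))
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			word.WriteRune(unicode.ToLower(r))
		default:
			flush()
		}
	}
	flush()
	return tokens
}

func main() {
	fmt.Println(Tokenize("你好,世界"))      // prints [你 好 世 界]
	fmt.Println(Tokenize("Hello 世界 v2")) // prints [hello 世 界 v2]
}
```

With tokens like these in the index, a query for "好" would share a term with the document above, which is what the reproduction expects.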
