A 10-passage / 5-query toy corpus you can ingest, search, and evaluate
end-to-end in under a minute, with no external service: Milvus-Lite
runs embedded inside the Python process and stores everything in a single
file under `./.docuverse/milvus.db`.

| File | Purpose |
|---|---|
| `passages.jsonl` | 10 documents (science + history snippets). |
| `queries.jsonl` | 5 queries, each with a `relevant` field naming the gold passage id. |
| `qrels.tsv` | Same gold judgments in TREC qrels format (for tools that prefer it). |
| `recipe.yaml` | A spelled-out config file equivalent to `from_preset("milvus-dense", ...)`. |

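
To make the relationship between `queries.jsonl` and `qrels.tsv` concrete, here is a minimal sketch of how per-query gold labels map onto TREC qrels lines (`query_id`, an unused iteration column, `doc_id`, relevance). It is not docuverse code: the helper `queries_to_qrels`, the `id` and `text` field names, and the sample ids are assumptions made for illustration; only the `relevant` field comes from the table above.

```python
import json


def queries_to_qrels(jsonl_text, id_field="id", relevant_field="relevant"):
    """Convert JSONL query records into tab-separated TREC qrels lines.

    `id_field` is an assumed name; adjust it to match the real file.
    """
    lines = []
    for raw in jsonl_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        record = json.loads(raw)
        gold = record[relevant_field]
        # Accept either a single gold id or a list of gold ids.
        gold_ids = gold if isinstance(gold, list) else [gold]
        for doc_id in gold_ids:
            # qrels columns: query id, iteration (conventionally 0), doc id, relevance
            lines.append(f"{record[id_field]}\t0\t{doc_id}\t1")
    return lines


# Hypothetical records, not the contents of the shipped queries.jsonl.
sample = "\n".join(
    [
        json.dumps({"id": "q1", "text": "Who discovered penicillin?", "relevant": "p3"}),
        json.dumps({"id": "q2", "text": "When did the Berlin Wall fall?", "relevant": "p7"}),
    ]
)

for line in queries_to_qrels(sample):
    print(line)
```

Each query contributes one qrels line per gold passage, which is why the two files carry the same judgments.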
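There are three ways to run the quickstart: Python with a preset, Python with the recipe file, and the `docuverse` CLI. All three produce the same ranked results.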

From Python, using the preset:

```python
from docuverse import SearchEngine

# Build the engine from the "milvus-dense" preset, overriding the model,
# index name, and input/output paths.
engine = SearchEngine.from_preset(
    "milvus-dense",
    model_name="ibm-granite/granite-embedding-small-english-r2",
    index_name="docuverse_quickstart",
    input_passages="examples/quickstart/passages.jsonl",
    input_queries="examples/quickstart/queries.jsonl",
    output_file="examples/quickstart/output.json",
)

# Ingest the passages, run the queries, and score the results.
engine.ingest(engine.read_data())
queries = engine.read_questions()
results = engine.search(queries)
print(engine.compute_score(queries, results))
```

From Python, using the recipe file:

```python
from docuverse import SearchEngine

# Same pipeline, with every setting read from the recipe file.
engine = SearchEngine(config_or_path="examples/quickstart/recipe.yaml")
engine.ingest(engine.read_data())
queries = engine.read_questions()
print(engine.compute_score(queries, engine.search(queries)))
```

From the command line, using the recipe file:

```bash
docuverse run --config examples/quickstart/recipe.yaml
```

…or composing preset + overrides on the command line:

```bash
docuverse run --preset milvus-dense \
  --override model_name=ibm-granite/granite-embedding-small-english-r2 \
  --override index_name=docuverse_quickstart \
  --override input_passages=examples/quickstart/passages.jsonl \
  --override input_queries=examples/quickstart/queries.jsonl \
  --override output_file=examples/quickstart/output.json
```

The embedded Milvus-Lite database lives at `./.docuverse/milvus.db`. Delete
that file (or the whole `.docuverse/` directory) to start fresh. `.docuverse/`
is in `.gitignore`, so it won't end up in commits.
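
If you would rather script the reset than delete the file by hand, something like the following should work. This is a sketch using only the standard library: `reset_milvus_lite` is a hypothetical helper and not part of docuverse, and it assumes no running process still has the database open.

```python
import shutil
from pathlib import Path


def reset_milvus_lite(root=".", whole_directory=False):
    """Remove the embedded database under <root>/.docuverse.

    Returns True if something was deleted, False if there was nothing to remove.
    """
    state_dir = Path(root) / ".docuverse"
    db_file = state_dir / "milvus.db"

    if whole_directory:
        # Drop the entire .docuverse/ directory and everything in it.
        if state_dir.is_dir():
            shutil.rmtree(state_dir)
            return True
        return False

    # Default: delete only the database file and keep the directory.
    if db_file.exists():
        db_file.unlink()
        return True
    return False
```

Calling `reset_milvus_lite()` from the repository root removes only `milvus.db`; passing `whole_directory=True` removes `.docuverse/` entirely. Either way, the next ingest starts from an empty store.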