

Vespa sample applications - Hypencoder: query-dependent neural ranking

This sample application reproduces Hypencoder (SIGIR '25) in a Vespa rank profile. Hypencoder replaces cosine similarity with a hypernetwork: the query encoder generates the weights of a small query-specific neural network at query time, and that q-net is applied to each document's stored embedding to produce the relevance score.
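To make the contrast with cosine similarity concrete, here is a minimal pure-Python sketch. It is an illustration only: the 3-d embeddings, the ReLU activation, the layer shapes, and the hand-written weights are all assumptions chosen for readability. In Hypencoder the weights are generated by the query encoder, and the real q-net may include details not shown here.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(q, d):
    """Bi-encoder baseline: one fixed scoring function for every query."""
    return dot(q, d) / (math.sqrt(dot(q, q)) * math.sqrt(dot(d, d)))

def qnet_score(hidden_layers, out_weights, d):
    """Apply a query-specific MLP to one stored document embedding.

    hidden_layers: list of weight matrices (each a list of rows).
    out_weights:   vector projecting the last hidden state to a scalar.
    Both stand in for what a query encoder would emit (assumption).
    """
    h = d
    for w in hidden_layers:
        h = [max(0.0, dot(row, h)) for row in w]  # linear layer + ReLU
    return dot(out_weights, h)

# Hypothetical document embedding and two hand-written "generated" q-nets.
doc = [1.0, -2.0, 0.5]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
qnet_a = ([identity], [1.0, 1.0, 1.0])
qnet_b = ([identity], [0.0, -1.0, 2.0])

score_a = qnet_score(*qnet_a, doc)
score_b = qnet_score(*qnet_b, doc)
```

The same document embedding gets a different score under each q-net, because the scoring function itself changes per query rather than only the query vector.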

The app demonstrates running Hypencoder entirely inside Vespa with no custom code: a hugging-face-embedder for the passage encoder, an onnx-model for the query encoder, and a rank profile that expresses the q-net forward pass as tensor expressions.

A forthcoming post on the Vespa blog covers the design and performance characteristics in detail.

Requires at least Vespa 8.338.38.

Prerequisites

  • Docker or Podman.

  • Vespa CLI (brew install vespa-cli on macOS).

  • Python 3.10+ with pip install -r requirements.txt.

  • Hypencoder's reference implementation (required by model2onnx.py):

    git clone https://github.com/jfkback/hypencoder-paper.git
    git -C hypencoder-paper checkout 951ee82ddf2f
    pip install -e ./hypencoder-paper

Export the ONNX models

python model2onnx.py --checkpoint jfkback/hypencoder.2_layer

This writes passage_encoder.onnx, query_encoder.onnx, and tokenizer.json to app/models/, where app/services.xml and app/schemas/doc.sd reference them.
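Before deploying, it can help to confirm that the export actually produced all three files. The helper below is a small sketch, not part of the sample app; the function name `missing_artifacts` is hypothetical, while the file names and the `app/models` directory are the ones listed above.

```python
from pathlib import Path

# File names taken from the export step described above.
EXPECTED = ("passage_encoder.onnx", "query_encoder.onnx", "tokenizer.json")

def missing_artifacts(models_dir="app/models"):
    """Return the expected model files that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]
```

Calling `missing_artifacts()` from the application root returns an empty list when the export is complete.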

For roughly 2-3x faster query-encoder inference on CPU at the cost of some retrieval-quality drift, add --quantize-int8.
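The quality drift comes from rounding: int8 weights can only approximate the original float values. The sketch below shows generic symmetric per-tensor quantization on a few made-up weights. Whether --quantize-int8 uses this exact scheme is not stated here, so treat it as an illustration of the effect rather than a description of model2onnx.py.

```python
def quantize_int8(values):
    """Symmetric quantization: map floats onto integers in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale reproduces it exactly
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in quantized]

# Made-up weights for illustration only.
weights = [0.731, -0.052, 0.004, -1.27]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding bounds the per-weight error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Small weights lose proportionally more precision than large ones (here the 0.004 weight collapses to zero), which is one way the scores of a quantized encoder can drift from the float model.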

Start Vespa

docker run --detach --name vespa --hostname vespa \
  --publish 8080:8080 --publish 19071:19071 \
  --memory 12g \
  vespaengine/vespa:latest

vespa config set target local
vespa status deploy --wait 300
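When scripting the setup without the Vespa CLI, the same wait can be approximated by polling the config server's health endpoint. This is a sketch under assumptions: it presumes port 19071 is published as in the docker command above and treats any HTTP 200 from `/state/v1/health` as ready. The function names and the injectable `probe` and `sleep` parameters are hypothetical, added to keep the loop testable.

```python
import time
import urllib.error
import urllib.request

def http_ok(url):
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def wait_until_up(url, timeout_s=300, interval_s=2.0,
                  probe=http_ok, sleep=time.sleep):
    """Poll url until probe succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A typical call would be `wait_until_up("http://localhost:19071/state/v1/health")`, which returns `False` if the container has not come up within five minutes.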

Deploy

vespa deploy app --wait 300

Feed

vespa feed dataset/sample.json

The passage_embedder runs server-side, producing the 768-d CLS embedding for each document.
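Because embedding happens server-side, a feed file only needs to carry the raw text. The sketch below builds Vespa put operations in JSON Lines form for your own passages. The namespace `hypencoder` and the field name `text` are assumptions (check app/schemas/doc.sd for the real field names); the document type `doc` follows from the schema file name.

```python
import json

def to_feed_ops(passages, namespace="hypencoder", doctype="doc",
                text_field="text"):
    """Turn plain passages into Vespa put operations (no embeddings)."""
    return [
        {"put": f"id:{namespace}:{doctype}::{i}", "fields": {text_field: text}}
        for i, text in enumerate(passages)
    ]

def write_jsonl(ops, path):
    """Write one feed operation per line."""
    with open(path, "w", encoding="utf-8") as f:
        for op in ops:
            f.write(json.dumps(op, ensure_ascii=False) + "\n")
```

A file written this way can be passed to `vespa feed` in place of dataset/sample.json, provided the field name matches the schema.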

Query

The app exposes four rank profiles:

| profile | what it does |
| --- | --- |
| cosine_baseline | Plain cosine similarity. A reference point for the cost of bi-encoder scoring on this corpus. |
| hypencoder_onnx | Full Hypencoder rank-all: the q-net scores every matched document. |
| hypencoder_rerank | Cosine first-phase, Hypencoder q-net on the top 100. |
| hypencoder_lexical_rerank | BM25 first-phase, Hypencoder q-net on the top 100. |
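The two rerank profiles share one pattern: a cheap first-phase score orders all matches, and the more expensive q-net rescores only the head of that list. A minimal sketch of the pattern follows; it is a simplification that ignores how Vespa distributes ranking across nodes, and the function and parameter names are hypothetical.

```python
def two_phase_rank(docs, first_phase, second_phase, rerank_count=100):
    """Order docs by a cheap score, then rescore only the top rerank_count.

    first_phase / second_phase: callables mapping a doc to a float score.
    Reranked docs stay ahead of the docs that were never rescored.
    """
    ranked = sorted(docs, key=first_phase, reverse=True)
    head = sorted(ranked[:rerank_count], key=second_phase, reverse=True)
    return head + ranked[rerank_count:]
```

With `first_phase` set to cosine similarity this mirrors hypencoder_rerank; with a lexical score such as BM25 it mirrors hypencoder_lexical_rerank. The expensive function runs at most `rerank_count` times regardless of how many documents match.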

encode_query.py produces the query JSON for each profile. A command-line flag selects the profile; with no flag it defaults to hypencoder_onnx:

# cosine_baseline
python encode_query.py --cosine "tallest mountain in the world" > /tmp/q.json
vespa query --file /tmp/q.json

# hypencoder_onnx (default)
python encode_query.py "tallest mountain in the world" > /tmp/q.json
vespa query --file /tmp/q.json

# hypencoder_rerank
python encode_query.py --rerank "tallest mountain in the world" > /tmp/q.json
vespa query --file /tmp/q.json

# hypencoder_lexical_rerank
python encode_query.py --lexical "tallest mountain in the world" > /tmp/q.json
vespa query --file /tmp/q.json
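To compare all four profiles on the same query, the commands above can be driven from a short script. This is a sketch that only wraps the documented invocations; the helper names `encode_command` and `run_profile` are hypothetical, and it assumes `python` and `vespa` are on the PATH and that it runs from the application root.

```python
import subprocess

# Flags as documented above; hypencoder_onnx is the default (no flag).
PROFILE_FLAGS = {
    "cosine_baseline": ["--cosine"],
    "hypencoder_onnx": [],
    "hypencoder_rerank": ["--rerank"],
    "hypencoder_lexical_rerank": ["--lexical"],
}

def encode_command(profile, query):
    """Build the encode_query.py argv for the given rank profile."""
    return ["python", "encode_query.py", *PROFILE_FLAGS[profile], query]

def run_profile(profile, query, out_path="/tmp/q.json"):
    """Encode the query, then run it through the Vespa CLI; return stdout."""
    with open(out_path, "w", encoding="utf-8") as out:
        subprocess.run(encode_command(profile, query), stdout=out, check=True)
    result = subprocess.run(["vespa", "query", "--file", out_path],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Looping over `PROFILE_FLAGS` and calling `run_profile(name, "tallest mountain in the world")` returns the raw response for each profile, which makes it easy to compare the top hits side by side.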

References