feat(valkey): implement Valkey vector store integration #1

Open
Jonathan-Improving wants to merge 1 commit into main from feat/AEA-422-rig-valkey

Conversation

@Jonathan-Improving
Owner

Summary

Implements rig-valkey, a companion crate providing Valkey vector store integration for the rig framework (AEA-422).

Changes

  • crates/rig-valkey/src/lib.rs: ValkeyVectorStore<M> implementing VectorStoreIndex and InsertDocuments traits using FT.SEARCH with KNN
  • crates/rig-valkey/src/filter.rs: ValkeySearchFilter implementing SearchFilter with injection prevention (field sanitization, tag escaping, => rejection)
  • tests/integrations/valkey.rs: Integration tests against testcontainers Valkey instance
  • sdd/: Design document, evaluation harness, and constraints

Security hardening

  • Removed Deserialize derive from ValkeySearchFilter to prevent bypass of sanitization
  • ? wildcard escaped in TAG values
  • => check handles whitespace variants
  • Field names restricted to ASCII alphanumeric + underscore
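
As an illustration of the last two checks, here is a minimal sketch. The helper names `is_safe_field_name` and `contains_arrow` are hypothetical and are not taken from the crate, whose own functions (for example `sanitize_field_name`) may differ in name and behavior.

```rust
/// Hypothetical helper: accepts only non-empty names made of
/// ASCII alphanumeric characters and underscores.
fn is_safe_field_name(name: &str) -> bool {
    !name.is_empty() && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Hypothetical helper: detects `=>` even when whitespace separates
/// the two characters, by stripping all whitespace before matching.
fn contains_arrow(input: &str) -> bool {
    let compact: String = input.chars().filter(|c| !c.is_whitespace()).collect();
    compact.contains("=>")
}
```

Under these assumptions, `is_safe_field_name("category_1")` passes while `"cat-egory"` is rejected, and `contains_arrow` flags both `"*=>[KNN 5 @v $q]"` and `"* = \t> [KNN 5 @v $q]"` but not `"price >= 10"`.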

What was tested

  • cargo check -p rig-valkey
  • cargo clippy --all-features --all-targets ✅ (zero warnings)
  • cargo check --tests --all-features

Blocked

  • Cookbook examples in valkey-samples repo (separate deliverable)
  • Custom Deserialize impl for ValkeySearchFilter needed to restore VectorStoreIndexDyn blanket impl compatibility

Jonathan-Improving force-pushed the feat/AEA-422-rig-valkey branch from d340032 to b15c4be on May 14, 2026, 14:54
Jonathan-Improving marked this pull request as ready for review on May 14, 2026, 14:54

rileydes-improving left a comment

Solid Valkey integration. Security boundary (sanitize_field_name + escape_tag_value + => guard) is well-considered. Seven HIGH items to address before merge:

  1. Score normalization assumes COSINE — non-COSINE indexes silently produce nonsense scores.
  2. top_n vs top_n_ids disagree on rows missing the document_field.
  3. req.threshold() is ignored; eleven other backends honor it.
  4. Removing Deserialize from ValkeySearchFilter cuts the store off from dynamic_context/dynamic_tools. Ship a sanitizing custom impl or document the limitation.
  5. value_to_numeric collapses non-parseable input to "0" silently.
  6. No test covers the => injection guard (the explicit security boundary).
  7. No unit tests on filter.rs at all (these would run without Docker).

Eleven MEDIUM items + sixteen LOW polish items in line comments.
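
For item 5, one possible shape is to surface the parse failure instead of substituting "0". This is a sketch only: `parse_numeric` is a hypothetical stand-in, since the real signature of `value_to_numeric` is not shown in this thread, and the `String` error is a placeholder for whatever error type the crate uses.

```rust
/// Hypothetical replacement for a numeric coercion that silently returns "0".
/// Returns an error for non-numeric and non-finite input.
fn parse_numeric(raw: &str) -> Result<f64, String> {
    match raw.trim().parse::<f64>() {
        // Accept only finite numbers; "NaN" and "inf" parse as f64 but are rejected here.
        Ok(n) if n.is_finite() => Ok(n),
        Ok(_) => Err(format!("non-finite numeric filter value: {raw:?}")),
        Err(e) => Err(format!("invalid numeric filter value {raw:?}: {e}")),
    }
}
```

With this shape a filter built from "abc" fails loudly rather than matching rows whose value is 0.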

let score = field_map
    .get(SCORE_ALIAS)
    .and_then(|s| s.parse::<f64>().ok())
    .map(|d| 1.0 - d)

🔴 score = 1.0 - distance hardcodes COSINE; L2/IP indexes return nonsense or negative scores. Read DISTANCE_METRIC from FT.INFO at construction, store on config, branch on it.
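
A rough sketch of the branching, assuming the metric string has already been read from FT.INFO (that parsing is not shown). `DistanceMetric`, `parse_metric`, and `score_from_distance` are hypothetical names. The L2 mapping `1 / (1 + d)` is one arbitrary choice of a monotone transform, and the IP branch assumes the engine reports `1 - inner_product`; both are assumptions to verify against the server.

```rust
/// Hypothetical enum mirroring the index's DISTANCE_METRIC setting.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DistanceMetric {
    Cosine,
    L2,
    Ip,
}

/// Parses the metric name case-insensitively; unknown names yield None.
fn parse_metric(raw: &str) -> Option<DistanceMetric> {
    match raw.trim().to_ascii_uppercase().as_str() {
        "COSINE" => Some(DistanceMetric::Cosine),
        "L2" => Some(DistanceMetric::L2),
        "IP" => Some(DistanceMetric::Ip),
        _ => None,
    }
}

/// Converts a reported distance into a "higher is better" score.
fn score_from_distance(metric: DistanceMetric, distance: f64) -> f64 {
    match metric {
        // Cosine distance is 1 - similarity, so this recovers the similarity.
        DistanceMetric::Cosine => 1.0 - distance,
        // Assumption: any monotone decreasing map works; this one lands in (0, 1].
        DistanceMetric::L2 => 1.0 / (1.0 + distance),
        // Assumption: distance is reported as 1 - inner_product; the result is unbounded.
        DistanceMetric::Ip => 1.0 - distance,
    }
}
```

Storing the parsed metric on the config at construction keeps the per-row conversion a plain `match`.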

let rows = parse_search_rows(value, config)?;
let mut results = Vec::new();

for row in rows {

🔴 top_n skips rows missing document_field here while top_n_ids (via parse_search_ids at L393) includes them — same query, divergent results. Either error on missing field or skip uniformly across both paths.
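
One way to get uniform behavior is to route both paths through a single gate. The sketch below uses a hypothetical `Row` type and simplified signatures, not the crate's actual `parse_search_rows`/`parse_search_ids`, and it picks the "skip uniformly" option.

```rust
use std::collections::HashMap;

/// Hypothetical parsed search row: the key plus its returned fields.
struct Row {
    key: String,
    fields: HashMap<String, String>,
}

/// Shared gate: keeps only rows that carry `document_field`.
fn rows_with_document<'a>(rows: &'a [Row], document_field: &str) -> Vec<&'a Row> {
    rows.iter()
        .filter(|row| row.fields.contains_key(document_field))
        .collect()
}

/// Simplified stand-in for the ID-only path.
fn top_n_ids(rows: &[Row], document_field: &str) -> Vec<String> {
    rows_with_document(rows, document_field)
        .into_iter()
        .map(|row| row.key.clone())
        .collect()
}

/// Simplified stand-in for the full path: returns (id, raw document) pairs.
fn top_n(rows: &[Row], document_field: &str) -> Vec<(String, String)> {
    rows_with_document(rows, document_field)
        .into_iter()
        .map(|row| (row.key.clone(), row.fields[document_field].clone()))
        .collect()
}
```

Because both functions call the same gate, the same query yields the same set of IDs on either path.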

) -> Result<redis::Value, VectorStoreError> {
    let prompt_embedding = self.model.embed_text(req.query()).await?;
    let vec_bytes = embedding_to_bytes(&prompt_embedding.vec);
    let samples = req.samples() as usize;

🔴 req.threshold() is never read inside execute_knn_search. Eleven other backends (mongodb, qdrant, lancedb, milvus, sqlite, postgres, surrealdb, scylladb, s3vectors, helixdb, vectorize) honor it. Add a post-filter in top_n/top_n_ids (around L235/L243) or document the omission.
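
The post-filter could be as small as the sketch below. It assumes the threshold is an optional minimum score where higher is better; `apply_threshold` is a hypothetical helper, and the `(score, payload)` tuple shape is a simplification of whatever `top_n`/`top_n_ids` return.

```rust
/// Hypothetical post-filter: drops results scoring below `threshold`.
/// `None` leaves the results untouched.
fn apply_threshold<T>(results: Vec<(f64, T)>, threshold: Option<f64>) -> Vec<(f64, T)> {
    match threshold {
        Some(min_score) => results
            .into_iter()
            .filter(|(score, _)| *score >= min_score)
            .collect(),
        None => results,
    }
}
```

Note that this only makes sense once scores are normalized consistently across distance metrics.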

) -> Result<redis::Value, VectorStoreError> {
    let prompt_embedding = self.model.embed_text(req.query()).await?;
    let vec_bytes = embedding_to_bytes(&prompt_embedding.vec);
    let samples = req.samples() as usize;

🟡 samples=0 flows through unchecked and produces [KNN 0 ...] which Valkey rejects with an opaque error. Reject up front with VectorStoreError::BuilderError for a cleaner user experience.
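
A sketch of the up-front check follows. `SearchError` is a local stand-in so the example compiles on its own; in the crate the error would presumably be `VectorStoreError::BuilderError`, whose exact payload type is not shown here.

```rust
use std::convert::TryFrom;

/// Stand-in for the crate's error type (assumption: the real code would
/// return VectorStoreError::BuilderError instead).
#[derive(Debug, PartialEq)]
enum SearchError {
    InvalidSamples(String),
}

/// Hypothetical validator: rejects zero and values that do not fit in usize.
fn validate_samples(samples: u64) -> Result<usize, SearchError> {
    if samples == 0 {
        return Err(SearchError::InvalidSamples(
            "samples must be greater than 0".to_string(),
        ));
    }
    usize::try_from(samples)
        .map_err(|_| SearchError::InvalidSamples(format!("samples {samples} exceeds usize")))
}
```

Using `try_from` instead of `as usize` also avoids silent truncation on 32-bit targets.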

.arg(&self.config.embedding_field)
.arg(vec_bytes.as_slice())
.arg(&self.config.document_field)
.arg(&json_doc)

🟡 json_doc is written to every HSET per chunk — a doc with N chunks results in N copies of the same JSON on disk. Store doc once at {prefix}:doc:{doc_id} with a foreign key per chunk, or document the storage tradeoff.
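
To make the proposed layout concrete, here is a sketch of the key scheme. Only the `{prefix}:doc:{doc_id}` pattern comes from the comment above; the chunk key format, the `ChunkRecord` type, and the function names are hypothetical.

```rust
/// Key under which the full JSON document is stored once.
fn doc_key(prefix: &str, doc_id: &str) -> String {
    format!("{prefix}:doc:{doc_id}")
}

/// Hypothetical chunk key layout; the crate may use a different scheme.
fn chunk_key(prefix: &str, doc_id: &str, chunk_index: usize) -> String {
    format!("{prefix}:chunk:{doc_id}:{chunk_index}")
}

/// Hypothetical per-chunk record: the embedding lives at `key`, and
/// `doc_ref` points back to the single stored document.
#[derive(Debug, PartialEq)]
struct ChunkRecord {
    key: String,
    doc_ref: String,
}

/// Builds one record per chunk, all sharing the same document reference.
fn chunk_records(prefix: &str, doc_id: &str, chunk_count: usize) -> Vec<ChunkRecord> {
    (0..chunk_count)
        .map(|i| ChunkRecord {
            key: chunk_key(prefix, doc_id, i),
            doc_ref: doc_key(prefix, doc_id),
        })
        .collect()
}
```

The tradeoff is an extra lookup per hit to resolve `doc_ref`, in exchange for storing the JSON once instead of N times.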

.build();

assert!(request.filter().is_some());
}

🔵 Missing test: No test exercises ValkeyVectorStoreConfig with non-default field names. The whole point of the config is configurability — current tests only use default().

Comment thread src/lib.rs
pub use rig_surrealdb::*;
}

#[cfg(feature = "valkey")]

🔵 pub mod valkey is missing the #[cfg_attr(docsrs, doc(cfg(feature = "valkey")))] line that every other companion module on this facade carries. The docs.rs feature badge will be missing for valkey only. Add the attribute between L141 and L142 to match lancedb/mongodb/etc.

@@ -0,0 +1,22 @@
[package]
name = "rig-valkey"
version = "0.1.0"

🔵 rig-valkey = 0.1.0 (this file) and rig-core = 0.37.0 dep at L16 — other companion crates track the 0.x.6 series and align rig-core versions accordingly. Confirm release-train alignment.

Comment thread sdd/evaluation.md
const VALKEY_PORT: u16 = 6379;

async fn start_valkey() -> testcontainers::ContainerAsync<GenericImage> {
    GenericImage::new("valkey/valkey-bundle", "8.0")

🔵 Code sample uses valkey/valkey-bundle here while the prose at L30 of this same file says valkey/valkey-extensions:8.0. sdd/constraints.md:108 agrees with extensions, tests/integrations/valkey.rs:45 uses bundle. Reconcile across all four locations.

    println!("{results:?}");
    Ok(())
}
```

🔵 Add a 'Limitations' section noting ValkeyVectorStore cannot currently be used with dynamic_context/dynamic_tools due to the missing Deserialize impl on ValkeySearchFilter (substantive issue: crates/rig-valkey/src/filter.rs:14). Until the custom sanitizing impl lands, this is the only blocker for Tool/VectorStoreIndexDyn use.
