feat(valkey): implement Valkey vector store integration#1
feat(valkey): implement Valkey vector store integration#1Jonathan-Improving wants to merge 1 commit into
Conversation
d340032 to
b15c4be
Compare
rileydes-improving
left a comment
There was a problem hiding this comment.
Solid Valkey integration. Security boundary (sanitize_field_name + escape_tag_value + => guard) is well-considered. Seven HIGH items to address before merge:
- Score normalization assumes COSINE — non-COSINE indexes silently produce nonsense scores.
top_nvstop_n_idsdisagree on rows missing thedocument_field.req.threshold()is ignored; eleven other backends honor it.- Removing
DeserializefromValkeySearchFiltercuts the store off fromdynamic_context/dynamic_tools— ship a sanitizing custom impl or document. value_to_numericcollapses non-parseable input to"0"silently.- No test covers the
=>injection guard (the explicit security boundary). - No unit tests on
filter.rsat all (run without Docker).
Eleven MEDIUM items + sixteen LOW polish items in line comments.
| let score = field_map | ||
| .get(SCORE_ALIAS) | ||
| .and_then(|s| s.parse::<f64>().ok()) | ||
| .map(|d| 1.0 - d) |
There was a problem hiding this comment.
🔴 score = 1.0 - distance hardcodes COSINE; L2/IP indexes return nonsense or negative scores. Read DISTANCE_METRIC from FT.INFO at construction, store on config, branch on it.
| let rows = parse_search_rows(value, config)?; | ||
| let mut results = Vec::new(); | ||
|
|
||
| for row in rows { |
There was a problem hiding this comment.
🔴 top_n skips rows missing document_field here while top_n_ids (via parse_search_ids at L393) includes them — same query, divergent results. Either error on missing field or skip uniformly across both paths.
| ) -> Result<redis::Value, VectorStoreError> { | ||
| let prompt_embedding = self.model.embed_text(req.query()).await?; | ||
| let vec_bytes = embedding_to_bytes(&prompt_embedding.vec); | ||
| let samples = req.samples() as usize; |
There was a problem hiding this comment.
🔴 req.threshold() is never read inside execute_knn_search. Eleven other backends (mongodb, qdrant, lancedb, milvus, sqlite, postgres, surrealdb, scylladb, s3vectors, helixdb, vectorize) honor it. Add a post-filter in top_n/top_n_ids (around L235/L243) or document the omission.
| ) -> Result<redis::Value, VectorStoreError> { | ||
| let prompt_embedding = self.model.embed_text(req.query()).await?; | ||
| let vec_bytes = embedding_to_bytes(&prompt_embedding.vec); | ||
| let samples = req.samples() as usize; |
There was a problem hiding this comment.
🟡 samples=0 flows through unchecked and produces [KNN 0 ...] which Valkey rejects with an opaque error. Reject up front with VectorStoreError::BuilderError for a cleaner user experience.
| .arg(&self.config.embedding_field) | ||
| .arg(vec_bytes.as_slice()) | ||
| .arg(&self.config.document_field) | ||
| .arg(&json_doc) |
There was a problem hiding this comment.
🟡 json_doc is written to every HSET per chunk — a doc with N chunks results in N copies of the same JSON on disk. Store doc once at {prefix}:doc:{doc_id} with a foreign key per chunk, or document the storage tradeoff.
| .build(); | ||
|
|
||
| assert!(request.filter().is_some()); | ||
| } |
There was a problem hiding this comment.
🔵 Missing test: No test exercises ValkeyVectorStoreConfig with non-default field names. The whole point of the config is configurability — current tests only use default().
| pub use rig_surrealdb::*; | ||
| } | ||
|
|
||
| #[cfg(feature = "valkey")] |
There was a problem hiding this comment.
🔵 pub mod valkey is missing the #[cfg_attr(docsrs, doc(cfg(feature = "valkey")))] line that every other companion module on this facade carries. The docs.rs feature badge will be missing for valkey only. Add the attribute between L141 and L142 to match lancedb/mongodb/etc.
| @@ -0,0 +1,22 @@ | |||
| [package] | |||
| name = "rig-valkey" | |||
| version = "0.1.0" | |||
There was a problem hiding this comment.
🔵 rig-valkey = 0.1.0 (this file) and rig-core = 0.37.0 dep at L16 — other companion crates track the 0.x.6 series and align rig-core versions accordingly. Confirm release-train alignment.
| const VALKEY_PORT: u16 = 6379; | ||
|
|
||
| async fn start_valkey() -> testcontainers::ContainerAsync<GenericImage> { | ||
| GenericImage::new("valkey/valkey-bundle", "8.0") |
There was a problem hiding this comment.
🔵 Code sample uses valkey/valkey-bundle here while the prose at L30 of this same file says valkey/valkey-extensions:8.0. sdd/constraints.md:108 agrees with extensions, tests/integrations/valkey.rs:45 uses bundle. Reconcile across all four locations.
| println!("{results:?}"); | ||
| Ok(()) | ||
| } | ||
| ``` |
There was a problem hiding this comment.
🔵 Add a 'Limitations' section noting ValkeyVectorStore cannot currently be used with dynamic_context/dynamic_tools due to the missing Deserialize impl on ValkeySearchFilter (substantive issue: crates/rig-valkey/src/filter.rs:14). Until the custom sanitizing impl lands, this is the only blocker for Tool/VectorStoreIndexDyn use.
Summary
Implements
rig-valkey, a companion crate providing Valkey vector store integration for the rig framework (AEA-422).Changes
crates/rig-valkey/src/lib.rs—ValkeyVectorStore<M>implementingVectorStoreIndexandInsertDocumentstraits using FT.SEARCH with KNNcrates/rig-valkey/src/filter.rs—ValkeySearchFilterimplementingSearchFilterwith injection prevention (field sanitization, tag escaping,=>rejection)tests/integrations/valkey.rs— Integration tests against testcontainers Valkey instancesdd/— Design document, evaluation harness, and constraintsSecurity hardening
Deserializederive fromValkeySearchFilterto prevent bypass of sanitization?wildcard escaped in TAG values=>check handles whitespace variantsWhat was tested
cargo check -p rig-valkey✅cargo clippy --all-features --all-targets✅ (zero warnings)cargo check --tests --all-features✅Blocked
Deserializeimpl forValkeySearchFilterneeded to restoreVectorStoreIndexDynblanket impl compatibility