Merged
8 changes: 5 additions & 3 deletions README.md
@@ -866,7 +866,9 @@ if (vectorizedPayload) {

#### `vectorizedPayload.search(params)`

Perform vector search programmatically without making an HTTP request. Parameters and result shape are identical to [POST `/api/vector-search`](#post-apivector-search). If the pool has a [`rerank`](#reranking-optional) config, this call goes through the same rerank pipeline as the REST endpoint.
Perform vector search programmatically without making an HTTP request. The result shape is identical to [POST `/api/vector-search`](#post-apivector-search). If the pool has a [`rerank`](#reranking-optional) config, this call goes through the same rerank pipeline as the REST endpoint.

**Params:** `{ knowledgePool: string; query: string; where?: Where; limit?: number; populateEmbedding?: boolean }` (`limit` defaults to `10`, `populateEmbedding` to `false`). Set `populateEmbedding: true` to include each result's raw `embedding` vector — handy for feeding straight into [`searchByEmbedding()`](#vectorizedpayloadsearchbyembeddingparams). This option is **Local API only**: the REST endpoint never returns vectors, so it is the one parameter not shared with [POST `/api/vector-search`](#post-apivector-search).

**Returns:** `Promise<Array<VectorSearchResult>>` — the array that the REST endpoint wraps in `{ results }`.
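
To make the relationship between the two entry points concrete, here is a minimal sketch. It assumes a trimmed-down `VectorSearchResult` and a hypothetical helper named `toRestBody`; neither is taken from the plugin's source. It only illustrates the documented shapes (the Local API array, and the REST body that wraps it in `{ results }` without vectors), not how the plugin builds its response.

```typescript
// Trimmed-down result shape for illustration; the real type has more fields.
type VectorSearchResult = {
  id: string
  score: number
  chunkText: string
  embedding?: number[] // only present when populateEmbedding is true
}

// Hypothetical helper: mirrors the documented REST shape `{ results }` and
// drops `embedding`, because the REST endpoint never returns vectors.
function toRestBody(results: VectorSearchResult[]): {
  results: Array<Omit<VectorSearchResult, 'embedding'>>
} {
  return {
    results: results.map(({ embedding: _embedding, ...rest }) => rest),
  }
}

// What a Local API call with `populateEmbedding: true` might hand back.
const local: VectorSearchResult[] = [
  { id: '1', score: 0.92, chunkText: 'hello', embedding: [0.1, 0.2] },
]
const body = toRestBody(local)
```

The input array is left untouched, so the same results can still be fed into `searchByEmbedding()` afterwards.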

@@ -891,7 +893,7 @@ Unlike [`search()`](#vectorizedpayloadsearchparams), this method does **not** ru

There is no REST equivalent; `searchByEmbedding` is Local API only.

**Params:** `{ knowledgePool: string; embedding: number[]; where?: Where; limit?: number }` (`limit` defaults to `10`).
**Params:** `{ knowledgePool: string; embedding: number[]; where?: Where; limit?: number; populateEmbedding?: boolean }` (`limit` defaults to `10`, `populateEmbedding` to `false`). As with [`search()`](#vectorizedpayloadsearchparams), `populateEmbedding: true` includes each result's raw `embedding` vector and is Local API only.

**Returns:** `Promise<Array<VectorSearchResult>>` — the same array shape as `search()`.
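
One way the two methods could be chained is sketched below. The `VectorApi` interface is a local declaration that mirrors the param shapes listed above (with `where` left out for brevity), and `moreLikeTopHit` is a hypothetical helper, not a plugin export. It assumes the seed chunk usually matches itself in the follow-up query, which is why it over-fetches by one.

```typescript
// Local declarations mirroring the documented param shapes; these are
// assumptions for the sketch, not imports from the plugin.
type Result = { id: string; score: number; chunkText: string; embedding?: number[] }

interface VectorApi {
  search(p: {
    knowledgePool: string
    query: string
    limit?: number
    populateEmbedding?: boolean
  }): Promise<Result[]>
  searchByEmbedding(p: {
    knowledgePool: string
    embedding: number[]
    limit?: number
    populateEmbedding?: boolean
  }): Promise<Result[]>
}

// Hypothetical helper: take the top hit for a text query, then look for its
// nearest neighbours by vector, excluding the seed itself.
async function moreLikeTopHit(
  api: VectorApi,
  knowledgePool: string,
  query: string,
  limit = 5,
): Promise<Result[]> {
  const [seed] = await api.search({ knowledgePool, query, limit: 1, populateEmbedding: true })
  if (!seed?.embedding) return [] // no hit, or the vector was not populated

  const neighbours = await api.searchByEmbedding({
    knowledgePool,
    embedding: seed.embedding,
    limit: limit + 1, // over-fetch by one: the seed usually matches itself
  })
  return neighbours.filter((r) => r.id !== seed.id).slice(0, limit)
}
```

Because `populateEmbedding` defaults to `false` on the second call, the neighbours come back without their vectors.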

@@ -916,7 +918,7 @@ if (seed?.embedding) {

#### `vectorizedPayload.findByIds(params)`

Fetch stored embedding records by primary key. The `id` of each record is whatever [`search()`](#vectorizedpayloadsearchparams) returns as `result.id`, so a search result round-trips directly. Pass `populateEmbedding: true` to also get the raw embedding vector back (the normal search/query API never returns it) — the building block for "more like this" flows. It defaults to `false`, so by default you get the record's text and metadata without the heavy vector.
Fetch stored embedding records by primary key. The `id` of each record is whatever [`search()`](#vectorizedpayloadsearchparams) returns as `result.id`, so a search result round-trips directly. By default (`populateEmbedding: false`) you get the record's text and metadata without the heavy vector. Pass `populateEmbedding: true` to also get the raw embedding vector back, the building block for "more like this" flows.

**Params:** `{ knowledgePool: string; ids: string[]; populateEmbedding?: boolean }` (`populateEmbedding` defaults to `false`).

14 changes: 9 additions & 5 deletions adapters/README.md
@@ -86,7 +86,7 @@ For each document write in a collection registered to a knowledge pool:

1. A consumer calls either `POST /api/vector-search` or `getVectorizedPayload(payload).search({ knowledgePool, query, where, limit })`.
2. The plugin calls the configured `queryFn(query)` to embed the query string.
3. The plugin calls **`adapter.search(payload, queryEmbedding, poolName, limit, where)`**.
3. The plugin calls **`adapter.search(payload, queryEmbedding, poolName, limit, where, populateEmbedding)`**.
4. The plugin returns the array of `VectorSearchResult` to the caller, untransformed.

**Your adapter is responsible for translating Payload-style `where` clauses** into your store's filter language. See [Common pitfalls](#common-pitfalls).
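
The query flow above can be sketched roughly as follows. `runSearch`, `SearchAdapter`, and the simplified `Where` alias are hypothetical names used only for this illustration, and the sketch leaves out reranking and error handling. The positional order of the `adapter.search` arguments follows step 3.

```typescript
type Where = Record<string, unknown> // stand-in for Payload's `Where` type
type VectorSearchResult = { id: string; score: number; chunkText: string; embedding?: number[] }

type SearchAdapter = {
  search: (
    payload: unknown,
    queryEmbedding: number[],
    poolName: string,
    limit?: number,
    where?: Where,
    populateEmbedding?: boolean,
  ) => Promise<VectorSearchResult[]>
}

// Hypothetical orchestration of steps 2-4; `runSearch` is not a plugin export.
async function runSearch(
  deps: { payload: unknown; adapter: SearchAdapter; queryFn: (q: string) => Promise<number[]> },
  params: {
    knowledgePool: string
    query: string
    where?: Where
    limit?: number
    populateEmbedding?: boolean
  },
): Promise<VectorSearchResult[]> {
  const { knowledgePool, query, where, limit = 10, populateEmbedding = false } = params

  // Step 2: embed the query string with the configured function.
  const queryEmbedding = await deps.queryFn(query)

  // Steps 3 and 4: call the adapter positionally and return its array untransformed.
  return deps.adapter.search(
    deps.payload,
    queryEmbedding,
    knowledgePool,
    limit,
    where,
    populateEmbedding,
  )
}
```

Since `populateEmbedding` is the last positional argument, an adapter written against the older five-argument signature simply never reads it.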
@@ -151,6 +151,7 @@ export type DbAdapter = {
poolName: KnowledgePoolName,
limit?: number,
where?: Where,
populateEmbedding?: boolean,
) => Promise<Array<VectorSearchResult>>

findByIds: (
@@ -170,7 +171,7 @@ export type DbAdapter = {
| `storeChunk` | Per chunk during real-time ingest **and** per output during bulk completion. | Persist the embedding plus all fields in `StoreChunkData` (including `extensionFields`) so they are queryable from `search`. Idempotency is **not** guaranteed by the plugin — you may receive duplicate calls on retry. |
| `deleteChunks` | After a source document is deleted. | Remove every chunk where `sourceCollection === ... && docId === ...`. Must be safe to call when no chunks exist (no-op, no throw). |
| `hasEmbeddingVersion` | During bulk-embed planning, per candidate document. | Return `true` iff at least one chunk exists with the matching `(sourceCollection, docId, embeddingVersion)` triple. Must filter on **all three** — older `0.7.0` adapters that ignored `embeddingVersion` caused stale embeddings on model bumps. |
| `search` | Per `/vector-search` request and per `getVectorizedPayload().search()` call. | Translate `where` (Payload-style) into your store's filter language, perform a vector search using `queryEmbedding`, and return up to `limit` results sorted by descending relevance. |
| `search` | Per `/vector-search` request and per `getVectorizedPayload().search()` call. | Translate `where` (Payload-style) into your store's filter language, perform a vector search using `queryEmbedding`, and return up to `limit` results sorted by descending relevance. The raw `embedding` vector is **only included when `populateEmbedding` is `true`** (default `false`) — omit it otherwise so callers that only need text/metadata don't pay for it. Where possible, skip reading the vector at the source (pg: don't select the column; MongoDB: `{ projection: { embedding: 0 } }`); CF returns it only when you pass `returnValues: true`, so request it just for the populated case. |
| `findByIds` | Per `getVectorizedPayload().findByIds()` call. | Fetch stored embedding records by primary key. **Return an object keyed by the ids you were given:** every requested id must be present as a key, with a found record as the value and `undefined` for any id that didn't resolve. The raw `embedding` vector is **only included when `populateEmbedding` is `true`** (default `false`) — omit it otherwise so callers that only need text/metadata don't pay for it. Where possible, skip reading the vector at the source (pg: don't select the column; MongoDB: `{ projection: { embedding: 0 } }`); CF's `getByIds` always returns values, so omit them post-fetch. Look up by the same `id` your `search` returns as `result.id`. Unknown **and** malformed ids must map to `undefined` — never throw for a bad id. Validate the id shape against your key type before querying so a malformed id can't error the whole batch (MongoDB drops non-24-hex ids; pg drops ids that don't match the PK column type — numeric for integer PKs, uuid-shaped for `uuid` PKs — before the `IN` query; CF's ids are arbitrary strings, so an unknown one is simply absent from `getByIds`). Empty `ids` returns `{}` without a backend call. |
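
As an illustration of the `findByIds` row, here is a minimal in-memory sketch. The `Map`-backed store, the `isValidId` check (which assumes integer-style primary keys), and the trimmed `EmbeddingRecord` type are all assumptions made for this example; a real adapter would query its own backend and skip the vector at the source where it can.

```typescript
type EmbeddingRecord = {
  id: string
  chunkText: string
  embeddingVersion: string
  embedding?: number[]
}

// In-memory stand-in for a vector store, keyed by record id.
const store = new Map<string, Required<EmbeddingRecord>>([
  ['1', { id: '1', chunkText: 'alpha', embeddingVersion: 'v1', embedding: [0.1, 0.2] }],
])

// Hypothetical id check for a store with integer primary keys.
const isValidId = (id: string): boolean => /^\d+$/.test(id)

async function findByIds(
  _payload: unknown,
  _poolName: string,
  ids: string[],
  populateEmbedding = false,
): Promise<Record<string, EmbeddingRecord | undefined>> {
  const out: Record<string, EmbeddingRecord | undefined> = {}
  if (ids.length === 0) return out // empty input: `{}` without touching the store

  for (const id of ids) {
    // Malformed ids are screened out before the lookup so they cannot fail the batch.
    const row = isValidId(id) ? store.get(id) : undefined
    if (!row) {
      out[id] = undefined // every requested id is still present as a key
      continue
    }
    const { embedding, ...rest } = row
    out[id] = populateEmbedding ? { ...rest, embedding } : rest
  }
  return out
}
```

Calling `findByIds(payload, 'default', ['1', '999', 'not-an-id'])` on this sketch yields an object with all three keys, where only `'1'` maps to a record and that record carries no `embedding`.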

### Error contract
@@ -376,6 +377,9 @@ export interface VectorSearchResult {
chunkText: string
/** Embedding model/version string. */
embeddingVersion: string
/** The raw embedding vector — only present when `search` is called with
* `populateEmbedding: true` (default `false`). */
embedding?: number[]
/** Any extensionFields persisted via storeChunk must round-trip here. */
[key: string]: any
}
@@ -393,8 +397,8 @@ export interface EmbeddingRecord {
chunkText: string
/** Embedding model/version string. */
embeddingVersion: string
/** The raw embedding vector — never returned by `search`, and only present
* when `findByIds` is called with `populateEmbedding: true`. */
/** The raw embedding vector — only present when `findByIds` is called with
* `populateEmbedding: true`. */
embedding?: number[]
/** Any extensionFields persisted via storeChunk round-trip here. */
[key: string]: any
@@ -409,7 +413,7 @@ export interface EmbeddingRecord {
| `chunkText`, `embeddingVersion` | yes | Same. |
| `extensionFields.*` | optional | Whatever the user passed in `extensionFields` must be queryable via `where`. |

> `EmbeddingRecord` (returned by `findByIds`) is `VectorSearchResult` without `score` and with an optional raw `embedding?: number[]`, present only when `findByIds` is called with `populateEmbedding: true`.
> `EmbeddingRecord` (returned by `findByIds`) is `VectorSearchResult` without `score`. Both carry an optional raw `embedding?: number[]`, present only when the call requested it via `populateEmbedding: true`.
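
Because `embedding` is optional on both types, callers typically need to narrow before using it. A small sketch of one way to do that follows; `hasEmbedding` and the `Hit` type are hypothetical names, not plugin exports. The guard also rejects empty arrays, on the assumption that an adapter may fall back to `[]` when the stored vector cannot be read.

```typescript
// Works for both result types, since both carry an optional `embedding`.
function hasEmbedding<T extends { embedding?: number[] }>(
  r: T,
): r is T & { embedding: number[] } {
  return Array.isArray(r.embedding) && r.embedding.length > 0
}

// Trimmed-down result shape for illustration.
type Hit = { id: string; score: number; embedding?: number[] }

const hits: Hit[] = [
  { id: '1', score: 0.9, embedding: [0.1, 0.2] },
  { id: '2', score: 0.8 }, // fetched without populateEmbedding
  { id: '3', score: 0.7, embedding: [] }, // vector could not be read
]

// Collect only the vectors that are safe to pass on to a follow-up query.
const usableVectors: number[][] = []
for (const hit of hits) {
  if (hasEmbedding(hit)) usableVectors.push(hit.embedding)
}
```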

## Testing your adapter

63 changes: 62 additions & 1 deletion adapters/cf/dev/specs/adapter.spec.ts
@@ -14,7 +14,7 @@ function createMockCloudflareBinding() {

return {
query: vi.fn(async (queryVector: number[], options: any) => {
const { topK = 10, returnMetadata = false, where } = options
const { topK = 10, returnMetadata = false, returnValues = false, where } = options

const results = Array.from(storage.values())
.filter((item) => {
@@ -36,6 +36,7 @@ function createMockCloudflareBinding() {
return {
id: item.id,
score,
values: returnValues ? item.values : undefined,
metadata: returnMetadata ? item.metadata : undefined,
}
})
@@ -439,6 +440,66 @@ describe('createCloudflareVectorizeIntegration', () => {
})
})

describe('search', () => {
test('includes the embedding vector on each result when populateEmbedding is true', async () => {
const mockBinding = createMockCloudflareBinding()
const { adapter } = createCloudflareVectorizeIntegration({
config: { default: { dims: DIMS } },
binding: mockBinding as any,
})
const mockPayload = createMockPayload(mockBinding)
const embedding = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

await adapter.storeChunk(mockPayload, 'default', {
sourceCollection: 'posts',
docId: 'doc-1',
chunkIndex: 0,
chunkText: 'find me',
embeddingVersion: 'v1',
embedding,
extensionFields: { category: 'science' },
})

const results = await adapter.search(mockPayload, embedding, 'default', 10, undefined, true)
expect(results).toHaveLength(1)
expect(results[0].embedding).toEqual(embedding)
expect(results[0].chunkText).toBe('find me')
expect((results[0] as any).category).toBe('science')
expect(mockBinding.query).toHaveBeenCalledWith(
embedding,
expect.objectContaining({ returnValues: true }),
)
})

test('omits the embedding vector by default', async () => {
const mockBinding = createMockCloudflareBinding()
const { adapter } = createCloudflareVectorizeIntegration({
config: { default: { dims: DIMS } },
binding: mockBinding as any,
})
const mockPayload = createMockPayload(mockBinding)
const embedding = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

await adapter.storeChunk(mockPayload, 'default', {
sourceCollection: 'posts',
docId: 'doc-1',
chunkIndex: 0,
chunkText: 'find me',
embeddingVersion: 'v1',
embedding,
extensionFields: {},
})

const results = await adapter.search(mockPayload, embedding, 'default')
expect(results).toHaveLength(1)
expect(results[0].embedding).toBeUndefined()
expect(mockBinding.query).not.toHaveBeenCalledWith(
embedding,
expect.objectContaining({ returnValues: true }),
)
})
})

describe('findByIds', () => {
test('returns full EmbeddingRecord including embedding values when populateEmbedding is true', async () => {
const mockBinding = createMockCloudflareBinding()
3 changes: 3 additions & 0 deletions adapters/cf/src/search.ts
@@ -8,13 +8,15 @@ export default async (
poolName: KnowledgePoolName,
limit: number = 10,
where?: Where,
populateEmbedding = false,
): Promise<Array<VectorSearchResult>> => {
const vectorizeBinding = getVectorizeBinding(payload)

try {
const queryOptions: Record<string, any> = {
topK: limit,
returnMetadata: 'all' as const,
...(populateEmbedding ? { returnValues: true } : {}),
}

let postFilter: Where | null = null
@@ -48,6 +50,7 @@ export default async (
chunkIndex: typeof metadata.chunkIndex === 'number' ? metadata.chunkIndex : parseInt(String(metadata.chunkIndex || '0'), 10),
chunkText: String(metadata.chunkText || ''),
embeddingVersion: String(metadata.embeddingVersion || ''),
...(populateEmbedding ? { embedding: Array.from(match.values ?? []) } : {}),
...extensionFields,
}
})
28 changes: 28 additions & 0 deletions adapters/mongodb/dev/specs/compliance.spec.ts
@@ -136,6 +136,34 @@ describe('Mongo Adapter Compliance Tests', () => {
const results = await adapter.search(payload, target, 'default', 1)
expect(results.length).toBeLessThanOrEqual(1)
})

// Atlas vector search is eventually consistent: a freshly-seeded doc may not be
// queryable immediately, so poll until the index surfaces it before asserting.
const searchUntilNonEmpty = async (populateEmbedding: boolean) => {
for (let attempt = 0; attempt < 30; attempt++) {
const results = await adapter.search(payload, target, 'default', 10, undefined, populateEmbedding)
if (results.length > 0) return results
await new Promise((resolve) => setTimeout(resolve, 500))
}
return adapter.search(payload, target, 'default', 10, undefined, populateEmbedding)
}

test('includes the embedding vector on each result when populateEmbedding is true', async () => {
const results = await searchUntilNonEmpty(true)
expect(results.length).toBeGreaterThan(0)
for (const r of results) {
expect(Array.isArray(r.embedding)).toBe(true)
expect(r.embedding?.length).toBe(DIMS)
}
})

test('omits the embedding vector by default', async () => {
const results = await searchUntilNonEmpty(false)
expect(results.length).toBeGreaterThan(0)
for (const r of results) {
expect(r.embedding).toBeUndefined()
}
})
})

describe('deleteChunks()', () => {
4 changes: 2 additions & 2 deletions adapters/mongodb/src/index.ts
@@ -88,8 +88,8 @@ export const createMongoVectorIntegration = (
return count > 0
},

search: (payload, queryEmbedding, poolName, limit, where) =>
searchImpl(getCtx(), payload, queryEmbedding, poolName, limit, where),
search: (payload, queryEmbedding, poolName, limit, where, populateEmbedding) =>
searchImpl(getCtx(), payload, queryEmbedding, poolName, limit, where, populateEmbedding),

findByIds: (payload, poolName, ids, populateEmbedding) =>
findByIdsImpl(getCtx(), payload, poolName, ids, populateEmbedding),
13 changes: 10 additions & 3 deletions adapters/mongodb/src/search.ts
@@ -26,6 +26,7 @@ export async function searchImpl(
poolName: string,
limit: number = 10,
where?: Where,
populateEmbedding = false,
): Promise<VectorSearchResult[]> {
const pool = ctx.pools[poolName]
if (!pool) {
@@ -64,7 +65,7 @@ export async function searchImpl(
const pipeline: Record<string, unknown>[] = [
{ $vectorSearch: vectorSearchStage },
{ $addFields: { score: { $meta: 'vectorSearchScore' } } },
{ $project: { embedding: 0 } },
...(populateEmbedding ? [] : [{ $project: { embedding: 0 } }]),
]

const collection = client.db(ctx.dbName).collection(pool.collectionName)
@@ -74,10 +75,13 @@ export async function searchImpl(
? rawDocs.filter((d) => evaluatePostFilter(d as Record<string, unknown>, postFilter!))
: rawDocs

return filtered.map((d) => mapDocToResult(d as Record<string, unknown>))
return filtered.map((d) => mapDocToResult(d as Record<string, unknown>, populateEmbedding))
}

function mapDocToResult(doc: Record<string, unknown>): VectorSearchResult {
function mapDocToResult(
doc: Record<string, unknown>,
populateEmbedding: boolean,
): VectorSearchResult {
if (typeof doc.score !== 'number') {
throw new Error(
`[@payloadcms-vectorize/mongodb] Search result is missing numeric "score" field; ensure the pipeline adds { score: { $meta: 'vectorSearchScore' } }`,
@@ -95,6 +99,9 @@ function mapDocToResult(doc: Record<string, unknown>): VectorSearchResult {
typeof doc.chunkIndex === 'number' ? doc.chunkIndex : Number(doc.chunkIndex ?? 0),
chunkText: String(doc.chunkText ?? ''),
embeddingVersion: String(doc.embeddingVersion ?? ''),
...(populateEmbedding
? { embedding: Array.isArray(doc.embedding) ? (doc.embedding as number[]) : [] }
: {}),
...extensionFields,
} as VectorSearchResult
}
19 changes: 19 additions & 0 deletions adapters/pg/dev/specs/compliance.spec.ts
@@ -221,6 +221,25 @@ describe('Postgres Adapter Compliance Tests', () => {

expect(results.length).toBeLessThanOrEqual(1)
})

test('includes the embedding vector on each result when populateEmbedding is true', async () => {
const results = await adapter.search(payload, targetEmbedding, 'default', 10, undefined, true)

expect(results.length).toBeGreaterThan(0)
for (const result of results) {
expect(Array.isArray(result.embedding)).toBe(true)
expect(result.embedding?.length).toBe(DIMS)
}
})

test('omits the embedding vector by default', async () => {
const results = await adapter.search(payload, targetEmbedding, 'default', 10)

expect(results.length).toBeGreaterThan(0)
for (const result of results) {
expect(result.embedding).toBeUndefined()
}
})
})

describe('deleteChunks()', () => {
14 changes: 1 addition & 13 deletions adapters/pg/src/findByIds.ts
@@ -3,6 +3,7 @@ import { BasePayload, SanitizedCollectionConfig } from 'payload'
import { KnowledgePoolName, EmbeddingRecord } from 'payloadcms-vectorize'
import toSnakeCase from 'to-snake-case'
import { getEmbeddingsTable } from './drizzle.js'
import { parseEmbedding } from './parseEmbedding.js'

export default async (
payload: BasePayload,
@@ -119,16 +120,3 @@ function mapRowsToRecords(
return record
})
}

function parseEmbedding(value: unknown): number[] {
if (Array.isArray(value)) return value as number[]
if (typeof value === 'string') {
return value
.replace(/^\[/, '')
.replace(/\]$/, '')
.split(',')
.filter((s) => s.length > 0)
.map((s) => Number(s))
}
return []
}
12 changes: 12 additions & 0 deletions adapters/pg/src/parseEmbedding.ts
@@ -0,0 +1,12 @@
export function parseEmbedding(value: unknown): number[] {
if (Array.isArray(value)) return value as number[]
if (typeof value === 'string') {
return value
.replace(/^\[/, '')
.replace(/\]$/, '')
.split(',')
.filter((s) => s.length > 0)
.map((s) => Number(s))
}
return []
}
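
For reference, a short sketch of how the extracted helper behaves on a few inputs. It assumes the database driver may hand the vector column back either as a JavaScript array or as a bracketed text literal such as `[0.1,0.2,0.3]`. The function body is copied from the diff above so the example runs on its own.

```typescript
// Copy of the helper from the diff so the example is self-contained.
function parseEmbedding(value: unknown): number[] {
  if (Array.isArray(value)) return value as number[]
  if (typeof value === 'string') {
    return value
      .replace(/^\[/, '')
      .replace(/\]$/, '')
      .split(',')
      .filter((s) => s.length > 0)
      .map((s) => Number(s))
  }
  return []
}

const fromText = parseEmbedding('[0.1,0.2,0.3]') // [0.1, 0.2, 0.3]
const fromArray = parseEmbedding([1, 2]) // arrays are returned as-is
const fromEmpty = parseEmbedding('[]') // []
const fromNull = parseEmbedding(null) // [] for anything that is not a string or array
const withJunk = parseEmbedding('[1,abc]') // [1, NaN]: non-numeric entries are not rejected
```

The last case is worth keeping in mind: the helper does not validate entries, so a caller that needs strict input would have to check for `NaN` itself.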