perf: speed up vector chat creation pipeline #199
Open
icancodefyi wants to merge 2 commits into
Optimize ingestion pipeline for vector (and vectorless) chat creation:
- Add configurable scrape concurrency (CRAWL_SCRAPE_CONCURRENCY, default 10)
- Add configurable embedding batch size (EMBEDDING_BATCH_SIZE, default 500)
- Add configurable Qdrant batch size (QDRANT_BATCH_SIZE, default 500)
- Parallelize embedding API calls within a page using Promise.all
- Batch Qdrant upserts across pages instead of one per page
- Use prisma.documentPage.createMany() instead of per-page creates
- Split progress into SCRAPING (0-50%) and INDEXING (50-100%) stages
- Bump CRAWL_MAX_CONCURRENCY_PER_DOMAIN default from 2 to 3
- Bump CRAWL_VECTORLESS_BATCH_SIZE default from 5 to 10
- Document new env vars in .env.example
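A minimal sketch of how the tuning knobs listed above could be read from the environment. The helper names `readPositiveInt` and `getIngestionConfig` are hypothetical and are not taken from the PR; only the variable names and defaults come from the commit message.

```javascript
// Hypothetical helper: parse a positive integer from an env-like object,
// falling back to the default when the value is missing, non-numeric, or <= 0.
function readPositiveInt(env, name, fallback) {
  const parsed = Number.parseInt(env[name] ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Hypothetical config reader. The defaults mirror the ones in the commit message.
function getIngestionConfig(env = process.env) {
  return {
    scrapeConcurrency: readPositiveInt(env, "CRAWL_SCRAPE_CONCURRENCY", 10),
    embeddingBatchSize: readPositiveInt(env, "EMBEDDING_BATCH_SIZE", 500),
    qdrantBatchSize: readPositiveInt(env, "QDRANT_BATCH_SIZE", 500),
    maxConcurrencyPerDomain: readPositiveInt(env, "CRAWL_MAX_CONCURRENCY_PER_DOMAIN", 3),
    vectorlessBatchSize: readPositiveInt(env, "CRAWL_VECTORLESS_BATCH_SIZE", 10),
  };
}
```

Falling back on invalid values keeps a typo in `.env` from turning a batch size into `NaN`, which would otherwise make a loop such as `i += embeddingBatchSize` never advance.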
Contributor
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR tunes and refactors the ingestion pipeline used for chat knowledge creation by increasing crawl/worker concurrency and restructuring vector ingestion into separate scraping and indexing phases.
Changes:
- Increased default crawl concurrency and worker batch sizes via new/updated env-driven config.
- Refactored processVector into a 2-phase pipeline (scrape/split → embed/index with batching).
- Updated progress statuses to more granular phases (SCRAPING, INDEXING) during ingestion.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| backend/utils/ragUtilities.js | Increases default per-domain crawl concurrency. |
| backend/chatWorker.js | Adds new ingestion tuning knobs; refactors vector ingestion into phased batching; updates progress statuses. |
| backend/.env.example | Documents new ingestion tuning environment variables. |
Comment on lines +160 to +169

    async function flushBatch() {
      if (pendingPoints.length === 0) return;
      const points = pendingPoints.splice(0);
      const dbRecords = pendingDbRecords.splice(0);

      await qdrant.upsert(collectionName, { wait: true, points });
      await prisma.documentPage.createMany({ data: dbRecords }).catch((err) => {
        console.error("Failed to update indexed pages:", err.message);
      });
    }
Comment on lines +200 to +202

    if (pendingPoints.length >= qdrantBatchSize) {
      await flushBatch();
    }
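In the excerpts above, the flush is triggered once the buffer reaches `qdrantBatchSize`, but `splice(0)` then takes everything buffered, so a single upsert can be larger than the configured size when one page contributes many chunks. One possible way to keep each call bounded is to drain the buffer in slices. This is only a sketch: `drainInBatches` and its `handler` callback are hypothetical names, with `handler` standing in for the Qdrant upsert.

```javascript
// Sketch: drain a buffer in slices of at most batchSize items.
// `handler` is a hypothetical stand-in for the real upsert call.
// Note: if handler throws, the slice already removed from the buffer is not restored.
async function drainInBatches(buffer, batchSize, handler) {
  let flushed = 0;
  while (buffer.length > 0) {
    const batch = buffer.splice(0, batchSize); // removes up to batchSize items
    await handler(batch);
    flushed += batch.length;
  }
  return flushed;
}
```

With this shape, the caller could replace the single upsert with something like `drainInBatches(pendingPoints, qdrantBatchSize, (points) => qdrant.upsert(collectionName, { wait: true, points }))`, assuming the same variables as in the excerpt.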
Comment on lines +165 to +168

    await qdrant.upsert(collectionName, { wait: true, points });
    await prisma.documentPage.createMany({ data: dbRecords }).catch((err) => {
      console.error("Failed to update indexed pages:", err.message);
    });
Comment on lines +174 to +184

    if (chunks.length > 0) {
      const batchPromises = [];
      for (let i = 0; i < chunks.length; i += embeddingBatchSize) {
        const chunkBatch = chunks.slice(i, i + embeddingBatchSize);
        batchPromises.push(generateVectorEmbeddings(chunkBatch));
      }
      const batchResults = await Promise.all(batchPromises);
      const allEmbeddings = batchResults.flatMap((r) =>
        Array.isArray(r) ? r : [r],
      );
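The excerpt above starts every embedding batch for a page at once. For pages with many chunks, a bounded variant may be gentler on provider rate limits. The following is a sketch under assumptions: `toBatches` and `embedInBatches` are hypothetical names, and `embedBatch` stands in for `generateVectorEmbeddings`, which is assumed here to return one embedding per input chunk.

```javascript
// Split items into consecutive batches of at most batchSize.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Run embedBatch over every batch with at most `limit` calls in flight.
// Results are stored by batch index, so the output keeps the original chunk order.
async function embedInBatches(chunks, batchSize, limit, embedBatch) {
  const batches = toBatches(chunks, batchSize);
  const results = new Array(batches.length);
  let next = 0;

  async function worker() {
    while (next < batches.length) {
      const index = next++; // claim the next batch (safe: JS runs this synchronously)
      results[index] = await embedBatch(batches[index]);
    }
  }

  const workerCount = Math.min(limit, batches.length);
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results.flat(); // one level only, so each embedding stays an array
}
```

Setting `limit` to the number of batches reproduces the unbounded `Promise.all` behaviour, so the limit could itself be exposed as another tuning knob.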
      let scrapedCount = 0;

    - await Promise.all(allLinks.map((link) => limiter.schedule(async () => {
    + const scrapedPages = await Promise.all(allLinks.map((link) => limiter.schedule(async () => {
Comment on lines 139 to +144

    })));

    const validPages = scrapedPages.filter(Boolean);
    if (validPages.length === 0) {
      throw new Error("No pages were successfully scraped.");
    }
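The `filter(Boolean)` step above only works if each scrape task resolves to a falsy value on failure rather than rejecting, since a single rejection makes `Promise.all` reject as a whole. As an illustration of an alternative that tolerates rejections directly, here is a sketch using `Promise.allSettled`. The names `scrapeAll` and `scrapePage` are hypothetical, and the concurrency limiter from the PR is left out for brevity.

```javascript
// Sketch: scrape every link, keep only successful non-empty results,
// and fail the job only when nothing at all could be scraped.
// `scrapePage` is a hypothetical per-link scraper supplied by the caller.
async function scrapeAll(links, scrapePage) {
  const settled = await Promise.allSettled(
    // The async wrapper turns synchronous throws into rejections as well.
    links.map(async (link) => scrapePage(link)),
  );
  const pages = settled
    .filter((result) => result.status === "fulfilled" && result.value)
    .map((result) => result.value);
  if (pages.length === 0) {
    throw new Error("No pages were successfully scraped.");
  }
  return pages;
}
```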
      await updateChatProgress(chatId, {
    -   status: "PROCESSING",
    +   status: "SCRAPING",
Comment on lines +150 to +155

    await updateChatProgress(chatId, {
      status: "INDEXING",
      current: 0,
      total: totalIndexPages,
      progress: 50,
    });
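The two-stage split (SCRAPING covering 0-50% and INDEXING covering 50-100%) can be expressed as a small mapping from per-stage counters to an overall percentage. This is a sketch only: `STAGE_RANGES` and `overallProgress` are hypothetical names, and the PR may compute the value differently.

```javascript
// Percentage range owned by each stage, per the PR description.
const STAGE_RANGES = {
  SCRAPING: [0, 50],
  INDEXING: [50, 100],
};

// Map (stage, current, total) to an overall percentage.
// The per-stage fraction is clamped to [0, 1]; an empty stage reports its start value.
function overallProgress(stage, current, total) {
  const range = STAGE_RANGES[stage];
  if (!range) throw new Error(`Unknown stage: ${stage}`);
  const [start, end] = range;
  const fraction = total > 0 ? Math.min(Math.max(current / total, 0), 1) : 0;
  return Math.round(start + fraction * (end - start));
}
```

For example, `overallProgress("INDEXING", 10, 20)` evaluates to 75, and the first INDEXING update with `current: 0` evaluates to 50, matching the excerpt above.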
      try {
        const { maxPagesPerJob, vectorlessBatchSize } = getWorkerConfig();
    -   await updateChatProgress(chatId, { status: "PROCESSING", progress: 0 });
    +   await updateChatProgress(chatId, { status: "SCRAPING", progress: 0 });

      await updateChatProgress(chatId, {
    -   status: "PROCESSING",
    +   status: "SCRAPING",
avishek0769 approved these changes Jun 12, 2026
Owner
@icancodefyi Resolve the merge conflicts.
Closes #11
Summary
Optimizes the vector ingestion pipeline to reduce time from chat creation to READY.
Changes
- Promise.all fans out embedding API calls instead of a sequential for-loop
- prisma.documentPage.createMany() instead of per-page .create()
- New env vars (CRAWL_SCRAPE_CONCURRENCY, EMBEDDING_BATCH_SIZE, QDRANT_BATCH_SIZE) with sensible defaults
- Progress split into SCRAPING (0–50%) and INDEXING (50–100%) for accurate user-facing feedback
- CRAWL_VECTORLESS_BATCH_SIZE default bumped 5→10
- CRAWL_MAX_CONCURRENCY_PER_DOMAIN default bumped 2→3