Change the default batch size to 16#270
Merged
Merged
Conversation
… icon to documentation. (#253)
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates Flock’s model batching behavior by introducing a shared DEFAULT_BATCH_SIZE (set to 16) and applying it as the runtime fallback when no batch size is provided via user config or DB model args. It also hardens metrics collection for concurrent updates and expands/adjusts unit tests to validate batch splitting behavior and thread-safety.
Changes:
- Introduce
DEFAULT_BATCH_SIZE = 16and use it as the default fallback for model initialization (replacing the previous hard-coded default). - Update scalar/aggregate unit tests to validate that large inputs are split across multiple provider requests using the default batch size.
- Add mutex-based synchronization to metrics manager/storage to support concurrent updates and safe aggregation/merging.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| test/unit/model_manager/model_manager_test.cpp | Adds default-batch-size assertions for model initialization. |
| test/unit/functions/scalar/metrics_test.cpp | Adds a multi-threaded test to validate thread-safe metrics updates/merging. |
| test/unit/functions/scalar/llm_filter.cpp | Adds coverage that default batch size splits large scalar inputs into multiple completion batches. |
| test/unit/functions/scalar/llm_embedding.cpp | Adds coverage that default batch size splits large embedding inputs into multiple embedding requests. |
| test/unit/functions/scalar/llm_complete.cpp | Updates large-input test to expect multiple completion batches (default batch size). |
| test/unit/functions/aggregate/llm_rerank.cpp | Adds coverage that default batch size affects sliding-window rerank batching; switches to single-threaded connections. |
| test/unit/functions/aggregate/llm_reduce.cpp | Adds coverage that default batch size triggers multiple reduction passes; switches to single-threaded connections. |
| test/unit/functions/aggregate/llm_reduce_json.cpp | Switches aggregate tests to single-threaded connections for deterministic gMock expectations. |
| test/unit/functions/aggregate/llm_last.cpp | Switches aggregate tests to single-threaded connections for deterministic gMock expectations. |
| test/unit/functions/aggregate/llm_first.cpp | Adds default-batch-size splitting test; switches to single-threaded connections. |
| test/unit/functions/aggregate/llm_aggregate_function_test_base.hpp | Introduces GetConnection() that forces SET threads=1 for aggregate unit tests. |
| src/model_manager/model.cpp | Uses DEFAULT_BATCH_SIZE as the fallback batch size when unset in JSON and DB model_args. |
| src/metrics/metrics.cpp | Adds locking and “unlocked” access paths for safe aggregate metrics merging. |
| src/include/flock/model_manager/repository.hpp | Defines DEFAULT_BATCH_SIZE = 16 as a shared constant. |
| src/include/flock/metrics/manager.hpp | Adds locking around DB→manager registry creation/access. |
| src/include/flock/metrics/base_manager.hpp | Adds internal mutex and locked/unlocked APIs to make metrics updates thread-safe. |
| src/functions/aggregate/llm_rerank/implementation.cpp | Simplifies batch-size selection by always clamping to num_tuples. |
| src/functions/aggregate/llm_first_or_last/implementation.cpp | Fixes batch tuple rebuilding loop and filters out flock_row_id from returned results. |
Comments suppressed due to low confidence (1)
src/include/flock/metrics/manager.hpp:35
MetricsManager::GetForDatabasenow takes a globalregistry_mutexon every call. Since hot-path methods likeUpdateTokens,IncrementApiCalls, etc. callGetForDatabaseeach time, this introduces a process-wide lock that can become a contention bottleneck under concurrent query execution. Prefer a reader/writer lock (shared reads, exclusive on first insert), or cache theMetricsManager*in thread-local context duringStartInvocationto avoid repeated global lookups.
static MetricsManager& GetForDatabase(duckdb::DatabaseInstance* db) {
if (db == nullptr) {
throw std::runtime_error("Database instance is null");
}
static std::mutex registry_mutex;
static std::unordered_map<duckdb::DatabaseInstance*, std::unique_ptr<MetricsManager>> db_managers;
std::lock_guard<std::mutex> lock(registry_mutex);
auto it = db_managers.find(db);
if (it == db_managers.end()) {
auto manager = std::make_unique<MetricsManager>();
auto* manager_ptr = manager.get();
db_managers[db] = std::move(manager);
return *manager_ptr;
}
return *it->second;
}
Comment on lines
55
to
+67
| TEST_F(ModelManagerTest, ModelInitializationMinimal) { | ||
| // Create a model config with only model_name (other details should be fetched from DB) | ||
| json model_config = { | ||
| {"model_name", "gpt-4o-test"}}; | ||
| // Model initialization should fetch remaining details from database | ||
| EXPECT_NO_THROW({ | ||
| Model model(model_config); | ||
| ModelDetails details = model.GetModelDetails(); | ||
| EXPECT_EQ(details.model_name, "gpt-4o-test"); | ||
| EXPECT_EQ(details.model, "gpt-4o"); | ||
| EXPECT_EQ(details.provider_name, "openai"); | ||
| EXPECT_EQ(details.batch_size, 32); | ||
| }); |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
queryproc
approved these changes
Jun 4, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.