Skip to content

Change the default batch size to 16#270

Merged
anasdorbani merged 16 commits into
mainfrom
dev
Jun 4, 2026
Merged

Change the default batch size to 16#270
anasdorbani merged 16 commits into
mainfrom
dev

Conversation

@anasdorbani

Copy link
Copy Markdown
Member

No description provided.

Copilot AI review requested due to automatic review settings June 4, 2026 00:32

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Flock’s model batching behavior by introducing a shared DEFAULT_BATCH_SIZE (set to 16) and applying it as the runtime fallback when no batch size is provided via user config or DB model args. It also hardens metrics collection for concurrent updates and expands/adjusts unit tests to validate batch splitting behavior and thread-safety.

Changes:

  • Introduce DEFAULT_BATCH_SIZE = 16 and use it as the default fallback for model initialization (replacing the previous hard-coded default).
  • Update scalar/aggregate unit tests to validate that large inputs are split across multiple provider requests using the default batch size.
  • Add mutex-based synchronization to metrics manager/storage to support concurrent updates and safe aggregation/merging.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/unit/model_manager/model_manager_test.cpp Adds default-batch-size assertions for model initialization.
test/unit/functions/scalar/metrics_test.cpp Adds a multi-threaded test to validate thread-safe metrics updates/merging.
test/unit/functions/scalar/llm_filter.cpp Adds coverage that default batch size splits large scalar inputs into multiple completion batches.
test/unit/functions/scalar/llm_embedding.cpp Adds coverage that default batch size splits large embedding inputs into multiple embedding requests.
test/unit/functions/scalar/llm_complete.cpp Updates large-input test to expect multiple completion batches (default batch size).
test/unit/functions/aggregate/llm_rerank.cpp Adds coverage that default batch size affects sliding-window rerank batching; switches to single-threaded connections.
test/unit/functions/aggregate/llm_reduce.cpp Adds coverage that default batch size triggers multiple reduction passes; switches to single-threaded connections.
test/unit/functions/aggregate/llm_reduce_json.cpp Switches aggregate tests to single-threaded connections for deterministic gMock expectations.
test/unit/functions/aggregate/llm_last.cpp Switches aggregate tests to single-threaded connections for deterministic gMock expectations.
test/unit/functions/aggregate/llm_first.cpp Adds default-batch-size splitting test; switches to single-threaded connections.
test/unit/functions/aggregate/llm_aggregate_function_test_base.hpp Introduces GetConnection() that forces SET threads=1 for aggregate unit tests.
src/model_manager/model.cpp Uses DEFAULT_BATCH_SIZE as the fallback batch size when unset in JSON and DB model_args.
src/metrics/metrics.cpp Adds locking and “unlocked” access paths for safe aggregate metrics merging.
src/include/flock/model_manager/repository.hpp Defines DEFAULT_BATCH_SIZE = 16 as a shared constant.
src/include/flock/metrics/manager.hpp Adds locking around DB→manager registry creation/access.
src/include/flock/metrics/base_manager.hpp Adds internal mutex and locked/unlocked APIs to make metrics updates thread-safe.
src/functions/aggregate/llm_rerank/implementation.cpp Simplifies batch-size selection by always clamping to num_tuples.
src/functions/aggregate/llm_first_or_last/implementation.cpp Fixes batch tuple rebuilding loop and filters out flock_row_id from returned results.
Comments suppressed due to low confidence (1)

src/include/flock/metrics/manager.hpp:35

  • MetricsManager::GetForDatabase now takes a global registry_mutex on every call. Since hot-path methods like UpdateTokens, IncrementApiCalls, etc. call GetForDatabase each time, this introduces a process-wide lock that can become a contention bottleneck under concurrent query execution. Prefer a reader/writer lock (shared reads, exclusive on first insert), or cache the MetricsManager* in thread-local context during StartInvocation to avoid repeated global lookups.
    static MetricsManager& GetForDatabase(duckdb::DatabaseInstance* db) {
        if (db == nullptr) {
            throw std::runtime_error("Database instance is null");
        }

        static std::mutex registry_mutex;
        static std::unordered_map<duckdb::DatabaseInstance*, std::unique_ptr<MetricsManager>> db_managers;

        std::lock_guard<std::mutex> lock(registry_mutex);
        auto it = db_managers.find(db);
        if (it == db_managers.end()) {
            auto manager = std::make_unique<MetricsManager>();
            auto* manager_ptr = manager.get();
            db_managers[db] = std::move(manager);
            return *manager_ptr;
        }
        return *it->second;
    }

Comment on lines 55 to +67
TEST_F(ModelManagerTest, ModelInitializationMinimal) {
// Create a model config with only model_name (other details should be fetched from DB)
json model_config = {
{"model_name", "gpt-4o-test"}};
// Model initialization should fetch remaining details from database
EXPECT_NO_THROW({
Model model(model_config);
ModelDetails details = model.GetModelDetails();
EXPECT_EQ(details.model_name, "gpt-4o-test");
EXPECT_EQ(details.model, "gpt-4o");
EXPECT_EQ(details.provider_name, "openai");
EXPECT_EQ(details.batch_size, 32);
});
Comment thread test/unit/functions/aggregate/llm_rerank.cpp
anasdorbani and others added 2 commits June 3, 2026 20:42
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@anasdorbani anasdorbani merged commit e0f3321 into main Jun 4, 2026
20 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants