Scaffold for running MongoDB search benchmark jobs. Populate and benchmark logic are stubs until you implement them.
- src/mongo_bench/: Python package (cli, config, db, jobs/…); the CLI includes list-seller-keys to print volume-tier sellers for STATIC_BENCH_SELLER_KEYS.
- src/mongo_bench/jobs/populate/: one module per long-running populate job (merge pipelines, $merge/$out, etc.); register new jobs in jobs/__init__.py (POPULATE_JOBS).
- src/mongo_bench/resources/bench_collections.json: canonical benchmark collection and index definitions (cite in test docs).
- src/mongo_bench/bench_schema.py: loads that JSON and applies indexes / Atlas Search indexes.
- docs/benchmark-run-summary-*.md: optional run notes; docs/bench-pipelines/ holds per-job aggregation pipeline JSON (regenerate with PYTHONPATH=src python3 docs/bench-pipelines/generate_pipeline_docs.py).
- docker/Dockerfile: application image (entrypoint mongo-bench).
- docker-compose.yml: MongoDB plus example job services.
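Registering a populate job can be pictured as adding an entry to a name-to-callable mapping. The sketch below is only an illustration under that assumption: the real POPULATE_JOBS in jobs/__init__.py may hold different values, and the `register` decorator and `run_populate` helper are hypothetical names, not part of the package.

```python
from typing import Callable, Dict

# Hypothetical registry shape; the real POPULATE_JOBS may store other values.
POPULATE_JOBS: Dict[str, Callable[[], str]] = {}


def register(name: str) -> Callable[[Callable[[], str]], Callable[[], str]]:
    """Return a decorator that stores a job function under `name`."""
    def wrap(func: Callable[[], str]) -> Callable[[], str]:
        POPULATE_JOBS[name] = func
        return func
    return wrap


@register("merge_all")
def merge_all() -> str:
    # Placeholder body; a real job would run its aggregation pipeline here.
    return "merge_all done"


def run_populate(name: str) -> str:
    """Look up a job by name and run it, failing loudly on unknown names."""
    if name not in POPULATE_JOBS:
        raise ValueError(f"Unknown populate job: {name!r}")
    return POPULATE_JOBS[name]()
```

With a lookup like this, a job that exists in the source tree but not in the registry the running process imported would be reported as unknown.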
Create a virtual environment, install the package in editable mode, then invoke the module or the console script.
Put connection settings in a .env file in the current working directory (the project root, next to pyproject.toml) or next to the project root when using an editable install. The app loads it on first settings read via python-dotenv (existing shell variables are not overwritten). Optional: set MONGO_BENCH_DOTENV to an absolute path to a specific env file.
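The non-overwriting behaviour described above can be sketched with the standard library alone. This is not the package's loader (which uses python-dotenv); `load_env_file` and `find_env_file` are hypothetical helpers that only mimic the two rules stated here: values already in the environment win, and MONGO_BENCH_DOTENV, when set, points at the file to read.

```python
import os
from pathlib import Path
from typing import Optional


def load_env_file(path: Path) -> None:
    """Read KEY=VALUE lines; keep any value already set in the environment."""
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault leaves existing shell variables untouched.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def find_env_file(cwd: Path) -> Optional[Path]:
    """Prefer MONGO_BENCH_DOTENV when set, else fall back to <cwd>/.env."""
    explicit = os.environ.get("MONGO_BENCH_DOTENV")
    candidate = Path(explicit) if explicit else cwd / ".env"
    return candidate if candidate.is_file() else None
```

A MONGODB_URI exported in the shell would therefore survive loading a .env file that defines the same key.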
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
python -m mongo_bench list-jobs
python -m mongo_bench init
python -m mongo_bench populate merge_all
python -m mongo_bench bench search_basic
After populate merge_all, regenerate static seller keys for benchmark_fixtures.py:
python -m mongo_bench list-seller-keys
Uses the same seller_keys_by_volume logic as the search benchmarks (default collection bench_index, tier size from BENCH_SELLER_KEYS_PER_TIER). See python -m mongo_bench list-seller-keys -h for --collection, --per-tier, and --skip-schema.
With the console script:
mongo-bench list-jobs
mongo-bench list-seller-keys
Compose bakes your app into an image; if you change Python code or POPULATE_JOBS, rebuild before run or you will still see old behavior (for example Unknown populate job: 'merge_all'):
docker compose build job-populate-default
Job images read os.environ (mongo_bench.config.load_settings). Compose passes variables from the environment: block, which uses substitution: MONGODB_URI: ${MONGODB_URI:-mongodb://mongo:27017} means “use host/shell MONGODB_URI if set, else default to the mongo service”.
Docker Compose automatically loads a file named .env in the same directory as docker-compose.yml (the project root) when resolving ${…}. Put your sandbox URI and DB name there (do not commit real secrets). Exporting variables in your shell overrides the same keys from .env.
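To make the substitution rule concrete, here is a small Python model of how `${NAME:-default}` resolves. It is a simplified sketch, not Compose's implementation: it handles only the `${NAME}` and `${NAME:-default}` forms, and `interpolate` is a hypothetical helper name.

```python
import re
from typing import Mapping

_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def interpolate(template: str, env: Mapping[str, str]) -> str:
    """Expand ${NAME} and ${NAME:-default} against the given mapping."""
    def repl(match: "re.Match[str]") -> str:
        value = env.get(match.group("name"), "")
        # With ':-', an unset or empty variable falls back to the default.
        if value:
            return value
        return match.group("default") or ""
    return _PATTERN.sub(repl, template)


# Shell exports take precedence over .env values for the same key.
dotenv_values = {"MONGODB_URI": "mongodb://from-dotenv:27017"}
shell_values = {"MONGODB_URI": "mongodb://from-shell:27017"}
effective = {**dotenv_values, **shell_values}
resolved = interpolate("${MONGODB_URI:-mongodb://mongo:27017}", effective)
```

With neither source defining MONGODB_URI, the same template resolves to the bundled mongo service URI.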
- External MongoDB / Atlas sandbox: use your .env (or exports), then run without starting the bundled DB:
  docker compose run --no-deps --rm job-populate-default
  --no-deps skips the local mongo container and depends_on wait; the app still uses MONGODB_URI / MONGODB_DB from substitution.
- Local Mongo from Compose: leave MONGODB_URI unset in .env (or unset in shell) so the default mongodb://mongo:27017 applies, start Mongo, then run:
  docker compose up -d mongo
  docker compose run --rm job-populate-default
populate merge_all needs seller_inventory_raw and product_catalog_raw in the database given by MONGODB_DB.
Benchmark job (same env rules):
docker compose run --no-deps --rm job-bench-search-basic
To build and list jobs from the app image without touching data:
docker compose run --rm job-populate-default list-jobs
(That overrides the service default command; the populate service is normally used with its declared command.)
To use a custom command with the same image:
docker compose run --rm job-populate-default bench search_aggregation
The authoritative definition of benchmark collection names, MongoDB indexes, and Atlas Search mappings is:
src/mongo_bench/resources/bench_collections.json
Reference that file (or commit hash + path) in test plans and write-ups so others can reproduce the same physical schema.
- bench_wildcard: wildcard MongoDB index on selected paths; Atlas Search indexes only explicit text-oriented fields.
- bench_index: compound MongoDB index; Atlas Search on explicit text fields only.
- bench_attributes: attribute-pattern documents with a compound index on sellerKey + attributes + inventory.quantity; Atlas Search aligned with bench_index for text on product.
- bench_search: no collection-level indexes; Atlas Search index bench_search_dynamic_all uses explicit field mappings (dynamic: false) for the merge_all document shape (see bench_collections.json).
The search_index, search_wildcard, search_attributes, and search_atlas benchmarks print timings to stdout and, by default, each writes a new CSV per run:
{job}_benchmark_{UTC timestamp}.csv in the working directory (e.g. search_index_benchmark_…csv, search_wildcard_benchmark_…csv, search_atlas_benchmark_…csv).
Set BENCH_CSV_DIR to put those files in a folder, BENCH_CSV_PATH for a fixed path, BENCH_CSV_UNIQUE=0 for a stable {job}_benchmark.csv, or BENCH_CSV=0 to skip CSV.
Use BENCH_WILDCARD_COLLECTION (default bench_wildcard) for the wildcard job; BENCH_INDEX_COLLECTION for search_index; BENCH_ATTRIBUTES_COLLECTION for search_attributes; BENCH_SEARCH_COLLECTION and BENCH_SEARCH_ATLAS_INDEX for search_atlas.
Name search strings come only from the text field in benchmark_fixtures.TEST_CASES (merged with each FILTER_TESTS row per run), not from environment variables.
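One way to read the CSV variables together is as a small decision procedure. The sketch below is an assumption-laden illustration, not the package's code: `resolve_csv_path` is a hypothetical helper, the precedence of BENCH_CSV_PATH over BENCH_CSV_DIR is assumed, and the timestamp format is a guess, since the text only says "UTC timestamp".

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Mapping, Optional


def resolve_csv_path(
    job: str, env: Mapping[str, str], now: Optional[datetime] = None
) -> Optional[Path]:
    """Return where a benchmark would write its CSV, or None when disabled."""
    if env.get("BENCH_CSV", "1") == "0":
        return None  # CSV output switched off
    fixed = env.get("BENCH_CSV_PATH")
    if fixed:
        return Path(fixed)  # assumed to win over BENCH_CSV_DIR
    if env.get("BENCH_CSV_UNIQUE", "1") == "0":
        name = f"{job}_benchmark.csv"  # stable name, reused across runs
    else:
        # Timestamp format is an assumption made for this sketch.
        stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
        name = f"{job}_benchmark_{stamp}.csv"
    return Path(env.get("BENCH_CSV_DIR") or ".") / name
```

Passing `os.environ` as `env` would evaluate the rules against the current process environment.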
search_all runs those four benchmarks in sequence (same env vars as when run alone); each job still writes its own CSV rows under its bench_job name.
Example analysis from benchmark CSVs: docs/benchmark-run-summary-2026-06-01.md (includes bench_attributes); earlier matrix: docs/benchmark-run-summary-2026-05-29.md.
Loader and apply logic: src/mongo_bench/bench_schema.py (ensure_benchmark_collections, bench_collection_names).
Commands:
- mongo-bench init: apply the schema once (collections/indexes/search as configured).
- python -m mongo_bench.bench_schema: same as init (useful to run the module directly). Optional: --skip-search-indexes. In Docker the image entrypoint is mongo-bench, so use
  docker compose run --rm --entrypoint python job-populate-default -m mongo_bench.bench_schema
  (add --no-deps when using an external MONGODB_URI from .env).
- mongo-bench populate … and mongo-bench bench …: ensure the same schema by default before running the job. Use --skip-schema to skip if you know the cluster is already prepared.
Atlas Search index creation issues the createSearchIndexes database command through PyMongo and requires MongoDB 7.0+ on Atlas with Search enabled on the cluster. The bundled docker-compose.yml defaults MONGO_BENCH_SKIP_SEARCH_INDEXES to 1 when the variable is unset; set MONGO_BENCH_SKIP_SEARCH_INDEXES=0 in .env at the project root (or your shell) for Atlas runs. If indexes still do not appear, check stderr for lines starting with mongo-bench: a code 59 / “no such command” message means the host is not Atlas Search–capable (e.g. self-managed or wrong URI tier).
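The code 59 check can be expressed as a small predicate. This is a hedged sketch rather than the package's error handling: `looks_like_non_atlas_host` is a hypothetical helper, and it relies only on the fact that server error code 59 is CommandNotFound and that PyMongo's OperationFailure exposes the code as a `.code` attribute.

```python
COMMAND_NOT_FOUND = 59  # MongoDB server error code "CommandNotFound"


def looks_like_non_atlas_host(exc: Exception) -> bool:
    """True when an error suggests the host lacks createSearchIndexes."""
    # pymongo.errors.OperationFailure carries the server code in .code;
    # other exception types simply have no such attribute.
    code = getattr(exc, "code", None)
    return code == COMMAND_NOT_FOUND or "no such command" in str(exc).lower()
```

A caller could wrap the index-creation step in try/except, test the caught exception with this predicate, and then suggest MONGO_BENCH_SKIP_SEARCH_INDEXES=1 for non-Atlas deployments.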
See .env.example. MONGODB_URI and MONGODB_DB are read after optional .env loading in mongo_bench.config (see Local usage). MONGO_BENCH_SKIP_SEARCH_INDEXES skips Atlas Search index steps when using a non-Atlas deployment.
For Docker job runs, place the same variables in .env next to docker-compose.yml so Compose substitutes them into the container environment (see Docker usage above). If your .env lives elsewhere, run from the project directory with docker compose --env-file /path/to/.env run … so interpolation picks up those values.