Automator — Topologically-Sharded Asynchronous Multi-Agent Concurrency Substrate with Non-Euclidean Quality Optimization
A distributed, mathematically rigorous code-mutation evaluation pipeline that combines hyperbolic Riemannian geometry, asynchronous belief propagation over random spanning-tree ensembles, and multi-level hypergraph partitioning into a single, production-grade concurrency substrate for autonomous software engineering agents.
- High-Level Architecture
- Mathematical Foundation
- Core Substrate Features
- Prerequisites & Getting Started
- Repository Architecture
- Internal Code Architecture Rules
- License
The substrate is structured as a six-layer concurrent pipeline. Each layer is independently evolvable and communicates exclusively through well-typed interfaces — there are no implicit shared mutable globals between layers.
| Layer | Module | Responsibility |
|---|---|---|
| Concurrency | src/concurrency/ | Hierarchical intent-mode lock tree; deadlock-free multi-agent write serialization |
| Prediction | src/predictive/ | Markovian prefetch daemon; Git-history-derived co-occurrence transition matrix |
| Semantic Graph Tracking | src/semantic/ | Symbol table construction; ripple-effect propagation across call sites |
| Geometric Optimization | src/vfs/cow_overlay.py | Poincaré ball projection; Ollivier-Ricci / Forman-Ricci curvature flow; chaotic BP relaxation |
| Core VFS Overlay | src/vfs/cow_overlay.py | Copy-on-Write staged commit; .vfs_tmp atomic swap; friction-gated resolution |
| Orchestration | src/main.py | CLI dispatch; hypergraph shard partitioning (Mt-KaHyPar FFI); out-of-process worker management |
The two primary execution engines share the geometric and VFS layers but differ in their top-level orchestration strategy:
┌─────────────────────────────────────────────────────────────┐
│  src/main.py  CLI Entry Point                               │
│                                                             │
│  ┌──────────────────────┐    ┌──────────────────────────┐   │
│  │ MasterAutomation-    │    │ ScalableOrchestration-   │   │
│  │ Controller (--mac)   │    │ Engine (--soe)           │   │
│  │                      │    │                          │   │
│  │ Single-agent global  │    │ k-way hypergraph shard   │   │
│  │ graph ensemble with  │    │ partition; per-shard     │   │
│  │ STE bandit weighting │    │ out-of-process Robin     │   │
│  └──────────┬───────────┘    │ solver; shared-memory    │   │
│             │                │ boundary register file   │   │
│             │                └──────────┬───────────────┘   │
│             └─────────────┬─────────────┘                   │
│                           ▼                                 │
│              GlobalGraphEnsembleSandbox                     │
│              AsynchronousBeliefPropagation                  │
│              ContinuousFlowOptimizer                        │
│              CoW VFS Overlay                                │
└─────────────────────────────────────────────────────────────┘
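Raw code quality is measured along three independent observation axes extracted by ActionFunctional: the cyclomatic complexity delta, the type-coverage gap, and the linter density.
The scalar friction score $S$ is the weighted inner product of these three observations with the weight vector $(\alpha, \beta, \gamma)$, with default weights $\alpha = 1.0$, $\beta = 10.0$, $\gamma = 5.0$ (set via --alpha, --beta, --gamma).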
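The weighted inner product can be sketched as follows. This is a minimal illustration, not the ActionFunctional implementation: the `Observation` container and the `friction_score` helper are hypothetical names, and only the three axes and the default weights are taken from the CLI reference in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical container for the three observation axes."""
    complexity_delta: float    # cyclomatic complexity delta (weighted by alpha)
    type_coverage_gap: float   # type-coverage gap (weighted by beta)
    linter_density: float      # linter density (weighted by gamma)


def friction_score(obs, alpha=1.0, beta=10.0, gamma=5.0):
    """Weighted inner product of the observation vector with (alpha, beta, gamma)."""
    return (alpha * obs.complexity_delta
            + beta * obs.type_coverage_gap
            + gamma * obs.linter_density)


# Example: a variant that adds 2 complexity units, leaves 10% of symbols
# untyped, and carries 0.4 linter findings per unit of code.
score = friction_score(Observation(2.0, 0.1, 0.4))
```

A lower score means less friction, which is what the commit gate later in this document compares against.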
To endow the quality manifold with a geometry that naturally penalises extreme values, CoordinateMetricTensor maps each observation vector through a finite-difference Jacobian pullback onto the open unit Poincaré ball
$$\mathbb{B}^n = \{\, x \in \mathbb{R}^n : \|x\| < 1 \,\}$$
equipped with the Riemannian metric
$$g_x = \left( \frac{2}{1 - \|x\|^2} \right)^{2} I_n .$$
The pullback metric tensor $G = J^{\top} g_x J$ is evaluated by finite-difference column approximation of the Jacobian $J$.
The Riemannian volume element is $\sqrt{\det G}$, with $G$ the pullback metric tensor.
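A small sketch of the pullback computation, under stated assumptions: the embedding `to_ball` (a radial squash $v \mapsto v / (1 + \|v\|)$) is a hypothetical stand-in for whatever map CoordinateMetricTensor actually uses, and central differences are assumed for the Jacobian columns. Only the Poincaré conformal factor $2 / (1 - \|x\|^2)$ is standard.

```python
import math


def to_ball(v):
    """Hypothetical embedding: squash a vector into the open unit ball."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / (1.0 + norm) for c in v]


def jacobian_columns(f, v, h=1e-6):
    """Central-difference Jacobian, one column per input axis.

    cols[j][i] approximates d f_i / d v_j.
    """
    cols = []
    for j in range(len(v)):
        plus, minus = list(v), list(v)
        plus[j] += h
        minus[j] -= h
        fp, fm = f(plus), f(minus)
        cols.append([(a - b) / (2.0 * h) for a, b in zip(fp, fm)])
    return cols


def pullback_metric(v):
    """G = J^T g_x J with g_x = lam^2 * I, the Poincare ball metric at x = to_ball(v)."""
    x = to_ball(v)
    lam = 2.0 / (1.0 - sum(c * c for c in x))  # conformal factor
    cols = jacobian_columns(to_ball, v)
    n = len(v)
    return [[lam * lam * sum(p * q for p, q in zip(cols[a], cols[b]))
             for b in range(n)] for a in range(n)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def volume_element(v):
    """Riemannian volume element sqrt(det G) at a 3-axis observation v."""
    return math.sqrt(det3(pullback_metric(v)))
```

With this particular embedding the volume element grows as the observation moves away from the origin, which is the sense in which the geometry penalises extreme values.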
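ContinuousFlowOptimizer evolves the edge weight function of the global dependency graph under a discrete Ricci flow.
For each directed edge $(u, v)$, the optimizer evaluates an edge curvature (Ollivier-Ricci, or the augmented Forman-Ricci curvature, AFRC).
The weight evolution follows a prescribed Ricci flow step, integrated by forward Euler with the step size set by --flow-step-size (default 0.05) and repeated --flow-steps times (default 10).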
When an edge weight falls below the surgery threshold, GlobalGraphEnsembleSandbox executes a surgery event:
- The edge $(u, v)$ is severed from the main graph.
- A virtual relay node $r_{uv}$ is spliced in with Robin boundary self-loops: a self-edge with weight $w_{\mathrm{self}} = \theta \cdot w_{uv}$, where $\theta = 0.8$ is the Robin under-relaxation parameter.
- The original endpoints $u, v$ are connected to $r_{uv}$ with half-weight edges, distributing the lost coupling load.
This prevents the graph from degenerating into an infinite collection of isolated nodes while ensuring that structurally over-coupled module pairs are dynamically decoupled.
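The three surgery steps above can be sketched on a plain edge-weight dictionary. This is an illustration under assumptions: the graph representation, the `apply_surgery` name, the relay naming scheme, and the threshold value are all hypothetical; only $\theta = 0.8$ and the half-weight rule come from the description.

```python
def apply_surgery(weights, threshold, theta=0.8):
    """Return a new edge-weight map with every sub-threshold edge replaced by a relay.

    weights maps directed edges (u, v) to positive floats. The threshold is
    passed in explicitly because its value is not stated in this document.
    """
    result = dict(weights)
    for (u, v), w in weights.items():
        if u == v or w >= threshold:
            continue                          # self-loops and healthy edges stay
        del result[(u, v)]                    # 1. sever the weak edge
        relay = f"relay:{u}:{v}"              # hypothetical relay node id
        result[(relay, relay)] = theta * w    # 2. Robin self-loop on the relay
        result[(u, relay)] = w / 2.0          # 3. half-weight edges to the
        result[(relay, v)] = w / 2.0          #    original endpoints
    return result


graph = {("a", "b"): 0.02, ("b", "c"): 0.9}
after = apply_surgery(graph, threshold=0.05)
```

Returning a new dictionary instead of mutating in place keeps the sketch safe to call while iterating over the original weights.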
Once the metric flow has stabilised, the belief marginals are estimated by averaging belief propagation over an ensemble of random spanning trees.
generate_random_spanning_tree samples each spanning tree from the uniform distribution over all spanning trees of the working multigraph using Wilson's algorithm. For each node $u$ not yet in the partial tree $\mathcal{T}$:
- Perform a uniform random walk on the adjacency list, starting at $u$, until the walk first contacts $\mathcal{T}$.
- Any cycle that arises is automatically erased: the pointer next_step[u] is overwritten on every revisit, so the final commitment phase traverses only the loop-erased tail.
- Commit the loop-erased path as directed parent→child tree edges.
This guarantees that each tree in the ensemble is an independent sample from the uniform spanning tree measure, yielding an ensemble-averaged BP estimator with variance decaying at the optimal rate $\mathcal{O}(1/M)$ in the ensemble size $M$, rather than the correlation-inflated rate produced by biased DFS-based samplers.
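For reference, Wilson's algorithm with the pointer-overwrite form of loop erasure fits in a few lines. This is a generic sketch, not the project's generate_random_spanning_tree: the function name, the adjacency-dictionary input, and the parent-map output are assumptions, and the graph is assumed to be connected.

```python
import random


def wilson_spanning_tree(adjacency, root, rng=None):
    """Sample a uniform spanning tree of a connected undirected graph.

    adjacency maps each node to a list of neighbours (repeat a neighbour to
    model parallel edges). Returns a parent map with parent[root] = None.
    """
    rng = rng or random.Random()
    in_tree = {root}
    parent = {root: None}
    for start in adjacency:
        if start in in_tree:
            continue
        # Random walk until the tree is hit. Overwriting next_step[u] on every
        # revisit is what erases loops: only the last exit from u survives.
        next_step = {}
        u = start
        while u not in in_tree:
            next_step[u] = rng.choice(adjacency[u])
            u = next_step[u]
        # Commit the loop-erased path; the node closer to the tree is the parent.
        u = start
        while u not in in_tree:
            parent[u] = next_step[u]
            in_tree.add(u)
            u = next_step[u]
    return parent


ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
tree = wilson_spanning_tree(ring, root="a", rng=random.Random(7))
```

Calling the function repeatedly with independent random state yields independent trees, which is the property the ensemble estimate relies on.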
Before launching the asynchronous chaotic relaxation threads, AsynchronousBeliefPropagation constructs the explicit absolute transition matrix and computes a contraction coefficient $K$ from it.
The system asserts $K < 1$, which guarantees:
- Unique fixed point: the belief propagation equations have a unique globally consistent solution.
- Geometric convergence: $\|\mathbf{g}^{(t)} - \mathbf{g}^{*}\| \leq K^t \|\mathbf{g}^{(0)} - \mathbf{g}^{*}\|$.
- Asynchronous safety: even under non-deterministic message ordering (chaotic relaxation), the iterates converge to the unique fixed point because $K < 1$ bounds the worst-case per-step amplification across all possible scheduling interleavings.
If $K \geq 1$, the gate raises ValueError and the caller falls back to the raw shard beliefs, preventing asynchronous message feedback from amplifying into a divergent cascade.
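The gate-then-fallback pattern might look like the sketch below. It assumes $K$ is taken as the maximum absolute row sum (the infinity norm) of the transition matrix, which is one common sufficient condition for asynchronous convergence; the document does not state which coefficient the real implementation computes, and all function names here are hypothetical.

```python
def contraction_coefficient(matrix):
    """Assumed definition of K: the largest absolute row sum of the matrix."""
    return max(sum(abs(x) for x in row) for row in matrix)


def certify_contraction(matrix):
    """Raise ValueError unless K < 1; return K when the gate passes."""
    k = contraction_coefficient(matrix)
    if k >= 1.0:
        raise ValueError(f"contraction coefficient K={k:.3f} is not below 1")
    return k


def beliefs_with_fallback(matrix, raw_beliefs, relax):
    """Run relax only when the gate passes; otherwise return the raw beliefs.

    relax is whatever iteration the caller would launch. On a failed gate
    there is deliberately no retry with a modified tolerance.
    """
    try:
        certify_contraction(matrix)
    except ValueError:
        return list(raw_beliefs)
    return relax(matrix, raw_beliefs)


stable = [[0.2, 0.3], [0.1, 0.4]]
unstable = [[0.9, 0.3], [0.1, 0.4]]
```

Keeping the check in its own function makes it easy to call before any worker or thread is started.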
The asynchronous domain decomposition layer (ShardWorkerProcess, aliased ShardWorkerThread) runs each shard's Robin-adjusted Gaussian-elimination solver in an isolated child process communicating exclusively through:
- A multiprocessing.shared_memory.SharedMemory segment owned by the parent BackboneRouter (_BoundaryRegisterFile), carrying packed float64 boundary belief slots.
- A multiprocessing.Queue carrying picklable _ShardWorkerResult completion envelopes.
Within each child process, the local Robin-adjusted linear system is solved directly by Gaussian elimination with partial pivoting.
Ghost-cell linear extrapolation recovers the estimate of the neighbouring shard state.
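Gaussian elimination with partial pivoting is a standard algorithm; a dependency-free sketch is shown here for orientation. The function name and the singularity tolerance are illustrative choices, and the Robin adjustment of the shard system is not modelled. Python floats are IEEE 754 doubles, so the arithmetic stays in float64.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    a is a list of n rows of length n, b a list of n right-hand-side values.
    """
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


solution = solve_linear_system([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

The cost is cubic in the number of unknowns per shard, which is one reason to keep shards small.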
The parent process applies a three-tier escalation protocol on worker joins: cooperative join(timeout) → SIGTERM + grace period → SIGKILL.
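The escalation ladder maps directly onto methods of `multiprocessing.Process`: `terminate()` sends SIGTERM and `kill()` sends SIGKILL on POSIX systems. The helper below is a sketch; its name, return values, and the default grace period are assumptions, not the project's API.

```python
def join_with_escalation(proc, timeout, grace=5.0):
    """Stop a worker using join -> terminate -> kill, returning the tier that worked.

    proc is any object with the multiprocessing.Process interface
    (join, is_alive, terminate, kill).
    """
    proc.join(timeout)            # tier 1: cooperative join
    if not proc.is_alive():
        return "joined"
    proc.terminate()              # tier 2: SIGTERM on POSIX
    proc.join(grace)              #         then wait out the grace period
    if not proc.is_alive():
        return "terminated"
    proc.kill()                   # tier 3: SIGKILL on POSIX
    proc.join()                   # reap the child so it does not linger
    return "killed"
```

Returning the tier lets the caller log how each worker ended, for example to flag shards that routinely need a forced kill.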
LockTreeManager implements a five-mode hierarchical intent lock protocol modelled on the IBM System R lock hierarchy:
| Mode | Symbol | Semantics |
|---|---|---|
| Intention-Shared | IS | Intends to place S locks on descendants |
| Intention-Exclusive | IX | Intends to place X locks on descendants |
| Shared + Intention-Exclusive | SIX | Holds S on current node; intends X on descendants |
| Shared | S | Concurrent reads permitted; no writes |
| Exclusive | X | Full exclusive ownership; no concurrent access |
Compatibility matrix — a cell is ✓ if both requests may be held concurrently:
| | IS | IX | SIX | S | X |
|---|---|---|---|---|---|
| IS | ✓ | ✓ | ✓ | ✓ | ✗ |
| IX | ✓ | ✓ | ✗ | ✗ | ✗ |
| SIX | ✓ | ✗ | ✗ | ✗ | ✗ |
| S | ✓ | ✗ | ✗ | ✓ | ✗ |
| X | ✗ | ✗ | ✗ | ✗ | ✗ |
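The matrix above translates directly into a lookup table. The sketch below encodes exactly those cells; the names `COMPATIBLE`, `is_compatible`, and `can_grant` are illustrative and are not taken from LockTreeManager.

```python
# For each requested mode, the set of already-held modes it can coexist with.
COMPATIBLE = {
    "IS": {"IS", "IX", "SIX", "S"},
    "IX": {"IS", "IX"},
    "SIX": {"IS"},
    "S": {"IS", "S"},
    "X": set(),
}


def is_compatible(requested, held):
    """True if a lock in mode requested may coexist with one in mode held."""
    return held in COMPATIBLE[requested]


def can_grant(requested, held_modes):
    """True if requested is compatible with every mode currently held on the node."""
    return all(is_compatible(requested, held) for held in held_modes)


# A reader can join other readers, but not a node with a pending writer intent.
reader_ok = can_grant("S", ["IS", "S"])
reader_blocked = can_grant("S", ["IS", "IX"])
```

The relation is symmetric, so the table can be checked for transcription errors by comparing each cell with its mirror.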
Deadlock prevention is enforced by requiring all agents to sort their acquisition sets lexicographically by path before requesting any lock. Since all agents acquire in the same total order, a circular wait cannot form.
ConcurrencyOrchestrator wraps LockTreeManager to provide a single execute_transaction(agent_id, write_files, read_files, update_fn) call that acquires X locks on write targets and S locks on read targets in sorted order, executes update_fn, then releases in reverse order.
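The ordering discipline can be illustrated with ordinary mutexes. This sketch is a stand-in under assumptions: it uses one `threading.RLock` per path, so it does not implement the shared/exclusive distinction or the intent modes, and `run_ordered_transaction` is a hypothetical name. It only demonstrates sorted acquisition and reverse-order release.

```python
import threading
from collections import defaultdict

# Stand-in lock table: one re-entrant mutex per path (no S/X semantics).
_path_locks = defaultdict(threading.RLock)


def run_ordered_transaction(write_files, read_files, update_fn, log=None):
    """Acquire locks in lexicographic path order, run update_fn, release in reverse.

    A path listed for both reading and writing is locked once, in mode X.
    """
    plan = {path: "S" for path in read_files}
    plan.update({path: "X" for path in write_files})
    acquired = []
    try:
        for path in sorted(plan):             # the same total order for every agent
            _path_locks[path].acquire()
            acquired.append(path)
            if log is not None:
                log.append(("acquire", path, plan[path]))
        return update_fn()
    finally:
        for path in reversed(acquired):       # release even if update_fn raises
            _path_locks[path].release()
            if log is not None:
                log.append(("release", path, plan[path]))


events = []
outcome = run_ordered_transaction(["b.py", "a.py"], ["c.py", "a.py"],
                                  lambda: "done", log=events)
```

Because every agent sorts before acquiring, two agents that need overlapping paths always contend on the lexicographically smallest shared path first, so neither can hold a later lock while waiting on an earlier one.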
AsynchronousPrefetchDaemon builds a first-order Markov transition matrix from file co-occurrence in the Git history.
When an agent declares intent to write a file, the files most likely to accompany it under that matrix are stubbed by UniversalStubber (which uses compiled Tree-sitter grammars for AST-aware extraction where available, falling back to comment-stripping regex patterns otherwise) and stored in the PredictiveContextCache, which is bounded at 120,000 tokens.
The pre-warming step reduces the critical-path latency of the first post-commit workspace analysis by loading adjacent file stubs into the in-process cache before the write lock is even acquired.
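One way to derive such a matrix is to count how often two files change in the same commit and normalise each row. The sketch below assumes that definition of co-occurrence and takes commits as plain lists of paths; the function names and the top-k cut-off are hypothetical, and Git history ingestion is left out.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_transition_matrix(commits):
    """Row-normalised co-occurrence counts: matrix[a][b] ~ P(b touched | a touched)."""
    counts = defaultdict(Counter)
    for files in commits:
        for a, b in permutations(sorted(set(files)), 2):
            counts[a][b] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}


def predict_next(matrix, path, k=2):
    """The k most probable companions of path, ties broken alphabetically."""
    row = matrix.get(path, {})
    return sorted(row, key=lambda f: (-row[f], f))[:k]


history = [["a.py", "b.py"], ["a.py", "b.py"], ["a.py", "c.py"]]
matrix = build_transition_matrix(history)
to_prefetch = predict_next(matrix, "a.py", k=1)
```

The predicted paths are the candidates whose stubs would be loaded into the cache ahead of the write.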
GlobalGraphEnsembleSandbox evaluates all variant candidates within a copy-on-write staging arena before committing any bytes to disk.
Each candidate variant is stored as raw UTF-8 bytes in an in-memory buffer indexed by (variant_id, file_path). The commit protocol is:
- Friction gate: compute the friction score $S$ for the winning variant. Commit is permitted only if $S < S_{\text{prev}}$ (monotone decrease) or if no prior friction exists.
- Atomic swap: write the staged bytes to a .vfs_tmp sibling file, then call os.replace (POSIX-atomic on Linux/macOS; transactional on NTFS) to swap it into the target path.
- Rollback on failure: if os.replace raises, the .vfs_tmp artifact is deleted and the original target file is left untouched.
This guarantees that the repository on-disk state is always in a consistent committed state — no partial writes are ever visible to concurrent readers or the version control index.
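The gate, swap, and rollback steps can be combined into one small function. This is a sketch under assumptions: `commit_variant` is a hypothetical name, the temporary file is assumed to be the target path with a `.vfs_tmp` suffix, and re-raising after cleanup is a design choice of this example that the document does not specify.

```python
import os


def commit_variant(target, staged, score, prev_score=None):
    """Write staged bytes to target atomically if the friction gate allows it.

    Returns True when the file was replaced and False when the gate refused.
    """
    # Friction gate: strict decrease, or no previously committed score.
    if prev_score is not None and score >= prev_score:
        return False
    tmp = target + ".vfs_tmp"                 # assumed naming of the sibling file
    try:
        with open(tmp, "wb") as handle:
            handle.write(staged)
            handle.flush()
            os.fsync(handle.fileno())         # push the bytes to disk before the swap
        os.replace(tmp, target)               # atomic rename on POSIX
    except OSError:
        if os.path.exists(tmp):
            os.remove(tmp)                    # rollback: target is left untouched
        raise
    return True
```

Writing the temporary file next to the target keeps both on the same filesystem, which is what allows the rename to be atomic.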
| Component | Minimum Version | Notes |
|---|---|---|
| Python | 3.11+ | Required for tomllib, match statements, and multiprocessing.shared_memory stability |
| OS | Linux / macOS / Windows 10+ | SharedMemory on Windows requires Python ≥ 3.13 for full stability; Linux is the primary target |
| RAM | 4 GB | Per-shard Gaussian-elimination solver is |
| CPU cores | 4+ | Each ShardWorkerProcess occupies one physical core; more cores = lower wall-clock dispatch time |
The UniversalStubber uses compiled Tree-sitter shared libraries for AST-aware stub extraction. Without them the system falls back to regex patterns (functional but lower fidelity).
# Install the Tree-sitter Python bindings (Language.build_library was removed in release 0.22)
pip install "tree-sitter<0.22"
# Clone and compile the Python grammar
git clone https://github.com/tree-sitter/tree-sitter-python
python -c "
from tree_sitter import Language
Language.build_library('parsers/tree-sitter-python.so', ['tree-sitter-python'])
"
# Repeat for other grammars:
# https://github.com/tree-sitter/tree-sitter-typescript -> tree-sitter-typescript.so
# https://github.com/tree-sitter/tree-sitter-go -> tree-sitter-go.so
# https://github.com/nickel-lang/tree-sitter-rust -> tree-sitter-rust.so
Place all four .so files (or .dylib on macOS) in the parsers/ directory at the repository root.
The execute_mt_kahypar_partition FFI bridge calls mt_kahypar_partition_hypergraph from libmtkahypar.so to obtain balanced $k$-way shard partitions.
# Clone and build Mt-KaHyPar
git clone --recursive https://github.com/kahypar/mt-kahypar
cd mt-kahypar
cmake -B build -DCMAKE_BUILD_TYPE=Release -DKAHYPAR_DOWNLOAD_BOOST=ON
cmake --build build --target mtkahypar -j$(nproc)
# Copy the shared library to the repository lib/ directory
cp build/lib/libmtkahypar.so /path/to/automator/lib/
Alternatively, add the build output directory to LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=/path/to/mt-kahypar/build/lib:$LD_LIBRARY_PATH
Before first use, run the setup check utility to validate all optional dependencies and create required operational directories:
python src/utils/setup_check.py --workspace .
Expected output on a fully configured system:
INFO Workspace: /path/to/automator
INFO ============================================================
INFO ── Tree-sitter parser libraries ─────────────────────────────
INFO [OK] tree-sitter-python.so (Python)
INFO [OK] tree-sitter-typescript.so (TypeScript)
INFO [OK] tree-sitter-go.so (Go)
INFO [OK] tree-sitter-rust.so (Rust)
INFO ── Mt-KaHyPar native partitioning library ───────────────────
INFO [OK] Resolved: /path/to/automator/lib/libmtkahypar.so
INFO ── Operational directories ──────────────────────────────────
INFO [OK] .repo_cache (already exists)
INFO [OK] context (already exists)
INFO ── Summary ──────────────────────────────────────────────────
INFO [PASS] Tree-sitter parser libraries
INFO [PASS] Mt-KaHyPar native partitioning library
INFO [PASS] Operational directories
Exit code 0 = fully operational. Exit code 1 = degraded (optional components absent). Exit code 2 = hard failure (directory creation error).
The primary entry point is src/main.py. It must be executed from a directory where the src/ package imports resolve (i.e. with src/ on PYTHONPATH, or from within src/ directly):
export PYTHONPATH=src
python src/main.py --help

| Argument | Type | Description |
|---|---|---|
| --workspace DIR | str | Repository root directory to operate on |

| Argument | Values | Default | Description |
|---|---|---|---|
| --engine | mac, soe | soe | mac = MasterAutomationController (single-agent global graph); soe = ScalableOrchestrationEngine (out-of-process sharded) |
| --agent-id | str | agent-cli | Correlation tag for structured log lines |

| Argument | Type | Default | Description |
|---|---|---|---|
| --n-partitions N | int | 4 | Number of topological shard partitions (SOE only) |
| --diversity-scale F | float | 0.5 | Search diversity scale for GlobalGraphEnsembleSandbox |
| --coupling-strength F | float | 0.1 | Pairwise coupling penalty forwarded to BP and shard construction |
| --flow-steps N | int | 10 | Global Ricci flow iteration count |
| --flow-step-size F | float | 0.05 | Forward-Euler step size |
| --worker-timeout SECS | float | 60.0 | Per-worker join timeout before SIGTERM escalation (SOE only) |

| Argument | Type | Default | Description |
|---|---|---|---|
| --alpha F | float | 1.0 | Cyclomatic complexity delta weight |
| --beta F | float | 10.0 | Type-coverage gap weight |
| --gamma F | float | 5.0 | Linter density weight |

| Argument | Description |
|---|---|
| --variant-file TARGET:VARIANT | Colon-separated pair; repeatable. Registers VARIANT file content as one candidate rewrite of TARGET. |
| --variant-matrix JSON | Path to a JSON file encoding {"target": {"variant_id": "content"}}. Merged with --variant-file entries. |
| --read-files PATH … | Additional paths to acquire shared read locks on. |
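How the two variant sources might be merged is sketched below. Everything here is an assumption made for illustration: the helper names, splitting on the first colon, and using the variant path as the variant id are not documented behaviour of src/main.py.

```python
def parse_variant_pair(spec):
    """Split a TARGET:VARIANT argument at the first colon."""
    target, sep, variant = spec.partition(":")
    if not sep or not target or not variant:
        raise ValueError(f"expected TARGET:VARIANT, got {spec!r}")
    return target, variant


def merge_variants(matrix, pairs, read_text):
    """Combine a {"target": {"variant_id": "content"}} matrix with TARGET:VARIANT pairs.

    read_text is a callable that returns the content of a variant file. The
    variant path doubles as the variant id, which is an assumption.
    """
    merged = {target: dict(variants) for target, variants in matrix.items()}
    for spec in pairs:
        target, variant_path = parse_variant_pair(spec)
        merged.setdefault(target, {})[variant_path] = read_text(variant_path)
    return merged


combined = merge_variants(
    {"src/foo.py": {"v0": "print('baseline')"}},
    ["src/foo.py:variants/foo_v1.py"],
    read_text=lambda path: f"# content of {path}",
)
```

Passing the file reader in as a parameter keeps the sketch free of disk access and easy to exercise in isolation.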
# Run the sharded engine on the workspace with default hyperparameters:
python src/main.py --workspace /repo --engine soe
# Evaluate two explicit candidate rewrites of a single file:
python src/main.py --workspace /repo --engine soe \
--variant-file src/foo.py:variants/foo_v1.py \
--variant-file src/foo.py:variants/foo_v2.py
# Load a full variant matrix from JSON with custom friction weights:
python src/main.py --workspace /repo --engine soe \
--variant-matrix matrix.json \
--alpha 2.0 --beta 8.0 --gamma 3.0
# Run the single-agent MasterAutomationController with stronger coupling and a longer flow
# (--n-partitions applies to the SOE engine only):
python src/main.py --workspace /repo --engine mac \
--coupling-strength 0.25 --flow-steps 20
# Serialize the full codebase for transfer to a reasoning model:
python bundle_transfer_context.py --workspace /repo \
--output context/transfer_payload.txt
automator/
├── src/
│   ├── main.py  # CLI entry point; MAC + SOE engines; FFI bridge
│   ├── analysis/
│   │   └── workspace_analyzer.py  # Annotation scanner; dead-variable detector; cycle checker
│   ├── concurrency/
│   │   ├── lock_manager.py  # IS/IX/SIX/S/X intent lock tree
│   │   └── orchestrator.py  # Transaction coordinator wrapping LockTreeManager
│   ├── predictive/
│   │   ├── cache.py  # Token-bounded in-process stub cache
│   │   └── daemon.py  # Markovian prefetch daemon; Git history ingestion
│   ├── semantic/
│   │   └── analyzer.py  # Symbol table; ripple-effect propagation
│   ├── stubber/
│   │   └── universal.py  # Tree-sitter + regex fallback stub extractor
│   ├── utils/
│   │   └── setup_check.py  # Environment validation; native library probe
│   └── vfs/
│       ├── __init__.py
│       └── cow_overlay.py  # All geometric, BP, and VFS machinery (primary module)
├── parsers/  # Compiled Tree-sitter .so files (user-supplied)
├── lib/  # Native shared libraries, e.g. libmtkahypar.so
├── context/  # Generated context serialization outputs
├── .repo_cache/  # Predictive cache persistence (auto-created)
├── bundle_repo.py  # Token-aware context bundler
├── bundle_transfer_context.py  # Full-fidelity XML transfer payload generator
└── README.md  # This document
These invariants are enforced across the entire codebase and must be preserved in all future modifications:
- No physics-themed terminology. All variable names, comments, and documentation must use graph-theoretic, distributed-systems, and linear-algebra vocabulary exclusively. Terms such as "quantum", "wavefunction", "entropy", "temperature", "force field", or "thermodynamic" are strictly prohibited.
- Float64 precision throughout. All matrix operations in cow_overlay.py (Laplacian construction, Gaussian elimination, power iteration) must preserve float64 precision. No implicit downcast to float32.
- Module-level picklability for IPC. Any function or class that may be transmitted to a child process via multiprocessing under the spawn start method must be defined at module level (not as a closure or lambda). The _run_shard_worker function exemplifies this requirement.
- Zero cross-layer globals. Layers communicate only through typed return values and constructor injection. No implicit shared state between AutomationCoordinator, the geometric layer, and the VFS layer.
- Strict if __name__ == "__main__": guard. Every executable script must wrap its top-level runner behind this guard to prevent recursive re-entry under the spawn multiprocessing start method on all platforms.
- Monotone friction gate. The CoW overlay may only commit a variant to disk if its friction score $S$ is strictly less than the previously committed score $S_{\text{prev}}$. This invariant must never be bypassed even in testing paths.
- Dobrushin pre-launch gate. AsynchronousBeliefPropagation.run() must always compute and certify $K < 1.0$ before spawning any async relaxation iteration. Callers that catch ValueError from this gate must fall back to raw shard beliefs and must not retry with a modified tolerance.
- UTF-8 with error handling on all file I/O. All file reads must specify encoding="utf-8", errors="replace" or errors="ignore" to prevent binary files from aborting execution loops.
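As an illustration of the last rule, a read helper that never raises on undecodable bytes could look like this. The name `read_text_lossy` is hypothetical; the `encoding` and `errors` arguments are the ones the rule prescribes.

```python
from pathlib import Path


def read_text_lossy(path):
    """Read a file as UTF-8, substituting U+FFFD for bytes that do not decode."""
    return Path(path).read_text(encoding="utf-8", errors="replace")
```

With errors="replace" a stray binary file yields replacement characters instead of a UnicodeDecodeError, so a scanning loop can carry on to the next file.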
This project is proprietary. All rights reserved.