[arXiv:2606.29742] Add boundary-quality eval for architectural decomposition plans #316

Description

@azalio

Source

Paper: https://arxiv.org/pdf/2606.29742 — "MicroAgent: Context-Augmented Multi-Agent Framework for Automatic Microservice Decomposition" (arXiv:2606.29742v1, 2026-06-29).

Source-specific idea used here: MicroAgent does not map directly onto a MAP feature request for decomposing monoliths into microservices. The transferable mechanism is a review/eval layer for architecture decomposition quality: split a broad refactor into focused domain/boundary decisions, use multi-granularity structural context, run dependency/common-code tools, and measure whether boundaries are cohesive, loosely coupled, and grounded in actual code.

Relevant source takeaways

  • The paper identifies three failure modes for naive LLM decomposition: overlong repo-level context causes hallucinated entities, shallow semantic cues miss concrete usage/call relationships, and agents overlook domain/microservice principles such as cohesion/coupling.
  • MicroAgent handles this by splitting decomposition into specialized stages: domain identification, clustering, merging, common-class assignment, and final review.
  • It customizes context by stage: application/domain summaries for domain discovery, class/code/dependency context for clustering and refinement, and focused relationship retrieval for ambiguous classes.
  • Its common-class tooling is especially relevant to MAP: it computes dependency-distribution signals, retrieves directly related classes, and propagates assignment along dependency graphs so shared code is assigned consistently instead of being guessed from names.
  • Evaluation is not just a qualitative review. The paper reports decomposition quality and common-class assignment metrics, plus ablations showing dependency context, specialized tools, and multi-agent staging all materially affect results.
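
To make the common-class takeaway concrete, here is a minimal sketch of assigning shared code from a dependency-distribution signal and then propagating along the dependency graph. It is not the paper's algorithm: the function names, the `min_share` threshold, and the edge-list input are all assumptions chosen for illustration.

```python
from collections import Counter


def dependency_distribution(cls, edges, assignment):
    """Count how many direct neighbors of `cls` sit in each assigned group."""
    counts = Counter()
    for src, dst in edges:
        if src == cls and dst in assignment:
            counts[assignment[dst]] += 1
        elif dst == cls and src in assignment:
            counts[assignment[src]] += 1
    return counts


def propagate_assignment(unassigned, edges, assignment, min_share=0.6):
    """Assign shared classes to the group that dominates their neighbors.

    Hypothetical rule: a class joins a group only when that group holds at
    least `min_share` of its already-assigned neighbors. Newly assigned
    classes count as evidence in later passes, which is the propagation step.
    """
    assignment = dict(assignment)
    pending = set(unassigned)
    progressed = True
    while pending and progressed:
        progressed = False
        for cls in sorted(pending):  # sorted() copies, so discarding is safe
            counts = dependency_distribution(cls, edges, assignment)
            total = sum(counts.values())
            if total == 0:
                continue  # no assigned neighbors yet; retry next pass
            group, hits = counts.most_common(1)[0]
            if hits / total >= min_share:
                assignment[cls] = group
                pending.discard(cls)
                progressed = True
    return assignment, pending


edges = [
    ("OrderService", "Money"), ("Invoice", "Money"), ("Cart", "Money"),
    ("Money", "Currency"),
    ("UserProfile", "Logger"), ("OrderService", "Logger"),
]
seed = {"OrderService": "orders", "Invoice": "orders",
        "Cart": "cart", "UserProfile": "users"}
result, ambiguous = propagate_assignment(["Money", "Currency", "Logger"], edges, seed)
```

In this toy graph `Money` follows its dominant consumers, `Currency` is reached only through `Money`, and `Logger` stays ambiguous because its usage is evenly split. That last case is the one a review step would surface rather than guess from the name.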

Repo evidence

  • MAP already has generic decomposition structure. src/mapify_cli/templates_src/agents/task-decomposer.md.jinja:250-254 requires coverage_map ownership for acceptance criteria, and docs/ARCHITECTURE.md:2000-2007 documents blueprint.json plus validate_blueprint_contract as the planning-time gate for subtask metadata, validation criteria, and requirement coverage.
  • MAP already has stage-consumption and evidence surfaces. docs/ARCHITECTURE.md:642-646 describes review bundles and validate_prior_stage_consumption; docs/ARCHITECTURE.md:967-969 describes acceptance coverage and prior-stage consumption reports.
  • MAP already has a workflow dependency graph, but it models subtask execution rather than code-domain architecture. src/mapify_cli/dependency_graph.py:550-560 lints self-loops, cycles, thin edges, same-file wave overlap, full serialization, and redundant workflow edges. src/mapify_cli/dependency_graph.py:666-701 flags thin workflow dependencies using IO/file overlap; it does not score architectural boundary quality, cohesion, coupling, or shared/common-code assignment.
  • MAP's decomposer prompt has some symbol grounding and circular import checks. src/mapify_cli/templates_src/agents/task-decomposer.md.jinja:672-700 asks for no circular imports, precise affected_files, and grep-verified symbols. This helps avoid obvious hallucinated file/symbol references, but it does not evaluate whether a large architectural/refactor plan's proposed boundaries align with domain usage or dependency purity.
  • Related backlog exists for structural discovery, but it is provider/eval infrastructure, not boundary-quality review. #310 ([CodeGraph] Add optional local structural code-map provider for MAP research) covers the provider, and #311 ([CodeGraph] Add structural-discovery ROI eval for MAP research) covers its ROI eval. Neither defines an architecture decomposition quality gate that consumes structural evidence to review planned boundaries.

Existing issue search

Commands/searches used:

  • gh issue list --state all --limit 120 --search "MicroAgent OR 2606.29742 OR microservice decomposition OR bounded context OR DDD OR common class OR service boundary" returned no matches.
  • gh issue list --state all --limit 120 --search "structural code-map OR dependency graph OR boundary quality OR coupling OR cohesion OR partition OR weighted similarity OR code graph" returned no direct boundary-quality issue.
  • gh issue list --state all --limit 80 --json number,title,state,labels,url was reviewed for nearby issues.

Close issues checked:

Why this is not already covered

Existing MAP gates prove that each subtask has an owner, validation criteria, and dependency order, and they can catch malformed workflow DAGs or ungrounded symbols. They do not answer the MicroAgent-style question: "Given this proposed decomposition of a large refactor, are the boundaries cohesive, loosely coupled, and grounded in actual code relationships, and is shared/common code assigned consistently?"

That gap matters for MAP because a plan can pass coverage_map, affected_files, and acyclic workflow checks while still slicing a refactor along bad architectural boundaries: shared models assigned to one subtask without consumers, domain-specific logic split by name similarity, or cross-boundary changes that force hidden coupling in later subtasks.

Problem

For large architectural/refactor goals, MAP's planning-time decomposition is checked for structural validity but is not scored for architectural quality. A decomposer can produce an executable DAG with precise files and validation criteria, yet still choose poor boundaries that create coupling, misplaced shared code, or brittle cross-subtask dependencies. The current safeguards are generic; they do not use code relationship signals to review domain/boundary quality.

Proposed slice

Add an optional boundary-quality eval/review for architecture-heavy plans.

Concrete first slice:

  • Detect when a blueprint is architecture/refactor-heavy, e.g. many runtime files, cross-module changes, shared model/interface edits, explicit migration/refactor wording, or concern_type/risk metadata indicating architecture work.
  • Build a boundary_quality_report from existing blueprint data plus available structural evidence. It can start with deterministic heuristics and later consume the structural code-map from #310 when available.
  • Score/report at least:
    • grounded entities: every named class/function/module in AAG/validation criteria exists or is declared as newly created;
    • cross-boundary dependency pressure: files/symbols in different subtasks with import/call/reference links;
    • shared/common-code placement: shared interfaces/models/utilities have explicit owning subtask plus consumer validation;
    • cohesion/purity proxy: each subtask's files cluster by module/domain better than by arbitrary name similarity;
    • cycle/coupling risk: potential circular imports or bidirectional dependencies between planned boundaries.
  • Surface the report in review bundle/run health for Monitor/Predictor, but keep the first implementation advisory unless a clear hard error exists (e.g. hallucinated existing symbol or impossible file path already covered by existing gates).
  • Add fixture tests where a deliberately bad architectural plan passes generic blueprint validation but receives boundary-quality findings.
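
As a rough illustration of the cross-boundary dependency pressure and cycle/coupling signals listed above, the sketch below counts imports that cross subtask boundaries and flags bidirectional pairs. The input shapes (`subtasks` as a map from subtask id to affected_files, `sources` as a map from path to source text) and the helper names are assumptions, not MAP's actual blueprint schema. It also assumes each file has a single owning subtask and only resolves absolute `import a.b` / `from a.b import x` forms.

```python
import ast
from collections import defaultdict


def imported_modules(source):
    """Return dotted module names imported (absolutely) by a Python source string."""
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            mods.add(node.module)
    return mods


def module_name(path):
    """Map 'app/models.py' to 'app.models' (hypothetical layout: repo root is the import root)."""
    return path[:-3].replace("/", ".") if path.endswith(".py") else path


def boundary_pressure(subtasks, sources):
    """Count imports crossing subtask boundaries and list bidirectional pairs."""
    owner = {module_name(f): st_id
             for st_id, files in subtasks.items() for f in files}
    pressure = defaultdict(int)
    for path, text in sources.items():
        src_owner = owner.get(module_name(path))
        if src_owner is None:
            continue  # file is outside the plan
        for mod in imported_modules(text):
            dst_owner = owner.get(mod)
            if dst_owner is not None and dst_owner != src_owner:
                pressure[(src_owner, dst_owner)] += 1
    # A pair that depends on each other in both directions is a coupling risk.
    bidirectional = sorted(
        {tuple(sorted(pair)) for pair in pressure if (pair[1], pair[0]) in pressure}
    )
    return dict(pressure), bidirectional


subtasks = {
    "ST-1": ["app/models.py"],
    "ST-2": ["app/billing.py"],
    "ST-3": ["app/report.py"],
}
sources = {
    "app/models.py": "from app.billing import tax_rate\n",
    "app/billing.py": "from app.models import Invoice\nimport os\n",
    "app/report.py": "from app.models import Invoice\nimport app.billing\n",
}
pressure, bidirectional = boundary_pressure(subtasks, sources)
```

Here `ST-1` and `ST-2` import from each other, so the pair is reported as bidirectional even though the workflow DAG between the subtasks could still be acyclic. Call and reference links would need richer evidence than imports, which is where a structural code-map could slot in later.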

Acceptance criteria

  • A deterministic command or helper can produce boundary_quality_report for a fixture blueprint without network access or external model calls.
  • The report distinguishes hard grounding errors from advisory quality warnings.
  • Tests include at least one good split and one bad split involving shared/common code or cross-boundary dependencies.
  • Review bundle or run-health output includes the report summary when present, without replacing existing coverage_map or validate_blueprint_contract gates.
  • The implementation works without the optional structural provider from #310, using current affected_files, imports, and grep/AST-style local evidence; integration with #310 can be a follow-up enhancement.
  • Docs explain that this is for architecture-heavy plans, not every small task.
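
One possible shape for a report that meets these criteria is sketched below, with a good and a bad fixture. Everything except affected_files is hypothetical: the `symbols`, `creates`, and `consumes` fields, the report keys, and the function signature are illustrative assumptions rather than the existing blueprint contract.

```python
def boundary_quality_report(subtasks, existing_symbols, shared_files):
    """Deterministic, offline check separating hard errors from advisory warnings."""
    errors, warnings = [], []
    created = {sym for st in subtasks for sym in st.get("creates", [])}

    # Hard grounding errors: a named symbol neither exists nor is declared new.
    for st in subtasks:
        for sym in st.get("symbols", []):
            if sym not in existing_symbols and sym not in created:
                errors.append(f"{st['id']}: ungrounded symbol {sym}")

    # Advisory: shared code needs exactly one owner plus a validating consumer.
    for path in sorted(shared_files):
        owners = [st["id"] for st in subtasks
                  if path in st.get("affected_files", [])]
        consumers = [st["id"] for st in subtasks
                     if path in st.get("consumes", [])]
        if len(owners) != 1:
            warnings.append(
                f"{path}: expected exactly one owning subtask, found {len(owners)}")
        elif not consumers:
            warnings.append(
                f"{path}: owned by {owners[0]} but no subtask validates a consumer")

    return {"hard_errors": errors, "advisory_warnings": warnings, "ok": not errors}


existing = {"Invoice", "charge"}
shared = {"app/models.py"}

good_split = [
    {"id": "ST-1", "affected_files": ["app/models.py"],
     "symbols": ["Invoice"], "creates": ["InvoiceId"]},
    {"id": "ST-2", "affected_files": ["app/billing.py"],
     "symbols": ["InvoiceId", "charge"], "consumes": ["app/models.py"]},
]
bad_split = [
    {"id": "ST-1", "affected_files": ["app/models.py"], "symbols": ["Invoice"]},
    {"id": "ST-2", "affected_files": ["app/billing.py"],
     "symbols": ["charge", "LedgerClient"]},
]

good_report = boundary_quality_report(good_split, existing, shared)
bad_report = boundary_quality_report(bad_split, existing, shared)
```

Only `hard_errors` affects `ok`, so the shared-code finding in the bad split stays advisory, in line with keeping the first implementation non-blocking.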

Guardrails
