Skip to content

feat(common): add config module with TOML schema and layered loading#1201

Merged
geoffjay merged 23 commits into
mainfrom
issue-1194
May 13, 2026
Merged

feat(common): add config module with TOML schema and layered loading#1201
geoffjay merged 23 commits into
mainfrom
issue-1194

Conversation

@geoffjay
Copy link
Copy Markdown
Owner

Adds agentd_common::config with AgentdConfig TOML schema covering all 12 services, XDG-compliant file discovery, and three-layer precedence loading (defaults < file < env vars).

Summary

  • New config module in agentd-common with AgentdConfig top-level struct
  • Per-service config structs for all 12 services (ports, sizes, feature flags)
  • GeneralConfig covering log_level, log_format, and host
  • config_file_path(): checks AGENTD_CONFIG env var then XDG config dir
  • load() / load_from_path(): layered loading (compiled defaults < TOML file < env vars)
  • apply_env_overrides(): maps AGENTD_* env vars onto config fields
  • 23 unit tests with Mutex-serialised env var tests to prevent parallel races

Test plan

  • cargo test -p agentd-common — all 23 config tests pass
  • cargo fmt -p agentd-common — no formatting changes
  • cargo clippy -p agentd-common -- -D warnings — clean

Closes #1194

@geoffjay geoffjay added the review-agent Used to invoke a review by an agent tracking this label label May 12, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.77%. Comparing base (fd71366) to head (2036182).
⚠️ Report is 24 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1201   +/-   ##
=======================================
  Coverage   63.77%   63.77%           
=======================================
  Files         173      173           
  Lines        7733     7733           
  Branches     2566     2614   +48     
=======================================
  Hits         4932     4932           
  Misses       2780     2780           
  Partials       21       21           
Flag Coverage Δ
frontend 63.77% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Owner Author

@geoffjay geoffjay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat(common): add config module with TOML schema and layered loading

Stack position: Based directly on main — this is the root of the config epic. PRs #1195#1199 are all stacked on top and cannot merge until this one lands. The architecture and any breaking decisions made here propagate to all five children.

The foundation approach is solid — #[serde(default)] on all structs, load_from_path as the testable entry point, Mutex-serialised env var tests — all correct. Three issues need addressing before merge, followed by schema gaps the migration PRs will need to handle.


Blocking — must fix before merge

1. AGENTD_HISTORY_SIZE naming collision between hook and monitor

Both existing services use the same env var name for different things:

Service Variable Meaning Default
crates/hook/src/lib.rs AGENTD_HISTORY_SIZE Shell event history 500
crates/monitor/src/lib.rs AGENTD_HISTORY_SIZE Metric snapshot count 120

The shared config (line 800) maps AGENTD_HISTORY_SIZEservices.hook.history_size only. There is no path for operators to independently tune the monitor's snapshot retention, and the migration PRs cannot fully replace both services with this schema as-is.

This needs a decision now — changing env var names post-migration is an operator-visible breaking change. Options:

# Option A — give both services distinct names
AGENTD_HOOK_HISTORY_SIZE    → services.hook.history_size
AGENTD_MONITOR_HISTORY_SIZE → services.monitor.history_size   (new field needed)

# Option B — keep hook's legacy name, add new one for monitor
AGENTD_HISTORY_SIZE         → services.hook.history_size      (backward-compat)
AGENTD_MONITOR_HISTORY_SIZE → services.monitor.history_size   (new)

Either option requires adding history_size: usize to MonitorConfig.

2. Incomplete env-var reference table — 11 implemented variables missing from the docs

The # Environment Variable Overrides table is the primary operator reference. It lists 26 variables but 11 that are fully implemented in apply_env_overrides are absent:

Missing from table Maps to
AGENTD_RECONCILE_INTERVAL_SECS services.orchestrator.reconcile_interval_secs
AGENTD_COLLECTION_INTERVAL_SECS services.monitor.collection_interval_secs
AGENTD_INDEX_LANGUAGES services.index.languages
AGENTD_NOTIFY_SERVICE_URL services.hook.notify_service_url
AGENTD_MCP_NOTIFY_URL services.mcp.notify_url
AGENTD_MCP_ASK_URL services.mcp.ask_url
AGENTD_MCP_MEMORY_URL services.mcp.memory_url
AGENTD_MCP_COMMUNICATE_URL services.mcp.communicate_url
AGENTD_MCP_WRAP_URL services.mcp.wrap_url
AGENTD_MCP_MONITOR_URL services.mcp.monitor_url
AGENTD_MCP_HOOK_URL services.mcp.hook_url

3. Orchestrator env-var rename introduces a silent breaking change

The existing orchestrator (crates/orchestrator/src/main.rs, line 351) reads:

let communicate_url = env::var("AGENTD_COMMUNICATE_SERVICE_URL")

The shared config introduces AGENTD_ORCHESTRATOR_COMMUNICATE_URL instead. When PR #1197 migrates the orchestrator, any operator who has AGENTD_COMMUNICATE_SERVICE_URL in their environment or systemd unit will silently fall back to the compiled default. The rename is fine — but it needs to be explicit and documented, e.g. a note in the env-var table: AGENTD_ORCHESTRATOR_COMMUNICATE_URL (replaces legacy AGENTD_COMMUNICATE_SERVICE_URL).


Schema gaps — required before migration PRs can complete

These fields exist in the current per-service configs but are absent from the shared structs. The migration PRs (#1196/#1197) cannot fully replace the existing configs without them.

HookConfig is missing (from crates/hook/src/config.rs):

pub notify_on_failure: bool,           // AGENTD_NOTIFY_ON_FAILURE, default: true
pub notify_on_long_running: bool,      // AGENTD_NOTIFY_ON_LONG_RUNNING, default: true
pub long_running_threshold_ms: u64,    // AGENTD_LONG_RUNNING_THRESHOLD_MS, default: 30_000

MonitorConfig is missing (from crates/monitor/src/lib.rs):

pub cpu_alert_threshold: f64,          // AGENTD_CPU_ALERT_THRESHOLD, default: 90.0
pub memory_alert_threshold: f64,       // AGENTD_MEMORY_ALERT_THRESHOLD, default: 90.0
pub disk_alert_threshold: f64,         // AGENTD_DISK_ALERT_THRESHOLD, default: 90.0
pub history_size: usize,               // see naming-collision note above, default: 120

IndexConfig is missing (from crates/index/src/config.rs):

pub embedding_endpoint: String,        // AGENTD_INDEX_EMBEDDING_ENDPOINT, default: "http://localhost:11434/v1"
pub embedding_api_key: Option<String>, // AGENTD_INDEX_EMBEDDING_API_KEY
pub lance_table: String,               // AGENTD_INDEX_LANCE_TABLE, default: "code_chunks"
pub watch_interval_secs: u64,          // AGENTD_INDEX_WATCH_INTERVAL, default: 30
pub ignore_patterns: Vec<String>,      // AGENTD_INDEX_IGNORE_PATTERNS

If adding these is intentionally deferred to the migration PRs, document the omissions with // TODO(#1196): add remaining fields before migration so the child PRs know what to add and Clippy can't silently miss it.


Non-blocking suggestions

Silent discard of malformed env var valuesparse_port("AGENTD_NOTIFY_PORT=abc") returns None and falls back to the default with no diagnostic. A tracing::warn! would help operators catch typos, though logging before the logger is initialised has its own tradeoffs.

Missing test coverage — no tests for AGENTD_RECONCILE_INTERVAL_SECS or AGENTD_COLLECTION_INTERVAL_SECS env overrides, and no test for AGENTD_INDEX_LANGUAGES with spaces ("go, ruby, kotlin") even though the implementation trims them correctly.

merge() "reset to default" limitationpick treats file == default as "not set in file", so a config file cannot explicitly reset a field to its compiled default value after a prior layer set it differently. This is harmless in practice since base is always AgentdConfig::default() at the call site, but the function signature implies a more general contract. A brief comment noting the assumption (// base is always AgentdConfig::default() in practice) would prevent confusion.


What's correct

  • #[serde(default)] on all structs — right approach for partial TOML
  • load_from_path(None) as the test entry point — avoids AGENTD_CONFIG races
  • ENV_LOCK: Mutex<()> pattern for env-var test serialisation
  • Option::or for notify_service_url in merge() — correct semantics
  • Comma-split with .trim() for AGENTD_INDEX_LANGUAGES
  • config_file_path() correctly guards against empty AGENTD_CONFIG
  • pub mod config in lib.rs — right re-export level
  • toml = { workspace = true } — no version drift risk
  • XDG path resolution via directories::ProjectDirs for lance paths

The layered loading logic is sound. Resolve the three blocking items and add the schema fields above, and this is ready to go.

@geoffjay geoffjay added needs-rework PR has review feedback that must be addressed before merging and removed review-agent Used to invoke a review by an agent tracking this label labels May 12, 2026
geoffjay added a commit that referenced this pull request May 12, 2026
…ilures

Replace silent unwrap_or_default() with unwrap_or_else(|e| { tracing::warn\!(...) })
in all six new service config loaders (ask, communicate, core, notify, orchestrator,
wrap). Restore diagnostic warn\!() calls for invalid AGENTD_WRAP_HISTORY_BYTES and
AGENTD_WRAP_CHANNEL_CAPACITY env vars with TODO comments referencing #1201.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
geoffjay added a commit that referenced this pull request May 12, 2026
…s; add TODO(#1201) comments

Replace silent unwrap_or_default() with unwrap_or_else(|e| tracing::warn\!(...)) in
all six migrated load() callers (hook, monitor, mcp, memory::EmbeddingConfig,
memory::LanceConfig, index, ui). Operators now see a warning when config.toml is
malformed rather than silently falling back to compiled defaults.

Add TODO(#1201) comments on every hardcoded fallback that is pending a schema
addition: notify_on_failure, notify_on_long_running, long_running_threshold_ms
(hook); cpu/memory/disk_alert_threshold, history_size (monitor);
embedding_endpoint (index.embedding); lance_table (index.lance).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@geoffjay geoffjay added review-agent Used to invoke a review by an agent tracking this label and removed needs-rework PR has review feedback that must be addressed before merging labels May 12, 2026
geoffjay and others added 15 commits May 12, 2026 16:29
…_file_path indirection

- Make cmd_init pub(crate) so tests can call it directly
- Replace boolean-literal overwrite-guard tests with real tests that set
  AGENTD_CONFIG to a tempdir and call cmd_init; verify error message mentions
  --force on refusal and that the template is written on force-overwrite
- Add ENV_LOCK mutex to serialise tests that mutate AGENTD_CONFIG
- Remove the config_file_path() local wrapper; import via use instead
- Fix cmd_show None arm to reuse the already-resolved path rather than
  calling config_file_path() a second time

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g tests

Add ENV_LOCK to mcp and index config test modules so env-var-mutating
tests are serialised in the parallel test harness. Also clear AGENTD_PORT
(set by agentd at runtime) before asserting default port values.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s; add TODO(#1201) comments

Replace silent unwrap_or_default() with unwrap_or_else(|e| tracing::warn\!(...)) in
all six migrated load() callers (hook, monitor, mcp, memory::EmbeddingConfig,
memory::LanceConfig, index, ui). Operators now see a warning when config.toml is
malformed rather than silently falling back to compiled defaults.

Add TODO(#1201) comments on every hardcoded fallback that is pending a schema
addition: notify_on_failure, notify_on_long_running, long_running_threshold_ms
(hook); cpu/memory/disk_alert_threshold, history_size (monitor);
embedding_endpoint (index.embedding); lance_table (index.lance).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a ValidateConfig trait to agentd_common with implementations for all
12 shared config sections. Add AgentdConfig::validate() that collects
errors from every section. Implement ValidateConfig on all service-level
config structs and wire validate() into each service's startup path so
misconfigurations fail early with descriptive error messages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Create dedicated config.rs modules for core, orchestrator, wrap, ask,
notify, and communicate. Each struct loads from agentd_common::config::load()
first, then overlays legacy AGENTD_* env vars for backward compatibility.
All inline env::var() calls in main.rs replaced with config struct access.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add impl ValidateConfig for EmbeddingConfig and LanceConfig in
  crates/memory/src/config.rs (blocking gap: memory was the only service
  with no validation at startup)
- Wire validation in memory/main.rs: call validate() on embedding and
  lance configs, load shared config for port/host so port=0 is caught
  before any I/O with a clear error message
- Remove validate_inner() indirection in crates/index/src/config.rs;
  move logic directly into impl ValidateConfig for IndexConfig and
  import the trait in index/main.rs and the test module
- Make validate_url() pub in agentd-common so service crates can reuse
  it without duplicating the http/https check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ilures

Replace silent unwrap_or_default() with unwrap_or_else(|e| { tracing::warn\!(...) })
in all six new service config loaders (ask, communicate, core, notify, orchestrator,
wrap). Restore diagnostic warn\!() calls for invalid AGENTD_WRAP_HISTORY_BYTES and
AGENTD_WRAP_CHANNEL_CAPACITY env vars with TODO comments referencing #1201.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
geoffjay and others added 2 commits May 12, 2026 16:29
- Create docs/configuration.md with quick start, file location, precedence
  rules, complete key reference table, per-service sections, migration guide,
  and four example configs (minimal, development, production, Docker)
- Fix the env var table in agentd-common/config.rs to include all overrides
  (AGENTD_INDEX_LANGUAGES, AGENTD_RECONCILE_INTERVAL_SECS, all MCP URLs,
  AGENTD_NOTIFY_SERVICE_URL, AGENTD_COLLECTION_INTERVAL_SECS)
- Add link to configuration reference from README.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
geoffjay added 6 commits May 12, 2026 16:59
docs: add configuration system documentation
feat: add ValidateConfig trait and per-service validation
feat: add config structs for services with inline env var config
refactor: migrate services with existing config.rs to shared TOML config
feat(cli): add config init and config show subcommands
test(common): add integration tests for layered config precedence
@geoffjay geoffjay merged commit 31940cb into main May 13, 2026
4 of 10 checks passed
@geoffjay geoffjay deleted the issue-1194 branch May 13, 2026 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review-agent Used to invoke a review by an agent tracking this label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add config module to agentd-common with TOML schema and layered loading

1 participant