Skip to content

feat: Weights & Biases sink integration#59

Merged
bordeauxred merged 5 commits into
aganthos:mainfrom
dantp-ai:gh-52/add-observability-integration-wandb
May 3, 2026
Merged

feat: Weights & Biases sink integration#59
bordeauxred merged 5 commits into
aganthos:mainfrom
dantp-ai:gh-52/add-observability-integration-wandb

Conversation

@dantp-ai
Copy link
Copy Markdown
Collaborator

@dantp-ai dantp-ai commented Apr 22, 2026

Summary

What changed and why.

Test plan

  • pytest tests/ -x passes
  • Tested manually (if applicable)

@dantp-ai dantp-ai self-assigned this Apr 22, 2026
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Weights & Biases integration for ClawLoop, featuring the WandbSink class for logging metrics such as reward curves, playbook growth, and layer state hashes. The changes include a demonstration script, a comprehensive test suite, and the addition of wandb as an optional dependency. Review feedback identified an incorrect version constraint for the wandb library in pyproject.toml and recommended synchronizing the internal step counter within log_iteration to prevent state inconsistencies when mixing logging APIs.

Comment thread pyproject.toml
Comment thread clawloop/integrations/wandb.py
kiranannadatha8 and others added 4 commits April 24, 2026 13:10
…aganthos#51)

* refactor: extract _PurpleAgentBase for CAR + entropic adapters — fixes aganthos#40

Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg
normalization, session state, tool-call id reconciliation, harness update)
into clawloop/environments/_purple_base.py. CAR and entropic adapters now
override only the two bench-specific seams: _build_initial_messages and
_format_a2a_response.

No behavior change. 600 lines deleted, 379 added across the three files.

* refactor: hoist stdlib imports in _purple_base, clarify reconcile scope

Addresses Gemini review on aganthos#51:
- Move socket, time, httpx to module-level imports (PEP 8).
- Expand docstring + comment on _reconcile_tool_call_id to explain why
  it intentionally stops at the most-recent assistant message.

No behavior change.
…nthos#58)

Split the 486-line learning_loop() god-function into three focused
helper classes plus five module-private glue functions.

New modules:
  - clawloop/core/runner.py (EpisodeCollectorRunner — task sampling
    + 3-way adapter dispatch)
  - clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level
    counters + all archive writes)
  - clawloop/core/transaction.py (LayerTransaction — two-phase
    fb→optim→rollback protocol with cross-layer rollback invariant)

learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines.

Also:
  - Add Harness.pending_paradigm_insights() so LayerTransaction can
    query paradigm-tagged insights without touching _pending directly.
  - Move iter_cost accumulation outside the archive try-block in
    ArchiveRecorder so total_cost_tokens is tracked even when
    log_iteration fails.

No public API change on learning_loop. Full suite unchanged
(961 passed / 42 skipped) plus 28 new unit tests covering the
extracted helpers (runner 97%, archive_recorder 92%,
transaction 91% coverage).
@bordeauxred
Copy link
Copy Markdown
Contributor

Thanks a lot @dantp-ai

@bordeauxred bordeauxred merged commit 530e0c3 into aganthos:main May 3, 2026
5 checks passed
bordeauxred pushed a commit that referenced this pull request May 22, 2026
* feat: Weights & Biases sink integration

* refactor: extract common base for purple-agent adapters (#40) (#51)

* refactor: extract _PurpleAgentBase for CAR + entropic adapters — fixes #40

Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg
normalization, session state, tool-call id reconciliation, harness update)
into clawloop/environments/_purple_base.py. CAR and entropic adapters now
override only the two bench-specific seams: _build_initial_messages and
_format_a2a_response.

No behavior change. 600 lines deleted, 379 added across the three files.

* refactor: hoist stdlib imports in _purple_base, clarify reconcile scope

Addresses Gemini review on #51:
- Move socket, time, httpx to module-level imports (PEP 8).
- Expand docstring + comment on _reconcile_tool_call_id to explain why
  it intentionally stops at the most-recent assistant message.

No behavior change.

* refactor: extract helpers from learning_loop — fixes #39 (#58)

Split the 486-line learning_loop() god-function into three focused
helper classes plus five module-private glue functions.

New modules:
  - clawloop/core/runner.py (EpisodeCollectorRunner — task sampling
    + 3-way adapter dispatch)
  - clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level
    counters + all archive writes)
  - clawloop/core/transaction.py (LayerTransaction — two-phase
    fb→optim→rollback protocol with cross-layer rollback invariant)

learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines.

Also:
  - Add Harness.pending_paradigm_insights() so LayerTransaction can
    query paradigm-tagged insights without touching _pending directly.
  - Move iter_cost accumulation outside the archive try-block in
    ArchiveRecorder so total_cost_tokens is tracked even when
    log_iteration fails.

No public API change on learning_loop. Full suite unchanged
(961 passed / 42 skipped) plus 28 new unit tests covering the
extracted helpers (runner 97%, archive_recorder 92%,
transaction 91% coverage).

* fix: sync log_iteration with self._step

* Ignore wandb dir

---------

Co-authored-by: kiranannadatha8 <87536091+kiranannadatha8@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants