feat: Weights & Biases sink integration by dantp-ai · Pull Request #59 · aganthos/clawloop

dantp-ai · 2026-04-22T20:50:25Z

Summary

What changed and why.

Test plan

pytest tests/ -x passes
Tested manually (if applicable)

gemini-code-assist

Code Review

This pull request introduces a Weights & Biases integration for ClawLoop, featuring the WandbSink class for logging metrics such as reward curves, playbook growth, and layer state hashes. The changes include a demonstration script, a comprehensive test suite, and the addition of wandb as an optional dependency. Review feedback identified an incorrect version constraint for the wandb library in pyproject.toml and recommended synchronizing the internal step counter within log_iteration to prevent state inconsistencies when mixing logging APIs.

…aganthos#51) * refactor: extract _PurpleAgentBase for CAR + entropic adapters — fixes aganthos#40 Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg normalization, session state, tool-call id reconciliation, harness update) into clawloop/environments/_purple_base.py. CAR and entropic adapters now override only the two bench-specific seams: _build_initial_messages and _format_a2a_response. No behavior change. 600 lines deleted, 379 added across the three files. * refactor: hoist stdlib imports in _purple_base, clarify reconcile scope Addresses Gemini review on aganthos#51: - Move socket, time, httpx to module-level imports (PEP 8). - Expand docstring + comment on _reconcile_tool_call_id to explain why it intentionally stops at the most-recent assistant message. No behavior change.

…nthos#58) Split the 486-line learning_loop() god-function into three focused helper classes plus five module-private glue functions. New modules: - clawloop/core/runner.py (EpisodeCollectorRunner — task sampling + 3-way adapter dispatch) - clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level counters + all archive writes) - clawloop/core/transaction.py (LayerTransaction — two-phase fb→optim→rollback protocol with cross-layer rollback invariant) learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines. Also: - Add Harness.pending_paradigm_insights() so LayerTransaction can query paradigm-tagged insights without touching _pending directly. - Move iter_cost accumulation outside the archive try-block in ArchiveRecorder so total_cost_tokens is tracked even when log_iteration fails. No public API change on learning_loop. Full suite unchanged (961 passed / 42 skipped) plus 28 new unit tests covering the extracted helpers (runner 97%, archive_recorder 92%, transaction 91% coverage).

bordeauxred · 2026-05-03T06:29:01Z

Thanks a lot @dantp-ai

* feat: Weights & Biases sink integration * refactor: extract common base for purple-agent adapters (#40) (#51) * refactor: extract _PurpleAgentBase for CAR + entropic adapters — fixes #40 Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg normalization, session state, tool-call id reconciliation, harness update) into clawloop/environments/_purple_base.py. CAR and entropic adapters now override only the two bench-specific seams: _build_initial_messages and _format_a2a_response. No behavior change. 600 lines deleted, 379 added across the three files. * refactor: hoist stdlib imports in _purple_base, clarify reconcile scope Addresses Gemini review on #51: - Move socket, time, httpx to module-level imports (PEP 8). - Expand docstring + comment on _reconcile_tool_call_id to explain why it intentionally stops at the most-recent assistant message. No behavior change. * refactor: extract helpers from learning_loop — fixes #39 (#58) Split the 486-line learning_loop() god-function into three focused helper classes plus five module-private glue functions. New modules: - clawloop/core/runner.py (EpisodeCollectorRunner — task sampling + 3-way adapter dispatch) - clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level counters + all archive writes) - clawloop/core/transaction.py (LayerTransaction — two-phase fb→optim→rollback protocol with cross-layer rollback invariant) learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines. Also: - Add Harness.pending_paradigm_insights() so LayerTransaction can query paradigm-tagged insights without touching _pending directly. - Move iter_cost accumulation outside the archive try-block in ArchiveRecorder so total_cost_tokens is tracked even when log_iteration fails. No public API change on learning_loop. Full suite unchanged (961 passed / 42 skipped) plus 28 new unit tests covering the extracted helpers (runner 97%, archive_recorder 92%, transaction 91% coverage). * fix: sync log_iteration with self._step * Ignore wandb dir --------- Co-authored-by: kiranannadatha8 <87536091+kiranannadatha8@users.noreply.github.com>

feat: Weights & Biases sink integration

d827d7c

dantp-ai self-assigned this Apr 22, 2026

gemini-code-assist Bot reviewed Apr 22, 2026

View reviewed changes

Comment thread pyproject.toml

Comment thread clawloop/integrations/wandb.py

kiranannadatha8 and others added 4 commits April 24, 2026 13:10

fix: sync log_iteration with self._step

7c1cab2

Ignore wandb dir

8859d71

bordeauxred merged commit 530e0c3 into aganthos:main May 3, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Weights & Biases sink integration#59

feat: Weights & Biases sink integration#59
bordeauxred merged 5 commits into
aganthos:mainfrom
dantp-ai:gh-52/add-observability-integration-wandb

dantp-ai commented Apr 22, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

bordeauxred commented May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

dantp-ai commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

bordeauxred commented May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dantp-ai commented Apr 22, 2026 •

edited

Loading