refactor: extract helpers from learning_loop — fixes #39 by kiranannadatha8 · Pull Request #58 · aganthos/clawloop

kiranannadatha8 · 2026-04-20T00:37:17Z

Summary

Pure refactor of clawloop/core/loop.py — no public API change, no behavioral drift. Splits the 486-line learning_loop() god-function into three focused helper classes plus five module-private glue functions. Full suite unchanged (960 passed / 42 skipped), plus 27 new unit tests covering the extracted helpers.

What moved

| New module | Class | Responsibility |
|---|---|---|
| `clawloop/core/runner.py` | `EpisodeCollectorRunner` | Task sampling + 3-way adapter dispatch |
| `clawloop/core/archive_recorder.py` | `ArchiveRecorder` | Owns run-level counters + all archive writes |
| `clawloop/core/transaction.py` | `LayerTransaction` | Two-phase fb→optim→rollback protocol |

Five module-private helpers stay in loop.py (each <40 lines, single-use): _avg_reward, _set_evolver_context, _flush_generation_if_advanced, _log_evolution_entry, _run_after_iteration.

Before / After

| Metric | Before | After |
|---|---|---|
| `loop.py` total lines | 742 | 337 |
| `learning_loop()` body (lines) | 486 | 100 |
| Helper modules | 0 | 3 |
| New unit tests | n/a | 27 |

The acceptance criterion from #39 (`learning_loop()` body ≤ 100 lines) is met exactly.
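The PR does not say how the 100-line body was counted. One plausible way to check the #39 criterion automatically uses the standard-library `ast` module. The counting rule below is an assumption: it counts physical lines from the first body statement through the end of the last statement, including any docstring, blank lines and comments in between. `function_body_lines` is a hypothetical helper name.

```python
import ast


def function_body_lines(source: str, name: str) -> int:
    """Physical lines from the first body statement of `name` to the end of its last statement."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            # end_lineno is available on AST nodes from Python 3.8 onward.
            return node.end_lineno - node.body[0].lineno + 1
    raise ValueError(f"function {name!r} not found")


SAMPLE = '''
def learning_loop(x):
    y = x + 1

    return y
'''

print(function_body_lines(SAMPLE, "learning_loop"))  # → 3
```

In CI, the same helper could guard against regressions, e.g. by asserting that `function_body_lines(Path("clawloop/core/loop.py").read_text(), "learning_loop") <= 100`.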

Test plan

- Full suite green: `uv run pytest -q` → 960 passed, 42 skipped (matches baseline)
- New unit tests: `tests/unit/core/test_runner.py` (8), `test_archive_recorder.py` (11), `test_transaction.py` (8)
- Coverage: runner 97%, archive_recorder 92%, transaction 91%
- Rollback invariant preserved: verified in `test_transaction.py::test_optim_error_rolls_back_all_snapshotted_layers`
- Paradigm-shift ordering preserved: verified in `test_paradigm_shift_on_harness_fb_mutates_tried_paradigms`
- Archive JSONL format unchanged: existing `test_archive_integration.py` green
- No public API change: `clawloop/__init__.py` and `clawloop/core/__init__.py` untouched; all 11 `learning_loop` call-sites unchanged
- `ruff check clawloop/core/` clean
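A toy transaction can illustrate the cross-layer rollback invariant checked by `test_optim_error_rolls_back_all_snapshotted_layers`. This is a sketch under assumptions, not `clawloop/core/transaction.py`. `ToyLayer` and its `forward_backward`, `optim_step`, `snapshot` and `restore` methods are hypothetical. Snapshotting every layer up front and rolling back on a failure in either phase is this sketch's own choice.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLayer:
    """Hypothetical layer: feedback accumulates in `pending`, optim_step applies it."""
    name: str
    value: int = 0
    pending: list[int] = field(default_factory=list)
    fail_on_step: bool = False

    def forward_backward(self, batch: list[int]) -> None:
        self.pending.extend(batch)

    def optim_step(self) -> None:
        if self.fail_on_step:
            raise RuntimeError(f"optim_step failed on {self.name}")
        self.value += sum(self.pending)
        self.pending.clear()

    def snapshot(self) -> dict:
        return {"value": self.value, "pending": list(self.pending)}

    def restore(self, snap: dict) -> None:
        self.value = snap["value"]
        self.pending = list(snap["pending"])


class LayerTransaction:
    """Sketch of an all-or-nothing fb -> optim update across several layers."""

    def __init__(self, layers: list[ToyLayer]) -> None:
        self.layers = layers

    def run(self, batch: list[int]) -> None:
        # Snapshot every layer before any mutation.
        snapshots = [(layer, layer.snapshot()) for layer in self.layers]
        try:
            for layer in self.layers:  # phase 1: feedback
                layer.forward_backward(batch)
            for layer in self.layers:  # phase 2: optimizer step
                layer.optim_step()
        except Exception:
            # Invariant: restore all snapshotted layers, including any that already stepped.
            for layer, snap in snapshots:
                layer.restore(snap)
            raise
```

Suppose the second layer's `optim_step` raises after the first layer has already stepped. Both layers then return to their pre-transaction state, and the error still propagates to the caller.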

gemini-code-assist

Code Review

This pull request refactors the learning_loop by extracting its core logic into specialized components: ArchiveRecorder for metrics and logging, EpisodeCollectorRunner for collection, and LayerTransaction for the training protocol. New unit tests are included for these modules. Feedback identifies a bug in ArchiveRecorder where cost tracking is incorrectly nested within a logging try-block, and notes an encapsulation violation in LayerTransaction regarding private harness state.
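The cost-tracking bug the review flags is an ordering problem. If the cost counter is updated inside the same `try` block as the archive write, a failing write silently drops that iteration's cost. Below is a minimal sketch of the safe ordering, using the names `iter_cost`, `total_cost_tokens` and `log_iteration` from this PR. The class shape, the `archive` object, and treating archive writes as best-effort are assumptions.

```python
import logging

logger = logging.getLogger(__name__)


class ArchiveRecorder:
    """Sketch: run-level counters survive a failing archive write."""

    def __init__(self, archive) -> None:
        self.archive = archive  # hypothetical object exposing log_iteration(dict)
        self.total_cost_tokens = 0

    def record_iteration(self, iteration: int, iter_cost: int) -> None:
        # Accumulate first: a failure in the archive write below must not drop cost.
        self.total_cost_tokens += iter_cost
        try:
            self.archive.log_iteration({"iteration": iteration, "cost_tokens": iter_cost})
        except Exception:
            # Archive writes are treated as best-effort in this sketch.
            logger.exception("archive log_iteration failed at iteration %d", iteration)
```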

Split the 486-line learning_loop() god-function into three focused helper classes plus five module-private glue functions. New modules:

- clawloop/core/runner.py (EpisodeCollectorRunner: task sampling + 3-way adapter dispatch)
- clawloop/core/archive_recorder.py (ArchiveRecorder: owns run-level counters + all archive writes)
- clawloop/core/transaction.py (LayerTransaction: two-phase fb→optim→rollback protocol with cross-layer rollback invariant)

learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines. Also:

- Add Harness.pending_paradigm_insights() so LayerTransaction can query paradigm-tagged insights without touching _pending directly.
- Move iter_cost accumulation outside the archive try-block in ArchiveRecorder so total_cost_tokens is tracked even when log_iteration fails.

No public API change on learning_loop. Full suite unchanged (961 passed / 42 skipped) plus 28 new unit tests covering the extracted helpers (runner 97%, archive_recorder 92%, transaction 91% coverage).
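The encapsulation fix replaces direct reads of `Harness._pending` with a public query method. A minimal sketch of that shape follows. Only the names `Harness`, `_pending` and `pending_paradigm_insights()` come from this PR. The `Insight` record, its `tags` field, the `"paradigm"` tag value and the `queue` method are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insight:
    """Hypothetical insight record; the tag representation is an assumption."""
    text: str
    tags: frozenset[str] = frozenset()


class Harness:
    def __init__(self) -> None:
        self._pending: list[Insight] = []

    def queue(self, insight: Insight) -> None:
        # Hypothetical way insights reach the pending queue.
        self._pending.append(insight)

    def pending_paradigm_insights(self) -> list[Insight]:
        """Return paradigm-tagged pending insights without exposing `_pending`."""
        # A new list is returned, so callers cannot mutate the harness's queue.
        return [i for i in self._pending if "paradigm" in i.tags]
```

With this accessor, a caller such as `LayerTransaction` depends only on the public method. `Harness` can then change how it stores pending insights without breaking the transaction code.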

bordeauxred · 2026-04-21T08:29:03Z

Tested end-to-end with a real LLM (Gemini): test_full_loop_with_real_reflector_and_evolver and all 4 test_evolver_real_llm.py tests pass. This confirms that the learning_loop → EpisodeCollectorRunner → LayerTransaction → ArchiveRecorder chain preserves semantics across iterations.

Thank you @kiranannadatha8!


* feat: Weights & Biases sink integration
* refactor: extract common base for purple-agent adapters (#40) (#51)
* refactor: extract _PurpleAgentBase for CAR + entropic adapters — fixes #40
  Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg normalization, session state, tool-call id reconciliation, harness update) into clawloop/environments/_purple_base.py. CAR and entropic adapters now override only the two bench-specific seams: _build_initial_messages and _format_a2a_response. No behavior change. 600 lines deleted, 379 added across the three files.
* refactor: hoist stdlib imports in _purple_base, clarify reconcile scope
  Addresses Gemini review on #51:
  - Move socket, time, httpx to module-level imports (PEP 8).
  - Expand docstring + comment on _reconcile_tool_call_id to explain why it intentionally stops at the most-recent assistant message.
  No behavior change.
* refactor: extract helpers from learning_loop — fixes #39 (#58)
* fix: sync log_iteration with self._step
* Ignore wandb dir

---------

Co-authored-by: kiranannadatha8 <87536091+kiranannadatha8@users.noreply.github.com>


gemini-code-assist Bot reviewed Apr 20, 2026

Comment thread on clawloop/core/archive_recorder.py (outdated)

Comment thread on clawloop/core/transaction.py (outdated)

kiranannadatha8 force-pushed the 39-extract-loop-helpers branch from d9bb8e2 to dc738b8 on April 20, 2026 00:57

kiranannadatha8 force-pushed the 39-extract-loop-helpers branch from dc738b8 to 03a9301 on April 20, 2026 00:59

bordeauxred merged commit fae94e9 into aganthos:main Apr 21, 2026
5 checks passed

bordeauxred merged 1 commit into aganthos:main from kiranannadatha8:39-extract-loop-helpers
