refactor: extract helpers from learning_loop — fixes #39#58
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the learning_loop by extracting its core logic into specialized components: ArchiveRecorder for metrics and logging, EpisodeCollectorRunner for collection, and LayerTransaction for the training protocol. New unit tests are included for these modules. Feedback identifies a bug in ArchiveRecorder where cost tracking is incorrectly nested within a logging try-block, and notes an encapsulation violation in LayerTransaction regarding private harness state.
d9bb8e2 to
dc738b8
Compare
Split the 486-line learning_loop() god-function into three focused
helper classes plus five module-private glue functions.
New modules:
- clawloop/core/runner.py (EpisodeCollectorRunner — task sampling
+ 3-way adapter dispatch)
- clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level
counters + all archive writes)
- clawloop/core/transaction.py (LayerTransaction — two-phase
fb→optim→rollback protocol with cross-layer rollback invariant)
learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines.
Also:
- Add Harness.pending_paradigm_insights() so LayerTransaction can
query paradigm-tagged insights without touching _pending directly.
- Move iter_cost accumulation outside the archive try-block in
ArchiveRecorder so total_cost_tokens is tracked even when
log_iteration fails.
No public API change on learning_loop. Full suite unchanged
(961 passed / 42 skipped) plus 28 new unit tests covering the
extracted helpers (runner 97%, archive_recorder 92%,
transaction 91% coverage).
dc738b8 to
03a9301
Compare
|
tested and run LLM end-to-end against Gemini: Thank you @kiranannadatha8! |
…nthos#58) Split the 486-line learning_loop() god-function into three focused helper classes plus five module-private glue functions. New modules: - clawloop/core/runner.py (EpisodeCollectorRunner — task sampling + 3-way adapter dispatch) - clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level counters + all archive writes) - clawloop/core/transaction.py (LayerTransaction — two-phase fb→optim→rollback protocol with cross-layer rollback invariant) learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines. Also: - Add Harness.pending_paradigm_insights() so LayerTransaction can query paradigm-tagged insights without touching _pending directly. - Move iter_cost accumulation outside the archive try-block in ArchiveRecorder so total_cost_tokens is tracked even when log_iteration fails. No public API change on learning_loop. Full suite unchanged (961 passed / 42 skipped) plus 28 new unit tests covering the extracted helpers (runner 97%, archive_recorder 92%, transaction 91% coverage).
* feat: Weights & Biases sink integration * refactor: extract common base for purple-agent adapters (#40) (#51) * refactor: extract _PurpleAgentBase for CAR + entropic adapters — fixes #40 Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg normalization, session state, tool-call id reconciliation, harness update) into clawloop/environments/_purple_base.py. CAR and entropic adapters now override only the two bench-specific seams: _build_initial_messages and _format_a2a_response. No behavior change. 600 lines deleted, 379 added across the three files. * refactor: hoist stdlib imports in _purple_base, clarify reconcile scope Addresses Gemini review on #51: - Move socket, time, httpx to module-level imports (PEP 8). - Expand docstring + comment on _reconcile_tool_call_id to explain why it intentionally stops at the most-recent assistant message. No behavior change. * refactor: extract helpers from learning_loop — fixes #39 (#58) Split the 486-line learning_loop() god-function into three focused helper classes plus five module-private glue functions. New modules: - clawloop/core/runner.py (EpisodeCollectorRunner — task sampling + 3-way adapter dispatch) - clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level counters + all archive writes) - clawloop/core/transaction.py (LayerTransaction — two-phase fb→optim→rollback protocol with cross-layer rollback invariant) learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines. Also: - Add Harness.pending_paradigm_insights() so LayerTransaction can query paradigm-tagged insights without touching _pending directly. - Move iter_cost accumulation outside the archive try-block in ArchiveRecorder so total_cost_tokens is tracked even when log_iteration fails. No public API change on learning_loop. Full suite unchanged (961 passed / 42 skipped) plus 28 new unit tests covering the extracted helpers (runner 97%, archive_recorder 92%, transaction 91% coverage). * fix: sync log_iteration with self._step * Ignore wandb dir --------- Co-authored-by: kiranannadatha8 <87536091+kiranannadatha8@users.noreply.github.com>
Split the 486-line learning_loop() god-function into three focused
helper classes plus five module-private glue functions.
New modules:
- clawloop/core/runner.py (EpisodeCollectorRunner — task sampling
+ 3-way adapter dispatch)
- clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level
counters + all archive writes)
- clawloop/core/transaction.py (LayerTransaction — two-phase
fb→optim→rollback protocol with cross-layer rollback invariant)
learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines.
Also:
- Add Harness.pending_paradigm_insights() so LayerTransaction can
query paradigm-tagged insights without touching _pending directly.
- Move iter_cost accumulation outside the archive try-block in
ArchiveRecorder so total_cost_tokens is tracked even when
log_iteration fails.
No public API change on learning_loop. Full suite unchanged
(961 passed / 42 skipped) plus 28 new unit tests covering the
extracted helpers (runner 97%, archive_recorder 92%,
transaction 91% coverage).
* feat: Weights & Biases sink integration * refactor: extract common base for purple-agent adapters (#40) (#51) * refactor: extract _PurpleAgentBase for CAR + entropic adapters — fixes #40 Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg normalization, session state, tool-call id reconciliation, harness update) into clawloop/environments/_purple_base.py. CAR and entropic adapters now override only the two bench-specific seams: _build_initial_messages and _format_a2a_response. No behavior change. 600 lines deleted, 379 added across the three files. * refactor: hoist stdlib imports in _purple_base, clarify reconcile scope Addresses Gemini review on #51: - Move socket, time, httpx to module-level imports (PEP 8). - Expand docstring + comment on _reconcile_tool_call_id to explain why it intentionally stops at the most-recent assistant message. No behavior change. * refactor: extract helpers from learning_loop — fixes #39 (#58) Split the 486-line learning_loop() god-function into three focused helper classes plus five module-private glue functions. New modules: - clawloop/core/runner.py (EpisodeCollectorRunner — task sampling + 3-way adapter dispatch) - clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level counters + all archive writes) - clawloop/core/transaction.py (LayerTransaction — two-phase fb→optim→rollback protocol with cross-layer rollback invariant) learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines. Also: - Add Harness.pending_paradigm_insights() so LayerTransaction can query paradigm-tagged insights without touching _pending directly. - Move iter_cost accumulation outside the archive try-block in ArchiveRecorder so total_cost_tokens is tracked even when log_iteration fails. No public API change on learning_loop. Full suite unchanged (961 passed / 42 skipped) plus 28 new unit tests covering the extracted helpers (runner 97%, archive_recorder 92%, transaction 91% coverage). * fix: sync log_iteration with self._step * Ignore wandb dir --------- Co-authored-by: kiranannadatha8 <87536091+kiranannadatha8@users.noreply.github.com>
Summary
Pure refactor of clawloop/core/loop.py — no public API change, no behavioral drift. Splits the 486-line
learning_loop()god-function into three focused helper classes plus five module-private glue functions. Full suite unchanged (960 passed / 42 skipped), plus 27 new unit tests covering the extracted helpers.What moved
EpisodeCollectorRunner— task sampling + 3-way adapter dispatchArchiveRecorder— owns run-level counters + all archive writesLayerTransaction— two-phase fb→optim→rollback protocolFive module-private helpers stay in
loop.py(each <40 lines, single-use):_avg_reward,_set_evolver_context,_flush_generation_if_advanced,_log_evolution_entry,_run_after_iteration.Before / After
loop.pytotal lineslearning_loop()bodyAcceptance criterion from #39 (
learning_loop ≤ 100 lines) met exactly.Test plan
uv run pytest -q→ 960 passed, 42 skipped (matches baseline)tests/unit/core/test_runner.py(8),test_archive_recorder.py(11),test_transaction.py(8)test_transaction.py::test_optim_error_rolls_back_all_snapshotted_layerstest_paradigm_shift_on_harness_fb_mutates_tried_paradigmstest_archive_integration.pygreenclawloop/__init__.py,clawloop/core/__init__.pyuntouched; all 11learning_loopcall-sites unchangedruff check clawloop/core/clean