Skip to content

refactor: extract helpers from learning_loop — fixes #39#58

Merged
bordeauxred merged 1 commit into
aganthos:mainfrom
kiranannadatha8:39-extract-loop-helpers
Apr 21, 2026
Merged

refactor: extract helpers from learning_loop — fixes #39#58
bordeauxred merged 1 commit into
aganthos:mainfrom
kiranannadatha8:39-extract-loop-helpers

Conversation

@kiranannadatha8
Copy link
Copy Markdown
Contributor

Summary

Pure refactor of clawloop/core/loop.py — no public API change, no behavioral drift. Splits the 486-line learning_loop() god-function into three focused helper classes plus five module-private glue functions. Full suite unchanged (960 passed / 42 skipped), plus 27 new unit tests covering the extracted helpers.

What moved

New module Responsibility
clawloop/core/runner.py EpisodeCollectorRunner — task sampling + 3-way adapter dispatch
clawloop/core/archive_recorder.py ArchiveRecorder — owns run-level counters + all archive writes
clawloop/core/transaction.py LayerTransaction — two-phase fb→optim→rollback protocol

Five module-private helpers stay in loop.py (each <40 lines, single-use): _avg_reward, _set_evolver_context, _flush_generation_if_advanced, _log_evolution_entry, _run_after_iteration.

Before / After

Before After
loop.py total lines 742 337
learning_loop() body 486 100
Helper modules 0 3
New unit tests 27

Acceptance criterion from #39 (learning_loop ≤ 100 lines) met exactly.

Test plan

  • Full suite green: uv run pytest -q → 960 passed, 42 skipped (matches baseline)
  • New unit tests: tests/unit/core/test_runner.py (8), test_archive_recorder.py (11), test_transaction.py (8)
  • Coverage: runner 97%, archive_recorder 92%, transaction 91%
  • Rollback invariant preserved — verified in test_transaction.py::test_optim_error_rolls_back_all_snapshotted_layers
  • Paradigm-shift ordering preserved — verified in test_paradigm_shift_on_harness_fb_mutates_tried_paradigms
  • Archive JSONL format unchanged — existing test_archive_integration.py green
  • No public API change — clawloop/__init__.py, clawloop/core/__init__.py untouched; all 11 learning_loop call-sites unchanged
  • ruff check clawloop/core/ clean

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the learning_loop by extracting its core logic into specialized components: ArchiveRecorder for metrics and logging, EpisodeCollectorRunner for collection, and LayerTransaction for the training protocol. New unit tests are included for these modules. Feedback identifies a bug in ArchiveRecorder where cost tracking is incorrectly nested within a logging try-block, and notes an encapsulation violation in LayerTransaction regarding private harness state.

Comment thread clawloop/core/archive_recorder.py Outdated
Comment thread clawloop/core/transaction.py Outdated
@kiranannadatha8 kiranannadatha8 force-pushed the 39-extract-loop-helpers branch from d9bb8e2 to dc738b8 Compare April 20, 2026 00:57
Split the 486-line learning_loop() god-function into three focused
helper classes plus five module-private glue functions.

New modules:
  - clawloop/core/runner.py (EpisodeCollectorRunner — task sampling
    + 3-way adapter dispatch)
  - clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level
    counters + all archive writes)
  - clawloop/core/transaction.py (LayerTransaction — two-phase
    fb→optim→rollback protocol with cross-layer rollback invariant)

learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines.

Also:
  - Add Harness.pending_paradigm_insights() so LayerTransaction can
    query paradigm-tagged insights without touching _pending directly.
  - Move iter_cost accumulation outside the archive try-block in
    ArchiveRecorder so total_cost_tokens is tracked even when
    log_iteration fails.

No public API change on learning_loop. Full suite unchanged
(961 passed / 42 skipped) plus 28 new unit tests covering the
extracted helpers (runner 97%, archive_recorder 92%,
transaction 91% coverage).
@kiranannadatha8 kiranannadatha8 force-pushed the 39-extract-loop-helpers branch from dc738b8 to 03a9301 Compare April 20, 2026 00:59
@bordeauxred
Copy link
Copy Markdown
Contributor

tested and run LLM end-to-end against Gemini:
test_full_loop_with_real_reflector_and_evolver + all 4
test_evolver_real_llm.py tests pass — confirms learning_loop
EpisodeCollectorRunnerLayerTransactionArchiveRecorder preserves semantics across iterations.

Thank you @kiranannadatha8!

@bordeauxred bordeauxred merged commit fae94e9 into aganthos:main Apr 21, 2026
5 checks passed
dantp-ai pushed a commit to dantp-ai/clawloop that referenced this pull request Apr 24, 2026
…nthos#58)

Split the 486-line learning_loop() god-function into three focused
helper classes plus five module-private glue functions.

New modules:
  - clawloop/core/runner.py (EpisodeCollectorRunner — task sampling
    + 3-way adapter dispatch)
  - clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level
    counters + all archive writes)
  - clawloop/core/transaction.py (LayerTransaction — two-phase
    fb→optim→rollback protocol with cross-layer rollback invariant)

learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines.

Also:
  - Add Harness.pending_paradigm_insights() so LayerTransaction can
    query paradigm-tagged insights without touching _pending directly.
  - Move iter_cost accumulation outside the archive try-block in
    ArchiveRecorder so total_cost_tokens is tracked even when
    log_iteration fails.

No public API change on learning_loop. Full suite unchanged
(961 passed / 42 skipped) plus 28 new unit tests covering the
extracted helpers (runner 97%, archive_recorder 92%,
transaction 91% coverage).
bordeauxred pushed a commit that referenced this pull request May 3, 2026
* feat: Weights & Biases sink integration

* refactor: extract common base for purple-agent adapters (#40) (#51)

* refactor: extract _PurpleAgentBase for CAR + entropic adapters — fixes #40

Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg
normalization, session state, tool-call id reconciliation, harness update)
into clawloop/environments/_purple_base.py. CAR and entropic adapters now
override only the two bench-specific seams: _build_initial_messages and
_format_a2a_response.

No behavior change. 600 lines deleted, 379 added across the three files.

* refactor: hoist stdlib imports in _purple_base, clarify reconcile scope

Addresses Gemini review on #51:
- Move socket, time, httpx to module-level imports (PEP 8).
- Expand docstring + comment on _reconcile_tool_call_id to explain why
  it intentionally stops at the most-recent assistant message.

No behavior change.

* refactor: extract helpers from learning_loop — fixes #39 (#58)

Split the 486-line learning_loop() god-function into three focused
helper classes plus five module-private glue functions.

New modules:
  - clawloop/core/runner.py (EpisodeCollectorRunner — task sampling
    + 3-way adapter dispatch)
  - clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level
    counters + all archive writes)
  - clawloop/core/transaction.py (LayerTransaction — two-phase
    fb→optim→rollback protocol with cross-layer rollback invariant)

learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines.

Also:
  - Add Harness.pending_paradigm_insights() so LayerTransaction can
    query paradigm-tagged insights without touching _pending directly.
  - Move iter_cost accumulation outside the archive try-block in
    ArchiveRecorder so total_cost_tokens is tracked even when
    log_iteration fails.

No public API change on learning_loop. Full suite unchanged
(961 passed / 42 skipped) plus 28 new unit tests covering the
extracted helpers (runner 97%, archive_recorder 92%,
transaction 91% coverage).

* fix: sync log_iteration with self._step

* Ignore wandb dir

---------

Co-authored-by: kiranannadatha8 <87536091+kiranannadatha8@users.noreply.github.com>
bordeauxred pushed a commit that referenced this pull request May 22, 2026
Split the 486-line learning_loop() god-function into three focused
helper classes plus five module-private glue functions.

New modules:
  - clawloop/core/runner.py (EpisodeCollectorRunner — task sampling
    + 3-way adapter dispatch)
  - clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level
    counters + all archive writes)
  - clawloop/core/transaction.py (LayerTransaction — two-phase
    fb→optim→rollback protocol with cross-layer rollback invariant)

learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines.

Also:
  - Add Harness.pending_paradigm_insights() so LayerTransaction can
    query paradigm-tagged insights without touching _pending directly.
  - Move iter_cost accumulation outside the archive try-block in
    ArchiveRecorder so total_cost_tokens is tracked even when
    log_iteration fails.

No public API change on learning_loop. Full suite unchanged
(961 passed / 42 skipped) plus 28 new unit tests covering the
extracted helpers (runner 97%, archive_recorder 92%,
transaction 91% coverage).
bordeauxred pushed a commit that referenced this pull request May 22, 2026
* feat: Weights & Biases sink integration

* refactor: extract common base for purple-agent adapters (#40) (#51)

* refactor: extract _PurpleAgentBase for CAR + entropic adapters — fixes #40

Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg
normalization, session state, tool-call id reconciliation, harness update)
into clawloop/environments/_purple_base.py. CAR and entropic adapters now
override only the two bench-specific seams: _build_initial_messages and
_format_a2a_response.

No behavior change. 600 lines deleted, 379 added across the three files.

* refactor: hoist stdlib imports in _purple_base, clarify reconcile scope

Addresses Gemini review on #51:
- Move socket, time, httpx to module-level imports (PEP 8).
- Expand docstring + comment on _reconcile_tool_call_id to explain why
  it intentionally stops at the most-recent assistant message.

No behavior change.

* refactor: extract helpers from learning_loop — fixes #39 (#58)

Split the 486-line learning_loop() god-function into three focused
helper classes plus five module-private glue functions.

New modules:
  - clawloop/core/runner.py (EpisodeCollectorRunner — task sampling
    + 3-way adapter dispatch)
  - clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level
    counters + all archive writes)
  - clawloop/core/transaction.py (LayerTransaction — two-phase
    fb→optim→rollback protocol with cross-layer rollback invariant)

learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines.

Also:
  - Add Harness.pending_paradigm_insights() so LayerTransaction can
    query paradigm-tagged insights without touching _pending directly.
  - Move iter_cost accumulation outside the archive try-block in
    ArchiveRecorder so total_cost_tokens is tracked even when
    log_iteration fails.

No public API change on learning_loop. Full suite unchanged
(961 passed / 42 skipped) plus 28 new unit tests covering the
extracted helpers (runner 97%, archive_recorder 92%,
transaction 91% coverage).

* fix: sync log_iteration with self._step

* Ignore wandb dir

---------

Co-authored-by: kiranannadatha8 <87536091+kiranannadatha8@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants