
Observability integrations (umbrella): W&B, MLflow, Langfuse, OpenTelemetry, Phoenix #52

@bordeauxred

Why

ClawLoop already emits structured episodes, reward signals, and layer-state transitions. Teams running it in production or research usually already have an observability stack they want those signals to land in. Shipping first-class sinks for the common ones makes ClawLoop feel native in existing workflows instead of yet another dashboard to check.

Each item below is a small, self-contained adapter with a clear contract — ideal entry points for first-time contributors.

Integration stubs

  • Weights & Biases sink — log per-iteration reward curves, playbook growth, layer state hashes. Pattern: clawloop.integrations.wandb.WandbSink(run_id=...) consuming the existing episode stream.
  • MLflow tracking — iterations as runs, playbook entries as artifacts, reward signals as metrics. Same shape as W&B, different backend.
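
A minimal sketch of what the W&B sink pattern above could look like. The `EpisodeSummary` fields, the `on_episode` / `close` method names, the `project` default, and the injectable `run` parameter are assumptions made for illustration, not ClawLoop's actual API; only `wandb.init`, `run.log`, and `run.finish` are real W&B calls. The lazy import keeps `wandb` out of core, in line with the optional-extra idea.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EpisodeSummary:
    """Hypothetical stand-in for ClawLoop's summary event; field names are assumed."""
    iteration: int
    reward: float
    playbook_size: int
    layer_state_hash: str


class WandbSink:
    """Sketch of a sink that forwards episode summaries to a W&B run."""

    def __init__(self, run_id: str, project: str = "clawloop", run: Optional[Any] = None):
        if run is None:
            try:
                # Imported lazily so core never depends on wandb.
                import wandb
            except ImportError as exc:
                raise RuntimeError(
                    "WandbSink needs the optional extra: uv sync --extra wandb"
                ) from exc
            # resume="allow" reattaches to an existing run with this id, if any.
            run = wandb.init(project=project, id=run_id, resume="allow")
        self._run = run

    def on_episode(self, summary: EpisodeSummary) -> None:
        # One log call per iteration; the iteration number is used as the step.
        self._run.log(
            {
                "reward": summary.reward,
                "playbook_size": summary.playbook_size,
                "layer_state_hash": summary.layer_state_hash,
            },
            step=summary.iteration,
        )

    def close(self) -> None:
        self._run.finish()
```

Accepting a pre-built `run` object is a design choice for testability: a fake with `log` and `finish` methods can stand in for W&B in unit tests. An MLflow sink could presumably follow the same shape with a different backend behind `on_episode`.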

Contract

Each sink should:

  1. Consume the existing Episode / EpisodeSummary / iteration-level events — no core changes.
  2. Be an optional extra: uv sync --extra wandb etc., so core stays dependency-light.
  3. Ship with a minimal example under examples/observability/ and a one-paragraph README section.
  4. Fail soft — a broken sink never breaks a training run.
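
One possible way to meet the fail-soft requirement is a thin wrapper around any sink, sketched below. The `Sink` protocol, the `FailSoftSink` name, the `max_failures` threshold, and the logger name are all hypothetical; the issue does not prescribe how fail-soft should be implemented.

```python
import logging
from typing import Any, Protocol

logger = logging.getLogger("clawloop.sinks")  # hypothetical logger name


class Sink(Protocol):
    """Assumed minimal sink interface; not ClawLoop's actual contract."""

    def on_episode(self, summary: Any) -> None: ...
    def close(self) -> None: ...


class FailSoftSink:
    """Wraps a sink so its exceptions are logged instead of propagated."""

    def __init__(self, inner: Sink, max_failures: int = 3):
        self._inner = inner
        self._max_failures = max_failures
        self._failures = 0  # total failures seen, not consecutive ones

    @property
    def disabled(self) -> bool:
        return self._failures >= self._max_failures

    def on_episode(self, summary: Any) -> None:
        if self.disabled:
            return  # stop calling a sink that keeps failing
        try:
            self._inner.on_episode(summary)
        except Exception:
            self._failures += 1
            logger.warning(
                "sink %s failed (%d/%d)",
                type(self._inner).__name__,
                self._failures,
                self._max_failures,
                exc_info=True,
            )

    def close(self) -> None:
        try:
            self._inner.close()
        except Exception:
            logger.warning("sink %s failed on close", type(self._inner).__name__, exc_info=True)
```

Catching `Exception` rather than `BaseException` leaves `KeyboardInterrupt` and `SystemExit` untouched, so a user can still stop the training run. Disabling the sink after repeated failures avoids flooding the log when a backend is down for the whole run.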

Why an umbrella?

Each integration is roughly a day of work and independent of the others. Tracking them together shows intent; splitting them into separate PRs keeps each one reviewable.

Labels

  • enhancement: New feature or request
  • good first issue: Good for newcomers
  • roadmap: Future direction; not a launch blocker
