feat(viewer): interactive solution visualizations via per-task visualize.py #20
Merged
Adds a "see the solution" view to kai viewer: an interactive picture of
what a run's best program actually produced, not just its score. A task
ships an optional visualize.py beside its evaluator.py (the visualizer is
a per-task input, parallel to the evaluator) exposing:
    render(program_path: str) -> str  # self-contained HTML/SVG fragment
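As a rough illustration of the contract above, a minimal `visualize.py` might look like the sketch below. Only the `render(program_path: str) -> str` signature and the `.kv-sol` scoping class come from this PR; the assumption that the evolved program exposes a `circles()` function returning `(x, y, r)` tuples is hypothetical and stands in for whatever interface a real task defines.

```python
import html
import runpy


def render(program_path: str) -> str:
    """Return a self-contained SVG fragment for the program at program_path."""
    # Hypothetical convention: the program defines circles() returning
    # (x, y, r) tuples inside the unit square. A real task will differ.
    namespace = runpy.run_path(program_path)
    circles = namespace["circles"]()

    parts = [
        '<div class="kv-sol">',
        '<svg viewBox="0 0 1 1" width="320" height="320">',
    ]
    for x, y, r in circles:
        # A <title> child gives a native hover tooltip without any JS.
        tip = html.escape(f"r={r:.4f} center=({x:.4f}, {y:.4f})")
        parts.append(
            f'<circle cx="{x}" cy="{1 - y}" r="{r}" fill="none" '
            f'stroke="black" stroke-width="0.004"><title>{tip}</title></circle>'
        )
    parts.append("</svg></div>")
    return "\n".join(parts)
```

The fragment carries no external dependencies, so the viewer can inline it as-is; the `1 - y` flip maps mathematical coordinates onto SVG's downward y-axis.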
Execution model (kaievolve/viewer/solution.py): lazy at view-time, run in
a sandboxed subprocess with a timeout (it execs evolved code; the viewer
binds to localhost), and cached to best/solution_viz.html, invalidated
when the program is rewritten. This works retroactively on existing runs
since it only needs the stored best_program.py. The visualizer is resolved
by task-dir name under --tasks roots (default examples/), or dropped
directly in the bench task dir.
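The render-and-cache cycle described above can be sketched roughly as follows. This is not the code in `kaievolve/viewer/solution.py`; the function name `solution_fragment`, the mtime-based staleness check, and the `kv-sol-error` class are assumptions made for illustration, and a plain subprocess with a timeout is only process isolation, not a full sandbox.

```python
import html
import subprocess
import sys
from pathlib import Path

# Child-process script: load visualize.py and print render(program_path).
_RUNNER = (
    "import runpy, sys\n"
    "viz = runpy.run_path(sys.argv[1])\n"
    "sys.stdout.write(viz['render'](sys.argv[2]))\n"
)


def _error(message: str) -> str:
    # Errors are reported as a fragment instead of being raised.
    return f'<pre class="kv-sol-error">{html.escape(message)}</pre>'


def solution_fragment(visualize_py, program_py, cache_html,
                      timeout=30.0, force=False) -> str:
    visualize_py, program_py, cache_html = (
        Path(visualize_py), Path(program_py), Path(cache_html))
    if not program_py.exists():
        return _error(f"missing program: {program_py}")

    # Assumed invalidation rule: the cache is stale once the program is newer.
    fresh = (cache_html.exists()
             and cache_html.stat().st_mtime >= program_py.stat().st_mtime)
    if fresh and not force:
        return cache_html.read_text(encoding="utf-8")

    try:
        proc = subprocess.run(
            [sys.executable, "-c", _RUNNER, str(visualize_py), str(program_py)],
            capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return _error(f"visualizer timed out after {timeout}s")
    if proc.returncode != 0:
        return _error(proc.stderr)

    cache_html.parent.mkdir(parents=True, exist_ok=True)
    cache_html.write_text(proc.stdout, encoding="utf-8")
    return proc.stdout
```

Failed renders are deliberately not cached, so fixing a broken visualizer takes effect on the next view without a forced refresh.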
Two reference visualizers, both interactive, vanilla JS + inline SVG, no
deps, CSS scoped under .kv-sol:
- packing_circles_max_sum_of_radii: the packing in the unit square,
hover a circle for radius/center/share, scroll-zoom, drag-pan, shading
by radius.
- autocorrelation_C1: the step function f plus its autoconvolution f*f
with the score-driving peak marked; hover to read any step.
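For the `autocorrelation_C1` picture, the curve and its marked peak could be computed along these lines. The sketch assumes `f` is given as a list of step heights on equal-width intervals (the PR does not state the representation) and does not reproduce the task's scoring formula; it only locates the maximum of `f*f`. Because `f*f` is piecewise linear between multiples of the step width, its maximum over the whole support is attained at one of the knots computed here.

```python
from typing import List, Tuple


def autoconvolution_knots(heights: List[float], width: float = 1.0) -> List[float]:
    """Values of f*f at the interior knots (k+1)*width, k = 0..2n-2.

    f is the step function with heights[i] on [i*width, (i+1)*width).
    At those knots, (f*f) equals width * sum_{i+j=k} heights[i]*heights[j].
    """
    n = len(heights)
    out = [0.0] * (2 * n - 1) if n else []
    for i, hi in enumerate(heights):
        for j, hj in enumerate(heights):
            out[i + j] += width * hi * hj
    return out


def peak(values: List[float]) -> Tuple[int, float]:
    """Index and value of the largest knot (first one on ties)."""
    idx = max(range(len(values)), key=lambda k: values[k])
    return idx, values[idx]


knots = autoconvolution_knots([1.0, 2.0, 1.0])
print(knots)        # [1.0, 4.0, 6.0, 4.0, 1.0]
print(peak(knots))  # (2, 6.0)
```

The double loop is quadratic in the number of steps, which is fine for a view-time render of a few hundred steps; a larger input would call for an FFT-based convolution instead.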
Plus: /setup/{label}/run/{idx}/solution route (graceful when no visualizer
exists), a dashboard "see the solution" button shown only when one is
found, a --tasks CLI flag, and skills/visualization/SKILL.md documenting
the contract.
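The discovery rule behind the `--tasks` flag (a `visualize.py` dropped directly in the bench task dir, or one found by task-dir name under the task roots) might be sketched like this. The function name `find_visualizer` and the choice to check the direct location first are assumptions; the PR does not state the precedence.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_visualizer(task_dir, tasks_roots: Iterable = ("examples",)) -> Optional[Path]:
    """Return the path to a task's visualize.py, or None if there is none."""
    task_dir = Path(task_dir)

    # 1. A visualize.py sitting directly in the bench task dir.
    direct = task_dir / "visualize.py"
    if direct.is_file():
        return direct

    # 2. Otherwise match by task-dir name under each root (assumed order).
    for root in tasks_roots:
        candidate = Path(root) / task_dir.name / "visualize.py"
        if candidate.is_file():
            return candidate

    # 3. Nothing found: the caller can hide the button and show a message.
    return None
```

Returning `None` rather than raising lets the route and the dashboard button degrade gracefully when a task has no visualizer.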
Tests: test_solution.py covers visualizer discovery, the render+cache
cycle (incl. cache invalidation and error reporting), and both reference
visualizers; test_viewer.py covers the route present/absent paths. Full
suite green (333 tests).
Viewer roadmap item E. Adds a "see the solution" view — an interactive picture of what a run's best program actually produced, modeled on AlphaEvolve's solution panel.
The design (as agreed)
A task ships an optional `visualize.py` beside its `evaluator.py`: the visualizer is a per-task input, exactly parallel to the evaluator. It exposes one function, `render(program_path: str) -> str`, which returns a self-contained HTML/SVG fragment.

Execution: lazy at view-time + cached. Since solutions aren't stored (only code + metrics), the viewer re-runs the best program through `visualize.py` in a sandboxed subprocess with a timeout (it execs evolved code; the viewer binds to localhost only), and caches the fragment to `best/solution_viz.html` (invalidated when the program is rewritten). First render ~2–5s, cached ~0.04s. Works retroactively on all existing runs, since it only needs the stored `best_program.py`.

Discovery: by task-dir name under `--tasks` roots (default `examples/`), or a `visualize.py` dropped directly in the bench task dir.

Two reference visualizers (interactive, vanilla JS + inline SVG, no deps, CSS scoped under `.kv-sol`)
- `packing_circles_max_sum_of_radii`: the packing in the unit square; hover a circle for radius/center/share, scroll-zoom, drag-pan, shading by radius.
- `autocorrelation_C1`: the step function `f` plus its autoconvolution `f∗f` with the score-driving peak marked in red; hover to read any step's height.

Also
- `/setup/{label}/run/{idx}/solution` route (graceful "no visualizer" message when absent).
- `--tasks` CLI flag on `kai viewer`.
- `skills/visualization/SKILL.md` documenting the contract for new tasks.

Verification
- `test_solution.py` (9): visualizer discovery (direct / by-name / none), render+cache cycle (incl. cache invalidation, force, error-reported-not-raised, missing program), and both reference visualizers render hermetically off the committed `initial_program.py`s.
- `test_viewer.py` (+2): solution route present/absent paths and the conditional dashboard button.

🤖 Generated with Claude Code