Summary
Add an agn replay command that reruns a previously recorded AGN run by replaying the concrete tool actions it performed, such as write_file, read_file, shell commands, and patches. The replay should be based on what AGN actually did, not on the original chat/messages.
This allows exactly the same changes/actions from a prior run to be applied again deterministically.
Task
Implement replay support with entry points such as:
agn replay --conversation <conversation-id>
agn replay --environment <environment-id>
The implementation should support recording enough structured execution data during an AGN run to later replay the meaningful side-effecting actions. At minimum, the replay data should capture the ordered tool/action calls, relevant inputs, outputs or status where needed, and enough metadata to identify and validate the target workspace/environment.
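As a rough illustration of what such a recording could look like, the sketch below models a run as an ordered list of action records plus workspace metadata, serialized to JSON. Every name here (RecordedAction, RunRecord, the field names, the status values) is hypothetical and chosen only for this example; nothing in it reflects an existing AGN format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List, Optional


@dataclass
class RecordedAction:
    """One tool call as it was actually executed (hypothetical schema)."""
    seq: int                       # position in the original run
    tool: str                      # e.g. "write_file", "read_file", "shell", "patch"
    inputs: Dict[str, Any]         # arguments passed to the tool
    status: str = "ok"             # status observed while recording
    output: Optional[str] = None   # captured output, kept only where needed


@dataclass
class RunRecord:
    """Ordered actions plus metadata identifying the original workspace."""
    conversation_id: str
    environment_id: Optional[str]
    workspace_root: str
    actions: List[RecordedAction] = field(default_factory=list)

    def append(self, tool, inputs, status="ok", output=None):
        # The sequence number is assigned from the current length,
        # so the recorded order is explicit in the data.
        action = RecordedAction(len(self.actions), tool, inputs, status, output)
        self.actions.append(action)
        return action

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "RunRecord":
        data = json.loads(text)
        actions = [RecordedAction(**a) for a in data.pop("actions")]
        return cls(actions=actions, **data)
```

Storing an explicit seq on each action, rather than relying on list position alone, is one way to keep the original order verifiable after the record has been stored or transferred.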
Use Cases
Repeat the same run: Re-run the same sequence of actions again and again without re-prompting the model or relying on message history.
Apply environment-tested changes: Run AGN in a disposable environment, inspect/approve the resulting behavior, then replay the recorded actions against another target such as the real working directory.
Acceptance Criteria
AGN records the concrete actions/tool calls performed during a run in a replayable format.
agn replay --conversation <conversation-id> replays actions from a recorded conversation/run.
agn replay --environment <environment-id> replays actions associated with a disposable environment run.
Replay applies the recorded actions in the original order.
Replay does not depend on the original prompt/messages to regenerate behavior.
Replay has clear behavior for reads, writes, patches, shell commands, failures, and mismatched preconditions.
Replay should provide useful output showing which actions were replayed and whether each succeeded or failed.
Documentation explains the two main workflows: deterministic repeat runs and applying changes that were first tested in an isolated environment.
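The ordering and reporting criteria above can be sketched as a small replay loop. This is a minimal sketch under several assumptions that the issue leaves open: actions are plain dicts with hypothetical seq, tool, and inputs keys, read_file is skipped, shell commands run only when explicitly enabled, and replay stops at the first failure. The function name replay and the status strings are illustrative, not an existing AGN API.

```python
import subprocess
from pathlib import Path


def replay(actions, workspace, run_shell=False):
    """Apply recorded actions in recorded order.

    Returns a list of (seq, tool, status, detail) tuples, one per attempted action.
    """
    root = Path(workspace)
    results = []
    for action in sorted(actions, key=lambda a: a["seq"]):
        tool, inputs = action["tool"], action["inputs"]
        try:
            if tool == "write_file":
                target = root / inputs["path"]
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_text(inputs["content"])
                status, detail = "ok", f"wrote {inputs['path']}"
            elif tool == "read_file":
                # Assumption: non-mutating actions are skipped, not validated.
                status, detail = "skipped", "non-mutating"
            elif tool == "shell":
                if not run_shell:
                    status, detail = "skipped", "shell replay disabled"
                else:
                    proc = subprocess.run(
                        inputs["command"], shell=True, cwd=root,
                        capture_output=True, text=True,
                    )
                    status = "ok" if proc.returncode == 0 else "failed"
                    detail = f"exit code {proc.returncode}"
            else:
                status, detail = "failed", f"unknown tool {tool!r}"
        except (OSError, KeyError) as exc:
            status, detail = "failed", str(exc)
        results.append((action["seq"], tool, status, detail))
        if status == "failed":
            break  # Assumption: stop at the first failure.
    return results
```

Printing each returned tuple would give the per-action success or failure output that the criteria ask for. Whether replay should stop or continue after a failure is a design decision this sketch does not settle.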
Open Questions
Should replay be fully automatic, interactive/confirming each side-effecting step, or support both modes?
Should read_file and other non-mutating actions be replayed, skipped, or validated against expected prior outputs?
How should shell commands be classified and guarded, especially destructive or environment-specific commands?
Should replay validate file preconditions/hashes before applying writes or patches?
What should happen if the target workspace differs from the original environment?
Should the CLI option be named --environment rather than --environement?
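To make the precondition question concrete, here is one possible shape for a hash check, assuming the recorder stored a SHA-256 digest of each target file as it existed before the original write (or None if the file did not exist). The helper names file_digest and check_precondition are hypothetical, and this is only one option among those the questions above leave open.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def check_precondition(path, expected_digest):
    """Compare the target's current digest with the recorded one.

    Returns (matches, actual_digest) so the caller can report a mismatch.
    """
    actual = file_digest(path)
    return actual == expected_digest, actual
```

A replay could call check_precondition before each write or patch and then refuse, warn, or ask for confirmation on a mismatch. Which of those applies depends on how the questions about interactive mode and differing workspaces are answered.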
References
Support AGN disposable Docker environments #7 — Describes AGN disposable Docker environments. Replay should integrate with this workflow by allowing a run performed safely in an isolated environment to be replayed/applied later outside that environment after inspection.