AgentOps Workbench is a local observability and audit tool for AI coding-agent runs. It helps teams understand what an agent did, what it changed, what evidence supports its final answer, and where the run created risk.
It is built for post-hoc review of Claude Code, Codex, PAI/KAI-style, and other coding-agent workflows through a shared JSONL event schema.
- Latest published release: v3.1.0, with one-command install: standalone binaries via curl | sh (no Bun, no clone)
- Current main: tracks the latest release
- Capabilities: stable local review workflow with simplified product commands, guided first-run setup, first-class Codex and Claude Code capture commands, forensic plain-text import, deterministic quality gates for CI/PR workflows, read-only MCP session/report lookup, OpenInference-style JSON span export, decision-quality dashboard views, documented compatibility for schemas, adapters, CLI commands, config, reports, exports, migrations, privacy defaults, and release smoke coverage
- Runtime model: local CLI, local SQLite, stdout reports
- Distribution model: standalone self-contained binaries (macOS/Linux, arm64/x64) via the curl | sh installer or release download; source clone with Bun for development; npm publication still deferred
- Native Codex exec JSONL ingestion: implemented
- Native Claude Code stream JSON ingestion: implemented with synthetic fixture coverage
AI coding agents can execute long, high-impact workflows across files, shell commands, MCP tools, tests, and external systems. The transcript usually contains the truth, but it is hard to inspect after the fact.
Engineering leaders need a compact answer to:
- What did the agent do?
- Which files and commands were involved?
- Did it run tests or only claim success?
- Did it touch risky paths or expose secrets?
- How long did it take and how much did it cost?
- Where did it retry, stall, or change direction?
- Is the output good enough to trust?
Download the standalone binary — no Bun, no clone, no PATH setup:
curl -fsSL https://raw.githubusercontent.com/DevenDucommun/agentops-workbench/main/install.sh | sh
agentops --help
The installer detects your OS/arch (macOS and Linux, arm64/x64), downloads the
matching binary from the latest release,
and installs it to /usr/local/bin (override with AGENTOPS_INSTALL_DIR). You
can also grab a binary from the release page directly. The binary is
self-contained: the Bun runtime and SQLite are bundled in.
Then try it on synthetic fixtures:
agentops init
agentops demo
agentops look
agentops check
agentops open
Requirements for running from source: Bun and Git.
git clone https://github.com/DevenDucommun/agentops-workbench.git
cd agentops-workbench
bun install
export PATH="$PWD/bin:$PATH" # so you can type `agentops` instead of ./bin/agentops
agentops init
agentops demo
agentops look
agentops check
agentops save
agentops open
The agentops command is the repo's bin/agentops (a Bun script). The PATH
line above makes it available in the current shell; add it to your shell profile
to keep it. Without it, run the binary directly as ./bin/agentops <command>.
For a no-surprises demo, inspect generated synthetic artifacts in docs/demo, or regenerate them locally:
bun run demo:artifacts
bun run smoke:demo-artifacts
For a new audited run:
agentops run codex "review the current diff"
agentops look
or:
agentops run claude "review the current diff"
agentops look
For after-the-fact review, import an existing machine-readable JSONL artifact:
agentops audit path/to/session.jsonl
To create an auditable artifact without the AgentOps wrapper, run the provider in its machine-readable mode:
codex exec --json "review the current diff" > codex-session.jsonl
claude -p --output-format stream-json --verbose "review the current diff" > claude-session.jsonl
Plain terminal output and copied chat text can be imported for best-effort forensic review:
agentops audit path/to/transcript.txt
Forensic text imports are lower-fidelity than provider JSONL. Reports label the
adapter as forensic-text, mark shell-prompt commands as observed, mark
narrative command/file mentions as inferred, and flag weak transcripts that
do not include observable commands.
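The observed/inferred distinction can be illustrated with a small sketch. This is not AgentOps' parser: the type names, the prompt pattern, and the narrative pattern below are all hypothetical, chosen only to show how a command typed at a shell prompt might be separated from a command that the text merely mentions.

```typescript
// Hypothetical types and heuristics; AgentOps' real forensic-text adapter may differ.
type Confidence = "observed" | "inferred";

interface CommandMention {
  command: string;
  confidence: Confidence;
}

// Assumption: a line starting with "$ " or "% " is a shell prompt.
const PROMPT = /^\s*[$%]\s+(.+)$/;
// Assumption: narrative mentions look like: ran `some command`.
const NARRATIVE = /\b(?:ran|running|executed)\s+`([^`]+)`/gi;

function extractCommands(transcript: string): CommandMention[] {
  const found: CommandMention[] = [];
  for (const line of transcript.split(/\r?\n/)) {
    const prompt = PROMPT.exec(line);
    if (prompt) {
      // The command appears at a prompt, so it was observed directly.
      found.push({ command: (prompt[1] ?? "").trim(), confidence: "observed" });
      continue;
    }
    // Otherwise, any command named in prose is only inferred.
    for (const m of Array.from(line.matchAll(NARRATIVE))) {
      found.push({ command: (m[1] ?? "").trim(), confidence: "inferred" });
    }
  }
  return found;
}

// A transcript is "weak" when nothing in it was directly observed.
function isWeakTranscript(commands: CommandMention[]): boolean {
  return !commands.some((c) => c.confidence === "observed");
}
```

Under these assumptions, a transcript that contains only a final claim such as "All tests pass." yields no observed commands and would be flagged as weak.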
The local dashboard reads from SQLite and surfaces a merge-readiness decision, claim-vs-evidence checks, and a risk drilldown for each session. The synthetic demo fixtures exercise three decision states:
- Ready: verification evidence present, no blocking risks
- Needs review: at least one risk to look at before merging
- Blocked: high-severity risks or unsupported success claims
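As a rough illustration of how the three states relate, the sketch below maps a session's evidence to a decision. The types, field names, and the exact precedence of the rules are assumptions made for this example; the real dashboard logic may weigh risks differently.

```typescript
// Hypothetical model of a session's evidence; not AgentOps' actual schema.
type Severity = "low" | "medium" | "high";
type Decision = "ready" | "needs-review" | "blocked";

interface SessionEvidence {
  risks: Severity[];                // severities of the flagged risks
  hasVerificationEvidence: boolean; // e.g. an observed test run
  claimsSuccess: boolean;           // the final answer claims success
}

function decide(session: SessionEvidence): Decision {
  // A success claim without verification evidence is an unsupported claim.
  const unsupportedClaim = session.claimsSuccess && !session.hasVerificationEvidence;
  if (session.risks.includes("high") || unsupportedClaim) return "blocked";
  // Assumption: any remaining risk, or missing verification, needs a human look.
  if (session.risks.length > 0 || !session.hasVerificationEvidence) return "needs-review";
  return "ready";
}
```

The ordering matters in this sketch: blocking conditions are checked first, so a session with both a high-severity risk and verification evidence still ends up blocked.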
Reproduce these states locally:
agentops audit ./fixtures/sample-session.jsonl
agentops audit ./fixtures/needs-review-session.jsonl
agentops audit ./fixtures/risky-session.jsonl
agentops open
Generate a repo-aware PR report:
agentops save pr
Expose local AgentOps evidence to MCP clients:
agentops mcp
Check public-readiness hygiene:
agentops scan-publication
Validate large synthetic-session performance:
bun run smoke:large-session
Validate tracked synthetic demo artifacts:
bun run smoke:demo-artifacts
See Install above for the standalone binary (recommended) and
Run From Source for the Bun clone path. Full
details (PATH usage, bun link, release-archive caveats, and packaging) are in
docs/INSTALLATION.md.
Regular workflow:
agentops init
agentops demo
agentops run codex "review the current change"
agentops run claude "review the current change"
agentops audit ./fixtures/sample-session.jsonl
agentops status
agentops look
agentops check
agentops save
agentops open
agentops save writes a local review bundle with default filenames:
- agentops-report.md
- agentops-pr-comment.md
- agentops-gate.json
- agentops-session.json
Specific saves are available when needed:
agentops save report
agentops save pr
agentops save json
agentops save json --repo
agentops save json --format openinference
agentops check --save
Advanced commands adapters, config, sessions, and scan-publication
remain available. There are two intents: launch a new run (agentops run,
with --no-ingest to write the artifact only) and review an existing artifact
(agentops audit, with --quiet to ingest only). The v1.x
review|report|export|gate|repo-report|pr|inspect|dashboard|ingest|show
commands were removed in v2.0.0; capture/import and the
save repo-json|trace|gate kinds were folded into flags in v3.0.0. See
CLI reference and Compatibility policy.
See Compatibility policy for the stable v3.0.0
surfaces and experimental boundaries.
agentops mcp starts a local stdio MCP server for read-only lookup of stored
sessions, inspection output, session reports, quality gates, and repo reports.
It does not ingest artifacts, run agents, post to GitHub, or read private
transcript stores.
See MCP server for available tools and client configuration.
AgentOps currently ingests normalized post-hoc JSONL exports plus native Claude Code and Codex CLI event streams:
- agentops-jsonl: canonical agentops.event.v1 JSONL, any sanitized export (Claude Code, Codex, PAI/KAI, ...); provenance is preserved in each record's source field
- claude-code-stream-json: native claude -p --output-format stream-json JSONL stream
- codex-exec-jsonl: native codex exec --json JSONL stream
- forensic-text: best-effort plain terminal transcript or copied coding-agent text
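To make the idea of adapter auto-detection concrete, here is a minimal sketch that guesses an adapter from the first non-empty line of an artifact. Everything about the matching rules is an assumption: the `schema` field on canonical records is hypothetical, the event shapes used to recognize the two native streams reflect commonly seen provider output rather than anything this document specifies, and falling back to forensic-text for unrecognized input is a simplification.

```typescript
// The adapter names come from the list above; the detection rules are assumptions.
type AdapterId =
  | "agentops-jsonl"
  | "claude-code-stream-json"
  | "codex-exec-jsonl"
  | "forensic-text";

function detectAdapter(content: string): AdapterId {
  const firstLine = content.split(/\r?\n/).find((l) => l.trim().length > 0);
  if (firstLine === undefined) return "forensic-text";

  let record: unknown;
  try {
    record = JSON.parse(firstLine);
  } catch {
    // Not JSON at all: treat it as a plain-text transcript.
    return "forensic-text";
  }
  if (typeof record !== "object" || record === null) return "forensic-text";
  const r = record as Record<string, unknown>;

  // Hypothetical marker field for canonical records.
  if (r["schema"] === "agentops.event.v1") return "agentops-jsonl";

  const t = r["type"];
  const type = typeof t === "string" ? t : "";
  // Assumed shape: Claude Code streams open with a system/init event.
  if (type === "system" && r["subtype"] === "init") return "claude-code-stream-json";
  // Assumed shape: Codex exec events use dotted types such as "thread.started".
  if (/^(thread|turn|item)\./.test(type)) return "codex-exec-jsonl";

  return "forensic-text";
}
```

A real detector would likely inspect more than one line and report why it chose an adapter; the adapters command described later in this document is the supported way to see what AgentOps actually detects.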
agentops run launches Codex or Claude Code and ingests the result. Add
--no-ingest to write the native JSONL artifact without ingesting it:
agentops run codex "summarize the repo risk areas"
agentops run claude "review the current change"
agentops run codex "summarize the repo risk areas" --no-ingest
Raw captures are written under .agentops/captures/ by default and should be
reviewed before publishing or turning into fixtures.
PAI-compatible post-hoc exports use the same canonical JSONL schema and are
auto-detected as agentops-jsonl:
agentops audit ./fixtures/pai-export-session.jsonl --quiet
agentops look
agentops save report
Synthetic Claude Code and Codex exports are the same canonical AgentOps JSONL,
distinguished only by their source field:
agentops audit ./fixtures/claude-code-session.jsonl --quiet
agentops audit ./fixtures/claude-code-stream-session.jsonl --quiet
agentops audit ./fixtures/codex-session.jsonl --quiet
agentops audit ./fixtures/codex-exec-session.jsonl --quiet
agentops adapters --input ./fixtures/codex-session.jsonl
The claude-code-session and codex-session fixtures are canonical
agentops-jsonl export examples (source: claude-code / source: codex). The
codex-exec-session fixture represents the native codex exec --json stream
shape with synthetic data.
The claude-code-stream-session fixture represents the native
claude -p --output-format stream-json --verbose stream shape with synthetic
data.
Forensic text import is intentionally narrower than transcript-store scraping:
agentops audit ./fixtures/forensic-terminal-transcript.txt
agentops audit ./fixtures/forensic-final-only.txt
agentops audit ./fixtures/forensic-codex-final-output.txt
agentops audit ./fixtures/forensic-claude-text-output.txt
Use it for saved terminal output or copied chat text when JSONL is unavailable. It can infer commands, files, and final claims, but missing evidence remains missing evidence. Raw Claude/Codex private transcript-file parsing remains out of scope.
To inspect adapter detection:
agentops adapters --input ./fixtures/codex-session.jsonl
agentops adapters --input ./fixtures/claude-code-stream-session.jsonl
agentops adapters --input ./fixtures/codex-exec-session.jsonl
AgentOps is local-first by design:
- The default SQLite database lives at .agentops/agentops.db.
- .agentops/, .agents/, local databases, and env files are ignored by git.
- Raw payload storage is disabled by default.
- Raw payload hashes are stored by default.
- Redaction runs before storage by default.
- Public fixtures are synthetic.
- agentops scan-publication provides a baseline public-readiness check.
- Forensic imports may contain shell prompts, local paths, environment output, copied secrets, or account identifiers. Keep real transcripts under ignored local paths until redaction has been reviewed.
Override the database path when needed:
AGENTOPS_DB=/path/to/agentops.db agentops sessions
Current user-facing docs:
- Installation (includes packaging strategy)
- CLI reference (includes the repo report / save pr)
- Capture guide
- Dashboard
- MCP server
- Quality gates
- Exports
- Configuration
- Demo artifacts
Architecture and compatibility:
- Architecture
- Compatibility policy
- Event schema (includes standards mapping)
- Adapter strategy (includes the hook envelope shape)
- Publication and privacy plan
Release docs:
- Release checklist (includes the release flow)
- Changelog
Historical planning, research, and the Spec-Kit MVP artifacts live under docs/archive: roadmaps, project brief, preliminary plan, research landscape, native adapter research, PAI integration plan, design decisions, pre-1.0 release records, and the Spec-Kit constitution/spec/plan/tasks.
A session report covers:
- Session summary
- Timeline of major actions
- Files touched
- Commands run
- Tests and verification evidence
- Risk flags
- Stalls/retries/loops
- Cost/token summary, when available
- Final outcome assessment
Out of scope:
- Hosted SaaS
- Multi-user auth
- Full distributed-trace / OTLP-style waterfall UI (the local decision dashboard is a supported feature)
- Model benchmarking
- Deep semantic evals
- Direct modification of agent behavior
- Raw Claude Code transcript-file parsing
Stack and design choices:
- TypeScript + Bun
- SQLite for local storage
- Markdown report output first
- Local decision dashboard as a supported first-class surface
- Adapter-based ingestion for Claude Code, KAI, and future runners
bun install --frozen-lockfile
bun run ci
To use the exact agentops command during local development, put the repo's bin directory on your PATH:
export PATH="$PWD/bin:$PATH"
agentops audit ./fixtures/sample-session.jsonl
agentops look
agentops check
agentops save

