糊弄 / HulunGuard

HulunGuard is a proof-first reliability guard and desktop risk meter for long-running AI agent work.

它不是“AI 文风检测器”。它监控的是智能体是否正在失去可验证的任务执行能力：目标漂移、证据不足、上下文断裂、空转总结、工具失败后继续下结论。

Public examples are synthetic. Do not publish private conversation logs, credentials, customer files, or production records into the repository when using HulunGuard on real work.

Security policy: SECURITY.md. Private vulnerability reports should use the GitHub advisory flow: https://github.com/Felix-Zuo/HulunGuard/security/advisories/new

What Works Now

Desktop HulunGauge: small always-on-top progress bar, click-drag to move, double-click to close.
Per-conversation monitors: each conversation gets its own monitor ID.
Group/project board: active conversations are grouped and averaged into project-level risk.
Local state ledger: .hulun/state.json, .hulun/resume.md, .hulun/risk.json, .hulun/verification_report.md.
Universal startup prompt: #HULUN_ON for any agent that can run shell commands.
OpenClaw hook: injects HulunGuard guidance into OpenClaw agent bootstrap.
Realtime HulunIndex observations: record phase, claims, failures, tokens, cost, latency, and retry fingerprints.
Privacy-safe trace ingestion: import generic JSON/JSONL, OpenTelemetry GenAI, OpenInference, OpenHands-like events, SWE-agent-like trajectories, LangGraph stream parts, LangSmith run exports, Langfuse OTEL traces, Phoenix/OpenInference spans, and Phoenix CLI trace exports without persisting raw sensitive payloads by default.
Native service export: explicitly export bounded LangSmith run-query results or Langfuse Observations API v2 rows into a redacted local file, then inspect with trace-doctor and import.
Runtime payload bridge: SDK, MCP, and stdin ingestion can queue in-memory spans or stream events without writing trace files first.
Local HTTP collector: accept live OTLP/HTTP JSON traces at /v1/traces and adapter payloads at /ingest/<format> into the durable queue, with offline operations status, Prometheus metrics, alert rules, service templates, and lifecycle controls for long-running managed mode.
Python SDK and MCP server: agents can record runtime state directly without shell glue.
Built-in validation suite: run synthetic healthy/slop-risk scenarios before release.
Product operations: onboard, quickstart, doctor, trace-doctor, compatibility, integration-kit, adapter-matrix, schema-check, release-verify, cleanup, and benchmark commands for onboarding, trace diagnostics, agent compatibility, first-run integration packages, adapter integration, schema compatibility, release verification, retention cleanup, scan performance, and public-safe real-world workflow checks.
Conversation runtime monitoring: per-conversation events, user challenges, pending tool calls, unresolved failures, unsupported final claims, and monitor sync.

Quick Start

Prove a supported agent path with one command:

python .\hulun.py onboard --agent langgraph
python .\hulun.py onboard --agent all --force

Print a copy-paste startup path:

python .\hulun.py quickstart

Open a desktop bar:

python .\hulun.py open --conversation "Codex task" --group "Demo Project" --widget

Update a monitor:

python .\hulun.py update --id M1 --score 72 --summary "Tool failed and no evidence yet" --reason "unresolved failure"
python .\hulun.py update --id M1 --delta -30 --summary "Tests passed and evidence was recorded"

Record a realtime agent observation and immediately rescan the slop index:

python .\hulun.py observe --type final_attempt --phase final --summary "Everything is completed and verified" --claim "completed and verified" --scan
python .\hulun.py observe --type tool_result --phase verify --result fail --summary "pytest failed" --action-key "pytest" --scan
python .\hulun.py observe --type llm_call --phase summarize --summary "Long summary without evidence" --prompt-tokens 9000 --completion-tokens 5000 --cost 6.5 --latency-ms 70000 --scan

Start a true conversation runtime monitor:

python .\hulun.py conversation start --name "Codex live task" --group "HulunGuard" --monitor --widget
python .\hulun.py conversation event --id C1 --type user_challenge --summary "User challenged whether monitoring is actually live"
python .\hulun.py conversation event --id C1 --type tool_call --phase verify --summary "Run pytest" --action-key pytest
python .\hulun.py conversation event --id C1 --type tool_result --phase verify --summary "pytest passed" --action-key pytest
python .\hulun.py conversation scan --id C1

Conversation monitoring is not magic chat-log access. It becomes live when the agent or adapter records runtime events as they happen.

Import an external trace:

python .\hulun.py trace-doctor --file .\trace.jsonl --json
python .\hulun.py ingest --file .\trace.jsonl --format generic --scan
python .\hulun.py ingest --file .\otel-trace.json --format opentelemetry --scan
python .\hulun.py ingest --file .\openinference-trace.json --format openinference --scan
python .\hulun.py ingest --file .\openhands-events.json --format openhands --scan
python .\hulun.py ingest --file .\run.traj --format swe-agent --scan
python .\hulun.py ingest --file .\langgraph-stream.json --format langgraph --scan
python .\hulun.py ingest --file .\langsmith-runs.json --format langsmith --scan
python .\hulun.py ingest --file .\langfuse-otel.json --format langfuse --scan
python .\hulun.py ingest --file .\phoenix-openinference.json --format phoenix --scan
python .\hulun.py ingest --file .\phoenix-trace.json --format auto --scan
python .\hulun.py ingest --file .\openai-agents-trace.json --format openai-agents --scan
python .\hulun.py export-otel --output .\hulun-otel.json

For an empty project, add --init-if-missing to create the minimal local ledger before import.

Export a bounded LangSmith run slice:

$env:LANGSMITH_API_KEY = "<key>"
python .\hulun.py service-export langsmith --project-id "<project-id>" --api-key-env LANGSMITH_API_KEY --output .\langsmith-runs.json --json
python .\hulun.py trace-doctor --format langsmith --file .\langsmith-runs.json --json
python .\hulun.py ingest --format langsmith --file .\langsmith-runs.json --scan --init-if-missing

Export a bounded Langfuse observation slice:

$env:LANGFUSE_PUBLIC_KEY = "<public-key>"
$env:LANGFUSE_SECRET_KEY = "<secret-key>"
python .\hulun.py service-export langfuse --public-key-env LANGFUSE_PUBLIC_KEY --secret-key-env LANGFUSE_SECRET_KEY --from-start-time "2026-06-25T00:00:00Z" --to-start-time "2026-06-25T01:00:00Z" --output .\langfuse-observations.json --json
python .\hulun.py trace-doctor --format generic --file .\langfuse-observations.json --json
python .\hulun.py ingest --format generic --file .\langfuse-observations.json --scan --init-if-missing

Import a Phoenix CLI trace export:

px trace list --limit 100 --last-n-minutes 60 --format raw --file .\phoenix-trace.json
python .\hulun.py trace-doctor --format auto --file .\phoenix-trace.json --json
python .\hulun.py ingest --format auto --file .\phoenix-trace.json --scan --init-if-missing

LangSmith and Langfuse service exports require explicit credentials and are documented in docs/SERVICE_EXPORTS.md.

Queue high-frequency runtime events and flush them in batches:

python .\hulun.py batch enqueue --type tool_result --phase verify --summary "pytest passed" --result pass
python .\hulun.py batch ingest-file --file .\trace.jsonl --format generic
'{"events":[{"type":"tasks","event_type":"tool_result","phase":"verify","summary":"pytest passed","result":"pass","action_key":"pytest"}]}' | python .\hulun.py batch ingest-stdin --format langgraph
python .\hulun.py batch status
python .\hulun.py batch flush --limit 500 --scan

batch writes a durable local JSONL queue first. ingest-stdin accepts JSON or JSONL from an agent process, shell pipe, or host runtime. flush moves queued observations into .hulun/state.json; flush --scan then recomputes the HulunIndex from those events.

Run a live local HTTP collector for OTLP/HTTP JSON or adapter payloads:

python .\hulun.py collector serve
python .\hulun.py collector serve --flush-interval-seconds 5 --scan-on-flush --init-if-missing
python .\hulun.py collector smoke --json
python .\hulun.py collector smoke --managed --scan --init-if-missing --json
python .\hulun.py collector shutdown-check --json
python .\hulun.py collector status --require-status-file --queue-pending-threshold 100 --json
python .\hulun.py collector metrics --require-status-file --queue-pending-threshold 100 --dead-letter-threshold 0
python .\hulun.py collector alert-rules --output .\.hulun\collector-alerts --force --json
python .\hulun.py collector service-template --output .\.hulun\collector-service --force --json
python .\hulun.py collector service-lifecycle --output .\.hulun\collector-service-lifecycle --force --json
python .\hulun.py batch flush --scan --init-if-missing

The collector listens on 127.0.0.1:4318 by default. POST /v1/traces accepts OTLP/HTTP JSON, while POST /ingest/<format> accepts adapter payloads such as generic, langgraph, langsmith, langfuse, phoenix, and openai-agents. Queue-only mode is the default. Managed mode periodically flushes queued observations and can update .hulun/risk.json automatically. collector shutdown-check verifies that a temporary collector can stop gracefully and write a final stopped runtime status. collector status reads queue, status, and risk files without opening a server and includes grouped diagnostics with operator actions for queue, status freshness, runtime lifecycle, dead letters, managed flush, and risk. collector metrics and GET /metrics export Prometheus health metrics without local paths as labels. collector alert-rules writes Prometheus alerting rule files for those metrics without installing them or changing Alertmanager routing. collector service-template generates systemd, launchd, and Windows Scheduled Task templates without installing them or embedding tokens. collector service-lifecycle generates reviewed install/start/stop/restart/status/uninstall controls for those service targets without running them. Non-loopback binds require --allow-remote --token.

By default, runtime observations and imported traces redact known secrets, emails, URL query strings, private home paths, and raw payload fields such as prompts, completions, outputs, and tool arguments. Use --include-sensitive --retention-days 7 only for trusted local debugging.

Check mainstream agent compatibility:

python .\hulun.py compatibility
python .\hulun.py compatibility --json

Generate a verified onboarding kit for an agent:

python .\hulun.py onboard --agent langgraph
python .\hulun.py integration-kit --agent langgraph --verify
python .\hulun.py integration-kit --agent all --output .\.hulun\integration-kits --force --verify

The observability surface is documented in docs/OBSERVABILITY.md. The security boundary and threat assumptions are documented in docs/THREAT_MODEL.md.

Run the release validation suite:

python .\hulun.py doctor --run-validation
python .\hulun.py validate
python .\hulun.py calibrate
python .\hulun.py calibration-drift
python .\hulun.py threat-model-check --json
python .\hulun.py compatibility --json
python .\hulun.py integration-kit --agent all --output .\.hulun\integration-kits --force --verify --json
python .\hulun.py onboard --agent all --output .\.hulun\onboarding --force --json
python .\hulun.py adapter-matrix --json
python .\hulun.py collector smoke --json
python .\hulun.py collector smoke --managed --scan --init-if-missing --json
python .\hulun.py collector shutdown-check --json
python .\hulun.py collector status --require-status-file --queue-pending-threshold 100 --json
python .\hulun.py collector metrics --require-status-file --queue-pending-threshold 100 --dead-letter-threshold 0
python .\hulun.py collector alert-rules --output .\.hulun\collector-alerts --force --json
python .\hulun.py collector service-template --output .\.hulun\collector-service --force --json
python .\hulun.py collector service-lifecycle --output .\.hulun\collector-service-lifecycle --force --json
@'
{"type":"tool_result","phase":"verify","summary":"stdin payload passed","result":"pass","action_key":"stdin-smoke"}
'@ | python .\hulun.py batch ingest-stdin --format generic --json
'{"type":"tool_result","phase":"verify","summary":"pytest passed","result":"pass","action_key":"pytest","refs":["command:pytest"]}' | Set-Content -Encoding UTF8 trace-doctor-sample.jsonl
python .\hulun.py trace-doctor --file trace-doctor-sample.jsonl --format generic --json
python .\hulun.py schema-check --json
python .\hulun.py cleanup --json
python .\hulun.py benchmark --events 10000
python .\hulun.py benchmark --suite real-world
python -m pytest -q
python -m build
python scripts/generate_release_metadata.py --verify --json
python scripts/verify_release_artifacts.py
python .\hulun.py release-verify --asset-dir .\dist --skip-attestation --json

Open the project board:

python .\hulun.py board --serve --open

Generate a prompt for any agent:

python .\hulun.py prompt --conversation "Claude research" --group "Market Research"

Paste the output beginning with #HULUN_ON into that agent conversation.

SDK And MCP

Python agents can integrate directly:

from hulun_guard import HulunGuardClient

client = HulunGuardClient(".")
client.init(
    objective="Ship an evidence-backed agent workflow",
    criteria=["final claims have verification evidence"],
)
client.observe(
    event_type="tool_result",
    phase="verify",
    summary="pytest passed",
    result="pass",
    source_platform="my-agent",
    action_key="pytest",
    scan=True,
)

MCP-capable hosts can run:

hulun-mcp --root .

See docs/SDK_AND_MCP.md for tool names, schemas, and privacy behavior.

Task Ledger Mode

For a concrete project root:

python .\hulun.py init --objective "Complete a long-running task" --criterion "Final result is evidence-backed"
python .\hulun.py add-step --text "Build the artifact"
python .\hulun.py record-evidence --kind test --summary "Tests passed" --command "python -m unittest discover -s tests"
python .\hulun.py scan
python .\hulun.py verify
python .\hulun.py dashboard

If Chrome cannot open a local .html path:

python .\hulun.py serve --open

Risk Bands

0-35 green: continue.
36-65 yellow: checkpoint or calibrate.
66-100 red: block final, recover state, or ask the user for missing evidence.

Signals:

Evidence gap.
Claim overhang.
Unfinished criteria.
Stagnation.
Unhandled failures.
Context decay.
Intent drift.
Phase disorder.
Retry loop.
Polish without progress.
Cost pressure.
Uncertainty without verification.

OpenClaw

Install the hook:

.\scripts\Install-OpenClawHook.ps1
openclaw hooks list --json

The hook should show hulunguard as eligible, loadable, enabled, and attached to agent:bootstrap.

Project Structure

src/hulun_guard/: Python package.
hulun.py: no-install CLI entry.
tools/: Windows wrappers.
scripts/verify_release_artifacts.py: clean-environment wheel and sdist smoke test.
scripts/generate_release_metadata.py: repository wrapper for release checksum and CycloneDX SBOM generation.
scripts/verify_github_release.py: repository wrapper for hulun release-verify.
tests/: end-to-end tests.
CODE_OF_CONDUCT.md: public participation and moderation boundary.
CONTRIBUTING.md: development and pull request standards.
integrations/openclaw/: OpenClaw hook.
candidate_skill/: Codex skill adapter.
docs/: usage and integration docs.
research/: source matrix and industrial design notes.
docs/ADAPTER_CONFORMANCE.md: supported adapter contract and unsupported-field policy.
docs/ADAPTER_MATRIX.md: adapter integration matrix and support tiers.
docs/AGENT_COMPATIBILITY.md: mainstream agent compatibility paths and bridge boundaries.
docs/SERVICE_EXPORTS.md: hosted service export commands, privacy boundary, and failure modes.
docs/INTEGRATION_KITS.md: first-run onboarding kits for supported agent runtimes and trace formats.
docs/ONBOARDING.md: zero-knowledge onboarding command, output contract, and safety boundary.
docs/THREAT_MODEL.md: local-first security model, privacy boundaries, and threat assumptions.
docs/SCHEMAS.md: public JSON schema compatibility and migration policy.
docs/RETENTION.md: local retention cleanup model and safety boundary.
docs/REAL_WORLD_BENCHMARKS.md: public-safe real-world benchmark suite and fixture policy.

Validation

python -m unittest discover -s tests
python .\hulun.py validate
python .\hulun.py calibration-drift
python .\hulun.py threat-model-check --json
python .\hulun.py compatibility --json
python .\hulun.py integration-kit --agent all --output .\.hulun\integration-kits --force --verify --json
python .\hulun.py onboard --agent all --output .\.hulun\onboarding --force --json
python .\hulun.py adapter-matrix --json
python .\hulun.py schema-check --json
python .\hulun.py cleanup --json
python .\hulun.py benchmark --events 10000
python .\hulun.py benchmark --suite real-world
python -m build
python scripts/generate_release_metadata.py --verify --json
python scripts/verify_release_artifacts.py
python -m hulun_guard release-verify --asset-dir dist --skip-attestation --json
python .\hulun.py --help
python .\hulun.py open --help
python .\hulun.py board --help

Name		Name	Last commit message	Last commit date
Latest commit History 52 Commits
.github		.github
candidate_skill/anti-slop-longrun		candidate_skill/anti-slop-longrun
docs		docs
integrations/openclaw/hulunguard		integrations/openclaw/hulunguard
research		research
scripts		scripts
src/hulun_guard		src/hulun_guard
tests		tests
tools		tools
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
PROJECT_DEFINITION.md		PROJECT_DEFINITION.md
README.md		README.md
RELEASING.md		RELEASING.md
SECURITY.md		SECURITY.md
hulun.py		hulun.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

糊弄 / HulunGuard

What Works Now

Quick Start

SDK And MCP

Task Ledger Mode

Risk Bands

OpenClaw

Project Structure

Validation

About

Uh oh!

Releases 43

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

糊弄 / HulunGuard

What Works Now

Quick Start

SDK And MCP

Task Ledger Mode

Risk Bands

OpenClaw

Project Structure

Validation

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 43

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages