Skip to content

ChuyueSun/inference-dalek

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

inference-dalek

LLM-based Verus formal verification agent for curve25519-dalek.

inference-dalek generates Verus contracts (requires / ensures), loop invariants, and termination proofs for Rust functions, then repairs verification failures through multi-turn LLM cycles. It targets cryptographic code in the curve25519-dalek family and uses a layered domain model (field → scalar → Edwards → Montgomery → Ristretto) so each function is proved against an accumulated set of already-verified lemmas.

What it does

Given a function with admit() or missing proof blocks, the pipeline:

  1. Spec generation — emits requires / ensures clauses from the function body and surrounding context.
  2. Assembly — inserts the spec into a scaffold and runs Verus.
  3. Proof generation — on failure, generates loop invariants, decreases clauses, and proof blocks.
  4. Repair loop — classifies Verus errors (22 distinct types), dispatches to a repair-skill agent, re-verifies, and guards against regressions.
  5. Escalation — admits and flags the function for human review after a configurable number of turns.

For the full design rationale, see pipeline_complete.md.

Install

pip install -e ".[dev]"          # core + pytest
pip install -e ".[openai,dev]"   # include OpenAI backend

Required environment:

  • ANTHROPIC_API_KEY
  • VERUS_PATH (auto-detected if verus is on your PATH)

Optional: OPENAI_API_KEY for the OpenAI backend.

CLI

inference-dalek <subcommand>
# run | proof | struct | project | evaluate | baseline | synthesize | generate

Single-function debug run with full DEBUG logging:

inference-dalek run --sample <id> -v -v --debug-log debug.log

Batch HAB evaluation:

scripts/run_hab_eval.sh <experiment_id> [--model claude-opus-4-6] [--layer-set A]

Verbosity flags: -v = INFO, -v -v = DEBUG. --debug-log <file> writes DEBUG output regardless of the console level.

Tests

pytest                           # all
pytest -x                        # stop on first failure
pytest -m "not integration"      # skip Verus-binary tests
pytest -m "not slow"

Tests use a MockProvider for LLM calls and never hit real APIs.

Architecture

High-level layout:

  • inference_dalek/agent.py — orchestrator (VerusAgent)
  • inference_dalek/repair_loop.py — multi-turn verify/repair state machine
  • inference_dalek/error_routing.py — error type → repair skill dispatch
  • inference_dalek/stages/ — 18 pluggable pipeline stages
  • inference_dalek/providers/ — Anthropic / OpenAI backends
  • inference_dalek/verus/ — subprocess runner, error parsing, code surgery
  • inference_dalek/project/ — multi-file project verification with recovery cascade
  • inference_dalek/eval/ — HAB evaluation framework

See CLAUDE.md for conventions, hard rules, and detailed module map.

Related work

  • Yoonho Lee, Roshen Nair, Qizheng Zhang, Kangwook Lee, Omar Khattab, Chelsea Finn. Meta-Harness: End-to-End Optimization of Model Harnesses. arXiv:2603.28052, 2026. — End-to-end optimization of multi-stage LLM pipelines; relevant context for tuning agentic verification harnesses like the one in this repository.

License

BSD 3-Clause.

About

LLM-based Verus formal verification agent for curve25519-dalek

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors