ASSERT.

Adaptive Spec-driven Scoring for Evaluation and Regression Testing
Local-first. Framework-agnostic. Trace-aware.

🚀 Get started | 🌐 Visit project website | 🔌 View supported targets | 📘 CLI Reference | 🧪 Examples

Why ASSERT?

Most AI systems start with a specification: product requirements, policies, system prompts, or launch criteria describing what the system should and should not do.

But evaluation often starts elsewhere: generic scorers, predefined benchmarks, or manual test cases that drift from the original intent.

ASSERT closes that gap. It turns your specified behaviors in natural language into structured, executable evaluations that can be reviewed, run, scored, and improved over time.

From the natural language specification, the ASSERT pipeline derives behavior categories, generates single-turn and multi-turn test cases, inferences them against your target, and uses an LLM judge to score each conversation against your policies.

What you get with ASSERT

Spec-driven coverage - test cases are generated from your product requirements and context, not a generic benchmark. You specify the behaviors that you want to test for
Test any model endpoint via integrations with LiteLLM, supporting 100+ model endpoints from platform providers such as Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDIA NIM.
Test any agent or multi-agent system via integrations with OpenInference. Evaluate a LangGraph agent, a CrewAI / OpenAI Agents SDK / DSPy / LlamaIndex / AutoGen system, custom multi-agent orchestration, a Python callable, or a hosted model — without rewriting the evaluation orchestration pipeline.
Agent trace-grounded judgment - the recommended integration captures OpenTelemetry spans (OpenInference auto-instruments 33+ frameworks in two lines — from assert_ai import auto_trace; auto_trace.enable() — or you can emit your own with the OTel SDK) so the judge can cite tool calls, routing, model calls, and latency as evidence — not just the final response.
Portable artifacts - every stage writes JSON/JSONL files locally for inspection, CI, and sharing.
Bundled local viewer - browse runs side-by-side, pin a baseline, drill into per-behavior dimension breakdowns, and read judge justifications cited against the captured traces.

Get started

Quick install

pip install -e ".[otel,langgraph]"       # install
cp .env.example .env                     # add your provider key
assert-ai run --config examples/travel_planner_langgraph/eval_config.yaml

🌐 Project website ↗	📝 Technical blog ↗	🚀 Quickstart guide ↗	📚 Documentation ↗
Learn about ASSERT	Read the Command Line post	Follow the full walkthrough	Browse concepts and guides

Acknowledgments

ASSERT's core method is AI-assisted systematization — turning a broad, contested behavior concept into an explicit, measurable specification — following Agarwal et al. (2026), AI-Assisted Systematization for Evaluating GenAI Systems from Microsoft Research. The staged pipeline that turns that specification into generated scenarios, runs them against a target, and judges the results is modeled in spirit on the design of Bloom and Petri, open-source behavioral-evaluation frameworks from the Anthropic alignment team (Safety Research, MIT licensed).

Adapted third-party material and the corresponding license notices are documented in THIRD_PARTY_NOTICES.md. If you use ASSERT in research, please also cite Agarwal et al. (2026) and Bloom (see CITATION.cff).

Team and contributors

ASSERT was built by the Microsoft Responsible AI organization.

Product: Mehrnoosh Sameki, Minsoo Thigpen, Chang Liu, Abby Palia, Hanna Kim
Science: Riccardo Fogliato, Emily Sheng, Alex Dow, Meera Chander, Alex Chouldechova, Sharman Tan, Xiawei Wang, Ahmed Magooda, Mayank Gupta, Jean Garcia-Gathright, Chad Atalla, Dan Vann, Hanna Wallach, Hannah Washington, Meredith Rodden, Nadine Frey, Melissa Kirkwood, Nick Pangakis, Ali Azad, Ahmed Elghory Ghoneim, Shushan Arakleyan
Engineering: Mohamed Elmergawi, Jake Present, Aaron Aspinwall, Yeming Tang
Design: Sooyeon Hwang, Becky Haruyama
Special thanks: Roni Burd, Mohammad A, Heba Elfardy, Sandeep Atluri, Sydney Lister, Ram Shankar Siva Kumar, Andrew Gully

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third party's policies.

Telemetry

This project does not collect or send telemetry to Microsoft by default. Runs write local artifacts under artifacts/results/, and optional OpenTelemetry trace capture is controlled by your configuration and local collector setup, such as Phoenix.

If you configure a target, judge, trace collector, or model provider to send data to an external service, the prompts, responses, traces, metadata, and other evaluation artifacts sent to that service are governed by that service's terms and your configuration.

Disclaimer: Risks and limitations of ASSERT

See the full section in the Concept Doc.

Name		Name	Last commit message	Last commit date
Latest commit History 472 Commits
.azure-pipelines		.azure-pipelines
.devcontainer		.devcontainer
.github		.github
assert_ai		assert_ai
assets		assets
docs		docs
examples		examples
scripts		scripts
tests		tests
viewer		viewer
website		website
.cursorrules		.cursorrules
.env.example		.env.example
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CITATION.cff		CITATION.cff
CLAUDE.md		CLAUDE.md
CODEOWNERS		CODEOWNERS
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
SUPPORT.md		SUPPORT.md
THIRD_PARTY_NOTICES.md		THIRD_PARTY_NOTICES.md
logo.jpg		logo.jpg
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ASSERT.

Why ASSERT?

What you get with ASSERT

Get started

Quick install

Acknowledgments

Team and contributors

Trademarks

Telemetry

Disclaimer: Risks and limitations of ASSERT

About

Uh oh!

Releases 1

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ASSERT.

Why ASSERT?

What you get with ASSERT

Get started

Quick install

Acknowledgments

Team and contributors

Trademarks

Telemetry

Disclaimer: Risks and limitations of ASSERT

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages