EMBEDAUDIT

Embedding / vector-store drift and poisoning audit

Data & Datasets — zero-setup quality, lineage, and governance.

pip install cognis-embedaudit

embedaudit scan .            # → prioritized findings in seconds

Usage — step by step

Install the CLI:
```
pip install embedaudit
```
Audit a snapshot of your vector store — a JSONL file of embedding records — for near-duplicates and single-vector domination:
```
embedaudit audit snapshot.jsonl --dup-threshold 0.999 --domination-share 0.30
```
Compare against a trusted baseline to catch drift / poisoning between two snapshots:
```
embedaudit drift baseline.jsonl current.jsonl --drift-threshold 0.15
```
Read the output. Add --format json for a machine-readable report and a non-zero exit code when findings exceed your thresholds:
```
embedaudit audit snapshot.jsonl --format json > report.json
```

Wire it into CI — fail the build when an index regresses:

embedaudit drift baseline.jsonl current.jsonl --format json || exit 1

Why embedaudit?

RAG ops niche

embedaudit is single-purpose, scriptable, and self-hostable: point it at a target, get prioritized results in the format your workflow already speaks (table · JSON · SARIF), gate CI on it, and let agents drive it over MCP.

↑ back to top

Features

✅ Load Jsonl
✅ Audit Store
✅ Drift Report
✅ Runs on Linux/macOS/Windows · Docker · devcontainer
✅ Ports in Python, JavaScript, Go, and Rust (ports/)

↑ back to top

Quick start

pip install cognis-embedaudit

embedaudit --version

embedaudit scan .                       # scan current project

embedaudit scan . --format json         # machine-readable

embedaudit scan . --fail-on high        # CI gate (non-zero exit)

↑ back to top

Example


$ embedaudit scan .

  [HIGH    ] EMB-001  example finding             (./src/app.py)

  [MEDIUM  ] EMB-002  another signal              (./config.yaml)



  2 findings · risk score 5 · 38ms

↑ back to top

Architecture

flowchart LR
  IN[target / manifest] --> P[embedaudit<br/>checks + rules]
  P --> OUT[findings (JSON / SARIF)]

↑ back to top

Use it from any AI stack

embedaudit is interoperable with every popular way of using AI:

MCP server — embedaudit mcp (Claude Desktop, Cursor, Cognis.Studio, uncensored-fleet)
OpenAI-compatible / JSON — pipe embedaudit scan . --format json into any agent or LLM
LangChain · CrewAI · AutoGen · LlamaIndex — wrap the CLI/JSON as a tool in one line
CI / scripts — exit codes + SARIF for non-AI pipelines

↑ back to top

How it compares

| | Cognis embedaudit | RAG security |

|---|:---:|:---:|

| Self-hostable, no account | ✅ | varies |

| Single command, zero config | ✅ | ⚠️ |

| JSON + SARIF for CI | ✅ | varies |

| MCP-native (AI agents) | ✅ | ❌ |

| Polyglot ports (JS/Go/Rust) | ✅ | ❌ |

| Open license | ✅ COCL | varies |

Built in the spirit of RAG security, re-framed the Cognis way. Missing a credit? Open a PR.

↑ back to top

Integrations

Pipes into your stack: SARIF for code-scanning, JSON for anything, an MCP server (embedaudit mcp) for AI agents, and a webhook forwarder for SIEM/Slack/Jira. See docs/INTEGRATIONS.md.

↑ back to top

Install — every way, every platform

pip install "git+https://github.com/cognis-digital/embedaudit.git"    # pip (works today)

pipx install "git+https://github.com/cognis-digital/embedaudit.git"   # isolated CLI

uv tool install "git+https://github.com/cognis-digital/embedaudit.git" # uv

pip install cognis-embedaudit                                          # PyPI (when published)

docker run --rm ghcr.io/cognis-digital/embedaudit:latest --help        # Docker

brew install cognis-digital/tap/embedaudit                             # Homebrew tap

curl -fsSL https://raw.githubusercontent.com/cognis-digital/embedaudit/main/install.sh | sh

|---|---|---|---|---|

↑ back to top

Related Cognis tools

duckprobe — Zero-setup data-quality checks on any file or warehouse via DuckDB
schemadrift — Schema-change detector and data-contract tests
csvlens — Fast CLI for profiling and cleaning huge CSV / Parquet files
piiscan — PII discovery across warehouses and lakes (data-side scanner)
lineagemap — Column-level lineage extracted from SQL and dbt
datasetcard — Auto Dataset Cards / datasheets with Croissant + provenance

Explore the suite → 🗂️ all 170+ tools · ⭐ awesome-cognis · 🔗 cognis-sources · 🤖 uncensored-fleet · 🧠 engram

↑ back to top

Contributing

PRs, new rules, and demo scenarios are welcome under the collaboration-pull model — see CONTRIBUTING.md and SECURITY.md.

⭐ If embedaudit saved you time, star it — it genuinely helps others find it.

Interoperability

{} composes with the 300+ tool Cognis suite — JSON in/out and a shared OpenAI-compatible /v1 backbone. See INTEROP.md for the suite map, composition patterns, and reference stacks.

License

Source-available under the Cognis Open Collaboration License (COCL) v1.0 — free for personal, internal-evaluation, research, and educational use; commercial / production use requires a license (licensing@cognis.digital). See LICENSE.

_{Cognis Digital · one of 170+ tools in the Cognis Neural Suite · Making Tomorrow Better Today}

Name		Name	Last commit message	Last commit date
Latest commit History 104 Commits
.cognis		.cognis
.devcontainer		.devcontainer
.github		.github
demos		demos
deploy		deploy
docs		docs
embedaudit		embedaudit
integrations		integrations
ports		ports
scripts		scripts
tests		tests
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
INTEGRATIONS.md		INTEGRATIONS.md
INTEROP.md		INTEROP.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
ROADMAP.md		ROADMAP.md
SECURITY.md		SECURITY.md
VERSION		VERSION
install.sh		install.sh
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EMBEDAUDIT

Embedding / vector-store drift and poisoning audit

Usage — step by step

Contents

Why embedaudit?

Features

Quick start

Example

Architecture

Use it from any AI stack

How it compares

Integrations

Install — every way, every platform

Related Cognis tools

Contributing

⭐ If `embedaudit` saved you time, star it — it genuinely helps others find it.

Interoperability

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

EMBEDAUDIT

Embedding / vector-store drift and poisoning audit

Usage — step by step

Contents

Why embedaudit?

Features

Quick start

Example

Architecture

Use it from any AI stack

How it compares

Integrations

Install — every way, every platform

Related Cognis tools

Contributing

⭐ If embedaudit saved you time, star it — it genuinely helps others find it.

Interoperability

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

⭐ If `embedaudit` saved you time, star it — it genuinely helps others find it.

Packages