cognis-digital/embedaudit

EMBEDAUDIT

Embedding / vector-store drift and poisoning audit

PyPI · CI · License: COCL 1.0 · Suite

Data & Datasets — zero-setup quality, lineage, and governance.

pip install cognis-embedaudit

embedaudit scan .            # → prioritized findings in seconds

Usage — step by step

  1. Install the CLI:

    pip install cognis-embedaudit
  2. Audit a snapshot of your vector store — a JSONL file of embedding records — for near-duplicates and single-vector domination:

    embedaudit audit snapshot.jsonl --dup-threshold 0.999 --domination-share 0.30
  3. Compare against a trusted baseline to catch drift / poisoning between two snapshots:

    embedaudit drift baseline.jsonl current.jsonl --drift-threshold 0.15
  4. Read the output. Add --format json for a machine-readable report. The command exits with a non-zero code when findings exceed your thresholds:

    embedaudit audit snapshot.jsonl --format json > report.json
  5. Wire it into CI — fail the build when an index regresses:

    embedaudit drift baseline.jsonl current.jsonl --format json || exit 1
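To make the two thresholds in step 2 concrete, here is a minimal Python sketch of what a near-duplicate and domination check could look like. It is not embedaudit's implementation. The record fields `id` and `vector` are assumed names for illustration, and "domination" is read here as one plausible interpretation: a vector whose near-duplicate cluster covers at least `domination_share` of the snapshot.

```python
import json
import math
from itertools import combinations


def load_jsonl(lines):
    """Parse JSONL text lines into (id, unit-length vector) pairs."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        vec = [float(x) for x in rec["vector"]]  # "vector" is an assumed field name
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            continue  # a zero vector has no direction, so skip it
        records.append((rec["id"], [x / norm for x in vec]))
    return records


def cosine(u, v):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


def audit(records, dup_threshold=0.999, domination_share=0.30):
    """Return near-duplicate pairs and ids whose duplicate cluster dominates."""
    pairs = []
    cluster_size = {rid: 1 for rid, _ in records}  # each record counts itself
    for (id_a, u), (id_b, v) in combinations(records, 2):
        if cosine(u, v) >= dup_threshold:
            pairs.append((id_a, id_b))
            cluster_size[id_a] += 1
            cluster_size[id_b] += 1
    n = len(records)
    dominant = [rid for rid, size in cluster_size.items()
                if n and size / n >= domination_share]
    return pairs, dominant


snapshot = [
    '{"id": "a", "vector": [1.0, 0.0]}',
    '{"id": "b", "vector": [2.0, 0.0]}',  # same direction as "a"
    '{"id": "c", "vector": [0.0, 1.0]}',
    '{"id": "d", "vector": [1.0, 1.0]}',
]
pairs, dominant = audit(load_jsonl(snapshot))
print(pairs)     # [('a', 'b')]
print(dominant)  # ['a', 'b']
```

The pairwise loop is quadratic in the number of records, which is fine for a small snapshot but would need an approximate nearest-neighbour index at production scale.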

Why embedaudit?

embedaudit fills a RAG-ops niche: auditing embedding stores and vector indexes for drift and poisoning.

embedaudit is single-purpose, scriptable, and self-hostable: point it at a target, get prioritized results in the format your workflow already speaks (table · JSON · SARIF), gate CI on it, and let agents drive it over MCP.

Features

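  • ✅ Load JSONL snapshots of embedding records

  • ✅ Audit a store for near-duplicates and single-vector domination

  • ✅ Drift report between a trusted baseline and a current snapshot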

  • ✅ Runs on Linux/macOS/Windows · Docker · devcontainer

  • ✅ Ports in Python, JavaScript, Go, and Rust (ports/)
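As an illustration of the drift report listed above, the sketch below compares two snapshots by the cosine distance between their centroids. This is an assumed, simplified definition of drift chosen for clarity, not necessarily the metric embedaudit computes. The default of 0.15 mirrors the `--drift-threshold` value used earlier in this README.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(u, v):
    """1 - cosine similarity: 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def drift_report(baseline, current, drift_threshold=0.15):
    """Compare snapshot centroids and flag drift above the threshold."""
    score = cosine_distance(centroid(baseline), centroid(current))
    return {"drift": score, "exceeds_threshold": score > drift_threshold}


baseline = [[1.0, 0.0], [0.9, 0.1]]
clean = [[1.0, 0.05], [0.95, 0.1]]
poisoned = clean + [[0.0, 5.0], [0.0, 5.0]]  # injected off-topic vectors

print(drift_report(baseline, clean)["exceeds_threshold"])     # False
print(drift_report(baseline, poisoned)["exceeds_threshold"])  # True
```

A centroid comparison is cheap and catches bulk injection, but a small number of targeted poisoned vectors can leave the centroid almost unchanged, so a real audit would likely combine it with per-record checks.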

Quick start

pip install cognis-embedaudit

embedaudit --version

embedaudit scan .                       # scan current project

embedaudit scan . --format json         # machine-readable

embedaudit scan . --fail-on high        # CI gate (non-zero exit)

Example


$ embedaudit scan .

  [HIGH    ] EMB-001  example finding             (./src/app.py)

  [MEDIUM  ] EMB-002  another signal              (./config.yaml)



  2 findings · risk score 5 · 38ms

Architecture

flowchart LR
  IN[target / manifest] --> P[embedaudit<br/>checks + rules]
  P --> OUT["findings (JSON / SARIF)"]
Use it from any AI stack

embedaudit plugs into the common ways of building with AI:

  • MCP server (embedaudit mcp) for Claude Desktop, Cursor, Cognis.Studio, uncensored-fleet

  • OpenAI-compatible / JSON — pipe embedaudit scan . --format json into any agent or LLM

  • LangChain · CrewAI · AutoGen · LlamaIndex — wrap the CLI/JSON as a tool in one line

  • CI / scripts — exit codes + SARIF for non-AI pipelines
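Wrapping the CLI as a tool, as the list above suggests, can be sketched with Python's standard library alone. The helper names `run_json_tool` and `embedaudit_scan` are hypothetical, and the `findings` key in the stand-in output is an assumption, since the JSON schema is not documented here. Only the command line itself comes from this README.

```python
import json
import subprocess
import sys


def run_json_tool(argv, timeout=60):
    """Run a CLI that prints JSON on stdout; return (exit_code, parsed_json)."""
    # No check=True: a non-zero exit may simply mean "findings present".
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        payload = None  # the tool printed nothing parseable
    return proc.returncode, payload


def embedaudit_scan(path="."):
    """Tool function an agent framework could register (hypothetical wrapper)."""
    return run_json_tool(["embedaudit", "scan", path, "--format", "json"])


# Stand-in command so the sketch runs without embedaudit installed.
fake = [
    sys.executable,
    "-c",
    "import json, sys; print(json.dumps({'findings': []})); sys.exit(1)",
]
code, report = run_json_tool(fake)
print(code, report)  # 1 {'findings': []}
```

Returning the exit code alongside the parsed report lets the calling agent or script decide whether a non-zero status is a failure or just a signal that findings exist.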

How it compares

| Feature | Cognis embedaudit | RAG security |
|---|:---:|:---:|
| Self-hostable, no account | ✅ | varies |
| Single command, zero config | ✅ | ⚠️ |
| JSON + SARIF for CI | ✅ | varies |
| MCP-native (AI agents) | ✅ | ❌ |
| Polyglot ports (JS/Go/Rust) | ✅ | ❌ |
| Open license | ✅ COCL | varies |

Built in the spirit of RAG security, re-framed the Cognis way. Missing a credit? Open a PR.

Integrations

Pipes into your stack: SARIF for code-scanning, JSON for anything, an MCP server (embedaudit mcp) for AI agents, and a webhook forwarder for SIEM/Slack/Jira. See docs/INTEGRATIONS.md.
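For pipelines that consume the SARIF output, a small summary step can be useful. The sketch below counts results per level in a SARIF 2.1.0 log using only the standard `runs[].results[].level` structure. The sample log is hand-written for illustration: how embedaudit maps its HIGH/MEDIUM severities to SARIF levels is an assumption here, and a missing `level` is treated as `warning`, which is the SARIF default.

```python
import json
from collections import Counter


def sarif_levels(sarif_text):
    """Count SARIF results per level across all runs."""
    log = json.loads(sarif_text)
    counts = Counter()
    for run in log.get("runs", []):
        for result in run.get("results", []):
            counts[result.get("level", "warning")] += 1
    return counts


# Hand-written sample log; rule ids reuse the example findings in this README.
sample = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "embedaudit"}},
        "results": [
            {"ruleId": "EMB-001", "level": "error",
             "message": {"text": "example finding"}},
            {"ruleId": "EMB-002",
             "message": {"text": "another signal"}},
        ],
    }],
})
counts = sarif_levels(sample)
print(dict(counts))  # {'error': 1, 'warning': 1}
```

A CI step could fail the build when `counts["error"]` is greater than zero, independent of the tool's own exit code.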

Install — every way, every platform

pip install "git+https://github.com/cognis-digital/embedaudit.git"    # pip (works today)

pipx install "git+https://github.com/cognis-digital/embedaudit.git"   # isolated CLI

uv tool install "git+https://github.com/cognis-digital/embedaudit.git" # uv

pip install cognis-embedaudit                                          # PyPI (when published)

docker run --rm ghcr.io/cognis-digital/embedaudit:latest --help        # Docker

brew install cognis-digital/tap/embedaudit                             # Homebrew tap

curl -fsSL https://raw.githubusercontent.com/cognis-digital/embedaudit/main/install.sh | sh

| Linux | macOS | Windows | Docker | Cloud |
|---|---|---|---|---|
| scripts/setup-linux.sh | scripts/setup-macos.sh | scripts/setup-windows.ps1 | docker run ghcr.io/cognis-digital/embedaudit | DEPLOY.md (AWS/Azure/GCP/k8s) |

Related Cognis tools

  • duckprobe — Zero-setup data-quality checks on any file or warehouse via DuckDB

  • schemadrift — Schema-change detector and data-contract tests

  • csvlens — Fast CLI for profiling and cleaning huge CSV / Parquet files

  • piiscan — PII discovery across warehouses and lakes (data-side scanner)

  • lineagemap — Column-level lineage extracted from SQL and dbt

  • datasetcard — Auto Dataset Cards / datasheets with Croissant + provenance

Explore the suite → 🗂️ all 170+ tools · ⭐ awesome-cognis · 🔗 cognis-sources · 🤖 uncensored-fleet · 🧠 engram

Contributing

PRs, new rules, and demo scenarios are welcome under the collaboration-pull model — see CONTRIBUTING.md and SECURITY.md.

⭐ If embedaudit saved you time, star it — it genuinely helps others find it.

Interoperability

embedaudit composes with the 300+ tool Cognis suite through JSON in/out and a shared OpenAI-compatible /v1 backbone. See INTEROP.md for the suite map, composition patterns, and reference stacks.

License

Source-available under the Cognis Open Collaboration License (COCL) v1.0 — free for personal, internal-evaluation, research, and educational use; commercial / production use requires a license (licensing@cognis.digital). See LICENSE.


Cognis Digital · one of 170+ tools in the Cognis Neural Suite · Making Tomorrow Better Today