cognis-digital/csvlens

CSVLENS

Fast CLI for profiling and cleaning huge CSV / Parquet files

PyPI · CI · License: COCL 1.0 · Suite

Data & Datasets — zero-setup quality, lineage, and governance.

pip install cognis-csvlens
csvlens scan .            # → prioritized findings in seconds

Usage — step by step

  1. Install the CLI (Python 3.9+):

    pip install cognis-csvlens        # or: pip install .   from a checkout
  2. Profile a CSV — the profile subcommand infers column types and reports stats (nulls, distinct, min/max/mean):

    csvlens profile data.csv
  3. Peek at rows or project columns by name:

    csvlens head data.csv -n 20
    csvlens select data.csv -c name,email,signup_date -n 100
  4. Clean a file — trim, dedupe, drop empty rows, and fill nulls, writing to an output path:

    csvlens clean data.csv -o clean.csv --fill-null NA
  5. Read profiles programmatically with the global --format json flag (it precedes the subcommand) and gate data quality in CI:

    csvlens --format json profile data.csv | jq '.column_stats[] | select(.nulls > 0)'
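
The jq filter in step 5 can also be written as a small Python gate for CI. This is a minimal sketch, assuming the JSON has a top-level `column_stats` list whose entries carry a `nulls` count, as the jq expression implies. The `name` key and the function names are assumptions for illustration, not confirmed csvlens output.

```python
import json
import sys


def columns_with_nulls(profile):
    """Return the column_stats entries whose null count is above zero."""
    return [c for c in profile.get("column_stats", []) if c.get("nulls", 0) > 0]


def main(stream=sys.stdin):
    """Read a JSON profile from the stream; return 1 if any column has nulls."""
    profile = json.load(stream)
    offenders = columns_with_nulls(profile)
    for col in offenders:
        # "name" is an assumed key; fall back to a placeholder if it is absent.
        label = col.get("name", "<unnamed>")
        print(f"column {label}: {col['nulls']} nulls", file=sys.stderr)
    return 1 if offenders else 0
```

Saved as a script (say, a hypothetical `gate.py`) with a final `sys.exit(main())` line, it could sit at the end of the same pipe: `csvlens --format json profile data.csv | python gate.py`. A non-zero exit status then fails the CI step.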

Why csvlens?

A single-binary data utility.

csvlens is single-purpose, scriptable, and self-hostable: point it at a target, get prioritized results in the format your workflow already speaks (table · JSON · SARIF), gate CI on it, and let agents drive it over MCP.

Features

  • ✅ Detect dialect
  • ✅ Profile CSV
  • ✅ Clean CSV
  • ✅ Head CSV
  • ✅ Select columns
  • ✅ Runs on Linux/macOS/Windows · Docker · devcontainer
  • ✅ Ports in Python, JavaScript, Go, and Rust (ports/)
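
To make the Clean CSV feature concrete, here is a sketch of the four operations the usage section names (trim, dedupe, drop empty rows, fill nulls) using only the Python standard library. It is not csvlens's implementation. The function names, the order of the operations, and the treatment of an empty cell as a null are assumptions made for illustration.

```python
import csv
import io


def clean_rows(rows, fill_null="NA"):
    """Trim cells, drop empty rows, drop duplicate rows, fill empty cells."""
    seen = set()
    for row in rows:
        trimmed = [cell.strip() for cell in row]
        if not any(trimmed):
            continue  # drop rows in which every cell is empty
        key = tuple(trimmed)
        if key in seen:
            continue  # drop exact duplicates (compared after trimming)
        seen.add(key)
        yield [cell if cell else fill_null for cell in trimmed]


def clean_csv_text(text, fill_null="NA"):
    """Apply clean_rows to CSV text and return the cleaned CSV text."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(clean_rows(reader, fill_null))
    return out.getvalue()
```

Because deduplication compares rows after trimming, `" alice ,a@x.io"` and `"alice,a@x.io"` collapse into one row. Whether csvlens makes the same choice is not stated in this README.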

Quick start

pip install cognis-csvlens
csvlens --version
csvlens scan .                       # scan current project
csvlens scan . --format json         # machine-readable
csvlens scan . --fail-on high        # CI gate (non-zero exit)

Example

$ csvlens scan .
  [HIGH    ] CSV-001  example finding             (./src/app.py)
  [MEDIUM  ] CSV-002  another signal              (./config.yaml)

  2 findings · risk score 5 · 38ms

Architecture

flowchart LR
  IN[input] --> P[csvlens<br/>analyze + score]
  P --> OUT[report]

Use it from any AI stack

csvlens is interoperable with every popular way of using AI:

  • MCP server: csvlens mcp (Claude Desktop, Cursor, Cognis.Studio, uncensored-fleet)
  • OpenAI-compatible / JSON — pipe csvlens scan . --format json into any agent or LLM
  • LangChain · CrewAI · AutoGen · LlamaIndex — wrap the CLI/JSON as a tool in one line
  • CI / scripts — exit codes + SARIF for non-AI pipelines
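
Wrapping the CLI as a tool, as the list above suggests, might look like the following sketch. It assumes `csvlens` is on `PATH` and uses the global `--format json` flag in the position shown in the usage section. The function name `csvlens_json` and the injectable `run` parameter are hypothetical and not part of csvlens.

```python
import json
import subprocess


def csvlens_json(args, run=subprocess.run):
    """Run csvlens with the global --format json flag and return parsed output."""
    cmd = ["csvlens", "--format", "json", *args]
    result = run(cmd, capture_output=True, text=True, check=False)
    # A non-zero exit may still come with valid JSON, so only give up
    # when there is nothing on stdout to parse.
    if result.returncode != 0 and not result.stdout.strip():
        raise RuntimeError(f"csvlens failed: {result.stderr.strip()}")
    return json.loads(result.stdout)
```

A call such as `csvlens_json(["profile", "data.csv"])` returns a plain Python object that most agent frameworks can accept as a tool result. The `run` parameter exists only so the wrapper can be exercised without the CLI installed.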

How it compares

Feature | Cognis csvlens | xsv
Self-hostable, no account | | varies
Single command, zero config | | ⚠️
JSON + SARIF for CI | | varies
MCP-native (AI agents) | |
Polyglot ports (JS/Go/Rust) | |
Open license | ✅ COCL | varies

Built in the spirit of xsv / qsv, re-framed the Cognis way. Missing a credit? Open a PR.

Integrations

Pipes into your stack: SARIF for code-scanning, JSON for anything, an MCP server (csvlens mcp) for AI agents, and a webhook forwarder for SIEM/Slack/Jira. See docs/INTEGRATIONS.md.

Install — every way, every platform

pip install "git+https://github.com/cognis-digital/csvlens.git"    # pip (works today)
pipx install "git+https://github.com/cognis-digital/csvlens.git"   # isolated CLI
uv tool install "git+https://github.com/cognis-digital/csvlens.git" # uv
pip install cognis-csvlens                                          # PyPI (when published)
docker run --rm ghcr.io/cognis-digital/csvlens:latest --help        # Docker
brew install cognis-digital/tap/csvlens                             # Homebrew tap
curl -fsSL https://raw.githubusercontent.com/cognis-digital/csvlens/main/install.sh | sh

  • Linux: scripts/setup-linux.sh
  • macOS: scripts/setup-macos.sh
  • Windows: scripts/setup-windows.ps1
  • Docker: docker run ghcr.io/cognis-digital/csvlens
  • Cloud: DEPLOY.md (AWS/Azure/GCP/k8s)

Related Cognis tools

  • duckprobe — Zero-setup data-quality checks on any file or warehouse via DuckDB
  • schemadrift — Schema-change detector and data-contract tests
  • piiscan — PII discovery across warehouses and lakes (data-side scanner)
  • lineagemap — Column-level lineage extracted from SQL and dbt
  • datasetcard — Auto Dataset Cards / datasheets with Croissant + provenance
  • seedforge — Synthetic test-data generator with referential integrity

Explore the suite → 🗂️ all 170+ tools · ⭐ awesome-cognis · 🔗 cognis-sources · 🤖 uncensored-fleet · 🧠 engram

Contributing

PRs, new rules, and demo scenarios are welcome under the collaboration-pull model — see CONTRIBUTING.md and SECURITY.md.

⭐ If csvlens saved you time, star it — it genuinely helps others find it.

Interoperability

csvlens composes with the 300+ tool Cognis suite through JSON in/out and a shared OpenAI-compatible /v1 backbone. See INTEROP.md for the suite map, composition patterns, and reference stacks.

License

Source-available under the Cognis Open Collaboration License (COCL) v1.0 — free for personal, internal-evaluation, research, and educational use; commercial / production use requires a license (licensing@cognis.digital). See LICENSE.


Cognis Digital · one of 170+ tools in the Cognis Neural Suite · Making Tomorrow Better Today