Warning: alpha software. Needle-X is still in active development and testing. The install flow, local state layout, CLI details, and output shape may still change.
Turn messy web pages into compact, proof-carrying context for AI agents.
Smaller packets. Fewer hops. Real provenance.
- Smaller output: Needle-X returns much less context than extraction-heavy tools.
- Source-backed: it carries proof, not just extracted text.
- Less cleanup: a downstream agent does less work before it can act.
| Metric | Needle-X | Tavily | Jina | Firecrawl |
|---|---|---|---|---|
| Avg packet size (bytes) | 4436 | 6975 | 30565 | 72166 |
| Claim-to-source steps | 1 | 2 | 2 | 2 |
| Post-processing burden | 0.25 | 1.92 | 1.86 | 2.50 |
| Proof usability | 1.0 | 0 | 0 | 0 |
Needle-X vs Jina:
- about 85.5% smaller packets
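The 85.5% figure follows from the average packet sizes in the table: one minus the ratio of Needle-X's average to the other tool's average. A short Python check of that arithmetic for all three comparisons (the function name is only for illustration):

```python
# Average packet sizes in bytes, copied from the benchmark table.
AVG_PACKET_BYTES = {"Needle-X": 4436, "Tavily": 6975, "Jina": 30565, "Firecrawl": 72166}


def size_reduction_pct(ours: int, theirs: int) -> float:
    """Percent by which `ours` is smaller than `theirs`."""
    return 100.0 * (1 - ours / theirs)


if __name__ == "__main__":
    base = AVG_PACKET_BYTES["Needle-X"]
    for tool, size in AVG_PACKET_BYTES.items():
        if tool != "Needle-X":
            print(f"vs {tool}: {size_reduction_pct(base, size):.1f}% smaller")
    # vs Tavily: 36.4% smaller
    # vs Jina: 85.5% smaller
    # vs Firecrawl: 93.9% smaller
```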
This is the current sweet spot:
- compact context
- direct verification
- low-friction agent consumption
Needle-X includes local Discovery Memory backed by SQLite.
The story is simple:
- first run observes and compiles
- later runs reuse local verified evidence
- repeated use improves local retrieval without hosted infra
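The first-run/later-run behaviour described above can be pictured as a read-through cache over a local SQLite file. The following is a minimal sketch under loud assumptions: the `evidence` table and the `open_memory`, `remember`, `recall`, and `read_through` helpers are hypothetical, lookups use an exact URL key instead of dense-embedding retrieval, and Needle-X's real schema and verification logic are not shown and will differ.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical schema: one compiled, verified packet per URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS evidence (
    url      TEXT PRIMARY KEY,
    packet   TEXT NOT NULL,
    verified INTEGER NOT NULL DEFAULT 0,
    hits     INTEGER NOT NULL DEFAULT 0
)
"""


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local store; ':memory:' keeps it in RAM for demos."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def remember(conn: sqlite3.Connection, url: str, packet: str, verified: bool = True) -> None:
    """Store or refresh the packet compiled for `url` (UPSERT needs SQLite >= 3.24)."""
    conn.execute(
        "INSERT INTO evidence (url, packet, verified) VALUES (?, ?, ?) "
        "ON CONFLICT(url) DO UPDATE SET packet = excluded.packet, verified = excluded.verified",
        (url, packet, int(verified)),
    )
    conn.commit()


def recall(conn: sqlite3.Connection, url: str) -> Optional[str]:
    """Return a previously verified packet for `url`, counting the reuse."""
    row = conn.execute(
        "SELECT packet FROM evidence WHERE url = ? AND verified = 1", (url,)
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE evidence SET hits = hits + 1 WHERE url = ?", (url,))
    conn.commit()
    return row[0]


def read_through(conn: sqlite3.Connection, url: str, fetch_and_compile: Callable[[str], str]) -> str:
    """First run: fetch and compile, then store. Later runs: reuse local evidence."""
    cached = recall(conn, url)
    if cached is not None:
        return cached
    packet = fetch_and_compile(url)
    remember(conn, url, packet)
    return packet
```

The design point the sketch tries to capture is that no hosted service is involved: the second call for the same URL is answered entirely from the local file.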
Discovery Memory is enabled by default and stored in the PAL state root. Dense embeddings are mandatory: the installer writes a PAL SSOT (single source of truth) config to `<state-root>/configs/needlex.json`, with local Ollama `embeddinggemma` as the default embedding backend, which needs no API key.
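To confirm that the local embedding backend answers outside Needle-X, you can call Ollama's own HTTP API directly. This sketch assumes Ollama's default address (`http://localhost:11434`) and its `/api/embed` endpoint; it shows what a dense-embedding request for the default model looks like and is not how Needle-X itself talks to the backend.

```python
import json
import urllib.request
from typing import List

# Assumption: Ollama listening on its default local port.
OLLAMA_EMBED_URL = "http://localhost:11434/api/embed"


def build_embed_request(texts: List[str], model: str = "embeddinggemma:latest") -> urllib.request.Request:
    """Build a POST request asking Ollama to embed `texts` with `model`."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_EMBED_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: List[str], model: str = "embeddinggemma:latest") -> List[List[float]]:
    """Send the request; Ollama returns one vector per input under "embeddings"."""
    with urllib.request.urlopen(build_embed_request(texts, model), timeout=30) as resp:
        return json.loads(resp.read())["embeddings"]
```

With Ollama running and the model pulled, `len(embed(["pricing page"]))` should be 1; a connection error here usually means the backend is not up.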
Current verified seeded result on seeded-corpus-v2:
- 100/100 selected-url correctness
- 100/100 proof usability
- 100/100 runtime success
Guardrail:
- this is a seeded-runtime claim
- it is not a blanket claim about cold-state, seedless open-web retrieval
- Discovery Memory warm-state stress is tracked separately from the seeded runtime score
CLI commands:
- `read`
- `query`
- `crawl`
- `proof`
- `replay`
- `diff`
- `memory` stats/search/prune/export/import/rebuild-index
- `analytics` stats/recent/value-report/hosts/providers/failures/daily/export
- `logs` path/stats/tail
- `support bundle`
- `doctor`
- `config` path/show/init/set
Default output is AI-first:
- compact packet first
- proof inline when useful
- full diagnostics only on demand
- browser-like fetch by default for real-world targets
- local memory is populated automatically by successful `read`, `query`, and `crawl` runs
- the MCP server accepts both standard `Content-Length` framing and raw newline-delimited JSON
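The two MCP framings differ only in how a message boundary is found: a `Content-Length` header block followed by a blank line, or one JSON object per line. Here is a minimal Python sketch of a reader that accepts both. It assumes the whole input is already buffered and treats anything that does not start with `{` or `[` as a header block. It is an illustration, not Needle-X's server code.

```python
import json
from typing import Any, Iterator


def iter_messages(buf: bytes) -> Iterator[Any]:
    """Yield JSON-RPC messages from a buffer that may mix both framings."""
    pos = 0
    while pos < len(buf):
        first = buf[pos:pos + 1]
        if first in (b"\r", b"\n", b" ", b"\t"):
            pos += 1  # skip whitespace between messages
            continue
        if first in (b"{", b"["):
            # Newline-delimited JSON: the message runs to the next "\n".
            end = buf.find(b"\n", pos)
            if end == -1:
                end = len(buf)
            yield json.loads(buf[pos:end])
            pos = end + 1
            continue
        # Otherwise expect headers terminated by a blank line (CRLF CRLF).
        header_end = buf.index(b"\r\n\r\n", pos)
        length = None
        for line in buf[pos:header_end].split(b"\r\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("header block without Content-Length")
        body_start = header_end + 4
        yield json.loads(buf[body_start:body_start + length])
        pos = body_start + length
```

Dispatching on the first byte works because a JSON-RPC message is always an object or a batch array, while a header line starts with a header name.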
The MCP server advertises 9 tools: 7 core `web_*` tools plus memory and analytics tools.
The two non-core tools take an explicit `action` parameter, so maintenance and observability operations do not bloat the agent's tool list.
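The `action` parameter on the non-core tools might be used as shown in this sketch, which builds a standard MCP `tools/call` JSON-RPC request and frames it both ways the server accepts. The tool name `memory` and the action value `stats` mirror the CLI's `memory stats` command but are assumptions. Check the server's `tools/list` response for the exact names and argument schema.

```python
import json
from typing import Any, Dict


def tool_call(request_id: int, tool: str, action: str, **extra: Any) -> Dict[str, Any]:
    """Build an MCP tools/call request whose arguments carry an explicit action."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": {"action": action, **extra}},
    }


def frame_ndjson(msg: Dict[str, Any]) -> bytes:
    """Raw newline-delimited JSON: one compact object per line."""
    return json.dumps(msg, separators=(",", ":")).encode("utf-8") + b"\n"


def frame_content_length(msg: Dict[str, Any]) -> bytes:
    """Header framing: UTF-8 body length, blank line, then the body."""
    body = json.dumps(msg, separators=(",", ":")).encode("utf-8")
    return b"Content-Length: %d\r\n\r\n" % len(body) + body


if __name__ == "__main__":
    # Hypothetical names: "memory" tool, "stats" action.
    request = tool_call(1, "memory", "stats")
    print(frame_ndjson(request).decode("utf-8"), end="")
```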
```bash
needlex read https://example.com --json
needlex query https://example.com --goal "pricing" --json
needlex proof proof_1 --json
needlex analytics stats
needlex analytics value-report
needlex logs stats
needlex support bundle --out /tmp/needlex-support
needlex doctor
```

`analytics stats` gives quick operational counters plus saved characters/tokens. `analytics value-report` is the fuller value view, with estimated cost scenarios.

`logs stats` shows the PAL runtime log state used for clean CLI/MCP diagnostics.

`support bundle` exports a maintainer-friendly diagnostic directory containing doctor, analytics, and runtime-log output.
Linux and macOS:

```bash
curl -fsSL https://raw.githubusercontent.com/Josepavese/needlex/main/install/install.sh | bash
```

Windows (PowerShell):

```powershell
irm https://raw.githubusercontent.com/Josepavese/needlex/main/install/install.ps1 | iex
```

Installed command: `needlex`
The installer downloads the matching release binary for your platform. Full details:
The installer also prepares semantic prerequisites:
- installs or verifies Ollama where the platform supports automated install
- pulls the default embedding model, `embeddinggemma:latest`
- writes the PAL-home SSOT config
- wires the `needlex` wrapper to that config, so users do not have to export environment variables for each command
- installs a PAL-local headless render browser
- enables rendering in the PAL config for JavaScript-rendered sites
Change defaults with:

```bash
needlex config show
needlex config set semantic.provider_model nomic-embed-text:latest
```

Needle-X also ships an optional Codex skill that tells agents when to use Needle-X for web retrieval, when to escalate to browser or raw-fetch tools, and how to avoid treating compact context as full DOM coverage.
Skill path:
Codex install helper:
```bash
python3 ~/.codex/skills/.system/skill-installer/scripts/install-skill-from-github.py --repo Josepavese/needlex --path skills/needlex-web-retrieval
```

After installing the skill, restart Codex so it can discover it.
- browser agent
- search engine
- generic scraper
- LLM-first reader


