Choose your AI coding assistant.
Picky is a RAG-powered chat assistant that helps developers compare AI coding tools — Claude Code, Codex, Cursor, and GitHub Copilot — grounded in official docs, pricing pages, and changelogs.
- Side-by-side tool comparisons — "How does Cursor compare to Claude Code for refactoring?"
- Single-tool questions — "Does GitHub Copilot support custom instructions?"
- Task and migration questions — "I'm moving from Copilot to Cursor — what changes for me?"
- Pricing questions — "What's the cheapest plan that includes Claude Sonnet?"
Every answer is grounded in retrieved chunks from the official sources, with citations.
- Python 3.11+
- FastAPI for the ingestion service and the chat API
- LangChain (text splitters, HF embeddings wrapper, Pinecone vector store) — used lightly; orchestration is plain Python
- Pinecone Serverless (cosine, single namespace, metadata filtering)
- Hugging Face Serverless Inference API for embeddings (Qwen/Qwen3-Embedding-0.6B, 1024-dim) and generation (Qwen/Qwen3-4B-Instruct-2507)
- SQLite (stdlib sqlite3) for crawl bookkeeping
- BeautifulSoup4 + httpx for crawling
Both Python services (ingestion and api) share one virtual environment at the repo root. There is a single requirements.txt at the repo root — no per-service venvs, no per-service requirements files.
- Create the virtual environment:
  python -m venv .venv
- Activate it:
  - macOS / Linux:
    source .venv/bin/activate
  - Windows:
    .venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Configure environment variables:
  cp .env.example .env
  Then fill in HF_TOKEN (from https://huggingface.co/settings/tokens; read access is enough) and PINECONE_API_KEY.
- Create the Pinecone index manually in the Pinecone dashboard:
  - Name: picky
  - Dimension: 1024
  - Metric: cosine
  - Type: Serverless, on aws/us-east-1
- Run the smoke test:
  python scripts/smoke_test_embeddings.py
  It should print the embedding dimension (1024) and the first 5 values, then exit 0.
The same .venv is shared by the ingestion service and the API (added in later steps). Activate it once per shell session and you're good for both.
The first embedding call may take 15–30 seconds while the serverless model cold-starts on Hugging Face. The smoke test handles this automatically with one retry.
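The retry-once behavior can be sketched roughly as below. This is an illustration only, not the code in scripts/smoke_test_embeddings.py: ModelLoadingError, call_with_one_retry, and the 20-second wait are all hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ModelLoadingError(Exception):
    """Hypothetical error raised while the serverless model is still warming up."""


def call_with_one_retry(
    fn: Callable[[], T],
    wait_seconds: float = 20.0,  # assumed value, inside the 15-30 s cold-start window
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; if the model is still loading, wait once and try again."""
    try:
        return fn()
    except ModelLoadingError:
        sleep(wait_seconds)  # give the cold-started model time to load
        return fn()  # a second failure propagates to the caller
```

Passing sleep as a parameter keeps the helper testable without actually waiting.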
Start the FastAPI app (from the repo root, with the venv activated):
uvicorn ingestion.main:app --reload --port 8001
Trigger a crawl. The tools field is optional (omit it or pass null to crawl every tool in config/seeds.yaml):
curl -X POST http://localhost:8001/ingest/crawl \
-H "Content-Type: application/json" \
-d '{"tools": ["cursor"], "force_recrawl": false}'
The endpoint returns 202 immediately and runs the crawl in the background. To re-fetch pages that are already stored, pass "force_recrawl": true.
Check progress:
curl http://localhost:8001/ingest/status
You'll get the in-progress flag and a breakdown of pages by status (discovered / fetched / failed / skipped).
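A per-status breakdown like this could be computed with a single GROUP BY query. The sketch below assumes a pages table with a status column; the table name, column name, and the page_counts helper are illustrative guesses, not the actual schema of ingestion.db.

```python
import sqlite3

# The four crawl statuses listed above; zero-filled so every key is always present.
STATUSES = ("discovered", "fetched", "failed", "skipped")


def page_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Count rows per status in a hypothetical `pages` table."""
    counts = {status: 0 for status in STATUSES}
    rows = conn.execute("SELECT status, COUNT(*) FROM pages GROUP BY status")
    for status, n in rows:
        counts[status] = n
    return counts
```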
The first full crawl may take several minutes — the docs sites for these tools link to a lot of pages, and the crawler is intentionally polite (one request at a time per host, with a delay between hits).
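One way to get that per-host politeness is to remember when each host was last hit and sleep for the remainder of the delay before the next request. The sketch below is an assumption about the approach, not the crawler's actual code; PoliteScheduler and the 1-second default delay are made up for the example.

```python
import time
from urllib.parse import urlparse


class PoliteScheduler:
    """Enforce a minimum delay between consecutive requests to the same host."""

    def __init__(self, delay_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
        self.delay_seconds = delay_seconds  # assumed default, not the project's setting
        self._clock = clock
        self._sleep = sleep
        self._last_hit = {}  # host -> timestamp of the most recent request

    def wait_for(self, url):
        """Block until it is polite to fetch url; return the seconds waited."""
        host = urlparse(url).netloc
        waited = 0.0
        last = self._last_hit.get(host)
        if last is not None:
            waited = max(0.0, self.delay_seconds - (self._clock() - last))
            if waited:
                self._sleep(waited)
        self._last_hit[host] = self._clock()
        return waited
```

Calling wait_for(url) right before each sequential fetch slows down repeated hits to one docs site while leaving requests to other hosts undelayed.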
The crawler only stores raw HTML in SQLite. Cleaning, chunking, embedding, and Pinecone upserts happen in the pipeline (next section).
Once a crawl has finished and pages are sitting in ingestion.db with status='fetched', kick off the pipeline:
curl -X POST http://localhost:8001/ingest/index \
-H "Content-Type: application/json" \
-d '{"tools": ["cursor"], "force_reindex": false}'
The endpoint returns 202 immediately and runs in the background. This is the step that calls the Hugging Face Serverless Inference API for embeddings, so the first embedding call in a session may cold-start (15–30 s wait) while the model warms up on HF's side.
What the pipeline does per page:
- Pulls raw HTML from SQLite.
- Cleans it with BeautifulSoup + markdownify (drops nav/footer/scripts/etc.).
- Splits the markdown into chunks (~1500 chars, 150 overlap, heading-aware separators).
- Embeds each chunk with Qwen/Qwen3-Embedding-0.6B (1024-dim).
- Upserts vectors + metadata to Pinecone and records chunks in SQLite.
- Marks the page status='indexed' and stores a SHA256 of the cleaned text.
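To make the chunking step concrete, here is a stdlib-only approximation of splitting with a size limit, an overlap, and heading-aware separators. The project uses LangChain text splitters for this; the chunk_text function and its separator list are a simplified stand-in written for illustration, and they will not reproduce LangChain's output exactly.

```python
def chunk_text(
    text,
    chunk_size=1500,
    overlap=150,
    # Tried in order: markdown headings first, then paragraphs, lines, words.
    separators=("\n## ", "\n### ", "\n\n", "\n", " "),
):
    """Split text into chunks of at most chunk_size chars sharing `overlap` chars."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer cutting at the highest-priority separator inside the window,
            # but always past the overlap so each chunk makes forward progress.
            for sep in separators:
                cut = text.rfind(sep, start + overlap + 1, end)
                if cut != -1:
                    end = cut
                    break
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # the next chunk repeats the tail of this one
    return chunks
```

For example, chunk_text("abcdefghij", chunk_size=4, overlap=1, separators=()) returns ["abcd", "defg", "ghij"]: each chunk begins with the last character of the previous one.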
Change detection: on the next run, if a page's cleaned-text hash matches what's stored, the pipeline skips re-embedding it (unchanged). Pass "force_reindex": true to bypass that check — useful when you change chunk size or swap the embedding model.
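The hash check amounts to something like the following sketch. The function names content_hash and should_reindex are hypothetical, and the rule that a page with no stored hash is always indexed is an assumption.

```python
import hashlib


def content_hash(cleaned_text: str) -> str:
    """SHA256 hex digest of the cleaned page text."""
    return hashlib.sha256(cleaned_text.encode("utf-8")).hexdigest()


def should_reindex(cleaned_text, stored_hash, force_reindex=False):
    """Return True when the page needs to be chunked and embedded again."""
    if force_reindex or stored_hash is None:
        return True  # forced, or never indexed before (assumed behavior)
    return content_hash(cleaned_text) != stored_hash
```

Hashing the cleaned text rather than the raw HTML means cosmetic markup changes that are stripped during cleaning (nav, footer, scripts) do not trigger a re-embed.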
Check vector counts:
curl http://localhost:8001/ingest/status
The response includes page_counts (with an indexed count) and chunks_count from SQLite. The Pinecone dashboard is the source of truth for the actual vector count in the index.
Once the index is populated, start the chat API (separate from the ingestion service):
uvicorn api.main:app --reload --port 8000
On startup the app runs pinecone.verify_index() and instantiates the generator, so misconfiguration fails loud and early. The first /chat request may still wait 15–30 s for the HF generation model (Qwen/Qwen3-4B-Instruct-2507) to cold-start on serverless inference; the generator retries once automatically on a 503.
Ask a question:
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"question": "Does Cursor have a free tier?"}'
Restrict retrieval to a subset of tools (any of claude-code, codex, cursor, github-copilot):
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"question": "How does it handle a monorepo?", "tools": ["cursor", "claude-code"]}'
Response shape:
{
"answer": "...",
"sources": [
{"tool_name": "cursor", "page_title": "...", "source_url": "...", "section_heading": "...", "score": 0.78}
],
"fallback": false
}
fallback: true means retrieval couldn't find anything above RETRIEVAL_MIN_SCORE (default 0.4). In that case the LLM is not called: the answer is a fixed "I don't have enough information" message and sources is empty. Tune RETRIEVAL_MIN_SCORE in .env if you want a stricter or looser threshold.
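The fallback branch can be pictured as a score filter in front of the generator. The sketch below is illustrative only: Match, answer_question, the exact fallback wording, and the choice of >= for the threshold comparison are assumptions, not the API's actual code.

```python
from dataclasses import asdict, dataclass

# Hypothetical wording; the real fixed message may differ.
FALLBACK_ANSWER = "I don't have enough information to answer that."


@dataclass
class Match:
    tool_name: str
    source_url: str
    score: float


def answer_question(question, matches, generate, min_score=0.4):
    """Return a chat response, skipping the LLM when nothing clears min_score."""
    relevant = [m for m in matches if m.score >= min_score]
    if not relevant:
        # No usable context: do not call the generator at all.
        return {"answer": FALLBACK_ANSWER, "sources": [], "fallback": True}
    return {
        "answer": generate(question, relevant),
        "sources": [asdict(m) for m in relevant],
        "fallback": False,
    }
```

Raising min_score makes the assistant decline more often; lowering it lets weaker matches through to the generator.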
Four standalone scripts that don't require the FastAPI server to be running:
- python scripts/smoke_test_embeddings.py: verifies the HF serverless embedding endpoint works (from the scaffold step).
- python scripts/smoke_test_crawler.py: wipes ingestion.db, crawls https://www.cursor.com/pricing as a single page-scope seed, and asserts the page row is stored with status='fetched', http_status=200, and non-empty HTML.
- python scripts/smoke_test_pipeline.py: end-to-end. Wipes the DB, crawls https://docs.cursor.com/en/welcome, runs the pipeline, then asserts the page is indexed, chunks landed in SQLite, and at least one vector is fetchable from Pinecone with the expected tool_name metadata.
- python scripts/smoke_test_chat.py: exercises the chat API in-process via FastAPI's TestClient. Hits /health, an in-domain question, an off-topic question (expects fallback), and a tools-filtered question. Requires the Pinecone index to already be populated, so run the ingestion crawl + index first; otherwise the in-domain question will trip the fallback branch.
picky/
├── README.md
├── .env.example
├── .gitignore
├── requirements.txt
├── config/ # seed URLs and crawl config
├── ingestion/ # crawler + chunker + indexer service (added later)
├── api/ # chat API service (added later)
├── shared/ # config and embeddings client used by both services
└── scripts/ # one-off scripts (smoke tests, utilities)
- Scaffold ✅
- Ingestion crawler (raw HTML → SQLite) ✅
- Ingestion pipeline (clean → chunk → embed → Pinecone) ✅
- Chat API ✅
- Frontend ⏳