Guidely-org/picky

Picky

Choose your AI coding assistant.

Picky is a RAG-powered chat assistant that helps developers compare AI coding tools — Claude Code, Codex, Cursor, and GitHub Copilot — grounded in official docs, pricing pages, and changelogs.

What it does

  • Side-by-side tool comparisons — "How does Cursor compare to Claude Code for refactoring?"
  • Single-tool questions — "Does GitHub Copilot support custom instructions?"
  • Task and migration questions — "I'm moving from Copilot to Cursor — what changes for me?"
  • Pricing questions — "What's the cheapest plan that includes Claude Sonnet?"

Every answer is grounded in retrieved chunks from the official sources, with citations.

Tech stack

  • Python 3.11+
  • FastAPI for the ingestion service and the chat API
  • LangChain (text splitters, HF embeddings wrapper, Pinecone vector store) — used lightly; orchestration is plain Python
  • Pinecone Serverless (cosine, single namespace, metadata filtering)
  • Hugging Face Serverless Inference API for embeddings (Qwen/Qwen3-Embedding-0.6B, 1024-dim) and generation (Qwen/Qwen3-4B-Instruct-2507)
  • SQLite (stdlib sqlite3) for crawl bookkeeping
  • BeautifulSoup4 + httpx for crawling

Setup

Both Python services (ingestion and api) share one virtual environment at the repo root. There is a single requirements.txt at the repo root — no per-service venvs, no per-service requirements files.

  1. Create the virtual environment:
    python -m venv .venv
  2. Activate it:
    • macOS / Linux: source .venv/bin/activate
    • Windows: .venv\Scripts\activate
  3. Install dependencies:
    pip install -r requirements.txt
  4. Configure environment variables:
    cp .env.example .env
    Then fill in HF_TOKEN (from https://huggingface.co/settings/tokens — read access is enough) and PINECONE_API_KEY.
  5. Create the Pinecone index manually in the Pinecone dashboard:
    • Name: picky
    • Dimension: 1024
    • Metric: cosine
    • Type: Serverless, on aws / us-east-1
  6. Run the smoke test:
    python scripts/smoke_test_embeddings.py
    It should print the embedding dimension (1024) and the first 5 values, then exit 0.

The same .venv serves both the ingestion service and the chat API. Activate it once per shell session and both will work.

The first embedding call may take 15–30 seconds while the serverless model cold-starts on Hugging Face. The smoke test handles this automatically with one retry.
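The retry-once behaviour described above can be sketched as follows. This is a minimal illustration, not the smoke test's actual code: `ModelLoadingError`, `call_with_one_retry`, and the 20-second wait are hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ModelLoadingError(Exception):
    """Hypothetical error for 'model is still loading' responses (e.g. HTTP 503)."""


def call_with_one_retry(
    fn: Callable[[], T],
    wait_seconds: float = 20.0,  # assumed wait; the README only says 15-30 s
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn once; if the model is cold, wait and try exactly one more time."""
    try:
        return fn()
    except ModelLoadingError:
        sleep(wait_seconds)
        # A second failure propagates to the caller unchanged.
        return fn()
```

Passing `sleep` as a parameter keeps the helper easy to exercise without actually waiting.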

Running the ingestion service

Start the FastAPI app (from the repo root, with the venv activated):

uvicorn ingestion.main:app --reload --port 8001

Trigger a crawl — tools is optional (omit it or pass null to crawl every tool in config/seeds.yaml):

curl -X POST http://localhost:8001/ingest/crawl \
  -H "Content-Type: application/json" \
  -d '{"tools": ["cursor"], "force_recrawl": false}'

The endpoint returns 202 immediately and runs the crawl in the background. To re-fetch pages that are already stored, pass "force_recrawl": true.

Check progress:

curl http://localhost:8001/ingest/status

You'll get the in-progress flag and a breakdown of pages by status (discovered / fetched / failed / skipped).
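To wait for a background crawl from a script instead of re-running curl, a polling loop along these lines may help. It is a sketch under assumptions: the JSON field name `in_progress` is a guess at how the in-progress flag is spelled, and `fetch_status` and `wait_until_idle` are hypothetical helpers, not part of the repo.

```python
import json
import time
import urllib.request
from typing import Callable

STATUS_URL = "http://localhost:8001/ingest/status"


def fetch_status(url: str = STATUS_URL) -> dict:
    """GET the status endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_until_idle(
    get_status: Callable[[], dict] = fetch_status,
    flag: str = "in_progress",  # assumed field name; check the real response
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the flag is false, then return the final status payload."""
    for _ in range(max_polls):
        status = get_status()
        if not status.get(flag, False):
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"ingestion still busy after {max_polls} polls")
```

With the ingestion service running, `print(wait_until_idle())` would block until the crawl finishes and then show the final counts.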

The first full crawl may take several minutes — the docs sites for these tools link to a lot of pages, and the crawler is intentionally polite (one request at a time per host, with a delay between hits).
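The politeness rule (a delay between hits to the same host) could be implemented roughly like this. The class name `HostThrottle` and the one-second default are illustrative assumptions; the crawler's real delay and structure are not shown in this README.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum gap between consecutive requests to the same host."""

    def __init__(
        self,
        delay_seconds: float = 1.0,  # assumed value for illustration
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._delay = delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_hit: Dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Block until the URL's host may be hit again; return seconds slept."""
        host = urlsplit(url).netloc
        slept = 0.0
        last = self._last_hit.get(host)
        if last is not None:
            remaining = self._delay - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        # Record the moment this host is (about to be) hit.
        self._last_hit[host] = self._clock()
        return slept
```

Calling `throttle.wait(url)` immediately before each fetch keeps hosts independent: a pause for docs.cursor.com does not slow down requests to another site.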

The crawler only stores raw HTML in SQLite. Cleaning, chunking, embedding, and Pinecone upserts happen in the pipeline (next section).

Running the pipeline

Once a crawl has finished and pages are sitting in ingestion.db with status='fetched', kick off the pipeline:

curl -X POST http://localhost:8001/ingest/index \
  -H "Content-Type: application/json" \
  -d '{"tools": ["cursor"], "force_reindex": false}'

The endpoint returns 202 immediately and runs in the background. This is the step that calls the Hugging Face Serverless Inference API for embeddings — the first embedding call in a session may cold-start (15–30 s wait) while the model warms up on HF's side.

What the pipeline does per page:

  1. Pulls raw HTML from SQLite.
  2. Cleans it with BeautifulSoup + markdownify (drops nav/footer/scripts/etc.).
  3. Splits the markdown into chunks (~1500 chars, 150 overlap, heading-aware separators).
  4. Embeds each chunk with Qwen/Qwen3-Embedding-0.6B (1024-dim).
  5. Upserts vectors + metadata to Pinecone and records chunks in SQLite.
  6. Marks the page status='indexed' and stores a SHA256 of the cleaned text.
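To make the chunking numbers in step 3 concrete, here is a deliberately simplified sliding-window splitter. It is not the heading-aware LangChain splitter the pipeline uses; it only illustrates how a 1500-character window with a 150-character overlap advances through a document. The function name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1500, overlap: int = 150) -> List[str]:
    """Split text into fixed-size windows where neighbours share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this far after the previous one
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means a sentence cut at a window boundary still appears whole in at least one chunk when it is shorter than 150 characters. A separator-aware splitter improves on this by preferring to cut at headings and paragraph breaks.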

Change detection: on the next run, if a page's cleaned-text hash matches what's stored, the pipeline skips re-embedding it (unchanged). Pass "force_reindex": true to bypass that check — useful when you change chunk size or swap the embedding model.
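The change-detection decision boils down to a hash comparison. The sketch below assumes the stored value is a hex SHA256 digest of the UTF-8 cleaned text; `needs_reindex` is a hypothetical helper name, and the real pipeline may organise this differently.

```python
import hashlib
from typing import Optional


def needs_reindex(
    cleaned_text: str,
    stored_hash: Optional[str],
    force_reindex: bool = False,
) -> bool:
    """Return True when the page should be chunked and embedded again."""
    if force_reindex:
        return True  # caller asked to bypass the hash check
    new_hash = hashlib.sha256(cleaned_text.encode("utf-8")).hexdigest()
    # A page with no stored hash has never been indexed, so it always qualifies.
    return new_hash != stored_hash
```

Hashing the cleaned text rather than the raw HTML means cosmetic markup changes (a new nav link, a rotated script tag) do not trigger a re-embed.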

Check indexing progress:

curl http://localhost:8001/ingest/status

The response includes page_counts (with an indexed count) and chunks_count from SQLite. The Pinecone dashboard is the source of truth for the actual vector count in the index.

Running the chat API

Once the index is populated, start the chat API (separate from the ingestion service):

uvicorn api.main:app --reload --port 8000

On startup the app runs pinecone.verify_index() and instantiates the generator, so misconfiguration fails loud and early. The first /chat request may still wait 15–30 s for the HF generation model (Qwen/Qwen3-4B-Instruct-2507) to cold-start on serverless inference — the generator retries once automatically on a 503.

Ask a question:

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "Does Cursor have a free tier?"}'

Restrict retrieval to a subset of tools (any of claude-code, codex, cursor, github-copilot):

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "How does it handle a monorepo?", "tools": ["cursor", "claude-code"]}'
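One plausible way to turn the tools list into a Pinecone metadata filter is shown below. The `$in` operator is part of Pinecone's filter syntax and `tool_name` is the metadata key mentioned in the smoke tests, but `build_tool_filter` itself is a hypothetical helper; how the API actually builds its filter is not documented here.

```python
from typing import Optional, Sequence

KNOWN_TOOLS = ("claude-code", "codex", "cursor", "github-copilot")


def build_tool_filter(tools: Optional[Sequence[str]]) -> Optional[dict]:
    """Map a request's tools list to a metadata filter, or None for no filter."""
    if not tools:
        return None  # search across every tool
    unknown = sorted(set(tools) - set(KNOWN_TOOLS))
    if unknown:
        raise ValueError(f"unknown tools: {unknown}")
    # Match vectors whose tool_name metadata is any of the requested tools.
    return {"tool_name": {"$in": list(tools)}}
```

For the request above, this would produce `{"tool_name": {"$in": ["cursor", "claude-code"]}}`, which can be passed as the `filter` argument of a Pinecone query.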

Response shape:

{
  "answer": "...",
  "sources": [
    {"tool_name": "cursor", "page_title": "...", "source_url": "...", "section_heading": "...", "score": 0.78}
  ],
  "fallback": false
}

fallback: true means retrieval couldn't find anything above RETRIEVAL_MIN_SCORE (default 0.4). In that case the LLM is not called — the answer is a fixed "I don't have enough information" message and sources is empty. Tune RETRIEVAL_MIN_SCORE in .env if you want a stricter or looser threshold.
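The fallback branch can be pictured as a score gate in front of the generator. This is a sketch, not the API's code: `answer_or_fallback` and `FALLBACK_ANSWER` are hypothetical names, and treating a score equal to the threshold as a pass is an assumption.

```python
import os
from typing import Callable, List, Optional

# Placeholder wording; the API's fixed message may differ.
FALLBACK_ANSWER = "I don't have enough information"


def answer_or_fallback(
    matches: List[dict],
    generate: Callable[[List[dict]], str],
    min_score: Optional[float] = None,
) -> dict:
    """Call the generator only when at least one match clears the threshold."""
    if min_score is None:
        min_score = float(os.environ.get("RETRIEVAL_MIN_SCORE", "0.4"))
    kept = [m for m in matches if m["score"] >= min_score]  # assumed inclusive
    if not kept:
        # Skip the LLM entirely: nothing relevant enough to ground an answer.
        return {"answer": FALLBACK_ANSWER, "sources": [], "fallback": True}
    return {"answer": generate(kept), "sources": kept, "fallback": False}
```

Raising the threshold trades recall for precision: fewer off-topic answers, but more questions that end in the fallback message.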

Smoke tests

Four standalone scripts that don't require the FastAPI server to be running:

  • python scripts/smoke_test_embeddings.py — verifies the HF serverless embedding endpoint works (from the scaffold step).
  • python scripts/smoke_test_crawler.py — wipes ingestion.db, crawls https://www.cursor.com/pricing as a single page-scope seed, and asserts the page row is stored with status='fetched', http_status=200, and non-empty HTML.
  • python scripts/smoke_test_pipeline.py — end-to-end: wipes the DB, crawls https://docs.cursor.com/en/welcome, runs the pipeline, then asserts the page is indexed, chunks landed in SQLite, and at least one vector is fetchable from Pinecone with the expected tool_name metadata.
  • python scripts/smoke_test_chat.py — exercises the chat API in-process via FastAPI's TestClient. Hits /health, an in-domain question, an off-topic question (expects fallback), and a tools-filtered question. Requires the Pinecone index to already be populated — run the ingestion crawl + index first, otherwise the in-domain question will trip the fallback branch.

Project structure

picky/
├── README.md
├── .env.example
├── .gitignore
├── requirements.txt
├── config/         # seed URLs and crawl config
├── ingestion/      # crawler + chunker + indexer service
├── api/            # chat API service
├── shared/         # config and embeddings client used by both services
└── scripts/        # one-off scripts (smoke tests, utilities)

Status

  • Scaffold ✅
  • Ingestion crawler (raw HTML → SQLite) ✅
  • Ingestion pipeline (clean → chunk → embed → Pinecone) ✅
  • Chat API ✅
  • Frontend ⏳
