Notice: Development on Noctis is currently halted. I have other obligations to work on first and won't be actively maintaining or adding features for the time being. The project is open-source and the code is fully functional — feel free to use it, fork it, or contribute, but don't expect timely responses to issues or pull requests.
A clean, self-hosted search frontend built with FastAPI and Jinja2. Aggregates results from SearXNG, DuckDuckGo, and a local full-text index concurrently and presents them through a unified custom UI. No API keys required.
No trackers. No analytics. No search logs.
- Fast, minimal dark-mode UI (light mode available in settings)
- Multi-backend search: SearXNG + DuckDuckGo + local index, run concurrently and merged
- Local full-text search — Tantivy-powered index built from your own crawled pages; no external service
- BM25 ranking with freshness and domain authority boosts — recently crawled, well-linked pages rank higher
- Query expansion — Porter stemming and an optional synonym map for better recall
- Image search with a responsive thumbnail grid — falls back to DuckDuckGo when SearXNG is unavailable
- Results appearing across multiple backends are ranked higher
- Per-user settings via cookies: sources, safe search, language, link behavior, theme
- Graceful per-backend error isolation — one source going down doesn't break the page
- Result filtering: restrict by site/keyword, sort by relevance or domain
- Search result cache backed by SQLite — configurable TTL, zero-latency repeat queries, safe search–aware cache keys
- Web crawler with `robots.txt` support, full-text body extraction, and domain authority tracking
- Priority-queue URL frontier: crawl priority based on inbound link count and depth
- Scheduled crawling via APScheduler — daily by default, configurable cron expression
- Admin dashboard at `/admin/index`: index stats, crawl history, cache metrics, manual controls
- `/healthz` endpoint for uptime monitoring
| Layer | Tech |
|---|---|
| Backend | FastAPI + Uvicorn |
| Templates | Jinja2 |
| Styling | Plain CSS (no frameworks) |
| HTTP client | httpx (async) |
| Search backends | SearXNG (self-hosted), DuckDuckGo, local Tantivy index |
| Full-text index | Tantivy (embedded, no daemon) |
| Scheduler | APScheduler 3.x (AsyncIOScheduler) |
| Query expansion | snowballstemmer (Porter stemming) |
| Storage | SQLite via aiosqlite |
| HTML parsing | BeautifulSoup4 |
- Python 3.13+
- Rust toolchain (`cargo`): required to build Tantivy from source if no pre-built wheel is available for your platform. Install via rustup.rs.
- A running SearXNG instance with the JSON format enabled (optional; DuckDuckGo works without it)
Noctis is developed on CachyOS and tested on Debian-based systems (Debian v12+, Ubuntu 24.04 LTS+, RPi OS v12+). It should work on most Linux distributions.
In your SearXNG `settings.yml`, make sure `json` is listed under `search.formats`:

```yaml
search:
  formats:
    - html
    - json
```

Restart SearXNG after changing this.
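
With JSON enabled, a request of roughly the following shape should return JSON rather than HTML. This is a minimal sketch: `build_searxng_url` is a hypothetical helper (not part of Noctis) and the base URL is a placeholder. It relies only on SearXNG's `/search` endpoint with the `q`, `format`, and `pageno` parameters.

```python
from urllib.parse import urlencode


def build_searxng_url(base_url: str, query: str, page: int = 1) -> str:
    """Build a SearXNG JSON search URL (hypothetical helper, not Noctis code)."""
    params = {"q": query, "format": "json", "pageno": page}
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


# Placeholder base URL; substitute the address of your own instance.
print(build_searxng_url("http://localhost:8888", "self hosted search"))
```

If the instance answers this URL with an HTTP 403, the `json` format is most likely still disabled in `settings.yml`.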
One-line install:
```bash
bash <(curl -fsSL https://raw.githubusercontent.com/aster1630/Noctis/main/deploy/install.sh)
```

The installer:
- Clones the repo
- Creates a venv and installs dependencies
- Asks for your SearXNG URL (optional)
- Writes `.env`
- Installs and starts the systemd service
Running it again on an existing install pulls the latest changes and restarts the service.
After the base install, see deploy/README.md for step-by-step guides on:
- Cloudflare Tunnel — public HTTPS on your own domain, no open ports
- Tailscale — private access across your own devices via WireGuard mesh
```bash
git clone https://github.com/aster1630/Noctis.git
cd Noctis
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env: set SEARXNG_URL and ADMIN_TOKEN at minimum
uvicorn app.main:app --reload
```

Open http://localhost:8000.
Copy .env.example to .env and adjust as needed:
| Variable | Default | Description |
|---|---|---|
| `SEARXNG_URL` | (unset) | URL of your SearXNG instance; leave unset to use DDG only |
| `SEARXNG_TIMEOUT` | `8` | SearXNG request timeout in seconds |
| `ENABLED_BACKENDS` | `ddg,searxng` | Comma-separated list of active backends |

| Variable | Default | Description |
|---|---|---|
| `DB_PATH` | `noctis.db` | Path to the SQLite database file |
| `CACHE_TTL_SECONDS` | `604800` | Search result cache TTL (default: 7 days) |
| `TANTIVY_INDEX_PATH` | `./noctis_index` | Directory for the Tantivy full-text index |

| Variable | Default | Description |
|---|---|---|
| `CRAWL_SCHEDULE` | `0 2 * * *` | Cron expression for the scheduled crawl (daily at 2 AM) |
| `CRAWL_MAX_PAGES` | `200` | Max pages per scheduled crawl run |

| Variable | Default | Description |
|---|---|---|
| `ADMIN_TOKEN` | (unset) | Token required to access `/admin/index`. Leave unset to disable admin. |

| Variable | Default | Description |
|---|---|---|
| `FRESHNESS_WINDOW_DAYS` | `7` | Docs crawled within this many days get a freshness boost |
| `FRESHNESS_WEIGHT` | `0.1` | Freshness boost multiplier applied to BM25 score |
| `AUTHORITY_WEIGHT` | `0.2` | Domain authority boost multiplier |
| `SYNONYMS_PATH` | `config/synonyms.json` | Path to the optional synonym map for query expansion |

| Variable | Default | Description |
|---|---|---|
| `APP_HOST` | `0.0.0.0` | Bind address |
| `APP_PORT` | `8000` | Bind port |
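
As a rough illustration of how these variables and their defaults fit together, the sketch below reads a subset of them from the environment. The variable names and defaults come from the tables above; the `Settings` dataclass and `load_settings` function are hypothetical and may not match how Noctis loads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the documented variables."""
    searxng_url: Optional[str]
    searxng_timeout: float
    enabled_backends: tuple
    cache_ttl_seconds: int
    crawl_schedule: str
    crawl_max_pages: int
    admin_token: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying README defaults."""
    env = os.environ if env is None else env
    backends = env.get("ENABLED_BACKENDS", "ddg,searxng")
    return Settings(
        searxng_url=env.get("SEARXNG_URL") or None,  # unset or empty means DDG only
        searxng_timeout=float(env.get("SEARXNG_TIMEOUT", "8")),
        enabled_backends=tuple(b.strip() for b in backends.split(",") if b.strip()),
        cache_ttl_seconds=int(env.get("CACHE_TTL_SECONDS", "604800")),
        crawl_schedule=env.get("CRAWL_SCHEDULE", "0 2 * * *"),
        crawl_max_pages=int(env.get("CRAWL_MAX_PAGES", "200")),
        admin_token=env.get("ADMIN_TOKEN") or None,  # unset or empty disables admin
    )


print(load_settings({"ENABLED_BACKENDS": "ddg, local"}))
```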
| Route | Description |
|---|---|
| `GET /` | Homepage with search bar |
| `GET /search?q=…&page=…` | Web results page |
| `GET /search?q=…&type=images` | Image results grid |
| `GET /search?q=…&source=local` | Force results from local index only |
| `GET /search?q=…&source=remote` | Force results from remote backends only |
| `GET /filters?q=…` | Filter/sort panel for the current search |
| `GET /settings` | User settings (saved as cookies) |
| `GET /privacy` | Privacy policy |
| `GET /terms` | Terms of service |
| `GET /support` | Support / donate page |
| `GET /healthz` | Health check; returns `{"status": "ok"}` |

| Route | Description |
|---|---|
| `GET /admin/index?token=…` | Admin dashboard (HTML) |
| `GET /admin/api/index-status?token=…` | Index stats, cache ratio, top queries (JSON) |
| `GET /admin/api/crawl-status?token=…` | Recent crawl runs and next scheduled time (JSON) |
| `POST /admin/api/crawl?token=…` | Trigger a one-off crawl run |
| `POST /admin/api/rebuild-index?token=…` | Rebuild Tantivy index from all crawled pages |
| `POST /admin/api/clear-cache?token=…` | Delete expired cache rows |
All admin API endpoints also accept the token via the `X-Admin-Token` header for programmatic access.
See docs/admin.md for setup and usage details.
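
For header-based access, a request can be assembled along these lines. This is a sketch using only the standard library: `build_crawl_request` is a hypothetical helper, and the base URL and token are placeholders. The route and header name are the ones listed above.

```python
from urllib.request import Request


def build_crawl_request(base_url: str, token: str) -> Request:
    """Prepare (but do not send) a POST to the one-off crawl endpoint."""
    return Request(
        f"{base_url.rstrip('/')}/admin/api/crawl",
        method="POST",
        headers={"X-Admin-Token": token},  # token travels in the header, not the query string
    )


req = build_crawl_request("http://localhost:8000", "change-me")
print(req.get_method(), req.full_url)
# To actually trigger a crawl against a running instance:
#   urllib.request.urlopen(req)
```

Passing the token in a header keeps it out of the URL, which may matter if request URLs end up in proxy or shell history logs.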
Noctis can build and search its own full-text index from pages your crawler has visited — no SearXNG or DDG required for those queries.
See docs/local-index.md for how to seed the crawler, monitor indexing, and tune ranking.
```
Noctis/
├── app/
│   ├── main.py  # FastAPI app and routes
│   ├── filters.py  # Result filtering and sorting logic
│   ├── settings.py  # Cookie-based user settings
│   ├── scheduler.py  # APScheduler crawl scheduling
│   ├── search/
│   │   ├── __init__.py  # Re-exports public search API
│   │   ├── engine.py  # Multi-backend aggregator, dedup, cache integration
│   │   ├── ranking.py  # Freshness + authority BM25 boosts
│   │   └── query_expansion.py  # Porter stemming + synonym map
│   ├── backends/
│   │   ├── __init__.py  # BackendRegistry initialisation
│   │   ├── registry.py  # Pluggable BackendRegistry class
│   │   ├── base.py  # Shared types (Backend protocol, BackendResult)
│   │   ├── local.py  # Local Tantivy index backend
│   │   ├── searxng.py  # SearXNG JSON API client
│   │   └── ddg.py  # DuckDuckGo client
│   ├── indexing/
│   │   ├── schema.py  # Tantivy field schema definition
│   │   ├── index.py  # Index open/create/reset singleton
│   │   └── writer.py  # Async document writer (write-locked)
│   ├── storage/
│   │   ├── db.py  # Async SQLite connection + DB init
│   │   ├── cache.py  # Result cache (read/write/purge)
│   │   ├── models.py  # Storage dataclasses
│   │   └── migrations/
│   │       ├── 001_init.sql  # result_cache, crawl_pages, crawl_queue
│   │       └── 002_fts_body.sql  # crawl_body, domain_authority, crawl_stats, crawl_frontier
│   ├── crawler/
│   │   ├── crawler.py  # Async web crawler (full-text + metadata)
│   │   ├── frontier.py  # Priority-queue URL frontier
│   │   ├── robots.py  # robots.txt parser (cached per domain)
│   │   └── storage.py  # Crawler DB helpers (pages, body, authority)
│   ├── admin/
│   │   ├── __init__.py  # Admin APIRouter
│   │   └── api.py  # Admin JSON endpoints
│   ├── templates/
│   │   ├── base.html
│   │   ├── index.html
│   │   ├── results.html
│   │   ├── filters.html
│   │   ├── settings.html
│   │   ├── admin/
│   │   │   └── index.html  # Admin dashboard
│   │   ├── privacy.html
│   │   ├── terms.html
│   │   └── support.html
│   └── static/
│       ├── style.css
│       └── favicon.ico
├── config/
│   └── synonyms.json  # Optional synonym map for query expansion
├── docs/
│   ├── admin.md  # Admin dashboard setup and usage
│   ├── local-index.md  # Local search index guide
│   └── performance.md  # Load test plan (v0.4.0 deferred)
├── tests/
│   ├── conftest.py
│   ├── test_backend_registry.py
│   ├── test_cache.py
│   ├── test_cache_safesearch.py
│   ├── test_crawler.py
│   ├── test_frontier.py
│   ├── test_image_search_isolation.py
│   ├── test_local_backend.py
│   ├── test_query_expansion.py
│   ├── test_ranking.py
│   ├── test_admin_api.py
│   └── test_search_fallback.py
├── deploy/
│   ├── noctis.service  # systemd unit
│   ├── cloudflared-config.yml
│   ├── install.sh
│   ├── uninstall.sh
│   └── README.md  # Hosting guide (Cloudflare Tunnel + Tailscale)
├── pytest.ini
├── requirements.txt
├── requirements-dev.txt
└── .env.example
```

- The route reads `settings.backends` from the user's cookie (or uses the `?source=local/remote` override).
- `BackendRegistry.get_by_names()` returns only backends that are both named and currently available. The `local` backend reports as unavailable when the Tantivy index is empty, so it is silently skipped until the crawler has run.
- All selected backends run concurrently via `asyncio.gather()`. Results are merged and de-duplicated by normalised URL; results appearing in more backends float to the top.
- If all selected backends fail, `get_fallback_chain()` retries with the remaining available backends before returning an error.
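
The concurrent fan-out and merge step described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `engine.py`: `normalise_url`, `search_all`, the result dict shape, and the fake backends are all hypothetical, the exact URL normalisation rules are guessed, and the fallback chain is left out.

```python
import asyncio
from urllib.parse import urlsplit


def normalise_url(url: str) -> str:
    """Assumed normalisation: ignore scheme, 'www.', trailing slash, and fragment."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{parts.path.rstrip('/')}{query}"


async def search_all(backends: dict, query: str) -> list:
    """Query every backend concurrently; a failing backend is skipped, not fatal."""
    names = list(backends)
    outcomes = await asyncio.gather(
        *(backends[name](query) for name in names), return_exceptions=True
    )
    merged = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            continue  # error isolation: drop this backend's results only
        for rank, result in enumerate(outcome):
            key = normalise_url(result["url"])
            entry = merged.setdefault(key, {**result, "sources": [], "best_rank": rank})
            if name not in entry["sources"]:
                entry["sources"].append(name)
            entry["best_rank"] = min(entry["best_rank"], rank)
    # Results seen by more backends come first, then the best rank any backend gave.
    return sorted(merged.values(), key=lambda r: (-len(r["sources"]), r["best_rank"]))


# Fake backends standing in for SearXNG, DuckDuckGo, and a failing local index.
async def fake_searxng(query):
    return [{"url": "https://example.org/a", "title": "A"},
            {"url": "https://example.org/b", "title": "B"}]


async def fake_ddg(query):
    return [{"url": "http://www.example.org/b/", "title": "B"}]


async def broken(query):
    raise RuntimeError("backend down")


results = asyncio.run(
    search_all({"searxng": fake_searxng, "ddg": fake_ddg, "local": broken}, "test")
)
print([(r["url"], r["sources"]) for r in results])
```

`return_exceptions=True` is what keeps one failing backend from cancelling the others. Without it, the first exception would propagate out of `asyncio.gather()`.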
The Tantivy index lives on disk at TANTIVY_INDEX_PATH (default ./noctis_index). It is built incrementally — each crawled page is indexed immediately after being saved. A full rebuild can be triggered from the admin dashboard.
Queries against the local backend are expanded with Porter stemming and an optional synonym map before being sent to Tantivy. BM25 scores are then boosted by freshness (crawled within FRESHNESS_WINDOW_DAYS) and domain authority (inbound link count, normalised 0–1).
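
One plausible way to combine the two boosts is shown below. The weights and window are the documented defaults, but the way they are combined (a single multiplicative factor) is an assumption, and `boosted_score` is a hypothetical name. The actual logic in `ranking.py` may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Defaults taken from the configuration tables.
FRESHNESS_WINDOW_DAYS = 7
FRESHNESS_WEIGHT = 0.1
AUTHORITY_WEIGHT = 0.2


def boosted_score(bm25: float, crawled_at: datetime, authority: float,
                  now: Optional[datetime] = None) -> float:
    """Scale a BM25 score by freshness and domain authority (assumed formula)."""
    now = now or datetime.now(timezone.utc)
    is_fresh = (now - crawled_at) <= timedelta(days=FRESHNESS_WINDOW_DAYS)
    boost = 1.0 + (FRESHNESS_WEIGHT if is_fresh else 0.0) + AUTHORITY_WEIGHT * authority
    return bm25 * boost


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
fresh = boosted_score(10.0, now - timedelta(days=1), authority=0.5, now=now)
stale = boosted_score(10.0, now - timedelta(days=30), authority=0.5, now=now)
print(fresh > stale)
```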
Web search results are cached in SQLite keyed on SHA-256(query + page + sorted_backends + safesearch). Cache entries expire after CACHE_TTL_SECONDS. The cache is checked before any backend is called and written through after a successful response. Cache misses and DB errors fall through silently.
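
A cache key of this kind could be derived as in the sketch below. The inputs match the description above, but the serialisation (compact JSON with sorted keys) and the `cache_key` name are assumptions; the real key format in `cache.py` may differ.

```python
import hashlib
import json


def cache_key(query: str, page: int, backends: list, safesearch: int) -> str:
    """SHA-256 over query, page, sorted backends, and safe search level."""
    payload = json.dumps(
        {
            "q": query,
            "page": page,
            "backends": sorted(backends),  # order-insensitive
            "safesearch": safesearch,  # keeps filtered and unfiltered results apart
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


k1 = cache_key("rust", 1, ["searxng", "ddg"], 1)
k2 = cache_key("rust", 1, ["ddg", "searxng"], 1)
print(k1 == k2)
```

Sorting the backend list means the same selection in a different order reuses one cache row, while including the safe search level prevents unfiltered results from being served to a user who has filtering enabled.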
The crawler fetches HTML, extracts title, snippet, full body text, h1 and h2 headings, and discovers outbound links. It respects robots.txt per domain (cached in-process with lru_cache). Discovered links are pushed into a priority-queue frontier (crawl_frontier table) and processed by run_crawl_worker().
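
Per-domain `robots.txt` caching with `lru_cache` can look roughly like this. The sketch uses the standard library's `urllib.robotparser` and an in-memory stand-in for the network fetch so that it runs offline. `FAKE_ROBOTS`, `robots_for`, `can_crawl`, and the `NoctisBot` user agent are hypothetical and not taken from `robots.py`.

```python
from functools import lru_cache
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Stand-in for fetching https://<domain>/robots.txt (hypothetical data).
FAKE_ROBOTS = {
    "example.org": "User-agent: *\nDisallow: /private/\n",
}


@lru_cache(maxsize=256)
def robots_for(domain: str) -> RobotFileParser:
    """Parse a domain's robots.txt once and keep the parser in process memory."""
    parser = RobotFileParser()
    parser.parse(FAKE_ROBOTS.get(domain, "").splitlines())
    return parser


def can_crawl(url: str, user_agent: str = "NoctisBot") -> bool:
    """Check a URL against the cached rules for its domain."""
    return robots_for(urlsplit(url).netloc.lower()).can_fetch(user_agent, url)


print(can_crawl("https://example.org/public"))
print(can_crawl("https://example.org/private/x"))
```

Because the cache is keyed on the domain, every URL on the same host reuses one parsed rule set instead of re-reading `robots.txt` for each page.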
URL frontier priority is computed as:
domain_authority × 0.6 + (1 / depth) × 0.4
After each crawl batch, domain authority scores are renormalised to [0, 1] based on the maximum inbound link count across all known domains.
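
The priority formula and the renormalisation step can be illustrated with a small in-memory frontier. The real frontier lives in the `crawl_frontier` table, so the `heapq` queue here is only a stand-in. `frontier_priority` and `renormalise` are hypothetical names, and clamping `depth` to at least 1 is an assumption made to avoid division by zero.

```python
import heapq


def frontier_priority(domain_authority: float, depth: int) -> float:
    """domain_authority × 0.6 + (1 / depth) × 0.4, with depth clamped to >= 1 (assumed)."""
    return domain_authority * 0.6 + (1 / max(depth, 1)) * 0.4


def renormalise(inbound_links: dict) -> dict:
    """Scale inbound link counts to [0, 1] by the maximum across all domains."""
    top = max(inbound_links.values(), default=0)
    return {domain: (count / top if top else 0.0) for domain, count in inbound_links.items()}


authority = renormalise({"example.org": 40, "example.net": 10})

frontier = []
for url, domain, depth in [
    ("https://example.org/deep", "example.org", 4),
    ("https://example.net/", "example.net", 1),
    ("https://example.org/", "example.org", 1),
]:
    # heapq is a min-heap, so push the negated priority to pop the highest first.
    heapq.heappush(frontier, (-frontier_priority(authority[domain], depth), url))

first = heapq.heappop(frontier)[1]
print(first)
```

With these weights a shallow page on a low-authority domain can still outrank a deep page on a high-authority one, since depth contributes up to 0.4 of the score.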
The crawler runs on a cron schedule defined by CRAWL_SCHEDULE via APScheduler's AsyncIOScheduler. It starts automatically with the FastAPI app and shuts down cleanly on exit. A one-off crawl can also be triggered from the admin dashboard without restarting the server.
Pull requests are welcome. Please open an issue first to discuss significant changes.
See CONTRIBUTING.md for setup and guidelines.
AGPL-3.0 — © astersworld.xyz