Noctis by Aster

Notice: Development on Noctis is currently halted. I have other obligations to work on first and won't be actively maintaining or adding features for the time being. The project is open-source and the code is fully functional — feel free to use it, fork it, or contribute, but don't expect timely responses to issues or pull requests.

A clean, self-hosted search frontend built with FastAPI and Jinja2. Aggregates results from SearXNG, DuckDuckGo, and a local full-text index concurrently and presents them through a unified custom UI. No API keys required.

No trackers. No analytics. No search logs.


Features

  • Fast, minimal dark-mode UI (light mode available in settings)
  • Multi-backend search: SearXNG + DuckDuckGo + local index, run concurrently and merged
  • Local full-text search — Tantivy-powered index built from your own crawled pages; no external service
  • BM25 ranking with freshness and domain authority boosts — recently crawled, well-linked pages rank higher
  • Query expansion — Porter stemming and an optional synonym map for better recall
  • Image search with a responsive thumbnail grid — falls back to DuckDuckGo when SearXNG is unavailable
  • Results appearing across multiple backends are ranked higher
  • Per-user settings via cookies: sources, safe search, language, link behavior, theme
  • Graceful per-backend error isolation — one source going down doesn't break the page
  • Result filtering: restrict by site/keyword, sort by relevance or domain
  • Search result cache backed by SQLite: configurable TTL, near-instant repeat queries, safe-search-aware cache keys
  • Web crawler with robots.txt support, full-text body extraction, and domain authority tracking
  • Priority-queue URL frontier — crawl priority based on inbound link count and depth
  • Scheduled crawling via APScheduler — daily by default, configurable cron expression
  • Admin dashboard at /admin/index — index stats, crawl history, cache metrics, manual controls
  • /healthz endpoint for uptime monitoring

Stack

| Layer | Tech |
| --- | --- |
| Backend | FastAPI + Uvicorn |
| Templates | Jinja2 |
| Styling | Plain CSS (no frameworks) |
| HTTP client | httpx (async) |
| Search backends | SearXNG (self-hosted), DuckDuckGo, local Tantivy index |
| Full-text index | Tantivy (embedded, no daemon) |
| Scheduler | APScheduler 3.x (AsyncIOScheduler) |
| Query expansion | snowballstemmer (Porter stemming) |
| Storage | SQLite via aiosqlite |
| HTML parsing | BeautifulSoup4 |

Prerequisites

  • Python 3.13+
  • Rust toolchain (cargo) — required to build Tantivy from source if no pre-built wheel is available for your platform. Install via rustup.rs.
  • A running SearXNG instance with the JSON format enabled (optional — DuckDuckGo works without it)

Noctis is developed on CachyOS and tested on Debian-based systems (Debian v12+, Ubuntu 24.04 LTS+, RPi OS v12+). It should work on most Linux distributions.

Enable SearXNG JSON API

In your SearXNG settings.yml, make sure json is listed under search.formats:

search:
  formats:
    - html
    - json

Restart SearXNG after changing this.
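
To check that the JSON format is actually enabled, you can request a search with format=json and inspect the response. The sketch below only builds the URL and parses a response body; the base URL http://localhost:8080 and the helper names are assumptions for illustration, not part of Noctis. An instance without json under search.formats will typically reject the request instead of returning JSON.

```python
import json
from urllib.parse import urlencode


def searxng_json_url(base_url: str, query: str) -> str:
    """Build a SearXNG search URL that asks for JSON output."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def extract_results(body: str) -> list:
    """Pull title/url pairs out of a SearXNG JSON response body."""
    payload = json.loads(body)
    return [
        {"title": item.get("title", ""), "url": item["url"]}
        for item in payload.get("results", [])
        if "url" in item
    ]


# Hypothetical local instance; fetch this URL with any HTTP client.
print(searxng_json_url("http://localhost:8080", "noctis search"))
```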


Installation

Self-hosted (Raspberry Pi / Ubuntu)

One-line install:

bash <(curl -fsSL https://raw.githubusercontent.com/aster1630/Noctis/main/deploy/install.sh)

The installer:

  1. Clones the repo
  2. Creates a venv and installs dependencies
  3. Asks for your SearXNG URL (optional)
  4. Writes .env
  5. Installs and starts the systemd service

Running it again on an existing install pulls the latest changes and restarts the service.

Exposing Noctis beyond localhost

After the base install, see deploy/README.md for step-by-step guides on:

  • Cloudflare Tunnel — public HTTPS on your own domain, no open ports
  • Tailscale — private access across your own devices via WireGuard mesh

Local dev

git clone https://github.com/aster1630/Noctis.git
cd Noctis

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

cp .env.example .env
# Edit .env: set SEARXNG_URL to use SearXNG and ADMIN_TOKEN to enable the admin dashboard (both optional)

uvicorn app.main:app --reload

Open http://localhost:8000.


Configuration

Copy .env.example to .env and adjust as needed:

Search backends

| Variable | Default | Description |
| --- | --- | --- |
| SEARXNG_URL | (unset) | URL of your SearXNG instance; leave unset to use DDG only |
| SEARXNG_TIMEOUT | 8 | SearXNG request timeout in seconds |
| ENABLED_BACKENDS | ddg,searxng | Comma-separated list of active backends |

Storage & cache

| Variable | Default | Description |
| --- | --- | --- |
| DB_PATH | noctis.db | Path to the SQLite database file |
| CACHE_TTL_SECONDS | 604800 | Search result cache TTL (default: 7 days) |
| TANTIVY_INDEX_PATH | ./noctis_index | Directory for the Tantivy full-text index |

Crawler & scheduler

| Variable | Default | Description |
| --- | --- | --- |
| CRAWL_SCHEDULE | 0 2 * * * | Cron expression for the scheduled crawl (daily at 2 AM) |
| CRAWL_MAX_PAGES | 200 | Max pages per scheduled crawl run |

Admin dashboard

| Variable | Default | Description |
| --- | --- | --- |
| ADMIN_TOKEN | (unset) | Token required to access /admin/index. Leave unset to disable admin. |

Ranking & query expansion

| Variable | Default | Description |
| --- | --- | --- |
| FRESHNESS_WINDOW_DAYS | 7 | Docs crawled within this many days get a freshness boost |
| FRESHNESS_WEIGHT | 0.1 | Freshness boost multiplier applied to BM25 score |
| AUTHORITY_WEIGHT | 0.2 | Domain authority boost multiplier |
| SYNONYMS_PATH | config/synonyms.json | Path to the optional synonym map for query expansion |

Server

| Variable | Default | Description |
| --- | --- | --- |
| APP_HOST | 0.0.0.0 | Bind address |
| APP_PORT | 8000 | Bind port |
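
As an illustration of how these variables and their defaults fit together, here is a minimal sketch that reads a subset of them from the environment. The Config class and load_config function are hypothetical names, not Noctis's actual settings code; only the variable names and default values come from the tables above.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    searxng_url: Optional[str]
    searxng_timeout: float
    enabled_backends: tuple[str, ...]
    db_path: str
    cache_ttl_seconds: int
    crawl_schedule: str
    crawl_max_pages: int
    app_host: str
    app_port: int


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Read settings from env, falling back to the documented defaults."""
    env = os.environ if env is None else env
    raw_backends = env.get("ENABLED_BACKENDS", "ddg,searxng")
    return Config(
        # An empty or missing SEARXNG_URL means "DDG only".
        searxng_url=env.get("SEARXNG_URL") or None,
        searxng_timeout=float(env.get("SEARXNG_TIMEOUT", "8")),
        enabled_backends=tuple(
            name.strip() for name in raw_backends.split(",") if name.strip()
        ),
        db_path=env.get("DB_PATH", "noctis.db"),
        cache_ttl_seconds=int(env.get("CACHE_TTL_SECONDS", "604800")),
        crawl_schedule=env.get("CRAWL_SCHEDULE", "0 2 * * *"),
        crawl_max_pages=int(env.get("CRAWL_MAX_PAGES", "200")),
        app_host=env.get("APP_HOST", "0.0.0.0"),
        app_port=int(env.get("APP_PORT", "8000")),
    )
```

Passing a plain dict instead of os.environ makes the defaults easy to check in a test or a REPL.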

Routes

User-facing

| Route | Description |
| --- | --- |
| GET / | Homepage with search bar |
| GET /search?q=…&page=… | Web results page |
| GET /search?q=…&type=images | Image results grid |
| GET /search?q=…&source=local | Force results from local index only |
| GET /search?q=…&source=remote | Force results from remote backends only |
| GET /filters?q=… | Filter/sort panel for the current search |
| GET /settings | User settings (saved as cookies) |
| GET /privacy | Privacy policy |
| GET /terms | Terms of service |
| GET /support | Support / donate page |
| GET /healthz | Health check; returns {"status": "ok"} |

Admin (requires ADMIN_TOKEN)

| Route | Description |
| --- | --- |
| GET /admin/index?token=… | Admin dashboard (HTML) |
| GET /admin/api/index-status?token=… | Index stats, cache ratio, top queries (JSON) |
| GET /admin/api/crawl-status?token=… | Recent crawl runs and next scheduled time (JSON) |
| POST /admin/api/crawl?token=… | Trigger a one-off crawl run |
| POST /admin/api/rebuild-index?token=… | Rebuild Tantivy index from all crawled pages |
| POST /admin/api/clear-cache?token=… | Delete expired cache rows |

All admin API endpoints also accept the token via X-Admin-Token header for programmatic access.
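
For scripted access, the header form keeps the token out of URLs and access logs. A minimal standard-library sketch follows; the base URL, the token value, and the build_admin_request helper are placeholders for illustration, while the route and header name come from the table and sentence above.

```python
from urllib.request import Request


def build_admin_request(base_url: str, path: str, token: str,
                        method: str = "POST") -> Request:
    """Prepare an admin API request that carries the token in a header."""
    return Request(
        f"{base_url.rstrip('/')}{path}",
        method=method,
        headers={"X-Admin-Token": token},
    )


# Placeholder values; send with urllib.request.urlopen(req) against a
# running instance to trigger a one-off crawl.
req = build_admin_request("http://localhost:8000", "/admin/api/crawl", "change-me")
print(req.get_method(), req.full_url)
```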

See docs/admin.md for setup and usage details.


Local search index

Noctis can build and search its own full-text index from pages your crawler has visited — no SearXNG or DDG required for those queries.

See docs/local-index.md for how to seed the crawler, monitor indexing, and tune ranking.


Project structure

Noctis/
├── app/
│   ├── main.py              # FastAPI app and routes
│   ├── filters.py           # Result filtering and sorting logic
│   ├── settings.py          # Cookie-based user settings
│   ├── scheduler.py         # APScheduler crawl scheduling
│   ├── search/
│   │   ├── __init__.py      # Re-exports public search API
│   │   ├── engine.py        # Multi-backend aggregator, dedup, cache integration
│   │   ├── ranking.py       # Freshness + authority BM25 boosts
│   │   └── query_expansion.py # Porter stemming + synonym map
│   ├── backends/
│   │   ├── __init__.py      # BackendRegistry initialisation
│   │   ├── registry.py      # Pluggable BackendRegistry class
│   │   ├── base.py          # Shared types (Backend protocol, BackendResult)
│   │   ├── local.py         # Local Tantivy index backend
│   │   ├── searxng.py       # SearXNG JSON API client
│   │   └── ddg.py           # DuckDuckGo client
│   ├── indexing/
│   │   ├── schema.py        # Tantivy field schema definition
│   │   ├── index.py         # Index open/create/reset singleton
│   │   └── writer.py        # Async document writer (write-locked)
│   ├── storage/
│   │   ├── db.py            # Async SQLite connection + DB init
│   │   ├── cache.py         # Result cache (read/write/purge)
│   │   ├── models.py        # Storage dataclasses
│   │   └── migrations/
│   │       ├── 001_init.sql # result_cache, crawl_pages, crawl_queue
│   │       └── 002_fts_body.sql # crawl_body, domain_authority, crawl_stats, crawl_frontier
│   ├── crawler/
│   │   ├── crawler.py       # Async web crawler (full-text + metadata)
│   │   ├── frontier.py      # Priority-queue URL frontier
│   │   ├── robots.py        # robots.txt parser (cached per domain)
│   │   └── storage.py       # Crawler DB helpers (pages, body, authority)
│   ├── admin/
│   │   ├── __init__.py      # Admin APIRouter
│   │   └── api.py           # Admin JSON endpoints
│   ├── templates/
│   │   ├── base.html
│   │   ├── index.html
│   │   ├── results.html
│   │   ├── filters.html
│   │   ├── settings.html
│   │   ├── admin/
│   │   │   └── index.html   # Admin dashboard
│   │   ├── privacy.html
│   │   ├── terms.html
│   │   └── support.html
│   └── static/
│       ├── style.css
│       └── favicon.ico
├── config/
│   └── synonyms.json        # Optional synonym map for query expansion
├── docs/
│   ├── admin.md             # Admin dashboard setup and usage
│   ├── local-index.md       # Local search index guide
│   └── performance.md       # Load test plan (v0.4.0 deferred)
├── tests/
│   ├── conftest.py
│   ├── test_backend_registry.py
│   ├── test_cache.py
│   ├── test_cache_safesearch.py
│   ├── test_crawler.py
│   ├── test_frontier.py
│   ├── test_image_search_isolation.py
│   ├── test_local_backend.py
│   ├── test_query_expansion.py
│   ├── test_ranking.py
│   ├── test_admin_api.py
│   └── test_search_fallback.py
├── deploy/
│   ├── noctis.service       # systemd unit
│   ├── cloudflared-config.yml
│   ├── install.sh
│   ├── uninstall.sh
│   └── README.md            # Hosting guide (Cloudflare Tunnel + Tailscale)
├── pytest.ini
├── requirements.txt
├── requirements-dev.txt
└── .env.example

Architecture

Backend resolution

  1. The route reads settings.backends from the user's cookie (or uses the ?source=local/remote override).
  2. BackendRegistry.get_by_names() returns only backends that are both named and currently available. The local backend reports as unavailable when the Tantivy index is empty, so it is silently skipped until the crawler has run.
  3. All selected backends run concurrently via asyncio.gather(). Results are merged and de-duplicated by normalised URL; results appearing in more backends float to the top.
  4. If all selected backends fail, get_fallback_chain() retries with the remaining available backends before returning an error.
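
The fan-out and merge in steps 3 and 4 can be sketched as follows. This is a simplified illustration, not the code in app/search/engine.py: backends are modelled as plain async callables returning URL lists, and normalise_url is an assumed normalisation (lower-cased host, no scheme, no leading www., no trailing slash).

```python
import asyncio
from urllib.parse import urlsplit


def normalise_url(url: str) -> str:
    """Assumed normalisation used only to detect duplicate results."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    return f"{key}?{parts.query}" if parts.query else key


async def run_backends(backends: dict, query: str) -> list:
    """Query all backends concurrently; skip failures; merge by URL."""
    names = list(backends)
    outcomes = await asyncio.gather(
        *(backends[name](query) for name in names),
        return_exceptions=True,  # a failing backend yields its exception
    )
    merged = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            continue  # error isolation: ignore this backend only
        for rank, url in enumerate(outcome):
            entry = merged.setdefault(
                normalise_url(url),
                {"url": url, "sources": set(), "best_rank": rank},
            )
            entry["sources"].add(name)
            entry["best_rank"] = min(entry["best_rank"], rank)
    # Results seen by more backends come first, then by best position.
    return sorted(
        merged.values(),
        key=lambda e: (-len(e["sources"]), e["best_rank"]),
    )
```

If the returned list is empty because every backend raised, a caller could retry with other available backends, which is the role the text assigns to get_fallback_chain().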

Local index

The Tantivy index lives on disk at TANTIVY_INDEX_PATH (default ./noctis_index). It is built incrementally — each crawled page is indexed immediately after being saved. A full rebuild can be triggered from the admin dashboard.

Queries against the local backend are expanded with Porter stemming and an optional synonym map before being sent to Tantivy. BM25 scores are then boosted by freshness (crawled within FRESHNESS_WINDOW_DAYS) and domain authority (inbound link count, normalised 0–1).
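
To make the boosts concrete, here is one way they could combine. The exact formula is not stated above, so the multiplicative form below is an assumption; only the parameter names and default values come from the configuration tables, and boosted_score is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_WINDOW_DAYS = 7
FRESHNESS_WEIGHT = 0.1
AUTHORITY_WEIGHT = 0.2


def boosted_score(bm25: float, crawled_at: datetime, authority: float,
                  now: datetime = None) -> float:
    """Scale a BM25 score by freshness and domain authority.

    Assumed form: bm25 * (1 + w_fresh * fresh + w_auth * authority),
    where fresh is 1 inside the freshness window and 0 outside it,
    and authority is already normalised to the range 0..1.
    """
    now = now or datetime.now(timezone.utc)
    window = timedelta(days=FRESHNESS_WINDOW_DAYS)
    fresh = 1.0 if now - crawled_at <= window else 0.0
    return bm25 * (1.0 + FRESHNESS_WEIGHT * fresh + AUTHORITY_WEIGHT * authority)
```

Under this form, a page crawled yesterday on a domain with authority 0.5 gets roughly a 20% lift over its raw BM25 score, while a stale page on an unlinked domain keeps its raw score.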

Cache

Web search results are cached in SQLite, keyed on SHA-256(query + page + sorted_backends + safesearch). Cache entries expire after CACHE_TTL_SECONDS. The cache is checked before any backend is called and written through after a successful response. A cache miss or database error falls through silently to the live backends.
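
A minimal sketch of such a key is shown below. The field separator and the string encoding of each field are assumptions, since the description above only names the inputs; cache_key is a hypothetical helper.

```python
import hashlib


def cache_key(query: str, page: int, backends: list, safesearch: int) -> str:
    """Derive a stable cache key from the inputs named in the text."""
    fields = [
        query,
        str(page),
        ",".join(sorted(backends)),  # sorted so backend order is irrelevant
        str(safesearch),
    ]
    # Assumed separator: a control character unlikely to occur in a query,
    # so that adjacent fields cannot run together ambiguously.
    raw = "\x1f".join(fields)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Sorting the backend names means the same selection always hits the same row, and including the safe search level keeps filtered and unfiltered results from sharing an entry.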

Crawler

The crawler fetches HTML, extracts title, snippet, full body text, h1 and h2 headings, and discovers outbound links. It respects robots.txt per domain (cached in-process with lru_cache). Discovered links are pushed into a priority-queue frontier (crawl_frontier table) and processed by run_crawl_worker().

URL frontier priority is computed as:
domain_authority × 0.6 + (1 / depth) × 0.4

After each crawl batch, domain authority scores are renormalised to [0, 1] based on the maximum inbound link count across all known domains.
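
The priority formula and the renormalisation step can be sketched like this. The weights come from the formula above; the function names, the guard that treats depth 0 as depth 1, and the in-memory heap (Noctis stores the frontier in the crawl_frontier table) are assumptions for illustration.

```python
import heapq


def normalise_authority(inbound_counts: dict) -> dict:
    """Scale inbound link counts to [0, 1] by the largest count."""
    top = max(inbound_counts.values(), default=0)
    if top == 0:
        return {domain: 0.0 for domain in inbound_counts}
    return {domain: count / top for domain, count in inbound_counts.items()}


def frontier_priority(domain_authority: float, depth: int) -> float:
    """domain_authority x 0.6 + (1 / depth) x 0.4."""
    safe_depth = max(depth, 1)  # assumed guard against division by zero
    return domain_authority * 0.6 + (1.0 / safe_depth) * 0.4


def crawl_order(candidates: list, authority: dict) -> list:
    """Return URLs highest priority first.

    candidates is a list of (url, domain, depth) tuples.
    """
    heap = []
    for url, domain, depth in candidates:
        priority = frontier_priority(authority.get(domain, 0.0), depth)
        # heapq is a min-heap, so push the negated priority.
        heapq.heappush(heap, (-priority, url))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

With these weights, a shallow page on a well-linked domain is crawled before a deep page on the same domain, and before a shallow page on a domain with no inbound links.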

Scheduler

The crawler runs on a cron schedule defined by CRAWL_SCHEDULE via APScheduler's AsyncIOScheduler. It starts automatically with the FastAPI app and shuts down cleanly on exit. A one-off crawl can also be triggered from the admin dashboard without restarting the server.


Contributing

Pull requests are welcome. Please open an issue first to discuss significant changes.

See CONTRIBUTING.md for setup and guidelines.


License

AGPL-3.0 — © astersworld.xyz
