Merged
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -60,7 +60,7 @@ When none are present, `DEFAULT_TENANT` (default `"default"`) is assigned. Every
| Time Series (in-memory) | `internal/tsdb/` | Ring buffer, sliding windows, pre-computed percentiles |
| Graph (in-memory, legacy) | `internal/graph/` | Simple service topology — **being replaced by GraphRAG** |
| Vector (embedded) | `internal/vectordb/` | TF-IDF index for semantic log search (pure Go, no CGO). Retained as a fallback similarity index for SQLite mode and for `SimilarErrors` ranking within a Drain template cluster. |
-| Relational (persistent) | `internal/storage/` | GORM-based, multi-DB, single source of truth. Driven by `RetentionScheduler` (hourly batched purge + daily VACUUM/ANALYZE). `logs.body` is plain TEXT (Postgres: `pg_trgm` GIN indexed for substring search); `AttributesJSON` and `AIInsight` remain `CompressedText`. |
+| Relational (persistent) | `internal/storage/` | GORM-based, multi-DB, single source of truth. Driven by `RetentionScheduler` (hourly batched purge + daily VACUUM/ANALYZE). `logs.body` is plain TEXT. **Log search**: SQLite uses FTS5 virtual table `logs_fts` (porter+unicode61 tokenizer) ordered by `bm25()`, kept in sync via AFTER INSERT/DELETE/UPDATE triggers; Postgres uses `pg_trgm` GIN on `logs.body` and `logs.service_name`. `AttributesJSON` and `AIInsight` remain `CompressedText`. |

## GraphRAG Architecture

24 changes: 24 additions & 0 deletions docs/OPERATIONS.md
@@ -115,6 +115,30 @@ SQLite is rejected at startup when `APP_ENV=production` unless you explicitly op

**Multi-tenancy.** Every row carries a `tenant_id` column. The write path reads `X-Tenant-ID` (HTTP) or `x-tenant-id` (gRPC metadata) and populates the column. The read path attaches the tenant from the request context to every repository query (`Where("tenant_id = ?", ...)`).

### Log search index

| Driver | Index | Ranking |
|---|---|---|
| SQLite | FTS5 virtual table `logs_fts` over `(body, service_name)`, kept in sync via AFTER INSERT/DELETE/UPDATE triggers on `logs` | `bm25(logs_fts)` ascending (lower = more relevant) |
| Postgres | `pg_trgm` GIN indexes on `logs.body` and `logs.service_name` | Recency (`timestamp desc`) — substring ILIKE |
| MySQL / SQL Server | None — sequential `LIKE` scan | Recency |
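
The table can be read as a per-driver dispatch. The following is a minimal sketch of that idea, not the repository's actual code: `searchPlan` and `planFor` are hypothetical names, and the driver strings `"postgres"` and the catch-all default are assumptions (only `"sqlite"` appears in the source).

```go
package main

import (
	"fmt"
	"strings"
)

// searchPlan is a hypothetical description of how one driver runs a log search.
type searchPlan struct {
	Predicate string // WHERE fragment with a single placeholder
	OrderBy   string // ORDER BY expression
	Indexed   bool   // whether an index is expected to serve the predicate
}

// planFor maps a driver name to the strategy listed in the table above.
// The driver names other than "sqlite" are assumptions for illustration.
func planFor(driver string) searchPlan {
	switch strings.ToLower(driver) {
	case "sqlite":
		// bm25() returns lower values for better matches, so ascending order ranks best-first.
		return searchPlan{Predicate: "logs_fts MATCH ?", OrderBy: "bm25(logs_fts)", Indexed: true}
	case "postgres":
		// Indexed only when the pg_trgm extension and GIN indexes are present.
		return searchPlan{Predicate: "body ILIKE ?", OrderBy: "timestamp DESC", Indexed: true}
	default:
		// MySQL / SQL Server: sequential LIKE scan, newest first.
		return searchPlan{Predicate: "body LIKE ?", OrderBy: "timestamp DESC", Indexed: false}
	}
}

func main() {
	for _, d := range []string{"sqlite", "postgres", "mysql"} {
		fmt.Printf("%-8s %+v\n", d, planFor(d))
	}
}
```

Only the SQLite row changes the ordering semantics: results come back by relevance rather than by recency, which callers that assume newest-first may need to account for.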

The FTS5 path uses `tokenize='porter unicode61 remove_diacritics 2'`, which makes matching case-insensitive, accent-insensitive, and English-stemmed (so `panic` matches `panicked`). Each whitespace-separated term of the user input is double-quoted, with embedded quotes escaped, and suffixed with `*` for prefix matching, so a partial word like `conn` still matches `connection`. If the FTS5 query returns an error, the repository falls back to LIKE so that a misbehaving index does not surface to the API as a 500.
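
The fallback behaviour described above can be sketched as follows. This is an illustration under stated assumptions rather than the repository's implementation: `searchFunc`, `searchWithFallback`, and `errNoFTS` are hypothetical names, and the two query paths are stubbed with closures so the example runs without a database.

```go
package main

import (
	"errors"
	"fmt"
)

// searchFunc is a hypothetical stand-in for one repository query path.
type searchFunc func(query string) ([]string, error)

// errNoFTS simulates the error an FTS5 query might return on a broken index.
var errNoFTS = errors.New("no such table: logs_fts")

// searchWithFallback runs the primary (FTS5) path and, only if it returns an
// error, retries with the fallback (LIKE) path. An empty result is a valid
// answer and does not trigger the fallback.
func searchWithFallback(query string, primary, fallback searchFunc) ([]string, error) {
	rows, err := primary(query)
	if err == nil {
		return rows, nil
	}
	return fallback(query)
}

func main() {
	brokenFTS := func(string) ([]string, error) { return nil, errNoFTS }
	like := func(string) ([]string, error) { return []string{"connection reset by peer"}, nil }

	rows, err := searchWithFallback("conn", brokenFTS, like)
	fmt.Println(rows, err) // [connection reset by peer] <nil>
}
```

One consequence of this design is that a failing index degrades silently to slower, recency-ordered results, so logging the primary error is worth considering in a real implementation.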

The FTS5 table is provisioned automatically by `AutoMigrateModels` on every SQLite boot; setup is idempotent. To rebuild after corruption or a manual schema change:

```sql
INSERT INTO logs_fts(logs_fts) VALUES('rebuild');
```
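
Because `rebuild` re-indexes the whole table, it can be worth checking the index first. The sketch below assumes FTS5's `integrity-check` special command and rebuilds only when that check fails; `execFunc` and `repairFTS` are hypothetical names, and the executor is a closure standing in for something like `db.Exec(stmt).Error`.

```go
package main

import (
	"errors"
	"fmt"
)

// execFunc runs one SQL statement. It is a stand-in so the sketch runs
// without a database; a real caller would wrap its DB handle.
type execFunc func(stmt string) error

const (
	ftsIntegrityCheck = `INSERT INTO logs_fts(logs_fts) VALUES('integrity-check')`
	ftsRebuild        = `INSERT INTO logs_fts(logs_fts) VALUES('rebuild')`
)

// errCorrupt simulates the failure reported for a damaged index.
var errCorrupt = errors.New("database disk image is malformed")

// repairFTS runs the integrity check and rebuilds the index only if the
// check fails. It reports whether a rebuild was performed.
func repairFTS(exec execFunc) (bool, error) {
	if err := exec(ftsIntegrityCheck); err == nil {
		return false, nil // index is consistent, nothing to do
	}
	if err := exec(ftsRebuild); err != nil {
		return false, fmt.Errorf("rebuild fts5 index: %w", err)
	}
	return true, nil
}

func main() {
	// Simulated corrupt index: the check fails, the rebuild succeeds.
	corrupt := func(stmt string) error {
		if stmt == ftsIntegrityCheck {
			return errCorrupt
		}
		return nil
	}
	rebuilt, err := repairFTS(corrupt)
	fmt.Println(rebuilt, err) // true <nil>
}
```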

The Postgres `pg_trgm` path requires the extension; if it is missing, `AutoMigrateModels` logs a warning and ILIKE queries run as sequential scans. To install it (the role needs the CREATE EXTENSION privilege):

```sql
CREATE EXTENSION pg_trgm;
```
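
For operators who pre-create the indexes by hand, the statements involved can be sketched like this. The index naming scheme (`idx_<table>_<column>_trgm`) and the helper `trgmIndexDDL` are assumptions for illustration; the names the migration actually uses are not shown in this change. The `pg_extension` catalog query and the `gin_trgm_ops` operator class are standard Postgres.

```go
package main

import "fmt"

// trgmInstalledQuery checks whether the pg_trgm extension is present.
const trgmInstalledQuery = `SELECT EXISTS (SELECT 1 FROM pg_extension WHERE extname = 'pg_trgm')`

// trgmIndexDDL returns DDL for a trigram GIN index on one column.
// The identifiers are interpolated unquoted, so they must be trusted
// constants, never user input. The index name format is hypothetical.
func trgmIndexDDL(table, column string) string {
	return fmt.Sprintf(
		"CREATE INDEX IF NOT EXISTS idx_%s_%s_trgm ON %s USING gin (%s gin_trgm_ops)",
		table, column, table, column,
	)
}

func main() {
	fmt.Println(trgmInstalledQuery)
	for _, col := range []string{"body", "service_name"} {
		fmt.Println(trgmIndexDDL("logs", col))
	}
}
```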

Phase 3b will add Postgres declarative partitioning as an opt-in adapter; at that point the GIN indexes will be created per-partition. There is no migration required to use FTS5 — existing SQLite databases are backfilled the first time the upgraded binary boots.

---

## Backup & Restore
9 changes: 9 additions & 0 deletions internal/storage/factory.go
@@ -221,6 +221,15 @@ func AutoMigrateModels(db *gorm.DB, driver string) error {
		log.Println("🔓 Dropped legacy FK constraints (no-op on fresh DBs)")
	}

	// SQLite: provision FTS5 virtual table + triggers on logs.body / logs.service_name.
	// Search routes through bm25() ranking on this driver; LIKE remains the fallback
	// if FTS5 is unavailable (older SQLite builds without FTS5 compiled in).
	if driver == "sqlite" || driver == "" {
		if err := setupSQLiteFTS5(db); err != nil {
			log.Printf("⚠️ SQLite FTS5 setup failed (%v) — log search will fall back to LIKE", err)
		}
	}

	// Postgres: enable pg_trgm and create a GIN index on logs.body for fuzzy ILIKE search.
	// Azure Database for PostgreSQL allows pg_trgm by default. If the role lacks
	// CREATE EXTENSION privilege, an operator can pre-create the extension and this
119 changes: 119 additions & 0 deletions internal/storage/fts5.go
@@ -0,0 +1,119 @@
package storage

import (
	"fmt"
	"log"
	"strings"

	"gorm.io/gorm"
)

// fts5LogsTable is the FTS5 virtual table mirroring `logs.body` and
// `logs.service_name`. It is an external-content table keyed on `logs.id` so it
// stores no extra copy of the body — instead, INSERT/DELETE/UPDATE on `logs`
// are mirrored via the triggers installed in setupSQLiteFTS5.
const fts5LogsTable = "logs_fts"

// setupSQLiteFTS5 provisions the FTS5 virtual table for log search on SQLite
// and the AFTER INSERT/DELETE/UPDATE triggers that keep it in sync with the
// `logs` base table. The implementation is idempotent: it tolerates an
// existing virtual table left over from a previous boot, repairs missing
// triggers, and runs an initial backfill via the `rebuild` command so that
// rows present in `logs` before the FTS table existed (e.g. migrating an
// older OtelContext.db) are included in the BM25 index.
//
// Tokenizer rationale: `porter unicode61 remove_diacritics 2` chosen for:
// - unicode61: case-insensitive, splits on whitespace+punctuation
// - remove_diacritics 2: strips accents (a search for "latencia" matches "latência")
// - porter: English stemming so "panic" matches "panicked"/"panicking"
//
// All three are pure-SQLite — they do not require external linkage and work
// on the modernc.org/sqlite (glebarez) build used in this project.
func setupSQLiteFTS5(db *gorm.DB) error {
	create := `CREATE VIRTUAL TABLE IF NOT EXISTS ` + fts5LogsTable + ` USING fts5(
		body,
		service_name,
		content='logs',
		content_rowid='id',
		tokenize='porter unicode61 remove_diacritics 2'
	)`
	if err := db.Exec(create).Error; err != nil {
		// FTS5 is included in the modernc.org/sqlite amalgamation by default;
		// if this fails, the build was compiled without FTS5. Surface the
		// failure so SearchLogs can fall back to LIKE rather than producing
		// a confusing "no such table" error later.
		return fmt.Errorf("create fts5 virtual table: %w", err)
	}

	triggers := []struct {
		name string
		ddl  string
	}{
		{
			name: "logs_ai",
			ddl: `CREATE TRIGGER IF NOT EXISTS logs_ai AFTER INSERT ON logs BEGIN
				INSERT INTO ` + fts5LogsTable + `(rowid, body, service_name) VALUES (new.id, new.body, new.service_name);
			END`,
		},
		{
			name: "logs_ad",
			ddl: `CREATE TRIGGER IF NOT EXISTS logs_ad AFTER DELETE ON logs BEGIN
				INSERT INTO ` + fts5LogsTable + `(` + fts5LogsTable + `, rowid, body, service_name) VALUES ('delete', old.id, old.body, old.service_name);
			END`,
		},
		{
			name: "logs_au",
			ddl: `CREATE TRIGGER IF NOT EXISTS logs_au AFTER UPDATE ON logs BEGIN
				INSERT INTO ` + fts5LogsTable + `(` + fts5LogsTable + `, rowid, body, service_name) VALUES ('delete', old.id, old.body, old.service_name);
				INSERT INTO ` + fts5LogsTable + `(rowid, body, service_name) VALUES (new.id, new.body, new.service_name);
			END`,
		},
	}
	for _, tr := range triggers {
		if err := db.Exec(tr.ddl).Error; err != nil {
			return fmt.Errorf("create trigger %s: %w", tr.name, err)
		}
	}

	// Backfill any rows already present in `logs` but not yet in the FTS index.
	// `rebuild` discards the existing index and re-tokenizes every row of the
	// content table, so it is near-instant on a fresh DB but its cost grows
	// linearly with the size of `logs` and is paid on every boot.
	if err := db.Exec(`INSERT INTO ` + fts5LogsTable + `(` + fts5LogsTable + `) VALUES ('rebuild')`).Error; err != nil {
		return fmt.Errorf("rebuild fts5 index: %w", err)
	}

	log.Println("🔎 SQLite: FTS5 BM25 index ready on logs(body, service_name)")
	return nil
}

// fts5MatchExpr translates a free-form user search string into an FTS5 MATCH
// expression that approximates the previous LIKE %query% semantics:
//
// - whitespace-separated terms are ANDed together
// - each term is double-quoted so FTS5 treats internal punctuation as
// literal token separators rather than query operators
// - each term is suffixed with `*` for prefix match, so a search for "conn"
// still hits "connection"; combined with the porter stemmer this also
// covers inflectional matches like "panic" → "panicked"
//
// Returns the empty string for empty/whitespace-only input — the caller is
// expected to skip the WHERE-clause attachment in that case.
func fts5MatchExpr(input string) string {
	fields := strings.Fields(input)
	if len(fields) == 0 {
		return ""
	}
	parts := make([]string, 0, len(fields))
	for _, f := range fields {
		escaped := strings.ReplaceAll(f, `"`, `""`)
		parts = append(parts, `"`+escaped+`"*`)
	}
	return strings.Join(parts, " ")
}

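// fts5Available reports whether the given driver should use the FTS5 path. We
// only enable FTS5 on SQLite because Postgres has its own pg_trgm GIN path
// (see factory.go) and MySQL/SQL Server are out of scope. The empty string is
// accepted because AutoMigrateModels treats an unset driver as SQLite when it
// provisions the index.
func fts5Available(driver string) bool {
	d := strings.ToLower(strings.TrimSpace(driver))
	return d == "sqlite" || d == ""
}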
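
The translation done by `fts5MatchExpr` can be exercised on its own. The sketch below copies the function from this file and prints a few inputs next to the MATCH expression they produce. The `searchSQL` constant is an assumption about how a caller such as `SearchLogs` might bind the expression; that query is not part of this change.

```go
package main

import (
	"fmt"
	"strings"
)

// fts5MatchExpr is copied from internal/storage/fts5.go so the example is
// self-contained: each term is quoted, embedded quotes are doubled, and a
// trailing * requests a prefix match.
func fts5MatchExpr(input string) string {
	fields := strings.Fields(input)
	if len(fields) == 0 {
		return ""
	}
	parts := make([]string, 0, len(fields))
	for _, f := range fields {
		escaped := strings.ReplaceAll(f, `"`, `""`)
		parts = append(parts, `"`+escaped+`"*`)
	}
	return strings.Join(parts, " ")
}

// searchSQL is a hypothetical query joining the external-content FTS table
// back to logs by rowid and ranking with bm25 (lower is more relevant).
const searchSQL = `SELECT logs.* FROM logs
JOIN logs_fts ON logs_fts.rowid = logs.id
WHERE logs_fts MATCH ?
ORDER BY bm25(logs_fts)`

func main() {
	fmt.Println(fts5MatchExpr("conn refused")) // "conn"* "refused"*
	fmt.Println(fts5MatchExpr(`say "hi"`))     // "say"* """hi"""*
	fmt.Println(fts5MatchExpr("   ") == "")    // true
	fmt.Println(searchSQL)
}
```

Passing the expression as a bound parameter rather than concatenating it into the SQL keeps user input out of the statement text, while the quoting inside `fts5MatchExpr` keeps it out of the FTS5 query grammar.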