Skip to content

krondor-corp/pg-fs

Repository files navigation

pg-fs

A filesystem built on Postgres. One table, POSIX semantics, transactional guarantees.

pg-fs gives you a hierarchical document store with the ergonomics of a unix filesystem and the durability of a relational database. It's designed for applications that need structured, path-addressable content storage without the overhead of a separate file storage service.

Use Cases

Editor Applications

Rich text editors, wikis, note-taking apps, and CMS platforms need a content backend that supports hierarchical organization, fast reads, and atomic writes. pg-fs provides this out of the box:

import { PgFs } from "@pg-fs/pg-fs";

const fs = new PgFs(process.env.DATABASE_URL);
await fs.migrate();

// Create a project structure
await fs.write("/projects/acme/README.md", "# ACME Project\n\nOverview...");
await fs.write("/projects/acme/notes/meeting-2024-01.md", "## Q1 Planning\n...");
await fs.write("/projects/acme/notes/meeting-2024-02.md", "## Q2 Review\n...");

// Read a document
const doc = await fs.cat("/projects/acme/README.md");
console.log(doc.content); // "# ACME Project\n\nOverview..."

// List folder contents
const notes = await fs.ls("/projects/acme/notes");
// [{ path: "/projects/acme/notes/meeting-2024-01.md", type: "document", ... },
//  { path: "/projects/acme/notes/meeting-2024-02.md", type: "document", ... }]

// Reorganize
await fs.mv("/projects/acme", "/archive/acme");

// The entire subtree moves atomically — no broken references
const archived = await fs.cat("/archive/acme/README.md");

Why not just use a files table? Because you'll inevitably need:

  • Hierarchical listing (folders within folders)
  • Atomic subtree moves (rename a folder without breaking children)
  • Path-based constraints (can't create a file where a folder exists)
  • Parent auto-creation (write /a/b/c without manually creating /a and /a/b)

pg-fs handles all of this at the database level with triggers, constraints, and transactions. Your application code stays clean.

Editor integration patterns:

  • Auto-save: Call write() on a debounced interval. The upsert semantics mean you don't need to track whether the document exists yet.
  • File tree UI: ls("/", { deep: true }) returns the full tree in one query. ls("/folder") returns just direct children for lazy-loading.
  • Rename/move: mv() updates the node and all descendants in a single transaction. Your UI just needs to refresh.
  • Trash: rm() with { recursive: true } removes entire subtrees. Or implement soft-delete by adding an archived_at column.

Agent Context & Knowledge Stores

AI agents need persistent, structured memory — competitive intelligence, conversation history. pg-fs provides the storage layer, and because it's just Postgres, you can extend the table with pgvector for semantic search.

import { PgFs } from "@pg-fs/pg-fs";

const fs = new PgFs(process.env.DATABASE_URL);
await fs.migrate();

// Agent writes knowledge during conversation
await fs.write("/knowledge/competitors/acme", "ACME launched a new product in Q3...");

// Agent retrieves all context for its system prompt
const allDocs = await fs.dump();
const context = allDocs
  .filter(n => n.type === "document")
  .map(n => `## ${n.path}\n${n.content}`)
  .join("\n\n");

// Feed into system prompt
const systemPrompt = `You have access to the following knowledge:\n\n${context}`;

Adding vector search with pgvector:

pg-fs uses a single fs_node table. You can extend it directly:

-- Enable pgvector
CREATE EXTENSION IF NOT EXISTS vector;

-- Add embedding column to the existing table
ALTER TABLE fs_node ADD COLUMN embedding vector(1536);

-- Index for fast similarity search
CREATE INDEX idx_fs_node_embedding ON fs_node
  USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

Then query with both path-based and semantic operations:

import { sql } from "kysely";

// Write a document with its embedding
await write(db, "/knowledge/icp", "Our ideal customer is...");
await sql`
  UPDATE fs_node SET embedding = ${embedding}::vector
  WHERE path = '/knowledge/icp'
`.execute(db);

// Semantic search across all documents
const results = await sql`
  SELECT path, content, 1 - (embedding <=> ${queryEmbedding}::vector) as similarity
  FROM fs_node
  WHERE type = 'document'
    AND embedding IS NOT NULL
  ORDER BY embedding <=> ${queryEmbedding}::vector
  LIMIT 10
`.execute(db);

// Scoped semantic search — only within a folder
const results = await sql`
  SELECT path, content, 1 - (embedding <=> ${queryEmbedding}::vector) as similarity
  FROM fs_node
  WHERE type = 'document'
    AND path LIKE '/knowledge/competitors/%'
    AND embedding IS NOT NULL
  ORDER BY embedding <=> ${queryEmbedding}::vector
  LIMIT 5
`.execute(db);

Why this works well for agents:

  1. Path = namespace. Agents can organize context hierarchically: /brand/voice, /competitors/acme, /customers/personas/enterprise. The folder structure is the taxonomy.

  2. Full-tree dump for prompts. dump() returns everything in one query — purpose-built for stuffing into a system prompt. No N+1 queries, no joins.

  3. Scoped search with prefix matching. The text_pattern_ops index makes WHERE path LIKE '/competitors/%' fast. Combine with pgvector for scoped semantic search — "find the most relevant competitor intel" without searching your entire knowledge base.

  4. Atomic knowledge updates. When an agent rewrites a section of knowledge, the transaction guarantees no reader sees a half-written state. Parent folders are auto-created, so the agent can write to any path without worrying about structure.

  5. Shared between humans and agents. Humans organize knowledge in the UI (drag-and-drop folders), agents read and write it programmatically. The POSIX API is the shared interface.

Other Applications

  • Configuration stores: Hierarchical config with path-based overrides (/defaults/theme, /users/123/theme)
  • Template libraries: Organize templates in folders, move/rename freely, list by category
  • Multi-tenant content: Add a tenant_id column and a partial unique index on (tenant_id, path) — each tenant gets their own filesystem
  • Audit trails: Add created_by / updated_by columns. The updated_at trigger fires automatically.
  • Version history: Add a fs_node_history table with a trigger that copies the old row on UPDATE

Getting Started

Prerequisites

  • Node.js >= 20
  • Postgres 15+ (for gen_random_uuid())
  • pnpm

Setup

# Clone and install
pnpm install

# Start Postgres
make setup

# Run tests
make test

# Start the web UI
make dev

Quick Start

import { PgFs } from "@pg-fs/pg-fs";

// Connect and create the table + triggers
const fs = new PgFs("postgresql://postgres:postgres@localhost:5433/pg_fs");
await fs.migrate();

// Write documents (parent folders auto-created)
await fs.write("/notes/hello", "Hello, world!");
await fs.write("/notes/todo", "Buy groceries");

// Read
const doc = await fs.cat("/notes/hello");    // { path, content, type, id, ... }
const items = await fs.ls("/notes");          // direct children
const all = await fs.ls("/", { deep: true }); // full tree
const info = await fs.stat("/notes/hello");   // { path, type, size, ... }
const everything = await fs.dump();           // all nodes, sorted

// Organize
await fs.mkdir("/archive");
await fs.mv("/notes/hello", "/archive/hello");
await fs.rm("/notes/todo");
await fs.rm("/archive", { recursive: true });

// Clean up
await fs.destroy();

You can also pass an existing Kysely instance:

import { PgFs, createDb } from "@pg-fs/pg-fs";

const db = createDb(process.env.DATABASE_URL);
const fs = new PgFs(db);

The standalone functions are also exported for use without the class:

import { createDb, migrate, write, cat, ls } from "@pg-fs/pg-fs";

const db = createDb(process.env.DATABASE_URL);
await migrate(db);
await write(db, "/notes/hello", "Hello, world!");

API

PgFs Class

The primary interface. Takes a connection string or an existing Kysely instance.

const fs = new PgFs("postgresql://...");
await fs.migrate();

await fs.write("/path", "content");  // → FsNode
await fs.cat("/path");               // → FsNode
await fs.ls("/path", { deep: true }); // → FsNode[]
await fs.mkdir("/path");              // → FsNode
await fs.rm("/path", { recursive: true });
await fs.mv("/old", "/new");
await fs.stat("/path");              // → FsStat
await fs.dump();                     // → FsNode[]
await fs.resolve("/path");           // → ResolveResult
await fs.destroy();                  // close connection pool

Standalone Functions

All methods are also exported as standalone functions that take (db, path, ...):

Function Signature Description
write (db, path, content) → FsNode Create or update a document. Auto-creates parent folders.
cat (db, path) → FsNode Read a document's content. Throws if not found or not a document.
ls (db, path, opts?) → FsNode[] List children. { deep: true } for recursive.
mkdir (db, path) → FsNode Create a folder. Idempotent (mkdir -p semantics).
rm (db, path, opts?) → void Remove a node. { recursive: true } for folders with children.
mv (db, oldPath, newPath) → void Move/rename. Bulk-rewrites descendant paths for folders.
stat (db, path) → FsStat Get metadata: type, size (content length), timestamps.
dump (db) → FsNode[] All nodes sorted by path. One query.
resolve (db, path) → ResolveResult 4-state path resolution: document, folder, virtual, nothing.

Error Handling

All functions throw FsError with a typed code:

Code Meaning
NOT_FOUND Nothing exists at the given path
ALREADY_EXISTS Destination path is occupied (for mv)
NOT_A_DOCUMENT Path is a folder where a document was expected
NOT_A_FOLDER Path is a document where a folder was expected
NOT_EMPTY Folder has children and recursive wasn't set
INVALID_PATH Path doesn't start with /, or is / where not allowed
import { FsError } from "@pg-fs/pg-fs";

try {
  await cat(db, "/missing");
} catch (e) {
  if (e instanceof FsError && e.code === "NOT_FOUND") {
    // handle gracefully
  }
}

Architecture

Single-Table Design

One table stores both documents and folders:

CREATE TABLE fs_node (
  id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  path       TEXT NOT NULL UNIQUE,
  type       TEXT NOT NULL CHECK (type IN ('document', 'folder')),
  content    TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),

  CHECK ((type = 'folder' AND content IS NULL) OR (type = 'document'))
);

Why one table?

  • Path operations work on a single table — no cross-table joins or consistency issues
  • Folder moves are a single UPDATE ... WHERE path LIKE prefix%
  • Constraints are row-level — no need to coordinate between tables
  • Queries are simple: SELECT * FROM fs_node WHERE path LIKE '/prefix/%'

Database-Level Safety

pg-fs pushes safety guarantees into Postgres itself:

CHECK constraints:

  • type must be 'document' or 'folder'
  • Folders must have content IS NULL

Triggers:

  • fs_node_updated_at — auto-sets updated_at = now() on every UPDATE
  • fs_node_path_conflict_check — enforces the filesystem invariant: a path is either a file or a directory, never both. Prevents:
    • Creating a document at a path that has children (virtual folder conflict)
    • Creating anything under an existing document's path (document-as-directory conflict)

Index:

  • idx_fs_node_path_prefix using text_pattern_ops — fast prefix matching for LIKE '/path/%' queries

Transactions

All write operations (write, mkdir, rm, mv) run inside a Postgres transaction. Multi-step operations like write (which creates parent folders then inserts the document) are atomic — if any step fails, everything rolls back. No orphaned folder sentinels, no half-moved trees.

Path Conventions

  • All paths start with /
  • No trailing slashes — /folder not /folder/
  • Dots and double-dots are resolved: /a/../b/b
  • Multiple slashes are collapsed: /a///b/a/b
  • Folders are auto-created by write and mkdir

Dev Commands

make help          # Show all targets
make install       # pnpm install
make setup         # Start Postgres container
make dev           # Start Next.js dev server
make test          # Run all tests
make build         # Build all packages
make check         # Format check + type check + tests
make wipe          # Drop and recreate database
make teardown      # Stop and remove containers

Extending

Adding Columns

pg-fs is just a Postgres table. Add whatever columns you need:

-- Soft delete
ALTER TABLE fs_node ADD COLUMN archived_at TIMESTAMPTZ;
CREATE UNIQUE INDEX fs_node_path_active ON fs_node (path) WHERE archived_at IS NULL;

-- Multi-tenant
ALTER TABLE fs_node ADD COLUMN tenant_id UUID NOT NULL;
DROP INDEX fs_node_path_key;
CREATE UNIQUE INDEX fs_node_tenant_path ON fs_node (tenant_id, path);

-- Vector embeddings (pgvector)
CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE fs_node ADD COLUMN embedding vector(1536);
CREATE INDEX idx_fs_node_embedding ON fs_node
  USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- Metadata
ALTER TABLE fs_node ADD COLUMN metadata JSONB DEFAULT '{}';

-- Authorship
ALTER TABLE fs_node ADD COLUMN created_by UUID REFERENCES users(id);
ALTER TABLE fs_node ADD COLUMN updated_by UUID REFERENCES users(id);

Custom Queries

The library exports the Kysely Database type, so you can write custom queries against the same table:

import { createDb } from "@pg-fs/pg-fs";
import type { Database } from "@pg-fs/pg-fs";
import { sql } from "kysely";

const db = createDb(process.env.DATABASE_URL);

// Full-text search
const results = await db
  .selectFrom("fs_node")
  .selectAll()
  .where("type", "=", "document")
  .where(sql`content ILIKE ${"%" + query + "%"}`)
  .orderBy("updated_at", "desc")
  .limit(20)
  .execute();

// Count documents per folder
const counts = await sql`
  SELECT
    regexp_replace(path, '/[^/]+$', '') as folder,
    count(*) as doc_count
  FROM fs_node
  WHERE type = 'document'
  GROUP BY folder
  ORDER BY doc_count DESC
`.execute(db);

Tech Stack

  • Kysely — Type-safe SQL query builder. No ORM, no codegen, no magic.
  • pg — Postgres driver.

License

MIT

About

basic posix implementation for postgres

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors