Storage: retire legacy SQLite stores and add streaming volume access #16

Description

@pskeshu

Status update (2026-06-15) — storage unification largely complete

The file-based migration described across #14/#16 has shipped:

  • FileStore (gently/core/file_store.py) replaced DataStore, TiledStore,
    DatabrokerStore, and ImageManager (all deleted).
  • VizServer is now VisualizationServer (gently/ui/web/server.py) with WS streaming.

Remaining, re-scoped work

  1. Retire legacy SQLite stores: core/store.py (GentlyStore, 1127 lines) and
    harness/memory/store.py (ContextStore, 423 lines) survive only as backward-compat
    re-exports in gently/__init__.py and in except ImportError fallbacks. Confirm nothing
    instantiates them, then delete both files (~1,550 lines).
  2. Streaming / memmap volume access: FileStore.get_volume currently does a
    whole-volume tifffile.imread. Add slice-on-demand / memory-mapped reads for large
    volumes (200+ × 2048 × 2048), wired into the VisualizationServer (formerly VizServer) slice endpoints.
  3. Doc reconciliation: CLAUDE.md states "No SQLite databases," which remains false
    while the legacy stores ship. Fix once (1) lands (overlaps #9, "[Documentation] Document conceptual framework for sample tracking, hardware abstraction, and generalizable metrics").
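
For item (1), the "confirm nothing instantiates them" step could be partly automated. The following is a minimal sketch, not part of the repository: it walks Python sources with the standard-library ast module and reports calls whose callee is named GentlyStore or ContextStore. The helper names find_legacy_calls and scan_tree are hypothetical.

```python
import ast
from pathlib import Path

# Class names taken from the issue text above.
LEGACY_NAMES = {"GentlyStore", "ContextStore"}


def find_legacy_calls(source, filename="<string>"):
    """Return sorted (filename, line, name) tuples for calls that construct a legacy store."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handles both GentlyStore(...) and some.module.GentlyStore(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name in LEGACY_NAMES:
            hits.append((filename, node.lineno, name))
    return sorted(hits)


def scan_tree(root):
    """Run find_legacy_calls over every .py file under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        hits.extend(find_legacy_calls(path.read_text(encoding="utf-8"), str(path)))
    return hits
```

An empty result from scan_tree("gently") would be supporting evidence rather than proof: the sketch does not follow import aliases (from x import GentlyStore as S) or dynamic lookups such as getattr, so a text search for the names is still worth doing before deleting.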

Original issue body (pre-refactor, kept for history)

Background

Spun off from #14 (device layer refactoring). The data services layer needs engineering attention independent of the hardware abstraction work.

Current State

  • DataStore (gently/core/data_store.py): UID-based persistence with multiple backend options (DatabrokerStore, TiledStore)
  • VizServer (gently/visualization/server.py): Serves volumes via HTTP, maintains its own caching
  • ImageManager (gently/agent/image_manager.py): Agent-side data access, bridges DataStore and agent tools

Areas Needing Work

1. DataStore Interface Cleanup

  • Current interface grew organically; some methods are diSPIM-specific
  • Need clear separation between:
    • Core operations (store, retrieve, delete, query)
    • Backend-specific implementations
    • Lineage/provenance tracking
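
One way to express the separation above is a small structural interface for the core operations, with backends implementing it independently. This is only a sketch: the names VolumeStore and InMemoryStore and all method signatures are assumptions for illustration, not the existing DataStore API.

```python
from typing import Any, Iterator, Protocol, runtime_checkable


@runtime_checkable
class VolumeStore(Protocol):
    """Hypothetical core interface: the four operations listed above, nothing diSPIM-specific."""

    def store(self, uid: str, data: Any, metadata: dict) -> None: ...
    def retrieve(self, uid: str) -> Any: ...
    def delete(self, uid: str) -> None: ...
    def query(self, **filters: Any) -> Iterator[str]: ...


class InMemoryStore:
    """Toy backend that satisfies VolumeStore; a file or remote backend would do the same."""

    def __init__(self):
        self._items = {}

    def store(self, uid, data, metadata):
        self._items[uid] = (data, dict(metadata))

    def retrieve(self, uid):
        return self._items[uid][0]

    def delete(self, uid):
        del self._items[uid]

    def query(self, **filters):
        # Yield UIDs whose metadata matches every filter exactly.
        for uid, (_, meta) in self._items.items():
            if all(meta.get(key) == value for key, value in filters.items()):
                yield uid
```

Lineage/provenance tracking would live in a separate interface under this design, so a backend can be swapped without touching how parent/child relationships are recorded.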

2. Unified Storage Backend

Current state has multiple storage patterns:

  • TIFF files (raw volumes)
  • Zarr (chunked array storage)
  • Databroker (Bluesky event model)
  • In-memory caches

Questions to resolve:

  • Should we standardize on one format for volumes?
  • How do we handle format conversion transparently?
  • What's the right chunking strategy for large volumes?

3. Streaming Access Patterns

For large volumes (200+ slices × 2048 × 2048):

  • Current: Load entire volume into memory
  • Target: Stream slices on demand, memory-map when possible
  • Affects: VizServer slice endpoints, agent analysis tools
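
To make the target concrete: assuming 16-bit voxels (the issue does not state a dtype), a 200 × 2048 × 2048 volume is 1,677,721,600 bytes (about 1.56 GiB), while a single slice is 8 MiB. The sketch below shows slice-on-demand reads through a memory map. It uses a raw, uncompressed little-endian file and the standard-library mmap module so that it runs without dependencies; RawVolumeReader and decode_slice are hypothetical names, and real TIFF data would need a TIFF-aware reader instead (tifffile provides a memmap function for uncompressed, contiguous files).

```python
import mmap
import struct
from pathlib import Path

BYTES_PER_VOXEL = 2  # assumption: uint16 voxels (the issue does not state a dtype)


class RawVolumeReader:
    """Serve single z-slices from a raw, uncompressed (z, y, x) file via mmap."""

    def __init__(self, path, shape):
        nz, ny, nx = shape
        expected = nz * ny * nx * BYTES_PER_VOXEL
        actual = Path(path).stat().st_size
        if actual != expected:
            raise ValueError(f"expected {expected} bytes for shape {shape}, found {actual}")
        self.shape = (nz, ny, nx)
        self._slice_bytes = ny * nx * BYTES_PER_VOXEL
        self._file = open(path, "rb")
        # Length 0 maps the whole file; pages are loaded lazily by the OS.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def read_slice(self, z):
        """Return the bytes of slice z; only this slice is copied out of the map."""
        if not 0 <= z < self.shape[0]:
            raise IndexError(f"slice {z} out of range for {self.shape[0]} slices")
        start = z * self._slice_bytes
        return self._map[start:start + self._slice_bytes]

    def close(self):
        self._map.close()
        self._file.close()


def decode_slice(raw, shape_yx):
    """Unpack little-endian uint16 bytes into a list of rows."""
    ny, nx = shape_yx
    flat = struct.unpack(f"<{ny * nx}H", raw)
    return [list(flat[row * nx:(row + 1) * nx]) for row in range(ny)]
```

A slice endpoint built this way would call read_slice(z) per request and never hold more than one slice in process memory, at the cost of requiring an uncompressed on-disk layout.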

4. Garbage Collection / Retention Policies

  • When should old data be cleaned up?
  • Per-session retention vs. global policies
  • User-configurable cleanup (max age, max size, keep N per session)
  • Crash recovery: reconcile index with actual files on disk
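
The user-configurable cleanup options above (max age, max size, keep N per session) could be combined along these lines. This is a sketch under assumptions: VolumeRecord, RetentionPolicy, and select_for_deletion are hypothetical names, and the precedence chosen here (the newest N per session are never deleted, then the age rule, then oldest-first eviction down to the size budget) is one possible answer to the open questions, not a decided policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VolumeRecord:
    uid: str
    session: str
    created: float  # seconds since epoch
    size_bytes: int


@dataclass(frozen=True)
class RetentionPolicy:
    max_age_s: Optional[float] = None      # None disables the age rule
    max_total_bytes: Optional[int] = None  # None disables the size rule
    keep_per_session: int = 0              # newest N per session are always kept


def select_for_deletion(records, policy, now):
    """Return the sorted UIDs that the policy would delete; nothing is removed here."""
    newest_first = sorted(records, key=lambda r: r.created, reverse=True)

    # Protect the newest N records of each session.
    protected, seen = set(), {}
    for rec in newest_first:
        seen[rec.session] = seen.get(rec.session, 0) + 1
        if seen[rec.session] <= policy.keep_per_session:
            protected.add(rec.uid)

    doomed = set()
    if policy.max_age_s is not None:
        for rec in records:
            if rec.uid not in protected and now - rec.created > policy.max_age_s:
                doomed.add(rec.uid)

    if policy.max_total_bytes is not None:
        total = sum(r.size_bytes for r in records if r.uid not in doomed)
        for rec in reversed(newest_first):  # oldest first
            if total <= policy.max_total_bytes:
                break
            if rec.uid in protected or rec.uid in doomed:
                continue
            doomed.add(rec.uid)
            total -= rec.size_bytes

    return sorted(doomed)
```

Returning a list instead of deleting directly keeps the policy testable and lets the caller log or confirm the deletions. Note that with this precedence the size budget can be exceeded when protected records alone are larger than max_total_bytes.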

Relationship to #14

The SharedMemoryPool from #14 will become a key component here:

  • Pool handles hot data (recently acquired volumes)
  • DataStore indexes all data (hot and cold)
  • VizServer accesses through unified interface

This issue focuses on the DataStore/VizServer side of that integration.
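
The hot/cold split described above can be pictured as a two-tier lookup. The sketch below is illustrative only: TieredVolumeAccess is a hypothetical name, the hot tier is a plain LRU dictionary standing in for the SharedMemoryPool (whose actual API is defined in #14, not here), and load_cold stands in for whatever the DataStore does to fetch a volume from disk.

```python
from collections import OrderedDict


class TieredVolumeAccess:
    """Single entry point: check the hot tier first, fall back to the cold loader."""

    def __init__(self, load_cold, hot_capacity=4):
        self._load_cold = load_cold      # callable: uid -> volume
        self._capacity = hot_capacity
        self._hot = OrderedDict()        # uid -> volume, least recently used first

    def put_hot(self, uid, volume):
        """Register a volume in the hot tier, evicting the least recently used if full."""
        self._hot[uid] = volume
        self._hot.move_to_end(uid)
        while len(self._hot) > self._capacity:
            self._hot.popitem(last=False)

    def get(self, uid):
        """Return the volume for uid, promoting cold reads into the hot tier."""
        if uid in self._hot:
            self._hot.move_to_end(uid)
            return self._hot[uid]
        volume = self._load_cold(uid)
        self.put_hot(uid, volume)
        return volume
```

With this shape, the acquisition path would call put_hot for freshly acquired volumes and the server would call only get, without knowing which tier answered.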

Proposed Tasks

  • Audit current DataStore interface, document what's generic vs. hardware-specific
  • Design unified volume access API (works whether data is in memory, on disk, or remote)
  • Implement streaming/memory-mapped access for large volumes
  • Add configurable retention policies
  • Update VizServer to use new data access patterns
  • Add crash recovery (index reconciliation on startup)
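
For the crash-recovery task, index reconciliation on startup amounts to a set comparison between what the index claims and what is on disk. A minimal sketch follows, assuming the index can be viewed as a mapping from UID to a path relative to the data directory; the function name reconcile and the "*.tif" default pattern are assumptions.

```python
from pathlib import Path


def reconcile(index, data_dir, pattern="*.tif"):
    """Compare an index (uid -> relative path) with the files under data_dir.

    Returns (missing_uids, orphan_paths):
      missing_uids  - indexed UIDs whose file no longer exists
      orphan_paths  - files on disk that no index entry points to
    """
    root = Path(data_dir)
    on_disk = {
        path.relative_to(root).as_posix()
        for path in root.rglob(pattern)
        if path.is_file()
    }
    indexed = set(index.values())
    missing = sorted(uid for uid, rel in index.items() if rel not in on_disk)
    orphans = sorted(on_disk - indexed)
    return missing, orphans
```

What to do with each list is a policy decision: missing entries could be dropped from the index, while orphans could be re-indexed or quarantined, since a crash mid-write may have left them incomplete.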

Files Involved

File                             Role
gently/core/data_store.py        Primary data persistence
gently/visualization/server.py   Volume serving
gently/agent/image_manager.py    Agent data access
gently/core/memory_pool.py       SharedMemoryPool (from #14)

cc @subindevs @pskeshu
