Skip to content

Harden FileStore path and YAML loading#25

Open
ceej640 wants to merge 4 commits into
gently-project:developmentfrom
ceej640:ceej/fix-storage-safety
Open

Harden FileStore path and YAML loading#25
ceej640 wants to merge 4 commits into
gently-project:developmentfrom
ceej640:ceej/fix-storage-safety

Conversation

@ceej640

@ceej640 ceej640 commented May 31, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • validate embryo IDs before using them as filesystem path components
  • route dose-import path lookup through FileStore's validated embryo path helper
  • remove the yaml.unsafe_load fallback for legacy Python/numpy constructor tags and fail closed with migration guidance
  • add regression tests for path traversal rejection and unsafe YAML rejection

Related issues

Stacking

This branch is stacked on #23 so the full suite can collect. Once #23 merges into 0.22-dev, this PR should reduce to the FileStore safety commit.

Verification

  • uv run pytest tests/test_file_store_safety.py tests/test_gently_store.py -q
  • uv run pytest -q
    • 560 passed, 4 skipped

@pskeshu

pskeshu commented Jun 1, 2026 via email

Copy link
Copy Markdown
Collaborator

@ceej640

ceej640 commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator Author

Agreed. This PR is only a FileStore safety patch; it does not answer the larger datastore architecture question.

I would treat a possible Gently4 datastore API as the result of an audit, not the starting assumption. The first pass should probably be:

  • inventory every artifact Gently produces or consumes: raw images/volumes, calibration, embryo/sample positions, temperature/setpoints, commands, plans, transcripts, dose/light exposure records, derived meshes/analysis, and operator annotations
  • classify each artifact as durable source data, derived/recomputable data, runtime-only state, or UI/cache state
  • verify which durable data are currently persisted, schema-versioned, and recoverable from disk
  • identify data Gently uses but does not store, or stores but cannot make accessible to a biologist
  • define a biologist-facing browser around sessions, samples/embryos, timepoints, modalities, previews, provenance, and export paths
  • then decide whether Gently3 can evolve by migration or whether a Gently4 store API is justified

The safety work here still matters as a prerequisite: a data browser/API cannot be trustworthy if IDs can escape the store root or if legacy YAML can load unsafe constructors.

@ceej640

ceej640 commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator Author

Follow-up implemented from this thread in commit 9e93225.

I added docs/datastore-audit.md, which turns the GentlyStore/Gently3/Gently4 question into a concrete audit checklist: durable vs derived data, persistence paths, browse/export needs, missing/stored-but-unused data, and criteria for evolving Gently3 versus defining a Gently4 API.

This keeps this PR focused on FileStore safety while documenting the larger datastore decision path.

Verification:

  • git diff --check

@ceej640

ceej640 commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator Author

Follow-up implemented from the datastore-architecture thread in commit aa042ab.

What changed:

  • Added python -m gently.core.datastore_audit ROOT [--json] [--output PATH] to inventory a Gently datastore on disk.
  • The audit counts sessions and major artifact classes, including metadata, logs, snapshots, volumes, volume sidecars, samples, projections, perception traces, debug exports, profiler spans, campaign plans, incoming files, and log files.
  • It flags practical gaps such as missing session metadata, volume TIFFs without .meta.yaml, snapshot TIFFs without sidecars, sample directories without embryo.yaml, and a missing sessions tree.
  • Added docs explaining how to use the command as the first pass before deciding whether Gently3 can evolve or whether a Gently4 API is justified.

Verification:

  • non-writing compile check for gently/core/datastore_audit.py
  • pytest tests/test_datastore_audit.py -q -p no:cacheprovider
  • git diff --check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants