Omnisky v5 impl #3

Open
Keeeeeeeks wants to merge 10 commits into main from omnisky-v5-impl


Summary

  • Adds the OmniSky v5 collection/crossmatching spec and implementation scaffold. v5 builds on v4, with the intent of making data generation more reproducible, resumable, and cluster-agnostic.

Phases

  • Phase 0: feasibility probes.
  • Phase 1: galaxy data collection.
  • Phase 2: stars/AGN scale-out scaffolding.

This PR is intended to introduce typed source configuration, deterministic object IDs, shard planning, source-level matching, release finalization, validation gates, SLURM wrappers, and a comprehensive data-collection runbook.
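
To illustrate what "deterministic object IDs" can mean in practice, here is a minimal sketch, assuming an ID is derived by hashing a source name together with that source's native identifier. The function name `make_object_id`, the separator byte, and the 63-bit width are all hypothetical and are not taken from the v5 code.

```python
import hashlib


def make_object_id(source: str, native_id: str) -> int:
    """Derive a stable integer ID from a source name and its native ID.

    Hypothetical helper: the real v5 scheme may use different inputs.
    """
    # A unit-separator byte keeps ("ab", "c") distinct from ("a", "bc").
    key = f"{source}\x1f{native_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep 63 bits so the value fits in a signed 64-bit integer column.
    return int.from_bytes(digest[:8], "big") >> 1


# The same inputs always give the same ID, across runs and machines.
same = make_object_id("desi", "12345") == make_object_id("desi", "12345")
```

Because the ID depends only on its inputs, a resumed or re-sharded run assigns the same ID to the same object without any shared counter.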

Key changes

  • Adds v5 implementation plans & specs for data collection and crossmatching across multiple datasets.
  • Hardens Phase 0 probes for:
    • source reachability,
    • one-pixel LSDB crossmatch,
    • Smith42 concordance validation pinned to an immutable HF revision.
  • Implements core v5 pipeline modules for:
    • object ID assignment,
    • HEALPix helpers,
    • source configuration,
    • atomic writes and DONE markers,
    • source shard planning,
    • matching/adjudication,
    • release validation.
  • Adds source adapters/scaffolding for:
    • MMU LSDB/HATS sources,
    • Legacy HDF5 fallback,
    • ZTF S3/HATS,
    • local FITS-style sources.
  • Adds local TEST_MODE E2E orchestration for synthetic galaxy, star, and AGN data.
  • Adds local LSDB dry run.
  • Makes SLURM wrappers cluster-agnostic.
  • Adds v5/RUNBOOK.md with step-by-step data-collection operations from environment setup through upload dry run.
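
The "atomic writes and DONE markers" item above can be sketched as follows. This is a minimal illustration of the general pattern (write to a temporary file, then rename, then drop a marker), not the v5 module itself. The names `atomic_write_bytes`, `write_shard`, `is_done`, and the `.DONE` suffix are assumptions.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write data to a temp file in the same directory, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def done_marker(shard_path: Path) -> Path:
    """Hypothetical convention: '<shard name>.DONE' next to the shard."""
    return shard_path.with_name(shard_path.name + ".DONE")


def write_shard(shard_path: Path, data: bytes) -> None:
    """Write the shard first, and the marker only after the shard is in place."""
    atomic_write_bytes(shard_path, data)
    atomic_write_bytes(done_marker(shard_path), b"")


def is_done(shard_path: Path) -> bool:
    """A resumed run can skip any shard whose marker exists."""
    return done_marker(shard_path).exists()
```

Writing the marker last means an interrupted job leaves either no marker or a complete shard, which is what makes a resume step safe to rerun.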

Validation

  • Local v5 test suite passes.
  • LSP diagnostics pass for touched Python files.
  • SLURM wrapper syntax validates with bash -n.
  • Local LSDB cone dry run succeeds against DESI/HSC overlap:
    • DESI fetched 414 rows,
    • HSC fetched 1,138 rows,
    • crossmatch produced 16 matched rows.
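
For readers unfamiliar with the crossmatch step, the sketch below shows the underlying idea on plain `(ra, dec)` tuples in degrees: a brute-force nearest-neighbour match within an angular radius. It is not LSDB's API and not the code used for the dry run. The function names and the 1-arcsecond default radius are assumptions chosen for illustration.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def crossmatch(left, right, radius_arcsec=1.0):
    """For each left row, keep the nearest right row within radius_arcsec.

    Returns a list of (left_index, right_index, separation_arcsec).
    Brute force, O(len(left) * len(right)); fine for a small cone only.
    """
    matches = []
    for i, (ra_l, dec_l) in enumerate(left):
        best = None
        for j, (ra_r, dec_r) in enumerate(right):
            sep = angular_sep_arcsec(ra_l, dec_l, ra_r, dec_r)
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (j, sep)
        if best is not None:
            matches.append((i, best[0], best[1]))
    return matches


# Toy data: only the first left row has a right-hand neighbour within 1 arcsec.
left = [(150.0, 2.0), (150.1, 2.1)]
right = [(150.0, 2.0 + 0.5 / 3600), (151.0, 3.0)]
matches = crossmatch(left, right)
```

A matched-row count much smaller than either input, as in the dry run above, is what this kind of radius-limited match produces when only part of each catalogue falls in the overlap.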

Notes

This PR does not yet claim full, live, cluster-scale validation; the pipeline still needs to be run in the target compute environment.
