[REFACTOR] Refactor dataset conversion pipeline to support runtime-selectable data schemas

### Motivation

During integration testing of the SHE inference branch, it became clear that `wf-psf` currently assumes a fixed dataset structure across training, evaluation, and inference workflows.

This assumption is too restrictive for downstream integration scenarios where different processing stages may provide different subsets of fields at runtime.

For example:

* **Inference workflows** may only provide:

  * `positions`
  * `seds`

* **Training / evaluation workflows** may additionally provide:

  * `sources`
  * `masks`

In upcoming WaveDiff releases, the inference module will also be reused for evaluation workflows, which requires the pipeline to support multiple dataset “contracts” depending on runtime context.

### Proposed Changes

Introduce a schema-driven dataset conversion and validation system allowing dataset requirements to be selected dynamically at runtime.

Key additions include:

* Dataset schema registry for runtime validation
* Support for multiple processing modes:

  * `TRAIN`
  * `EVALUATION`
  * `INFERENCE`
* Separation of:

  * dataset schema definitions
  * field conversion handlers
  * conversion context objects
* Runtime selection of schema mode through configuration
* Relaxation of hard-coded assumptions about required dataset fields
* Structured logging for conversion and validation operations
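
As a rough illustration of the registry idea, the sketch below maps each processing mode to the fields it requires and validates a dataset against the selected mode. All names here (`SchemaMode`, `DatasetSchema`, `SCHEMA_REGISTRY`, `validate_dataset`) are hypothetical and not taken from the `wf-psf` codebase. The field sets follow the examples in this issue, and treating `TRAIN` the same as `EVALUATION` is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class SchemaMode(Enum):
    """Processing modes named in this issue."""
    TRAIN = "TRAIN"
    EVALUATION = "EVALUATION"
    INFERENCE = "INFERENCE"


@dataclass(frozen=True)
class DatasetSchema:
    """Hypothetical schema: the fields a mode requires and tolerates."""
    required: frozenset
    optional: frozenset = frozenset()


# Assumption: inference needs only positions and seds; training and
# evaluation also need sources and may carry masks.
_FULL = DatasetSchema(
    required=frozenset({"positions", "seds", "sources"}),
    optional=frozenset({"masks"}),
)
SCHEMA_REGISTRY = {
    SchemaMode.INFERENCE: DatasetSchema(required=frozenset({"positions", "seds"})),
    SchemaMode.EVALUATION: _FULL,
    SchemaMode.TRAIN: _FULL,
}


def validate_dataset(dataset, mode):
    """Raise ValueError if required fields are missing for `mode`.

    Returns a sorted list of fields the schema does not know about,
    so the caller can log them instead of failing.
    """
    schema = SCHEMA_REGISTRY[mode]
    present = set(dataset)
    missing = schema.required - present
    if missing:
        raise ValueError(
            f"Dataset is missing fields required in {mode.name} mode: "
            f"{sorted(missing)}"
        )
    return sorted(present - schema.required - schema.optional)
```

With this layout, an inference-only dataset passes under `SchemaMode.INFERENCE` but is rejected under `SchemaMode.EVALUATION`, which is the mode-aware behaviour the list above describes.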

### New Inference Configuration Parameter

```yaml
schema_mode: INFERENCE
```

Supported modes:

* `INFERENCE`

  * Requires only `positions` and `seds`

* `EVALUATION`

  * Expects additional fields such as `sources` and, optionally, `masks`
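
One possible way to turn the `schema_mode` value into a typed mode at load time is sketched below. It assumes the YAML file has already been parsed into a plain dictionary. `SchemaMode` and `resolve_schema_mode` are hypothetical names, and falling back to `INFERENCE` when the key is absent is an assumption rather than documented behaviour.

```python
from enum import Enum


class SchemaMode(Enum):
    """Processing modes named in this issue."""
    TRAIN = "TRAIN"
    EVALUATION = "EVALUATION"
    INFERENCE = "INFERENCE"


def resolve_schema_mode(config, default=SchemaMode.INFERENCE):
    """Map the `schema_mode` entry of a parsed config dict to a SchemaMode.

    Assumption: a missing key falls back to `default`; an unrecognised
    value is an error rather than being silently ignored.
    """
    raw = config.get("schema_mode")
    if raw is None:
        return default
    try:
        # Look up by member name, tolerating case and stray whitespace.
        return SchemaMode[str(raw).strip().upper()]
    except KeyError:
        valid = ", ".join(mode.name for mode in SchemaMode)
        raise ValueError(
            f"Unknown schema_mode {raw!r}; expected one of: {valid}"
        ) from None
```

Failing loudly on an unknown value keeps a typo in the configuration from silently selecting the wrong dataset contract.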

### Benefits

* Enables integration with external pipelines (e.g. SHE)
* Decouples dataset representation from workflow assumptions
* Improves flexibility for future instruments and runtime configurations
* Simplifies reuse of inference components for evaluation workflows
* Makes dataset validation explicit and mode-aware

### Validation

The refactor was validated by re-running the following workflows with multiple dataset schemas:

* training
* evaluation
* inference
* mock SHE pipeline integration

Results remained reproducible across workflows.

