[Enhancement]: Add state-first HPC upload ingestion flow with DB-backed dedupe parity #207

### Is your feature request related to a problem?

For non-NERSC sites that must upload archives over HTTPS, SimBoard cannot currently follow the same state-first dedupe flow used by the NERSC path ingestor.

Current gaps:
- `/api/v1/ingestions/state` only reconstructs state from `HPC_PATH` ingestions.
- `/api/v1/ingestions/from-upload` does not accept caller-provided `processed_execution_ids` or stable case identity metadata.
- Upload ingestions are currently recorded as `BROWSER_UPLOAD`, not a dedicated HPC automation source.
- There is no upload-side automation runner that fetches DB-backed state before deciding which cases to submit.

Result: upload-based HPC automation cannot skip unchanged cases before ingestion, and cannot contribute back to the same DB-backed dedupe state model.

### Describe the solution you'd like

Implement a dedicated HPC upload ingestion flow that mirrors NERSC path-ingestion behavior, using one case per upload request.

Proposed scope:
- Add a dedicated HPC upload ingestion contract or endpoint.
- Require one case archive per request plus stable case identity metadata.
- Accept `processed_execution_ids` from the upload-side automation job.
- Persist those ingestions as `HPC_UPLOAD`.
- Extend `/api/v1/ingestions/state` so it includes both `HPC_PATH` and `HPC_UPLOAD` rows.
- Add an upload-side automation runner that:
  - fetches `/api/v1/ingestions/state`
  - scans local archive contents
  - skips unchanged cases
  - uploads only changed cases
- Document the distinction between browser/manual upload and automated HPC upload.
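
The runner steps above could be sketched roughly as follows. This is a minimal illustration, not SimBoard's implementation: the shape of the state response (a mapping from case path to already-processed execution IDs) and the names `select_cases_to_upload`, `known_state`, and `local_scan` are assumptions, and the HTTP fetch and the upload call are left out.

```python
from typing import Dict, List, Set


def select_cases_to_upload(
    known_state: Dict[str, Set[str]],
    local_scan: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Return, per case path, the execution IDs not yet recorded in state.

    known_state: assumed shape of the /api/v1/ingestions/state response,
        case path -> execution IDs already processed.
    local_scan: hypothetical result of scanning local archives,
        case path -> execution IDs found on disk.
    """
    to_upload: Dict[str, List[str]] = {}
    for case_path, local_ids in local_scan.items():
        # A case is unchanged when every local execution ID is already known.
        new_ids = local_ids - known_state.get(case_path, set())
        if new_ids:
            to_upload[case_path] = sorted(new_ids)
    return to_upload


# Illustrative data only.
known = {"/cases/a": {"100", "101"}, "/cases/b": {"200"}}
scanned = {
    "/cases/a": {"100", "101"},   # unchanged, skipped
    "/cases/b": {"200", "201"},   # one new execution
    "/cases/c": {"300"},          # case not seen before
}
plan = select_cases_to_upload(known, scanned)
```

Each entry in `plan` would then map to exactly one upload request, which keeps the runner aligned with the one-case-per-request proposal.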

Acceptance criteria:
- A service-account-driven upload workflow can fetch known execution state before upload.
- HPC upload requests can persist per-case `processed_execution_ids` needed for future dedupe decisions.
- `/api/v1/ingestions/state` reconstructs state from both `HPC_PATH` and `HPC_UPLOAD` records.
- Unchanged cases are skipped by the upload automation runner.
- Browser/manual upload behavior remains supported and clearly separated from HPC automation behavior.
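
To make the state-reconstruction criterion concrete, here is a hedged sketch of how state might be rebuilt from both source types. `IngestionRow` is a hypothetical stand-in for the real audit model, and treating `source_reference` as the case path is an assumption; only the `HPC_PATH`, `HPC_UPLOAD`, and `BROWSER_UPLOAD` labels and the `processed_execution_ids` field come from this issue.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set

# Source types that should contribute to dedupe state under this proposal.
STATE_SOURCES = frozenset({"HPC_PATH", "HPC_UPLOAD"})


@dataclass
class IngestionRow:
    """Hypothetical stand-in for an ingestion audit row."""
    source_type: str
    source_reference: str  # assumed here to identify the case
    processed_execution_ids: List[str] = field(default_factory=list)


def reconstruct_state(rows: Iterable[IngestionRow]) -> Dict[str, Set[str]]:
    """Merge processed execution IDs per case, ignoring non-HPC sources."""
    state: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        if row.source_type in STATE_SOURCES:
            state[row.source_reference].update(row.processed_execution_ids)
    return dict(state)


# Illustrative rows only.
rows = [
    IngestionRow("HPC_PATH", "/cases/a", ["100"]),
    IngestionRow("HPC_UPLOAD", "/cases/a", ["101"]),
    IngestionRow("HPC_UPLOAD", "/cases/b", ["200"]),
    IngestionRow("BROWSER_UPLOAD", "manual.zip", ["999"]),
]
state = reconstruct_state(rows)
```

In this sketch, browser uploads never enter the state, which matches the goal of keeping manual uploads separate from HPC automation.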

### Describe alternatives you've considered

1. Reuse the current browser upload flow.
   This is insufficient because the current upload contract does not carry state metadata and persists uploads as `BROWSER_UPLOAD`.

2. Support multi-case HPC upload requests.
   This would require a more complex persistence model. Current ingestion audit rows store only one `source_reference` and one flat `processed_execution_ids` list, which is not enough to reconstruct per-case state from a multi-case upload.

3. Keep the upload flow as-is and rely on duplicate detection during ingest.
   This preserves correctness at the `execution_id` level, but loses the main benefit of state-first dedupe: skipping unchanged cases before upload and ingestion.

### Additional context

Recommendation: implement this in a separate PR from docs-only clarification work.

Reasons for preferring one case per HPC upload request:
- Matches the current state model, which is keyed by case path.
- Minimizes schema and audit-model complexity.
- Keeps browser/manual multi-case uploads separate from HPC automation semantics.
- Offers the lowest-risk path to parity with the existing NERSC runner.
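
As a rough illustration of what a one-case-per-request contract might carry, consider the sketch below. The class `HpcUploadRequest` and its fields `case_path` and `archive_name` are hypothetical names chosen for this example, not an existing SimBoard API; only `processed_execution_ids` and the `HPC_UPLOAD` label come from this issue.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class HpcUploadRequest:
    """Hypothetical metadata for a single-case HPC upload."""
    case_path: str       # stable case identity (assumed to be a path)
    archive_name: str    # the single archive sent with this request
    processed_execution_ids: List[str] = field(default_factory=list)
    source_type: str = "HPC_UPLOAD"

    def __post_init__(self) -> None:
        # Without a case identity, state cannot be keyed by case later.
        if not self.case_path.strip():
            raise ValueError("case_path is required")
        ids = self.processed_execution_ids
        if len(set(ids)) != len(ids):
            raise ValueError("processed_execution_ids must be unique")


# Illustrative values only.
request = HpcUploadRequest(
    case_path="/cases/b",
    archive_name="case_b.tar.gz",
    processed_execution_ids=["200", "201"],
)
```

Because each request names exactly one case, the resulting audit row can keep a single `source_reference` and a flat `processed_execution_ids` list without losing per-case information.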
