Is your feature request related to a problem?
For non-NERSC sites that must upload archives over HTTPS, SimBoard cannot currently follow the same state-first dedupe flow used by the NERSC path ingestor.
Current gaps:
- /api/v1/ingestions/state only reconstructs state from HPC_PATH ingestions.
- /api/v1/ingestions/from-upload does not accept caller-provided processed_execution_ids or stable case identity metadata.
- Upload ingestions are currently recorded as BROWSER_UPLOAD, not a dedicated HPC automation source.
- There is no upload-side automation runner that fetches DB-backed state before deciding which cases to submit.
Result: upload-based HPC automation cannot skip unchanged cases before ingestion, and cannot contribute back to the same DB-backed dedupe state model.
Describe the solution you'd like
Implement a dedicated HPC upload ingestion flow that mirrors NERSC path-ingestion behavior, using one case per upload request.
Proposed scope:
- Add a dedicated HPC upload ingestion contract or endpoint.
- Require one case archive per request plus stable case identity metadata.
- Accept processed_execution_ids from the upload-side automation job.
- Persist those ingestions as HPC_UPLOAD.
- Extend /api/v1/ingestions/state so it includes both HPC_PATH and HPC_UPLOAD rows.
- Add an upload-side automation runner that:
  - fetches /api/v1/ingestions/state
  - scans local archive contents
  - skips unchanged cases
  - uploads only changed cases
- Document the distinction between browser/manual upload and automated HPC upload.
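
The runner's skip/upload decision could be sketched roughly as below. This is only an illustration under assumptions: it treats the state fetched from /api/v1/ingestions/state as a mapping from case path to already-processed execution IDs, and the names `CasePlan`, `plan_uploads`, `known_state`, and `local_cases` are hypothetical, not part of SimBoard.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class CasePlan:
    """One case that needs uploading, with the execution IDs not yet known."""
    case_path: str
    new_execution_ids: Tuple[str, ...]


def plan_uploads(
    known_state: Dict[str, Iterable[str]],
    local_cases: Dict[str, List[str]],
) -> List[CasePlan]:
    """Return upload plans for changed cases only.

    known_state: case path -> execution IDs already recorded (assumed shape
                 of the state response; the real schema may differ).
    local_cases: case path -> execution IDs found by scanning local archives.
    """
    plans = []
    for case_path in sorted(local_cases):
        known = set(known_state.get(case_path, ()))
        # Keep local order, drop anything the server already processed.
        new_ids = tuple(e for e in local_cases[case_path] if e not in known)
        if new_ids:  # unchanged cases produce no plan and are skipped
            plans.append(CasePlan(case_path, new_ids))
    return plans
```

Each returned plan would correspond to one upload request carrying a single case archive, in line with the one-case-per-request proposal.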
Acceptance criteria:
- A service-account-driven upload workflow can fetch known execution state before upload.
- HPC upload requests can persist per-case processed_execution_ids needed for future dedupe decisions.
- /api/v1/ingestions/state reconstructs state from both HPC_PATH and HPC_UPLOAD records.
- Unchanged cases are skipped by the upload automation runner.
- Browser/manual upload behavior remains supported and clearly separated from HPC automation behavior.
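
As a rough illustration of the state criterion, the reconstruction could look like the sketch below. It assumes each ingestion audit row exposes source_reference and processed_execution_ids (both named in this issue) plus a source field, called `source_type` here, which is a hypothetical name. The function `reconstruct_state` is likewise illustrative, not existing SimBoard code.

```python
from typing import Dict, Iterable, Mapping, Set

# Sources that contribute to dedupe state; BROWSER_UPLOAD is deliberately excluded.
STATE_SOURCES = frozenset({"HPC_PATH", "HPC_UPLOAD"})


def reconstruct_state(rows: Iterable[Mapping]) -> Dict[str, Set[str]]:
    """Merge processed execution IDs per case across HPC_PATH and HPC_UPLOAD rows."""
    state: Dict[str, Set[str]] = {}
    for row in rows:
        if row["source_type"] not in STATE_SOURCES:
            continue  # manual/browser uploads do not feed automation state
        # With one case per request, source_reference identifies a single case.
        case_ids = state.setdefault(row["source_reference"], set())
        case_ids.update(row["processed_execution_ids"])
    return state
```

Because the sketch keys state by source_reference, it only works if each row describes exactly one case, which is the constraint the one-case-per-request design preserves.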
Describe alternatives you've considered
- Reuse current browser upload flow.
  This is not enough because the current upload contract does not carry state metadata and persists uploads as BROWSER_UPLOAD.
- Support multi-case HPC upload requests.
  This would require a more complex persistence model because current ingestion audit rows only store one source_reference and one flat processed_execution_ids list, which is not sufficient for reconstructing per-case state from a multi-case upload.
- Keep upload flow as-is and rely on duplicate detection during ingest.
  This preserves correctness at the execution_id level, but loses the main benefit of state-first dedupe: skipping unchanged cases before upload and ingestion.
Additional context
Recommendation: implement this in a separate PR from docs-only clarification work.
Reasons for preferring one case per HPC upload request:
- Matches current state model keyed by case path.
- Minimizes schema and audit-model complexity.
- Keeps browser/manual multi-case uploads separate from HPC automation semantics.
- Lowest-risk path to parity with the existing NERSC runner.