[Ops]: Optimize DB-backed ingestion state queries for large performance archives #205

## Summary

Follow up on the DB-backed ingestion state work from #189 / PR #204.

Today the ingestor fetches the full `/api/v1/ingestions/state` payload for one machine, then compares that machine-level state against the current archive scan locally. This is correct, but the payload grows with every execution ID the database accumulates, so the fetch may become inefficient over time.

## Motivation

A concern raised in the comments on #189 is that SimBoard may eventually store tens of thousands of historical execution IDs for a machine, while a given `performance_archive` scan may contain only a small number of current execution directories. In that case, the entire machine-level state payload is far more data than the ingestor needs.

Example:
- DB has 20,000 stored execution IDs for a machine.
- Current archive scan finds 20 execution IDs.
- Current implementation still fetches the full machine-level state response before comparing locally.

This is not a correctness bug and should not block the existing PR, but it is a valid scalability concern.

## Current behavior

- Ingestor scans execution directories.
- Ingestor fetches `/api/v1/ingestions/state?machine_name=...`.
- API returns all known case-path state for that machine.
- Ingestor compares locally and submits only changed cases.
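
To make the local comparison step concrete, here is a minimal sketch. It assumes the state can be modeled as a mapping from case path to some fingerprint string; the actual payload shape, the `select_changed_cases` helper, and the sample paths are all hypothetical and only illustrate why the stored side can dwarf the scanned side.

```python
from typing import Dict, List


def select_changed_cases(scanned: Dict[str, str], stored: Dict[str, str]) -> List[str]:
    """Return case paths whose scanned fingerprint is new or differs from stored state.

    Both mappings are hypothetical: case path -> fingerprint string.
    """
    return sorted(
        path
        for path, fingerprint in scanned.items()
        if stored.get(path) != fingerprint
    )


# Hypothetical data: the stored state covers far more case paths than the scan.
stored_state = {f"/archive/exec_{i}": "v1" for i in range(20_000)}
scanned_state = {
    "/archive/exec_19999": "v1",  # unchanged
    "/archive/exec_20000": "v1",  # not yet stored
    "/archive/exec_5": "v2",      # fingerprint changed
}

changed = select_changed_cases(scanned_state, stored_state)
```

Only the scanned paths are ever looked up, so most of the 20,000 stored entries in this example are transferred but never consulted.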

## Goal

Reduce the amount of state returned to ingestors when the DB-backed state for a machine is much larger than the currently scanned archive contents.

## Possible approaches

- Add optional query filters to `/api/v1/ingestions/state`, such as:
  - `since_created_at`
  - `since_execution_date` or a similar execution-derived lower bound
  - case-path subset filtering if the ingestor already knows the candidate case paths
- Allow the ingestor to scan first, derive a lower bound or candidate subset, then request only relevant state.
- Measure response-size and runtime impact before and after filtering.
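
As a rough illustration of the scan-first approach, the sketch below builds a filtered request URL on the ingestor side. `since_created_at` is one of the filters proposed above; the repeated `case_path` parameter, the `build_state_url` helper, and the example host are assumptions for illustration, not an existing API.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

STATE_ENDPOINT = "/api/v1/ingestions/state"


def build_state_url(
    base_url: str,
    machine_name: str,
    since_created_at: Optional[str] = None,
    case_paths: Optional[Iterable[str]] = None,
) -> str:
    """Build the state URL, adding filters only when the caller supplies them."""
    params = [("machine_name", machine_name)]
    if since_created_at is not None:
        # Proposed filter name from this issue; not implemented yet.
        params.append(("since_created_at", since_created_at))
    for path in case_paths or ():
        # Hypothetical repeated parameter for case-path subset filtering.
        params.append(("case_path", path))
    return f"{base_url.rstrip('/')}{STATE_ENDPOINT}?{urlencode(params)}"


unfiltered = build_state_url("https://simboard.example", "pm-cpu")
filtered = build_state_url(
    "https://simboard.example",
    "pm-cpu",
    since_created_at="2024-01-01T00:00:00Z",
    case_paths=["/archive/exec_5"],
)
```

If the candidate subset can be large, a query string may hit URL length limits, so sending the case paths in a request body could be worth evaluating as part of the tradeoff analysis.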

## Acceptance criteria

- Ingestion-state payload size can be bounded or reduced for large historical machine datasets.
- API behavior remains backwards compatible: existing ingestors that send no filters receive the same response as today until they are updated to use the optional filters.
- The tradeoffs and the chosen filter strategy are documented.
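
One way to read the backwards-compatibility criterion is that every filter defaults to "off". The sketch below shows that behavior over an in-memory list; the `StateRecord` type, its fields, and `filter_state` are hypothetical stand-ins for whatever the real query layer uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class StateRecord:
    # Hypothetical record shape for illustration only.
    case_path: str
    created_at: datetime


def filter_state(
    records: Iterable[StateRecord],
    since_created_at: Optional[datetime] = None,
    case_paths: Optional[Set[str]] = None,
) -> List[StateRecord]:
    """Apply optional filters; with no filters, return every record unchanged."""
    selected = []
    for record in records:
        if since_created_at is not None and record.created_at < since_created_at:
            continue
        if case_paths is not None and record.case_path not in case_paths:
            continue
        selected.append(record)
    return selected


records = [
    StateRecord("/archive/exec_1", datetime(2023, 6, 1, tzinfo=timezone.utc)),
    StateRecord("/archive/exec_2", datetime(2024, 3, 1, tzinfo=timezone.utc)),
]
everything = filter_state(records)
recent = filter_state(records, since_created_at=datetime(2024, 1, 1, tzinfo=timezone.utc))
```

A real implementation would presumably push these conditions into the database query instead of filtering in Python, but the default-off contract would be the same.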

## Notes

- This issue is about efficiency/scaling, not correctness.
- The current DB-backed state approach was validated on NERSC and should remain the baseline source of truth.
- Multi-machine shared archive handling should remain separate unless we discover overlap in solution design.

