feat(backend): keep only latest preprocessing pipeline version #6813

Draft
corneliusroemer wants to merge 1 commit into main from prepro-keep-latest-only

@corneliusroemer corneliusroemer commented Jun 30, 2026

Contributor

Summary

The pipeline-version garbage collector (UseNewerProcessingPipelineVersionTask) runs after the backend bumps an organism to a newer preprocessing pipeline version, deleting outdated preprocessed data. Previously it retained the two most recent pipeline versions by passing latestVersion - 1 as the earliest version to keep.

This PR changes it to keep only the latest version (passing latestVersion), so preprocessed data from older pipeline versions is no longer retained once an organism has been upgraded.

Changes

  • UseNewerProcessingPipelineVersionTask.kt: pass latestVersion instead of latestVersion - 1 to cleanUpOutdatedPreprocessingData. This is the only behavioral change; the deletion is still scoped per-organism and only triggers when a version upgrade actually happens.
  • UseNewerProcessingPipelineVersionTaskTest.kt: updated the existing GC test assertions to expect only the latest version surviving ([2L] after the v2 bump, [3L] after v3), with clarified comments. The per-organism scoping assertions (OTHER_ORGANISM untouched) are unchanged.
  • backend/AGENTS.md: clarified test instructions to check for Docker first and prefer the Docker route, falling back to USE_NONDOCKER_INFRA=true only when Docker is absent or its run fails.
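
To make the retention change concrete, here is a minimal in-memory sketch of the rule described above. It is not the actual Loculus implementation: `PreprocessedRow`, `cleanUpOutdated`, and `survivingVersions` are hypothetical names invented for illustration, and it assumes that "earliest version to keep" means rows with a strictly lower pipeline version are deleted for the given organism only.

```kotlin
// Hypothetical stand-in for a row of preprocessed data; not the real Loculus schema.
data class PreprocessedRow(val organism: String, val pipelineVersion: Long)

// Assumed retention rule: for one organism, drop every row whose pipeline
// version is below earliestVersionToKeep. Other organisms are left untouched.
fun cleanUpOutdated(
    rows: List<PreprocessedRow>,
    organism: String,
    earliestVersionToKeep: Long,
): List<PreprocessedRow> =
    rows.filterNot { it.organism == organism && it.pipelineVersion < earliestVersionToKeep }

// Helper: the distinct pipeline versions still present for an organism, ascending.
fun survivingVersions(rows: List<PreprocessedRow>, organism: String): List<Long> =
    rows.filter { it.organism == organism }.map { it.pipelineVersion }.distinct().sorted()

fun main() {
    val rows = listOf(
        PreprocessedRow("organism", 1L),
        PreprocessedRow("organism", 2L),
        PreprocessedRow("organism", 3L),
        PreprocessedRow("otherOrganism", 1L),
    )
    val latestVersion = 3L

    // Old behavior: latestVersion - 1 as the cutoff keeps the two newest versions.
    val oldPolicy = cleanUpOutdated(rows, "organism", latestVersion - 1)
    println(survivingVersions(oldPolicy, "organism")) // [2, 3]

    // New behavior: latestVersion as the cutoff keeps only the newest version.
    val newPolicy = cleanUpOutdated(rows, "organism", latestVersion)
    println(survivingVersions(newPolicy, "organism")) // [3]

    // Per-organism scoping: the other organism keeps its data either way.
    println(survivingVersions(newPolicy, "otherOrganism")) // [1]
}
```

Under these assumptions, the only difference between the old and new behavior is the cutoff argument, which matches the one-line change described for UseNewerProcessingPipelineVersionTask.kt.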

Testing

./gradlew test --tests "...UseNewerProcessingPipelineVersionTaskTest" passes.

🤖 Generated with Claude Code

Previously the pipeline-version garbage collector kept the two most
recent preprocessing pipeline versions (passing `latestVersion - 1` as
the earliest version to keep). Change it to keep only the latest version
(`latestVersion`), so outdated preprocessed data is no longer retained
unnecessarily.

Also updates the GC test assertions to expect only the latest version
surviving, and clarifies the backend AGENTS.md test instructions to
prefer the Docker route when Docker is present.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@claude (bot) added the backend label (related to the loculus backend component) on Jun 30, 2026
@theosanderson

Member

At first glance I'm not sure about this. I can imagine various race conditions that might result (not necessarily with the current code, I'm unsure, but with possible code), and I'm not sure DB size is such a big issue.
