Feature branch for Raw reads #6817
Draft
anna-parker wants to merge 3 commits into
Partly resolves #6758

Currently, user-submitted files are not read in by the nextclade preprocessing pipeline, causing them to be dropped from the submission process. Since we're now working to implement file sharing on Loculus, user-submitted files need to make it through the preprocessing pipeline as well.

This PR forwards user-submitted files through the nextclade pipeline without any processing or checking of file contents. Future PRs will add functionality to, for example, check for host sequences in submitted raw reads.

- `parse_ndjson` now also parses the file-related information sent to preprocessing by the backend
- `UnprocessedData` and `UnprocessedAfterNextclade` both get a `files` attribute
- test factories in `factory_methods.py` can now be given files to add to test objects; I also added a test case in `test_nextclade_preprocessing.py` that carries file information

I created a preview to test whether files now make it through the nextclade pipeline. When I submit sequences to the preview with attached raw reads, the file now appears on the submission review page:

<img width="1594" height="581" alt="grafik" src="https://github.com/user-attachments/assets/45a2c3a7-ef9a-4738-aec7-be8ef5717a1b" />

And also on the sequence details page after the sequence is released:

<img width="1124" height="595" alt="grafik" src="https://github.com/user-attachments/assets/578e16c0-6f0e-4e9b-b5e2-2c544b826086" />

One thing I ran into when implementing this is that you need to add file categories twice in the config: once under `submissionDataTypes` (file categories that users are allowed to submit) and then again under a top-level `files` field (files accepted as outputs of prepro pipelines):

```
defaultOrganismConfig: &defaultOrganismConfig
  schema: &schema
    submissionDataTypes: &defaultSubmissionDataTypes
      consensusSequences: true
      maxSequencesPerEntry: 1
      files:
        enabled: true
        categories:
          - name: raw_reads
            displayName: Raw reads
    ...
    files:
      - name: annotations
        displayName: Annotations
      - name: raw_reads
        displayName: Raw reads
```

Would it be nicer to always allow file categories listed under `submissionDataTypes.files` to be output by prepro? Or will we ever have cases where users submit one thing, prepro processes it, and then outputs another file type? It is probably safest to keep this as-is for now, but I wanted to flag it: getting this wrong put me in a weird state where submissions stay in 'processing' indefinitely but never error, because the backend rejects the preprocessing output and only logs the error on the backend side.

~- [ ] All necessary documentation has been adapted.~
- [x] The implemented feature is covered by appropriate, automated tests.
- [x] Any manual testing that has been done is documented (i.e. what exactly was tested?)

🚀 Preview: https://pass-files-through.loculus.org
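To make the config pitfall above concrete, here is a minimal sketch of a consistency check that could flag the mismatch before a deployment gets stuck in 'processing'. It is not part of this PR: `missing_output_categories` is a hypothetical helper, and it assumes the schema is available as a plain dict in which the output `files` list sits next to `submissionDataTypes`, as in the snippet above.

```python
def missing_output_categories(schema: dict) -> list[str]:
    """Return submittable file categories that are absent from the output `files` list.

    Assumes the dict layout of the config snippet in this PR description;
    the real config loader may structure things differently.
    """
    submission_files = schema.get("submissionDataTypes", {}).get("files", {})
    if not submission_files.get("enabled", False):
        # File submission is switched off, so nothing can be missing.
        return []
    submittable = [c["name"] for c in submission_files.get("categories", [])]
    accepted = {c["name"] for c in schema.get("files", [])}
    return [name for name in submittable if name not in accepted]


# Same shape as the config above, but with `raw_reads` forgotten in the output list.
schema = {
    "submissionDataTypes": {
        "consensusSequences": True,
        "maxSequencesPerEntry": 1,
        "files": {
            "enabled": True,
            "categories": [{"name": "raw_reads", "displayName": "Raw reads"}],
        },
    },
    "files": [{"name": "annotations", "displayName": "Annotations"}],
}

print(missing_output_categories(schema))  # ['raw_reads']
```

A non-empty result means users can submit a category that the backend would then refuse as preprocessing output. If we later decide that prepro may emit different file types than users submit, the check still holds, since it only requires that submittable categories are a subset of the accepted ones.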