Enable raw reads sharing by default #6660
|
This PR may be related to: #4347 (Raw read epic)
Initial testing after enabling file sharing for all organisms

This is some testing I did before any code changes, just after flipping on file sharing (raw reads) for all organisms. I was able to attach raw reads to one of the West Nile virus example entries by submitting via the web interface using the following directory structure:

➜ scratch tree west_nile_files_submission
west_nile_files_submission
└── test_INYO_2_2011
    └── ERR17072040.fastq.gz

The raw reads file I'm using belongs to one of the West Nile sequences we have on PPX: https://pathoplexus.org/seq/PP_006UYBK.2

I tried this as both the testuser and the superuser; the file gets uploaded to S3 in both cases and registered in the
...and also attached to the submissions in the
However, I don't see them getting attached to the submissions in the
I also don't see them on the sequence details page on the website (presumably because files don't get registered on the processed data entry?)
|
Exactly, this is what I anticipated would happen. There is an open TODO in the Nuclino about ensuring the submitted files are passed into prepro :-)
Partly resolves #6758

Currently, user-submitted files are not read in by the nextclade preprocessing pipeline, causing them to be dropped from the submission process. Since we're now working to implement file sharing on Loculus, we need user-submitted files to make it through the preprocessing pipeline as well.

This PR forwards user-submitted files through the nextclade pipeline without any processing or checking of file contents. Future PRs will add functionality to, for example, check for host sequences in submitted raw reads.

## Implementation

- `parse_ndjson` now also parses the file-related information sent to preprocessing by the backend
- `UnprocessedData` and `UnprocessedAfterNextclade` both get a `files` attribute
- test factories in `factory_methods.py` can now be given files to add to test objects; also added a test case in `test_nextclade_preprocessing.py` that carries file information

## Manual testing

I created a preview to test whether files now make it through the nextclade pipeline.

When I submit sequences to the preview with attached raw reads, the file now appears on the submission review page:

<img width="1594" height="581" alt="grafik" src="https://github.com/user-attachments/assets/45a2c3a7-ef9a-4738-aec7-be8ef5717a1b" />

And also on the sequence details page after the sequence is released:

<img width="1124" height="595" alt="grafik" src="https://github.com/user-attachments/assets/578e16c0-6f0e-4e9b-b5e2-2c544b826086" />

## Open questions

One thing I ran into when implementing this is that you need to add file categories twice in the config: once under `submissionDataTypes` (file categories that users are allowed to submit) and then again under a top-level `files` field (files accepted as outputs of prepro pipelines):

```
defaultOrganismConfig: &defaultOrganismConfig
  schema: &schema
    submissionDataTypes: &defaultSubmissionDataTypes
      consensusSequences: true
      maxSequencesPerEntry: 1
      files:
        enabled: true
        categories:
          - name: raw_reads
            displayName: Raw reads
    ...
    files:
      - name: annotations
        displayName: Annotations
      - name: raw_reads
        displayName: Raw reads
```

Would it be nicer to always allow file categories listed under `submissionDataTypes.files` to be output by prepro? Or will we ever have cases where users submit one thing, prepro processes it, and then outputs another filetype?

Probably safest to keep as-is for now, but I wanted to flag it: when a category is missing from the top-level `files` field, the backend doesn't accept the preprocessing output and only logs the error on its side. That got me into a weird state where submissions stay in 'processing' indefinitely but never error.

### PR Checklist

~- [ ] All necessary documentation has been adapted.~
- [x] The implemented feature is covered by appropriate, automated tests.
- [x] Any manual testing that has been done is documented (i.e. what exactly was tested?)

🚀 Preview: https://pass-files-through.loculus.org
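
For readers skimming the Implementation notes, the pass-through could look roughly like the sketch below. This is not the code from this PR: the NDJSON field names (`accession`, `version`, `data`, `unalignedNucleotideSequences`, `files`), the file descriptor shape (`fileId`, `name`), the simplified `UnprocessedData`/`UnprocessedEntry` classes and the `to_processed` helper are all assumptions made for illustration. The point is only that `files` is read during parsing and copied to the output unchanged.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass
class UnprocessedData:
    # Hypothetical, simplified stand-in for the pipeline class of the same name.
    metadata: dict[str, Any]
    sequences: dict[str, Any]
    # File category -> list of file descriptors; forwarded untouched.
    # None means the submitter attached no files.
    files: dict[str, list[dict[str, str]]] | None = None


@dataclass
class UnprocessedEntry:
    accession_version: str
    data: UnprocessedData


def parse_ndjson(text: str) -> list[UnprocessedEntry]:
    """Parse one JSON object per line, keeping `files` when present."""
    entries: list[UnprocessedEntry] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        data = record["data"]
        entries.append(
            UnprocessedEntry(
                accession_version=f"{record['accession']}.{record['version']}",
                data=UnprocessedData(
                    metadata=data.get("metadata", {}),
                    sequences=data.get("unalignedNucleotideSequences", {}),
                    files=data.get("files"),
                ),
            )
        )
    return entries


def to_processed(entry: UnprocessedEntry) -> dict[str, Any]:
    """Hypothetical output step: copy files through without inspecting them."""
    return {
        "accessionVersion": entry.accession_version,
        "data": {
            "metadata": entry.data.metadata,
            "files": entry.data.files,
        },
    }
```

Keeping `files` as an optional attribute means entries without attachments take the same code path as before, and any later content checks (for example the host-sequence screening mentioned above) would have a single place to hook in.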
|
Closing in favor of the feature branch: #6817




Related to #4347
PR Checklist
🚀 Preview: Add `preview` label to enable