Feature branch for Raw reads#6817

Draft
anna-parker wants to merge 3 commits into main from raw-reads

@anna-parker anna-parker commented Jun 30, 2026

Contributor

resolves #4347

Screenshot

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by appropriate, automated tests.
  • Any manual testing that has been done is documented (i.e. what exactly was tested?)

🚀 Preview: Add preview label to enable

maverbiest and others added 2 commits June 30, 2026 20:42
Partly resolves #6758

Currently, user-submitted data are not read in by the nextclade
preprocessing pipeline, causing them to be dropped from the submission
process. Since we're now working to implement file sharing on Loculus,
we need to make it so user-submitted files make it through the
preprocessing pipeline as well.

This PR makes it so user-submitted files are forwarded through the
nextclade pipeline without any processing or checking of file contents.
Future PRs will add functionality to, for example, check for host
sequences in submitted raw reads.

- `parse_ndjson` now also parses the file-related information sent to
preprocessing by the backend
- `UnprocessedData` and `UnprocessedAfterNextclade` both get a `files`
attribute
- test factories in `factory_methods.py` can now be given files to add
to test objects; a test case that carries file information was also
added to `test_nextclade_preprocessing.py`
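
As a rough illustration of the first two points, here is a minimal sketch of carrying file information through parsing untouched. It is not the actual implementation: `FileRef`, `parse_files`, `parse_ndjson_line`, the simplified `UnprocessedData`, and the `data`, `files`, `fileId` and `name` keys are all assumptions made for this example.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FileRef:
    # Hypothetical shape of one submitted file reference.
    file_id: str
    name: str


@dataclass
class UnprocessedData:
    # Simplified stand-in for the real class; only `files` matters here.
    metadata: dict[str, Any]
    # Maps a file category (e.g. "raw_reads") to the files in that category.
    files: dict[str, list[FileRef]] = field(default_factory=dict)


def parse_files(raw: Optional[dict[str, Any]]) -> dict[str, list[FileRef]]:
    """Convert the raw per-category file listing without inspecting contents."""
    if not raw:
        return {}
    return {
        category: [FileRef(file_id=f["fileId"], name=f["name"]) for f in entries]
        for category, entries in raw.items()
    }


def parse_ndjson_line(line: str) -> UnprocessedData:
    """Parse one NDJSON line (assumed layout) into an UnprocessedData object."""
    entry = json.loads(line)
    data = entry["data"]
    return UnprocessedData(
        metadata=data.get("metadata", {}),
        files=parse_files(data.get("files")),
    )
```

Entries without a `files` key end up with an empty mapping, so sequence-only submissions would behave as before under these assumptions.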

I created a preview to test whether files now make it through the
nextclade pipeline. When I submit sequences to the preview with attached
raw reads, the file now appears on the submission review page:

<img width="1594" height="581" alt="grafik"
src="https://github.com/user-attachments/assets/45a2c3a7-ef9a-4738-aec7-be8ef5717a1b"
/>

And also on the sequence details page after the sequence is released:

<img width="1124" height="595" alt="grafik"
src="https://github.com/user-attachments/assets/578e16c0-6f0e-4e9b-b5e2-2c544b826086"
/>

One thing I ran into when implementing this is that you need to add
file categories twice in the config: once under
`submissionDataTypes` (file categories that users are allowed to submit)
and then again under a top-level `files` field (files accepted as
outputs of prepro pipelines):

```
defaultOrganismConfig: &defaultOrganismConfig
  schema: &schema
    submissionDataTypes: &defaultSubmissionDataTypes
      consensusSequences: true
      maxSequencesPerEntry: 1
      files:
        enabled: true
        categories:
          - name: raw_reads
            displayName: Raw reads
    ...
    files:
      - name: annotations
        displayName: Annotations
      - name: raw_reads
        displayName: Raw reads
```

Would it be nicer to always allow file categories listed under
`submissionDataTypes.files` to be output by prepro? Or will we ever
have cases where users submit one thing, prepro processes it, and then
outputs another filetype?

Probably safest to keep as-is for now, but I wanted to flag it: getting
this wrong put me in a weird state where submissions stay in
'processing' indefinitely and never error, because the backend doesn't
accept the preprocessing output (it only logs errors in the backend).
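
To make the mismatch concrete, here is a small sketch of a check that would flag it early. It is only an illustration: `missing_output_categories` is a hypothetical helper, not something that exists in Loculus, and it assumes the schema is already loaded as a plain dict with `files` next to `submissionDataTypes`, as in the snippet above.

```python
def missing_output_categories(schema: dict) -> set[str]:
    """Return submittable file categories that prepro may not output."""
    submission_files = schema.get("submissionDataTypes", {}).get("files", {})
    if not submission_files.get("enabled", False):
        # File submission is off, so there is nothing to cross-check.
        return set()
    submitted = {c["name"] for c in submission_files.get("categories", [])}
    accepted = {c["name"] for c in schema.get("files", [])}
    return submitted - accepted


# Example: raw_reads can be submitted but is not listed as an accepted output.
schema = {
    "submissionDataTypes": {
        "files": {"enabled": True, "categories": [{"name": "raw_reads"}]},
    },
    "files": [{"name": "annotations"}],
}
print(missing_output_categories(schema))
```

A non-empty result corresponds to the situation described above, where submissions would stay in 'processing'.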

~- [ ] All necessary documentation has been adapted.~
- [x] The implemented feature is covered by appropriate, automated
tests.
- [x] Any manual testing that has been done is documented (i.e. what
exactly was tested?)

🚀 Preview: https://pass-files-through.loculus.org

Development

Successfully merging this pull request may close these issues.

Raw read epic

2 participants