This guide describes the steps required to configure a Linux environment and upload bulk RNA-seq data to the European Nucleotide Archive (ENA). It is based on a real workflow and highlights practical issues and solutions.
Before uploading any files, you must register your project, study, and samples on ENA.
- Go to: https://www.ebi.ac.uk/ena/submit/webin/login
- Create a Webin account
- Save your Webin username and password (you will need them later for uploads). The username is assigned automatically when you submit the registration form.
⚠️ Important: For the next steps (study and sample registration), you can refer to the ENA documentation.
- Log into the Webin portal
- Click on Register Study under Study (Project)
- Provide:
  - Release date (when the data will become public); you can change it at any time
  - Study name
  - Short descriptive study title (it is not obvious how this differs from the study name)
  - Abstract
- After submission, you will receive a Study accession (e.g., PRJEBXXXX)
The best way to register samples is with the spreadsheet template:
- Click on Register Samples under Samples
- Choose Download spreadsheet to register samples
- Find the template that best fits your data, or go to Other Checklist and then ENA default sample checklist
- Add optional fields (you can also do this later)
- Click Next
- Download the TSV template
Then fill in the TSV template with your sample information. You can always add a column if needed.
| sample_alias | title | tax_id | scientific_name | description |
|---|---|---|---|---|
| sample_1 | Sample 1 | 9606 | Homo sapiens | Control sample |
| sample_2 | Sample 2 | 9606 | Homo sapiens | Treated sample |

- sample_alias: unique identifier for each sample
  ⚠️ It will be the ID used later to identify your run!
- sample_title: the name you will see in Webin (not the sample_alias)
- tax_id: NCBI taxonomy ID (e.g., 9606 for human or 10090 for mouse)
- scientific_name: must match the taxonomy (Mus musculus, for example)
- Upload the TSV in Webin: go to Register Samples under Samples, then choose Upload filled spreadsheet to register samples. If there is a problem in your TSV, Webin returns an error message.
- After submission, each sample will receive a Sample accession (e.g., ERSXXXX)
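Before uploading, it can help to sanity-check the filled spreadsheet locally. The following is a minimal sketch, not part of the ENA tooling: it writes a small tab-separated file with the example columns from this guide and uses awk to flag duplicated sample_alias values and rows whose column count differs from the header. The file name samples.tsv and the function name check_samples_tsv are assumptions made for illustration, and the template downloaded from Webin may contain extra header lines or columns, so adapt the check to your file.

```shell
#!/bin/sh
# Sketch only: samples.tsv is a hypothetical file name, and the columns
# follow the example in this guide, not necessarily your Webin template.
TSV="${TSV:-samples.tsv}"

# Write the example table with real tab characters between columns.
printf 'sample_alias\ttitle\ttax_id\tscientific_name\tdescription\n' > "$TSV"
printf 'sample_1\tSample 1\t9606\tHomo sapiens\tControl sample\n' >> "$TSV"
printf 'sample_2\tSample 2\t9606\tHomo sapiens\tTreated sample\n' >> "$TSV"

# Report duplicated aliases (column 1) and rows with a wrong column count.
# Exits non-zero if any problem was found.
check_samples_tsv() {
  awk -F '\t' '
    BEGIN { bad = 0 }
    NR == 1 { ncol = NF; next }
    NF != ncol { printf "line %d: expected %d columns, found %d\n", NR, ncol, NF; bad = 1 }
    seen[$1]++ { printf "line %d: duplicate sample_alias %s\n", NR, $1; bad = 1 }
    END { exit bad }
  ' "$1"
}

if check_samples_tsv "$TSV"; then
  echo "OK: $TSV looks consistent"
fi
```

This only catches structural mistakes; Webin still performs the authoritative validation when the spreadsheet is uploaded.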
ENA uploads require several tools. Using outdated versions will cause failures, so follow these steps carefully.
System package managers (apt, yum) usually provide outdated Java versions, so download OpenJDK directly:
- Download from: https://openjdk.org/
- Choose the Linux tar.gz version
- Extract it:
tar -xzf openjdk-*.tar.gz
- You don't necessarily need to add Java to your PATH; you just need to know where it is located, because you will provide this path later:
path/to/tarFile/jdk-*/bin
- Go to: https://github.com/enasequence/webin-cli/releases
- Download the latest .jar file
- Again, you just need to know the path where you downloaded the jar (for example, webin-cli-9.0.3.jar)
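Since neither tool needs to be on the PATH, one option is to keep both locations in shell variables and call Java by its full path. The sketch below assumes hypothetical locations under $HOME/tools (replace them with your own), and the variable names JAVA_BIN and WEBIN_JAR and the helper check_file are illustrative only. It also assumes that webin-cli accepts a -version option, which is worth confirming with the help output of your release.

```shell
#!/bin/sh
# Hypothetical locations: replace both with your own paths.
JAVA_BIN="${JAVA_BIN:-$HOME/tools/jdk/bin/java}"
WEBIN_JAR="${WEBIN_JAR:-$HOME/tools/webin-cli-9.0.3.jar}"

# Print whether a file exists; return non-zero if it does not.
check_file() {
  if [ -f "$1" ]; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

if check_file "$JAVA_BIN" && check_file "$WEBIN_JAR"; then
  # Call Java by its full path, so no PATH change is needed.
  # -version is assumed here to print the webin-cli version and exit.
  "$JAVA_BIN" -jar "$WEBIN_JAR" -version
fi
```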
Ruby is required for aspera-cli.
System package managers often provide older versions, so use rbenv.
To install rbenv, you can follow the rbenv documentation.
Normally it is straightforward:
CentOS:
sudo yum install rbenv
or Ubuntu:
sudo apt install rbenv
Then install Ruby and set it as the default version:
rbenv install 3.1.0
rbenv global 3.1.0
Check the installation:
gem -v
ruby -v
If Ruby is not detected correctly, ensure your PATH includes:
export PATH="$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH"
For fish shell:
fish_add_path $HOME/.rbenv/shims
fish_add_path $HOME/.rbenv/bin
Aspera is used for fast and reliable file transfer (strongly recommended). You can check the aspera-cli documentation for details.
gem install aspera-cli
ascli config transferd install
- The second command installs the FASP transfer engine (ascp)
You must ensure ascp is accessible, otherwise webin-cli will not use it.
Example:
export PATH="$HOME/.aspera/sdk:$PATH"
Adjust the path depending on your installation location.
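If you are not sure where ascp was installed, a search of the Aspera directory can locate it. This is a minimal sketch under the assumption that the transfer engine sits somewhere below $HOME/.aspera, as in the example; ASPERA_DIR and find_ascp are hypothetical names introduced here, not part of aspera-cli.

```shell
#!/bin/sh
# Search a directory tree for a file named ascp and print the first match.
# Prints nothing if the directory does not exist or holds no ascp.
find_ascp() {
  [ -d "$1" ] || return 0
  find "$1" -type f -name ascp 2>/dev/null | head -n 1
}

# $HOME/.aspera matches the example in this guide; yours may differ.
ASCP_PATH=$(find_ascp "${ASPERA_DIR:-$HOME/.aspera}")

if [ -n "$ASCP_PATH" ]; then
  # Prepend the directory that contains ascp to PATH for this session.
  PATH="$(dirname "$ASCP_PATH"):$PATH"
  export PATH
  echo "ascp found at: $ASCP_PATH"
else
  echo "ascp not found; check where ascli installed the transfer engine"
fi
```

The PATH change only lasts for the current shell session; add the export line to your shell startup file to make it permanent.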
Make sure all tools are available:
java -version
gem -v
ascli --version
If everything works, your environment is ready for ENA uploads.
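The same check can be scripted so that missing tools are reported in one pass. The sketch below relies only on the standard command -v lookup; the function name report_tools is an illustrative choice. Note that it only finds tools on the PATH, so a Java installation that you call by its full path will be reported as missing here even though it works.

```shell
#!/bin/sh
# Print one line per tool and return non-zero if any tool is not on PATH.
report_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "MISSING: $tool"
      missing=1
    fi
  done
  return "$missing"
}

report_tools java gem ascli ascp || echo "At least one tool is not on PATH"
```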
To be completed.
This section will describe:
- Preparing metadata for runs and experiments
- Using webin-cli for validation and submission
- Uploading FASTQ files with Aspera
- Handling common errors
- Using Aspera (ascp) significantly reduces transfer errors compared to FTP
- Always validate metadata before attempting upload
- Keep track of all accession numbers (Study, Sample, Run)
- Store scripts and commands used for reproducibility
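One simple way to keep track of accession numbers is an append-only tab-separated log stored next to your scripts. This is only a sketch: the file name ena_accessions.tsv, the function log_accession, and the alias my_study are assumptions for illustration, and the accessions are the placeholders used earlier in this guide.

```shell
#!/bin/sh
# Hypothetical log file name; one tab-separated record per accession.
ACCESSION_LOG="${ACCESSION_LOG:-ena_accessions.tsv}"

# Usage: log_accession TYPE ALIAS ACCESSION
log_accession() {
  # Write the header once, when the log does not exist yet.
  if [ ! -f "$ACCESSION_LOG" ]; then
    printf 'date\ttype\talias\taccession\n' > "$ACCESSION_LOG"
  fi
  printf '%s\t%s\t%s\t%s\n' "$(date +%Y-%m-%d)" "$1" "$2" "$3" >> "$ACCESSION_LOG"
}

# Placeholder accessions in the same style as the examples in this guide.
log_accession study my_study PRJEBXXXX
log_accession sample sample_1 ERSXXXX
```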