Multi-omics integration practicum

Adrià Setó Llorens, Predoctoral Researcher at the Barcelona Institute for Global Health (ISGlobal).

Augusto Anguita-Ruiz, Junior Leader Researcher at the Barcelona Institute for Global Health (ISGlobal).

The multi-omics approach aims to integrate diverse layers of biological information—such as genomics, transcriptomics, proteomics, metabolomics, and epigenomics—to achieve a more comprehensive understanding of biological systems and disease mechanisms. Each omic layer captures a distinct yet interconnected level of cellular regulation, and their integration enables the identification of molecular interactions that cannot be detected through single-omic analyses alone. The main advantage of multi-omics integration over traditional single-omic studies lies in its ability to uncover cross-level biological relationships and multi-factorial drivers of phenotypes, improving prediction accuracy and mechanistic insight. This systems-level perspective supports the discovery of key biomarkers, regulatory networks, and potential therapeutic targets.

There are many multi-omics integration algorithms, each suited to different analytical goals. They can be classified according to whether they are supervised or unsupervised and whether they perform variable selection. In this session, we will focus on Regularized Generalized Canonical Correlation Analysis (RGCCA).
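To make the idea concrete, the sketch below implements single-component RGCCA in Python with NumPy, using the Horst scheme (g(x) = x), one of the schemes in the RGCCA framework. Each block weight vector a_j is updated in turn to maximize the sum of covariances between the components of connected blocks. The update is subject to the shrinkage constraint a_j' M_j a_j = 1, where M_j = tau_j I + (1 - tau_j) X_j'X_j / n. The function names (`standardize_block`, `rgcca_horst`), the block scaling by sqrt(p_j) and the synthetic data are illustrative assumptions. This is not the code of the modified RGCCA package used in the session.

```python
import numpy as np


def standardize_block(X):
    """Center and scale each variable, then divide the block by sqrt(p_j).

    Dividing by sqrt(p_j) keeps blocks with many variables from dominating;
    this is an assumed preprocessing choice for the sketch.
    """
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return X / np.sqrt(X.shape[1])


def rgcca_horst(blocks, C, tau, max_iter=500, tol=1e-10):
    """Single-component RGCCA with the Horst scheme g(x) = x.

    blocks: list of column-centered (n, p_j) arrays.
    C: symmetric (J, J) connection matrix with zero diagonal; every block must
       be connected to at least one other block.
    tau: shrinkage parameters in [0, 1], one per block.
    """
    n, J = blocks[0].shape[0], len(blocks)
    # Shrinkage metric M_j = tau_j * I + (1 - tau_j) * X_j' X_j / n
    M = [t * np.eye(X.shape[1]) + (1 - t) * (X.T @ X) / n for X, t in zip(blocks, tau)]

    # Start from the first right singular vector, rescaled so that a' M a = 1
    a = []
    for X, Mj in zip(blocks, M):
        v = np.linalg.svd(X, full_matrices=False)[2][0]
        a.append(v / np.sqrt(v @ Mj @ v))
    y = [X @ aj for X, aj in zip(blocks, a)]

    def criterion():
        # Sum of covariances between connected block components (pairs j < k)
        return sum(C[j, k] * (y[j] @ y[k]) / n for j in range(J) for k in range(j + 1, J))

    history = [criterion()]
    for _ in range(max_iter):
        for j in range(J):
            # Inner component: sum of the components of connected blocks
            z = sum(C[j, k] * y[k] for k in range(J) if k != j)
            g = blocks[j].T @ z
            u = np.linalg.solve(M[j], g)
            a[j] = u / np.sqrt(g @ u)  # enforces a_j' M_j a_j = 1
            y[j] = blocks[j] @ a[j]
        history.append(criterion())
        if abs(history[-1] - history[-2]) < tol:
            break
    return a, y, history


# Illustrative data: three synthetic blocks sharing one latent signal
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
blocks = [
    standardize_block(np.outer(latent, rng.normal(size=p)) + rng.normal(size=(n, p)))
    for p in (10, 20, 5)
]
C = np.ones((3, 3)) - np.eye(3)  # fully connected design
a, y, history = rgcca_horst(blocks, C, tau=[1.0, 1.0, 1.0])
print(f"criterion: {history[0]:.3f} -> {history[-1]:.3f} in {len(history) - 1} iterations")
```

Setting tau_j = 1 favours covariance (a PLS-like solution), while tau_j = 0 favours correlation. Each block update maximizes the criterion with respect to a_j, so the criterion never decreases across iterations. The explicit p_j x p_j solve is only practical for small blocks. A block as large as the methylation data would need, for example, a dual (kernel) formulation or prior filtering of features.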

The objective of this session is to offer an introduction to multi-omics integration analysis using RGCCA. We will:

- Load the data
- Preprocess the data
- Perform multi-omics integration
- Understand the results of multi-omics integration
- Evaluate the algorithm's performance

We will integrate multi-omics data (proteomics, urine and serum metabolomics, gene expression, and DNA methylation) using RGCCA. The outcome variable will be standardized body mass index (zBMI) at 9 years of age. The objective of this analysis is to identify multi-omic signatures predictive of BMI in later childhood, while gaining hands-on experience applying RGCCA to multi-omics data integration.
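In a supervised RGCCA analysis, the outcome can be treated as one more block. The design is then expressed through the connection (design) matrix C: each omic block is connected only to the zBMI block, so the Horst criterion sums the covariances between each omic component and the outcome. The sketch below assumes that design. The block labels and the helper names `supervised_design` and `horst_criterion` are hypothetical, and the tutorial's own design matrix may differ.

```python
import numpy as np


def supervised_design(block_names, outcome="zBMI"):
    """Build a symmetric connection matrix linking every omic block to the outcome only."""
    names = list(block_names) + [outcome]
    C = np.zeros((len(names), len(names)))
    C[:-1, -1] = 1.0  # omic block -> outcome block
    C[-1, :-1] = 1.0  # outcome block -> omic block (C must be symmetric)
    return names, C


def horst_criterion(components, C):
    """Sum of c_jk * cov(y_j, y_k) over pairs j < k (Horst scheme, g(x) = x)."""
    Y = np.column_stack(components)
    Y = Y - Y.mean(axis=0)          # center each block component
    cov = Y.T @ Y / Y.shape[0]      # covariances between block components
    return float(np.sum(np.triu(C, k=1) * cov))


# Hypothetical labels for the five omic layers used in this session
names, C = supervised_design(
    ["proteome", "serum_metabolome", "urine_metabolome", "gene_expression", "methylation"]
)
print(names[-1], C[-1])  # zBMI [1. 1. 1. 1. 1. 0.]

# Toy check: omic blocks are not connected to each other, only to the outcome
_, C2 = supervised_design(["block_a", "block_b"])
y = np.array([1.0, -1.0])
print(horst_criterion([y, y, y], C2))  # 2.0
```

In the toy check, all three components are identical. The pair of omic blocks contributes nothing because it is not connected, which is why the criterion is 2 rather than 3.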

For this practical tutorial, we will use data from the HELIX exposome study. The HELIX study is a collaborative project between six longitudinal, population-based birth cohort studies from six European countries (France, Greece, Lithuania, Norway, Spain and the UK).

Note: The data provided in this introductory course were simulated from the HELIX sub-cohort data. Details of the HELIX project and the origin of the data can be found in the HELIX cohort publication in BMJ Open and on the project website. Additional details about the dataset can be found in the official repository at https://github.com/isglobal-exposomeHub/ExposomeDataChallenge2021.

Repository guide

The repository contains the following documents:

- WORKSHOP_MULTIOMICS_INTEGRATION.ipynb: the notebook for the practical tutorial, with the code needed to perform multi-omic integration using RGCCA.
- WORKSHOP_MULTIOMICS_INTEGRATION.Rmd: the R Markdown version of the tutorial, with all the code needed to perform multi-omic integration using RGCCA locally.
- WORKSHOP_MULTIOMICS_INTEGRATION.html: the rendered tutorial, showing the code and results of the multi-omic integration using RGCCA.
- Functions: a directory containing all the functions used in this session. They are stored in separate files to keep the notebook clean and easy to follow; consult these files for more details.
- RGCCA-main.zip: a modified version of the RGCCA package.

This is the dataset we will use:

Exposome data (n=1301): Rdata file containing three objects:
- 1 object for exposures: exposome
- 1 object for covariates: covariates
- 1 object for outcomes: phenotype

The three tables can be linked using the ID variable. See the codebook for variable descriptions (variable name, domain, type of variable, transformation, ...).
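The tutorial works in R, where merge() plays this role. In Python, the same linkage could be sketched with pandas. The toy tables below stand in for exposome, covariates and phenotype; their values and the exposure_1 and zBMI column names are hypothetical. An inner join keeps only the children present in all three tables, and validate="one_to_one" raises an error if an ID appears more than once in any table.

```python
import pandas as pd

# Toy stand-ins for the three objects (values and the exposure_1/zBMI column names are made up)
exposome = pd.DataFrame({"ID": [1, 2, 3], "exposure_1": [0.4, 1.2, 0.8]})
covariates = pd.DataFrame({"ID": [3, 1, 2], "e3_sex": ["female", "male", "female"]})
phenotype = pd.DataFrame({"ID": [2, 3, 1], "zBMI": [0.5, -1.1, 0.9]})


def link_by_id(*tables, key="ID"):
    """Inner-join tables on a shared identifier, failing if an ID is duplicated."""
    linked = tables[0]
    for table in tables[1:]:
        linked = linked.merge(table, on=key, how="inner", validate="one_to_one")
    return linked.sort_values(key).reset_index(drop=True)


linked = link_by_id(exposome, covariates, phenotype)
print(linked)
```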

Omic data: exposome and omic data can be linked using the ID variable.
- Proteome: ExpressionSet called proteome of 1170 individuals and 39 proteins (log-transformed) that are annotated in the ExpressionSet object (use fData(proteome) after loading the Biobase Bioconductor package).
- Gene expression: ExpressionSet called genexpr (see the Bioconductor Biobase documentation for a description of the ExpressionSet class) of 1007 individuals and 28,738 transcripts with annotated gene symbols.
- Methylation: GenomicRatioSet called methy (see the Bioconductor minfi documentation for a description of the GenomicRatioSet class) of 918 individuals and 386,518 CpGs.
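The omic layers were measured on different numbers of individuals (1170, 1007 and 918 in the layers listed above). The simplest way to integrate them is to keep only the individuals present in every block, with rows in the same order across blocks. The sketch below assumes each block has already been extracted as a samples-by-features matrix together with its list of IDs. Bioconductor objects such as ExpressionSet store features in rows and samples in columns, so their matrices would first need to be transposed. The helper `align_blocks`, the block names and the IDs are hypothetical.

```python
import numpy as np


def align_blocks(blocks):
    """Keep individuals present in every block and give all blocks the same row order.

    blocks: dict mapping a block name to (ids, matrix), with matrix rows following ids.
    Returns the sorted common IDs and a dict of row-aligned matrices.
    """
    common = sorted(set.intersection(*(set(ids) for ids, _ in blocks.values())))
    aligned = {}
    for name, (ids, X) in blocks.items():
        row_of = {sample_id: i for i, sample_id in enumerate(ids)}
        aligned[name] = X[[row_of[s] for s in common], :]
    return common, aligned


# Hypothetical toy blocks measured on overlapping sets of individuals
rng = np.random.default_rng(1)
blocks = {
    "proteome": (["id1", "id2", "id3", "id4"], rng.normal(size=(4, 3))),
    "methylation": (["id4", "id2", "id1"], rng.normal(size=(3, 5))),
}
common, aligned = align_blocks(blocks)
print(common)  # ['id1', 'id2', 'id4']
print({name: X.shape for name, X in aligned.items()})  # {'proteome': (3, 3), 'methylation': (3, 5)}
```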

The variables that are available in the metadata are:

- ID: identification number
- e3_sex: sex (male, female)
- age_sample_years: age (in years)
- h_ethnicity_cauc: Caucasian ethnicity (yes, no)
- ethn_PC1: first principal component, used to address population stratification
- ethn_PC2: second principal component, used to address population stratification
- Cell-type estimates (only for methylation): NK_6, Bcell_6, CD4T_6, CD8T_6, Gran_6, Mono_6
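Variables like these are typically used to adjust omic data before integration. One common approach is to regress each omic feature on the covariates by ordinary least squares and keep the residuals. It is shown here as an assumption, not as the tutorial's exact procedure. For methylation, the cell-type estimates would be added as extra covariate columns. The helper `residualize` is hypothetical, and all data below are synthetic (including the uniformly drawn ages).

```python
import numpy as np


def residualize(X, covariates):
    """Return X with the linear effect of the covariates (plus an intercept) removed."""
    Z = np.column_stack([np.ones(X.shape[0]), covariates])
    beta, *_ = np.linalg.lstsq(Z, X, rcond=None)  # OLS fit for every feature at once
    return X - Z @ beta


# Synthetic example: 100 children, 4 omic features, covariates sex, age and two PCs
rng = np.random.default_rng(2)
n = 100
sex = rng.integers(0, 2, size=n)           # stand-in for e3_sex coded as 0/1
age = rng.uniform(6, 11, size=n)           # stand-in for age_sample_years
pcs = rng.normal(size=(n, 2))              # stand-ins for ethn_PC1, ethn_PC2
covs = np.column_stack([sex, age, pcs])
X = 0.5 * age[:, None] + rng.normal(size=(n, 4))   # features partly driven by age

X_adj = residualize(X, covs)
# After adjustment, every feature is uncorrelated with age (up to rounding)
print(np.round([np.corrcoef(age, X_adj[:, k])[0, 1] for k in range(4)], 6))
```

The residuals are orthogonal to the intercept and to every covariate column, so they have zero mean and zero correlation with each covariate.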

Reminder: Introduction to the notebook

This notebook will guide you step by step, from loading a dataset to analyzing it.

Getting Started:

- Open WORKSHOP_MULTIOMICS_INTEGRATION.ipynb and click “Open in Colab” (sign in with your Google account if needed).
- Select “Open in draft mode” at the top left so you can run the code safely.
- If you see "Warning: This notebook was not created by Google.", don’t worry; just click “Run anyway”.

How to Use the Notebook:

- The notebook mixes text explanations and code cells for hands-on learning.
- Always run cells in order to avoid errors.
- Click the play button next to a cell, or press Ctrl+Enter (Cmd+Enter on Mac).
- Lines starting with # are comments for guidance; they are not executed.
- Outputs appear below each cell, showing results and any printed messages.
