Mach-1

Mach-1 is a long-context RNA foundation model for predicting transcriptome architecture. This repository houses the core model weights alongside the scripts needed to tokenize sequences, train and fine-tune the StripedHyena-based architecture, and run inference workflows.

Repository Structure

mach-1
├── processing-seqs/          # Tokenization configs and CLI tooling for data preparation
│   ├── mach_tokenizer.json    # Tokenizer configuration used across notebooks and scripts
│   ├── prepare_data.R         # RNA sequence preprocessing and formatting utilities
│   └── tokenize_data.py       # Batch tokenizer for genomic FASTA and CSV files
├── training-model/           # Configuration, training, and inference scripts
│   ├── configuration_mach.py  # Default StripedHyena model definition
│   ├── generate_seqs.py       # Synthetic sequence generation entry point
│   ├── get_embeddings.py      # Embedding extraction for downstream analyses
│   ├── get_likelihoods.py     # Likelihood computation and scoring helpers
│   ├── mach_dependencies.sh   # Environment bootstrap script
│   ├── modeling_mach.py       # Core Hyena architecture implementation
│   └── train_model.py         # Training script for Mach-1 checkpoints
└── model/                    # Pretrained checkpoints and tokenizer artifacts
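
As a rough illustration of the kind of work the data-preparation scripts in processing-seqs/ perform, the sketch below parses FASTA records, normalizes them to the RNA alphabet, and maps each nucleotide to an integer ID. It is not the repository's code: the vocabulary, the T-to-U conversion, and the function names read_fasta, to_rna, and encode are assumptions made for this example. The actual token mapping is defined in mach_tokenizer.json and may differ.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical vocabulary for illustration only; the real mapping lives in
# processing-seqs/mach_tokenizer.json and may use different symbols and IDs.
VOCAB: Dict[str, int] = {"<pad>": 0, "A": 1, "C": 2, "G": 3, "U": 4, "N": 5}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def to_rna(seq: str) -> str:
    """Uppercase the sequence and convert DNA-style T to U (an assumption)."""
    return seq.upper().replace("T", "U")


def encode(seq: str) -> List[int]:
    """Map each nucleotide to an integer ID; unknown symbols become N."""
    return [VOCAB.get(base, VOCAB["N"]) for base in to_rna(seq)]


example = [">tx1 demo", "ACGT", "acgn", ">tx2", "GGXU"]
for name, seq in read_fasta(example):
    print(name, encode(seq))
# tx1 demo [1, 2, 3, 4, 1, 2, 3, 5]
# tx2 [3, 3, 5, 4]
```

A character-level scheme is used here only because it is the simplest to show. Consult mach_tokenizer.json before relying on any particular ID assignment.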

Getting Started

1. Install the dependencies listed in training-model/mach_dependencies.sh, or adapt them to your compute environment.
2. Use the scripts in processing-seqs/ to prepare and tokenize the RNA sequences of interest.
3. Train or fine-tune Mach-1 with training-model/train_model.py, or run inference with get_likelihoods.py, get_embeddings.py, and generate_seqs.py from the same directory.
4. Transfer the resulting outputs (likelihoods, embeddings, variant scores, synthetic sequences) into the directory structure expected by mach-1-manuscript to reproduce the manuscript analyses.
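
To make the likelihood and variant-score outputs mentioned above more concrete, here is a minimal sketch of how a sequence log-likelihood is typically derived from next-token logits. It assumes an autoregressive model in which the logits at position t score the token at position t + 1. It does not reproduce get_likelihoods.py, and all function names are hypothetical. Scoring a variant as the log-likelihood difference between the alternate and reference sequences is one common convention, not something this README confirms.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a single logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_log_likelihood(
    logits: Sequence[Sequence[float]], token_ids: Sequence[int]
) -> float:
    """Sum log p(token_t | tokens_<t), where logits[t] scores token_ids[t + 1]."""
    total = 0.0
    for step_logits, target in zip(logits, token_ids[1:]):
        total += log_softmax(step_logits)[target]
    return total


def variant_score(ref_ll: float, alt_ll: float) -> float:
    """Assumed convention: positive means the variant is more likely."""
    return alt_ll - ref_ll


# Toy example: uniform logits over a 4-symbol vocabulary, two predicted tokens.
uniform = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
ll = sequence_log_likelihood(uniform, [0, 1, 2])
print(round(ll, 4))
# -2.7726
```

With uniform logits each predicted token contributes log(1/4), so the toy sequence scores 2 * log(0.25). Real scores would come from the logits that the trained checkpoint produces.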

Companion Repository

The data-processing pipelines, downstream analyses, and figure-generation workflows that accompany the Mach-1 study live in the companion repository, mach-1-manuscript.
