The task

Goal

Produce a program in Python programming language, which given read alignments (in BAM format with haplotype tags) and a set of variants (in phased VCF format) computes support for ALT and REF alleles across reads assigned to individual haplotypes.

Note that usage of GenAI coding assistants (GitHub Copilot, Claude Code, etc) is both allowed and encouraged.

Inputs & assumptions

alignments.bam -- BAM file with read alignments. Coordinate-sorted, indexed (matching .bai index is also provided) alignments BAM, with primary alignments optionally haplotagged withHPtag (1 or 2).
variants.vcf(.gz) -- variants in phased VCF format (optionally gzipped). To keep things relatively simple, we only ask you to consider bi-allelic mismatch variants (SNVs), ignoring indels, MNPs, etc.

Output

Produce a TSV file with rows corresponding to SNVs and the following 6 required columns (feel free to include extra columns if you think they might be useful):

chrom – chromosome name
pos– position on chromosome
h<H>_<A> with H in {1,2} and A in {'ALT, 'REF'} (4 columns total) — number of primary alignments with HP==H supporting allele A

Test data

Test data archive contains read alignments (30x coverage) and variants for chr16:28000000-28500000 region of HG002 genome.

Development Setup

This project uses Docker for consistent development environment.

Option 1: VSCode Devcontainer (Recommended)

Open in VSCode
When prompted, click "Reopen in Container"
Wait for container to build and dependencies to install
Start developing!

Option 2: Standalone Docker

# Build the image
docker build -t haplotype-counter .

# Run interactively
docker run -it --rm -v $(pwd):/workspace haplotype-counter

# Or run with specific command
docker run --rm -v $(pwd):/workspace haplotype-counter haplotype-counter --help

Available Commands

# Run tests
pytest

# Run notebook
jupyter nbconvert --execute --to notebook haplotype_analysis.ipynb

Examining Test Data

The Docker image includes samtools, bcftools.

Project Structure

src/haplotype_counter/    # Main package
├── __init__.py
├── haplotype_counter.py             # Core processing logic

test_data/              # Test data directory

tests/                   # Test suite
├── __init__.py
├── test_haplotype_counter.py         # Tests helper methods in haplotype_counter.py
└── test_haplotype_counter.py        # Tests HaplotypeCounter class

output/                  # Output directory
├── haplotype_counts.tsv         # Output TSV file for the data found in test_data/

Dockerfile              # Dockerfile for the project

README.md               # This file

pyproject.toml          # Poetry configuration

haplotype_analysis.ipynb # Jupyter notebook for the analysis.  Finds the haplotype counts for the data found in test_data/ and outputs to output/haplotype_counts.tsv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The task

Goal

Inputs & assumptions

Output

Test data

Development Setup

Option 1: VSCode Devcontainer (Recommended)

Option 2: Standalone Docker

Available Commands

Examining Test Data

Project Structure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.cursor/rules		.cursor/rules
.devcontainer		.devcontainer
output		output
src/haplotype_counter		src/haplotype_counter
test_data		test_data
tests		tests
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
haplotype_analysis.ipynb		haplotype_analysis.ipynb
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

The task

Goal

Inputs & assumptions

Output

Test data

Development Setup

Option 1: VSCode Devcontainer (Recommended)

Option 2: Standalone Docker

Available Commands

Examining Test Data

Project Structure

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages