Project status: Work in progress. Feedback and contributions are welcome.
LDlink is an interactive suite of web-based tools for investigating linkage disequilibrium (LD) across ancestral population groups. LDlink uses publicly available 1000 Genomes Project reference haplotypes to calculate population-specific LD, accepts variants as RefSNP (RS) numbers or genomic positions, and references dbSNP for RS identifiers and bi-allelic variant information. Depending on the module, LDlink also incorporates data from resources such as UCSC RefSeq, RegulomeDB, genetic maps, the GTEx Portal, the GWAS Catalog, and FORGEdb.
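The sketch below illustrates the two pairwise LD statistics most often reported, D' and r², for readers new to them. It uses the standard textbook definitions, computed from phased haplotype counts for two bi-allelic variants. It is not LDlink's or LDlinkPy's implementation, and the function `pairwise_ld` and the example counts are hypothetical.

```python
def pairwise_ld(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple[float, float]:
    """Return (|D'|, r^2) for two bi-allelic variants from phased haplotype counts.

    A/a are the alleles at the first variant and B/b at the second.
    Hypothetical helper using textbook definitions; not LDlinkPy code.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n               # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n       # frequency of allele A
    p_B = (n_AB + n_aB) / n       # frequency of allele B
    d = p_AB - p_A * p_B          # raw disequilibrium coefficient D

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("a variant is monomorphic in this sample")
    r2 = d * d / denom

    # D_max depends on the sign of D; both variants are polymorphic here, so D_max > 0.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return abs(d) / d_max, r2


if __name__ == "__main__":
    print(pairwise_ld(50, 0, 0, 50))   # (1.0, 1.0): complete LD
    print(pairwise_ld(10, 0, 30, 60))  # D' near 1, r^2 near 0.17
```

The second call shows why both statistics are useful: D' is 1 because the A-b haplotype is never observed, while r² stays low because the two allele frequencies differ.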
Internet access and a personal LDlink API token are required for API calls.
LDlinkPy is currently installed from GitHub and is not on PyPI yet. Using a virtual environment is recommended.
Requirements:

- Python 3.10 or newer

macOS / Linux:

```bash
python3 -m venv .venv
./.venv/bin/python -m pip install --upgrade pip
./.venv/bin/python -m pip install "https://github.com/timyers/ldlinkpy/archive/refs/heads/main.zip"
```

Windows PowerShell:

```powershell
py -m venv .venv
.\.venv\Scripts\python -m pip install --upgrade pip
.\.venv\Scripts\python -m pip install "https://github.com/timyers/ldlinkpy/archive/refs/heads/main.zip"
```

Request a personal access token at https://ldlink.nih.gov/apiaccess. After you register, your token will be emailed to you.
LDlinkPy reads your token from the LDLINK_TOKEN environment variable by default. You can also pass token="your_token_here" directly to endpoint functions.
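A script can check for a token up front and fail with a clear message instead of partway through a batch of queries. The helper below is a hypothetical sketch, not part of LDlinkPy. It assumes an explicit token argument takes priority over LDLINK_TOKEN, consistent with the environment variable being the default.

```python
from __future__ import annotations

import os


def resolve_token(token: str | None = None, env_var: str = "LDLINK_TOKEN") -> str:
    """Return an explicit token if given, otherwise read it from the environment.

    Hypothetical helper, not part of LDlinkPy. Assumes an explicit token
    overrides the environment variable.
    """
    if token:  # an explicit token=... argument takes priority
        return token
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(
            f"No LDlink token found. Set {env_var} or pass token=... explicitly."
        )
    return value


# Assumed usage: fail early, then pass the token explicitly.
# token = resolve_token()
# ldpair("rs3", "rs4", pop="YRI", token=token)
```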
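macOS / Linux:

```bash
export LDLINK_TOKEN="your_token_here"
```

Windows PowerShell:

```powershell
$env:LDLINK_TOKEN="your_token_here"
```

These commands set the variable for the current terminal session only. Start Python from that same session, using the virtual environment's interpreter.

macOS / Linux:

```bash
./.venv/bin/python
```

Windows PowerShell:

```powershell
.\.venv\Scripts\python
```

Import the functions used in the examples that follow:

```python
from ldlinkpy import list_pop, list_chips, ldpair, ldproxy
```

List available 1000 Genomes populations:

```python
list_pop()
```

List available genotyping SNP chips:

```python
list_chips()
```

Check LD between two variants:

```python
ldpair("rs3", "rs4", pop="YRI")
```

Find proxy variants for a SNP:

```python
ldproxy("rs7412", pop="CEU")
```

| Function | Purpose |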
|---|---|
| ldpair | Query LD statistics for one or more variant pairs. |
| ldmatrix | Create an LD matrix for a set of variants. |
| ldproxy | Find proxy variants for a query variant. |
| ldproxy_batch | Run multiple LDproxy queries and write result files. |
| ldtrait | Query trait associations linked to variants in LD. |
| ldexpress | Query GTEx expression associations for variants in LD. |
| ldhap | Query haplotype and variant tables for a variant set. |
| ldpop | Query LD statistics across populations for two variants. |
| snpclip | Prune variants by LD and minor allele frequency thresholds. |
| snpchip | Identify genotyping arrays containing variants. |
| list_pop | Return available 1000 Genomes population codes. |
| list_chips / list_chip_platforms | Return available genotyping chip/platform codes. |
| list_gtex_tissues | Return GTEx tissue names and LDexpress tissue codes. |
Most endpoint functions return pandas DataFrames by default. Some functions support raw responses, file output, or endpoint-specific return shapes. See the API reference for details.
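Because most results are pandas DataFrames, ordinary pandas operations apply to them directly. The sketch below keeps rows at or above a threshold in one numeric column and sorts them, highest first. The helper `filter_by_threshold` is hypothetical and not part of LDlinkPy. The `"R2"` column in the commented usage is an assumed name, so check `df.columns` for the names a given endpoint actually returns.

```python
import pandas as pd


def filter_by_threshold(df: pd.DataFrame, column: str, min_value: float) -> pd.DataFrame:
    """Return rows whose `column` value is >= min_value, highest value first.

    Hypothetical helper, not part of LDlinkPy. Values that cannot be parsed
    as numbers become NaN and are dropped, because NaN fails the comparison.
    """
    if column not in df.columns:
        raise KeyError(f"{column!r} not found; available columns: {list(df.columns)}")
    key = "_filter_sort_key"  # temporary helper column, removed before returning
    out = df.assign(**{key: pd.to_numeric(df[column], errors="coerce")})
    out = out[(out[key] >= min_value).to_numpy()]  # positional boolean row mask
    out = out.sort_values(key, ascending=False, kind="stable")
    return out.drop(columns=key).reset_index(drop=True)


# Assumed usage (needs network access and a token; "R2" is an assumed column name):
# from ldlinkpy import ldproxy
# proxies = ldproxy("rs7412", pop="CEU")
# strong_proxies = filter_by_threshold(proxies, "R2", 0.8)
```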
- API reference: public functions, parameters, return types, and common exceptions.
- Longer usage examples: endpoint-by-endpoint command-line examples for local development and exploratory testing.
- End-to-end examples: includes an LDlinkPy-only workflow examining population-specific LD, haplotype structure, and optional SNPchip coverage for published SNP tags at the Ewing sarcoma 6p25.1/RREB1 susceptibility locus.
LDlinkPy was conceived and overseen by xxxxx xxxxx, with code and documentation assistance from ChatGPT 5.2 Thinking (OpenAI) and Codex (OpenAI). Additional authors and contributors may be added as the project develops.
LDlinkPy is intended to provide Python access to the major LDlink workflows familiar to LDlinkR users. Function names and behavior are generally aligned where practical, while using Python conventions such as pandas DataFrames and keyword arguments.
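For LDlinkR users, the function table above suggests the names differ mainly in case. The lookup below is an illustrative sketch under that assumption: the keys are functions exported by the LDlinkR package, and the values are the LDlinkPy names from the function table. The dictionary and `ldlinkpy_name` are hypothetical and part of neither package; the API reference remains the authority on names and arguments.

```python
# LDlinkR function names mapped to the LDlinkPy names in the function table.
# Illustrative lookup only; not part of LDlinkR or LDlinkPy.
LDLINKR_TO_LDLINKPY = {
    "LDpair": "ldpair",
    "LDmatrix": "ldmatrix",
    "LDproxy": "ldproxy",
    "LDproxy_batch": "ldproxy_batch",
    "LDtrait": "ldtrait",
    "LDexpress": "ldexpress",
    "LDhap": "ldhap",
    "LDpop": "ldpop",
    "SNPclip": "snpclip",
    "SNPchip": "snpchip",
    "list_pop": "list_pop",
    "list_chips": "list_chips",
    "list_gtex_tissues": "list_gtex_tissues",
}


def ldlinkpy_name(ldlinkr_name: str) -> str:
    """Return the LDlinkPy function name for an LDlinkR function name."""
    try:
        return LDLINKR_TO_LDLINKPY[ldlinkr_name]
    except KeyError:
        raise KeyError(f"no LDlinkPy equivalent listed for {ldlinkr_name!r}") from None


if __name__ == "__main__":
    print(ldlinkpy_name("SNPclip"))  # snpclip
```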
This package is still being prepared for broader review and release. The current focus is documentation, examples, packaging polish, and human testing before further endpoint behavior changes.

