Jmp1062/examples

 
 

Burla Examples

Examples, demos, and use cases for running Python at cluster scale with Burla.

Start with the gallery site: https://burla-cloud.github.io/examples/

Burla's core API is small: write normal Python, then use remote_parallel_map() to run it across many CPUs or GPUs. This repo collects the demos, data stories, and practical patterns that show what that looks like in real workloads.
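As a rough illustration of that shape (one plain function, one list of inputs, one map call), here is a minimal sketch that runs locally. `local_parallel_map` is a hypothetical stand-in written for this example and is not part of Burla; the commented-out lines show the assumed Burla call, which needs the `burla` package and a running cluster, so check its current signature before relying on it.

```python
from concurrent.futures import ThreadPoolExecutor


def local_parallel_map(function, inputs):
    """Hypothetical local stand-in: apply function to each input on a thread pool."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(function, inputs))


def word_count(text):
    # Ordinary Python: nothing cluster-specific inside the function.
    return len(text.split())


documents = ["run python at scale", "one function", "many inputs in parallel"]
counts = local_parallel_map(word_count, documents)
print(counts)

# Assumed Burla equivalent (requires the burla package and a running cluster):
#   from burla import remote_parallel_map
#   counts = remote_parallel_map(word_count, documents)
```

The point of the sketch is that `word_count` stays unchanged whichever map function runs it; only the call site would differ.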

Live Demos

These examples already had GitHub Pages sites, so they are published under this repo:

| Demo | What it shows |
| --- | --- |
| Airbnb at continental scale | 1.1M listings, 1.4M photos, and 50M reviews processed with vision and text pipelines. |
| Amazon Review Distiller | 571M reviews ranked and searched with parallel deterministic text analysis. |
| The Met's Hidden Twins | 192K museum artworks embedded to find visual near-duplicates across centuries. |
| NYC Ghost Neighborhoods | 2.76B taxi trips processed to find neighborhoods that changed after the pandemic. |
| Fossils of the arXiv | 2.71M abstracts embedded and clustered to find extinct and emerging research topics. |
| World Photo Index | 9.49M geotagged Flickr photos analyzed to find what every country photographs. |
| One Million GitHub READMEs | 1.2M READMEs classified, summarized, and searched without an LLM. |

Examples

Heavy Workloads

| Example | Focus |
| --- | --- |
| gpu-embedding-demo | GPU embeddings on A100s. |
| image-dataset-resize | Resizing millions of images in parallel. |
| bioinformatics-alignment | BWA-MEM alignment over many FASTQ files. |
| gdal-raster-processing | GDAL raster jobs across many workers. |
| ml-inference-batch | Batch inference without a serving layer. |
| ghcn-rainiest-day | Scanning billions of weather rows. |

Everyday Patterns

| Example | Focus |
| --- | --- |
| parallel-web-scraping | Scraping thousands of pages concurrently. |
| python-etl-no-airflow | Simple Python ETL without Airflow. |
| rate-limited-api-requests | Large API jobs with explicit rate limits. |
| pandas-apply-parallel | Scaling slow pandas.apply() functions. |
| parquet-parallel | Processing many Parquet files in parallel. |
| monte-carlo-simulation | Independent Monte Carlo simulations across many cores. |
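To make the "independent simulations" pattern from the monte-carlo-simulation entry concrete, here is a minimal sketch that is not taken from that example's code. Each task gets its own seed, so runs are reproducible and share no state. The function name `estimate_pi`, the seed list, and the sample count are all assumptions chosen for illustration; the built-in `map` stands in for whatever parallel map would distribute the tasks.

```python
import random


def estimate_pi(seed, samples=20_000):
    """Estimate pi from one independent batch of random points in the unit square."""
    rng = random.Random(seed)  # private generator: no shared state between tasks
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


# One seed per task; each call is independent of the others.
seeds = list(range(8))

# Built-in map runs the tasks sequentially; a parallel map with the same
# (function, inputs) shape could be swapped in here without changing estimate_pi.
estimates = list(map(estimate_pi, seeds))
pi_estimate = sum(estimates) / len(estimates)
print(round(pi_estimate, 2))
```

Because every task depends only on its seed, the per-task results can be computed in any order and averaged afterwards.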

GitHub Pages deploys only the examples that already had Pages sites before this repo was created.
