Skip to content

bmeares/Meerschaum

Repository files navigation

Meerschaum banner

PyPI GitHub Info Stats
PyPI GitHub Repo stars License Number of plugins
PyPI - Python Version GitHub Sponsors Artifact Hub Number of registered users

Meerschaum demo

What is Meerschaum?

Meerschaum is an ETL framework for time-series data. You define pipes — named data streams — and Meerschaum keeps them in sync: it fetches only the new or changed rows, deduplicates and upserts them, manages the schema, and handles scheduling, serving, and storage.

Write a few lines of fetch logic; Meerschaum handles the rest of the pipeline. No more copy/pasting ETL scripts, hand-rolling incremental windows, or babysitting cron jobs. Drop it into an existing stack or stand up a full database-and-dashboard stack in minutes.

import meerschaum as mrsm

pipe = mrsm.Pipe('plugin:noaa', 'weather', 'atl', instance='sql:local')
pipe.sync()  ### Pulls only what's new since the last sync.

Features

  • ⚡️ Incremental by defaultthe sync engine fetches only new or changed rows and concurrently updates many streams at once. Duplicate rows are ignored; rows with existing keys are updated.
  • 📊 Built for data scientists and analysts — integrate with Pandas, Grafana, and friends; persist DataFrames and always get the latest data. Skip pandas overhead and read rows as plain dicts with Pipe.get_docs().
  • 🗄️ Production-ready, batteries included — one-click deploy a TimescaleDB + Grafana stack, serve data org-wide via FastAPI (uvicorn/gunicorn), and secure API instances with scoped auth tokens. Supports PostGIS geometry (incl. ESRI CRS) for geospatial pipelines.
  • 💼 Jobs and scheduling — run any command as a background job with -d. Built-in scheduler handles cron and interval schedules — no crontab or systemd setup. Execute locally, via systemd, or remotely on an API instance with --executor-keys.
  • 🔌 Easily expandable — ingest any source with a simple plugin: just return a DataFrame. Add any function as a command, define parent/child pipe relationships for composable SQL pipelines, or embed Meerschaum via its Python API.
  • Tailored for your experience — a rich CLI that's surprisingly enjoyable, a web dashboard for the graphically inclined, and connectors for SQL, API, Valkey, and custom backends.
  • 🧳 Portable from the start$MRSM_ROOT_DIR emulates multiple installations and groups instances. No dependencies required (anything needed installs into a virtual environment), and it's uv-compatible: uv tool install meerschaum.

Want to learn more?

Find a wealth of information at meerschaum.io, or read up on Meerschaum in the wild:

Installation

For a more thorough setup guide, visit the Getting Started page at meerschaum.io.

TL;DR

pip install meerschaum # or `uv tool install meerschaum[api]`
mrsm stack up -d
mrsm bootstrap pipes

Usage

Visit meerschaum.io for setup, usage, and troubleshooting information. You can find technical documentation at docs.meerschaum.io.

CLI

### Install the NOAA weather plugin.
mrsm install plugin noaa

### Register a new pipe to the built-in SQLite DB.
### You can instead run `bootstrap pipe` for a wizard.
### Enter 'KATL' for Atlanta when prompted.
mrsm register pipe -c plugin:noaa -m weather -l atl -i sql:local

### Pull data and create the table "plugin_noaa_weather_atl".
mrsm sync pipes -l atl -i sql:local

Python API

import meerschaum as mrsm

pipe = mrsm.Pipe(
    'foo', 'bar',
    instance = 'sql:local',                  ### Built-in SQLite DB.
    columns  = {'datetime': 'dt', 'id': 'id'},
)

### Sync a DataFrame (or list of dicts) — creates the table on first run.
pipe.sync([{'dt': '2024-07-01', 'id': 1, 'val': 10}])

### Duplicates are ignored; rows with existing keys are updated.
pipe.sync([{'dt': '2024-07-01', 'id': 1, 'val': 100}])
assert pipe.get_rowcount() == 1

### Read back as a DataFrame, filtered by time range and params.
df = pipe.get_data(begin='2024-01-01', end='2025-01-01', params={'id': [1]})

### Or skip pandas and read plain dicts.
docs = pipe.get_docs(params={'id': [1]})
### [{'dt': datetime(2024, 7, 1), 'id': 1, 'val': 100}]

For composable in-database SQL pipelines (reference inheritance and {{ Pipe(...) }} table resolution), see the SQL pipes guide.

Plugins

Ingest any source by returning rows from a fetch function — Meerschaum handles the rest:

# ~/.config/meerschaum/plugins/example.py
__version__ = '1.0.0'
required = ['requests']

def register(pipe, **kw):
    return {'columns': {'datetime': 'dt', 'id': 'id'}}

def fetch(pipe, begin=None, end=None, **kw):
    import requests
    rows = requests.get('https://api.example.com/data').json()
    return rows  ### list of dicts or a Pandas DataFrame

Support Meerschaum's Development

For consulting services and to support Meerschaum's development, please considering sponsoring me on GitHub sponsors.

Additionally, you can always buy me a coffee☕!

License

Copyright 2020-2026 Bennett Meares

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.