Streaming-first ingestion for AMFI mutual fund data.
Turn raw AMFI files into clean, schema-safe, analytics-ready Arrow tables — in parallel, without hacks.
AMFI data is:
- inconsistent
- semi-structured
- painful to clean
- not pipeline-friendly
Most existing tools assume you can “just fetch and parse it”.
That breaks at scale: per-fund requests are slow, and scrapers are fragile and fail silently.
Don’t query AMFI. Ingest it properly.
Raw AMFI Data
(NAV + Scheme Files)
↓
⚡ amfi-stream
(stream + sanitize + normalize)
↓
Arrow Tables (typed, clean)
↓
Polars / DuckDB / Pandas / Spark
- ⚡ Streaming instead of batch downloads
- 🧼 Automatic normalization (no manual cleaning)
- 🧱 Strong schema via Apache Arrow
- 🧵 Parallel ingestion engine
- 📊 Directly usable in analytics tools
- 🐼 No Pandas dependency
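To make "automatic normalization" and "strong schema" concrete, here is a minimal stdlib-only sketch of the kind of coercion involved. It is not amfi-stream's implementation: the `NavRow` type, the `coerce_row` helper, the assumed field order, and the sample values are all hypothetical, and a real pipeline would emit Arrow columns rather than Python objects.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass(frozen=True)
class NavRow:
    """Hypothetical typed record; the real library produces Arrow tables."""
    scheme_code: int
    scheme_name: str
    nav: Optional[Decimal]  # None when the source has no usable value
    nav_date: date


def coerce_row(fields: list[str]) -> NavRow:
    """Convert raw strings to typed values.

    Assumed field order: scheme code, scheme name, NAV, date (dd-Mon-yyyy).
    A malformed scheme code or date raises ValueError.
    """
    code, name, nav, day = (f.strip() for f in fields)
    try:
        nav_value = Decimal(nav)
    except InvalidOperation:
        nav_value = None  # placeholders such as "N.A." become nulls
    return NavRow(
        scheme_code=int(code),
        scheme_name=" ".join(name.split()),  # collapse repeated whitespace
        nav=nav_value,
        nav_date=datetime.strptime(day, "%d-%b-%Y").date(),
    )


# Synthetic sample rows, not real AMFI data.
clean = coerce_row(["100001", "Example  Fund -  Growth ", "12.3456", "01-May-2025"])
missing = coerce_row(["100002", "Example Fund - IDCW", "N.A.", "1-May-2025"])
print(clean)
print(missing.nav)
```

Using `Decimal` rather than `float` keeps NAV values exactly as published; whether the library itself stores them as decimals or floats is not stated here.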
| Tool | Model | Why it breaks |
|---|---|---|
| mfapi.in | API calls | One request per fund → slow |
| navpipe | SDK | Needs pre-known fund list |
| mftool | Scraper | Fragile, breaks silently |
| AMFI site | Raw files | No structure |
amfi-stream:
✔ Dataset-level ingestion
✔ Streaming + parallel
✔ Schema enforced
✔ Built for pipelines
    from amfi_stream import (
        AMFIPipeline,
        stream_latest_nav,
        stream_scheme_master,
        stream_historical_nav,
    )

    # Describe the datasets to ingest.
    jobs = [
        stream_scheme_master(),
        stream_latest_nav(),
        stream_historical_nav("1-May-2025", "1-May-2026"),
    ]

    # Run the jobs in parallel on up to 4 workers.
    with AMFIPipeline(max_workers=4) as pipeline:
        result = pipeline.run(jobs)

    print(result.latest_nav)

`pipeline.run` returns an `AMFIResult`:

    AMFIResult(
        scheme_master=pa.Table | None,
        latest_nav=pa.Table | None,
        historical_nav=pa.Table | None,
    )

Typed. Predictable. Analytics-ready.
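The `max_workers` argument suggests a worker pool that runs independent jobs side by side. As a rough illustration of that general pattern, and not a description of amfi-stream's internals, the sketch below runs named jobs on a standard-library `ThreadPoolExecutor`. The `run_jobs` helper and the stand-in job functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_jobs(jobs: dict[str, Callable[[], Any]], max_workers: int = 4) -> dict[str, Any]:
    """Run independent zero-argument jobs on a thread pool, keyed by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front so the jobs can overlap.
        futures = {name: pool.submit(job) for name, job in jobs.items()}
        # result() blocks until that job finishes and re-raises its exception, if any.
        return {name: future.result() for name, future in futures.items()}


# Stand-in jobs; real ones would stream and parse remote files.
def fake_scheme_master() -> list[dict]:
    return [{"scheme_code": 100001}]


def fake_latest_nav() -> list[dict]:
    return [{"scheme_code": 100001, "nav": "12.3456"}]


outcome = run_jobs({"scheme_master": fake_scheme_master, "latest_nav": fake_latest_nav})
print(sorted(outcome))
```

Threads are a reasonable fit when the work is dominated by network waits, since the downloads can overlap even though only one thread runs Python code at a time.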
URLs
↓
Streaming Engine
↓
Sanitizer
↓
Parser
↓
Arrow Tables
↓
Normalizers
↓
Pipeline Output
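The first stages of this flow can be sketched with plain generators, which hold only one line in memory at a time. This is an illustration rather than the library's code: `sanitize` and `parse` are hypothetical names, the sample text is synthetic, and the layout it assumes (semicolon-delimited rows mixed with blank lines and free-text heading lines, reduced to four columns) is an assumption about the raw NAV file.

```python
import io
from typing import Iterable, Iterator

# Synthetic stand-in for a raw NAV file; not real AMFI data.
RAW = """Scheme Code;Scheme Name;Net Asset Value;Date

Open Ended Schemes(Example Category)

Example Mutual Fund

100001;Example Fund - Growth;12.3456;01-May-2025
100002;Example Fund - IDCW ;N.A.;01-May-2025
"""


def sanitize(lines: Iterable[str]) -> Iterator[str]:
    """Lazily strip whitespace and drop blank lines, headers, and headings."""
    for line in lines:
        line = line.strip()
        # Keep only rows that start with a numeric scheme code followed by ';'.
        if ";" in line and line.split(";", 1)[0].isdigit():
            yield line


def parse(rows: Iterable[str]) -> Iterator[dict[str, str]]:
    """Split each sanitized row into named string fields."""
    names = ("scheme_code", "scheme_name", "nav", "date")
    for row in rows:
        values = [v.strip() for v in row.split(";")]
        if len(values) == len(names):  # skip rows with an unexpected shape
            yield dict(zip(names, values))


# io.StringIO stands in for a streamed response body.
for record in parse(sanitize(io.StringIO(RAW))):
    print(record)
```

Because each stage is a generator, nothing is read until the final loop pulls a record, so the same chain would work unchanged on a line iterator backed by a network stream.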
- Streaming > batch
- Schema > guesswork
- Arrow > DataFrame conversions
- Deterministic > fragile parsing
- Minimal > bloated
Planned:
- Derived analytics-ready columns
- Enhanced schema layers
- Faster historical ingestion
If you’ve ever fought AMFI data, you already know why this exists.
Open areas:
- Performance tuning
- Enhanced schema creation
- Benchmark comparison
- Tests
- Documentation and docstrings
Give it a star — it helps more people discover a better way to handle AMFI data.
Apache 2.0