⚡ amfi-stream

Streaming-first ingestion for AMFI mutual fund data.

Turn raw AMFI files into clean, schema-safe, analytics-ready Arrow tables — in parallel, without hacks.

🚀 The problem (you already know this)

AMFI data is:

inconsistent
semi-structured
painful to clean
not pipeline-friendly

Every existing tool assumes:

“just fetch and parse it”

That breaks at scale.
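Here is what "semi-structured" looks like in practice. AMFI's consolidated NAV text file mixes several kinds of lines: semicolon-delimited data rows, a column header, blank lines, and section lines that name a scheme category or fund house. That description is our reading of the public file. The sketch below sorts lines into those kinds. The sample lines are invented, the six-field layout is an assumption, and `classify` is a hypothetical helper, not amfi-stream's parser.

```python
from enum import Enum


class LineKind(Enum):
    BLANK = "blank"
    HEADER = "header"    # the column-name row
    SECTION = "section"  # scheme-category or fund-house lines
    DATA = "data"        # an actual NAV record


# Assumed field count for a latest-NAV data row (illustrative layout).
EXPECTED_FIELDS = 6


def classify(line: str) -> LineKind:
    """Decide what kind of line this is, without trusting its position in the file."""
    stripped = line.strip()
    if not stripped:
        return LineKind.BLANK
    if stripped.startswith("Scheme Code"):
        return LineKind.HEADER
    parts = stripped.split(";")
    # A data row has the expected number of fields and a numeric scheme code.
    if len(parts) == EXPECTED_FIELDS and parts[0].strip().isdigit():
        return LineKind.DATA
    return LineKind.SECTION


sample = [
    "Scheme Code;ISIN Div Payout/ ISIN Growth;ISIN Div Reinvestment;Scheme Name;Net Asset Value;Date",
    "",
    "Open Ended Schemes(Debt Scheme - Banking and PSU Fund)",
    "",
    "Example Mutual Fund",
    "100001;INF000000001;-;Example Fund - Growth;12.3456;01-May-2025",
]

for line in sample:
    print(classify(line).value, "|", line[:40])
```

Only one line in six is data. A parser that assumes "every line is a record" gets this wrong immediately.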

⚡ The shift

Don’t query AMFI. Ingest it properly.

🧩 What amfi-stream does

Raw AMFI Data
(NAV + Scheme Files)
        ↓
⚡ amfi-stream
(stream + sanitize + normalize)
        ↓
Arrow Tables (typed, clean)
        ↓
Polars / DuckDB / Pandas / Spark

✨ Why people switch

⚡ Streaming instead of batch downloads
🧼 Automatic normalization (no manual cleaning)
🧱 Strong schema via Apache Arrow
🧵 Parallel ingestion engine
📊 Directly usable in analytics tools
🐼 No Pandas dependency
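Streaming means parsing lines as bytes arrive instead of waiting for a full download. A streaming reader has to get one detail right: a chunk boundary can fall in the middle of a line, or even in the middle of a multi-byte UTF-8 character. The sketch below shows this, assuming the input is an iterable of byte chunks. In practice, those chunks could come from repeated `read(n)` calls on an HTTP response. `iter_lines` is a hypothetical helper, not part of amfi-stream.

```python
import codecs
from typing import Iterable, Iterator


def iter_lines(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Yield complete text lines from a stream of byte chunks."""
    # The incremental decoder keeps partial multi-byte characters between calls.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    buffer = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        # Emit every complete line; keep the trailing partial line buffered.
        *complete, buffer = buffer.split("\n")
        for line in complete:
            yield line.rstrip("\r")
    # Flush whatever the decoder still holds, then the last unterminated line.
    buffer += decoder.decode(b"", final=True)
    if buffer:
        yield buffer.rstrip("\r")


data = "100001;Fund A;12.34\r\n100002;Fund ₹ B;56.78\r\n".encode("utf-8")
chunks = [data[i:i + 7] for i in range(0, len(data), 7)]
print(list(iter_lines(chunks)))
```

The ₹ sign is three bytes in UTF-8, and the 7-byte chunks above split it across two chunks. The incremental decoder holds the partial bytes until the rest arrive, so the line still decodes cleanly.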

🆚 Alternatives (quick reality check)

| Tool | Model | Why it breaks |
|---|---|---|
| mfapi.in | API calls | One request per fund → slow |
| navpipe | SDK | Needs pre-known fund list |
| mftool | Scraper | Fragile, breaks silently |
| AMFI site | Raw files | No structure |

amfi-stream:

✔ Dataset-level ingestion
✔ Streaming + parallel
✔ Schema enforced
✔ Built for pipelines
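With an enforced schema, a bad row fails loudly instead of being silently coerced into something wrong. Below is a minimal sketch of row-level enforcement. The hypothetical three-column `SCHEMA` maps each column to a converter. The column names and `SchemaError` are illustrative, not amfi-stream's API.

```python
from typing import Any, Callable, Dict, List


class SchemaError(ValueError):
    """Raised when a raw row does not match the declared schema."""


# Hypothetical schema: column name -> converter, in file column order.
SCHEMA: Dict[str, Callable[[str], Any]] = {
    "scheme_code": int,
    "scheme_name": str,
    "nav": float,
}


def enforce(fields: List[str]) -> Dict[str, Any]:
    """Convert one row of raw fields to typed values, or raise SchemaError."""
    if len(fields) != len(SCHEMA):
        raise SchemaError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    row: Dict[str, Any] = {}
    for (name, convert), raw in zip(SCHEMA.items(), fields):
        try:
            row[name] = convert(raw.strip())
        except ValueError as exc:
            # Name the offending column instead of passing a bad value downstream.
            raise SchemaError(f"column {name!r}: cannot convert {raw!r}") from exc
    return row


print(enforce(["100001", "Example Fund - Growth", "12.3456"]))
```

This strict version rejects placeholder values such as `N.A.` in the `nav` column. A real schema needs an explicit rule for nulls.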

⚡ Quick start

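```python
from amfi_stream import (
    AMFIPipeline,
    stream_latest_nav,
    stream_scheme_master,
    stream_historical_nav,
)

# One job per dataset: scheme master, latest NAV, and historical NAV for a date range
jobs = [
    stream_scheme_master(),
    stream_latest_nav(),
    stream_historical_nav("1-May-2025", "1-May-2026"),
]

# Run all jobs with up to 4 parallel workers
with AMFIPipeline(max_workers=4) as pipeline:
    result = pipeline.run(jobs)

# result is an AMFIResult; each field holds a pyarrow Table or None
print(result.latest_nav)
```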
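The `max_workers=4` argument and the `with` block suggest a pool of workers that is cleaned up on exit. As a rough analog only (not how `AMFIPipeline` is implemented), here is the same fan-out pattern using the standard library's `ThreadPoolExecutor`. Threads fit this workload because fetching files is mostly waiting on the network. `run_jobs`, `fake_job`, and the `(name, payload)` job shape are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A "job" here is a hypothetical zero-argument callable returning (name, payload).
Job = Callable[[], Tuple[str, object]]


def run_jobs(jobs: List[Job], max_workers: int = 4) -> Dict[str, object]:
    """Run jobs concurrently and collect their payloads by name."""
    results: Dict[str, object] = {}
    # Leaving the with-block waits for all jobs and shuts the pool down.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(job) for job in jobs]
        for future in futures:
            name, payload = future.result()  # re-raises any exception from the job
            results[name] = payload
    return results


def fake_job(name: str, rows: int) -> Job:
    """Stand-in for a download-and-parse job."""
    return lambda: (name, f"{rows} rows")


out = run_jobs([fake_job("scheme_master", 3), fake_job("latest_nav", 5)])
print(out)
```

Results are collected in submission order, which keeps the output deterministic. An exception raised inside any job surfaces in the caller instead of being swallowed.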

📦 Output

```
AMFIResult(
    scheme_master=pa.Table | None,   # pa = pyarrow
    latest_nav=pa.Table | None,
    historical_nav=pa.Table | None,
)
```

Typed. Predictable. Analytics-ready.
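The `| None` on each field suggests a dataset can be absent from a result, presumably when its job was not part of the run. That is an assumption; the source doesn't say. Either way, consumer code should check a field before using it. Below is a minimal stand-in using a hypothetical `ResultSketch` class. It uses `object` in place of `pa.Table` so it runs without pyarrow.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class ResultSketch:
    """Stand-in for AMFIResult; `object` replaces pa.Table so no pyarrow is needed."""

    scheme_master: Optional[object] = None
    latest_nav: Optional[object] = None
    historical_nav: Optional[object] = None

    def available(self) -> List[str]:
        """Names of the datasets that are actually present."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


result = ResultSketch(latest_nav="<table>")
if result.latest_nav is not None:
    print("latest NAV loaded")
print(result.available())
```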

🏗 Architecture

URLs
 ↓
Streaming Engine
 ↓
Sanitizer
 ↓
Parser
 ↓
Arrow Tables
 ↓
Normalizers
 ↓
Pipeline Output
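The diagram reads as a chain of stages, each consuming the previous stage's output. Below is a stdlib-only toy version of that chain. It uses generators so lines flow through one at a time. Every function name, the three-column sample file, and the `N.A.` null token are assumptions for illustration. A plain dict of lists stands in for an Arrow table, and this is not amfi-stream's internal code.

```python
from typing import Dict, Iterable, Iterator, List

RAW = """Scheme Code;Scheme Name;Net Asset Value

Example Mutual Fund
100001;Fund A - Growth;12.3456
100002;Fund B - Growth;N.A.
"""


def stream(text: str) -> Iterator[str]:
    # Stand-in for the streaming engine: yields one line at a time.
    yield from text.splitlines()


def sanitize(lines: Iterable[str]) -> Iterator[str]:
    # Keep only rows that look like data: three fields and a numeric scheme code.
    for line in lines:
        parts = line.split(";")
        if len(parts) == 3 and parts[0].strip().isdigit():
            yield line


def parse(lines: Iterable[str]) -> Iterator[List[str]]:
    # Split each surviving line into trimmed fields.
    for line in lines:
        yield [field.strip() for field in line.split(";")]


def to_columns(rows: Iterable[List[str]], names: List[str]) -> Dict[str, List[str]]:
    # Columnar layout (one list per column), the shape an Arrow table uses.
    cols: Dict[str, List[str]] = {n: [] for n in names}
    for row in rows:
        for name, value in zip(names, row):
            cols[name].append(value)
    return cols


def normalize(cols: Dict[str, List[str]]) -> Dict[str, list]:
    # Column-wise typing after the table exists, mirroring "Arrow Tables -> Normalizers".
    return {
        "scheme_code": [int(v) for v in cols["scheme_code"]],
        "scheme_name": list(cols["scheme_name"]),
        "nav": [None if v == "N.A." else float(v) for v in cols["nav"]],
    }


names = ["scheme_code", "scheme_name", "nav"]
table = normalize(to_columns(parse(sanitize(stream(RAW))), names))
print(table)
```

Because the stages up to `to_columns` are generators, no stage holds more than one line at a time until rows are gathered into columns. With pyarrow installed, a dict shaped like `table` could be passed to `pyarrow.Table.from_pydict`.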

🔥 Design principles

Streaming > batch
Schema > guesswork
Arrow > DataFrame conversions
Deterministic > fragile parsing
Minimal > bloated
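In practice, "Deterministic > fragile parsing" means one explicit date format, one explicit set of null tokens, and an error for anything else, rather than a parser that guesses. The sketch below assumes AMFI-style dates such as `01-May-2025` and placeholder tokens like `N.A.` and `-`. The token list and both function names are assumptions.

```python
from datetime import date, datetime
from typing import Optional

# Assumed placeholder tokens for "no value"; extend only with tokens actually observed.
NULL_TOKENS = frozenset({"", "-", "N.A.", "NA"})


def parse_nav(raw: str) -> Optional[float]:
    """Known null tokens become None; other text goes to float(), which raises on garbage."""
    value = raw.strip()
    if value in NULL_TOKENS:
        return None
    return float(value)


def parse_amfi_date(raw: str) -> date:
    """One explicit format instead of a guess-the-format date parser."""
    return datetime.strptime(raw.strip(), "%d-%b-%Y").date()


print(parse_nav("12.3456"), parse_nav("N.A."), parse_amfi_date("1-May-2025"))
```

`%b` matches English month abbreviations under Python's default C locale. Keep the locale at its default when parsing, or the same input can stop matching.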

🔮 Coming soon

Derived analytics-ready columns
Enhanced schema layers
Faster historical ingestion

🤝 Contributing

If you’ve ever fought AMFI data, you already know why this exists.

Open areas:

Performance tuning
Enhanced schema creation
Benchmark comparison
Tests
Documentation and docstrings

⭐ If this helped you

Give it a star — it helps more people discover a better way to handle AMFI data.

📜 License

Apache 2.0
