Good pipelines move data. Great pipelines distrust it first.
Computer Engineering undergraduate focused on building reliable data systems, with a growing interest in the intersection of data engineering and ML infrastructure.
Currently learning through end-to-end engineering projects centered on:
- ingestion reliability
- data quality validation
- reconciliation systems
- distributed data processing
- analytical infrastructure
I prefer working with messy real-world datasets over tutorial environments, especially on systems where correctness, validation, and operational trust matter as much as throughput.
Building a validation-first financial data ingestion pipeline using NSE Bhavcopy data.
Current work includes:
- historical exchange-data ingestion (2021–2026)
- semantic validation of trading-date consistency
- detection of stale-data substitution in NSE archives
- schema normalization for non-standard CSV exports
- quarantine/revalidation workflows for ingestion failures
- data-quality documentation and ADR-driven pipeline design
Key finding so far: the NSE archive can silently return the previous trading day's data under weekend/holiday filenames, a failure mode that can corrupt naive backtests without raising any exception.
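A minimal sketch of a check that would catch this, using only the Python standard library: compare the date implied by the filename with the dates actually present inside the file. It assumes a legacy Bhavcopy-style CSV where each row carries its trading date in a `TIMESTAMP` column formatted like `05-JAN-2024`. The column name, the date format, and the `validate_trading_date` function are illustrative assumptions, not the pipeline's actual code. Holidays are not covered, since that needs an exchange trading calendar.

```python
import csv
import io
from datetime import date, datetime


def validate_trading_date(expected: date, csv_text: str,
                          date_column: str = "TIMESTAMP",
                          date_format: str = "%d-%b-%Y") -> list[str]:
    """Return a list of problems; an empty list means the file passed.

    ASSUMPTION: the CSV carries its trading date in `date_column`
    (legacy Bhavcopy-style TIMESTAMP values such as 05-JAN-2024).
    """
    problems = []

    # No trading on Saturdays/Sundays, so a file named for a weekend date is
    # suspect before we read it. Holidays would need a trading calendar,
    # which this sketch does not model.
    if expected.weekday() >= 5:
        problems.append(f"{expected.isoformat()} falls on a weekend")

    # Collect every distinct date that actually appears inside the file.
    found = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(date_column) or "").strip()
        if raw:
            found.add(datetime.strptime(raw, date_format).date())

    if not found:
        problems.append(f"no values in column {date_column!r}")
    elif found != {expected}:
        # Stale substitution shows up here: the rows belong to another day.
        seen = ", ".join(sorted(d.isoformat() for d in found))
        problems.append(f"file contains {seen}, expected {expected.isoformat()}")
    return problems


# Hypothetical case: a file named for Saturday 2024-01-06 that holds Friday's rows.
sample = "SYMBOL,SERIES,CLOSE,TIMESTAMP\nABC,EQ,101.5,05-JAN-2024\n"
print(validate_trading_date(date(2024, 1, 6), sample))
# ['2024-01-06 falls on a weekend', 'file contains 2024-01-05, expected 2024-01-06']
```

The content-level date comparison is the part doing the work: a substituted file is otherwise well-formed, so existence or schema checks alone would not flag it.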
Focus areas:
- Advanced SQL (window functions, optimization, analytical patterns)
- PySpark and distributed processing fundamentals
- Dimensional modeling and warehouse design
- Ingestion-system architecture and data quality engineering
Reading list:
- Fundamentals of Data Engineering by Joe Reis & Matt Housley
- Designing Data-Intensive Applications by Martin Kleppmann
- The Data Warehouse Toolkit by Ralph Kimball
Tools: PySpark · Airflow · dbt · AWS · Docker · Kafka
Long-term interest: reliable ML/data infrastructure systems where data engineering and machine learning operations intersect.
- Email: chaturvediabeer@gmail.com
- LinkedIn: linkedin.com/in/abeerchaturvedi
- Location: India