AbeerChaturvedi/README.md

Abeer Chaturvedi

Good pipelines move data. Great pipelines distrust it first.

Computer Engineering undergraduate focused on building reliable data systems, with a growing interest in the intersection of data engineering and ML infrastructure.

Currently learning through end-to-end engineering projects centered around:

  • ingestion reliability
  • data quality validation
  • reconciliation systems
  • distributed data processing
  • analytical infrastructure

I prefer building against messy real-world datasets over tutorial environments — especially systems where correctness, validation, and operational trust matter as much as throughput.


Current Project

Market Data Reconciliation Pipeline

Building a validation-first financial data ingestion pipeline using NSE Bhavcopy data.

Current work includes:

  • historical exchange-data ingestion (2021–2026)
  • semantic validation of trading-date consistency
  • detection of stale-data substitution in NSE archives
  • schema normalization for non-standard CSV exports
  • quarantine/revalidation workflows for ingestion failures
  • data-quality documentation and ADR-driven pipeline design

Key finding so far: the NSE archive can silently return the previous trading day's data under weekend/holiday filenames. This failure mode can corrupt naive backtests without raising any exception.
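The stale-data failure mode above can be caught by comparing the date a file was requested for against the trading dates recorded inside it. The following is a minimal sketch of that idea, not the project's actual code: the function name `find_unexpected_dates`, the `TIMESTAMP` column name, and the `%d-%b-%Y` date format are all assumptions made for illustration and may not match the real Bhavcopy layout.

```python
import csv
import io
from datetime import date, datetime


def find_unexpected_dates(csv_text, expected_date,
                          date_column="TIMESTAMP",   # assumed column name
                          date_format="%d-%b-%Y"):   # assumed date format
    """Return the sorted in-file trading dates that differ from expected_date.

    An empty list means every dated row matches the requested date.
    A non-empty list suggests the file holds another day's data and
    should be quarantined rather than loaded.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    for row in reader:
        raw = (row.get(date_column) or "").strip()
        if not raw:
            continue  # skip rows without a date value
        seen.add(datetime.strptime(raw, date_format).date())
    return sorted(d for d in seen if d != expected_date)


if __name__ == "__main__":
    # Hypothetical file requested for Saturday 2021-01-09 whose rows
    # carry Friday 2021-01-08 as their trading date.
    sample = (
        "SYMBOL,CLOSE,TIMESTAMP\n"
        "ABC,100.5,08-Jan-2021\n"
        "XYZ,20.0,08-Jan-2021\n"
    )
    stale = find_unexpected_dates(sample, date(2021, 1, 9))
    if stale:
        print("quarantine: file contains dates", stale)
```

Checking the data's own date field, rather than trusting the filename, is what lets the mismatch surface as an explicit validation failure instead of a silent duplicate trading day.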


Currently Learning

  • Advanced SQL (window functions, optimization, analytical patterns)
  • PySpark and distributed processing fundamentals
  • Dimensional modeling and warehouse design
  • Ingestion-system architecture and data quality engineering

Reading

  • Fundamentals of Data Engineering — Reis & Housley
  • Designing Data-Intensive Applications — Martin Kleppmann
  • The Data Warehouse Toolkit — Ralph Kimball

Working With

Python SQL DuckDB Git Linux


Building Toward

PySpark · Airflow · dbt · AWS · Docker · Kafka

Long-term interest: reliable ML/data infrastructure systems where data engineering and machine learning operations intersect.




Popular repositories

  1. Syntax-Cartel-DevClash Public

    Python 1

  2. EPD Public

    Eye tracker and auto scroll project

    JavaScript

  3. AbeerChaturvedi Public

  4. attendance_monitoring Public

    JavaScript