Data Engineer @ VNPT AI
🔹 Building scalable Lakehouse & Data Platforms
🔹 Experienced with Big Data, Streaming, and Cloud Infrastructure
🔹 Passionate about Data Infrastructure, APIs, and Workflow Orchestration
- Python (ETL, APIs, data pipelines, orchestration)
- Java (Big Data, Kafka, Flink, Spark ecosystem)
- Apache Iceberg, Delta Lake
- Apache Spark, Apache Flink
- Kafka, Kafka Connect, Debezium (CDC from Postgres/MySQL/MongoDB)
- Google BigQuery, Cloud Scheduler
- AWS S3, MinIO
- GCS (Google Cloud Storage)
- PostgreSQL, MySQL, MongoDB
- Qdrant (Vector Database)
- Apache Superset
- Apache Airflow, Cronjob, Cloud Scheduler
- Docker, Docker Compose
- Kubernetes, Helm
- Terraform, GitHub Actions
- FastAPI
- Git, GitHub (version control & collaboration)
- Lakehouse with Iceberg + Spark: End-to-end data lakehouse with schema evolution & time travel
- CDC Data Pipelines with Debezium + Kafka: Real-time CDC ingestion from Postgres/MySQL/MongoDB into the lakehouse
- BigQuery ETL Framework: Managed ETL workflows using Airflow + BigQuery
- Data Platform on GCP: Orchestration with Cloud Scheduler, storage in GCS, analytics in BigQuery
- BI Dashboard with Superset: Interactive dashboards on top of the data warehouse
- K8s Data Service Deployment: Deploying scalable data services with Helm & Kubernetes
- Vector Search with Qdrant: Semantic search and embedding-powered retrieval pipeline
- Data mesh & federated query engines (Trino/Presto, Dremio)
- Advanced Iceberg optimizations (partitioning, compaction, metadata scaling)
- Hybrid pipelines (batch + streaming with Flink + Spark)
- AI/LLM integration with vector databases (Qdrant)
⭐️ From ducdn