
👋 Hi, I'm Miguel Ozana

🚀 About Me

I am a Junior Data Engineer focused on data quality, reliability, and governance in Big Data environments.
I work mainly with Python and PySpark, processing large-scale data in Data Lakes organized into Bronze, Silver, and Gold (medallion architecture) layers.

I am especially interested in building reliable and auditable data pipelines, ensuring data consistency from ingestion to analytical consumption.

🧠 Core Skills

Data Engineering fundamentals
Data validation and profiling
Data comparison between heterogeneous sources
Schema normalization and data type alignment
Data quality checks and reconciliation
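As an illustration of the normalization and reconciliation skills above, here is a minimal sketch. It uses plain Python instead of PySpark so it runs without a cluster. The target schema, the column names (`id`, `amount`, `country`), and the sample rows are hypothetical. The sketch is not taken from any of my repositories. It first aligns both sources to one schema (column names, data types, string formatting) and then compares them by key.

```python
from decimal import Decimal

# Hypothetical target schema: column name -> converter used for data type alignment.
TARGET_SCHEMA = {
    "id": int,
    "amount": lambda v: Decimal(str(v)).quantize(Decimal("0.01")),  # avoid float rounding noise
    "country": lambda v: str(v).strip().upper(),
}


def normalize(row):
    """Strip and lower-case column names, then cast each value with the target converter."""
    cleaned = {k.strip().lower(): v for k, v in row.items()}
    return {col: cast(cleaned[col]) for col, cast in TARGET_SCHEMA.items()}


def reconcile(source_rows, target_rows, key="id"):
    """Report keys present on only one side and shared keys whose normalized values differ."""
    src = {r[key]: r for r in map(normalize, source_rows)}
    tgt = {r[key]: r for r in map(normalize, target_rows)}
    return {
        "only_in_source": sorted(src.keys() - tgt.keys()),
        "only_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Hypothetical example: a CSV extract (all strings) compared with a typed table.
source_rows = [
    {"ID": "1", "Amount": "10.5", "Country": " es "},
    {"ID": "2", "Amount": "7", "Country": "PT"},
    {"ID": "4", "Amount": "3.10", "Country": "it"},
]
target_rows = [
    {"id": 1, "amount": 10.50, "country": "ES"},
    {"id": 3, "amount": 1.0, "country": "FR"},
    {"id": 4, "amount": 3.01, "country": "IT"},
]

print(reconcile(source_rows, target_rows))
# {'only_in_source': [2], 'only_in_target': [3], 'mismatched': [4]}
```

Row 1 matches only because of normalization: the header case, the whitespace in `" es "`, and the string-vs-float amount are all aligned before the comparison.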

🛠️ Technologies & Tools

Languages

Python
SQL

Big Data & Cloud

Apache Spark / PySpark
Azure Synapse Analytics
Azure Data Lake Storage (ADLS Gen2)
AWS (S3, basic services and concepts)
Databricks (fundamentals and notebooks)

Data Formats

Delta Lake
Parquet
CSV

Other Tools

Git & GitHub
Jupyter Notebooks
Kubernetes (basic concepts)

📊 What You'll Find in My Repositories

Generic notebooks for data validation and table comparison
Data profiling scripts
Comparisons between CSV, Parquet, and Delta datasets
Automated Excel reports on data conformity and discrepancies
Practical projects focused on data quality and governance
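A data profiling script typically starts from per-column statistics. The following is a minimal sketch of that idea. It assumes that rows arrive as Python dicts (for example, from `csv.DictReader`) and that empty strings count as nulls. The function name and the sample data are hypothetical, and the code only illustrates the technique. It is not taken from the repositories.

```python
from collections import Counter


def profile(rows):
    """Per-column profile: row count, nulls (None, empty string, or missing key), distinct values, observed types."""
    columns = sorted({col for row in rows for col in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # a missing key is treated as None
        non_null = [v for v in values if v not in (None, "")]
        report[col] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "types": dict(Counter(type(v).__name__ for v in non_null)),
        }
    return report


# Hypothetical sample with an empty string, an explicit None and a missing column.
rows = [
    {"id": 1, "country": "ES", "amount": 10.5},
    {"id": 2, "country": "", "amount": None},
    {"id": 3, "country": "ES"},
]

for column, stats in profile(rows).items():
    print(column, stats)
```

Reporting the observed Python types per column is a cheap way to spot mixed-type columns before casting them to a target schema.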

🎯 Career Goal

To grow as a Data Engineer, strengthening my skills in distributed data processing, modern data architectures, and cloud-based data platforms, while contributing to reliable and scalable data solutions.

📫 Contact

💼 LinkedIn: Miguel Ozana
📧 Email: miguelozana@gmail.com

⭐ If you find something useful here, feel free to star the repository!
