m1guelozana/README.md

👋 Hi, I'm Miguel Ozana

Python Apache Spark Azure AWS Databricks Git


🚀 About Me

I am a Junior Data Engineer focused on data quality, reliability, and governance in Big Data environments.
I work mainly with Python and PySpark, handling large-scale data processing in Data Lakes using Bronze, Silver, and Gold layers.

I am especially interested in building reliable and auditable data pipelines, ensuring data consistency from ingestion to analytical consumption.


🧠 Core Skills

  • Data Engineering fundamentals
  • Data validation and profiling
  • Data comparison between heterogeneous sources
  • Schema normalization and data type alignment
  • Data quality checks and reconciliation
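As an illustration of how type alignment and reconciliation fit together, here is a minimal sketch in plain Python (not PySpark, and not taken from any of the repositories). The function names, the schema, and the sample rows are all hypothetical; the idea is simply to cast both sides to a shared schema before comparing them by key.

```python
from decimal import Decimal


def normalize(row, schema):
    """Cast each column of `row` using the callable given in `schema`."""
    return {col: cast(row[col]) for col, cast in schema.items()}


def index_by_key(rows, key, schema):
    """Normalize rows first, then index them by the (already typed) key."""
    normalized = (normalize(row, schema) for row in rows)
    return {row[key]: row for row in normalized}


def reconcile(source_rows, target_rows, key, schema):
    """Report keys missing on either side and keys whose values differ."""
    source = index_by_key(source_rows, key, schema)
    target = index_by_key(target_rows, key, schema)
    shared = source.keys() & target.keys()
    return {
        "missing_in_target": sorted(source.keys() - target.keys()),
        "missing_in_source": sorted(target.keys() - source.keys()),
        "mismatched": sorted(k for k in shared if source[k] != target[k]),
    }


# Hypothetical sample data: a CSV-like source (all strings) and a typed target.
SCHEMA = {"id": int, "amount": Decimal}
csv_rows = [
    {"id": "1", "amount": "10.50"},
    {"id": "2", "amount": "7.00"},
    {"id": "3", "amount": "1.25"},
]
typed_rows = [
    {"id": 1, "amount": Decimal("10.5")},
    {"id": 2, "amount": Decimal("8.00")},
    {"id": 4, "amount": Decimal("3.00")},
]

report = reconcile(csv_rows, typed_rows, key="id", schema=SCHEMA)
print(report)
```

Casting before comparing is what keeps `"10.50"` and `Decimal("10.5")` from being flagged as a discrepancy; only key `2` is reported as mismatched.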

πŸ› οΈ Technologies & Tools

Languages

  • Python
  • SQL

Big Data & Cloud

  • Apache Spark / PySpark
  • Azure Synapse Analytics
  • Azure Data Lake Storage (ADLS Gen2)
  • AWS (S3, basic services and concepts)
  • Databricks (fundamentals and notebooks)

Data Formats

  • Delta Lake
  • Parquet
  • CSV

Other Tools

  • Git & GitHub
  • Jupyter Notebooks
  • Kubernetes (basic concepts)

📊 What You'll Find in My Repositories

  • Generic notebooks for data validation and table comparison
  • Data profiling scripts
  • Comparisons between CSV, Parquet, and Delta datasets
  • Automated Excel reports for data conformity and discrepancies
  • Practical projects focused on data quality and governance
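For a rough idea of what a data profiling script computes, the sketch below counts rows, nulls, and distinct values per column in plain Python. It is an assumption-laden illustration rather than code from the repositories: the `profile` function and the sample rows are hypothetical, and empty strings are treated as nulls by choice.

```python
def profile(rows):
    """Return per-column row count, null count, and distinct non-null count."""
    columns = sorted({col for row in rows for col in row})
    stats = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Assumption: None and empty strings both count as missing values.
        non_null = [v for v in values if v is not None and v != ""]
        stats[col] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    return stats


# Hypothetical sample rows.
sample = [
    {"id": 1, "country": "BR"},
    {"id": 2, "country": ""},
    {"id": 3, "country": "BR"},
    {"id": 4, "country": None},
]

summary = profile(sample)
for column, column_stats in summary.items():
    print(column, column_stats)
```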

🎯 Career Goal

To grow as a Data Engineer, strengthening my skills in distributed data processing, modern data architectures, and cloud-based data platforms, while contributing to reliable and scalable data solutions.


📫 Contact


⭐ If you find something useful here, feel free to star the repository!

Pinned

  1. comparador_e_validador_de_tabelas_datalake_sas (Public)

    Table comparator and validator between an Azure data lake and SAS.

    Jupyter Notebook 1

  2. ETL (Public)

    ETL built with Python.

    Jupyter Notebook 2