👋 Hi, I'm Youssef Sidhom

Welcome to my GitHub profile! I'm passionate about:

  • AI Engineering
  • Data Engineering
  • Data Science
  • Machine Learning

I love solving real-world problems through code.


🔭 Current Work


🏫 Education

  • École Polytechnique, Master in Data Science

    • September 2025 - December 2026 (Ongoing)
    • Relevant Courses:
      • Optimization for Data Science
      • Deep Learning (PyTorch, Keras)
      • Reinforcement Learning
      • Advanced AI for Text and Graphs (LoRA, RAG, Graph AI)
  • INSA Lyon, Software Engineer

    • September 2020 - July 2025 (Completed)
    • Relevant Courses:
      • Foundation of Data Engineering
      • Machine Learning and Data Analytics
      • Object Oriented Programming (C++)
      • 2 years of STEM classes

💻 Projects

Here are some of the notable projects I've worked on during my academic journey:

Multi-Agent Structural Bias Assessment for Educational Content

  • 🥇 First Prize Winner at the IPAI Foundation Hackathon on Education. Uses multiple LLM agents debating each other through a Socratic framework to identify different types of bias in educational text.
  • Agents analyze text from different angles and synthesize their findings to surface nuanced bias patterns with quantitative scores. The debate approach helps catch biases that single-model analysis would miss.
  • Evaluates content across customizable bias dimensions, combining agent outputs into consolidated results with visual summaries and detailed breakdowns by dimension.

Vision-Language Fine-tuning for Automated Document Verification

  • Capstone project applying LoRA fine-tuning to Qwen2.5-VL and InternVL 2.0 to improve visual grounding and bounding box detection on document images.
  • Evaluated four detection paradigms (YOLO11n, RT-DETR, Mistral OCR 3, and MLLMs), achieving F1 scores of up to 95.3% and significantly outperforming zero-shot baselines.
  • Developed a parameterized JSON output format for precise bounding-box coordinates, enabling scalable document authentication and validation workflows.
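
A minimal sketch of how bounding-box JSON from a vision-language model could be parsed, validated against the image size, and scored with an IoU-matched F1. This is not the project's code: the schema `{"boxes": [{"label": ..., "bbox": [x1, y1, x2, y2]}]}`, the 0.5 IoU threshold, and the greedy one-to-one matching are all assumptions made for illustration, not the evaluation protocol behind the reported scores.

```python
import json
from dataclasses import dataclass


@dataclass
class Box:
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def parse_boxes(raw: str, width: int, height: int) -> list[Box]:
    """Parse a hypothetical {"boxes": [{"label": ..., "bbox": [x1, y1, x2, y2]}]} payload,
    dropping boxes that are degenerate or fall outside the image."""
    data = json.loads(raw)
    boxes = []
    for item in data.get("boxes", []):
        x1, y1, x2, y2 = (float(v) for v in item["bbox"])
        if 0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height:
            boxes.append(Box(item["label"], x1, y1, x2, y2))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(preds: list[Box], gts: list[Box], threshold: float = 0.5) -> float:
    """Greedy matching: a prediction is a true positive if it overlaps a still-unmatched
    ground-truth box with the same label at IoU >= threshold."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = None, threshold
        for j, g in enumerate(gts):
            if j in matched or g.label != p.label:
                continue
            score = iou(p, g)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0
```

For example, with a 1000×800 image, a predicted "signature" box `[10, 10, 50, 30]` against a ground-truth box `[12, 10, 50, 30]` has an IoU of 0.95 and counts as a true positive, while a box reaching x = 2000 is discarded at parse time. Validating coordinates before scoring keeps malformed model output from silently inflating false positives.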

Bridging Structured Chemical Graphs and Natural Language

  • Multi-modal system that translates 2D molecular structures into human-readable scientific descriptions, automating chemical database enrichment and drug discovery reporting.
  • Dual-tower architecture combining ChEmbed (BASF-AI) with a Graph Transformer using global self-attention, aligning symbolic graph representations with semantic text through trainable adapter layers.
  • Trained with InfoNCE contrastive loss, hard negative mining via Tanimoto similarity, and Matryoshka representation learning for robust multi-modal alignment.
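
The training objective can be sketched in plain Python. This is an illustrative reimplementation, not the project's code. It assumes molecular fingerprints are represented as sets of on-bit indices, for example from a cheminformatics toolkit (not shown). The temperature of 0.07 and the Matryoshka prefix sizes are also placeholder assumptions. The sketch shows Tanimoto similarity, picking each molecule's most similar other molecule as a hard negative, a one-directional graph-to-text InfoNCE loss, and a Matryoshka-style average of that loss over nested embedding prefixes.

```python
import math


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def hardest_negative(i: int, fps: list[set[int]]) -> int:
    """Index of the molecule most similar to molecule i, excluding i itself."""
    return max((j for j in range(len(fps)) if j != i), key=lambda j: tanimoto(fps[i], fps[j]))


def _normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0  # leave all-zero vectors unchanged
    return [x / n for x in v]


def info_nce(graph_emb, text_emb, tau: float = 0.07) -> float:
    """Graph-to-text InfoNCE: text i is the positive for graph i; the other texts
    in the batch act as negatives. Uses cosine similarity scaled by 1 / tau."""
    g = [_normalize(v) for v in graph_emb]
    t = [_normalize(v) for v in text_emb]
    loss = 0.0
    for i, gi in enumerate(g):
        logits = [sum(a * b for a, b in zip(gi, tj)) / tau for tj in t]
        m = max(logits)  # subtract the max before exponentiating for numerical stability
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denom - logits[i]
    return loss / len(g)


def matryoshka_info_nce(graph_emb, text_emb, dims, tau: float = 0.07) -> float:
    """Average InfoNCE over nested embedding prefixes (Matryoshka-style)."""
    losses = [info_nce([v[:d] for v in graph_emb], [v[:d] for v in text_emb], tau) for d in dims]
    return sum(losses) / len(losses)
```

Placing `hardest_negative(i, fps)` in the same batch as molecule `i` makes the in-batch negatives structurally close, so the model must rely on more than coarse substructure overlap to separate the captions. A full implementation would normally be written in a tensor library and would often add the symmetric text-to-graph term.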

Real-time RAG Agent for Public Transport

  • AI assistant that helps users navigate the Paris transport network using LLM tool-calling (Llama 3.1 405B via NVIDIA NIM) to query live APIs for itineraries and traffic disruptions.
  • Microservices architecture deployed with Docker Compose, integrating Apache Kafka (KRaft) and Elasticsearch for real-time data streaming and retrieval.
  • Combines RAG, tool-calling, and live API integration to deliver accurate, up-to-date transport guidance.

Time-series Event Classification on the 2025 Roland Garros Final

  • Classifies "Hit", "Bounce", and "Air" states from raw (x,y) coordinates by engineering kinematic features (Acceleration, Jerk, Turn Angle) to capture physical "shocks".
  • Implemented an optimized LightGBM model that outperformed CatBoost and XGBoost baselines in handling extreme class imbalance.
  • Built an unsupervised pipeline using UMAP embeddings and Gaussian Mixture Models (GMM) to cluster events without labels.
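
The kinematic features can be derived from the raw (x, y) track with finite differences. The sketch below is a minimal illustration and not the project's feature code. It makes several assumptions: a fixed time step `dt`, backward differences, and features aligned to the last frame of each difference window with zero-padding at the start. Acceleration and jerk are reported as vector magnitudes, and the turn angle is the signed angle between consecutive velocity vectors.

```python
import math


def _diff(seq, dt):
    """Backward finite difference: element k is (seq[k + 1] - seq[k]) / dt."""
    return [(b - a) / dt for a, b in zip(seq, seq[1:])]


def _pad(seq, n):
    """Left-pad with zeros so each value lines up with the last frame it uses."""
    return [0.0] * (n - len(seq)) + seq


def kinematic_features(xs, ys, dt=1.0):
    """Per-frame acceleration magnitude, jerk magnitude and turn angle (radians)
    for a trajectory sampled every dt; frames without enough history get 0.0."""
    n = len(xs)
    vx, vy = _diff(xs, dt), _diff(ys, dt)
    ax, ay = _diff(vx, dt), _diff(vy, dt)
    jx, jy = _diff(ax, dt), _diff(ay, dt)
    acc = [math.hypot(a, b) for a, b in zip(ax, ay)]
    jerk = [math.hypot(a, b) for a, b in zip(jx, jy)]
    # Signed angle between consecutive velocity vectors: atan2(cross, dot).
    turn = [math.atan2(x1 * y2 - y1 * x2, x1 * x2 + y1 * y2)
            for x1, y1, x2, y2 in zip(vx, vy, vx[1:], vy[1:])]
    return {"acceleration": _pad(acc, n), "jerk": _pad(jerk, n), "turn_angle": _pad(turn, n)}


# A ball descending, bouncing at frame 2, then rising again.
features = kinematic_features([0, 1, 2, 3, 4], [4, 3, 2, 3, 4])
print(features["acceleration"])  # [0.0, 0.0, 0.0, 2.0, 0.0]
```

Because of the backward alignment, the spike appears one frame after the bounce. Centring the windows instead is a reasonable alternative. The resulting lists can be used as feature columns for a gradient-boosted classifier.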

ETL Pipeline for Mental Health Data Analysis

  • Built a robust data ingestion pipeline to scrape posts from Reddit and HealthUnlocked, using Redis for deduplication and MongoDB for storage.
  • Used LLMs (Mistral and Ollama 1B) for sentiment analysis, keyword extraction, gender inference, and detection of self-diagnosis and self-medication mentions to enrich the dataset.
  • Designed a common database schema and cleaned the augmented data with pandas for efficient querying, visualization, and reporting.
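
Deduplication with Redis can be as small as one set-membership call per post. This is a sketch under assumptions, not the pipeline's actual code. The post fields `source` and `id` and the set key `seen_posts` are hypothetical. The only Redis call used is `sadd`, which in redis-py returns the number of members actually added, so a return value of 1 means the post has not been seen before. A tiny in-memory stand-in with the same `sadd` behaviour is included so the logic can run without a Redis server.

```python
import hashlib
import json


def post_fingerprint(post: dict) -> str:
    """Stable SHA-256 over the (hypothetical) fields that identify a post."""
    key = json.dumps({"source": post["source"], "id": post["id"]}, sort_keys=True)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def filter_new_posts(client, posts, set_key="seen_posts"):
    """Keep only posts whose fingerprint was not already in the set.
    SADD returns how many members were newly added, so 1 means 'first time seen'."""
    return [p for p in posts if client.sadd(set_key, post_fingerprint(p)) == 1]


class InMemorySet:
    """Test stand-in that mimics only redis.Redis.sadd."""

    def __init__(self):
        self._sets = {}

    def sadd(self, name, *values):
        s = self._sets.setdefault(name, set())
        before = len(s)
        s.update(values)
        return len(s) - before
```

With a real server, `client = redis.Redis(host="localhost", port=6379)` can replace `InMemorySet()`, and the filtered posts are then written to MongoDB. Keying on source plus post ID, rather than on the text alone, keeps identical posts from different platforms as separate records.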

Industrial Data Collection Protocol for Production Lines

  • Set up a data collection protocol within the production lines of Geberit's factory in Haldensleben, Germany, working closely with stakeholders to align with business goals.
  • Designed the SQL Server database schema and chose communication protocols (OPC-UA and SAP Plant Connectivity) suited to the factory environment.
  • Built interactive dashboards in C# and CSHTML using the MVC pattern to highlight KPIs and provide real-time data visualization.

🌟 Skills

  • Programming Languages: Python, C++, C, Java, JavaScript, C#
  • AI / ML Frameworks: PyTorch, TensorFlow, Scikit-learn, Hugging Face, LightGBM, XGBoost, CatBoost
  • LLM & GenAI: LoRA Fine-tuning, RAG, Tool-calling, Multi-Agent Systems, Prompt Engineering, Vision-Language Models (Qwen2.5-VL, InternVL), Llama, Mistral, Gemini, Ollama
  • Data Engineering: Apache Kafka, Airflow, Docker, Elasticsearch, ETL Pipelines
  • Databases: MySQL, MS SQL Server, MongoDB, Redis, Neo4j
  • Computer Vision: YOLO, RT-DETR, OCR, Object Detection, Bounding Box Detection
  • Data Science: Pandas, NumPy, UMAP, Gaussian Mixture Models, Feature Engineering, Time-series Analysis
  • Tools & Other: Git, Docker Compose, Vue.js, ASP.NET, OPC-UA

📫 How to Reach Me


⚑ Fun Fact

πŸ€ I have played basketball all my life and I am only 175cm (5.7ft) tall πŸ€

Pinned

  1. NourJadiri/mental_health_disorders_analysis (Public, Jupyter Notebook): Analysis of the frequency of self-diagnosed people with mental health disorders
  2. public_transport_RAG (Public, Python)
  3. QSA_tennis_bounce_hit (Public, Python)
  4. Geberit_WebHMI (Public archive, HTML)
  5. Molecular_graph_captionning (Public, Python)
  6. Object_detection_in_documents (Public, Jupyter Notebook)