This repository contains the end-to-end experimental pipeline for Figurative Language Understanding based on the BESSTIE dataset.
Our work replicates standard encoder and decoder baselines and implements two extension architectures: Mixture-of-Adapters (MoA) and Variety-Aware Tensor-of-Cues (VA-ToC).
Figurative language understanding (sarcasm detection and sentiment analysis) remains a major challenge for NLP models due to its dependence on implicit meaning and context. In national varieties of English (e.g., Australian en-AU, Indian en-IN, British en-UK), the expression of sarcasm and sentiment is heavily influenced by regional pragmatics, slang, and cultural cues. Models trained on general datasets frequently experience significant performance drop-offs when evaluated across these regional boundaries.
The key objectives are:
- Benchmark standard models (BERT, RoBERTa, Mistral) under variety and domain shifts.
- Measure generalization drops across regional English dialects (AU, IN, UK) and platforms (Google Reviews, Reddit).
- Develop targeted model extensions (Mixture-of-Adapters and Tensor-of-Cues) to improve model robustness.
Our project introduces two extension frameworks targeting task-specific constraints and geographical varieties (en-AU, en-IN, en-UK):
- Mixture-of-Adapters (MoA): Adds low-rank adapter modules for each target task at the transformer layer level, utilizing a routing/gating network to weight adapter representations based on text embeddings.
- Variety-Aware Tensor-of-Cues (VA-ToC): A structured prompting method for decoder models (Mistral) where a variety-routing classifier (trained using XLM-RoBERTa) predicts the national variety of the text. This prediction is mapped to hyperbolic variety embeddings and injected into instruction prompts to guide the language model's predictions.
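The gating idea behind MoA can be illustrated with a minimal sketch. It is written in plain Python rather than PyTorch so that it runs anywhere, and it is not the repository's implementation: the class names `LowRankAdapter` and `MixtureOfAdapters`, the ReLU bottleneck, the softmax gate, and the sizes (`hidden=8`, `rank=2`, one adapter per task) are all assumptions chosen for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LowRankAdapter:
    """Hypothetical adapter: project down to `rank`, apply ReLU, project back up."""

    def __init__(self, hidden, rank, rng):
        self.down = random_matrix(rank, hidden, rng)
        self.up = random_matrix(hidden, rank, rng)

    def __call__(self, h):
        bottleneck = [max(0.0, v) for v in matvec(self.down, h)]
        return matvec(self.up, bottleneck)


class MixtureOfAdapters:
    """Hypothetical layer: a gate weights the adapter outputs, added residually."""

    def __init__(self, hidden, rank, num_adapters, seed=0):
        rng = random.Random(seed)
        self.adapters = [LowRankAdapter(hidden, rank, rng) for _ in range(num_adapters)]
        self.gate = random_matrix(num_adapters, hidden, rng)

    def gate_weights(self, h):
        # One score per adapter, normalised so the weights sum to 1.
        return softmax(matvec(self.gate, h))

    def __call__(self, h):
        weights = self.gate_weights(h)
        out = list(h)  # residual connection keeps the original representation
        for weight, adapter in zip(weights, self.adapters):
            delta = adapter(h)
            out = [o + weight * d for o, d in zip(out, delta)]
        return out


# Assumption: two adapters, one per task (sentiment and sarcasm).
moa = MixtureOfAdapters(hidden=8, rank=2, num_adapters=2)
hidden_state = [0.5, -1.0, 0.25, 0.0, 1.5, -0.75, 0.1, 0.9]
weights = moa.gate_weights(hidden_state)
output = moa(hidden_state)
print("gate weights:", weights)
```

The residual form means that an input the gate cannot route confidently still passes through largely unchanged, which is one common motivation for adapter-style designs.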
Our proposed architectures show performance improvements over standard fine-tuned baselines, particularly under variety and domain shifts:
- Zero-Shot Sarcasm Generalization (+89.6% Gain): The Variety-Aware Tensor-of-Cues (VA-ToC) strategy increases zero-shot sarcasm Macro-F1 on the Mistral-7B model from 0.20 to 0.38. This indicates that structured pragmatic cues and hyperbolic variety embeddings guide decoder predictions under zero-shot transfer.
- Geographical Variety Robustness (+33.7% Gain): In cross-variety sarcasm transfer (e.g., training on Australian English and testing on British/Indian English), the Mixture-of-Adapters (MoA) model increases RoBERTa's Macro-F1 from 0.49 to 0.65. This suggests that conditional adapter routing limits representation collapse under regional shifts.
- In-Variety Adaptation (+24.0% Gain): Within matched variety partitions, VA-ToC increases sarcasm Macro-F1 on Mistral from 0.54 to 0.67 and sentiment classification from 0.81 to 0.92, showing successful local adaptation.
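For readers who want to recompute such gains from their own metric files, a relative gain is simply the change divided by the baseline. The helper below is a hypothetical sketch, not part of the `figlang` package. The headline percentages above were presumably computed from unrounded scores, so applying it to the rounded Macro-F1 values gives nearby but not necessarily identical numbers.

```python
def relative_gain(baseline, improved):
    """Return the relative improvement over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Rounded Macro-F1 pairs quoted in the list above (baseline, extension).
reported = {
    "zero-shot sarcasm (Mistral, VA-ToC)": (0.20, 0.38),
    "cross-variety sarcasm (RoBERTa, MoA)": (0.49, 0.65),
    "in-variety sarcasm (Mistral, VA-ToC)": (0.54, 0.67),
}

for name, (baseline, improved) in reported.items():
    print(f"{name}: {relative_gain(baseline, improved):+.1f}%")
```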
The pipeline evaluates two binary classification tasks:
- sentiment (positive vs. negative)
- sarcasm (sarcastic vs. non-sarcastic)
We evaluate under the following experimental setups:
- In-domain: Trained and tested within the same source/domain (Google vs. Reddit).
- Cross-domain: Trained on Google Reviews, tested on Reddit (and vice versa) for sentiment. Sarcasm follows the Reddit-only benchmark setup.
- FULL: Trained on the combined training pool and evaluated on the full validation set.
- Cross-variety matrix: Trained on one national variety (e.g., en-AU) and tested on each national variety (en-AU, en-IN, en-UK) to measure geographical shift.
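The cross-variety matrix in the last setup amounts to a nested loop: train once per source variety, then score that model on every target variety. The sketch below shows only the loop structure. The majority-class "trainer", the accuracy scorer, and the toy examples are stand-ins invented for illustration; the actual pipeline trains BERT, RoBERTa, or Mistral and reports Macro-F1.

```python
from collections import Counter

VARIETIES = ["en-AU", "en-IN", "en-UK"]


def train_majority(examples):
    """Toy stand-in for a trainer: always predict the most frequent label."""
    majority, _ = Counter(label for _, label in examples).most_common(1)[0]
    return lambda text: majority


def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    return sum(model(text) == label for text, label in examples) / len(examples)


def cross_variety_matrix(train_sets, test_sets, train_fn, score_fn):
    """Return {source variety: {target variety: score}}."""
    matrix = {}
    for source in VARIETIES:
        model = train_fn(train_sets[source])  # one model per source variety
        matrix[source] = {
            target: score_fn(model, test_sets[target]) for target in VARIETIES
        }
    return matrix


# Toy (text, label) pairs, purely illustrative.
train_sets = {
    "en-AU": [("a", 1), ("b", 1), ("c", 0)],
    "en-IN": [("d", 0), ("e", 0), ("f", 1)],
    "en-UK": [("g", 1), ("h", 0), ("i", 0)],
}
test_sets = {
    "en-AU": [("j", 1), ("k", 1)],
    "en-IN": [("l", 0), ("m", 0)],
    "en-UK": [("n", 1), ("o", 0)],
}

matrix = cross_variety_matrix(train_sets, test_sets, train_majority, accuracy)
for source, row in matrix.items():
    print(source, row)
```

The diagonal of the resulting matrix corresponds to in-variety evaluation, and the off-diagonal cells measure the geographical shift.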
All models report: Macro-F1, Accuracy, Precision, Recall, and per-variety breakdown tables.
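As a reminder of what the headline metric measures, Macro-F1 is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. The function below is a self-contained sketch for binary labels and is not taken from the repository, which may well rely on a library implementation instead.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)  # class never predicted and never correct
    return sum(f1_scores) / len(f1_scores)


# Toy labels: 1 = sarcastic, 0 = non-sarcastic.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
print(f"Macro-F1: {score:.3f}")
```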
- data/: Raw source datasets and standardized processed outputs.
- src/figlang/: Reusable Python package modules (data loading, remapping, preprocessing, training loops, visualization).
- notebooks/: Phase-based Jupyter Notebooks used for remote Google Colab execution.
- models/: Saved metrics, predictions, plots, and training checkpoints.
- docs/: Project report PDF and paper documentation.
- scripts/: Standalone Python execution scripts wrapping the package logic.
To install the project dependencies and the figlang package in editable mode locally:
# Clone the repository
git clone https://github.com/ameermasood/FigLangUnderstanding.git
cd FigLangUnderstanding
# Install package and requirements
pip install -e .
The pipeline can be run in two modes: script-wise (local command-line) and notebook-wise (Google Colab).
We provide a unified CLI tool, flu, to run pipeline stages locally. Execute these commands in order:
- Run sanity checks (validates notebook formatting, compiles code, and runs tests):
flu check
- Run preprocessing (reads raw data, applies schemas, and splits train/validation index files):
flu preprocess --overwrite
- Inspect generated indexes (validates local paths and remapped Colab configs):
flu inspect
- Run baseline training scripts:
# BERT Baseline
flu train-bert --run-source
# RoBERTa Baseline
flu train-roberta --run-source
# Mistral Baseline (requires GPU / unsloth environment)
flu train-mistral --run-source
- Run extension training scripts:
- Extension 1 (RoBERTa MoA):
flu train-roberta-moa --run-source
- Extension 2 (Mistral Hyperbolic-ToC):
First, train the prerequisite variety routing classifier:
flu train-variety-router --run-source
Then run the Mistral Hyperbolic prompt training loop:
flu train-mistral-toc --run-source
- Evaluate and compile metrics:
# Compare metric files side-by-side
flu compare --best
flu compare --deltas
# Plot Macro-F1 heatmaps
flu plot --task sentiment --out models/heat_sentiment.png
flu plot --task sarcasm --out models/heat_sarcasm.png
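Because the stages above must run in order, it can be convenient to drive them from a small script that stops at the first failure. The wrapper below is a hypothetical convenience sketch and not part of the repository. It uses only the flu subcommands listed above and defaults to a dry run, so nothing is executed unless `dry_run=False` is passed.

```python
import shlex
import subprocess

# First stages of the local pipeline, in the order given above.
STAGES = [
    "flu check",
    "flu preprocess --overwrite",
    "flu inspect",
    "flu train-bert --run-source",
    "flu train-roberta --run-source",
]


def run_stages(stages, dry_run=True):
    """Run each command in order; check=True aborts on the first failure."""
    executed = []
    for command in stages:
        argv = shlex.split(command)
        if dry_run:
            print("would run:", command)
        else:
            subprocess.run(argv, check=True)
        executed.append(argv)
    return executed


planned = run_stages(STAGES)  # dry run: only prints the plan
```

Appending the extension and evaluation commands to `STAGES` extends the same pattern to the full pipeline.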
When running on Colab, the notebooks assume a configurable project root, BASE = Path("/content/drive/MyDrive/DNLP"), and mount Google Drive automatically. Run the notebooks in the following order:
- Phase P0 (Data Collection & Exploration):
notebooks/P0_01_data_collection.ipynb
notebooks/P0_02_data_exploration.ipynb
- Phase P1 (Data Preprocessing):
notebooks/P1_01_data_preprocessing.ipynb
- Phase P2 (Baselines):
notebooks/P2_01_baseline_bert.ipynb
notebooks/P2_02_baseline_roberta.ipynb
notebooks/P2_03_baseline_mistral.ipynb
- Phase P3 (Extensions):
- Extension 1 (RoBERTa MoA): Run notebooks/P3_01_extension_roberta_mixture_of_adapters.ipynb.
- Extension 2 (Mistral ToC): Run the prerequisite notebooks/P3_02_extension_mistral_variety_classifier.ipynb first, followed by the main notebook notebooks/P3_03_extension_mistral_hyperbolic_toc.ipynb.
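The configurable Colab project root mentioned above can be made to fall back to a local checkout when Google Drive is not mounted. The sketch below is one possible way to do this. The `resolve_base` helper is hypothetical and the notebooks may handle it differently; only the default Colab path and the data/ and models/ directory names come from this README.

```python
from pathlib import Path


def resolve_base(colab_base="/content/drive/MyDrive/DNLP", local_base="."):
    """Use the Colab Drive folder if it exists, else the local directory."""
    colab_path = Path(colab_base)
    if colab_path.exists():
        return colab_path
    return Path(local_base).resolve()


BASE = resolve_base()
DATA_DIR = BASE / "data"      # raw and processed datasets
MODELS_DIR = BASE / "models"  # metrics, predictions, plots, checkpoints
print("project root:", BASE)
```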
This project was developed by:
- Amir Masoud Almasi — amirmasoud.almasi@studenti.polito.it
- Balzhan Dosmukhametova — balzhan.dosmukhametova@studenti.polito.it
- Mehdi Nickzamir — mehdi.nickzamir@studenti.polito.it
- Ashkan Shafiei — ashkan.shafiei@studenti.polito.it
This project was developed as part of the Deep Natural Language Processing (DNLP) curriculum at Politecnico di Torino. We would like to express our gratitude to Professor Luca Cagliero and teaching assistants Ali Yassine and Giuseppe Gallipoli for their guidance, valuable feedback, and support throughout the project.
