This repository contains a token-classification model trained on the DAMASHA-MAS benchmark, introduced in:
DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution
The model aims to segment mixed human–AI text at the token level, i.e., to decide for each token whether it was written by a human or an LLM, even under syntactic adversarial attacks.
HF model RMC: saiteja33/DAMASHA
- Base encoders: RoBERTa-base and ModernBERT-base
- Architecture (high level): RoBERTa + ModernBERT feature fusion → BiGRU + CRF with the Info-Mask gating mechanism from the paper.
- Task: Token classification (binary authorship: human vs AI).
- Language: English
- License (this model): MIT
- Training data license: CC-BY-4.0 via the DAMASHA dataset.
If you use this model, please also cite the DAMASHA paper and dataset (see Citation section).
- Fine-grained mixed-authorship detection: predicts authorship per token, allowing reconstruction of human vs AI spans in long documents.
- Adversarially robust: trained and evaluated on syntactically attacked texts (misspelling, Unicode substitutions, invisible characters, punctuation swaps, case perturbations, and “all-mixed” attacks).
- Human-interpretable Info-Mask: the architecture incorporates stylometric features (perplexity, POS density, punctuation density, lexical diversity, readability) via an Info-Mask module that gates token representations in an interpretable way.
- Strong reported performance (from the paper): on DAMASHA-MAS, the RMC* model (RoBERTa + ModernBERT + CRF + Info-Mask) achieves:
  - Token-level: Accuracy / Precision / Recall / F1 ≈ 0.98
  - Span-level (strict): SBDA ≈ 0.45, SegPre ≈ 0.41
  - Span-level (relaxed IoU ≥ 0.5): ≈ 0.82
⚠️ The exact numbers for this specific checkpoint may differ depending on training run and configuration. The values above are from the paper’s best configuration (RMC*).
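Per-token predictions become useful once they are merged back into spans. The sketch below shows one way to do that, and how a relaxed IoU ≥ 0.5 match between a predicted and a reference span could be checked. It assumes label `1` means AI and `0` means human (the real mapping is in `config.id2label`), and the helper names `labels_to_spans` and `span_iou` are hypothetical. It does not reproduce the paper's SBDA or SegPre definitions.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def labels_to_spans(labels: List[int], target: int = 1) -> List[Span]:
    """Merge runs of consecutive `target` labels into half-open spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, lab in enumerate(labels):
        if lab == target and start is None:
            start = i                      # a new run begins
        elif lab != target and start is not None:
            spans.append((start, i))       # the run ended just before i
            start = None
    if start is not None:                  # run reaches the end of the text
        spans.append((start, len(labels)))
    return spans


def span_iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two half-open token spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


# Toy label sequences (assumed convention: 1 = AI, 0 = human).
pred = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
gold = [0, 0, 0, 1, 1, 1, 0, 1, 1, 0]

pred_spans = labels_to_spans(pred)
gold_spans = labels_to_spans(gold)
relaxed_match = span_iou(pred_spans[0], gold_spans[0]) >= 0.5
```

In this toy case the first predicted span is shifted by one token from the reference, so it fails an exact-boundary comparison but still passes the relaxed IoU threshold. This is the kind of gap visible between the strict and relaxed span-level numbers above.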
- Research on human–AI co-authorship
  - Studying where LLMs “take over” in mixed texts.
  - Analysing robustness of detectors under adversarial perturbations.
- Tooling / applications (with human oversight)
  - Assisting editors, educators, or moderators to highlight suspicious spans rather than making final decisions.
  - Exploring interpretability overlays (e.g., heatmaps over tokens) when combined with Info-Mask outputs.
The model is not intended for:
- Automated “cheating detector” / plagiarism court.
- High-stakes decisions affecting people’s livelihood, grades, or reputation without human review.
- Non-English or heavily code-mixed text (training data is English-centric).
Use this model as a signal, not a judge.
The model is trained on the MAS benchmark released with the DAMASHA paper and hosted as the Hugging Face dataset:
- Dataset: saiteja33/DAMASHA
MAS consists of mixed human–AI texts with explicit span tags:
- Human text comes from several corpora for domain diversity, including:
  - Reddit (M4-Reddit)
  - Yelp & /r/ChangeMyView (MAGE-YELP, MAGE-CMV)
  - News summaries (XSUM)
  - Wikipedia (M4-Wiki, MAGE-SQuAD)
  - ArXiv abstracts (MAGE-SciGen)
  - QA texts (MAGE-ELI5)
- AI text is generated by multiple modern LLMs:
  - DeepSeek-V3-671B (open-source)
  - GPT-4o, GPT-4.1, GPT-4.1-mini (closed-source)
Authorship is marked using explicit tags around AI spans:
- `<AI_Start>…</AI_End>` tags denote AI-generated segments within otherwise human text.
- The dataset stores text in a `hybrid_text` column, plus metadata such as `has_pair`; adversarial variants include `attack_name`, `tag_count`, and `attacked_text`.
- Tags are sentence-level in annotation, but the model is trained to output token-level predictions for finer segmentation.

During training, these tags are converted into token labels (2 labels total; see `config.id2label` in the model files).
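The tag-to-label conversion can be illustrated with a minimal sketch. It assumes whitespace-separated words as a stand-in for the model's subword tokens, and assumes label `1` for AI and `0` for human (check `config.id2label` for the actual mapping). The function name `tags_to_word_labels` and the example sentence are made up for illustration; the project's real preprocessing lives in its training script and may differ.

```python
import re
from typing import List, Tuple

# Matches one tagged AI segment; the tag names are the ones used in the dataset.
TAG_RE = re.compile(r"<AI_Start>(.*?)</AI_End>", re.DOTALL)


def tags_to_word_labels(hybrid_text: str) -> Tuple[List[str], List[int]]:
    """Strip authorship tags and return (words, labels).

    Assumed label convention: 1 = AI, 0 = human.
    """
    words: List[str] = []
    labels: List[int] = []
    pos = 0
    for match in TAG_RE.finditer(hybrid_text):
        human = hybrid_text[pos:match.start()].split()  # text before the tag
        ai = match.group(1).split()                     # text inside the tag
        words += human + ai
        labels += [0] * len(human) + [1] * len(ai)
        pos = match.end()
    tail = hybrid_text[pos:].split()                    # text after the last tag
    words += tail
    labels += [0] * len(tail)
    return words, labels


# Made-up example in the dataset's tag format.
example = (
    "I visited the cafe. "
    "<AI_Start>The ambiance was delightful.</AI_End> "
    "Would go again."
)
words, labels = tags_to_word_labels(example)
```

With a subword tokenizer, the same idea applies at the character level: record the character offsets of each tagged segment, then assign each token the label of the segment its offsets fall in.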
MAS includes multiple syntactic attacks applied to the mixed text:
- Misspelling
- Unicode character substitution
- Invisible characters
- Punctuation substitution
- Upper/lower case swapping
- All-mixed combinations of the above
These perturbations make tokenization brittle and test robustness of detectors in realistic settings.
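To make the attack types concrete, here is a small sketch of three of them: invisible characters, Unicode substitution, and case swapping. It is an illustration only. The function names, the homoglyph table, and the perturbation rates are assumptions, not the procedure used to build MAS.

```python
import random

ZERO_WIDTH_SPACE = "\u200b"
# Latin letters mapped to visually similar Cyrillic ones (small illustrative subset).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}


def insert_invisible(text: str, rate: float, rng: random.Random) -> str:
    """Insert a zero-width space after a random subset of characters."""
    out = []
    for ch in text:
        out.append(ch)
        if rng.random() < rate:
            out.append(ZERO_WIDTH_SPACE)
    return "".join(out)


def substitute_unicode(text: str, rate: float, rng: random.Random) -> str:
    """Replace some Latin letters with look-alike Unicode characters."""
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )


def swap_case(text: str, rate: float, rng: random.Random) -> str:
    """Flip upper/lower case on a random subset of characters."""
    return "".join(ch.swapcase() if rng.random() < rate else ch for ch in text)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
clean = "The ambiance was delightful."
# Chaining several perturbations loosely mimics an "all-mixed" attack.
attacked = swap_case(
    substitute_unicode(insert_invisible(clean, 0.2, rng), 0.5, rng), 0.2, rng
)
```

The attacked string still looks almost identical when rendered, but its code points differ from the clean text, so a subword tokenizer will typically split it into different pieces.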
The model follows the Info-Mask RMC* architecture described in the DAMASHA paper:
- Dual encoders
  - RoBERTa-base and ModernBERT-base encode the same input sequence.
- Feature fusion
  - Hidden states from both encoders are fused into a shared representation.
- Stylometric Info-Mask
  - Hand-crafted style features (perplexity, POS density, punctuation density, lexical diversity, readability) are projected, passed through multi-head attention, and turned into a scalar mask per token.
  - This mask gates the fused encoder states, down-weighting style-irrelevant tokens and emphasizing style-diagnostic ones.
- Sequence model + CRF
  - A BiGRU layer captures sequential dependencies, followed by a CRF layer for structured token labeling with a sequence-level loss.
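The gating idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the paper's module: it computes only two of the stylometric features (punctuation density and lexical diversity) over a local window, and it swaps the learned projection and multi-head attention for a fixed linear score followed by a sigmoid. All function names, the window radius, the weights, and the bias are hypothetical.

```python
import math
import string
from typing import List


def punctuation_density(tokens: List[str]) -> float:
    """Fraction of characters that are ASCII punctuation."""
    chars = "".join(tokens)
    if not chars:
        return 0.0
    return sum(ch in string.punctuation for ch in chars) / len(chars)


def lexical_diversity(tokens: List[str]) -> float:
    """Type-token ratio: distinct lowercased tokens over total tokens."""
    if not tokens:
        return 0.0
    return len({t.lower() for t in tokens}) / len(tokens)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def toy_info_mask(tokens: List[str], weights: List[float], bias: float,
                  radius: int = 2) -> List[float]:
    """One scalar mask in (0, 1) per token, from features of its local window."""
    masks = []
    for i in range(len(tokens)):
        window = tokens[max(0, i - radius): i + radius + 1]
        feats = [punctuation_density(window), lexical_diversity(window)]
        score = sum(w * f for w, f in zip(weights, feats)) + bias
        masks.append(sigmoid(score))
    return masks


def gate_states(states: List[List[float]], masks: List[float]) -> List[List[float]]:
    """Scale each token's (fused) hidden vector by its scalar mask."""
    return [[m * v for v in vec] for vec, m in zip(states, masks)]


tokens = "Honestly , the food was great ! The service was great too .".split()
states = [[1.0, -1.0] for _ in tokens]   # stand-in for fused encoder states
masks = toy_info_mask(tokens, weights=[4.0, 1.0], bias=-1.0)
gated = gate_states(states, masks)
```

Because each mask is a single number per token, it can be plotted directly as a heatmap over the text, which is what makes this style of gating easy to inspect.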
Key hyperparameters used for the Info-Mask models on MAS:
- Number of labels: 2
- Max sequence length: 512
- Batch size: 64
- Epochs: 5
- Optimizer: AdamW (with cosine annealing LR schedule)
- Weight decay: 0.01
- Gradient clipping: 1.0
- Dropout: Dynamic 0.1–0.3 (initial 0.1)
- Warmup ratio: 0.1
- Early stopping patience: 2
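For readers who want to reproduce the schedule, the sketch below collects the listed values in a dictionary and shows one common way to combine a warmup ratio with cosine annealing. The linear warmup shape, the base learning rate, and the total step count are assumptions, since none of them is stated above. The names `TRAIN_CONFIG` and `lr_at_step` are hypothetical.

```python
import math

# Values taken from the hyperparameter list above.
TRAIN_CONFIG = {
    "num_labels": 2,
    "max_seq_length": 512,
    "batch_size": 64,
    "epochs": 5,
    "weight_decay": 0.01,
    "max_grad_norm": 1.0,
    "warmup_ratio": 0.1,
    "early_stopping_patience": 2,
}


def lr_at_step(step: int, total_steps: int, base_lr: float,
               warmup_ratio: float = 0.1) -> float:
    """Linear warmup to `base_lr`, then cosine annealing down to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical values: the base learning rate and step count are NOT from the paper.
BASE_LR = 2e-5
TOTAL_STEPS = 1000
schedule = [
    lr_at_step(s, TOTAL_STEPS, BASE_LR, TRAIN_CONFIG["warmup_ratio"])
    for s in range(TOTAL_STEPS + 1)
]
```

With these placeholder numbers, the rate climbs for the first 100 steps, peaks at `BASE_LR`, and then decays smoothly to zero by the final step.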
Hardware & compute (as reported):
- AWS EC2 g6e.xlarge, NVIDIA L40S (48GB) GPU, Ubuntu 24.04
- ≈ 400 GPU hours for experiments.
The exact training script used for this checkpoint is available in the project's GitHub repository:
https://github.com/saitejalekkala33/DAMASHA