OldSlavNet-Modernized

Neural Dependency Parser for Old Church Slavonic and Old East Slavic

Modern neural dependency parser for Old Slavic texts, pairing a neural parsing pipeline with a Prolog-based grammar-validation layer (in development).

Overview

OldSlavNet-Modernized is a modernization of the original OldSlavNet parser by Nilo Pedrazzini, updated to use:

Stanza framework (Stanford NLP)
DiaParser (biaffine attention)
PyTorch (modern deep learning)
Prolog validation (neural-symbolic hybrid)

Adapted from the Coptic dependency parser architecture.

Features

✨ Modern NLP Stack

Python 3.9+ compatible (no compilation needed!)
PyTorch backend
Easy installation with pip

🧠 Neural + Symbolic

Neural parsing with Stanza/DiaParser
Prolog-based grammatical validation
Handles Old Slavic complexity:
- 7 cases (nom, gen, dat, acc, inst, loc, voc)
- Dual number
- Rich tense-aspect system (aorist, imperfect, perfect)
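In CoNLL-U, case and number are usually carried in the FEATS column. A minimal sketch, assuming the training data uses Universal Dependencies feature names and values (`Case=Ins`, `Number=Dual`, and so on); `parse_feats` and `unknown_values` are hypothetical helpers, not part of this repository:

```python
from typing import Dict

# Assumed UD value inventories for the 7-case system and three-way number.
CASES = {"Nom", "Gen", "Dat", "Acc", "Ins", "Loc", "Voc"}
NUMBERS = {"Sing", "Dual", "Plur"}
INVENTORY = {"Case": CASES, "Number": NUMBERS}


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a FEATS string such as 'Case=Gen|Number=Dual' into a dict; '_' means no features."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def unknown_values(feats: Dict[str, str]) -> Dict[str, str]:
    """Return Case/Number entries whose value falls outside the inventories above.

    Multi-values such as 'Acc,Nom' are checked part by part.
    """
    return {
        key: value
        for key, value in feats.items()
        if key in INVENTORY
        and any(part not in INVENTORY[key] for part in value.split(","))
    }


feats = parse_feats("Case=Ins|Gender=Masc|Number=Dual")
print(feats)                  # {'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Dual'}
print(unknown_values(feats))  # {}
```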

📚 Trained on TOROT

Uses the Tromsø Old Russian and OCS Treebank (TOROT)
Supports multiple Old Slavic varieties

Installation

Prerequisites

Python 3.9 or higher
SWI-Prolog (for grammatical validation, optional)

Quick Start

Clone the repository:

git clone https://github.com/YOUR-USERNAME/oldslavnet-modernized.git
cd oldslavnet-modernized

Create virtual environment:

python3.9 -m venv .venv
source .venv/bin/activate  # On Linux/macOS
# .venv\Scripts\activate  # On Windows

Install dependencies:

pip install -r requirements.txt

Install SWI-Prolog (optional, for validation):

# Ubuntu/Debian:
sudo apt install swi-prolog

# macOS:
brew install swi-prolog

# Then install Python bindings:
pip install pyswip
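Because Prolog validation is optional, calling code can check whether it is usable before passing `use_prolog=True`. A minimal sketch; the `prolog_available` helper is hypothetical, and treating a `swipl` executable on `PATH` as a proxy for a working SWI-Prolog installation is an assumption:

```python
import shutil


def prolog_available() -> bool:
    """Return True if a `swipl` executable is on PATH and `pyswip` imports cleanly."""
    if shutil.which("swipl") is None:
        return False  # SWI-Prolog itself is not installed (or not on PATH)
    try:
        import pyswip  # noqa: F401  # may fail if the bindings or libswipl are missing
    except Exception:
        return False
    return True


print(f"Prolog validation available: {prolog_available()}")
```

The result could then be passed to the constructor, e.g. `OldSlavicParser(use_prolog=prolog_available())`, so that the parser falls back to neural-only output when SWI-Prolog is absent.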

Usage

Command Line

python oldslavic_parser.py \
  --input path/to/input.conllu \
  --output path/to/output.conllu \
  --model-dir path/to/models  # optional
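Both `--input` and `--output` use CoNLL-U: one token per line with ten tab-separated columns, `#` comment lines, and a blank line after each sentence. A minimal stdlib reader for inspecting such files, offered as an illustrative sketch (it is not the reader `oldslavic_parser.py` uses, and the toy analysis below is hand-written, not parser output):

```python
from typing import Dict, Iterable, Iterator, List

# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts keyed by column name."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():  # a blank line closes the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comments (sent_id, text, ...)
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (3-4) and empty nodes (5.1)
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:  # the file may not end with a blank line
        yield sentence


rows = [
    ["1", "Въ", "_", "ADP", "_", "_", "2", "case", "_", "_"],
    ["2", "начѧлѣ", "_", "NOUN", "_", "_", "3", "obl", "_", "_"],
    ["3", "бѣ", "_", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", "слово", "_", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
]
sample = ["# text = Въ начѧлѣ бѣ слово"] + ["\t".join(r) for r in rows] + [""]

for sent in read_conllu(sample):
    for tok in sent:
        print(tok["id"], tok["form"], tok["head"], tok["deprel"], sep="\t")
```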

Python API

from oldslavic_parser import OldSlavicParser

# Initialize parser
parser = OldSlavicParser(use_prolog=True)

# Parse text
text = "Въ начѧлѣ бѣ слово"  # "In the beginning was the Word"
result = parser.parse(text)

# Access parsed data
for sentence in result['sentences']:
    for word in sentence:
        print(f"{word['text']}\t{word['lemma']}\t{word['upos']}\t{word['deprel']}")

Pre-trained Model

A trained DiaParser model is available in data/models/oldslavic_parser/. Reported scores on TOROT:

UAS (Unlabeled Attachment Score): 86.47%
LAS (Labeled Attachment Score): 81.48%
Training data: ~30K sentences from TOROT treebank
Vocabulary: 22,191 words, 43 dependency relations
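UAS is the share of tokens assigned the correct head; LAS additionally requires the correct dependency label, so LAS can never exceed UAS. A minimal sketch of both metrics over token-aligned gold and predicted arcs; it counts every token, whereas the repository's `evaluate_parser.py` may treat punctuation or relation subtypes differently (this is not stated here), and `attachment_scores` is a hypothetical name:

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency relation) for one token; 0 = root


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent for token-aligned gold and predicted arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("nothing to score")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    correct_arcs = sum(g == p for g, p in zip(gold, pred))  # head and label both match
    return 100 * correct_heads / len(gold), 100 * correct_arcs / len(gold)


gold = [(2, "case"), (3, "obl"), (0, "root"), (3, "nsubj")]
pred = [(2, "case"), (3, "obj"), (0, "root"), (2, "nsubj")]
print(attachment_scores(gold, pred))  # (75.0, 50.0)
```

In the toy example, token 2 has the right head but the wrong label, so it counts toward UAS only; token 4 has the wrong head and counts toward neither.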

Evaluate the Model

python scripts/evaluate_parser.py \
  --model data/models/oldslavic_parser/model \
  --test data/training/test.conllu

Training Your Own Model

To retrain the model or train on your own data:

1. Data Preparation

# Prepare TOROT data (or your own CoNLL-U files)
python scripts/prepare_torot_data.py \
  --torot-dir /path/to/torot \
  --output-dir data/training
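If you bring your own CoNLL-U files, they need to be divided into the train.conllu, dev.conllu and test.conllu files used in the next step. A minimal sketch of a reproducible sentence-level split; the 80/10/10 ratio, the seed, and the function names are assumptions, not necessarily what `prepare_torot_data.py` does:

```python
import random
from pathlib import Path
from typing import List, Sequence, Tuple


def sentence_blocks(text: str) -> List[str]:
    """Split a CoNLL-U document into sentence blocks separated by blank lines."""
    text = text.replace("\r\n", "\n")
    return [block.strip("\n") for block in text.split("\n\n") if block.strip()]


def split_blocks(
    blocks: Sequence[str], dev_frac: float = 0.1, test_frac: float = 0.1, seed: int = 42
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle with a fixed seed, then cut into (train, dev, test)."""
    items = list(blocks)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_dev = int(len(items) * dev_frac)
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, dev, test


def write_split(out_dir: Path, blocks: Sequence[str]) -> None:
    """Write train/dev/test .conllu files, each sentence followed by a blank line."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, part in zip(("train", "dev", "test"), split_blocks(blocks)):
        body = "".join(block + "\n\n" for block in part)
        (out_dir / f"{name}.conllu").write_text(body, encoding="utf-8")
```

Usage, with a hypothetical input file: `write_split(Path("data/training"), sentence_blocks(Path("my_corpus.conllu").read_text(encoding="utf-8")))`. Splitting by whole source text instead of by sentence is a common alternative that keeps near-identical passages (for example parallel Gospel readings) from landing in both train and test.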

2. Train DiaParser Model

python scripts/train_diaparser.py \
  --train data/training/train.conllu \
  --dev data/training/dev.conllu \
  --test data/training/test.conllu \
  --output data/models/oldslavic_parser

Training takes ~5-6 hours on CPU for 100 epochs.

Architecture

Neural Component

Tokenizer: Stanza
POS Tagger: BiLSTM (Stanza)
Lemmatizer: Sequence-to-sequence (Stanza)
Parser: Biaffine attention (DiaParser)
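Biaffine attention follows the scoring of Dozat and Manning (2017): each token gets a "dependent" vector d and a "head" vector h, and the score for token j heading token i is `s(i, j) = d_i^T U h_j + h_j^T u`. A toy, dependency-free sketch with hand-picked vectors standing in for learned encoder features; it decodes greedily, whereas full parsers usually add a tree-constrained decoder (such as MST) to rule out cycles. All names here are illustrative, not DiaParser's API:

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def biaffine_scores(dep: List[Vector], head: List[Vector], U: Matrix, u: Vector) -> Matrix:
    """scores[i][j] = dep[i]^T U head[j] + head[j]^T u: how strongly j looks like the head of i."""
    return [[dot(d, matvec(U, h)) + dot(h, u) for h in head] for d in dep]


def greedy_heads(scores: Matrix) -> List[int]:
    """Best head per token; position 0 is the artificial ROOT and never gets a head."""
    heads = []
    for i in range(1, len(scores)):
        candidates = [j for j in range(len(scores[i])) if j != i]  # no self-loops
        heads.append(max(candidates, key=lambda j: scores[i][j]))
    return heads


# Position 0 = ROOT, positions 1-2 = two toy tokens with 2-dimensional features.
dep = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
head = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]  # identity: the bilinear term reduces to a dot product
u = [0.0, 0.0]                # no head-only bias in this toy example
print(greedy_heads(biaffine_scores(dep, head, U, u)))  # [0, 1]
```

Changing the head-only bias to `u = [0.0, 5.0]` makes the same toy data decode to heads `[2, 1]`, a cycle with nothing attached to ROOT, which is why tree-constrained decoding matters.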

Symbolic Component (Prolog) - In Development

A Prolog-based validation layer is under development to detect and correct neural parser errors. The framework includes rules for:

Case agreement rules (7-case system)
Number agreement (singular/dual/plural)
Genitive of negation (objects → genitive with negated verbs)
Participle agreement (case/number/gender)
Dependency relation constraints
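The kind of check these rules express can be illustrated outside Prolog. A minimal Python sketch of agreement (which also covers attributive participles tagged `amod`) and genitive of negation, written against UD-style token dicts; it is not the contents of `oldslavic_prolog_rules.py`, and the chosen relations (`amod`, `det`, `obj`), the `Polarity=Neg` / lemma `не` test for negation, and the function names are all assumptions. Genitive of negation is reported as a warning rather than a hard error, also a design assumption:

```python
from typing import Any, Dict, List

Token = Dict[str, Any]  # keys: id, lemma, head, deprel, feats (dict of UD features)

AGREEING_RELS = {"amod", "det"}  # modifiers expected to agree with their head noun
AGREEMENT_FEATS = ("Case", "Number", "Gender")


def agreement_errors(sentence: List[Token]) -> List[str]:
    """Report modifier/head pairs whose Case, Number or Gender values differ."""
    by_id = {tok["id"]: tok for tok in sentence}
    errors = []
    for tok in sentence:
        head = by_id.get(tok["head"])
        if tok["deprel"] not in AGREEING_RELS or head is None:
            continue
        for feat in AGREEMENT_FEATS:
            mine, theirs = tok["feats"].get(feat), head["feats"].get(feat)
            if mine and theirs and mine != theirs:
                errors.append(f"{tok['id']}->{head['id']}: {feat} {mine} != {theirs}")
    return errors


def negation_warnings(sentence: List[Token]) -> List[str]:
    """Warn when the object of a negated verb is not genitive."""
    negated = {
        tok["head"]
        for tok in sentence
        if tok["feats"].get("Polarity") == "Neg" or tok["lemma"] == "не"
    }
    return [
        f"{tok['id']}: object of negated verb {tok['head']} has Case={tok['feats'].get('Case')}"
        for tok in sentence
        if tok["deprel"] == "obj" and tok["head"] in negated and tok["feats"].get("Case") != "Gen"
    ]


# Toy analysis (no real forms): 1 = negation, 2 = verb, 3 = modifier, 4 = object noun.
toy = [
    {"id": 1, "lemma": "не", "head": 2, "deprel": "advmod", "feats": {"Polarity": "Neg"}},
    {"id": 2, "lemma": "_", "head": 0, "deprel": "root", "feats": {}},
    {"id": 3, "lemma": "_", "head": 4, "deprel": "amod",
     "feats": {"Case": "Acc", "Number": "Sing", "Gender": "Neut"}},
    {"id": 4, "lemma": "_", "head": 2, "deprel": "obj",
     "feats": {"Case": "Gen", "Number": "Sing", "Gender": "Neut"}},
]
print(agreement_errors(toy))   # ['3->4: Case Acc != Gen']
print(negation_warnings(toy))  # []
```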

Status: rule definitions are complete (oldslavic_prolog_rules.py); integration with the parser pipeline is planned for v2.0.

This follows the neural-symbolic architecture of the Coptic dependency parser, which uses Janus Prolog for error detection and hallucination prevention.

Comparison with Original OldSlavNet

| Feature | Original (2021) | Modernized (2025) |
|---|---|---|
| Framework | dynet | Stanza/PyTorch |
| Python | 3.4-3.9 | 3.9-3.12 |
| Installation | Complex compilation | Simple pip install |
| Grammar validation | None | Prolog rules (integration planned) |
| GUI | No | Planned |
| Maintenance | Archived | Active |

Project Status

Current (v1.0):

Core parser architecture
DiaParser model trained (86.47% UAS, 81.48% LAS on TOROT)
Command-line interface
Prolog rule framework (foundation laid)

Planned Development:

Neural-symbolic integration (connect DiaParser → Prolog validator)
- Implement error detection for parser hallucinations
- Automatic correction based on Old Slavonic grammar rules
- Following the Coptic parser architecture
Extended Prolog rules (aspect, word order, clitics)
Stanza tokenizer/tagger models (optional enhancement)
Web demo (Hugging Face Spaces)
GUI application
Unified ancient language framework

Credits

Original OldSlavNet:

Nilo Pedrazzini (2020-2021)
Based on the jPTDP architecture
Trained on TOROT treebank

Modernization:

Architecture adapted from Coptic dependency parser
Neural-symbolic integration inspired by Coptic parser's Prolog validation

Data:

TOROT - Tromsø Old Russian and Old Church Slavonic Treebank
Universal Dependencies framework

Contributing

Contributions welcome! Areas needing work:

Model training - Train Stanza models on TOROT
Prolog rules - Expand Old Slavic grammar coverage
Testing - Validation on historical texts
Documentation - Usage examples, tutorials
GUI - Desktop application with visualization

License

CC BY-NC-SA 4.0. See the LICENSE file.

Citation

If you use this parser, please cite:

Original OldSlavNet:

@inproceedings{pedrazzini2020oldslavnet,
  title={Exploiting Cross-Dialectal Gold Syntax for Low-Resource Historical Languages},
  author={Pedrazzini, Nilo},
  booktitle={CHR 2020},
  year={2020}
}

Modernization (paper forthcoming)

Contact

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: relanir@bluewin.ch

Related Projects

Original OldSlavNet - dynet-based version
Coptic Parser - Sister project for Coptic
TOROT - Training data
Stanza - NLP framework
