AXEN-M is a transformer-based framework designed for large-scale language modeling, fine-tuning, and efficient attention handling. Built by Vinkura, it integrates Infini-Attention with LoRA-based fine-tuning to provide scalability and optimized performance for long-context tasks.
Advanced Attention Mechanisms
- Infini-Attention implementation for extended sequence handling
- Optimized memory usage with sliding window attention
- Support for sequences up to 32K tokens
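As a rough illustration of the sliding-window idea listed above, the sketch below builds a causal attention mask in plain Python. It is not AXEN-M's implementation: the names `sliding_window_mask`, `attended_pairs`, and the `window` parameter are assumptions made for this example. Each query position attends only to itself and the `window - 1` positions before it, so the work per query is bounded by the window size instead of growing with the sequence length.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Return mask[i][j] == True when query i may attend to key j.

    Hypothetical helper: query i sees keys j with i - window < j <= i.
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [
        [(i - window < j <= i) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask: List[List[bool]]) -> int:
    """Count the query/key pairs that would actually be computed."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    mask = sliding_window_mask(seq_len=6, window=3)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
    print("pairs with window:", attended_pairs(mask))
    print("pairs, full causal:", attended_pairs(sliding_window_mask(6, 6)))
```

Setting `window` equal to the sequence length recovers ordinary causal attention, which makes the two pair counts easy to compare.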
Model Architecture
- LoRA (Low-Rank Adaptation) integration for efficient fine-tuning
- Compatible with LLaMA model architectures
- Modular transformer components
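To show why LoRA makes fine-tuning cheaper, here is a minimal plain-Python sketch of the low-rank update. The pretrained weight `W` stays frozen and only two small matrices `B` and `A` are trained, giving an effective weight `W + (alpha / r) * B @ A`. The function names and the example dimensions are assumptions for illustration and are not taken from the AXEN-M source.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two dense matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where r is the rank shared by B and A."""
    rank = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Return (parameters in the full weight, trainable LoRA parameters)."""
    return d_out * d_in, rank * (d_out + d_in)


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight
    b = [[0.5], [0.0]]            # d_out x r, with r = 1
    a = [[2.0, 4.0]]              # r x d_in
    print(lora_effective_weight(w, b, a, alpha=1.0))
    print(lora_param_counts(4096, 4096, rank=8))
```

For a 4096 x 4096 projection at rank 8 (illustrative numbers), the trainable parameter count drops from about 16.8 million to 65,536.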
Training Capabilities
- 4-bit quantization support
- Gradient checkpointing
- Mixed-precision training
- Distributed training support
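The sketch below illustrates what 4-bit quantization means in the simplest case: symmetric absmax quantization with one shared scale. This document does not say which scheme AXEN-M uses, and production libraries typically use more elaborate block-wise formats, so treat the function names and the scheme itself as assumptions. The last helper estimates storage for the weights alone.

```python
from typing import List, Tuple


def quantize_4bit(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integer codes in [-7, 7] using one shared absmax scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_4bit(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Weight storage in GB (10**9 bytes), ignoring scales, activations and optimizer state."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    weights = [0.7, -0.3, 0.1, 0.0]
    codes, scale = quantize_4bit(weights)
    print("codes:", codes)
    print("restored:", dequantize_4bit(codes, scale))
    print("7B weights at 16 bits:", weight_memory_gb(7_000_000_000, 16), "GB")
    print("7B weights at 4 bits:", weight_memory_gb(7_000_000_000, 4), "GB")
```

The rounding error per weight is at most half the scale, which is the trade-off accepted in exchange for a fourfold reduction in weight storage relative to 16-bit values.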
Development Tools
- Comprehensive data preprocessing pipeline
- Attention pattern visualization
- Performance metrics and evaluation
- Extensive test coverage
System Requirements
- Python 3.8 or higher
- CUDA 11.7+ (for GPU support)
- 16 GB RAM minimum (32 GB recommended)
- Linux, macOS, or Windows (via WSL2)
Prerequisites
- Git
- PyTorch 2.0+
- CUDA toolkit (for GPU support)
- cuDNN (for GPU support)
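The requirements above can be spot-checked before installation with a short script. The one below is a sketch using only the standard library; `check_environment` is a hypothetical helper and not part of AXEN-M. It checks only that each tool is present, not that PyTorch is 2.0+ or that CUDA is 11.7+.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 8)):
    """Return a dict saying whether each listed prerequisite looks available."""
    return {
        "python": sys.version_info[:2] >= tuple(min_python),
        "git": shutil.which("git") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
        "nvcc": shutil.which("nvcc") is not None,  # only needed for GPU support
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:7s} {'ok' if ok else 'missing'}")
```

A missing `nvcc` entry is expected on CPU-only machines and does not by itself indicate a broken setup.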
infini-attention/
│
├── configs/
│ ├── single_node.yaml # Config for single-node training
│ ├── two_node.yaml # Config for two-node distributed training
│ └── zero3_offload.json # DeepSpeed Zero3 offload config
│
├── scripts/
│ └── train.sh # Shell script for training
│
├── src/ # Source code directory
│ ├── __init__.py
│ ├── main.py
│ ├── fine_tune.py
│ ├── llama_model.py
│ ├── temp_utils.py
│ └── train.py
│
├── tests/
│ ├── test_llama_model.py
│ └── test_utils.py
│
├── .gitignore
├── requirements.txt # List of dependencies
├── README.md # Complete project documentation
└── LICENSE
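The contents of `configs/zero3_offload.json` are not shown in this document. As a hedged illustration of what a DeepSpeed ZeRO stage 3 configuration with CPU offload typically contains, the sketch below builds one in Python and prints it as JSON. The helper name and the batch-size values are assumptions; the repository's actual file may differ.

```python
import json


def build_zero3_offload_config(micro_batch_size: int = 1, grad_accum_steps: int = 4) -> dict:
    """Build an illustrative DeepSpeed ZeRO stage 3 config with CPU offload."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,
            # Keep optimizer state and parameters in CPU memory to reduce GPU usage
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            "offload_param": {"device": "cpu", "pin_memory": True},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_zero3_offload_config(), indent=2))
```

Offloading trades GPU memory for host memory and PCIe traffic, so it tends to suit cases where the model does not otherwise fit on the device.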
- Clone the repository:
git clone https://github.com/vinkuraai/axen-m.git
cd axen-m
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Linux/macOS
# or
venv\Scripts\activate # On Windows
- Install the package:
# For basic installation
pip install -e .
# For development installation with extra tools
pip install -e ".[dev]"
- Download CUDA Toolkit 11.7 or higher from the NVIDIA website
- Install CUDA Toolkit following NVIDIA's instructions
- Verify CUDA installation:
nvcc --version
Install PyTorch built for CUDA 11.7:
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
Run the verification script:
python -c "from axen_m import Model; print(Model.version_check())"
# Add CUDA to PATH (Linux/macOS)
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# Add CUDA to PATH (Windows)
set PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin;%PATH%
# Reinstall without the pip cache or build isolation
pip install -e . --no-cache-dir --no-build-isolation
- Install development dependencies:
pip install -e ".[dev,test,docs]"
- Install pre-commit hooks:
pre-commit install
- Set up the documentation build environment:
cd docs
make html
# Required for distributed training
export MASTER_ADDR="localhost"
export MASTER_PORT="12355"
export WORLD_SIZE="1"
export LOCAL_RANK="0"
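# Optional: debugging and GPU selection
# CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous; useful for debugging, but it slows training
export CUDA_LAUNCH_BLOCKING="1"
export CUDA_VISIBLE_DEVICES="0,1"
Create config.yaml in your project root: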
model:
  type: "llama"
  size: "7b"
  quantization: 4
training:
  batch_size: 32
  gradient_accumulation: 4
  mixed_precision: "fp16"
system:
  num_workers: 4
  pin_memory: true
Solution: Update your GPU drivers or modify CUDA architecture flags:
export TORCH_CUDA_ARCH_LIST="7.5;8.0;8.6"
pip install -e .
Solution: Enable gradient checkpointing in your configuration:
model_config = {
    "gradient_checkpointing": True,
    "max_memory": {0: "12GB"}
}

| Model Size | Sequence Length | Memory Usage | Training Speed | Inference Speed |
|---|---|---|---|---|
| 7B | 2048 | 14GB | 32 samples/s | 50 tokens/s |
| 7B | 8192 | 20GB | 24 samples/s | 45 tokens/s |
| 7B | 32768 | 28GB | 16 samples/s | 35 tokens/s |
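Because the table reports training speed in samples per second, rows with different sequence lengths are hard to compare directly. The sketch below converts each row to training tokens per second, assuming every sample is one full-length sequence, which the table does not state. The row values are copied from the table; the function name is an assumption for this example.

```python
# Rows copied from the benchmark table:
# (sequence length, memory in GB, training samples per second)
BENCHMARKS = [
    (2048, 14, 32),
    (8192, 20, 24),
    (32768, 28, 16),
]


def training_tokens_per_second(seq_len: int, samples_per_second: float) -> float:
    """Convert samples/s to tokens/s, assuming each sample is a full-length sequence."""
    return seq_len * samples_per_second


if __name__ == "__main__":
    for seq_len, memory_gb, samples in BENCHMARKS:
        tokens = training_tokens_per_second(seq_len, samples)
        print(f"{seq_len:>6} tokens/sample  {memory_gb:>3} GB  {tokens:>9,.0f} training tokens/s")
```

Under that assumption, token throughput rises with sequence length even though samples per second falls.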
We welcome contributions!
- Code style and standards
- Pull request process
- Development setup
- Testing requirements
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
This project is licensed under the MIT License. See LICENSE for details.
@software{axen_m2025,
  author = {VinkuraAI},
  title = {AXEN-M: Attention eXtended Efficient Network},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/vinkuraAI/axen-m}
}
- Documentation: https://axen-m.readthedocs.io/
- Issue Tracker: GitHub Issues
- Discussions: GitHub Discussions
- The LLaMA team for their foundational work
- The LoRA authors for their efficient fine-tuning approach
- The open-source AI community

