╔══════════════════════════════════════════════════════════════════════════════════╗
║ ⚠️ LEGAL DISCLAIMER & WARNING — READ BEFORE USE ⚠️ ║
╠══════════════════════════════════════════════════════════════════════════════════╣
║ ║
║ This tool is intended STRICTLY for authorized, ethical, and lawful use only. ║
║ The developer bears NO responsibility for any damage, data loss, legal ║
║ consequences, or misuse arising from direct or indirect use of this software. ║
║ ║
║ • Do NOT use this tool on systems or networks you do not own or have ║
║ explicit written authorization to test. ║
║ • Red team / penetration testing features are for AUTHORIZED ENGAGEMENTS ║
║ only. Unauthorized use may violate local, federal, or international laws. ║
║ • The author is NOT responsible for any illegal activities performed using ║
║ this software. You assume FULL responsibility for your actions. ║
║ • AI output may contain inaccuracies. Always verify code before deployment ║
║ in production or sensitive environments. ║
║ ║
║ By using NEXUS-CODER, you agree to these terms unconditionally. ║
╚══════════════════════════════════════════════════════════════════════════════════╝
NEXUS-CODER is a high-performance, fully offline terminal-based AI assistant engineered for professionals who demand raw speed, precision, and depth. Built on top of huihui-ai/Qwen2.5-Coder-7B-Instruct-abliterated — a fine-tuned, uncensored variant of Alibaba's elite coding model — NEXUS-CODER runs entirely on your hardware with no API costs, no rate limits, and no internet dependency after setup.
Designed with a zero-compromise philosophy: every parameter is tuned to minimize hallucinations and maximize factual, executable output. Whether you're reverse engineering a binary, writing a complex ETL pipeline, solving linear algebra proofs, or scripting automated recon tools — NEXUS-CODER delivers expert-tier answers with live streaming output directly in your terminal.
"Not a chatbot. A co-pilot for those who know what they're doing."
| Component | Minimum |
|---|---|
| OS | Linux (Ubuntu 20.04+), macOS 12+, Windows 10+ (WSL2 recommended) |
| Python | 3.10 or higher |
| RAM | 16 GB system RAM |
| Storage | 20 GB free disk space (model weights) |
| CPU | x86-64, 4+ cores, AVX2 support |
| Internet | Required only for initial model download |
⚠️ CPU-only mode is functional but significantly slower. Expect 2–8 tokens/sec depending on hardware.
| Component | Recommended |
|---|---|
| OS | Ubuntu 22.04 LTS / Debian 12 |
| Python | 3.11 |
| GPU | NVIDIA RTX 3080 / 4070 or better (8 GB+ VRAM) |
| VRAM | 8 GB minimum · 12 GB+ ideal for 4-bit NF4 |
| RAM | 32 GB |
| Storage | NVMe SSD, 25 GB free |
| CUDA | 11.8 or 12.1+ |
| cuDNN | 8.x+ |
| Driver | NVIDIA 525.xx or newer |
✅ With an RTX 3090 or 4090, expect 35–80 tokens/sec with 4-bit NF4 quantization.
╔─────────────────────────────────────────────────────────╗
│ 🏆 BATTLE-TESTED OPTIMAL SETUP │
├─────────────────────────────────────────────────────────┤
│ GPU → NVIDIA RTX 4090 / A100 / 3090 Ti │
│ VRAM → 16–24 GB │
│ RAM → 64 GB DDR5 │
│ CPU → AMD Ryzen 9 / Intel i9 (12+ cores) │
│ Drive → NVMe Gen4 SSD │
│ OS → Ubuntu 22.04 LTS │
│ CUDA → 12.1 │
│ Speed → 60–90 tok/s sustained │
╚─────────────────────────────────────────────────────────╝
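As a rough sanity check on the VRAM figures above, weight memory scales with parameter count times bytes per parameter. The sketch below assumes about 7.6 billion parameters for a 7B-class model (an assumption; check the model card for the exact count) and ignores the KV cache, activations, and quantization metadata, so real usage will be higher. The helper name `weight_memory_gib` is illustrative and not part of NEXUS-CODER.

```python
# Rough weight-memory estimate per precision.
# Assumptions (not from this README): ~7.6e9 parameters, and no allowance
# for KV cache, activations, or quantization metadata.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "nf4": 0.5}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Return the approximate weight memory in GiB for a given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision:>5}: {weight_memory_gib(7.6e9, precision):5.1f} GiB")
```

Under these assumptions the 4-bit weights alone come to roughly 3.5 GiB, which leaves headroom on an 8 GB card for context and runtime overhead.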
git clone https://github.com/im-aswajith/nexus-coder.git
cd nexus-coder
# Create environment
python3 -m venv .venv
# Activate (Linux / macOS)
source .venv/bin/activate
# Activate (Windows)
.venv\Scripts\activate
For NVIDIA GPU (CUDA 12.1):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
For NVIDIA GPU (CUDA 11.8):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
For CPU only:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
💡 Tip: Install `bitsandbytes` for quantization support (strongly recommended). Quote the requirement so the shell does not treat `>` as a redirect:
pip install "bitsandbytes>=0.43.0"
python -c "import torch; print('CUDA:', torch.cuda.is_available()); print('GPU:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A')"
# Default session
python nexus_chat.py
# Named persistent session
python nexus_chat.py --session redteam_ops
# List all saved sessions
python nexus_chat.py --list-sessions
Once launched, you interact directly in the terminal. The input prompt looks like this:
me: █
Type your question and press Enter. The AI streams its response token-by-token in real time:
me: write a python function to find all prime numbers up to n using the sieve method
ai:
## Sieve of Eratosthenes
```python
def sieve_of_eratosthenes(n: int) -> list[int]:
    """
    Return all primes up to n (inclusive).
    Time: O(n log log n) | Space: O(n)
    """
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, int(n**0.5) + 1):
        if is_prime[i]:
            is_prime[i*i::i] = bytearray(len(is_prime[i*i::i]))
    return [i for i, v in enumerate(is_prime) if v]

print(sieve_of_eratosthenes(50))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```
↳ 187 tokens · 54.3 tok/s · 3.44s
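The stats footer reports the token count, throughput, and elapsed time, where throughput is tokens divided by elapsed seconds. A minimal sketch of how such a line could be built follows; `format_stats` is a hypothetical helper, and the unrounded elapsed time of 3.444 s is assumed for illustration.

```python
def format_stats(num_tokens: int, elapsed_s: float) -> str:
    """Build a stats footer: token count, throughput, and elapsed time."""
    # Guard against division by zero for extremely short generations.
    rate = num_tokens / elapsed_s if elapsed_s > 0 else 0.0
    return f"↳ {num_tokens} tokens · {rate:.1f} tok/s · {elapsed_s:.2f}s"


print(format_stats(187, 3.444))
# ↳ 187 tokens · 54.3 tok/s · 3.44s
```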
---
### Built-in Commands
| Command | What It Does |
|---------|-------------|
| `/help` | Display all available commands |
| `/new` | Create a fresh session (auto-saves current) |
| `/list` | Browse all saved sessions |
| `/load <name>` | Switch to a named session |
| `/delete <name>` | Permanently remove a session |
| `/export` | Save current session as a `.md` file |
| `/clear` | Wipe session history (keep session name) |
| `/stats` | Token usage, turn count, context budget |
| `/info` | Live hardware + model config readout |
| `/exit` | Quit gracefully |
---
### Keyboard Shortcuts
| Key | Action |
|-----|--------|
| `↑` / `↓` | Navigate input history |
| `Ctrl + C` | Interrupt / exit |
| `Ctrl + L` | Clear terminal screen |
---
<div align="center">
## 🗂️ Project Structure
</div>
nexus-coder/
│
├── nexus_chat.py        ← Main application entry point
├── requirements.txt     ← Python dependencies
├── README.md            ← This file
│
└── ~/.nexus_chat/       ← Auto-created user data directory
    ├── memory.db        ← SQLite persistent session storage
    ├── input_history    ← Shell-style input history
    └── exports/         ← Markdown exports from /export command
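The layout above stores sessions in a SQLite file. The actual schema of `memory.db` is not documented here, so the following is only a sketch of one way named sessions could be persisted with the standard `sqlite3` module. The table name, columns, and helper functions are all assumptions.

```python
import sqlite3

# Hypothetical schema; the actual layout of memory.db is not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    session TEXT NOT NULL,
    role    TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
    content TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_message(conn: sqlite3.Connection, session: str, role: str, content: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (session, role, content) VALUES (?, ?, ?)",
            (session, role, content),
        )


def last_turns(conn: sqlite3.Connection, session: str, turns: int) -> list[tuple[str, str]]:
    """Return the most recent `turns` user/assistant pairs, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session = ? ORDER BY id DESC LIMIT ?",
        (session, turns * 2),
    ).fetchall()
    return rows[::-1]


def list_sessions(conn: sqlite3.Connection) -> list[str]:
    """Return all session names that have at least one stored message."""
    rows = conn.execute("SELECT DISTINCT session FROM messages ORDER BY session")
    return [row[0] for row in rows]
```

Passing a real file path to `open_db` instead of the in-memory default would give persistence across runs.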
---
<div align="center">
## 🛠️ Anti-Hallucination Configuration
</div>
NEXUS-CODER's sampling settings are tuned for **accurate, near-deterministic output**, which matters most for security research and production code.
| Parameter | Value | Why |
|-----------|-------|-----|
| `temperature` | `0.15` | Near-deterministic output; limits creative drift |
| `top_p` | `0.90` | Nucleus sampling; samples only from the most probable tokens that together cover 90% of the probability mass |
| `repetition_penalty` | `1.15` | Discourages loops, verbose filler, and self-repetition |
| `max_new_tokens` | `2048` | Hard limit on runaway generation |
| `context_window` | `last 10 turns` | Focused, relevant conversation history |
> 🔧 Advanced users: edit the constants at the top of `nexus_chat.py` to tune for your specific workflow.
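As a sketch of how these settings might be expressed in code: the values come from the table above, the names `HISTORY_WIN` and `MAX_NEW` are the constants mentioned in the troubleshooting section, and the dictionary keys match keyword arguments accepted by `generate()` in Hugging Face `transformers`. Whether `nexus_chat.py` is organized this way is an assumption. `GEN_KWARGS`, `trim_history`, and the `do_sample=True` entry are illustrative.

```python
HISTORY_WIN = 10   # turns of conversation kept in context
MAX_NEW = 2048     # hard cap on generated tokens

# Illustrative bundle of sampling settings; it could be passed along as
# model.generate(**inputs, **GEN_KWARGS) when using transformers.
GEN_KWARGS = {
    "do_sample": True,           # assumed: sampling must be on for temperature/top_p to apply
    "temperature": 0.15,
    "top_p": 0.90,
    "repetition_penalty": 1.15,
    "max_new_tokens": MAX_NEW,
}


def trim_history(messages: list[dict], turns: int = HISTORY_WIN) -> list[dict]:
    """Keep any system message plus the last `turns` user/assistant exchanges."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    # A slice of [-0:] would return everything, so handle turns <= 0 explicitly.
    recent = dialogue[-2 * turns:] if turns > 0 else []
    return system + recent
```

Trimming by whole turns rather than by raw message count keeps each user prompt paired with its reply.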
---
<div align="center">
## 🔒 GPU Quantization Strategy
</div>
CUDA Detected?
│
├─ YES ──► bitsandbytes available?
│          │
│          ├─ YES ──► 4-bit NF4 (fastest · least VRAM) ✅
│          └─ NO  ──► BFloat16 (fast · moderate VRAM)
│
└─ NO  ──► bitsandbytes available?
           │
           ├─ YES ──► 8-bit INT8 (CPU · compressed)
           └─ NO  ──► FP32 (CPU · no compression)
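The decision tree above reduces to a function of two capability flags. The sketch below mirrors it. The function names and the returned labels are illustrative, not taken from `nexus_chat.py`.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Report whether a top-level package is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


def choose_precision(cuda_available: bool, bnb_available: bool) -> str:
    """Mirror the decision tree: pick a load mode from two capability flags."""
    if cuda_available:
        return "nf4" if bnb_available else "bf16"
    return "int8" if bnb_available else "fp32"
```

In practice the first flag could come from `torch.cuda.is_available()` and the second from `has_module("bitsandbytes")`. Keeping the choice in a pure function makes all four branches easy to test without a GPU.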
---
<div align="center">
## 🌐 Supported Domains
</div>
<div align="center">
















</div>
---
<div align="center">
## ❓ Troubleshooting
</div>
<details>
<summary><b>🔴 CUDA out of memory error</b></summary>
<br>
Reduce context or lower output limits:
```python
# In nexus_chat.py, edit these constants:
HISTORY_WIN = 6   # fewer turns in context
MAX_NEW = 1024    # fewer output tokens
```
Or free VRAM by closing other GPU applications before launching.
</details>
<details>
<summary><b>🟡 Model download is very slow</b></summary>
<br>
Use a Hugging Face mirror:
export HF_ENDPOINT=https://hf-mirror.com
python nexus_chat.py
</details>
<details>
<summary><b>🟡 bitsandbytes not working on Windows</b></summary>
<br>
Use WSL2 (Ubuntu 22.04) for best compatibility. Native Windows support for bitsandbytes is limited.
# Inside WSL2:
pip install bitsandbytes --prefer-binary
</details>
<details>
<summary><b>🟠 Very slow generation on CPU</b></summary>
<br>
CPU inference is expected to be slow (2–8 tok/s). To speed things up:
- Reduce `MAX_NEW` to `512`
- Reduce `HISTORY_WIN` to `4`
- Consider using `llama.cpp` with the GGUF version of this model for faster CPU inference
- Even a used RTX 3060 (8 GB) will be 8–15x faster than CPU
</details>
<details>
<summary><b>🔵 Context history grows too large</b></summary>
<br>
Use `/clear` to wipe session history while keeping the session name, or `/new` to start completely fresh.
</details>
NEXUS-CODER is a research and professional productivity tool.
The developer, contributors, and any affiliated parties expressly disclaim all liability for:
- Any direct, indirect, incidental, or consequential damages
- Data loss, system compromise, or unauthorized access attempts
- Legal consequences arising from misuse of this software
- Inaccurate, incomplete, or harmful AI-generated content
- Any actions taken based on output produced by this tool
This software is provided "AS IS" without warranty of any kind, express or implied, including but not limited to warranties of merchantability, fitness for a particular purpose, or non-infringement.
Use of red team, security, or offensive features must comply with all applicable laws and must only be performed on systems for which you have explicit written authorization.
The developer is not responsible for any issues, damages, legal actions, or losses — financial, reputational, or otherwise — resulting from the use or misuse of this tool.
You assume full and sole responsibility for all actions taken using this software.
Contributions, issues, and feature requests are welcome!
- Fork the repository
- Create your feature branch: `git checkout -b feature/your-feature`
- Commit your changes: `git commit -m 'Add: your feature description'`
- Push to the branch: `git push origin feature/your-feature`
- Open a Pull Request
Please ensure all code follows PEP 8, includes type hints, and passes basic testing before submitting.
© All Rights Reserved.
This software and its source code are the exclusive intellectual property of the author. No part of this project may be copied, modified, distributed, sublicensed, or used in any form — commercial or otherwise — without explicit written permission from the author.
Viewing the source code on GitHub does not grant any rights to use, reproduce, or build upon it.
This project is independent and not affiliated with Alibaba, Hugging Face, or any other organization. Model weights belong to their respective owners.
A deep thank you to the incredible open-source community that made this possible:
| Project | Contribution |
|---|---|
| Alibaba / Qwen Team | Qwen2.5-Coder base model |
| huihui-ai | Abliterated fine-tune variant |
| Hugging Face | `transformers`, `accelerate`, model hosting |
| bitsandbytes | Quantization engine (4-bit / 8-bit) |
| Rich | Beautiful terminal rendering & syntax highlighting |
| prompt_toolkit | Powerful terminal input handling |
| PyTorch | The deep learning backbone |
Special thanks to every open-source developer, security researcher, and data scientist who shares knowledge freely — you make tools like this possible.
Built with 🖤 for the terminal. By someone who lives in it.