FLAI – a fully local personal assistant powered by artificial intelligence. Run your own AI stack entirely on-premises with no cloud dependencies.
- **Intelligent Chat** – smart request routing (fast models for simple queries, powerful models for complex reasoning)
- **Advanced Reasoning** – dedicated model for calculations, code generation, creative writing
- **Multimodal Analysis** – upload images and ask questions about their content (llama.cpp + mmproj)
- **Image Generation** – create images from text using stable-diffusion.cpp with automatic prompt optimization
- **Image Editing** – upload an image and ask to edit it (Flux.2 Klein 4B model: change colors, remove objects, stylize)
- **Voice Transcription** – convert voice messages to text using Whisper ASR (faster_whisper)
- **Text-to-Speech** – hear responses spoken aloud via Piper TTS (male/female voice)
- **RAG with Qdrant** – upload documents (PDF, DOC, DOCX, TXT) and ask questions about their content
- **Chat Sessions** – multiple independent conversations with auto-titling
- **Export Chats** – save conversations as HTML files with embedded media
- **Camera Surveillance** – request snapshots from IP cameras and analyze them with multimodal models
- **Access Control** – granular camera permissions per user via admin panel
- **100% Local** – all processing happens on your hardware; no data leaves your network
- **Session-based Auth** – secure user authentication with password hashing (Werkzeug)
- **File Access Control** – uploaded files are served only to authorized users
- **Data Isolation** – each user's sessions, messages, and documents are strictly separated
- **CSRF Protection** – Cross-Site Request Forgery protection for all forms
- **Rate Limiting** – brute-force attack protection on login (5 attempts/minute)
- **Session Security** – HttpOnly and SameSite cookies, secure flag for HTTPS
- **Audit Logging** – login attempts and admin actions are logged
- **HMAC-signed Queue** – Redis queue tasks are signed to prevent tampering
- **Input Validation** – strict validation of user inputs (logins, passwords, model parameters) to prevent injection attacks and malformed data
- **Multi-language Support** – full interface and AI responses in Russian and English
- **Dark/Light Theme** – toggle between themes with persistent preference storage
- **Voice Gender Selection** – choose male or female voice for TTS responses
- **Request Queue** – real-time status tracking with position indicators for queued requests
- **File Attachments** – support for images, audio files, and documents in conversations
- **Notifications** – unread message indicators and blinking status icons for processing/queued requests
- **User Management** – add, edit, delete users; change passwords; assign service classes
- **Camera Permissions** – control which users can access which cameras (optional)
- **Model Management** – select and configure GGUF models for chat, reasoning, multimodal, and embedding directly from the admin panel
- **Backup & Restore** – create and restore full or user-only backups directly from the admin interface
- **System Monitoring** – view database sizes and system statistics
- **CLI Tools** – manage admin password via Flask CLI command
FLAI v8.0 is a modular Flask application that orchestrates self-hosted AI services built on the llama.cpp ecosystem.
| v7.5 (Old) | v8.0 (New) | Notes |
|---|---|---|
| Ollama | llama.cpp (router mode) | Single server, dynamic model switching via `--models-dir` |
| Automatic1111 | stable-diffusion.cpp | Z_image_turbo (generation), Flux.2 Klein 4B (editing) |
| Ollama `/api/chat` | OpenAI-compatible `/v1/chat/completions` | Standard API format |
| Ollama `/api/embed` | OpenAI-compatible `/v1/embeddings` | Standard API format |
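Because both endpoints follow the OpenAI wire format, any OpenAI-compatible client can talk to the router directly. A minimal sketch with curl, assuming the router port (8033) is published to the host and that model names match the GGUF files the router discovered (check `/v1/models` if unsure):

```bash
# Chat completion against the llama.cpp router (OpenAI-compatible)
curl http://localhost:8033/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3-4B-Instruct-2507-Q4_K_M",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'

# Embeddings use the same convention
curl http://localhost:8033/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "bge-m3-Q8_0", "input": "test sentence"}'
```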
| Component | Purpose | Technology | Default Port |
|---|---|---|---|
| Flask Web | Web interface, routing, API | Python | 5000 |
| llama.cpp | LLM inference (chat, reasoning, multimodal, embedding) | C++ + CUDA | 8033 |
| stable-diffusion.cpp | Image generation (Z_image_turbo) and editing (Flux.2 Klein 4B) | C++ + CUDA | 7860 |
| Whisper ASR | Speech-to-text transcription | faster_whisper | 9000 |
| Piper TTS | Text-to-speech synthesis | ONNX + Piper | 18888 |
| Qdrant | Vector database for RAG | Rust | 6333 |
| Redis | Request queue management | C | 6379 |
| PostgreSQL | User accounts, sessions, messages | SQL | 5432 |
| Resource Manager | Adaptive GPU/CPU/RAM management, prevents OOM errors, coordinates GPU access | Python | – |
| Circuit Breaker | Prevents cascading failures by blocking calls to failing services (llama.cpp, sd.cpp, Whisper) after repeated errors | Python | – |
All services run on one machine with GPU sharing:
```
┌────────────────────────────────────────────────────────┐
│                   FLAI Web (Flask)                     │
│        Redis Queue → Model Router → Response           │
└────────┬───────────────┬───────────────┬───────────────┘
         │               │               │
         ▼               ▼               ▼
    llama.cpp         sd.cpp      Whisper/Piper/Qdrant
      :8033           :7860       (separate containers)
(router mode: dynamic model switching)
```
**Router Mode:** llama.cpp runs in `--models-dir` mode, dynamically loading and unloading GGUF models from a shared directory. Only one model occupies VRAM at a time, with automatic switching on demand.
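For reference, the underlying server invocation looks roughly like this; a sketch built only from the flags this document mentions (the authoritative arguments live in the compose files):

```bash
# Hypothetical router-mode invocation; see docker-compose.gpu.yml for the real one.
# --models-dir: directory scanned for GGUF models
# --models-max 1: keep at most one model resident in VRAM
llama-server --models-dir /models --models-max 1 --host 0.0.0.0 --port 8033
```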
| Component | Minimum | Recommended | Optimal |
|---|---|---|---|
| RAM | 16 GB | 32 GB | 32+ GB |
| CPU | 4 cores | 4+ cores | 8+ cores |
| GPU | NVIDIA 8-12 GB VRAM | NVIDIA 16 GB VRAM | NVIDIA 24+ GB VRAM |
| Storage | 40 GB | 60+ GB SSD | 100+ GB SSD NVMe |
- Linux server with NVIDIA GPU (CUDA support required)
- NVIDIA drivers installed on host
- NVIDIA Container Toolkit installed
- Docker Engine ≥ 20.10
- Docker Compose ≥ 2.0
- Internet connection (only for initial model downloads)
> **Note:** After downloading GGUF models, FLAI works completely offline.
FLAI can operate on CPU-only servers using automatic detection in the deployment script; a sketch of that check follows the list below. When no NVIDIA GPU is found, the script uses CPU-optimized images for llama.cpp and stable-diffusion.cpp. Performance will be significantly slower, but all features remain functional.
- Chat and reasoning: works, but may be 3-10x slower.
- Image generation: works, but generation time can be 10-30 minutes per image.
- Voice processing (Whisper, Piper) and document search (RAG) are unaffected.
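The autodetection boils down to probing for a working `nvidia-smi`. A simplified sketch of the idea; the authoritative logic lives in `deploy.sh`:

```bash
# Pick the compose file based on whether a usable NVIDIA GPU is present
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  COMPOSE_FILE=docker-compose.gpu.yml
else
  COMPOSE_FILE=docker-compose.cpu.yml
fi
docker compose -f "$COMPOSE_FILE" up -d
```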
To force CPU mode even if a GPU is present, you can manually run:
```bash
docker compose -f docker-compose.cpu.yml --profile with-image-gen --profile with-voice --profile with-rag up -d
```

> **Note:** For GPU deployments, you must have the NVIDIA drivers and NVIDIA Container Toolkit installed.
A single deployment script handles everything: environment setup, model downloads, building, and launching.
```bash
git clone https://github.com/barval/flai.git
cd flai

# Core chat + llama.cpp only
./deploy.sh --download-models

# + Image generation/editing
./deploy.sh --download-models --with-image-gen

# + Voice (Whisper ASR + Piper TTS)
./deploy.sh --download-models --with-image-gen --with-voice

# Everything including RAG (Qdrant)
./deploy.sh --download-models --with-image-gen --with-voice --with-rag

# Run tests after deployment
./deploy.sh --download-models --with-image-gen --run-tests
```

If you prefer step-by-step control:
```bash
# Clone the repository
git clone https://github.com/barval/flai.git
cd flai

# Create directories and set the owner
sudo mkdir -p data \
  data/uploads \
  data/documents
sudo chown -R 1000:1000 data

# Copy environment template
cp .env.example .env

# Generate a secure secret key
sed -i "s|^SECRET_KEY=.*|SECRET_KEY=$(python3 -c "import secrets; print(secrets.token_hex(32))")|" .env

# Generate an API key for Qdrant
sed -i "s|^QDRANT_API_KEY=.*|QDRANT_API_KEY=$(python3 -c "import secrets; print(secrets.token_hex(32))")|" .env

# Edit .env with your settings (timezone, API URLs, etc.)
nano .env
```

```bash
# Create the shared model directory for the llama.cpp router
mkdir -p services/llamacpp/models
# Chat model (fast responses)
wget -O services/llamacpp/models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf \
"https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF/resolve/main/Qwen3-4B-Instruct-2507-Q4_K_M.gguf"
# Reasoning model (complex tasks)
wget -O services/llamacpp/models/gpt-oss-20b-Q4_K_M.gguf \
"https://huggingface.co/unsloth/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b-Q4_K_M.gguf"
# Multimodal model (image analysis) β must be in subdirectory with mmproj
mkdir -p services/llamacpp/models/Qwen3VL-8B-Instruct-Q4_K_M
wget -O services/llamacpp/models/Qwen3VL-8B-Instruct-Q4_K_M/Qwen3VL-8B-Instruct-Q4_K_M.gguf \
"https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct-GGUF/resolve/main/Qwen3VL-8B-Instruct-Q4_K_M.gguf"
wget -O services/llamacpp/models/Qwen3VL-8B-Instruct-Q4_K_M/mmproj-F16.gguf \
"https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct-GGUF/resolve/main/mmproj-Qwen3VL-8B-Instruct-F16.gguf"
# Embedding model (RAG)
wget -O services/llamacpp/models/bge-m3-Q8_0.gguf \
"https://huggingface.co/gpustack/bge-m3-GGUF/resolve/main/bge-m3-Q8_0.gguf"mkdir -p services/sd_cpp/models/{diffusion_models,vae,text_encoders}
# Diffusion model
wget -O services/sd_cpp/models/diffusion_models/z_image_turbo-Q8_0.gguf \
"https://huggingface.co/bartowski/Z-Image-Turbo-GGUF/resolve/main/z_image_turbo-Q8_0.gguf"
# VAE
wget -O services/sd_cpp/models/vae/ae.safetensors \
"https://huggingface.co/bartowski/Z-Image-Turbo-GGUF/resolve/main/ae.safetensors"
# LLM text encoder (shared with chat)
wget -O services/llamacpp/models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf \
"https://huggingface.co/bartowski/Qwen3-4B-Instruct-2507-GGUF/resolve/main/Qwen3-4B-Instruct-2507-Q4_K_M.gguf"# Diffusion model for editing
wget -O services/sd_cpp/models/diffusion_models/flux-2-klein-4b-Q8_0.gguf \
"https://huggingface.co/bartowski/FLUX.2-Klein-dev-GGUF/resolve/main/flux-2-klein-4b-Q8_0.gguf"
# VAE for editing
wget -O services/sd_cpp/models/vae/flux2_ae.safetensors \
"https://huggingface.co/bartowski/FLUX.2-dev-GGUF/resolve/main/flux2_ae.safetensors"The text encoder
Qwen3-4B-Instruct-2507-Q4_K_M.ggufis shared between generation and editing. Download it once.
> **Important:** Multimodal models must be placed in a subdirectory named after the model, with the `mmproj-*.gguf` file inside. The llama.cpp router automatically discovers and loads the projector.
```bash
# Chat and reasoning only (no image generation)
docker compose -f docker-compose.gpu.yml up -d

# With image generation
docker compose -f docker-compose.gpu.yml --profile with-image-gen up -d

# With voice features
docker compose -f docker-compose.gpu.yml --profile with-voice up -d

# Full stack: chat + images + voice + RAG
docker compose -f docker-compose.gpu.yml --profile with-image-gen --profile with-voice --profile with-rag up -d
```

> **Note:** The first build takes time: stable-diffusion.cpp is compiled from source (~5-10 minutes). Subsequent builds use the cache.
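Once the containers are up, a quick status check confirms that everything started (the health endpoint is described in the Monitoring section below):

```bash
# List service states for the chosen compose file
docker compose -f docker-compose.gpu.yml ps

# The web container answers on /health once it is ready
curl http://localhost:5000/health
```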
```bash
# Set the initial admin password
docker exec flai-web flask admin-password YourSecurePassword123
```

1. Open `http://localhost:5000` and log in as `admin`
2. Go to the Admin Panel → Models tab
3. For each module (Chat, Reasoning, Multimodal, Embedding):
   - Check the **Local** checkbox (URL auto-fills to `http://flai-llamacpp:8033`)
   - Click **Refresh** to load available models from the llama.cpp router
   - Select the GGUF model from the dropdown
   - Adjust parameters if needed (Context Length, Temperature, Top P, Timeout)
   - Click **Save**
4. For Image Generation: ensure `SD_CPP_URL=http://flai-sd:7860` is set in `.env`
You can now:

- Have conversations with AI
- Generate images (if stable-diffusion.cpp is configured)
- Send voice messages and listen to responses (if Piper/Whisper is configured)
- Upload documents for search (if RAG profile is enabled)
**Required:**

```env
SECRET_KEY=your_secret_key_here   # Flask session secret
TIMEZONE=Europe/Moscow            # Your timezone
```

**Service URLs:**

```env
LLAMACPP_URL=http://flai-llamacpp:8033 # llama.cpp router (replaces Ollama)
SD_CPP_URL=http://flai-sd:7860 # stable-diffusion.cpp server
WHISPER_API_URL=http://flai-whisper:9000/asr
PIPER_URL=http://flai-piper:8888/tts
QDRANT_URL=http://flai-qdrant:6333
QDRANT_API_KEY=your_qdrant_api_key
CAMERA_API_URL=http://flai-room-snapshot-api:5000
```

**Image Generation Defaults:**

```env
SD_CPP_DEFAULT_WIDTH=1024
SD_CPP_DEFAULT_HEIGHT=1024
SD_CPP_DEFAULT_CFG_SCALE=1.0 # 1.0 for flow-matching models (Z_image_turbo)
SD_CPP_DEFAULT_STEPS=10 # 10 for Z_image_turbo
SD_CPP_TIMEOUT=300
```

**Service Retry Settings:**

```env
SERVICE_RETRY_ATTEMPTS=15
SERVICE_RETRY_DELAY=2
```

**Session Security:**

```env
# Set to true ONLY when deployed behind reverse proxy (nginx) with HTTPS enabled
HTTPS_ENABLED=false
PERMANENT_SESSION_LIFETIME=28800   # 8 hours
```

**Redis Queue:**

```env
REDIS_RESULT_TTL=3600
QUEUE_MAX_WAIT_TIME=300
```

**Debug:**

```env
DEBUG_API_ENABLED=false   # Set to 'true' only for development/testing
```

**Gunicorn Settings (Dockerfile):**

```dockerfile
CMD ["gunicorn", \
"--bind", "0.0.0.0:5000", \
"--workers", "1", \
"--threads", "4", \
"--worker-class", "gthread", \
"--timeout", "120", \
"--keep-alive", "5", \
"wsgi:app"]Why 1 worker Γ 4 threads?
- Minimal RAM usage (+40MB vs 1/1)
- Handles 4 concurrent connections
- Optimal for I/O-bound workloads (waiting for AI responses)
- Saves 280MB vs 4 workers
```bash
# Start all services
docker compose -f docker-compose.gpu.yml --profile with-image-gen --profile with-voice --profile with-rag up -d

# Chat + voice only
docker compose -f docker-compose.gpu.yml --profile with-voice up -d

# Chat only (no images, no voice)
docker compose -f docker-compose.gpu.yml up -d

# Stop all services
docker compose -f docker-compose.gpu.yml down --remove-orphans

# View logs
docker compose -f docker-compose.gpu.yml logs -f web
```

llama.cpp runs in router mode (`--models-dir`), dynamically loading models from a shared directory:
```
services/llamacpp/models/
├── Qwen3-4B-Instruct-2507-Q4_K_M.gguf    # Chat
├── gpt-oss-20b-Q4_K_M.gguf               # Reasoning
├── bge-m3-Q8_0.gguf                      # Embedding
└── Qwen3VL-8B-Instruct-Q4_K_M/           # Multimodal (subdirectory!)
    ├── Qwen3VL-8B-Instruct-Q4_K_M.gguf
    └── mmproj-F16.gguf                   # Vision projector
```
> **Important:** Multimodal models require a subdirectory with the projector file named `mmproj-*.gguf` inside. The router auto-discovers and loads it.
1. Log in as admin and go to `/admin` → Models tab
2. For each module (Chat, Reasoning, Multimodal, Embedding):
   - **Step 1:** Check **Local** (URL auto-fills to `http://flai-llamacpp:8033`)
   - **Step 2:** Click **Refresh** to fetch models from the router
   - **Step 3:** Select the model, set parameters, click **Save**
> **Note:** Changing the embedding model triggers automatic re-indexing of all documents.
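The Refresh button queries the router's model list. You can run the same check from the shell to verify that your GGUF files were discovered (assumes port 8033 is published to the host):

```bash
# List the models the llama.cpp router has found (OpenAI-compatible endpoint)
curl http://localhost:8033/v1/models
```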
| Parameter | Chat | Reasoning | Multimodal | Embedding |
|---|---|---|---|---|
| Context Length | 8192 | 32768 | 8192 | 512 |
| Temperature | 0.1 | 0.7 | 0.7 | – |
| Top P | 0.1 | 0.9 | 0.9 | – |
| Timeout (s) | 60 | 300 | 120 | 30 |
The project uses Z_image_turbo as the only image generation model:
| Model | Steps | CFG Scale | Resolution | Notes |
|---|---|---|---|---|
| Z_image_turbo | 10 | 1.0 | 1024×1024 | Fast, flow-matching |
Configure via `SD_MODEL_TYPE` in `.env`:

```env
SD_MODEL_TYPE=z_image_turbo
```

Upload an image and ask to edit it (e.g., "change the pupils to green", "remove the second sun"). The system uses:
- Multimodal model (Qwen3VL) to analyze the image and generate an edit prompt
- Flux.2 Klein 4B model via stable-diffusion.cpp to perform the edit
- The original image is preserved except for the requested changes
Editing uses separate model files and runs independently from generation – no conflict between the two.
The sd_cpp service is built from source during the first `docker compose up`:

- Clones `https://github.com/leejet/stable-diffusion.cpp`
- Initializes git submodules (`ggml`, `thirdparty/*`)
- Compiles with CUDA 12.8.1 (`cmake -DSD_CUDA=ON`)
- Produces `sd-server` and `sd-cli` binaries
> **Note:** First build: ~5-10 minutes depending on CPU. Subsequent builds use the Docker cache.
```env
# sd-wrapper HTTP API (port 7861)
SD_WRAPPER_URL=http://flai-sd:7861
SD_CPP_TIMEOUT=900   # Timeout for gen/edit operations (seconds)
```

Uses `onerahmet/openai-whisper-asr-webservice` (faster_whisper engine).
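Once the with-voice profile is running (see the command below), the ASR endpoint can be exercised directly. A sketch using the webservice's multipart API; `voice-message.ogg` is a placeholder, and port 9000 is assumed to be published to the host:

```bash
# Transcribe a local audio file via the Whisper ASR webservice
curl -F "audio_file=@voice-message.ogg" \
  "http://localhost:9000/asr?task=transcribe&output=json"
```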
```bash
# Enable voice features
docker compose -f docker-compose.gpu.yml --profile with-voice up -d
```

Uses ONNX Piper models for text-to-speech.
```bash
# Download voice models. Note: Piper also needs each voice's .onnx.json
# config file alongside the .onnx model.
mkdir -p services/piper/models

# English (female)
curl -L -o services/piper/models/en_US-lessac-medium.onnx \
  "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx"
curl -L -o services/piper/models/en_US-lessac-medium.onnx.json \
  "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json"

# Russian (male)
curl -L -o services/piper/models/ru_RU-ruslan-medium.onnx \
  "https://huggingface.co/rhasspy/piper-voices/resolve/main/ru/ru_RU/ruslan/medium/ru_RU-ruslan-medium.onnx"
curl -L -o services/piper/models/ru_RU-ruslan-medium.onnx.json \
  "https://huggingface.co/rhasspy/piper-voices/resolve/main/ru/ru_RU/ruslan/medium/ru_RU-ruslan-medium.onnx.json"
```
---
## RAG (Document Search) Setup
### 1. Configure RAG in Admin Panel
After starting the services, log in as admin and go to the **Admin Panel → Models** tab. Scroll down to the **Chunks** section. Here you can fine-tune RAG behavior:
- **Chunk Size (characters):** How documents are split into pieces for indexing.
- **Chunk Overlap (characters):** Number of overlapping characters between consecutive chunks.
- **Chunk Strategy:** `fixed` (by character count) or `recursive` (by headings/paragraphs).
- **Number of chunks (top_k):** Maximum number of chunks to retrieve from Qdrant per query.
- **Threshold (documents):** Minimum similarity score for general document queries.
- **Threshold (reasoning):** Minimum similarity score when RAG is triggered from a reasoning request.
Click **Save** to apply changes. If chunking parameters (size or strategy) are modified, a background reindex of all documents is triggered automatically.
> **Note:** Environment variables like `RAG_CHUNK_SIZE` in `.env` are only used as initial defaults before the first configuration save. The primary configuration is stored in the database.
### 2. Enable in Docker Compose
```bash
docker compose -f docker-compose.gpu.yml --profile with-rag up -d
```
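To confirm Qdrant is reachable and honoring the API key from `.env`, list its collections (empty until the first document is indexed; assumes port 6333 is published):

```bash
# Qdrant REST API: list vector collections
curl -H "api-key: $QDRANT_API_KEY" http://localhost:6333/collections
```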
### 3. Upload Documents

1. Log in to the web interface
2. Click the **Documents** tab in the sidebar
3. Click **+** to upload PDF, DOC, DOCX, or TXT files
4. Wait for indexing to complete (status: Indexed)
The camera module connects to a separate room-snapshot-api service. See services/README.md and services/room-snapshot-api/README.md for deployment guides.
```env
CAMERA_API_URL=http://flai-room-snapshot-api:5000
CAMERA_ENABLED=true
CAMERA_API_TIMEOUT=15
CAMERA_CHECK_INTERVAL=30
```

In Admin Panel → Users tab, assign camera codes:
tam (tambour), hal (hallway), cor (corridor), spa (bedroom),
off (office), chi (children's), liv (living room), kit (kitchen), bal (balcony)
| Feature | Description |
|---|---|
| User Operations | Create, edit, delete user accounts |
| Password Management | Reset passwords for any user |
| Camera Permissions | Grant/revoke camera access per user |
| Model Management | Configure GGUF models per module type |
| System Stats | Monitor database and storage sizes |
| Service Classes | Set queue priority (0 = highest, 2 = lowest) |
```bash
# Set admin password
docker exec flai-web flask admin-password NewPassword123

# View help
docker exec flai-web flask --help
```

FLAI includes a built-in backup system accessible from the Admin Panel → Backups tab.
**Backup Types:**

- **Users only:** Backs up the `users` table only (user accounts, permissions, settings).
- **Full:** Backs up all data: users, chat sessions, messages, documents, uploaded files, and model configurations.

**Operations:**

- **Create:** Select the backup type and click "Create backup". The archive is saved to `data/db_backups/`.
- **Restore:** Click "Restore" on a backup file to replace the current database and files with the backup content. **Warning:** This overwrites existing data.
- **Download:** Download the backup archive to your local machine.
- **Delete:** Remove old backup files.
Backup files are stored as .tar.gz archives containing SQL dumps and file directories. Restoration requires confirmation and is logged for audit purposes.
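Since backups are ordinary tar.gz archives, they can be inspected or shipped off-box with standard tools. A sketch; the archive name and remote host are placeholders:

```bash
# Inspect a backup archive without restoring it
tar -tzf data/db_backups/flai_backup_full.tar.gz | head

# Copy backups to another machine as part of an off-site routine
rsync -av data/db_backups/ backup-host:/srv/flai-backups/
```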
```bash
curl http://localhost:5000/health
```

Response:

```json
{
{
"status": "ok",
"timestamp": "2026-04-08T23:00:00.000000+00:00",
"services": {
"web": "ok",
"database": "ok",
"redis": "ok",
"llamacpp": "ok"
}
}
```

```bash
curl http://localhost:5000/metrics
```
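For unattended monitoring, the health endpoint composes nicely with `jq`. A minimal cron-style probe, assuming `jq` is installed:

```bash
# Exit non-zero (and complain) if any FLAI subsystem is not "ok"
curl -fsS http://localhost:5000/health \
  | jq -e '.status == "ok" and all(.services[]; . == "ok")' >/dev/null \
  || echo "FLAI health check failed" >&2
```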
Key changes in v8.0:

- llama.cpp router mode (`--models-dir`) replaces Ollama – single server with dynamic model switching
--models-dir) replaces Ollama β single server with dynamic model switching - stable-diffusion.cpp replaces Automatic1111 β Z-Image-Turbo for generation, Flux.2 Klein 4B for editing
- OpenAI-compatible API (
/v1/chat/completions,/v1/embeddings) - Multimodal support via mmproj in subdirectories
- Dynamic model switching with
--models-max 1 - Individual model parameters via
models-preset.ini - GGUF model management via admin panel
- All translations updated for llama.cpp terminology
- Piper TTS optimization for large text synthesis β chunked processing with seamless audio transitions
Planned for future releases:

- Long-term dialog memory (cross-session context)
- Advanced RAG: metadata filtering, hybrid search
- Mobile-responsive UI optimizations
- Plugin architecture for custom modules
- Multi-GPU support
- Advanced queue prioritization
- User activity analytics
| Model | Purpose | License | Approx. Size |
|---|---|---|---|
| Qwen3-4B-Instruct-2507-Q4_K_M | Chat (fast responses) | Qwen License | ~2.5 GB |
| gpt-oss-20b-Q4_K_M | Reasoning (complex tasks) | OpenAI License | ~12 GB |
| Qwen3VL-8B-Instruct-Q4_K_M | Multimodal (image analysis) | Qwen License | ~5 GB + mmproj ~1.1 GB |
| bge-m3-Q8_0 | Embedding (RAG) | MIT License | ~0.6 GB |
| Model | Purpose | License | Approx. Size |
|---|---|---|---|
| Z-Image-Turbo (z_image_turbo-Q8_0) | Image generation | Model-specific | ~6.2 GB |
| ae.safetensors (VAE) | Variational autoencoder for Z-Image | Model-specific | ~0.3 GB |
| Qwen3-4B-Instruct-2507-Q4_K_M | Text encoder for Z-Image | Qwen License | ~2.5 GB (shared with chat) |
| Model | Purpose | License | Approx. Size |
|---|---|---|---|
| Flux.2 Klein 4B (flux-2-klein-4b-Q8_0) | Image editing (change colors, remove objects, stylize) | Flux License | ~4.5 GB |
| flux2_ae.safetensors | VAE for Flux.2 editing | Flux License | ~0.3 GB |
| Model | Purpose | License | Approx. Size |
|---|---|---|---|
| en_US-lessac-medium | English TTS (female) | BSD-3-Clause (Piper) | ~75 MB |
| ru_RU-ruslan-medium | Russian TTS (male) | BSD-3-Clause (Piper) | ~75 MB |
| Whisper medium | Speech recognition | MIT (OpenAI) | ~1.5 GB |
| Configuration | Approx. Download |
|---|---|
| Chat only (Qwen3-4B) | ~2.5 GB |
| Chat + Reasoning | ~14.5 GB |
| Chat + Multimodal | ~8 GB |
| Full LLM stack | ~22 GB |
| + Image generation | ~28 GB |
| + Image editing | ~31 GB |
| + Voice (TTS + Whisper) | ~35 GB |
Note: After downloading models, FLAI works completely offline. No external scripts or modules are loaded at runtime.
FLAI includes comprehensive testing for all key components and load testing for the web interface.
```bash
# Run all tests
pytest

# Run with coverage report
pytest --cov=app --cov=modules --cov-report=html

# Run specific test category
pytest tests/test_admin_routes.py
pytest tests/test_image_module.py
pytest tests/test_sd_cpp_module.py
pytest tests/test_queue.py
pytest tests/test_security.py
pytest tests/test_integration.py
```

Load tests use Locust to simulate concurrent users.
```bash
# Install Locust (if not already installed)
pip install locust

# Web interface – open http://localhost:8089
locust -f tests/load/locustfile.py --host http://localhost:5000

# Headless mode – 10 users, spawn 2/sec, run 1 minute
locust -f tests/load/locustfile.py --headless -u 10 -r 2 --run-time 1m

# Using the convenience script
./tests/load/run_load_test.sh --host http://localhost:5000 --users 10 --spawn-rate 2 --run-time 1m
```

See `tests/load/README.md` for detailed load testing instructions.
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
MIT License. See LICENSE for details.