Datasets Deputy is a desktop workspace for organizing, annotating, checking, and exporting image training datasets. It is built with Tauri 2, React, TypeScript, Rust, and SQLite, and aims to bring image preview, folder management, multi-version annotations, batch editing, AI-assisted annotation, and pre-training checks into one local tool.
The project is still under active development. Until it leaves Beta, changes to features and data structures prioritize clean source code, performance, and real-world workflows over backward compatibility.
- Dataset browsing: import or mount image folders, then work through a project tree, grid view, table view, and single-image preview.
- Multiple data modes: asset databases, dynamic linked databases, and workspace folders cover archival, indexed, and direct folder-editing workflows.
- Annotation and instruction editing: edit per-image annotations and table drafts, with unsaved-state tracking, exit guards, and multiple annotation types for different models or targets.
- Batch text operations: add fields, find and replace text, normalize annotations, convert between Booru Tag / Anima / natural language formats, and rewrite natural-language annotations.
- AI-assisted annotation: supports Gemini, OpenAI, Anthropic, Grok, local LM Studio, Ollama, Textgen, and WD14-style taggers.
- Local model support: configure Python runtimes, managed virtual environments, WD14 models, CLIP similarity models, and PyTorch / ONNX Runtime dependencies.
- Pre-training tools: image format validation, training cache cleanup, and duplicate / similar image detection.
- Import and export: dataset import/export, SQLite database import/export, and database zip packages with images.
- History: undo and redo for common text edits, batch actions, annotation type operations, and file organization operations.
Asset databases copy source images into the app-managed asset library and store image indexes, annotation types, annotation text, and instructions in SQLite. This mode fits long-term archives, cross-device migration, and datasets that should not depend on the original source paths.
Dynamic linked databases store image indexes, annotation types, annotation text, and instructions in SQLite while reading images from their original paths. This mode fits datasets whose images still change frequently while annotations need to be managed centrally by Datasets Deputy.
Workspace folders mount a local folder directly and stay close to native file-manager behavior. Annotations are written as same-name .txt files beside each image, and per-image instructions are written as same-name .inst.txt files. Removing a mounted path does not delete local files. Actual delete or rename operations inside a workspace folder, however, are applied to the image together with its related annotation and instruction files.
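The same-name sidecar convention described above can be sketched as follows. This is a minimal illustration, not the app's actual implementation (which lives in the Rust backend); the function names `sidecar_paths` and `rename_with_sidecars` are hypothetical.

```python
from pathlib import Path


def sidecar_paths(image_path: Path) -> tuple[Path, Path]:
    """Return the (annotation, instruction) paths that sit beside an image."""
    stem = image_path.stem  # "cat" for "cat.png"
    annotation = image_path.with_name(stem + ".txt")
    instruction = image_path.with_name(stem + ".inst.txt")
    return annotation, instruction


def rename_with_sidecars(image_path: Path, new_stem: str) -> Path:
    """Rename an image and move any existing sidecar files along with it."""
    old_annotation, old_instruction = sidecar_paths(image_path)
    new_image = image_path.with_name(new_stem + image_path.suffix)
    new_annotation, new_instruction = sidecar_paths(new_image)

    image_path.rename(new_image)
    # Sidecars are optional, so only move the ones that exist.
    if old_annotation.exists():
        old_annotation.rename(new_annotation)
    if old_instruction.exists():
        old_instruction.rename(new_instruction)
    return new_image
```

For example, `sidecar_paths(Path("set/cat.png"))` yields `set/cat.txt` and `set/cat.inst.txt`.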
Currently supported image extensions are jpg, jpeg, png, webp, bmp, and gif.
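As a rough sketch, a folder scan could filter files against this extension list like so. The case-insensitive comparison and the name `is_supported_image` are assumptions made for illustration; the README does not say how the app matches extensions.

```python
from pathlib import Path

# Extensions taken from the list above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif"}


def is_supported_image(path: Path) -> bool:
    """Check the final suffix only, ignoring case (an assumption)."""
    return path.suffix.lower() in SUPPORTED_EXTENSIONS
```

Because only the final suffix is checked, sidecar files such as `cat.txt` or `cat.inst.txt` are not mistaken for images.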
Remote and local LLM backends:
- Gemini API
- OpenAI API
- Anthropic API
- Grok API
- LM Studio
- Ollama
- Textgen
Local image tagging:
- WD14 Tagger, with ONNX or PyTorch / Hugging Face-style model folders.
- Configurable general tag threshold, character tag threshold, character/copyright tag inclusion, and underscore replacement.
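To illustrate how these tagger settings typically interact, here is a minimal post-processing sketch. It assumes the model output has already been decoded into `(tag, category, score)` tuples; that input shape, the function name `select_tags`, and the default threshold values are all hypothetical and are not taken from the app. Copyright tags are left out for brevity.

```python
def select_tags(
    predictions,
    general_threshold=0.35,    # illustrative default, not the app's
    character_threshold=0.85,  # illustrative default, not the app's
    include_character=True,
    replace_underscores=True,
):
    """Filter decoded tagger predictions and return tags, best score first."""
    kept = []
    for tag, category, score in predictions:
        if category == "general":
            if score < general_threshold:
                continue
        elif category == "character":
            if not include_character or score < character_threshold:
                continue
        else:
            continue  # other categories are ignored in this sketch
        kept.append((score, tag))

    kept.sort(key=lambda item: item[0], reverse=True)
    if replace_underscores:
        return [tag.replace("_", " ") for _, tag in kept]
    return [tag for _, tag in kept]
```

Using separate thresholds lets character tags, where a false positive is usually more harmful, be held to a stricter cutoff than general tags.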
Some features require configuring a Python runtime and model paths first. Similar image detection depends on a CLIP image embedding model. WD14 batch annotation depends on Python, Pillow, NumPy, and either PyTorch or ONNX Runtime.
Dataset export copies images and writes one .txt annotation file per image. Workspace folder mode exports from existing TXT annotations in the folder; database modes export the selected annotation type.
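The export layout described above (an image copy plus one same-name .txt file) might look like the following sketch. The `export_dataset` function and its `items` mapping of image path to annotation text are hypothetical; file-name collisions between images from different source folders are not handled here.

```python
import shutil
from pathlib import Path


def export_dataset(items: dict[Path, str], out_dir: Path) -> None:
    """Copy each image to out_dir and write its annotation beside the copy."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for image_path, annotation in items.items():
        target = out_dir / image_path.name
        shutil.copy2(image_path, target)  # keeps file metadata where possible
        txt_path = target.with_name(target.stem + ".txt")
        txt_path.write_text(annotation, encoding="utf-8")
```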
Database export supports two modes:
- Database only: exports a single .sqlite file and keeps referencing the original image paths recorded in the database.
- With images: exports a .zip package containing database.sqlite and image copies under images/.
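The package layout of the "with images" mode can be sketched with Python's standard `zipfile` module. The function name `export_with_images` is hypothetical, and the sketch only reproduces the archive layout; how the app records image paths inside the packaged database is not covered by this README.

```python
import zipfile
from pathlib import Path


def export_with_images(db_path: Path, image_paths: list[Path], zip_path: Path) -> None:
    """Write database.sqlite plus image copies under images/ into one zip."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # The database is always stored under a fixed name.
        archive.write(db_path, arcname="database.sqlite")
        for image_path in image_paths:
            archive.write(image_path, arcname=f"images/{image_path.name}")
```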
Runtime folders are derived from the executable location:
DatasetsDeputy/
|-- DatasetsDeputy.exe
|-- model/ # Local models
|-- config/ # API, proxy, Python, model, thumbnail, and other settings
|-- datasets/ # App-managed dataset assets
|-- runtime/ # SQLite databases, managed Python venv, and runtime resources
|-- app/ # Packaged app resources
|-- log/ # Logs
`-- temp/ # Thumbnails, similarity cache, and temporary files
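As a sketch of what "derived from the executable location" means, each folder above can be resolved relative to the directory containing the executable. The helper `runtime_dirs` is hypothetical and written in Python for illustration only; the app itself is built with Rust and Tauri.

```python
from pathlib import Path

# Folder names taken from the layout above.
RUNTIME_FOLDERS = ("model", "config", "datasets", "runtime", "app", "log", "temp")


def runtime_dirs(executable: Path) -> dict[str, Path]:
    """Map each runtime folder name to its path beside the executable."""
    root = executable.parent
    return {name: root / name for name in RUNTIME_FOLDERS}
```

With this approach, moving the whole `DatasetsDeputy/` directory keeps every derived path valid, since nothing depends on an absolute install location.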
For daily development, prefer the root script:
.\dev.ps1
Common options:
- .\dev.ps1 -Install: install frontend dependencies before starting.
- .\dev.ps1 -WebOnly: start only the Vite frontend dev server.
- .\dev.ps1 -ResetCache: reset the script cache before starting.
The underlying npm / Tauri commands are still available:
npm install
npm run dev
npm run tauri:dev
npm run build
npm run tauri:build
Desktop development requires Rust, Cargo, and the system dependencies required by Tauri. The Vite dev server is started through the Tauri config by default; see src-tauri/tauri.conf.json for the port.
For release builds, prefer the root script:
.\publish.ps1
Common options:
- .\publish.ps1 -Clean: clean the output directory before publishing.
- .\publish.ps1 -Install: install frontend dependencies before publishing.
- .\publish.ps1 -Bundle: create a zip package after generating the release directory.
- .\publish.ps1 -OutputDir <release-folder>: choose the release directory. The default is release/DatasetsDeputy.
- .\publish.ps1 -ZipPath <zip-path>: choose the zip output path. The default is release/DatasetsDeputy.zip.
The script builds the desktop app and prepares the release directory with model, config, datasets, runtime, app, log, and temp folders. The lower-level release layout script can still be called directly with npm run prepare:release -- <release-folder>.
- Desktop framework: Tauri 2
- Frontend: React 19, TypeScript, Vite, Tailwind CSS
- State management: Zustand
- Tables and virtualization: TanStack Table, TanStack Virtual
- Animation and icons: Framer Motion, Lucide React
- Backend: Rust, Tokio, Rayon, rusqlite, notify, image
- Local inference support: Python, PyTorch, ONNX Runtime, Transformers, Pillow, NumPy