This repo is no longer maintained.
Face-, audio-, and LLM-related APIs have been moved to dedicated repos:
- face_power: face detection, parsing, restoration, swapping, etc.
- audio_power: ASR, TTS, voice conversion, audio translation
- llm_power: LLM fine-tuning, RAG, LangChain, ChatGLM
Convenience API collection for computer vision, motion capture, segmentation, AIGC, and Stable Diffusion inference.
Pretrained models: Baidu Pan (password: ibgg)
| Project | Description |
|---|---|
| cv2box | CV utility functions used across all projects |
| apstone | Base inference engine for all model wrappers |
| Model | Source | Function |
|---|---|---|
| YOLOX-tiny/s | MMDetection | Body bbox |
| HRNetV2-w32 | ModelScope | Body keypoints |
| BlazePose | PINTO_model_zoo | Body keypoints |
| Lightweight-Pose | lightweight-human-pose-estimation | Body keypoints |
| MoveNet | TFHub | Body keypoints |
| KAPAO | kapao | Body keypoints |
| Model | Source | Function |
|---|---|---|
| YOLOX-tiny | MMDetection | Hand 21 keypoints |
| hand_detector.d2 | hand_detector.d2 | Hand bbox |
| MediaPipe Hands | MediaPipe | Hand landmarks |
| YOLOX* | MMDetection | Hand detection |
| Minimal-Hand | minimal-hand | Hand mesh |
| FrankMoCap | frankmocap | Hand pose regressor |
| Module | Description |
|---|---|
| SPIN | Body shape regression |
| MMPose | Whole-body keypoints (r50, hrnet_w48_384_dark, etc.) |
| MediaPipe Holistic | Whole-body landmarks (pose, face, hands) |
| Calibration | Camera calibration |
| Smooth Filter | Temporal smoothing |
| Triangulate | Multi-view triangulation |
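
As an illustration of what temporal smoothing does to a keypoint stream, here is a minimal sketch using an exponential moving average. This is an assumption made for illustration: the repo's Smooth Filter may use a different filter, and `ema_smooth` and its parameters are hypothetical names, not part of this project's API.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ema_smooth(frames: Sequence[Sequence[Point]], alpha: float = 0.5) -> List[List[Point]]:
    """Exponentially smooth per-frame 2D keypoints over time (hypothetical helper).

    alpha lies in (0, 1]; higher values follow the raw signal more closely,
    lower values smooth more but add lag.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[List[Point]] = []
    for frame in frames:
        if not smoothed:
            # The first frame has no history, so it passes through unchanged.
            smoothed.append([(float(x), float(y)) for x, y in frame])
            continue
        prev = smoothed[-1]
        # Blend each keypoint with its smoothed position from the previous frame.
        smoothed.append([
            (alpha * x + (1.0 - alpha) * px, alpha * y + (1.0 - alpha) * py)
            for (x, y), (px, py) in zip(frame, prev)
        ])
    return smoothed


if __name__ == "__main__":
    # One keypoint that jumps from x=0 to x=10 and then stays there.
    track = [[(0.0, 0.0)], [(10.0, 0.0)], [(10.0, 0.0)]]
    for frame in ema_smooth(track, alpha=0.5):
        print(frame)
```

A fixed `alpha` trades jitter against lag; adaptive filters adjust this trade-off based on how fast the keypoint is moving.
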
| Module | Description |
|---|---|
| CarveKit | Background removal |
| CIHP-PGN | Human parsing |
| U2Net | Object segmentation / saliency |
| PPMattingV2 | Portrait matting |
| Green Screen Matting | Chroma-key video matting (BackgroundMattingV2) |
| SegFormer B2 | Cloth segmentation |
| MODNet | Portrait matting |
| RAFT | Optical flow |
| Model | Function |
|---|---|
| DCTNet | Style transfer |
| LaMa | Image inpainting |
| TPSMM | Talking head synthesis |
| SadTalker | Talking head synthesis |
| Wav2Lip | Lip sync |
| DINet | Talking head synthesis |
| Module | Description |
|---|---|
| ControlNet | ControlNet inference |
| IP-Adapter | IP-Adapter for image-prompted generation |
| Prompt2Prompt | Prompt-based image editing |
| CLIP Encoder | Text encoding |
| DDIM Inversion | Image inversion |
| Tagger | Image tagging (WD14) |
| Model | Description |
|---|---|
| PaddleOCR | General-purpose OCR |
COCO format conversion, dataset preprocessing, visualization utilities.
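
For readers unfamiliar with the COCO layout, the sketch below shows the core of a box conversion: COCO stores boxes as `[x, y, width, height]` rather than corner pairs. The function name and arguments are hypothetical and chosen for illustration; they are not taken from this repo.

```python
from typing import Dict, Sequence


def xyxy_to_coco_annotation(
    box: Sequence[float], image_id: int, category_id: int, ann_id: int
) -> Dict:
    """Build one COCO-style annotation dict from an (x1, y1, x2, y2) box.

    Hypothetical helper for illustration only.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    if w <= 0 or h <= 0:
        raise ValueError("box must satisfy x2 > x1 and y2 > y1")
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x1, y1, w, h],  # COCO uses top-left corner plus size
        "area": w * h,
        "iscrowd": 0,
    }


if __name__ == "__main__":
    print(xyxy_to_coco_annotation([10, 20, 50, 80], image_id=1, category_id=1, ann_id=1))
```
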
Affine transforms, Gaussian filters, K-means, timing tools, path helpers.
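
To make the affine transform item concrete, here is a small dependency-free sketch that builds a 2x3 rotation-and-scale matrix about a center and applies it to points. It uses the standard mathematical rotation convention (which appears clockwise in image coordinates where y points down) and is an assumption for illustration; `rotation_scale_matrix` and `apply_affine` are hypothetical names, not this repo's helpers.

```python
import math
from typing import List, Sequence, Tuple

Matrix2x3 = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def rotation_scale_matrix(
    center: Tuple[float, float], angle_deg: float, scale: float = 1.0
) -> Matrix2x3:
    """Return a 2x3 affine matrix that rotates and scales about `center`."""
    cx, cy = center
    c = scale * math.cos(math.radians(angle_deg))
    s = scale * math.sin(math.radians(angle_deg))
    # The translation column keeps `center` fixed under the transform.
    return (
        (c, -s, cx - c * cx + s * cy),
        (s, c, cy - s * cx - c * cy),
    )


def apply_affine(
    matrix: Matrix2x3, points: Sequence[Tuple[float, float]]
) -> List[Tuple[float, float]]:
    """Apply a 2x3 affine matrix to a sequence of (x, y) points."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


if __name__ == "__main__":
    m = rotation_scale_matrix(center=(0.0, 0.0), angle_deg=90.0)
    print(apply_affine(m, [(1.0, 0.0)]))
```
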