Source link
https://huggingface.co/openbmb/MiniCPM-V-4.6
Related official sources:
Target platform
Both
Use case
I'd like to request Core AI Models support for MiniCPM-V 4.6, with iOS support as the highest-value target and macOS support for development, testing, and desktop apps.
MiniCPM-V 4.6 is a compact vision-language model that is well aligned with on-device use cases: private image understanding, document/OCR assistance, screenshot-to-action workflows, accessibility helpers, local camera agents, and personal knowledge capture where user images should stay on device.
For Apple platforms, the appeal is that developers could build multimodal app features around local images and screenshots without routing visual data to a server. A first-party Core AI export recipe would also make it easier to compare and deploy compact VLMs through the same runtime path already used by the LLM catalog.
Preferred precision / compression
For iOS: mixed 4/8-bit palettized if feasible, with a fixed context preset suitable for image + short instruction/chat use.
For macOS: an fp16/bf16 baseline and a 4-bit weight-compressed variant would both be useful.
No strong preference beyond whatever path works best with Core AI's VLM/runtime constraints.
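For a rough sense of what these precision options mean for weight storage, here is a back-of-the-envelope sketch. It assumes the roughly 1.3B parameter count quoted from the README elsewhere in this request, treats every weight as stored at one uniform bit width, and ignores palettization lookup tables, activations, and the KV cache. The function name is made up for illustration and is not part of any Core AI tooling.

```python
def weight_footprint_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB at a uniform bit width.

    Ignores lookup tables, quantization scales, activations, and KV cache,
    so real on-device numbers will be somewhat higher.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: ~1.3B parameters in total (vision encoder + language model).
PARAMS = 1.3e9

for label, bits in [("fp16/bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_footprint_gib(PARAMS, bits):.2f} GiB")
```

Under these assumptions the weights come to about 2.4 GiB at 16 bits and about 0.6 GiB at 4 bits, which is why a 4-bit or mixed 4/8-bit variant looks like the practical choice for iOS.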
Additional context
MiniCPM-V 4.6 is designed for efficient multimodal inference:
- Model size is about 1.3B parameters, using a SigLIP2-400M vision encoder and a Qwen3.5-0.8B language model according to the official README.
- It supports single-image, multi-image, and video understanding.
- It exposes 4x / 16x visual token compression modes, which is especially relevant for on-device latency and memory tradeoffs.
- Existing community-facing deployment paths include Transformers, vLLM, SGLang, llama.cpp, Ollama, and GGUF packages with the paired vision projection component.
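To illustrate why the 4x / 16x compression modes matter on device, the sketch below estimates how many visual tokens reach the language model for a given uncompressed count. The base token count is a hypothetical placeholder, not a figure from the MiniCPM-V documentation, and the helper is illustrative only; the real model's slicing and resampling logic may differ.

```python
import math


def visual_tokens_after_compression(base_tokens: int, factor: int) -> int:
    """Estimate visual tokens passed to the LLM after compression.

    `factor` is the compression ratio; only the 4x and 16x modes
    mentioned above are accepted here.
    """
    if factor not in (4, 16):
        raise ValueError("expected a compression factor of 4 or 16")
    return math.ceil(base_tokens / factor)


# Hypothetical uncompressed visual token count for one image.
BASE_TOKENS = 1024

for factor in (4, 16):
    tokens = visual_tokens_after_compression(BASE_TOKENS, factor)
    print(f"{factor}x compression: {tokens} visual tokens")
```

Fewer visual tokens mean a shorter prefill and a smaller KV cache, so a fixed-context iOS preset could plausibly default to the more aggressive mode and leave the 4x mode for macOS or for quality-sensitive tasks.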
If the multimodal export path is not ready yet, a useful smaller first step would be a text-only MiniCPM5-1B preset:
Thanks for considering MiniCPM support. It would be great to have a small open multimodal model represented in the Core AI Models catalog alongside the current LLM, vision, audio, and diffusion examples.