Model request: MiniCPM-V 4.6 (compact vision-language model) for macOS and iOS #59

Description

@ZMXJJ

Source link

https://huggingface.co/openbmb/MiniCPM-V-4.6

Related official sources:

Target platform

Both

Use case

I'd like to request Core AI Models support for MiniCPM-V 4.6, with iOS support as the highest-value target and macOS support for development, testing, and desktop apps.

MiniCPM-V 4.6 is a compact vision-language model that is well aligned with on-device use cases: private image understanding, document/OCR assistance, screenshot-to-action workflows, accessibility helpers, local camera agents, and personal knowledge capture where user images should stay on device.

For Apple platforms, the appeal is that developers could build multimodal app features around local images and screenshots without routing visual data to a server. A first-party Core AI export recipe would also make it easier to compare and deploy compact VLMs through the same runtime path already used by the LLM catalog.

Preferred precision / compression

For iOS: mixed 4/8-bit palettized weights if feasible, with a fixed context preset suitable for image + short instruction/chat use.

For macOS: an fp16/bf16 baseline and a 4-bit weight-compressed variant would both be useful.

No strong preference beyond whatever path works best with Core AI's VLM/runtime constraints.
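
As a rough sizing aid for the options above, here is a minimal sketch that estimates weight storage only. It assumes the roughly 1.3B parameter count reported in the official README and a uniform bit width per preset. The function name and the 50/50 split used for the mixed 4/8-bit case are illustrative assumptions, not anything Core AI defines. Real palettized models also store lookup tables, and activations and KV cache are not counted here.

```python
def weight_footprint_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB, ignoring lookup tables and metadata."""
    return num_params * bits_per_weight / 8 / 2**30


NUM_PARAMS = 1.3e9  # approximate total from the README; used here only for sizing

# Hypothetical presets. The average of 6 bits for "mixed 4/8-bit" assumes an
# even split between 4-bit and 8-bit layers, which is an illustrative guess.
PRESETS = {
    "fp16/bf16": 16.0,
    "8-bit": 8.0,
    "mixed 4/8-bit (assumed 50/50)": 6.0,
    "4-bit": 4.0,
}

if __name__ == "__main__":
    for name, bits in PRESETS.items():
        size = weight_footprint_gib(NUM_PARAMS, bits)
        print(f"{name:32s} ~{size:.2f} GiB")
```

The point of the comparison is that moving from a 16-bit baseline to 4-bit weights cuts weight storage by about 4x, which is the main reason a palettized variant matters for iOS memory limits.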

Additional context

MiniCPM-V 4.6 is designed for efficient multimodal inference:

  • According to the official README, the model has about 1.3B parameters, combining a SigLIP2-400M vision encoder with a Qwen3.5-0.8B language model.
  • It supports single-image, multi-image, and video understanding.
  • It exposes 4x / 16x visual token compression modes, which is especially relevant for on-device latency and memory tradeoffs.
  • Existing community-facing deployment paths include Transformers, vLLM, SGLang, llama.cpp, Ollama, and GGUF packages with the paired vision projection component.
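
To illustrate why the 4x / 16x compression modes matter for a fixed context preset, here is a minimal sketch of the visual token budget. It assumes the compression factor simply divides the number of visual tokens per image. The base count of 1024 tokens per image is a hypothetical placeholder rather than a figure from the README, and `visual_tokens` is an illustrative helper, not part of any MiniCPM or Core AI API.

```python
def visual_tokens(base_tokens_per_image: int, compression: int, num_images: int = 1) -> int:
    """Visual tokens passed to the language model after compression.

    Uses ceiling division so a partial group still costs one token.
    """
    if compression <= 0:
        raise ValueError("compression factor must be positive")
    per_image = -(-base_tokens_per_image // compression)  # ceil division
    return per_image * num_images


BASE_TOKENS = 1024  # hypothetical pre-compression token count per image

if __name__ == "__main__":
    for factor in (4, 16):
        for images in (1, 4):
            total = visual_tokens(BASE_TOKENS, factor, images)
            print(f"{factor:2d}x compression, {images} image(s): {total} visual tokens")
```

Under these assumptions, the 16x mode leaves a quarter as many visual tokens as the 4x mode, so a multi-image prompt that overflows a small fixed context at 4x may still fit at 16x, at some cost in visual detail.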

If the multimodal export path is not ready yet, a useful smaller first step would be a text-only MiniCPM5-1B preset:

Thanks for considering MiniCPM support. It would be great to have a small open multimodal model represented in the Core AI Models catalog alongside the current LLM, vision, audio, and diffusion examples.
