Model request: MiniCPM-V 4.6 (compact vision-language model) for macOS and iOS #59


### Source link

https://huggingface.co/openbmb/MiniCPM-V-4.6

Related official sources:
- https://github.com/OpenBMB/MiniCPM-V
- https://huggingface.co/openbmb/MiniCPM-V-4.6-gguf
- https://huggingface.co/openbmb/MiniCPM5-1B
- https://huggingface.co/openbmb/MiniCPM5-1B-MLX

### Target platform

Both

### Use case

I'd like to request Core AI Models support for MiniCPM-V 4.6. iOS is the highest-value target; macOS support would cover development, testing, and desktop apps.

MiniCPM-V 4.6 is a compact vision-language model that fits on-device use cases well: private image understanding, document/OCR assistance, screenshot-to-action workflows, accessibility helpers, local camera agents, and personal knowledge capture where user images should stay on device.

For Apple platforms, the appeal is that developers could build multimodal app features around local images and screenshots without routing visual data to a server. A first-party Core AI export recipe would also make it easier to compare and deploy compact VLMs through the same runtime path already used by the LLM catalog.

### Preferred precision / compression

For iOS: mixed 4/8-bit palettization if feasible, with a fixed context preset sized for one image plus a short instruction or chat turn.

For macOS: an fp16/bf16 baseline and a 4-bit weight-compressed variant would both be useful.

No strong preference beyond whatever path works best with Core AI's VLM/runtime constraints.
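
To make the precision tradeoff concrete, here is a back-of-envelope sketch of weight storage at different bit widths. It assumes the roughly 1.3B parameter count quoted from the README below, treats every weight as stored at a uniform width, and ignores palettization lookup tables, activations, and the KV cache. The `mixed_bits` helper and its 4-bit fraction are hypothetical, purely for illustration.

```python
# Rough weight-storage estimate; not a measurement of any real export.
PARAMS = 1_300_000_000  # approximate total, per the official README


def weight_gb(num_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


def mixed_bits(frac_4bit: float) -> float:
    """Average bits per weight when a fraction of weights use 4-bit
    and the rest use 8-bit (hypothetical split, for illustration)."""
    if not 0.0 <= frac_4bit <= 1.0:
        raise ValueError("frac_4bit must be between 0 and 1")
    return 4 * frac_4bit + 8 * (1 - frac_4bit)


def estimate_table(num_params: int = PARAMS) -> dict:
    """Map each precision scheme to its estimated weight size in GB."""
    schemes = {
        "fp16/bf16": 16,
        "8-bit": 8,
        "mixed 4/8-bit (half 4-bit, assumed)": mixed_bits(0.5),
        "4-bit": 4,
    }
    return {name: weight_gb(num_params, bits) for name, bits in schemes.items()}


if __name__ == "__main__":
    for name, gb in estimate_table().items():
        print(f"{name}: ~{gb:.2f} GB")
```

Under these assumptions the fp16/bf16 baseline needs about four times the weight storage of a 4-bit variant, which is the main reason the compressed path matters for iOS.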

### Additional context

MiniCPM-V 4.6 is designed for efficient multimodal inference:

- The model has about 1.3B parameters, combining a SigLIP2-400M vision encoder with a Qwen3.5-0.8B language model, according to the official README.
- It supports single-image, multi-image, and video understanding.
- It exposes 4x and 16x visual token compression modes, which is especially relevant for on-device latency and memory tradeoffs.
- Existing deployment paths include Transformers, vLLM, SGLang, llama.cpp, Ollama, and GGUF packages with the paired vision projection component.
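
As a rough illustration of why the compression modes matter on device, the sketch below compares how many visual tokens would reach the language model at each ratio. It assumes the ratio acts as a simple divisor on the vision encoder's output token count. The raw count of 1024 is a hypothetical placeholder, not a figure from the model card, and the model's actual image slicing behaviour is not modeled.

```python
import math

SUPPORTED_RATIOS = (4, 16)  # the two modes named in the list above


def compressed_tokens(raw_tokens: int, ratio: int) -> int:
    """Visual tokens left after compression, assuming the ratio is a
    plain divisor on the encoder's output token count."""
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"ratio must be one of {SUPPORTED_RATIOS}")
    if raw_tokens < 0:
        raise ValueError("raw_tokens must be non-negative")
    return math.ceil(raw_tokens / ratio)


if __name__ == "__main__":
    raw = 1024  # hypothetical encoder output size for one image
    for ratio in SUPPORTED_RATIOS:
        print(f"{ratio}x: {raw} -> {compressed_tokens(raw, ratio)} tokens")
```

Fewer visual tokens mean a shorter prefill and a smaller KV cache per image, so a fixed context preset on iOS could budget for either mode depending on the quality the app needs.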

If the multimodal export path is not ready yet, a useful smaller first step would be a text-only preset for MiniCPM5-1B:

- Source: https://huggingface.co/openbmb/MiniCPM5-1B
- Existing Apple Silicon package: https://huggingface.co/openbmb/MiniCPM5-1B-MLX
- Existing local inference package: https://huggingface.co/openbmb/MiniCPM5-1B-GGUF
- It is a compact 1B text-only model, so it should fit the current LLM export flow more directly than the VLM would.

Thanks for considering MiniCPM support. It would be great to have a small open multimodal model represented in the Core AI Models catalog alongside the current LLM, vision, audio, and diffusion examples.
