Convenience wrappers for Apple's official CoreML vision models, packaged for engineering-drawing workflows (OCCTDesignLoop and friends).
This repo does not train or convert any model. It re-packages four Apple-published CoreML artefacts behind high-level Swift APIs and Python stdin/stdout bridges, adds engineering-drawing-focused benchmarks, and documents the mixed license landscape up front.
| Model | Swift wrapper | Python bridge | Input size | License |
|---|---|---|---|---|
| SAM 2 tiny | SAMSegmenter | sam_bridge.py | 1024x1024 | Apache-2.0 |
| MobileCLIP S2 | DrawingEmbedder | mobileclip_bridge.py | 256x256 | Apple Sample Code License |
| FastVLM 1.5B (hybrid CoreML + MLX) | EngineeringVLM | fastvlm_bridge.py | 1024x1024 | Apple Sample Code License |
| DETR ResNet-50 (semantic seg.) | LayoutDetector | layout_bridge.py | 448x448 | Apache-2.0 |

See NOTICE for the full per-model license matrix and the pinned upstream commits.
Package.swift:

    dependencies: [
        .package(url: "https://github.com/gsdali/coreml-occt-models", from: "0.1.0"),
    ],
    targets: [
        .target(name: "YourApp", dependencies: [
            .product(name: "CoremlOcctModels",
                     package: "coreml-occt-models"),
        ]),
    ]

Download the artefacts once:

    brew install git-lfs && git lfs install
    ./scripts/download_artefacts.sh
    # FastVLM is not pre-built on HuggingFace; run this as well if you need it:
    ./scripts/export_fastvlm_vision.sh

Then use any model in <10 lines:

    import Foundation
    import CoremlOcctModels

    // Segment around the centre of a drawing.
    let seg = try SAMSegmenter()
    let out = try seg.segment(
        imageURL: URL(fileURLWithPath: "drawing.png"),
        points: [.init(x: 1024, y: 768, label: .foreground)]
    )

    // CLIP embed for retrieval.
    let emb = try DrawingEmbedder()
    let vec = try emb.embed(imageURL: URL(fileURLWithPath: "drawing.png"))

    // Semantic segmentation for coarse layout hints.
    let det = try LayoutDetector()
    let layout = try det.detect(imageURL: URL(fileURLWithPath: "drawing.png"))

Install the Python side:

    pip install -e .
    # or just: pip install coremltools pillow numpy open_clip_torch

One-shot subprocess:

    echo '{"image_path": "test_data/sample_drawing.png",
           "points": [{"x": 800, "y": 600, "label": 1}]}' \
      | python Python/sam_bridge.py
    # -> {"ok": true, "mask_shape": [256, 256], "mask_rle": "...", "score": 0.92}

Or via the unified CLI (after pip install -e .):

    coreml-occt segment drawing.png --x 800 --y 600
    coreml-occt embed drawing.png --text "mechanical drawing of a gear"
    coreml-occt caption drawing.png
    coreml-occt layout drawing.png
    coreml-occt versions          # show pinned upstream commits

Repository layout:

    Package.swift                   SwiftPM manifest (4 wrappers + 4 benchmarks)
    Sources/
      CoremlOcctModels/             umbrella module that re-exports everything
      SAMSegmenter/                 3-stage SAM 2 tiny pipeline
      DrawingEmbedder/              MobileCLIP S2 image/text embeddings
      EngineeringVLM/               FastVLM vision encoder (decoder via Python)
      LayoutDetector/               DETR ResNet-50 semantic segmentation
    Python/
      sam_bridge.py                 stdin/stdout JSON, one-shot SAM2 call
      mobileclip_bridge.py          stdin/stdout JSON, image + text embedding
      fastvlm_bridge.py             stdin/stdout JSON, hybrid CoreML+MLX VLM
      layout_bridge.py              stdin/stdout JSON, DETR segmentation
      cli.py                        unified CLI (coreml-occt …)
    benchmarks/
      BenchSAM/                     bench_sam.swift (p50/p95 latency)
      BenchMobileCLIP/              ...
      BenchFastVLM/                 vision encoder only; see Python bridge for e2e
      BenchLayout/                  ...
      results.json                  schema; run scripts/run_benchmarks.sh to populate
    scripts/
      download_artefacts.sh         mirror the 3 pre-built .mlpackages
      export_fastvlm_vision.sh      build the 4th (FastVLM vision encoder)
      run_benchmarks.sh             build + run all four benchmarks
    artefacts/                      populated by scripts (empty in repo)
    test_data/sample_drawing.png    engineering-drawing reference fixture

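The bridges are described as one-shot stdin/stdout JSON processes, so they can be driven from any host program. The following is a minimal Python sketch of that pattern, not the repo's own client code. The helper names `call_bridge` and `sam_request` are hypothetical, the request shape is copied from the one-shot SAM example, and treating a missing or false `ok` flag as an error is an assumption based on the sample output.

```python
import json
import subprocess
import sys


def call_bridge(command, request, timeout=120.0):
    """Send one JSON request to a bridge process and return its JSON reply."""
    proc = subprocess.run(
        command,
        input=json.dumps(request),  # the bridge reads the request from stdin
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(
            f"bridge exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    reply = json.loads(proc.stdout)
    # Assumption: every reply carries an "ok" flag, as in the sample output.
    if not reply.get("ok", False):
        raise RuntimeError(f"bridge reported failure: {reply}")
    return reply


def sam_request(image_path, x, y, label=1):
    """Build the request shape shown in the one-shot SAM example."""
    return {
        "image_path": image_path,
        "points": [{"x": x, "y": y, "label": label}],
    }
```

A call would then look like `call_bridge([sys.executable, "Python/sam_bridge.py"], sam_request("test_data/sample_drawing.png", 800, 600))`. Spawning a fresh process per request keeps the caller simple, at the cost of reloading the model on every call.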
FastVLM is the only one of the four that is hybrid:
- Vision encoder (fastvithd): CoreML, runs on the Apple Neural Engine. Produces a sequence of vision tokens.
- Language decoder: runs via MLX on the GPU. Consumes the vision tokens + a text prompt and emits a generation.

This wrapper ships the Swift side (EngineeringVLM) for the CoreML
half only. For end-to-end text generation, use
Python/fastvlm_bridge.py, which coordinates both halves — set
FASTVLM_REPO to a local clone of apple/ml-fastvlm
so the bridge can reuse their MLX harness and prompt templates.
Apple does not publish a pre-built fastvithd.mlpackage on
HuggingFace, only the underlying safetensors checkpoint. Run
scripts/export_fastvlm_vision.sh once to produce it.
Wrapper code (everything under Sources/, Python/, benchmarks/,
scripts/) is MIT.
Upstream models are mixed — two Apache-2.0 and two under the Apple Sample Code License (ASCL):
| License | Models |
|---|---|
| Apache-2.0 | SAM 2 tiny, DETR ResNet-50 semantic seg |
| Apple Sample Code License | MobileCLIP S2, FastVLM 1.5B |
ASCL is more restrictive than MIT/Apache. If you plan to ship MobileCLIP or FastVLM outputs in a product, read the full ASCL text distributed alongside the upstream artefacts and consult your legal team. See NOTICE for the detailed matrix and upstream source URLs.
This repo pins each upstream model to a specific commit:
| Repo | Pinned commit |
|---|---|
| apple/coreml-sam2-tiny | 6d04587b4937500c26afbdeeb9777a336efaeef6 |
| apple/coreml-mobileclip | 3e0a7bfb9fe83da8a3efaa3fd8f7df24214bb947 |
| apple/FastVLM-1.5B | dd6608dfa0e17b050e1dde2856c3437fcba197ac |
| apple/coreml-detr-semantic-segmentation | 7c771f8867a479d1441ac5fb0a8de31feea76bb6 |

When Apple pushes updates, bump the SHAs in
scripts/download_artefacts.sh, scripts/export_fastvlm_vision.sh,
Sources/CoremlOcctModels/CoremlOcctModels.swift, Python/cli.py,
and NOTICE, re-run the download scripts, and re-run the
benchmarks.
The four models here are general-purpose Apple vision models. Drawing-specific converted models live in sibling repos and are the first-choice tools if you need engineering-drawing accuracy rather than broad-coverage pretraining:

- gsdali/coreml-hawp: HAWP-v2 line detection
- gsdali/coreml-edocr2: eDOCr2 dimension/OCR
- gsdali/coreml-detr-doclayout: DETR fine-tuned on drawing layout
- gsdali/coreml-rt-detr: RT-DETR for symbol detection
- gsdali/coreml-paddleocr: PP-OCRv5 for general OCR
- gsdali/occt-design-loop-models: our trained GNN + RF models