coreml-occt-models

Convenience wrappers for Apple's official CoreML vision models, packaged for engineering-drawing workflows (OCCTDesignLoop and friends).

This repo does not train any model. It re-packages four Apple-published models (three pre-built CoreML artefacts plus the FastVLM vision encoder, which you export locally from Apple's checkpoint) behind high-level Swift APIs and Python stdin/stdout bridges, adds engineering-drawing-focused benchmarks, and documents the mixed license landscape up front.

Model	Swift wrapper	Python bridge	Input size	License
SAM 2 tiny	`SAMSegmenter`	`sam_bridge.py`	1024x1024	Apache-2.0
MobileCLIP S2	`DrawingEmbedder`	`mobileclip_bridge.py`	256x256	Apple Sample Code License
FastVLM 1.5B (hybrid CoreML + MLX)	`EngineeringVLM`	`fastvlm_bridge.py`	1024x1024	Apple Sample Code License
DETR ResNet-50 (semantic seg.)	`LayoutDetector`	`layout_bridge.py`	448x448	Apache-2.0

See NOTICE for the full per-model license matrix and the pinned upstream commits.
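If you prepare inputs by hand (for example, to sanity-check where a SAM click lands in the 1024x1024 input listed above), it helps to map source-image pixels into each model's input frame. The sketch below assumes a plain stretch resize to the listed input size with no padding or cropping; the wrappers may letterbox or crop instead, so treat `to_model_coords` and `MODEL_INPUT_SIZES` as hypothetical helpers, not part of this package.

```python
# Input sizes copied from the model table above; the dict keys are hypothetical.
MODEL_INPUT_SIZES = {
    "sam2_tiny": (1024, 1024),
    "mobileclip_s2": (256, 256),
    "fastvlm_vision": (1024, 1024),
    "detr_layout": (448, 448),
}


def to_model_coords(x, y, src_size, model):
    """Map pixel (x, y) of an image of src_size=(width, height) into the
    model's input frame, ASSUMING a stretch resize (no padding, no cropping)."""
    src_w, src_h = src_size
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source dimensions must be positive")
    dst_w, dst_h = MODEL_INPUT_SIZES[model]
    # Each axis is scaled independently, so aspect ratio is not preserved.
    return x * dst_w / src_w, y * dst_h / src_h
```

Under a letterbox resize (aspect ratio preserved, then padded to a square) both axes would share one scale factor and one of them would gain an offset, so check what the wrapper actually does before relying on this mapping.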

Quickstart (Swift)

Package.swift:

dependencies: [
    .package(url: "https://github.com/gsdali/coreml-occt-models", from: "0.1.0"),
],
targets: [
    .target(name: "YourApp", dependencies: [
        .product(name: "CoremlOcctModels",
                 package: "coreml-occt-models"),
    ]),
]

Download the artefacts once:

brew install git-lfs && git lfs install
./scripts/download_artefacts.sh
# FastVLM is not pre-built on HuggingFace; run this as well if you need it:
./scripts/export_fastvlm_vision.sh

Then use any model in <10 lines:

import CoremlOcctModels

// Segment around the centre of a drawing.
let seg = try SAMSegmenter()
let out = try seg.segment(
    imageURL: URL(fileURLWithPath: "drawing.png"),
    points: [.init(x: 1024, y: 768, label: .foreground)]
)

// CLIP embed for retrieval.
let emb = try DrawingEmbedder()
let vec = try emb.embed(imageURL: URL(fileURLWithPath: "drawing.png"))

// Semantic segmentation for coarse layout hints.
let det = try LayoutDetector()
let layout = try det.detect(imageURL: URL(fileURLWithPath: "drawing.png"))
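Retrieval with CLIP-style embeddings usually means ranking stored vectors by cosine similarity to a query vector, which can be an image embedding or a text embedding, since CLIP maps both into a shared space. The sketch below works on plain Python lists (for example, vectors parsed from mobileclip_bridge.py output, whose exact field names are not documented here). It is a generic technique rather than this package's API, and `cosine` and `top_k` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query, index, k=5):
    """Return the k (name, score) pairs in `index` most similar to `query`.

    `index` maps a drawing name to its embedding vector.
    """
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

If the embeddings are already L2-normalised, cosine similarity reduces to a dot product. For large collections, a vectorised matrix product or an approximate-nearest-neighbour index scales better than this linear scan.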

Quickstart (Python)

pip install -e .
# or just: pip install coremltools pillow numpy open_clip_torch

One-shot subprocess:

echo '{"image_path": "test_data/sample_drawing.png",
       "points": [{"x": 800, "y": 600, "label": 1}]}' \
  | python Python/sam_bridge.py
# -> {"ok": true, "mask_shape": [256, 256], "mask_rle": "...", "score": 0.92}
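To drive a bridge from your own Python code rather than a shell pipe, a thin subprocess wrapper is enough. This sketch assumes, as in the example above, that each bridge reads one JSON object on stdin and writes one JSON object with an `ok` field on stdout; the failure-reply shape and the `call_bridge` name are assumptions, not documented behaviour.

```python
import json
import subprocess
import sys


def call_bridge(script_path, payload, python=sys.executable, timeout=300):
    """Send one JSON request to a stdin/stdout bridge and return the parsed reply."""
    proc = subprocess.run(
        [python, script_path],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    if proc.returncode != 0:
        # Surface stderr only on failure; stdout is reserved for the JSON reply.
        raise RuntimeError(f"{script_path} exited {proc.returncode}: {proc.stderr.strip()}")
    reply = json.loads(proc.stdout)
    if not reply.get("ok", False):
        raise RuntimeError(f"{script_path} reported failure: {reply}")
    return reply
```

For example, `call_bridge("Python/sam_bridge.py", {"image_path": "test_data/sample_drawing.png", "points": [{"x": 800, "y": 600, "label": 1}]})` should return a dict shaped like the reply above. Each call starts a fresh interpreter and therefore reloads the model, so batch your work if per-call latency matters; any stray log line printed to stdout would also break `json.loads`.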

Or via the unified CLI (after pip install -e .):

coreml-occt segment drawing.png --x 800 --y 600
coreml-occt embed drawing.png --text "mechanical drawing of a gear"
coreml-occt caption drawing.png
coreml-occt layout drawing.png
coreml-occt versions          # show pinned upstream commits
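To show how CLI arguments can map onto a bridge request, here is a hypothetical parser for the `segment` subcommand that builds the same JSON payload as the sam_bridge.py example. The real Python/cli.py may name and validate its options differently; the `--background` flag in particular is invented for illustration and follows SAM's convention of label 1 for foreground and 0 for background.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring `coreml-occt segment drawing.png --x 800 --y 600`.
    parser = argparse.ArgumentParser(prog="coreml-occt")
    sub = parser.add_subparsers(dest="command", required=True)
    seg = sub.add_parser("segment", help="SAM 2 point-prompted segmentation")
    seg.add_argument("image", help="path to the drawing image")
    seg.add_argument("--x", type=float, required=True, help="click x in pixels")
    seg.add_argument("--y", type=float, required=True, help="click y in pixels")
    seg.add_argument(
        "--background",
        action="store_true",
        help="mark the point as background (label 0) instead of foreground (label 1)",
    )
    return parser


def segment_payload(args):
    """Translate parsed `segment` arguments into a sam_bridge.py JSON request."""
    label = 0 if args.background else 1
    return {
        "image_path": args.image,
        "points": [{"x": args.x, "y": args.y, "label": label}],
    }
```

Passing `json.dumps(segment_payload(args))` to the bridge's stdin then reproduces the one-shot subprocess call above.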

Repository layout

Package.swift              SwiftPM manifest (4 wrappers + 4 benchmarks)
Sources/
  CoremlOcctModels/        umbrella module that re-exports everything
  SAMSegmenter/            3-stage SAM 2 tiny pipeline
  DrawingEmbedder/         MobileCLIP S2 image/text embeddings
  EngineeringVLM/          FastVLM vision encoder (decoder via Python)
  LayoutDetector/          DETR ResNet-50 semantic segmentation
Python/
  sam_bridge.py            stdin/stdout JSON, one-shot SAM2 call
  mobileclip_bridge.py     stdin/stdout JSON, image + text embedding
  fastvlm_bridge.py        stdin/stdout JSON, hybrid CoreML+MLX VLM
  layout_bridge.py         stdin/stdout JSON, DETR segmentation
  cli.py                   unified CLI (coreml-occt …)
benchmarks/
  BenchSAM/        bench_sam.swift (p50/p95 latency)
  BenchMobileCLIP/ ...
  BenchFastVLM/    vision encoder only; see Python bridge for e2e
  BenchLayout/     ...
  results.json     schema; run scripts/run_benchmarks.sh to populate
scripts/
  download_artefacts.sh        mirror the 3 pre-built .mlpackages
  export_fastvlm_vision.sh     build the 4th (FastVLM vision encoder)
  run_benchmarks.sh            build + run all four benchmarks
artefacts/                     populated by scripts (empty in repo)
test_data/sample_drawing.png   engineering-drawing reference fixture
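The benchmarks report p50/p95 latency. The exact percentile definition used by bench_sam.swift is not stated here; as one common choice, this sketch uses nearest-rank percentiles over wall-clock timings, with a few untimed warm-up calls so that one-off costs such as model loading do not skew the tail. `percentile` and `bench` are hypothetical helpers for timing a Python-side call, such as a bridge round trip.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile (pct in (0, 100]) of a non-empty list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def bench(fn, warmup=3, iters=50):
    """Time `fn()` and return p50/p95 latency in milliseconds."""
    for _ in range(warmup):
        fn()  # untimed: absorb model load/compile and cache warm-up
    times_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return {"p50_ms": percentile(times_ms, 50), "p95_ms": percentile(times_ms, 95)}
```

Nearest-rank always returns an observed sample rather than an interpolated value, which keeps p95 honest for small iteration counts.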

Why FastVLM is special

FastVLM is the only one of the four that is hybrid:

Vision encoder (fastvithd): CoreML, runs on the Apple Neural Engine. Produces a sequence of vision tokens.
Language decoder: runs via MLX on the GPU. Consumes the vision tokens plus a text prompt and generates text.

This wrapper ships the Swift side (EngineeringVLM) for the CoreML half only. For end-to-end text generation, use Python/fastvlm_bridge.py, which coordinates both halves. Set FASTVLM_REPO to a local clone of apple/ml-fastvlm so the bridge can reuse Apple's MLX harness and prompt templates.
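A preflight check can catch a missing or mistyped FASTVLM_REPO before the bridge starts loading weights. This sketch assumes the bridge only needs the variable to point at an existing directory; how fastvlm_bridge.py itself validates the path is not documented here, and `resolve_fastvlm_repo` is a hypothetical name.

```python
import os
from pathlib import Path


def resolve_fastvlm_repo(env=None):
    """Return the FASTVLM_REPO clone as a Path, or raise with a helpful message.

    Only checks that the variable is set and names an existing directory;
    it does not verify that the directory really is a clone of apple/ml-fastvlm.
    """
    env = os.environ if env is None else env
    raw = env.get("FASTVLM_REPO")
    if not raw:
        raise RuntimeError(
            "FASTVLM_REPO is not set; point it at a local clone of apple/ml-fastvlm"
        )
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"FASTVLM_REPO={raw!r} is not a directory")
    return path
```

Accepting an explicit `env` mapping keeps the check easy to test without touching the real process environment.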

Apple does not publish a pre-built fastvithd.mlpackage on HuggingFace, only the underlying safetensors checkpoint. Run scripts/export_fastvlm_vision.sh once to produce it.

Licensing — read this before shipping

Wrapper code (everything under Sources/, Python/, benchmarks/, scripts/) is MIT.

Upstream models are mixed — two Apache-2.0 and two under the Apple Sample Code License (ASCL):

License	Models
Apache-2.0	SAM 2 tiny, DETR ResNet-50 semantic seg
Apple Sample Code License	MobileCLIP S2, FastVLM 1.5B

ASCL is more restrictive than MIT/Apache. If you plan to ship MobileCLIP or FastVLM outputs in a product, read the full ASCL text distributed alongside the upstream artefacts and consult your legal team. See NOTICE for the detailed matrix and upstream source URLs.

Tracking Apple's updates

This repo pins each upstream model to a specific commit:

Repo	Pinned commit
`apple/coreml-sam2-tiny`	`6d04587b4937500c26afbdeeb9777a336efaeef6`
`apple/coreml-mobileclip`	`3e0a7bfb9fe83da8a3efaa3fd8f7df24214bb947`
`apple/FastVLM-1.5B`	`dd6608dfa0e17b050e1dde2856c3437fcba197ac`
`apple/coreml-detr-semantic-segmentation`	`7c771f8867a479d1441ac5fb0a8de31feea76bb6`

When Apple pushes updates, bump the SHAs in scripts/download_artefacts.sh, scripts/export_fastvlm_vision.sh, Sources/CoremlOcctModels/CoremlOcctModels.swift, Python/cli.py, and NOTICE, then re-run both download_artefacts.sh and export_fastvlm_vision.sh, and re-run the benchmarks.
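Because the same SHAs live in five files, it is easy to bump four and miss one. A small consistency check can flag any 40-character hex string in those files that is not one of the current pins. This is a hypothetical helper, not a script shipped in scripts/; it assumes pins are written as lowercase 40-hex strings and that no other 40-hex values appear in those files. The word boundaries in the regex keep it from matching inside longer hashes such as 64-character SHA-256 checksums.

```python
import re
from pathlib import Path

# A full git commit SHA: exactly 40 lowercase hex characters, not part of a longer run.
SHA_RE = re.compile(r"\b[0-9a-f]{40}\b")

# Pinned commits, copied from the table above.
PINS = {
    "apple/coreml-sam2-tiny": "6d04587b4937500c26afbdeeb9777a336efaeef6",
    "apple/coreml-mobileclip": "3e0a7bfb9fe83da8a3efaa3fd8f7df24214bb947",
    "apple/FastVLM-1.5B": "dd6608dfa0e17b050e1dde2856c3437fcba197ac",
    "apple/coreml-detr-semantic-segmentation": "7c771f8867a479d1441ac5fb0a8de31feea76bb6",
}

# The five files that must be bumped together, as listed above.
FILES = [
    "scripts/download_artefacts.sh",
    "scripts/export_fastvlm_vision.sh",
    "Sources/CoremlOcctModels/CoremlOcctModels.swift",
    "Python/cli.py",
    "NOTICE",
]


def stale_pins(root, pins, files):
    """Return {file: sorted SHAs found in it that are not current pins}.

    Files with no stale SHAs are omitted; a missing file raises FileNotFoundError.
    """
    current = set(pins.values())
    report = {}
    for rel in files:
        text = (Path(root) / rel).read_text(encoding="utf-8")
        unknown = sorted(set(SHA_RE.findall(text)) - current)
        if unknown:
            report[rel] = unknown
    return report
```

Running `stale_pins(".", PINS, FILES)` from the repo root should return an empty dict once every file has been bumped; note that `PINS` itself must be updated first, since it mirrors the table.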

Related repos in the gsdali/ family

The four models here are general-purpose Apple vision models. Drawing-specific converted models live in sibling repos and are the first-choice tools if you need engineering-drawing accuracy rather than broad-coverage pretraining:

gsdali/coreml-hawp — HAWP-v2 line detection
gsdali/coreml-edocr2 — eDOCr2 dimension/OCR
gsdali/coreml-detr-doclayout — DETR fine-tuned on drawing layout
gsdali/coreml-rt-detr — RT-DETR for symbol detection
gsdali/coreml-paddleocr — PP-OCRv5 for general OCR
gsdali/occt-design-loop-models — our trained GNN + RF models
