
gsdali/coreml-occt-models


coreml-occt-models

Convenience wrappers for Apple's official CoreML vision models, packaged for engineering-drawing workflows (OCCTDesignLoop and friends).

This repo does not train any model. It re-packages four Apple-published models (three as pre-built CoreML artefacts, plus the FastVLM vision encoder, which you export locally with scripts/export_fastvlm_vision.sh) behind high-level Swift APIs and Python stdin/stdout bridges. It also adds engineering-drawing-focused benchmarks and documents the mixed license landscape up front.

Model                               Swift wrapper    Python bridge          Input size (px)  License
SAM 2 tiny                          SAMSegmenter     sam_bridge.py          1024x1024        Apache-2.0
MobileCLIP S2                       DrawingEmbedder  mobileclip_bridge.py   256x256          Apple Sample Code License
FastVLM 1.5B (hybrid CoreML + MLX)  EngineeringVLM   fastvlm_bridge.py      1024x1024        Apple Sample Code License
DETR ResNet-50 (semantic seg.)      LayoutDetector   layout_bridge.py       448x448          Apache-2.0

See NOTICE for the full per-model license matrix and the pinned upstream commits.

Quickstart (Swift)

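Add the dependency to your Package.swift:

// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "YourApp",
    // If SwiftPM reports a minimum-OS mismatch, add a `platforms:` entry
    // at least as high as the one coreml-occt-models declares.
    dependencies: [
        .package(url: "https://github.com/gsdali/coreml-occt-models", from: "0.1.0"),
    ],
    targets: [
        .target(name: "YourApp", dependencies: [
            .product(name: "CoremlOcctModels", package: "coreml-occt-models"),
        ]),
    ]
)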

Download the artefacts once:

brew install git-lfs && git lfs install
./scripts/download_artefacts.sh
# FastVLM is not pre-built on HuggingFace; run this as well if you need it:
./scripts/export_fastvlm_vision.sh
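After running the scripts, a quick sanity check can confirm the packages landed. This is a minimal sketch that assumes each model ends up as a top-level *.mlpackage directory under artefacts/ (the layout below says that directory is populated by the scripts, but exact file names are not documented here): three from download_artefacts.sh, plus a fourth if you ran export_fastvlm_vision.sh. The helper names are hypothetical.

```python
from pathlib import Path
from typing import List


def list_mlpackages(artefacts_dir: str = "artefacts") -> List[str]:
    """Return the names of *.mlpackage bundles directly under artefacts_dir.

    Assumption: the scripts place each model as a top-level .mlpackage
    directory; the real layout may nest them differently.
    """
    root = Path(artefacts_dir)
    if not root.is_dir():
        return []
    # .mlpackage is a directory bundle, so plain files with that suffix are ignored.
    return sorted(p.name for p in root.glob("*.mlpackage") if p.is_dir())


def check_artefacts(artefacts_dir: str = "artefacts", want_fastvlm: bool = False) -> bool:
    # 3 pre-built packages from download_artefacts.sh, plus the exported FastVLM encoder.
    expected = 4 if want_fastvlm else 3
    found = list_mlpackages(artefacts_dir)
    print(f"found {len(found)} of {expected} expected .mlpackage bundles: {found}")
    return len(found) >= expected
```

Run it from the repo root after the scripts finish; pass want_fastvlm=True only if you also exported the FastVLM encoder.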

Then use any model in <10 lines:

import Foundation
import CoremlOcctModels

// Segment the region under a foreground click at (1024, 768), the centre of a 2048x1536 drawing.
let seg = try SAMSegmenter()
let out = try seg.segment(
    imageURL: URL(fileURLWithPath: "drawing.png"),
    points: [.init(x: 1024, y: 768, label: .foreground)]
)

// CLIP embed for retrieval.
let emb = try DrawingEmbedder()
let vec = try emb.embed(imageURL: URL(fileURLWithPath: "drawing.png"))

// Semantic segmentation for coarse layout hints.
let det = try LayoutDetector()
let layout = try det.detect(imageURL: URL(fileURLWithPath: "drawing.png"))
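The SAM 2 model itself consumes a 1024x1024 image, so click points must be rescaled from source-image pixels at some stage. Whether SAMSegmenter expects source pixels or normalises internally is not documented here. The sketch below assumes the square, non-aspect-preserving resize used by SAM 2's reference preprocessing, where coordinates are divided by the original width and height and multiplied by the model resolution. The function names are hypothetical.

```python
from typing import Tuple


def to_model_coords(x: float, y: float, width: int, height: int,
                    size: int = 1024) -> Tuple[float, float]:
    """Map a point from source-image pixels to a size x size model input.

    Assumes a plain (non-letterboxed) resize of the whole image to a square;
    a wrapper that letterboxes would need an extra offset.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return (x / width * size, y / height * size)


def to_source_coords(mx: float, my: float, width: int, height: int,
                     size: int = 1024) -> Tuple[float, float]:
    """Inverse mapping: model-input coordinates back to source pixels."""
    return (mx / size * width, my / size * height)


# The centre of a 2048x1536 drawing maps to the centre of the model input.
print(to_model_coords(1024, 768, 2048, 1536))  # (512.0, 512.0)
```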

Quickstart (Python)

pip install -e .
# or just: pip install coremltools pillow numpy open_clip_torch

One-shot subprocess:

echo '{"image_path": "test_data/sample_drawing.png",
       "points": [{"x": 800, "y": 600, "label": 1}]}' \
  | python Python/sam_bridge.py
# -> {"ok": true, "mask_shape": [256, 256], "mask_rle": "...", "score": 0.92}
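The same one-shot call can be made from a Python host process. This is a minimal sketch that assumes only what the example above shows: the bridge reads one JSON object on stdin and writes one JSON object with an "ok" field on stdout (any log lines on stdout would break json.loads). The helper name call_bridge is hypothetical.

```python
import json
import subprocess
import sys


def call_bridge(script: str, payload: dict, timeout: float = 120.0) -> dict:
    """Send one JSON request to a stdin/stdout bridge and return its reply.

    Assumes the bridge prints a single JSON object containing an "ok"
    boolean, as in the sam_bridge.py example.
    """
    proc = subprocess.run(
        [sys.executable, script],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"{script} exited with {proc.returncode}: {proc.stderr.strip()}")
    reply = json.loads(proc.stdout)
    if not reply.get("ok", False):
        raise RuntimeError(f"{script} reported failure: {reply}")
    return reply


# Example (needs the artefacts downloaded):
# reply = call_bridge("Python/sam_bridge.py",
#                     {"image_path": "test_data/sample_drawing.png",
#                      "points": [{"x": 800, "y": 600, "label": 1}]})
# print(reply["mask_shape"], reply["score"])
```

Spawning a fresh process per request keeps the host simple, but it pays model-load cost every call; for batch work the unified CLI or the Swift wrappers are likely cheaper.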

Or via the unified CLI (after pip install -e .):

coreml-occt segment drawing.png --x 800 --y 600
coreml-occt embed drawing.png --text "mechanical drawing of a gear"
coreml-occt caption drawing.png
coreml-occt layout drawing.png
coreml-occt versions          # show pinned upstream commits
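The embed command (and DrawingEmbedder in Swift) is the building block for retrieval: once drawings and a text query are embedded in the same space, ranking is a cosine-similarity sort. This sketch assumes the embeddings are already available as NumPy vectors; how the bridge serialises them is not documented here, and rank_by_similarity is a hypothetical helper.

```python
import numpy as np


def rank_by_similarity(query: np.ndarray, gallery: np.ndarray, top_k: int = 5):
    """Return (indices, scores) of the top_k gallery rows most similar to query.

    query:   shape (d,), e.g. a text or image embedding
    gallery: shape (n, d), one embedding per drawing
    Both sides are L2-normalised, so the dot product is cosine similarity.
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q
    # Stable sort so ties keep gallery order.
    order = np.argsort(-scores, kind="stable")[:top_k]
    return order, scores[order]


# Toy 3-D example: drawing 1 points almost the same way as the query.
gallery = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.1], [0.5, 0.5, 0.0]])
idx, sc = rank_by_similarity(np.array([0.0, 2.0, 0.0]), gallery, top_k=2)
print(idx)  # [1 2]
```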

Repository layout

Package.swift              SwiftPM manifest (4 wrappers + 4 benchmarks)
Sources/
  CoremlOcctModels/        umbrella module that re-exports everything
  SAMSegmenter/            3-stage SAM 2 tiny pipeline
  DrawingEmbedder/         MobileCLIP S2 image/text embeddings
  EngineeringVLM/          FastVLM vision encoder (decoder via Python)
  LayoutDetector/          DETR ResNet-50 semantic segmentation
Python/
  sam_bridge.py            stdin/stdout JSON, one-shot SAM2 call
  mobileclip_bridge.py     stdin/stdout JSON, image + text embedding
  fastvlm_bridge.py        stdin/stdout JSON, hybrid CoreML+MLX VLM
  layout_bridge.py         stdin/stdout JSON, DETR segmentation
  cli.py                   unified CLI (coreml-occt …)
benchmarks/
  BenchSAM/        bench_sam.swift (p50/p95 latency)
  BenchMobileCLIP/ ...
  BenchFastVLM/    vision encoder only; see Python bridge for e2e
  BenchLayout/     ...
  results.json     schema; run scripts/run_benchmarks.sh to populate
scripts/
  download_artefacts.sh        mirror the 3 pre-built .mlpackages
  export_fastvlm_vision.sh     build the 4th (FastVLM vision encoder)
  run_benchmarks.sh            build + run all four benchmarks
artefacts/                     populated by scripts (empty in repo)
test_data/sample_drawing.png   engineering-drawing reference fixture

Why FastVLM is special

FastVLM is the only one of the four that is hybrid:

  • Vision encoder (fastvithd) — CoreML, runs on the Apple Neural Engine. Produces a sequence of vision tokens.
  • Language decoder — runs via MLX on the GPU. Consumes the vision tokens + a text prompt and emits a generation.

This repo ships the Swift side (EngineeringVLM) for the CoreML half only. For end-to-end text generation, use Python/fastvlm_bridge.py, which coordinates both halves. Set FASTVLM_REPO to a local clone of apple/ml-fastvlm so the bridge can reuse that repo's MLX harness and prompt templates.
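A host script can validate FASTVLM_REPO before spawning the bridge, so a missing clone fails with a readable message rather than deep inside MLX. This sketch only checks that the variable is set and names an existing directory; it does not verify the clone's contents, and resolve_fastvlm_repo is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_fastvlm_repo(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the FASTVLM_REPO path, failing early with a readable message.

    Only checks that the variable is set and points at an existing directory.
    """
    env = os.environ if env is None else env
    value = env.get("FASTVLM_REPO")
    if not value:
        raise RuntimeError(
            "FASTVLM_REPO is not set; point it at a local clone of apple/ml-fastvlm"
        )
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"FASTVLM_REPO={value!r} is not a directory")
    return path
```

The resolved path can then be forwarded to the bridge process, e.g. via env={**os.environ, "FASTVLM_REPO": str(path)} in subprocess.run.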

Apple does not publish a pre-built fastvithd.mlpackage on HuggingFace, only the underlying safetensors checkpoint. Run scripts/export_fastvlm_vision.sh once to produce it.

Licensing — read this before shipping

Wrapper code (everything under Sources/, Python/, benchmarks/, scripts/) is MIT.

Upstream models are mixed — two Apache-2.0 and two under the Apple Sample Code License (ASCL):

License                    Models
Apache-2.0                 SAM 2 tiny, DETR ResNet-50 semantic seg
Apple Sample Code License  MobileCLIP S2, FastVLM 1.5B

ASCL is more restrictive than MIT/Apache. If you plan to ship MobileCLIP or FastVLM outputs in a product, read the full ASCL text distributed alongside the upstream artefacts and consult your legal team. See NOTICE for the detailed matrix and upstream source URLs.

Tracking Apple's updates

This repo pins each upstream model to a specific commit:

Repo                                     Pinned commit
apple/coreml-sam2-tiny                   6d04587b4937500c26afbdeeb9777a336efaeef6
apple/coreml-mobileclip                  3e0a7bfb9fe83da8a3efaa3fd8f7df24214bb947
apple/FastVLM-1.5B                       dd6608dfa0e17b050e1dde2856c3437fcba197ac
apple/coreml-detr-semantic-segmentation  7c771f8867a479d1441ac5fb0a8de31feea76bb6

When Apple pushes updates:

  • Bump the SHAs in scripts/download_artefacts.sh, scripts/export_fastvlm_vision.sh, Sources/CoremlOcctModels/CoremlOcctModels.swift, Python/cli.py, and NOTICE.
  • Re-run scripts/download_artefacts.sh (and scripts/export_fastvlm_vision.sh if you use FastVLM).
  • Re-run the benchmarks with scripts/run_benchmarks.sh.
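Because the pins live in five files, it is easy to miss one when bumping. A simple guard is to search those files for the previous SHAs after the bump; this avoids assuming which file carries which pin (the export script may only hold the FastVLM one, for instance). This is a plain substring-search sketch, and find_stale_shas is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Files this README lists as carrying pinned upstream SHAs.
PIN_FILES = [
    "scripts/download_artefacts.sh",
    "scripts/export_fastvlm_vision.sh",
    "Sources/CoremlOcctModels/CoremlOcctModels.swift",
    "Python/cli.py",
    "NOTICE",
]


def find_stale_shas(root: str, stale: Iterable[str],
                    files: Iterable[str] = PIN_FILES) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (hits, missing).

    hits maps each file that still contains a stale SHA to those SHAs;
    missing lists expected files that do not exist, so a rename is not
    silently skipped.
    """
    stale = list(stale)
    hits: Dict[str, List[str]] = {}
    missing: List[str] = []
    for rel in files:
        path = Path(root) / rel
        if not path.is_file():
            missing.append(rel)
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        found = [sha for sha in stale if sha in text]
        if found:
            hits[rel] = found
    return hits, missing


# Example: after bumping SAM 2, its previous pin should appear nowhere.
# print(find_stale_shas(".", ["6d04587b4937500c26afbdeeb9777a336efaeef6"]))
```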

Related repos in the gsdali/ family

The four models here are general-purpose Apple vision models. Drawing-specific converted models live in sibling repos and are the first-choice tools if you need engineering-drawing accuracy rather than broad-coverage pretraining:

  • gsdali/coreml-hawp — HAWP-v2 line detection
  • gsdali/coreml-edocr2 — eDOCr2 dimension/OCR
  • gsdali/coreml-detr-doclayout — DETR fine-tuned on drawing layout
  • gsdali/coreml-rt-detr — RT-DETR for symbol detection
  • gsdali/coreml-paddleocr — PP-OCRv5 for general OCR
  • gsdali/occt-design-loop-models — our trained GNN + RF models
