Feature: Depth Estimation Support
Add support for CoreML Depth Anything Small - a depth estimation model optimized for Apple Silicon.
What It Does
Input: Single image
Output: Per-pixel depth map (rendered as a grayscale heatmap where brightness encodes relative depth)
Model Details:
- Architecture: DPT (Dense Prediction Transformer) with DINOv2 backbone
- Size: 24.8M parameters
- F16 variant: 45.8MB, ~25-34ms inference on Apple Silicon
Why Add This?
1. 3D Scene Understanding
- Know how far away objects are, not just where they appear in the frame
- Critical for AR/VR applications
- Enables proper 3D positioning
2. Better Meta Glasses Integration
- Current: 2D bounding boxes on flat video
- With depth: True 3D spatial understanding
- Objects can be placed correctly in 3D space
3. SAM2 + Depth = Full 3D
- SAM2: Object segmentation masks
- Depth: Z-coordinate for each pixel
- Combined: Complete 3D model of scene
4. Pre-processing for Training
- Depth-aware image augmentation
- Generate synthetic training data
- Better depth-aware model training
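The SAM2 + Depth combination in item 3 can be illustrated with a small sketch: given a depth map and a binary object mask of the same size, take the median depth of the masked pixels as that object's Z estimate. The function name, the flat row-major `[Float]` depth layout, and the `[Bool]` mask layout are assumptions made for illustration; the actual SAM2 and depth model outputs would need to be converted into this form first.

```swift
/// Median depth of the pixels selected by a binary mask.
/// Both arrays are assumed to be row-major and of equal length (hypothetical layout).
func medianDepth(of depth: [Float], in mask: [Bool]) -> Float? {
    guard depth.count == mask.count else { return nil }
    // Keep only finite depth values that fall inside the mask, then sort them.
    let selected = zip(depth, mask)
        .filter { $0.1 && $0.0.isFinite }
        .map { $0.0 }
        .sorted()
    guard !selected.isEmpty else { return nil }
    let mid = selected.count / 2
    // Odd count: middle element. Even count: average of the two middle elements.
    return selected.count % 2 == 1
        ? selected[mid]
        : (selected[mid - 1] + selected[mid]) / 2
}
```

The median is used rather than the mean because mask edges often include a few background pixels, which would otherwise skew the estimate.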
Technical Requirements
Input Handling:
- Load images (already supported)
- Pass to the CoreML model (requires wrapping the image in an MLFeatureValue)
Output Processing:
- Parse multi-array depth map output
- Convert to UIImage/CGImage for display
- Apply colormap (grayscale or rainbow heatmap)
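As a rough sketch of the output-processing step, the raw depth values can be min-max scaled to 8-bit grayscale before being wrapped in a CGImage. The function below assumes the model output has already been copied into a flat `[Float]` array; `normalizeDepth` and that layout are hypothetical, not part of the existing code. The returned range could also feed the proposed `minDepth` / `maxDepth` fields.

```swift
/// Maps raw depth values to 8-bit grayscale (0...255) using min-max scaling.
/// Returns the pixels plus the observed range, or nil if there are no finite values.
func normalizeDepth(_ depth: [Float]) -> (pixels: [UInt8], minDepth: Float, maxDepth: Float)? {
    // Ignore NaN/inf so a single bad value cannot distort the range.
    let finite = depth.filter { $0.isFinite }
    guard let lo = finite.min(), let hi = finite.max() else { return nil }
    let range = hi - lo
    let pixels: [UInt8] = depth.map { value in
        // Flat maps (range == 0) and non-finite values render as black.
        guard value.isFinite, range > 0 else { return 0 }
        let t = min(max((value - lo) / range, 0), 1) // clamp to 0...1
        return UInt8((t * 255).rounded())
    }
    return (pixels, lo, hi)
}
```

The resulting byte array can then be handed to whatever image type the display layer expects.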
Visualization:
- Overlay depth heatmap on video/image
- Add depth slider for filtering
- Show depth value on hover/click
- Color map options (viridis, plasma, rainbow, etc.)
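The visualization items above could be prototyped along these lines: a depth slider becomes a closed range over the normalized 8-bit depth, pixels outside it are dropped, and the rest are colored. The `RGB` type, `rampColor`, `colorize`, and `depthValue` are all hypothetical names, and the blue-to-green-to-red ramp is only a stand-in for a real colormap such as viridis or plasma.

```swift
struct RGB: Equatable {
    var r: UInt8
    var g: UInt8
    var b: UInt8
}

/// Simple blue -> green -> red ramp over a normalized depth level (placeholder colormap).
func rampColor(_ level: UInt8) -> RGB {
    let t = Double(level) / 255
    if t < 0.5 {
        let u = t * 2 // 0...1 across the blue -> green half
        return RGB(r: 0, g: UInt8((u * 255).rounded()), b: UInt8(((1 - u) * 255).rounded()))
    } else {
        let u = (t - 0.5) * 2 // 0...1 across the green -> red half
        return RGB(r: UInt8((u * 255).rounded()), g: UInt8(((1 - u) * 255).rounded()), b: 0)
    }
}

/// Colors pixels inside the slider range; pixels outside it become nil (transparent).
func colorize(_ pixels: [UInt8], keeping range: ClosedRange<UInt8>) -> [RGB?] {
    pixels.map { range.contains($0) ? rampColor($0) : nil }
}

/// Looks up the depth under a hover/click position in a row-major depth map.
func depthValue(atX x: Int, y: Int, in depth: [Float], width: Int) -> Float? {
    guard width > 0, x >= 0, x < width, y >= 0 else { return nil }
    let index = y * width + x
    return index < depth.count ? depth[index] : nil
}
```

Returning `nil` for filtered pixels leaves it to the overlay code to decide whether they are drawn transparent or dimmed.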
Model Loading:
- Download from HuggingFace: apple/coreml-depth-anything-small
- Supports .mlpackage format
- Already optimized for Apple Neural Engine
Integration Points
New Model Type:
enum ModelKind {
    case detector
    case classifier
    case embedding
    case depthEstimation // NEW
}
New DetectedObject:
struct DetectedObject {
    // Existing fields for detection/classification

    // NEW: depth map data
    var depthMap: CGImage?
    var minDepth: Float?
    var maxDepth: Float?
}
Performance Considerations
- Speed: ~25-34ms per inference (roughly 29-40 inferences per second, fast enough for real-time)
- Size: 45.8MB (F16) - reasonable
- Hardware: Runs on Neural Engine (doesn't block GPU)
Use Cases
- AR on Meta Glasses - Proper 3D object placement
- Scene understanding - Know room dimensions, object distances
- Training data - Capture depth-aware datasets
- 3D reconstruction - Build 3D models from 2D images
🤖 Generated with Claude Code