feat: Add depth estimation support (CoreML Depth Anything Small)

## Feature: Depth Estimation Support

Add support for [CoreML Depth Anything Small](https://huggingface.co/apple/coreml-depth-anything-small) - a depth estimation model optimized for Apple Silicon.

## What It Does

**Input:** Single image
**Output:** Per-pixel relative depth map (grayscale image; values are relative depth, not metric distances in meters)

**Model Details:**
- Architecture: DPT (Dense Prediction Transformer) with DINOv2 backbone
- Size: 24.8M parameters
- F16 variant: 45.8MB, ~25-34ms inference on Apple Silicon

## Why Add This?

**1. 3D Scene Understanding**
- Know how far objects are relative to each other (not just where they are)
- Critical for AR/VR applications
- Enables proper 3D positioning

**2. Better Meta Glasses Integration**
- Current: 2D bounding boxes on flat video
- With depth: True 3D spatial understanding
- Objects can be placed correctly in 3D space
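Placing an object in 3D needs more than a depth value per pixel: the pixel must also be back-projected through the camera model. Below is a minimal sketch using the standard pinhole model. It assumes the depth has already been converted to metric distance (Depth Anything Small predicts relative depth, so the scale would have to come from another source). It also assumes hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`) taken from the glasses' camera calibration:

```swift
import Foundation

/// Pinhole camera intrinsics in pixels (hypothetical; real values must come
/// from the glasses' camera calibration).
struct CameraIntrinsics {
    var fx: Double  // focal length along x
    var fy: Double  // focal length along y
    var cx: Double  // principal point x
    var cy: Double  // principal point y
}

struct Point3D: Equatable {
    var x: Double
    var y: Double
    var z: Double
}

/// Back-projects pixel (u, v) with metric depth z (distance along the optical
/// axis) into camera coordinates. Assumes z is already metric, not relative.
func backProject(u: Double, v: Double, depth z: Double, k: CameraIntrinsics) -> Point3D {
    Point3D(x: (u - k.cx) * z / k.fx,
            y: (v - k.cy) * z / k.fy,
            z: z)
}
```

With `fx = fy = 500` and principal point `(320, 240)`, a pixel 100 px right of center at 2 m depth lands at `x = 0.4`, `y = 0`, `z = 2.0`.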

**3. SAM2 + Depth = Full 3D**
- SAM2: Object segmentation masks
- Depth: A (relative) Z value for each pixel
- Combined: 3D position and extent of each segmented object
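One way to combine the two outputs is to summarize the depth inside each SAM2 mask. The sketch below assumes the mask has already been resampled to the depth map's resolution and that both are flattened row-major. `medianDepth` is a hypothetical helper, not part of either model's API:

```swift
import Foundation

/// Median depth inside a binary segmentation mask.
/// `depth` and `mask` are row-major buffers of equal size (width * height).
/// The median is used instead of the mean so a few mis-segmented boundary
/// pixels do not pull the estimate as strongly.
func medianDepth(depth: [Float], mask: [Bool]) -> Float? {
    precondition(depth.count == mask.count, "depth and mask must be the same size")
    var values: [Float] = []
    for (d, inside) in zip(depth, mask) where inside && d.isFinite {
        values.append(d)
    }
    guard !values.isEmpty else { return nil }  // empty mask: no estimate
    values.sort()
    let mid = values.count / 2
    // Odd count: middle element; even count: mean of the two middle elements.
    return values.count % 2 == 1 ? values[mid] : (values[mid - 1] + values[mid]) / 2
}
```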

**4. Pre-processing for Training**
- Depth-aware image augmentation
- Generate synthetic training data
- Better depth-aware model training
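As one illustration of depth-aware augmentation (the issue does not name a specific technique), synthetic fog can be added with the standard atmospheric scattering model, where farther pixels fade toward a constant airlight. The sketch assumes a grayscale image in [0, 1] and a depth map normalized to [0, 1] with 1 = farthest. If the model's output encodes nearness instead, invert it first. `addFog`, `beta`, and `airlight` are hypothetical names:

```swift
import Foundation

/// Depth-dependent synthetic fog: I' = I * t + A * (1 - t), with t = exp(-beta * d).
/// - pixels:   grayscale intensities in [0, 1]
/// - depth:    per-pixel distance in [0, 1], larger = farther (assumption)
/// - beta:     fog density
/// - airlight: fog brightness A
func addFog(pixels: [Float], depth: [Float], beta: Float = 2.0, airlight: Float = 0.9) -> [Float] {
    precondition(pixels.count == depth.count, "image and depth must be the same size")
    return zip(pixels, depth).map { pair -> Float in
        let (i, d) = pair
        let t = Float(exp(Double(-beta * d)))  // transmission: 1 at d = 0, smaller when farther
        return i * t + airlight * (1 - t)
    }
}
```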

## Technical Requirements

**Input Handling:**
- Load images (already supported)
- Pass to CoreML model (image inputs need an `MLFeatureValue`)
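Below is a sketch of the input path using the generic `MLModel` API, so it does not assume any Xcode-generated model class. The input name is read from the model description rather than hard-coded. The output name `"depth"` is an assumption to check against `modelDescription.outputDescriptionsByName`. `predictDepth` and `DepthError` are hypothetical:

```swift
import CoreGraphics
import CoreML

enum DepthError: Error {
    case noImageInput
    case missingOutput(String)
}

/// Runs a depth model on a CGImage and returns the raw output feature.
/// `outputName` defaults to "depth", which is an assumption to verify.
func predictDepth(model: MLModel, image: CGImage, outputName: String = "depth") throws -> MLFeatureValue {
    // Find the first image-typed input and its size / pixel-format constraint.
    guard let entry = model.modelDescription.inputDescriptionsByName
            .first(where: { $0.value.imageConstraint != nil }),
          let constraint = entry.value.imageConstraint else {
        throw DepthError.noImageInput
    }
    // Core ML resizes and converts the CGImage to satisfy the constraint.
    let value = try MLFeatureValue(cgImage: image, constraint: constraint, options: nil)
    let provider = try MLDictionaryFeatureProvider(dictionary: [entry.key: value])
    let output = try model.prediction(from: provider)
    guard let depth = output.featureValue(for: outputName) else {
        throw DepthError.missingOutput(outputName)
    }
    // Depending on the model, the data is in imageBufferValue or multiArrayValue.
    return depth
}
```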

**Output Processing:**
- Parse the depth map output (an `MLMultiArray` or an image/pixel-buffer output, depending on the model's output description)
- Convert to UIImage/CGImage for display
- Apply colormap (grayscale or rainbow heatmap)
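Before display, the raw values need to be mapped to a fixed range. Below is a minimal min-max normalization sketch. It assumes the depth output has already been copied into a row-major `[Float]` (from an `MLMultiArray` or a pixel buffer). The returned range is what `minDepth`/`maxDepth` could hold. `depthToGrayscale` is a hypothetical helper:

```swift
/// Min-max normalizes a raw depth buffer into 8-bit grayscale values
/// (one byte per pixel) and returns the original value range.
/// Non-finite values and constant maps become 0.
func depthToGrayscale(_ depth: [Float]) -> (pixels: [UInt8], min: Float, max: Float)? {
    let finite = depth.filter { $0.isFinite }
    guard let lo = finite.min(), let hi = finite.max() else { return nil }
    let range = hi - lo
    let pixels = depth.map { d -> UInt8 in
        guard d.isFinite, range > 0 else { return 0 }
        let t = (d - lo) / range            // scaled to 0...1
        return UInt8((t * 255).rounded())   // scaled to 0...255
    }
    return (pixels, lo, hi)
}
```

The resulting bytes are one 8-bit gray pixel each, so they can be wrapped in a `CGImage` using a `CGDataProvider` and a device-gray color space.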

**Visualization:**
- Overlay depth heatmap on video/image
- Add depth slider for filtering
- Show depth value on hover/click
- Color map options (viridis, plasma, rainbow, etc.)
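The visualization features reduce to small pure functions over a normalized depth buffer. The sketch below covers three of them:

- A colormap lookup that interpolates linearly between stops. The five stops are rounded samples of matplotlib's viridis, so this is a coarse approximation, not the exact table.
- The depth-slider filter.
- A tap-to-depth lookup that maps view coordinates onto the smaller depth map.

All names are hypothetical:

```swift
/// Colormap as evenly spaced RGB stops, linearly interpolated.
struct Colormap {
    let stops: [(r: Float, g: Float, b: Float)]

    /// Coarse 5-stop approximation of viridis (rounded samples at 0, 0.25, 0.5, 0.75, 1).
    static let viridisApprox = Colormap(stops: [
        (0.267, 0.005, 0.329), (0.230, 0.322, 0.546), (0.128, 0.567, 0.551),
        (0.369, 0.789, 0.383), (0.993, 0.906, 0.144),
    ])

    /// Maps t (clamped to 0...1) to an RGB triple.
    func color(at t: Float) -> (r: Float, g: Float, b: Float) {
        let x = min(max(t, 0), 1) * Float(stops.count - 1)
        let i = min(Int(x), stops.count - 2)   // index of the lower stop
        let f = x - Float(i)                   // fraction between the two stops
        let a = stops[i], b = stops[i + 1]
        return (a.r + (b.r - a.r) * f, a.g + (b.g - a.g) * f, a.b + (b.b - a.b) * f)
    }
}

/// Depth-slider filter: true where the normalized depth lies in [lo, hi].
func depthMask(_ normalized: [Float], lo: Float, hi: Float) -> [Bool] {
    normalized.map { $0 >= lo && $0 <= hi }
}

/// Hover/click readout: nearest-neighbour lookup from view coordinates
/// into a row-major depth map of size mapWidth x mapHeight.
func depthValue(atX x: Double, y: Double, viewWidth: Double, viewHeight: Double,
                depth: [Float], mapWidth: Int, mapHeight: Int) -> Float? {
    guard x >= 0, y >= 0, x < viewWidth, y < viewHeight else { return nil }
    let col = min(Int(x / viewWidth * Double(mapWidth)), mapWidth - 1)
    let row = min(Int(y / viewHeight * Double(mapHeight)), mapHeight - 1)
    return depth[row * mapWidth + col]
}
```

Other colormaps (plasma, rainbow) would only need a different `stops` array.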

**Model Loading:**
- Download from Hugging Face: `apple/coreml-depth-anything-small`
- Distributed in `.mlpackage` format
- Already optimized for Apple Neural Engine
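Below is a loading sketch. It assumes the `.mlpackage` was added to the Xcode target, which compiles it to `.mlmodelc` in the app bundle. The resource name `DepthAnythingSmallF16` is a hypothetical placeholder. A package downloaded at runtime would first need `MLModel.compileModel(at:)`. `.cpuAndNeuralEngine` requires iOS 16 / macOS 13 or later:

```swift
import CoreML
import Foundation

/// Loads the compiled depth model from the app bundle.
/// `name` is a hypothetical placeholder for the compiled model's file name.
func loadDepthModel(named name: String = "DepthAnythingSmallF16") throws -> MLModel {
    guard let url = Bundle.main.url(forResource: name, withExtension: "mlmodelc") else {
        throw CocoaError(.fileNoSuchFile)
    }
    let config = MLModelConfiguration()
    // Restrict to CPU + Neural Engine so the GPU stays free for rendering.
    // Core ML still decides where each layer actually runs within these units.
    config.computeUnits = .cpuAndNeuralEngine
    return try MLModel(contentsOf: url, configuration: config)
}
```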

## Integration Points

**New Model Type:**
```swift
enum ModelKind {
    case detector
    case classifier
    case embedding
    case depthEstimation  // NEW
}
```

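**Extended DetectedObject:**
```swift
import CoreGraphics

struct DetectedObject {
    // Existing fields for detection/classification
    // NEW: depth map data
    var depthMap: CGImage?   // rendered depth map for display
    var minDepth: Float?     // range of the raw depth values
    var maxDepth: Float?     // (relative units, not meters)
}
```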

## Performance Considerations

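- **Speed:** ~25-34ms per inference (about 29-40 inferences per second, enough for real-time video)
- **Size:** 45.8MB (F16), reasonable to bundle in an app
- **Hardware:** Can run on the Neural Engine (selected via `MLModelConfiguration.computeUnits`), leaving the GPU free for rendering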
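At 30 fps a new frame arrives every ~33 ms, so a 25-34 ms inference can use most of the frame budget. Below is a sketch of one mitigation. It assumes reusing the previous depth map on skipped frames is acceptable, and runs inference only when a minimum interval has passed. `DepthFrameThrottle` is hypothetical, and the interval should come from measured latency on the target device:

```swift
import Foundation

/// Decides per video frame whether to run depth inference.
/// Frames arriving within `minInterval` of the last run are skipped
/// (the caller reuses the previous depth map for them).
struct DepthFrameThrottle {
    let minInterval: TimeInterval            // seconds between inferences
    private var lastRun: TimeInterval? = nil  // timestamp of the last accepted frame

    init(minInterval: TimeInterval) {
        self.minInterval = minInterval
    }

    mutating func shouldRun(at timestamp: TimeInterval) -> Bool {
        if let last = lastRun, timestamp - last < minInterval { return false }
        lastRun = timestamp
        return true
    }
}
```

With `minInterval: 0.034` (the upper end of the quoted latency), 30 fps frames alternate between run and skip.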

## Use Cases

1. **AR on Meta Glasses** - Proper 3D object placement
2. **Scene understanding** - Relative object distances and scene layout (metric room dimensions would need scale calibration)
3. **Training data** - Capture depth-aware datasets
4. **3D reconstruction** - Build 3D models from 2D images

## Related
- Model: https://huggingface.co/apple/coreml-depth-anything-small
- Paper: [Depth Anything](https://arxiv.org/abs/2401.10891)
- Example code: https://github.com/huggingface/coreml-examples/tree/main/depth-anything-example

🤖 Generated with [Claude Code](https://claude.com/claude-code)

feat: Add depth estimation support (CoreML Depth Anything Small) #6

Feature: Depth Estimation Support

What It Does

Why Add This?

Technical Requirements

Integration Points

Performance Considerations

Use Cases

Related

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

feat: Add depth estimation support (CoreML Depth Anything Small) #6

Description

Feature: Depth Estimation Support

What It Does

Why Add This?

Technical Requirements

Integration Points

Performance Considerations

Use Cases

Related

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions