Add View-of-Delft (VoD) dataset support #23

Merged: stepankonev merged 3 commits into main from stepankonev/view-of-delft on Jun 15, 2026

stepankonev (Owner) commented Jun 14, 2026

Summary

Adds View-of-Delft (VoD) dataset support — a from-scratch processor that reads the extracted KITTI-style lidar/ tree directly, with no devkit dependency (the vod-tudelft devkit pins numpy=1.19 / python=3.7, incompatible with this project's numpy 2.x / py3.12 — same rationale as nuScenes reading its tables directly).

Ingested into StandardFrameData:

  • cameras: the single front camera (CameraDirection.FRONT; intrinsics from the calib P2, extrinsics inv(Tr_velo_to_cam))
  • lidar_pc: the Velodyne HDL-64 xyz in the ego (velodyne, FLU) frame
  • detections_3d: KITTI boxes mapped into the ego frame, each class folded into the coarse DetectionType taxonomy (the two-wheeler family → BICYCLE; Car/truck/vehicle_other → VEHICLE; static/ambiguous → UNKNOWN; DontCare dropped)
  • ego past/future trajectory: T_map_from_lidar from the per-frame pose, via FuturePastStatesFromMatricesAggregator
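
The class folding in the detections_3d bullet can be illustrated with a small lookup. This is a minimal sketch, not the processor's actual table: the `DetectionType` enum below is a hypothetical stand-in, the `PEDESTRIAN` member and the exact membership of the "two-wheeler family" are assumptions, and the VoD class names are assumed from the dataset's label set. Only the VEHICLE, BICYCLE and UNKNOWN rules and the DontCare drop come from the description above.

```python
from enum import Enum
from typing import Optional


class DetectionType(Enum):
    """Hypothetical stand-in for the project's coarse taxonomy."""

    VEHICLE = "vehicle"
    PEDESTRIAN = "pedestrian"  # assumed member, not named in the PR text
    BICYCLE = "bicycle"
    UNKNOWN = "unknown"


# Assumed class table. Anything not listed (static or ambiguous classes such
# as bicycle_rack or ride_uncertain) falls through to UNKNOWN.
_VOD_CLASS_TO_TYPE = {
    "Car": DetectionType.VEHICLE,
    "truck": DetectionType.VEHICLE,
    "vehicle_other": DetectionType.VEHICLE,
    "Cyclist": DetectionType.BICYCLE,
    "bicycle": DetectionType.BICYCLE,
    "moped_scooter": DetectionType.BICYCLE,
    "motor": DetectionType.BICYCLE,
    "Pedestrian": DetectionType.PEDESTRIAN,
}


def fold_vod_class(name: str) -> Optional[DetectionType]:
    """Return the coarse type, or None when the label should be dropped."""
    if name == "DontCare":
        return None
    return _VOD_CLASS_TO_TYPE.get(name, DetectionType.UNKNOWN)
```

Returning None for DontCare lets the caller filter dropped labels in one pass, while the UNKNOWN default keeps unlisted classes instead of silently discarding them.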

Frames are grouped into the official scenes (vendored split table from the devkit docs) so per-segment trajectories never span two recordings.
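
The scene grouping can be sketched as a range lookup over a split table. The ranges below are made up for illustration (the real table is the one vendored from the devkit docs), and `scene_index` / `group_by_scene` are hypothetical helper names rather than the package's API.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Hypothetical inclusive (first_frame, last_frame) ranges, sorted by start.
SCENE_RANGES = [(0, 99), (100, 249), (300, 420)]
_STARTS = [start for start, _ in SCENE_RANGES]


def scene_index(frame_id: int) -> Optional[int]:
    """Return the index of the scene containing frame_id, or None."""
    i = bisect_right(_STARTS, frame_id) - 1
    if i >= 0 and frame_id <= SCENE_RANGES[i][1]:
        return i
    return None


def group_by_scene(frame_ids: Iterable[int]) -> Dict[int, List[int]]:
    """Group frame ids per scene so a trajectory window stays in one recording."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for fid in sorted(frame_ids):
        idx = scene_index(fid)
        if idx is not None:  # frames outside every range are skipped
            groups[idx].append(fid)
    return dict(groups)
```

Building past/future trajectories per group, rather than over the flat frame list, is what prevents a window from bridging two recordings.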

Why not the devkit's FrameTransformMatrix?

The VoD devkit ships convenient frame transforms — from vod.frame import FrameTransformMatrix; transforms = FrameTransformMatrix(frame_data), exposing t_camera_lidar, t_lidar_camera, t_map_camera, … — so the natural question is why we reimplement them in _vod_geometry.py rather than importing them. Two reasons:

  • It's a Python 3.7 / numpy 1.19 stack. The devkit's environment.yml pins python=3.7, numpy=1.19 (plus open3d=0.13, k3d, transforms3d, …), which is incompatible with this project's numpy 2.x / py3.12 — so it can't be a runtime dependency. Same rationale as nuScenes reading its JSON tables directly (its devkit pins numpy<2).
  • Coupling. FrameTransformMatrix(frame_data) is constructed from a devkit FrameDataLoader + kitti_locations, so adopting it means taking on the devkit's whole IO layer — not just the matrix algebra, which is only @ and np.linalg.inv, done here via standard_e2e.utils (se3, transform_points, wrap_to_pi).

We reproduce exactly the transforms we need: devkit t_camera_lidar ≡ our parse_calibration (K from P2, Tr_velo_to_cam); devkit t_lidar_camera = inv(t_camera_lidar); devkit t_map_camera @ t_camera_lidar ≡ our ego_pose_map_from_lidar (mapToCamera @ Tr_velo_to_cam). The one non-trivial convention — the box-yaw -(rotation + π/2) — was validated against the devkit's own corner builder (see Geometry below).
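
The pose composition named above needs only matrix products, which a short NumPy sketch makes concrete. The function name `ego_pose_map_from_lidar` appears in this PR, but the signature here is an assumption, and `consecutive_speeds` is a hypothetical helper that assumes a fixed 0.1 s frame period.

```python
import numpy as np


def ego_pose_map_from_lidar(map_to_camera: np.ndarray,
                            tr_velo_to_cam: np.ndarray) -> np.ndarray:
    """T_map_from_lidar = T_map_from_camera @ T_camera_from_lidar (4x4 each)."""
    return map_to_camera @ tr_velo_to_cam


def consecutive_speeds(poses, dt: float = 0.1) -> np.ndarray:
    """Speed in m/s between consecutive 4x4 poses, assuming a fixed period dt."""
    xyz = np.asarray(poses)[:, :3, 3]  # translation column of every pose
    return np.linalg.norm(np.diff(xyz, axis=0), axis=1) / dt
```

Running `consecutive_speeds` over a scene's poses is a cheap way to notice a wrong composition order or a missing inverse, since either one tends to produce non-physical speeds.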

Geometry (verified against the devkit and real data)

  • Ego pose — VoD's mapToCamera is the camera pose in the map (T_map_from_camera), so T_map_from_lidar = mapToCamera @ Tr_velo_to_cam (the inverse yields ~50× too-fast, non-physical ego motion; caught by checking speed on consecutive frames).
  • Box height — KITTI location is the bottom-face center, raised by H/2 to the geometric center (verified by lidar containment: z-ratio +1.04 → +0.04).
  • Box yaw — VoD keeps the KITTI camera-x zero-reference, so the FLU heading is -(rotation + π/2), not -rotation (verified against the devkit's get_transformed_3d_label_corners and the lidar PCA major axis: median misalignment 78° → 12° over 1101 elongated objects).
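
The box height and box yaw conventions above can be combined into one conversion. This is a sketch under stated assumptions: `kitti_box_to_ego` is a hypothetical name, the lidar frame is taken as FLU with z up, and `wrap_to_pi` is written out locally rather than imported from the project's utils.

```python
import numpy as np


def wrap_to_pi(angle: float) -> float:
    """Wrap an angle in radians to [-pi, pi)."""
    return (angle + np.pi) % (2.0 * np.pi) - np.pi


def kitti_box_to_ego(location_cam, height: float, rotation: float,
                     tr_velo_to_cam: np.ndarray):
    """Return (center_xyz, heading) of a KITTI box in the lidar (FLU) frame.

    location_cam is the bottom-face center in the camera frame and
    tr_velo_to_cam is the 4x4 camera-from-lidar transform.
    """
    t_lidar_camera = np.linalg.inv(tr_velo_to_cam)
    bottom = (t_lidar_camera @ np.append(location_cam, 1.0))[:3]
    # Lift from the bottom face to the geometric center along lidar +z.
    center = bottom + np.array([0.0, 0.0, height / 2.0])
    # rotation == 0 means the length runs along camera x (lidar -y),
    # hence the extra pi/2 before negating.
    heading = wrap_to_pi(-(rotation + np.pi / 2.0))
    return center, heading
```

With this convention a box labelled rotation = -π/2 gets heading 0, i.e. its length points along the ego's forward axis.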

Not ingested / limitations

  • The 3+1D radar (radar / radar_3frames / radar_5frames) has no StandardE2E modality yet.
  • Per-point reflectance dropped (lidar is xyz-only).
  • Per-frame timestamps synthesised at the 10 Hz LiDAR-lead rate (the detection release ships none).
  • The test split is sensor-only (no labels → no detections).
  • A handful of scene-start frames have unconverged map-localization (xy jumps while camera height stays sane) — a faithful source artifact, not a conversion issue.
  • ~5.3 MB/frame at native resolution (full train ≈ 26 GB); bound via the config's cameras_identity_adapter: {max_size} / lidar_adapter: {max_points}.
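
Because the release ships no per-frame timestamps, they have to be generated from the frame index. A minimal sketch of that, assuming integer microseconds and an arbitrary start value (both are illustrative choices, and `synth_timestamps_us` is a hypothetical name):

```python
from typing import List


def synth_timestamps_us(num_frames: int, start_us: int = 0,
                        rate_hz: float = 10.0) -> List[int]:
    """Evenly spaced timestamps in microseconds at a fixed frame rate."""
    period_us = int(round(1e6 / rate_hz))  # 100000 us at 10 Hz
    return [start_us + i * period_us for i in range(num_frames)]
```

Integer microseconds avoid accumulating floating-point drift over long scenes.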

Tests

Comprehensive unit tests (calibration / ego-pose / box-geometry incl. the −π/2 yaw offset and H/2 lift, velodyne decode, KITTI label + track-id parse, class taxonomy, scene/split table, root resolution and frame enumeration) plus VOD_ROOT-gated real-frame checks. wrap_to_pi lifted to utils.geometry with its own test. Full suite green (the two failing navsim tests are pre-existing real-data / permission environment failures, unrelated).

Commits

  • Add View-of-Delft (VoD) dataset support
  • Refactor VoD: lift generic geometry helpers to utils, slim FrameRef
  • Fix VoD box heading: KITTI rotation needs the −π/2 reference offset

Commit 1 message (Add View-of-Delft (VoD) dataset support)

View-of-Delft (TU Delft, IEEE RA-L 2022) is a compact urban dataset with a 3+1D radar, a 64-layer Velodyne LiDAR, a front camera and KITTI-format 3D boxes over 24 recording scenes. The processor reads the extracted lidar/ tree directly, with no devkit dependency.

Ingested into StandardFrameData: the front camera (CameraDirection.FRONT; K from the calib P2, extrinsics inv(Tr_velo_to_cam)); the Velodyne xyz as lidar_pc in the ego (velodyne, FLU) frame; KITTI 3D boxes as detections_3d in the ego frame, each class folded into the coarse DetectionType taxonomy (DontCare dropped); and the ego past/future trajectory from the per-frame pose via FuturePastStatesFromMatricesAggregator.

Geometry: box yaw is VoD's rotation about LiDAR -Z, negated to the FLU heading; the KITTI bottom-face location is raised by H/2 to the geometric center; the ego pose is T_map_from_lidar = mapToCamera @ Tr_velo_to_cam (VoD's mapToCamera is the camera pose in the map). Frames are grouped into the official scenes (vendored split table) so per-segment trajectories never span recordings.

Not ingested: the 3+1D radar has no StandardE2E modality yet; per-point reflectance is dropped (lidar is xyz-only). Per-frame timestamps are synthesised at the 10 Hz LiDAR-lead rate (the release ships none); the test split is unlabelled.

Adds the vod processor/converter/io/geometry/splits package, configs/vod.yaml, scripts/extract_vod.sh and scripts/prepare_dataset_vod.sh, registry wiring, README and docs/datasets.rst entries, and 41 tests (unit + VOD_ROOT-gated real-frame checks).

Commit 2 message (Refactor VoD: lift generic geometry helpers to utils, slim FrameRef)

wrap_to_pi moves to standard_e2e.utils.geometry (re-exported, covered in tests/test_geometry.py) instead of living privately in the VoD package, matching the utils.geometry consolidation pattern.

parse_calibration drops the VoD-local _row_major_3x4_to_4x4 and reuses the shared se3() to assemble Tr_velo_to_cam from the calib's 3x4 [R|t].
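
As an illustration of what such a parser does, here is a self-contained sketch. It assumes a KITTI-style calib file with `key: v1 v2 ...` lines and 12 values each for P2 and Tr_velo_to_cam. The local `se3` is a stand-in for the shared helper and its signature is an assumption, as is the return shape of `parse_calibration`.

```python
import numpy as np


def se3(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Stand-in for the shared helper: build a 4x4 from R (3x3) and t (3,)."""
    transform = np.eye(4)
    transform[:3, :3] = rotation
    transform[:3, 3] = translation
    return transform


def parse_calibration(text: str):
    """Return (K, T_camera_from_lidar, T_lidar_from_camera) from calib text."""
    rows = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, values = line.split(":", 1)
        fields = values.split()
        if fields:
            rows[key.strip()] = np.array(fields, dtype=float)
    p2 = rows["P2"].reshape(3, 4)
    rt = rows["Tr_velo_to_cam"].reshape(3, 4)
    intrinsics = p2[:, :3]                   # K is the left 3x3 of P2
    cam_from_lidar = se3(rt[:, :3], rt[:, 3])
    return intrinsics, cam_from_lidar, np.linalg.inv(cam_from_lidar)
```

The third return value is the camera extrinsic in the ego frame, matching the inv(Tr_velo_to_cam) mentioned in the summary.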

FrameRef drops its split field (only ever set, never read; the processor uses its own split), trimming the per-frame locator to root/scene/subdir/frame_id.

No behavior change; full test suite green (the two failing navsim tests are pre-existing real-data/permission env failures, unrelated).

Commit 3 message (Fix VoD box heading: KITTI rotation needs the −π/2 reference offset)

VoD's KITTI rotation keeps the KITTI zero-reference (a box's length runs along the camera x-axis at rotation==0, i.e. lidar lateral -y), so the ego (velodyne FLU) heading is -(rotation + pi/2), not -rotation. The previous -rotation left every box yaw 90 degrees off (perpendicular to travel).

Confirmed against the devkit's box-corner builder, which rotates by -(rotation + pi/2), and empirically against the lidar point-cloud PCA major axis over 1101 elongated objects (median axis misalignment 78deg -> 12deg).

The earlier lidar-containment verification missed this: VoD's compact footprints (pedestrians, dense bike clusters) stay inside a 90-degrees-rotated box of the same L x W area, so containment is insensitive to a 90deg yaw error. Updated the unit tests, docstrings, README and docs/datasets.rst accordingly.
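
The PCA check is sensitive where containment is not, because it compares directions rather than occupancy. A sketch of such a check follows. The helper names are hypothetical, the points are assumed to be the in-box lidar returns projected to the ego xy plane, and this is not the script used to produce the numbers above.

```python
import numpy as np


def pca_major_axis_angle(points_xy: np.ndarray) -> float:
    """Angle (rad) of the principal axis of an (N, 2) point set."""
    centered = points_xy - points_xy.mean(axis=0)
    cov = np.cov(centered.T)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    major = eigvecs[:, -1]
    return float(np.arctan2(major[1], major[0]))


def axis_misalignment_deg(heading: float, axis_angle: float) -> float:
    """Angle between two undirected axes, folded into [0, 90] degrees."""
    d = abs(heading - axis_angle) % np.pi
    return float(np.degrees(min(d, np.pi - d)))
```

Folding to [0, 90] degrees ignores 180-degree flips, which PCA cannot resolve anyway, so a consistent 90-degree yaw error shows up as a misalignment near 90 on elongated objects.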

No other behavior change.

codecov Bot commented Jun 14, 2026

stepankonev merged commit 5315bf4 into main on Jun 15, 2026
3 checks passed