Add View-of-Delft (VoD) dataset support by stepankonev · Pull Request #23 · stepankonev/StandardE2E

stepankonev · 2026-06-14T00:23:17Z

Summary

Adds View-of-Delft (VoD) dataset support — a from-scratch processor that reads the extracted KITTI-style lidar/ tree directly, with no devkit dependency (the vod-tudelft devkit pins numpy=1.19 / python=3.7, incompatible with this project's numpy 2.x / py3.12 — same rationale as nuScenes reading its tables directly).

Ingested into StandardFrameData:

cameras — the single front camera (CameraDirection.FRONT; intrinsics from the calib P2, extrinsics inv(Tr_velo_to_cam))
lidar_pc — the Velodyne HDL-64 xyz in the ego (velodyne, FLU) frame
detections_3d — KITTI boxes mapped into the ego frame, each class folded into the coarse DetectionType taxonomy (the two-wheeler family → BICYCLE; Car/truck/vehicle_other → VEHICLE; static/ambiguous → UNKNOWN; DontCare dropped)
ego past/future trajectory — T_map_from_lidar from the per-frame pose, via FuturePastStatesFromMatricesAggregator
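The class folding in the detections_3d item can be pictured as a lookup table. This is a minimal sketch, not the processor's code. `DetectionType` here is a hypothetical stand-in with four members, and only two parts come from this PR: the Car/truck/vehicle_other → VEHICLE entries and the DontCare drop. The two-wheeler and static/ambiguous groupings and the Pedestrian entry are illustrative guesses based on VoD's published class names.

```python
from enum import Enum
from typing import Optional


class DetectionType(Enum):
    # Hypothetical stand-in for the project's coarse taxonomy.
    VEHICLE = "vehicle"
    PEDESTRIAN = "pedestrian"
    BICYCLE = "bicycle"
    UNKNOWN = "unknown"


# Illustrative mapping: only the VEHICLE rows and the DontCare drop are
# stated in the PR; every other row is an assumption.
_VOD_TO_DETECTION_TYPE = {
    "Car": DetectionType.VEHICLE,
    "truck": DetectionType.VEHICLE,
    "vehicle_other": DetectionType.VEHICLE,
    "Pedestrian": DetectionType.PEDESTRIAN,
    "Cyclist": DetectionType.BICYCLE,
    "bicycle": DetectionType.BICYCLE,
    "moped_scooter": DetectionType.BICYCLE,
    "motor": DetectionType.BICYCLE,
    "bicycle_rack": DetectionType.UNKNOWN,
    "human_depiction": DetectionType.UNKNOWN,
    "ride_other": DetectionType.UNKNOWN,
    "ride_uncertain": DetectionType.UNKNOWN,
}


def fold_class(name: str) -> Optional[DetectionType]:
    """Map a VoD label class to the coarse taxonomy; None means 'drop the box'."""
    if name == "DontCare":
        return None
    # Unrecognised classes fall back to UNKNOWN instead of raising.
    return _VOD_TO_DETECTION_TYPE.get(name, DetectionType.UNKNOWN)
```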

Frames are grouped into the official scenes (vendored split table from the devkit docs) so per-segment trajectories never span two recordings.
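Scene grouping reduces to a range lookup against the vendored table. This sketch assumes that the table stores an inclusive (first_frame, last_frame) range per scene. The scene names and frame ranges below are made up for illustration, and `scene_of` / `group_by_scene` are hypothetical names.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical excerpt of a vendored scene table: (scene, first_frame, last_frame),
# inclusive. The real names and ranges come from the VoD devkit docs.
SCENES: List[Tuple[str, int, int]] = [
    ("scene_a", 0, 543),
    ("scene_b", 544, 1311),
    ("scene_c", 1312, 1802),
]
_STARTS = [first for _, first, _ in SCENES]


def scene_of(frame_id: int) -> str:
    """Return the scene whose inclusive [first, last] range contains frame_id."""
    i = bisect.bisect_right(_STARTS, frame_id) - 1
    if i < 0 or frame_id > SCENES[i][2]:
        raise KeyError(f"frame {frame_id} is not covered by the scene table")
    return SCENES[i][0]


def group_by_scene(frame_ids: Sequence[int]) -> Dict[str, List[int]]:
    """Bucket sorted frame ids by scene so no trajectory window crosses a recording."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for fid in sorted(frame_ids):
        groups[scene_of(fid)].append(fid)
    return dict(groups)
```

Past/future windows are then built inside each bucket only, which is what keeps a segment from straddling two recordings.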

Why not the devkit's `FrameTransformMatrix`?

The VoD devkit ships convenient frame transforms — from vod.frame import FrameTransformMatrix; transforms = FrameTransformMatrix(frame_data), exposing t_camera_lidar, t_lidar_camera, t_map_camera, … — so the natural question is why we reimplement them in _vod_geometry.py rather than importing them. Two reasons:

It's a Python 3.7 / numpy 1.19 stack. The devkit's environment.yml pins python=3.7, numpy=1.19 (plus open3d=0.13, k3d, transforms3d, …), which is incompatible with this project's numpy 2.x / py3.12 — so it can't be a runtime dependency. Same rationale as nuScenes reading its JSON tables directly (its devkit pins numpy<2).
Coupling. FrameTransformMatrix(frame_data) is constructed from a devkit FrameDataLoader + kitti_locations, so adopting it means taking on the devkit's whole IO layer — not just the matrix algebra, which is only @ and np.linalg.inv, done here via standard_e2e.utils (se3, transform_points, wrap_to_pi).
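To illustrate that the devkit's matrix algebra is only `@` and `np.linalg.inv`, here are minimal NumPy versions of the three helpers named above. The real `standard_e2e.utils` signatures are not shown in this PR. The `se3(rotation, translation)` and `transform_points(T, points)` signatures and the [-π, π) interval of `wrap_to_pi` are therefore assumptions.

```python
import numpy as np


def se3(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Assemble a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def transform_points(T: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 4x4 transform to an (N, 3) array of points."""
    return points @ T[:3, :3].T + T[:3, 3]


def wrap_to_pi(angle):
    """Wrap angle(s) in radians to the half-open interval [-pi, pi)."""
    return (np.asarray(angle) + np.pi) % (2.0 * np.pi) - np.pi
```

Inverting a transform is then just `np.linalg.inv(se3(R, t))`, and chaining frames is `@`, which is all the devkit's `FrameTransformMatrix` adds on top of its IO layer.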

We reproduce exactly the transforms we need: devkit t_camera_lidar ≡ our parse_calibration (K from P2, Tr_velo_to_cam); devkit t_lidar_camera = inv(t_camera_lidar); devkit t_map_camera @ t_camera_lidar ≡ our ego_pose_map_from_lidar (mapToCamera @ Tr_velo_to_cam). The one non-trivial convention — the box-yaw -(rotation + π/2) — was validated against the devkit's own corner builder (see Geometry below).
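The calibration side of that equivalence can be sketched in four steps. Read the KITTI-style `KEY: values` lines, take K from the left 3×3 of P2, promote the row-major 3×4 Tr_velo_to_cam to 4×4, and invert it to get the camera extrinsics. The function names are hypothetical, and this is not the project's `parse_calibration`. It only assumes the standard KITTI calib text layout.

```python
from typing import Dict, Tuple

import numpy as np


def read_kitti_calib(text: str) -> Dict[str, np.ndarray]:
    """Parse 'KEY: v1 v2 ...' lines of a KITTI-style calib file into flat float arrays."""
    entries: Dict[str, np.ndarray] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, values = line.split(":", 1)
        entries[key.strip()] = np.array(values.split(), dtype=np.float64)
    return entries


def calib_to_matrices(text: str) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Return (K, T_cam_from_lidar, T_lidar_from_cam) from the P2 and Tr_velo_to_cam rows."""
    entries = read_kitti_calib(text)
    P2 = entries["P2"].reshape(3, 4)
    K = P2[:, :3]  # intrinsics: left 3x3 block of the projection matrix
    T_cam_from_lidar = np.eye(4)
    T_cam_from_lidar[:3, :] = entries["Tr_velo_to_cam"].reshape(3, 4)  # row-major [R|t]
    T_lidar_from_cam = np.linalg.inv(T_cam_from_lidar)  # camera pose in the ego frame
    return K, T_cam_from_lidar, T_lidar_from_cam
```

In devkit terms, `T_cam_from_lidar` plays the role of t_camera_lidar and `T_lidar_from_cam` that of t_lidar_camera.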

Geometry (verified against the devkit and real data)

Ego pose — VoD's mapToCamera is the camera pose in the map (T_map_from_camera), so T_map_from_lidar = mapToCamera @ Tr_velo_to_cam (the inverse yields ~50× too-fast, non-physical ego motion; caught by checking speed on consecutive frames).
Box height — KITTI location is the bottom-face center, raised by H/2 to the geometric center (verified by lidar containment: z-ratio +1.04 → +0.04).
Box yaw — VoD keeps the KITTI camera-x zero-reference, so the FLU heading is -(rotation + π/2), not -rotation (verified against the devkit's get_transformed_3d_label_corners and the lidar PCA major axis: median misalignment 78° → 12° over 1101 elongated objects).
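The three Geometry conventions fit in a few lines of NumPy. In this sketch the function names are hypothetical. It assumes the H/2 lift is applied after moving the KITTI location into the lidar frame (lidar z up). It also uses dt = 0.1 s, taken from the synthesised 10 Hz timestamps, for the consecutive-frame speed check.

```python
from typing import Sequence, Tuple

import numpy as np


def wrap_to_pi(angle: float) -> float:
    """Wrap an angle in radians to [-pi, pi)."""
    return float((angle + np.pi) % (2.0 * np.pi) - np.pi)


def map_from_lidar(map_to_camera: np.ndarray, tr_velo_to_cam: np.ndarray) -> np.ndarray:
    """Ego pose: mapToCamera is read as T_map_from_camera, so compose with T_cam_from_lidar."""
    return map_to_camera @ tr_velo_to_cam


def kitti_box_to_ego(
    location_cam: Sequence[float], h: float, rotation: float, T_lidar_from_cam: np.ndarray
) -> Tuple[np.ndarray, float]:
    """Return (geometric centre, FLU heading) of a KITTI box in the lidar/ego frame."""
    loc = np.asarray(location_cam, dtype=np.float64)
    bottom = T_lidar_from_cam[:3, :3] @ loc + T_lidar_from_cam[:3, 3]
    # KITTI location is the bottom-face centre; lift by H/2 (assumes lidar z points up).
    center = bottom + np.array([0.0, 0.0, h / 2.0])
    # KITTI rotation is zero along camera x (lidar -y), hence the -pi/2 offset.
    heading = wrap_to_pi(-(rotation + np.pi / 2.0))
    return center, heading


def planar_speeds(poses: Sequence[np.ndarray], dt: float = 0.1) -> np.ndarray:
    """Ego speed (m/s) between consecutive T_map_from_lidar poses; dt = 0.1 s at 10 Hz."""
    xy = np.array([T[:2, 3] for T in poses])
    return np.linalg.norm(np.diff(xy, axis=0), axis=1) / dt
```

At rotation = 0 the heading comes out as -π/2 (a box lying along lidar -y), and at rotation = -π/2 it is 0 (along lidar +x), matching the camera-x zero reference. Per the PR, poses built with the inverse convention fail exactly this kind of speed check (about 50× too fast).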

Not ingested / limitations

The 3+1D radar (radar / radar_3frames / radar_5frames) has no StandardE2E modality yet.
Per-point reflectance dropped (lidar is xyz-only).
Per-frame timestamps synthesised at the 10 Hz LiDAR-lead rate (the detection release ships none).
The test split is sensor-only (no labels → no detections).
A handful of scene-start frames have unconverged map-localization (xy jumps while camera height stays sane) — a faithful source artifact, not a conversion issue.
~5.3 MB/frame at native resolution (full train ≈ 26 GB); the size can be bounded via the config's cameras_identity_adapter: {max_size} / lidar_adapter: {max_points}.
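Dropping reflectance amounts to reading the KITTI-style float32 scan and keeping the first three columns. This sketch assumes the usual four-column (x, y, z, reflectance) .bin layout, and `load_velodyne_xyz` is a hypothetical name. The random subsample is a stand-in for whatever the config's `lidar_adapter: {max_points}` actually does, which is not shown here.

```python
from typing import Optional

import numpy as np


def load_velodyne_xyz(path: str, max_points: Optional[int] = None, seed: int = 0) -> np.ndarray:
    """Read a KITTI-style float32 (x, y, z, reflectance) .bin scan and keep xyz only."""
    scan = np.fromfile(path, dtype=np.float32).reshape(-1, 4)
    xyz = scan[:, :3]  # reflectance (column 3) is dropped
    if max_points is not None and len(xyz) > max_points:
        # Hypothetical cap: uniform random subsample without replacement,
        # re-sorted so the surviving points keep their original order.
        rng = np.random.default_rng(seed)
        keep = rng.choice(len(xyz), size=max_points, replace=False)
        xyz = xyz[np.sort(keep)]
    return xyz
```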

Tests

Comprehensive unit tests (calibration / ego-pose / box-geometry incl. the −π/2 yaw offset and H/2 lift, velodyne decode, KITTI label + track-id parse, class taxonomy, scene/split table, root resolution and frame enumeration) plus VOD_ROOT-gated real-frame checks. wrap_to_pi lifted to utils.geometry with its own test. Full suite green (the two failing navsim tests are pre-existing real-data / permission environment failures, unrelated).

Commits

Add View-of-Delft (VoD) dataset support
Refactor VoD: lift generic geometry helpers to utils, slim FrameRef
Fix VoD box heading: KITTI rotation needs the −π/2 reference offset

View-of-Delft (TU Delft, IEEE RA-L 2022) is a compact urban dataset with a 3+1D radar, a 64-layer Velodyne LiDAR, a front camera and KITTI-format 3D boxes over 24 recording scenes. The processor reads the extracted lidar/ tree directly, with no devkit dependency. Ingested into StandardFrameData: the front camera (CameraDirection.FRONT; K from the calib P2, extrinsics inv(Tr_velo_to_cam)); the Velodyne xyz as lidar_pc in the ego (velodyne, FLU) frame; KITTI 3D boxes as detections_3d in the ego frame, each class folded into the coarse DetectionType taxonomy (DontCare dropped); and the ego past/future trajectory from the per-frame pose via FuturePastStatesFromMatricesAggregator. Geometry: box yaw is VoD's rotation about LiDAR -Z, negated to the FLU heading; the KITTI bottom-face location is raised by H/2 to the geometric center; the ego pose is T_map_from_lidar = mapToCamera @ Tr_velo_to_cam (VoD's mapToCamera is the camera pose in the map). Frames are grouped into the official scenes (vendored split table) so per-segment trajectories never span recordings. Not ingested: the 3+1D radar has no StandardE2E modality yet; per-point reflectance is dropped (lidar is xyz-only). Per-frame timestamps are synthesised at the 10 Hz LiDAR-lead rate (the release ships none); the test split is unlabelled. Adds the vod processor/converter/io/geometry/splits package, configs/vod.yaml, scripts/extract_vod.sh and scripts/prepare_dataset_vod.sh, registry wiring, README and docs/datasets.rst entries, and 41 tests (unit + VOD_ROOT-gated real-frame checks).

wrap_to_pi moves to standard_e2e.utils.geometry (re-exported, covered in tests/test_geometry.py) instead of living privately in the VoD package, matching the utils.geometry consolidation pattern. parse_calibration drops the VoD-local _row_major_3x4_to_4x4 and reuses the shared se3() to assemble Tr_velo_to_cam from the calib's 3x4 [R|t]. FrameRef drops its split field (only ever set, never read; the processor uses its own split), trimming the per-frame locator to root/scene/subdir/frame_id. No behavior change; full test suite green (the two failing navsim tests are pre-existing real-data/permission env failures, unrelated).

VoD's KITTI rotation keeps the KITTI zero-reference (a box's length runs along the camera x-axis at rotation==0, i.e. lidar lateral -y), so the ego (velodyne FLU) heading is -(rotation + pi/2), not -rotation. The previous -rotation left every box yaw 90 degrees off (perpendicular to travel). Confirmed against the devkit's box-corner builder, which rotates by -(rotation + pi/2), and empirically against the lidar point-cloud PCA major axis over 1101 elongated objects (median axis misalignment 78deg -> 12deg). The earlier lidar-containment verification missed this: VoD's compact footprints (pedestrians, dense bike clusters) stay inside a 90-degree-rotated box of the same L x W area, so containment is insensitive to a 90deg yaw error. Updated the unit tests, docstrings, README and docs/datasets.rst accordingly. No other behavior change.
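The PCA check quoted in this commit can be sketched as an axis-to-heading angle that ignores direction, since a principal axis has no sign. `axis_misalignment_deg` is a hypothetical name. The PR does not state how points were assigned to boxes or which objects counted as elongated, so only the per-object metric is shown.

```python
import numpy as np


def axis_misalignment_deg(points_xy: np.ndarray, heading: float) -> float:
    """Angle in [0, 90] degrees between a box heading and the points' PCA major axis."""
    centered = points_xy - points_xy.mean(axis=0)
    cov = centered.T @ centered / len(points_xy)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues ascending, so the last column is the major axis
    major = eigvecs[:, -1]
    axis_angle = np.arctan2(major[1], major[0])
    # An axis is sign-agnostic: fold the difference into [0, 180) and then [0, 90].
    diff = np.degrees(abs(axis_angle - heading)) % 180.0
    return float(min(diff, 180.0 - diff))
```

Taking `np.median` of this value over a set of elongated objects gives the kind of summary statistic quoted above (78° → 12°). A 90° yaw error drives it toward 90 regardless of whether the points also fit inside the rotated box.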

codecov · 2026-06-14T00:27:53Z

Codecov Report

❌ Patch coverage is 79.48718% with 56 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
.../caching/src_datasets/vod/vod_dataset_processor.py	57.14%	33 Missing ⚠️
standard_e2e/caching/src_datasets/vod/_vod_io.py	89.09%	12 Missing ⚠️
.../caching/src_datasets/vod/vod_dataset_converter.py	57.89%	8 Missing ⚠️
...dard_e2e/caching/src_datasets/vod/_vod_geometry.py	94.44%	2 Missing ⚠️
standard_e2e/caching/process_source_dataset.py	0.00%	1 Missing ⚠️


stepankonev added 3 commits June 13, 2026 16:14

stepankonev merged commit 5315bf4 into main Jun 15, 2026
3 checks passed

stepankonev mentioned this pull request Jun 15, 2026

Bump version to 0.0.6 #25

Merged

stepankonev merged 3 commits into main from stepankonev/view-of-delft