Add View-of-Delft (VoD) dataset support #23
Merged
View-of-Delft (TU Delft, IEEE RA-L 2022) is a compact urban dataset with a 3+1D radar, a 64-layer Velodyne LiDAR, a front camera and KITTI-format 3D boxes over 24 recording scenes. The processor reads the extracted lidar/ tree directly, with no devkit dependency.

Ingested into StandardFrameData: the front camera (CameraDirection.FRONT; K from the calib P2, extrinsics inv(Tr_velo_to_cam)); the Velodyne xyz as lidar_pc in the ego (velodyne, FLU) frame; KITTI 3D boxes as detections_3d in the ego frame, each class folded into the coarse DetectionType taxonomy (DontCare dropped); and the ego past/future trajectory from the per-frame pose via FuturePastStatesFromMatricesAggregator.

Geometry: box yaw is VoD's rotation about LiDAR -Z, negated to the FLU heading; the KITTI bottom-face location is raised by H/2 to the geometric center; the ego pose is T_map_from_lidar = mapToCamera @ Tr_velo_to_cam (VoD's mapToCamera is the camera pose in the map). Frames are grouped into the official scenes (vendored split table) so per-segment trajectories never span recordings.

Not ingested: the 3+1D radar has no StandardE2E modality yet; per-point reflectance is dropped (lidar is xyz-only). Per-frame timestamps are synthesised at the 10 Hz LiDAR-lead rate (the release ships none); the test split is unlabelled.

Adds the vod processor/converter/io/geometry/splits package, configs/vod.yaml, scripts/extract_vod.sh and scripts/prepare_dataset_vod.sh, registry wiring, README and docs/datasets.rst entries, and 41 tests (unit + VOD_ROOT-gated real-frame checks).
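The ego-pose composition and the H/2 lift described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the function names are hypothetical, the example matrices are made up, and lifting along lidar +z after moving the point into the lidar frame is an assumption about how "raised by H/2" is applied.

```python
import numpy as np


def ego_pose_map_from_lidar(map_to_camera, tr_velo_to_cam):
    """Compose T_map_from_lidar = T_map_from_camera @ T_camera_from_lidar."""
    return np.asarray(map_to_camera, dtype=float) @ np.asarray(tr_velo_to_cam, dtype=float)


def box_center_in_lidar(location_cam, height, tr_velo_to_cam):
    """Move a KITTI bottom-face center (camera frame) into the lidar frame,
    then lift it by H/2 along lidar +z (assumed lift direction)."""
    t_lidar_from_cam = np.linalg.inv(np.asarray(tr_velo_to_cam, dtype=float))
    bottom = t_lidar_from_cam @ np.append(location_cam, 1.0)
    center = bottom[:3].copy()
    center[2] += height / 2.0
    return center


# Made-up calibration: camera x = -lidar y, camera y = -lidar z, camera z = lidar x.
tr_velo_to_cam = np.array([
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
# Made-up camera pose in the map: identity rotation, 10 m along map x.
map_to_camera = np.eye(4)
map_to_camera[0, 3] = 10.0

pose = ego_pose_map_from_lidar(map_to_camera, tr_velo_to_cam)
center = box_center_in_lidar([1.0, 2.0, 5.0], 1.6, tr_velo_to_cam)
```

With zero calibration translation the lidar origin lands on the camera position in the map, and a point 5 m ahead, 1 m right and 2 m below the camera ends up at lidar (5, -1, -2) before the lift.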
wrap_to_pi moves to standard_e2e.utils.geometry (re-exported, covered in tests/test_geometry.py) instead of living privately in the VoD package, matching the utils.geometry consolidation pattern. parse_calibration drops the VoD-local _row_major_3x4_to_4x4 and reuses the shared se3() to assemble Tr_velo_to_cam from the calib's 3x4 [R|t]. FrameRef drops its split field (only ever set, never read; the processor uses its own split), trimming the per-frame locator to root/scene/subdir/frame_id. No behavior change; full test suite green (the two failing navsim tests are pre-existing real-data/permission env failures, unrelated).
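For readers unfamiliar with the two helpers named above, here is a minimal sketch of what such functions typically look like. The real signatures in standard_e2e.utils are not shown in this PR text, so the versions below are stand-ins: the [-pi, pi) wrapping convention, the `se3(rotation, translation)` argument order and the `tr_velo_to_cam_from_row` name are all assumptions.

```python
import math

import numpy as np


def wrap_to_pi(angle):
    """Wrap an angle in radians into [-pi, pi) (assumed convention)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def se3(rotation, translation):
    """Assemble a 4x4 homogeneous transform from a 3x3 R and a 3-vector t."""
    transform = np.eye(4)
    transform[:3, :3] = rotation
    transform[:3, 3] = translation
    return transform


def tr_velo_to_cam_from_row(values):
    """Build Tr_velo_to_cam from the 12 row-major floats of a 3x4 [R|t]."""
    rt = np.asarray(values, dtype=float).reshape(3, 4)
    return se3(rt[:, :3], rt[:, 3])


# Made-up calibration row, for illustration only.
tr = tr_velo_to_cam_from_row(
    [0, -1, 0, 0.1,
     0, 0, -1, 0.2,
     1, 0, 0, 0.3]
)
```

Routing the 3x4 calib row through a shared `se3` keeps the bottom `[0, 0, 0, 1]` row in one place instead of in a dataset-local helper.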
VoD's KITTI rotation keeps the KITTI zero-reference (at rotation == 0 a box's length runs along the camera x-axis, i.e. lidar lateral -y), so the ego (velodyne FLU) heading is -(rotation + pi/2), not -rotation. The previous -rotation left every box yaw 90 degrees off (perpendicular to travel). Confirmed against the devkit's box-corner builder, which rotates by -(rotation + pi/2), and empirically against the lidar point-cloud PCA major axis over 1101 elongated objects (median axis misalignment 78 deg -> 12 deg). The earlier lidar-containment verification missed this: VoD's compact footprints (pedestrians, dense bike clusters) stay inside a box of the same L x W area rotated by 90 degrees, so containment is insensitive to a 90-degree yaw error. Updated the unit tests, docstrings, README and docs/datasets.rst accordingly. No other behavior change.
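The yaw convention and the axis-misalignment measure described above can be illustrated with a short sketch. The function names are hypothetical and the PCA step itself is omitted; the misalignment helper only shows how two headings can be compared as undirected axes, which is one plausible way to compute the figure quoted above.

```python
import math


def wrap_to_pi(angle):
    """Wrap an angle in radians into [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def flu_heading_from_kitti_rotation(rotation):
    """Ego (velodyne FLU) heading from VoD's KITTI rotation: -(rotation + pi/2)."""
    return wrap_to_pi(-(rotation + math.pi / 2.0))


def axis_misalignment_deg(heading_a, heading_b):
    """Smallest angle between two undirected axes, in [0, 90] degrees."""
    diff = abs(wrap_to_pi(heading_a - heading_b))
    return math.degrees(min(diff, math.pi - diff))


# At rotation == 0 the box length should run along lidar -y.
heading = flu_heading_from_kitti_rotation(0.0)
direction = (math.cos(heading), math.sin(heading))

# The old convention (-rotation) is a quarter turn away from the new one.
old_vs_new = axis_misalignment_deg(-0.0, heading)
```

Comparing axes modulo pi matters because a PCA major axis has no front or back, so a box pointing the opposite way should count as aligned.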
Summary
Adds View-of-Delft (VoD) dataset support: a from-scratch processor that reads the extracted KITTI-style `lidar/` tree directly, with no devkit dependency (the `vod-tudelft` devkit pins `numpy=1.19` / `python=3.7`, incompatible with this project's numpy 2.x / py3.12; same rationale as nuScenes reading its tables directly).

Ingested into `StandardFrameData`:

- Front camera (`CameraDirection.FRONT`; intrinsics from the calib `P2`, extrinsics `inv(Tr_velo_to_cam)`)
- Velodyne xyz as `lidar_pc` in the ego (velodyne, FLU) frame
- KITTI 3D boxes as `detections_3d` in the ego frame, each class folded into the coarse `DetectionType` taxonomy (the two-wheeler family → `BICYCLE`; `Car`/`truck`/`vehicle_other` → `VEHICLE`; static/ambiguous → `UNKNOWN`; `DontCare` dropped)
- Ego past/future trajectory: `T_map_from_lidar` from the per-frame pose, via `FuturePastStatesFromMatricesAggregator`

Frames are grouped into the official scenes (vendored split table from the devkit docs) so per-segment trajectories never span two recordings.
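The scene grouping can be pictured with a small sketch. Everything here is illustrative: the table layout, the scene names and the frame ranges are invented (they are not the real VoD split table), and frame ids are treated as plain integers for simplicity.

```python
from collections import defaultdict

# Hypothetical split table: (scene_name, first_frame, last_frame), inclusive.
# These ranges are made up for illustration; they are not the real VoD scenes.
SCENE_TABLE = [
    ("scene_a", 0, 4),
    ("scene_b", 5, 7),
]


def scene_of(frame_id):
    """Return the scene whose inclusive frame range contains frame_id, else None."""
    for name, first, last in SCENE_TABLE:
        if first <= frame_id <= last:
            return name
    return None


def group_frames_by_scene(frame_ids):
    """Bucket frame ids per scene, in ascending order within each scene.

    Frames outside every range are skipped, so a trajectory built from one
    bucket can never cross a recording boundary.
    """
    groups = defaultdict(list)
    for frame_id in sorted(frame_ids):
        name = scene_of(frame_id)
        if name is not None:
            groups[name].append(frame_id)
    return dict(groups)


groups = group_frames_by_scene([6, 0, 1, 5, 3, 9])
```

Building past/future trajectories per bucket, rather than over the full frame list, is what keeps frames 3 and 5 in this example from being treated as consecutive poses of the same drive.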
Why not the devkit's `FrameTransformMatrix`?

The VoD devkit ships convenient frame transforms (`from vod.frame import FrameTransformMatrix; transforms = FrameTransformMatrix(frame_data)`, exposing `t_camera_lidar`, `t_lidar_camera`, `t_map_camera`, …), so the natural question is why we reimplement them in `_vod_geometry.py` rather than importing them. Two reasons:

1. The devkit's `environment.yml` pins `python=3.7`, `numpy=1.19` (plus `open3d=0.13`, `k3d`, `transforms3d`, …), which is incompatible with this project's numpy 2.x / py3.12, so it can't be a runtime dependency. Same rationale as nuScenes reading its JSON tables directly (its devkit pins `numpy<2`).
2. `FrameTransformMatrix(frame_data)` is constructed from a devkit `FrameDataLoader` + `kitti_locations`, so adopting it means taking on the devkit's whole IO layer, not just the matrix algebra, which is only `@` and `np.linalg.inv`, done here via `standard_e2e.utils` (`se3`, `transform_points`, `wrap_to_pi`).

We reproduce exactly the transforms we need: devkit `t_camera_lidar` ≡ our `parse_calibration` (`K` from `P2`, `Tr_velo_to_cam`); devkit `t_lidar_camera = inv(t_camera_lidar)`; devkit `t_map_camera @ t_camera_lidar` ≡ our `ego_pose_map_from_lidar` (`mapToCamera @ Tr_velo_to_cam`). The one non-trivial convention, the box yaw `-(rotation + π/2)`, was validated against the devkit's own corner builder (see Geometry below).

Geometry (verified against the devkit and real data)
- Ego pose: VoD's `mapToCamera` is the camera pose in the map (`T_map_from_camera`), so `T_map_from_lidar = mapToCamera @ Tr_velo_to_cam` (the inverse yields ~50× too-fast, non-physical ego motion; caught by checking speed on consecutive frames).
- Box center: the KITTI `location` is the bottom-face center, raised by H/2 to the geometric center (verified by lidar containment: z-ratio +1.04 → +0.04).
- Box yaw: the ego (velodyne FLU) heading is `-(rotation + π/2)`, not `-rotation` (verified against the devkit's `get_transformed_3d_label_corners` and the lidar PCA major axis: median misalignment 78° → 12° over 1101 elongated objects).

Not ingested / limitations
- The 3+1D radar (`radar`/`radar_3frames`/`radar_5frames`) has no StandardE2E modality yet.
- `cameras_identity_adapter: {max_size}` / `lidar_adapter: {max_points}`.

Tests
Comprehensive unit tests (calibration / ego-pose / box-geometry incl. the −π/2 yaw offset and H/2 lift, velodyne decode, KITTI label + track-id parse, class taxonomy, scene/split table, root resolution and frame enumeration) plus `VOD_ROOT`-gated real-frame checks. `wrap_to_pi` lifted to `utils.geometry` with its own test. Full suite green (the two failing navsim tests are pre-existing real-data / permission environment failures, unrelated).

Commits