Add nuScenes (v1.0) dataset support #24
Merged
nuScenes (Motional, CVPR 2020) -- 1000 scenes, a 6-camera surround rig, LIDAR_TOP, densely annotated 3D boxes at 2 Hz keyframes, and the vector map-expansion HD map. Read directly from the JSON tables (no devkit dependency; it pins numpy<2 against this project's numpy 2.x).

Ingested into StandardFrameData:
- the 6 CAM_* channels under their canonical CameraDirection (intrinsics=camera_intrinsic, extrinsics=T_ego_from_camera);
- LIDAR_TOP xyz in the ego frame;
- sample_annotation boxes transformed global->ego with category_name folded into the coarse DetectionType;
- the map-expansion vector layers (lane centers via the arcline paths, dividers, crossings, walkways, stop lines, drivable area, intersections) translated to the unified MapElementType in the ego frame and rasterised by HDMapBEVAdapter (when the map-expansion pack is unzipped into <dataroot>/maps/);
- the ego trajectory from T_global_from_ego via FuturePastStatesFromMatricesAggregator.

Architecture: NuscTables loads the tables once in the parent and resolves each keyframe into a compact, picklable NuscFrame, so workers ship no table state. Only keyframe sample_data are kept (~410k of 2.6M on trainval), cutting parent RAM + load time. Reuses standard_e2e.utils (se3, quat_wxyz_to_rotmat, transform_points, matrix_to_xyz_heading) + numpy/scipy; the split scene-lists and the lane-arcline discretization are vendored from the devkit (Apache-2.0).

Robust to a partial download: the converter skips scenes whose sensor blob is not on disk yet (verified -- a partial trainval converted 62/700 scenes cleanly). The test split ships no annotations; the 5 radars have no StandardE2E target yet.

Verified on v1.0-mini: box yaw aligns with the lidar (median 14 deg vs PCA major axis), the HD map places the ego inside DRIVABLE_AREA every frame, ~32 frames/s on 8 workers.
Adds configs/nuscenes.yaml, extract/prepare scripts, registry wiring, README + docs/datasets.rst, and 34 tests (unit + synthetic-tree + NUSCENES_DATAROOT-gated real-frame checks).
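
For readers unfamiliar with the global->ego step mentioned in this commit, here is a minimal sketch of how a box centre and heading could be re-expressed in the ego frame. It is an illustration under assumptions, not the code in this PR: the helpers below reuse the names `se3` and `quat_wxyz_to_rotmat` from the commit message, but they are local stand-ins whose signatures are guesses, and `box_global_to_ego` is a hypothetical function.

```python
import numpy as np


def quat_wxyz_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix (local stand-in)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def se3(rotation, translation):
    """Build a 4x4 homogeneous transform from a rotation matrix and a translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def box_global_to_ego(center_global, yaw_global, T_global_from_ego):
    """Express a box centre and its heading (about +z) in the ego frame."""
    T_ego_from_global = np.linalg.inv(T_global_from_ego)
    center_ego = (T_ego_from_global @ np.append(center_global, 1.0))[:3]
    # Rotate the box's heading vector, then read the yaw back off it.
    heading_global = np.array([np.cos(yaw_global), np.sin(yaw_global), 0.0])
    heading_ego = T_ego_from_global[:3, :3] @ heading_global
    yaw_ego = np.arctan2(heading_ego[1], heading_ego[0])
    return center_ego, yaw_ego
```

Reading the yaw back from a rotated heading vector assumes the ego pose is close to planar; a box with significant roll or pitch relative to the ego would need the full rotation instead.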
Enable the existing Detections3DBEVAdapter in the nuScenes config on the same BEV grid as hdmap_bev_adapter, so the detection and HD-map rasters co-register pixel-for-pixel, and add detections_3d_bev to the saved features. The vector detections_3d is kept as well, for projecting the 3D boxes into the camera images.
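
To illustrate why sharing one BEV grid makes the two rasters co-register, the sketch below routes both rasterizations through the same ego-to-pixel mapping. Everything here is hypothetical: `BevGrid`, `rasterize_points`, the grid extent, and the axis convention are assumptions for illustration and are not the actual `Detections3DBEVAdapter` or `HDMapBEVAdapter` interfaces.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class BevGrid:
    """Hypothetical BEV grid spec: ego-frame extent in metres plus resolution."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    meters_per_pixel: float

    @property
    def shape(self):
        rows = int(round((self.x_max - self.x_min) / self.meters_per_pixel))
        cols = int(round((self.y_max - self.y_min) / self.meters_per_pixel))
        return rows, cols

    def to_pixel(self, xy):
        """Map ego-frame (x, y) points to integer (row, col) indices."""
        xy = np.asarray(xy, dtype=float)
        # Assumed convention: +x (forward) points up, +y (left) points to column 0.
        rows = np.floor((self.x_max - xy[..., 0]) / self.meters_per_pixel)
        cols = np.floor((self.y_max - xy[..., 1]) / self.meters_per_pixel)
        return np.stack([rows, cols], axis=-1).astype(int)


def rasterize_points(grid, xy):
    """Mark the cells hit by ego-frame points; points outside the grid are dropped."""
    raster = np.zeros(grid.shape, dtype=np.uint8)
    rc = grid.to_pixel(xy).reshape(-1, 2)
    n_rows, n_cols = grid.shape
    inside = (rc[:, 0] >= 0) & (rc[:, 0] < n_rows) & (rc[:, 1] >= 0) & (rc[:, 1] < n_cols)
    raster[rc[inside, 0], rc[inside, 1]] = 1
    return raster
```

Because both rasters are built from the same `BevGrid` instance, a given ego-frame point lands on the same `(row, col)` in each, which is the pixel-for-pixel property the commit relies on.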
d62f121 to 7d31c4f
Summary
Adds support for the nuScenes dataset (Motional, CVPR 2020), the de-facto surround-view E2E / BEV benchmark: 1000 ~20 s scenes, a 6-camera surround rig (1600×900), a 32-beam `LIDAR_TOP`, and densely annotated 3D boxes at 2 Hz keyframes. One keyframe `sample` → one `StandardFrameData`; one scene → one segment.

Modalities emitted
- `CAM_*` surround views mapped onto the canonical `CameraDirection` members; downscaled via `cameras_identity_adapter` `max_size`.
- `LIDAR_TOP` cloud (xyz, ego frame).
- `sample_annotation` boxes transformed global→ego; `category_name` folded into the coarse `DetectionType`.
- `Detections3DBEVAdapter` on the same grid as the HD-map raster, so the two co-register pixel-for-pixel.
- Map-expansion vector layers translated to the unified `MapElementType` in the ego frame → rasterized by `HDMapBEVAdapter`.

Design notes
- `nuscenes-devkit` is not a runtime dependency (it pins `numpy<2`, which conflicts with the project's numpy 2.x). The split scene-lists and the lane-arcline discretization are vendored from the devkit (Apache-2.0).
- `--split` is an official nuScenes label that also selects the metadata version: `mini_train`/`mini_val` → v1.0-mini, `train`/`val` → v1.0-trainval, `test` → v1.0-test. The test split ships no annotations.
- `LIDAR_TOP` keyframe `sample_data` rows are loaded (~410 k vs 2.6 M), giving ~32 frames/s at 8 workers.
- The HD map is rasterized when the `nuScenes-map-expansion-v1.3` pack is unzipped into `<dataroot>/maps/`; otherwise the map is skipped.

Limitations
- `.tgz` archives must be extracted first (`scripts/extract_nuscenes.sh`, or `scripts/prepare_dataset_nuscenes.sh` to extract + preprocess in one step).

Tests
- `tests/dataset_processors/test_nuscenes_dataset_processor.py`: box geometry, global→ego transform, `category_name` → `DetectionType` folding, camera-channel mapping, partial-download detection, etc. Real-frame checks are gated on `NUSCENES_DATAROOT` (default `v1.0-mini`), so CI without the data still runs the unit-level assertions.

Validation
mini_train converted end-to-end; on real frames I verified box dimensions (ped 0.85 m, vehicle 4.71 m, bike 2.06 m), map alignment (ego ∈ drivable area), and left/right BEV orientation, all of which were correct. Per-scene videos (6-camera grid with projected boxes + BEV from the npz rasters) were rendered from the processed `.npz` to sanity-check every modality.
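
As an illustration of the "ego ∈ drivable area" check, the following sketch tests whether the ego origin falls inside any drivable-area polygon once the map is expressed in the ego frame. It is an assumption about how such a check could be written, not the validation code used for this PR; `point_in_polygon` and `ego_in_drivable_area` are hypothetical names, and polygons are assumed to be simple (no holes) `(N, 2)` vertex arrays.

```python
import numpy as np


def point_in_polygon(point, polygon):
    """Even-odd ray-casting test for a point against a simple 2D polygon."""
    px, py = point
    poly = np.asarray(polygon, dtype=float)
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, np.roll(poly, -1, axis=0)):
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings on the +x side of the point.
            if x_cross > px:
                inside = not inside
    return inside


def ego_in_drivable_area(drivable_polygons_ego):
    """True if the ego origin (0, 0) lies inside any ego-frame drivable polygon."""
    return any(point_in_polygon((0.0, 0.0), poly) for poly in drivable_polygons_ego)
```

Running this per frame and asserting the result is a cheap way to catch a flipped axis or a wrong transform direction, since either error usually moves the ego off the road.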