Add nuScenes (v1.0) dataset support #24

Merged
stepankonev merged 2 commits into main from stepankonev/nuscenes-impl on Jun 15, 2026
@stepankonev

Owner

Summary

Adds support for the nuScenes dataset (Motional, CVPR 2020) — the de-facto surround-view E2E / BEV benchmark: 1000 ~20 s scenes, a 6-camera surround rig (1600×900), a 32-beam LIDAR_TOP, and densely annotated 3D boxes at 2 Hz keyframes. One keyframe sample → one StandardFrameData; one scene → one segment.

Modalities emitted

| Modality | Source |
| --- | --- |
| Cameras | 6 CAM_* surround views mapped onto the canonical CameraDirection members; downscaled via cameras_identity_adapter max_size. |
| LiDAR | LIDAR_TOP cloud (xyz, ego frame). |
| 3D detections (vector) | sample_annotation boxes transformed global→ego; category_name folded into the coarse DetectionType. |
| 3D detections (BEV raster) | The same boxes rasterized by Detections3DBEVAdapter on the same grid as the HD-map raster, so the two co-register pixel-for-pixel. |
| HD map (BEV) | Vector map-expansion (arcline lane centers, lane/road dividers, crossings, walkways, stop lines, drivable area, intersections) → unified MapElementType in the ego frame → rasterized by HDMapBEVAdapter. |
| Past/future trajectory | Ego poses, via the segment-context aggregator. |

Design notes

  • Read directly from the JSON tables: nuscenes-devkit is not a runtime dependency (it pins numpy<2, which conflicts with the project's numpy 2.x). The split scene-lists and the lane-arcline discretization are vendored from the devkit (Apache-2.0).
  • --split is an official nuScenes label that also selects the metadata version: mini_train/mini_val → v1.0-mini, train/val → v1.0-trainval, test → v1.0-test. The test split ships no annotations.
  • Conversion-rate optimization: only keyframe sample_data rows are loaded (~410 k of 2.6 M on trainval), giving ~32 frames/s at 8 workers.
  • Partial trainval converts cleanly — scenes whose sensor blob is not yet on disk are detected and skipped, so conversion works while a download is still in progress.
  • HD map is optional — rasterized only when the separate nuScenes-map-expansion-v1.3 pack is unzipped into <dataroot>/maps/; otherwise the map is skipped.
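
The split-to-version rule in the design notes amounts to a small lookup table. The following is a minimal sketch of that rule only; `SPLIT_TO_VERSION`, `resolve_version`, and `has_annotations` are hypothetical names chosen for illustration and are not taken from the converter.

```python
# Hypothetical helper names; only the mapping itself comes from the PR text.
SPLIT_TO_VERSION = {
    "mini_train": "v1.0-mini",
    "mini_val": "v1.0-mini",
    "train": "v1.0-trainval",
    "val": "v1.0-trainval",
    "test": "v1.0-test",
}


def resolve_version(split: str) -> str:
    """Return the nuScenes metadata version that a --split label selects."""
    try:
        return SPLIT_TO_VERSION[split]
    except KeyError:
        known = ", ".join(sorted(SPLIT_TO_VERSION))
        raise ValueError(f"unknown split {split!r}; expected one of: {known}") from None


def has_annotations(split: str) -> bool:
    """The test split ships no annotations, so no boxes can be emitted for it."""
    resolve_version(split)  # reject unknown labels early
    return split != "test"
```

Keeping the mapping in one table means an unknown label fails immediately with the list of valid splits, rather than later when a metadata directory is not found.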

Limitations

  • The 5 radars have no StandardE2E target yet and are not ingested.
  • Ships as .tgz archives and must be extracted first (scripts/extract_nuscenes.sh, or scripts/prepare_dataset_nuscenes.sh to extract + preprocess in one step).

Tests

  • tests/dataset_processors/test_nuscenes_dataset_processor.py: box geometry, global→ego transform, category_name → DetectionType folding, camera-channel mapping, partial-download detection, etc. Real-frame checks are gated on NUSCENES_DATAROOT (default v1.0-mini), so CI without the data still runs the unit-level assertions.
  • Box orientation verified against LiDAR PCA (not point-containment, which is blind to 90° yaw errors).
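
To illustrate the orientation check named above, here is a sketch of comparing a box yaw against the PCA major axis of the points that fall on the object. It is not the test code from this PR; the function names are hypothetical, and it assumes the points are already cropped to one object and expressed in the same frame as the yaw.

```python
import numpy as np


def pca_major_axis_angle(points_xy: np.ndarray) -> float:
    """Angle (rad) of the dominant direction of an (N, 2) point set."""
    centered = points_xy - points_xy.mean(axis=0)
    cov = centered.T @ centered / max(len(points_xy) - 1, 1)
    # eigh returns eigenvalues in ascending order, so the last column is the major axis.
    _, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, -1]
    return float(np.arctan2(major[1], major[0]))


def axis_angle_diff_deg(a: float, b: float) -> float:
    """Smallest angle between two undirected axes, in degrees (range 0..90)."""
    d = (a - b) % np.pi  # an axis has no front or back, so compare modulo 180 deg
    return float(np.degrees(min(d, np.pi - d)))
```

A yaw that is off by 90° yields a difference near 90° here, which is the failure mode the PR says a point-containment test misses. The comparison is modulo 180°, so it cannot detect a box that points backwards.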

Validation

mini_train converted end-to-end; on real frames I verified box dimensions (ped 0.85 m, vehicle 4.71 m, bike 2.06 m), map alignment (ego ∈ drivable area), and left/right BEV orientation — all correct. Per-scene videos (6-camera grid with projected boxes + BEV from the npz rasters) were rendered from the processed .npz to sanity-check every modality.

nuScenes (Motional, CVPR 2020) -- 1000 scenes, a 6-camera surround rig, LIDAR_TOP, densely annotated 3D boxes at 2 Hz keyframes, and the vector map-expansion HD map. Read directly from the JSON tables (no devkit dependency; it pins numpy<2 against this project's numpy 2.x).

Ingested into StandardFrameData: the 6 CAM_* channels under their canonical CameraDirection (intrinsics=camera_intrinsic, extrinsics=T_ego_from_camera); LIDAR_TOP xyz in the ego frame; sample_annotation boxes transformed global->ego with category_name folded into the coarse DetectionType; the map-expansion vector layers (lane centers via the arcline paths, dividers, crossings, walkways, stop lines, drivable area, intersections) translated to the unified MapElementType in the ego frame and rasterised by HDMapBEVAdapter (when the map-expansion pack is unzipped into <dataroot>/maps/); and the ego trajectory from T_global_from_ego via FuturePastStatesFromMatricesAggregator.
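
The global->ego box transform described above can be sketched with plain numpy. The helpers below are local stand-ins written for this example; they share names with `se3` and `quat_wxyz_to_rotmat` from standard_e2e.utils, but their signatures are an assumption and may differ from the project's. Quaternions are taken in w, x, y, z order, as in the nuScenes tables.

```python
import numpy as np


def quat_wxyz_to_rotmat(q) -> np.ndarray:
    """Rotation matrix from a (w, x, y, z) quaternion (local stand-in)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def se3(rotation: np.ndarray, translation) -> np.ndarray:
    """4x4 homogeneous transform from a rotation matrix and a translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = np.asarray(translation, dtype=float)
    return T


def invert_se3(T: np.ndarray) -> np.ndarray:
    """Closed-form inverse of a rigid transform."""
    R, t = T[:3, :3], T[:3, 3]
    return se3(R.T, -R.T @ t)


def box_global_to_ego(center, box_quat, ego_translation, ego_quat):
    """Return (center_in_ego, yaw_in_ego) for one annotated box."""
    T_global_from_ego = se3(quat_wxyz_to_rotmat(ego_quat), ego_translation)
    T_ego_from_global = invert_se3(T_global_from_ego)
    R = T_ego_from_global[:3, :3]
    center_ego = R @ np.asarray(center, dtype=float) + T_ego_from_global[:3, 3]
    R_box_ego = R @ quat_wxyz_to_rotmat(box_quat)
    # Heading of the box x-axis projected onto the ego ground plane.
    yaw = float(np.arctan2(R_box_ego[1, 0], R_box_ego[0, 0]))
    return center_ego, yaw
```

For example, with the ego at (10, 5, 0) facing global +y, a box at (10, 7, 0) with the same heading comes out 2 m straight ahead of the ego with a yaw of zero.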

Architecture: NuscTables loads the tables once in the parent and resolves each keyframe into a compact, picklable NuscFrame, so workers ship no table state. Only keyframe sample_data are kept (~410k of 2.6M on trainval), cutting parent RAM + load time. Reuses standard_e2e.utils (se3, quat_wxyz_to_rotmat, transform_points, matrix_to_xyz_heading) + numpy/scipy; the split scene-lists and the lane-arcline discretization are vendored from the devkit (Apache-2.0).
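
The parent-resolves, worker-consumes split can be illustrated with a small sketch. `Frame` is a hypothetical stand-in for NuscFrame, whose real fields are not shown in this PR. The rows are also simplified: they carry a `channel` key directly, whereas in the real tables the channel is reached through calibrated_sensor and sensor.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Frame:
    """Hypothetical compact per-keyframe record; holds plain values only."""
    scene_name: str
    sample_token: str
    timestamp_us: int
    lidar_path: str
    camera_paths: dict = field(default_factory=dict)


def keep_keyframes(sample_data_rows):
    """Drop sweep rows up front so the parent never indexes them."""
    return [row for row in sample_data_rows if row["is_key_frame"]]


def resolve_frames(keyframe_rows, scene_of_sample):
    """Group keyframe rows by sample and emit one Frame per sample."""
    by_sample = {}
    for row in keyframe_rows:
        by_sample.setdefault(row["sample_token"], []).append(row)

    frames = []
    for sample_token, rows in by_sample.items():
        lidar = next((r for r in rows if r["channel"] == "LIDAR_TOP"), None)
        if lidar is None:
            continue  # a frame without a lidar keyframe is not emitted in this sketch
        cameras = {r["channel"]: r["filename"] for r in rows if r["channel"].startswith("CAM_")}
        frames.append(Frame(
            scene_name=scene_of_sample[sample_token],
            sample_token=sample_token,
            timestamp_us=lidar["timestamp"],
            lidar_path=lidar["filename"],
            camera_paths=cameras,
        ))
    return frames
```

Because each record holds only strings, integers, and a dict of strings, it pickles cheaply when handed to a worker process, and no worker needs a copy of the tables.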

Robust to a partial download: the converter skips scenes whose sensor blob is not on disk yet (verified -- a partial trainval converted 62/700 scenes cleanly). The test split ships no annotations; the 5 radars have no StandardE2E target yet.
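
One way to detect scenes whose sensor blob has not arrived yet is to check that every file a scene references exists under the dataroot. This is a sketch of that idea, not the converter's actual check; `partition_scenes` and its arguments are hypothetical.

```python
from pathlib import Path


def scene_is_complete(dataroot: Path, filenames) -> bool:
    """True when every referenced sensor file is present on disk."""
    return all((dataroot / name).is_file() for name in filenames)


def partition_scenes(dataroot: Path, files_by_scene):
    """Split scene names into (ready, skipped), preserving input order.

    files_by_scene maps a scene name to the relative paths its keyframes need.
    """
    ready, skipped = [], []
    for scene, filenames in files_by_scene.items():
        (ready if scene_is_complete(dataroot, filenames) else skipped).append(scene)
    return ready, skipped
```

Returning the skipped list alongside the ready list makes it easy to log which scenes to retry once the download finishes.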

Verified on v1.0-mini: box yaw aligns with the lidar (median 14 deg vs PCA major axis), the HD map places the ego inside DRIVABLE_AREA every frame, ~32 frames/s on 8 workers. Adds configs/nuscenes.yaml, extract/prepare scripts, registry wiring, README + docs/datasets.rst, and 34 tests (unit + synthetic-tree + NUSCENES_DATAROOT-gated real-frame checks).
Enable the existing Detections3DBEVAdapter in the nuScenes config on the
same BEV grid as hdmap_bev_adapter, so the detection and HD-map rasters
co-register pixel-for-pixel, and add detections_3d_bev to the saved
features. The vector detections_3d is kept as well, for projecting the
3D boxes into the camera images.
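
Pixel-for-pixel co-registration follows when both rasters share one grid definition and one metres-to-pixels function. The sketch below shows that principle with points only. `BevGrid` is hypothetical, and the axis convention (ego x forward maps to image up, ego y left maps to image left) is an assumption; the actual adapters' parameters and conventions are not shown in this PR.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class BevGrid:
    """Hypothetical shared grid spec: ranges in metres, resolution in m/pixel."""
    x_range: tuple
    y_range: tuple
    resolution: float

    @property
    def shape(self):
        height = int(round((self.x_range[1] - self.x_range[0]) / self.resolution))
        width = int(round((self.y_range[1] - self.y_range[0]) / self.resolution))
        return height, width

    def to_pixels(self, xy) -> np.ndarray:
        """Map (N, 2) ego-frame points to integer (row, col) indices."""
        xy = np.asarray(xy, dtype=float)
        rows = np.floor((self.x_range[1] - xy[:, 0]) / self.resolution).astype(int)
        cols = np.floor((self.y_range[1] - xy[:, 1]) / self.resolution).astype(int)
        return np.stack([rows, cols], axis=1)


def rasterize_points(grid: BevGrid, xy) -> np.ndarray:
    """Mark the cells hit by the given points; out-of-range points are dropped."""
    raster = np.zeros(grid.shape, dtype=np.uint8)
    px = grid.to_pixels(xy)
    height, width = grid.shape
    inside = (px[:, 0] >= 0) & (px[:, 0] < height) & (px[:, 1] >= 0) & (px[:, 1] < width)
    raster[px[inside, 0], px[inside, 1]] = 1
    return raster
```

If a detection raster and a map raster are both produced through the same `BevGrid` instance, a point at the same ego coordinates lands on the same pixel in each, so the two can be stacked as channels without resampling.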
@stepankonev stepankonev force-pushed the stepankonev/nuscenes-impl branch from d62f121 to 7d31c4f Compare June 15, 2026 12:31
@stepankonev stepankonev merged commit 1e61103 into main Jun 15, 2026
3 checks passed
@stepankonev stepankonev mentioned this pull request Jun 15, 2026