[Backport to 20_0_X (#50691)] SoA Schema evolution #51315

Open
Electricks94 wants to merge 10 commits into cms-sw:CMSSW_20_0_X from Electricks94:SoASerialisation_200X

@Electricks94

Contributor

PR description:

Extension of the custom streamer of the SoA backend to account for the following cases:

  • Added columns, Eigen columns, and scalars are initialised to 0
  • Removed or rearranged columns are handled correctly by ROOT
  • Columns with a changed type (e.g. double to float) are handled by ROOT; this also holds for Eigen columns and scalars
  • Eigen columns with changed dimensions, for example Eigen::Matrix<float, 4, 2> -> Eigen::Matrix<float, 3, 2>, result in a meaningful error at reading
  • Columns with enum types are stored as integers in the SoA backend and only exposed as enums. Hence, the schema evolution behaves exactly as in the case of integers, and 8-byte enums are also supported, which is normally not the case with ROOT (see "Add support for enums with non-default size", root-project/root#17009)
  • Examples using ioread rules are provided to showcase how complex cases (e.g. using custom types for SoA columns) can be handled
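
The cases listed above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: it is not the custom streamer from this PR and it uses neither ROOT nor the SoA macros. The names `StoredColumn`, `ColumnSpec`, `StoredLayout` and `readColumn` are hypothetical and exist only for this illustration. Stored values are kept as `double` for simplicity, and columns are matched by name, so removed or rearranged columns need no special handling.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a column as it was written to file.
// rows x cols is the per-element shape (1x1 for plain columns and scalars).
struct StoredColumn {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> values;
};

// Hypothetical description of a column in the current in-memory layout.
struct ColumnSpec {
  std::string name;
  std::size_t rows;
  std::size_t cols;
};

// Columns are keyed by name, so their order on file does not matter.
using StoredLayout = std::map<std::string, StoredColumn>;

// Read one column of the current layout from an older on-file layout.
template <typename T>
std::vector<T> readColumn(const StoredLayout& onFile, const ColumnSpec& spec, std::size_t nElements) {
  auto it = onFile.find(spec.name);
  if (it == onFile.end()) {
    // Column was added after the file was written: initialise to 0.
    return std::vector<T>(nElements * spec.rows * spec.cols, T(0));
  }
  const StoredColumn& stored = it->second;
  if (stored.rows != spec.rows || stored.cols != spec.cols) {
    // Changed Eigen dimensions cannot be converted: report a clear error.
    throw std::runtime_error("column '" + spec.name + "': element shape changed from " +
                             std::to_string(stored.rows) + "x" + std::to_string(stored.cols) +
                             " on file to " + std::to_string(spec.rows) + "x" +
                             std::to_string(spec.cols) + " in memory");
  }
  // Changed value type (e.g. double to float): convert element by element.
  std::vector<T> out;
  out.reserve(stored.values.size());
  for (double v : stored.values) {
    out.push_back(static_cast<T>(v));
  }
  return out;
}
```

In this sketch, a column that exists only on file is never requested and is therefore ignored, a column that exists only in memory comes back zero-filled, and a request for a 3x2 matrix column that was stored as 4x2 throws instead of returning reinterpreted data.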

PR validation:

A set of SoA Layouts is provided which evolve from a base SoA Layout. The base SoA Layout is stored multiple times using a different type alias which is then changed in the code to simulate an evolving class. The created ROOT files are then used to validate the reading of the previously stored Collections.

This PR is an extension of #50487

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

This is a backport of #50691 to 20_0_X

@felicepantaleo fyi

@cmsbuild

cmsbuild commented Jun 25, 2026

Contributor

A new Pull Request was created by @Electricks94 for CMSSW_20_0_X.

It involves the following packages:

  • DataFormats/Portable (heterogeneous)
  • DataFormats/SoATemplate (heterogeneous)
  • HeterogeneousCore/TestModules (****)

The following packages do not have a category, yet:

HeterogeneousCore/TestModules
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbuild, @fwyzard, @makortel can you please review it and eventually sign? Thanks.
@makortel, @missirol, @mmusich, @rovere this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild

cmsbuild commented Jun 25, 2026

Contributor

cms-bot internal usage

@fwyzard

fwyzard commented Jun 25, 2026

Contributor

type ngt

@fwyzard

fwyzard commented Jun 25, 2026

Contributor

backport #50691

@fwyzard

fwyzard commented Jun 25, 2026

Contributor

urgent

It would be good to include this in 20.0.0, as it makes the schema evolution of the SoA data formats more robust.
So if there are any SoAs included in the MC production, this would make the samples more forward-compatible.

@fwyzard

fwyzard commented Jun 25, 2026

Contributor

enable gpu

@fwyzard

fwyzard commented Jun 25, 2026

Contributor

please test

@fwyzard

fwyzard commented Jun 25, 2026

Contributor

+heterogeneous

@cmsbuild

Contributor

-1

Failed Tests: RelVals-AMD_MI300X
Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-13d381/54271/summary.html
COMMIT: f0c1f8e
CMSSW: CMSSW_20_0_X_2026-06-25-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/51315/54271/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed RelVals-AMD_MI300X

  • 34434.403: 34434.403_TTbar_14TeV+Run4D121_Patatrack_PixelOnlyAlpaka_Validation/step2_TTbar_14TeV+Run4D121_Patatrack_PixelOnlyAlpaka_Validation.log

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 44
  • DQMHistoTests: Total histograms compared: 3248390
  • DQMHistoTests: Total failures: 30
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3248342
  • DQMHistoTests: Total skipped: 18
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 43 files compared)
  • Checked 191 log files, 159 edm output root files, 44 DQM output files
  • TriggerResults: no differences found

AMD_W7900 Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 138 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 163207
  • DQMHistoTests: Total failures: 21196
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 142011
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 25 log files, 20 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

NVIDIA_H100 Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 95 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 163207
  • DQMHistoTests: Total failures: 17399
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 145808
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 25 log files, 20 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

NVIDIA_L40S Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 133 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 163207
  • DQMHistoTests: Total failures: 16408
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 146799
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 25 log files, 20 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

NVIDIA_T4 Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 127 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 163207
  • DQMHistoTests: Total failures: 11630
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 151577
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 25 log files, 20 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

Max Memory Comparisons exceeding threshold NVIDIA_H100

@cms-sw/core-l2 , I found 1 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 34434.7503_TTbar_14TeV+Run4D121_HLTHeterogeneousValid step2 max memory diff 273.9 exceeds +/- 30.0 MiB

@fwyzard

fwyzard commented Jun 26, 2026

Contributor

The MI300X failure is a timeout, let's try again.

@fwyzard

fwyzard commented Jun 26, 2026

Contributor

please test

@cmsbuild

Contributor

-1

Failed Tests: amd_w7900UnitTests
Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-13d381/54300/summary.html
COMMIT: f0c1f8e
CMSSW: CMSSW_20_0_X_2026-06-25-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/51315/54300/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 44
  • DQMHistoTests: Total histograms compared: 3248390
  • DQMHistoTests: Total failures: 32
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3248340
  • DQMHistoTests: Total skipped: 18
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 43 files compared)
  • Checked 191 log files, 159 edm output root files, 44 DQM output files
  • TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 147 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 163207
  • DQMHistoTests: Total failures: 13573
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 149634
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 25 log files, 20 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

NVIDIA_H100 Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 79 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 163207
  • DQMHistoTests: Total failures: 15540
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 147667
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 25 log files, 20 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

NVIDIA_L40S Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 168 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 163207
  • DQMHistoTests: Total failures: 17550
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 145657
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 25 log files, 20 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

NVIDIA_T4 Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 206 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 163207
  • DQMHistoTests: Total failures: 23034
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 140173
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 25 log files, 20 edm output root files, 7 DQM output files
  • TriggerResults: no differences found
