Adding rocAL+rocJPEG decode performance harness #474

Open
essamROCm wants to merge 10 commits into ROCm:develop from essamROCm:ea/rocjpeg-decode-perf-harness

@essamROCm essamROCm commented May 21, 2026

Contributor

Motivation

This PR adds a rocAL-focused performance harness for validating and measuring rocJPEG-backed image decode behavior, especially for multi-GPU sharded decode workloads.

The main goals are:

  • Improve rocJPEG usage inside rocAL by allowing the four-thread decode path to use up to four dedicated rocJPEG decoder instances, each handling a sub-batch, instead of funneling the full batch through a single shared decoder instance.
  • Add a repeatable way to compare rocAL + rocJPEG decode performance with the dedicated rocJPEG OpenMP split path enabled and disabled.
  • Add C++ and Python benchmark coverage for the same decode scenario so changes can be checked from both rocAL API surfaces.
  • Add reporting scripts that summarize per-GPU/per-shard decode timing, decoded image counts, and speedup/reduction metrics.
  • Add a standalone helper launcher for running jpegdecodeperf across GPU shards, making it easier to compare rocAL decode results against rocJPEG sample-level performance.
  • Update the existing dataloader_multithread test app so it can drive the new rocJPEG split-path benchmarking with configurable CPU thread count and effective batch sizing.

Technical Details

This PR adds the main rocAL decode enhancement being measured by the new harness: rocJPEG decode work can now be split across multiple dedicated rocJPEG decoder instances instead of sending the whole batch through one shared rocJPEG decoder.

The rocJPEG dedicated OpenMP split path is enabled by default. With the default behavior, rocAL creates up to four rocJPEG decoder instances, bounded by the configured CPU thread count and batch size. The input batch is divided into per-decoder sub-batches, and OpenMP dispatches those sub-batches across the dedicated decoder workers. This allows the benchmark configuration of four CPU threads to use four rocJPEG decoder instances, reducing contention around one decoder and improving decode throughput for sharded/multi-GPU image loading workloads.
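
The sizing rule described above can be sketched in a few lines of Python. This is only an illustration, not the rocAL implementation: `split_batch` is a hypothetical helper, and spreading the remainder over the first decoders is an assumption, since the description does not say how uneven batches are divided.

```python
from __future__ import annotations


def split_batch(batch_size: int, cpu_threads: int, max_decoders: int = 4) -> list[int]:
    """Return per-decoder sub-batch sizes (hypothetical helper, not rocAL code)."""
    if batch_size <= 0 or cpu_threads <= 0:
        raise ValueError("batch_size and cpu_threads must be positive")
    # Decoder count is capped at four and bounded by thread count and batch size.
    num_decoders = min(max_decoders, cpu_threads, batch_size)
    base, extra = divmod(batch_size, num_decoders)
    # Assumption: any remainder is spread over the first `extra` decoders.
    return [base + 1 if i < extra else base for i in range(num_decoders)]


print(split_batch(128, 4))  # [32, 32, 32, 32]
print(split_batch(10, 4))   # [3, 3, 2, 2]
print(split_batch(2, 4))    # [1, 1]
```

With four CPU threads and a batch of at least four images, the sketch yields four sub-batches, which matches the four-decoder benchmark configuration mentioned above.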

To compare against the previous behavior, set ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT=0. In that mode, rocAL keeps the previous single-decoder rocJPEG path, so the benchmark scripts can compare the old and new behavior directly.
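
As a rough illustration of an on-by-default toggle like this one, the Python sketch below treats the split path as enabled unless the variable is explicitly set to `0`. The helper name and the exact parsing rule are assumptions; the real check lives in the rocAL C++ sources and may differ.

```python
import os

ENV_NAME = "ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT"


def split_path_enabled(environ=None) -> bool:
    """Hypothetical check: enabled unless the variable is explicitly "0"."""
    env = os.environ if environ is None else environ
    return env.get(ENV_NAME, "1").strip() != "0"


print(split_path_enabled({}))               # True (default behavior)
print(split_path_enabled({ENV_NAME: "0"}))  # False (previous single-decoder path)
print(split_path_enabled({ENV_NAME: "1"}))  # True
```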

This PR adds tests/cpp_api/rocjpeg_decode_perf/ as a manual performance harness for the rocJPEG split-decoder change. These scripts are not regular CTest unit tests; they are intended for explicit developer/reviewer runs on systems with a suitable dataset and GPU configuration.

The harness is needed because the change is performance-sensitive. A correctness-only test would not show whether splitting rocJPEG work across multiple decoder instances improves decode time or whether ON/OFF behavior remains comparable.

The harness provides:

  • A C++ rocAL benchmark path using dataloader_multithread.
  • A Python rocAL benchmark path using fn.readers.file and fn.decoders.image.
  • ON/OFF comparison using ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT=0 and =1.
  • TurboJPEG comparison runs.
  • Per-GPU/per-shard decoded image counts and decode timing summaries.
  • A standalone sharded jpegdecodeperf launcher for rocJPEG sample-level comparison.
  • Reporting scripts to produce PR-friendly summaries of speedup and decode-time reduction.

This PR adds a new manual benchmark/support folder:

tests/cpp_api/rocjpeg_decode_perf/

New files:

tests/cpp_api/rocjpeg_decode_perf/README.md
tests/cpp_api/rocjpeg_decode_perf/perf_sharded_launcher.cpp
tests/cpp_api/rocjpeg_decode_perf/reporting_perf_sharded_results.sh
tests/cpp_api/rocjpeg_decode_perf/reporting_test_results.sh
tests/cpp_api/rocjpeg_decode_perf/rocal_decode_call_bench.py
tests/cpp_api/rocjpeg_decode_perf/run_tests_twice_solution_on_off.sh

The new README.md documents the benchmark purpose, required environment variables, common workflow, log locations, and example commands for PR reviewers or developers running the tests manually.

run_tests_twice_solution_on_off.sh is the main rocAL comparison driver. It runs six benchmark cases:

C++ rocAL + rocJPEG, split solution OFF
C++ rocAL + rocJPEG, split solution ON
C++ rocAL + TurboJPEG
Python rocAL + rocJPEG, split solution OFF
Python rocAL + rocJPEG, split solution ON
Python rocAL + TurboJPEG

The script uses:

ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT=0
ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT=1

to toggle the dedicated split path for rocJPEG runs. It writes logs under configurable LOG_DIR, defaulting to:

/tmp/rocjpeg_decode_perf

The script requires only the machine-specific inputs to be exported:

DATASET
ROCAL_CPP_BIN

and supports optional:

DATASET_LABEL
GPU_COUNT
ROCM_PATH
LOG_DIR
WORKSPACE
ROCAL_PY_BENCH
ROCJPEG_DECODER_CREATE_LOG

reporting_test_results.sh parses the logs produced by run_tests_twice_solution_on_off.sh. It summarizes:

  • decoded image counts for C++ and Python runs
  • per-GPU/per-shard decode times
  • average decode time for each mode
  • rocJPEG split-path decode-time reduction
  • rocJPEG split-path speedup

rocal_decode_call_bench.py is a Python rocAL decode benchmark. It builds a simple readers.file + decoders.image pipeline and supports:

  • CPU or GPU decode mode
  • configurable batch size
  • configurable CPU thread count
  • configurable device id
  • configurable number of GPUs
  • configurable number of shards
  • multi-process shard execution for multi-GPU tests
  • rocAL internal timing extraction through pipe.timing_info()

perf_sharded_launcher.cpp is a standalone helper launcher for rocJPEG jpegdecodeperf. It:

  • recursively scans a dataset for .jpg and .jpeg files
  • creates per-GPU shard directories using symlinks
  • launches one jpegdecodeperf process per GPU
  • writes per-GPU logs as jpegdecodeperf_gpu<N>.log
  • supports configurable batch size, thread count, output format, shard work directory, and log directory
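
The sharding step listed above can be pictured with the following Python sketch. It is not a port of perf_sharded_launcher.cpp: the `shard_dataset` helper, the `gpu<N>` directory names, the round-robin assignment, and the index-prefixed link names are all assumptions chosen for illustration.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def shard_dataset(dataset: Path, work_dir: Path, gpu_count: int) -> list[int]:
    """Symlink JPEG files into work_dir/gpu<N> and return per-shard counts."""
    # Collect the file list first so new links are never rescanned.
    images = sorted(
        p for p in dataset.rglob("*")
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"}
    )
    counts = [0] * gpu_count
    for index, image in enumerate(images):
        gpu = index % gpu_count  # assumption: round-robin assignment
        shard_dir = work_dir / f"gpu{gpu}"
        shard_dir.mkdir(parents=True, exist_ok=True)
        # Prefix with the running index so equal file names from different
        # subdirectories do not collide inside one shard.
        os.symlink(image.resolve(), shard_dir / f"{index:08d}_{image.name}")
        counts[gpu] += 1
    return counts


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("a/1.jpg", "a/2.JPEG", "b/1.jpg", "b/notes.txt"):
        path = root / "data" / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    print(shard_dataset(root / "data", root / "shards", gpu_count=2))  # [2, 1]
```

Symlinks keep the shard directories cheap to build and remove, because no image data is copied.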

reporting_perf_sharded_results.sh parses the logs produced by perf_sharded_launcher.cpp. It reports:

  • per-GPU decoded image counts
  • per-GPU decode time
  • total decoded image count
  • average decode time across GPU shards
  • wall/max decode time across GPU shards
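
A minimal Python sketch of the aggregation listed above follows. The `summarize` helper and the sample numbers are made up for illustration; the actual report is produced by the shell script from the jpegdecodeperf logs.

```python
from __future__ import annotations

from statistics import mean


def summarize(shards: dict[int, tuple[int, float]]) -> dict[str, float]:
    """shards maps GPU id -> (decoded image count, decode time in seconds)."""
    if not shards:
        raise ValueError("no shard results to summarize")
    counts = [count for count, _ in shards.values()]
    times = [seconds for _, seconds in shards.values()]
    return {
        "total_images": sum(counts),
        "avg_decode_sec": mean(times),
        # The slowest shard is reported as the wall/max decode time.
        "wall_decode_sec": max(times),
    }


# Illustrative values only, not measured results.
report = summarize({0: (100, 0.50), 1: (100, 0.70)})
print(report["total_images"], report["wall_decode_sec"])  # 200 0.7
```

Reporting both figures is useful when shards are uneven: the average hides a slow GPU, while the maximum shows it.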

Changes in:

rocAL/include/loaders/image/image_read_and_decode.h
rocAL/source/loaders/image/image_read_and_decode.cpp

include:

  • Add a vector of rocJPEG decoder instances for split execution.
  • Add per-decoder sub-batch size tracking.
  • Add _use_rocjpeg_dedicated_omp_split, controlled by ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT.
  • Initialize multiple rocJPEG decoder instances when the split path is enabled.
  • Split the rocJPEG batch across up to four decoder workers, bounded by batch size and configured CPU thread count.
  • Run rocJPEG decode-info/decode-batch work across decoder shards using OpenMP.
  • Preserve the existing single-decoder rocJPEG path when the split path is disabled.
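
The dispatch pattern in the list above can be sketched in Python, with a thread pool standing in for the OpenMP parallel region. `FakeDecoder` and `decode_split` are hypothetical names and do not reflect the rocJPEG API; the point is only that each worker owns its own decoder and handles one contiguous slice of the batch.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeDecoder:
    """Stand-in for one dedicated decoder instance (not the rocJPEG API)."""

    def __init__(self, decoder_id):
        self.decoder_id = decoder_id

    def decode_batch(self, items):
        return [f"decoder{self.decoder_id}:{item}" for item in items]


def decode_split(batch, sub_batch_sizes):
    """Decode `batch` using one dedicated decoder per sub-batch."""
    if sum(sub_batch_sizes) != len(batch):
        raise ValueError("sub-batch sizes must add up to the batch size")
    decoders = [FakeDecoder(i) for i in range(len(sub_batch_sizes))]
    # Cut the batch into contiguous slices, one per decoder.
    slices, start = [], 0
    for size in sub_batch_sizes:
        slices.append(batch[start:start + size])
        start += size
    # The thread pool replaces OpenMP here; no decoder is shared between workers.
    with ThreadPoolExecutor(max_workers=len(decoders)) as pool:
        futures = [pool.submit(d.decode_batch, s) for d, s in zip(decoders, slices)]
        # Collect in submission order so the output order matches the input.
        return [out for future in futures for out in future.result()]


print(decode_split(["a", "b", "c", "d", "e"], [2, 2, 1]))
# ['decoder0:a', 'decoder0:b', 'decoder1:c', 'decoder1:d', 'decoder2:e']
```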

This PR also updates:

tests/cpp_api/dataloader_multithread/dataloader_multithread.cpp

to support this benchmark path by:

  • accepting an additional cpu_thread_count argument
  • passing cpu_thread_count into rocalCreate
  • detecting ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT
  • increasing the effective rocAL batch size for rocJPEG split-path tests
  • avoiding unnecessary output copies when display is disabled
  • resizing image name buffers and display buffers according to effective batch size
  • updating usage text to document rocJPEG mode and CPU thread count

Test Plan

Lightweight validation was run after applying the changes:

  1. Verified shell syntax for all new scripts:
for f in tests/cpp_api/rocjpeg_decode_perf/*.sh; do
  bash -n "$f" || exit 1
done
  2. Verified the helper C++ launcher compiles independently:
g++ -std=c++17 -Wall -Wextra -pedantic \
  tests/cpp_api/rocjpeg_decode_perf/perf_sharded_launcher.cpp \
  -o /tmp/perf_sharded_launcher_check
  3. Verified the Python benchmark script compiles:
python3 -m py_compile tests/cpp_api/rocjpeg_decode_perf/rocal_decode_call_bench.py

Manual benchmark workflow documented in the README:

export WORKSPACE=/path/to/rocAL/tests/cpp_api/rocjpeg_decode_perf
cd "$WORKSPACE"

export DATASET=/path/to/image_dataset
export DATASET_LABEL=my_dataset
export ROCM_PATH=/opt/rocm
export LOG_DIR=/tmp/rocjpeg_decode_perf
export ROCAL_CPP_BIN=/path/to/dataloader_multithread

./run_tests_twice_solution_on_off.sh 1
./reporting_test_results.sh 1

Standalone jpegdecodeperf workflow documented in the README:

g++ -std=c++17 -O2 -Wall perf_sharded_launcher.cpp -o perf_sharded_launcher

./perf_sharded_launcher \
  "$DATASET" \
  1 \
  /path/to/jpegdecodeperf \
  32 \
  4 \
  rgb \
  "$LOG_DIR/shards" \
  "$LOG_DIR"

./reporting_perf_sharded_results.sh 1

Test Result

The following local checks passed:

  • Shell syntax validation passed for:
    • reporting_perf_sharded_results.sh
    • reporting_test_results.sh
    • run_tests_twice_solution_on_off.sh
  • perf_sharded_launcher.cpp compiled successfully with g++ -std=c++17 -Wall -Wextra -pedantic.
  • rocal_decode_call_bench.py passed Python bytecode compilation with python3 -m py_compile.

The benchmark harness was added specifically to report the effect of this change by comparing:

ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT=0

against:

ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT=1

for both C++ and Python rocAL decode paths.

No full hardware benchmark results are included in this PR note because the added harness is intended to support manual performance validation on systems with the target ROCm/rocAL/rocJPEG installation, dataset, and GPU configuration.

@essamROCm essamROCm self-assigned this May 21, 2026
@essamROCm essamROCm added the enhancement (New feature or request) and ci:precheckin labels May 21, 2026

Copilot AI left a comment

Contributor

Pull request overview

This PR introduces a manual performance harness for measuring rocAL image decode throughput (rocJPEG vs TurboJPEG) in multi-GPU sharded workloads, and updates rocAL’s rocJPEG path to optionally split decode work across multiple dedicated rocJPEG decoder instances.

Changes:

  • Add a new manual benchmark folder (tests/cpp_api/rocjpeg_decode_perf/) with C++/Python runners and reporting scripts for repeatable on/off comparisons.
  • Enhance rocAL’s rocJPEG decode implementation to optionally shard a batch across up to 4 dedicated rocJPEG decoder instances using OpenMP.
  • Update dataloader_multithread to support configurable CPU thread count and an “effective batch size” for rocJPEG split-path benchmarking.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tests/cpp_api/rocjpeg_decode_perf/run_tests_twice_solution_on_off.sh | Driver script to run C++/Python benchmarks with rocJPEG split on/off plus TurboJPEG baselines. |
| tests/cpp_api/rocjpeg_decode_perf/rocal_decode_call_bench.py | Python rocAL decode benchmark with optional multi-process shard execution and timing extraction. |
| tests/cpp_api/rocjpeg_decode_perf/reporting_test_results.sh | Parses rocAL C++/Python logs and summarizes per-shard decode times and computed speedups. |
| tests/cpp_api/rocjpeg_decode_perf/reporting_perf_sharded_results.sh | Parses sharded jpegdecodeperf logs and summarizes per-GPU decode results. |
| tests/cpp_api/rocjpeg_decode_perf/README.md | Documents benchmark purpose, required env, workflows, and log/report generation. |
| tests/cpp_api/rocjpeg_decode_perf/perf_sharded_launcher.cpp | C++ helper to shard a dataset via symlinks and launch jpegdecodeperf per GPU with logs. |
| tests/cpp_api/dataloader_multithread/dataloader_multithread.cpp | Adds CPU thread-count arg and adjusts effective batch sizing/output handling for rocJPEG split benchmarking. |
| rocAL/source/loaders/image/image_read_and_decode.cpp | Implements optional rocJPEG dedicated OpenMP split path with multiple decoder instances and per-shard decode. |
| rocAL/include/loaders/image/image_read_and_decode.h | Adds state for multiple rocJPEG decoders, sub-batch sizes, and split toggle flag. |

Comment thread tests/cpp_api/rocjpeg_decode_perf/perf_sharded_launcher.cpp Outdated
Comment thread tests/cpp_api/rocjpeg_decode_perf/perf_sharded_launcher.cpp Outdated
Comment thread tests/cpp_api/dataloader_multithread/dataloader_multithread.cpp
Comment thread tests/cpp_api/rocjpeg_decode_perf/rocal_decode_call_bench.py Outdated
Comment thread tests/testScripts/rocal_decode_call_bench.py
Comment thread tests/testScripts/rocal_decode_call_bench.py
@LakshmiKumar23 LakshmiKumar23 self-requested a review May 26, 2026 17:43
Harden the sharded launcher work directory cleanup, make symlink names collision-resistant, validate dataloader CPU thread count, improve Python decoded image accounting, and use spawn for multi-shard benchmark workers.
Comment thread rocAL/source/loaders/image/image_read_and_decode.cpp Outdated
Comment thread rocAL/source/loaders/image/image_read_and_decode.cpp Outdated
Comment thread rocAL/source/loaders/image/image_read_and_decode.cpp Outdated
Comment thread rocAL/source/loaders/image/image_read_and_decode.cpp Outdated
Comment thread rocAL/source/loaders/image/image_read_and_decode.cpp
@essamROCm essamROCm requested a review from rrawther May 26, 2026 21:35
Comment thread rocAL/source/loaders/image/image_read_and_decode.cpp Outdated
@essamROCm

essamROCm commented May 26, 2026

Contributor Author

Latest Code Change Test Summary

Test Configuration

AMD-SMI 26.2.2+671d39a71e
amdgpu version: 6.18.2
ROCm version: 7.2.2
VBIOS version: 00102431
Platform: Linux
AMD Instinct MI300X - SPX/NPS1

Dataset: ImageNet 5 Classes
Total Images : 19,500
GPU Count : 8


Decoded Image Count

| Test Configuration | Images Decoded |
| --- | --- |
| C++ rocAL + rocJPEG OFF | 19,504 |
| C++ rocAL + rocJPEG ON | 19,504 |
| C++ rocAL + TurboJPEG | 19,504 |
| PY rocAL + rocJPEG OFF | 19,500 |
| PY rocAL + rocJPEG ON | 19,500 |
| PY rocAL + TurboJPEG | 19,500 |

C++ rocAL Sample Decode-Time Results

| Mode | Average decode time |
| --- | --- |
| rocAL + rocJPEG OFF | 1.504794 seconds |
| rocAL + rocJPEG ON | 0.651028 seconds |
| rocAL + TurboJPEG | 1.986759 seconds |

Python rocAL Benchmark Decode-Time Results

| Mode | Average decode time |
| --- | --- |
| rocAL + rocJPEG OFF | 1.265821 seconds |
| rocAL + rocJPEG ON | 0.616735 seconds |
| rocAL + TurboJPEG | 2.052351 seconds |

rocAL Patch Solution Enhancement

| Path | Decode-time reduction | Speedup |
| --- | --- | --- |
| C++ rocAL + rocJPEG | 56.74% | 2.31x |
| Python rocAL + rocJPEG | 51.28% | 2.05x |
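
For readers who want to reproduce the two derived columns, the sketch below recomputes them from the average decode times reported above. It assumes the usual definitions (reduction as the relative drop from OFF to ON, speedup as OFF divided by ON); the reporting script may compute them differently, but these definitions reproduce the reported figures.

```python
def reduction_and_speedup(off_sec, on_sec):
    """Return (decode-time reduction in percent, speedup factor)."""
    reduction_pct = (off_sec - on_sec) / off_sec * 100.0
    speedup = off_sec / on_sec
    return reduction_pct, speedup


# Average decode times (seconds) taken from the result tables above.
for label, off_sec, on_sec in (
    ("C++", 1.504794, 0.651028),
    ("Python", 1.265821, 0.616735),
):
    reduction, speedup = reduction_and_speedup(off_sec, on_sec)
    print(f"{label}: {reduction:.2f}% reduction, {speedup:.2f}x speedup")
# C++: 56.74% reduction, 2.31x speedup
# Python: 51.28% reduction, 2.05x speedup
```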

Comment thread rocAL/source/loaders/image/image_read_and_decode.cpp Outdated
Comment thread tests/cpp_api/rocjpeg_decode_perf/README.md Outdated

@rrawther rrawther left a comment

Collaborator

please address review comments

@essamROCm

Contributor Author

The Added Test Script Files

These files are not intended to be regular correctness/unit tests. They are a manual performance harness for validating the rocJPEG split-decoder change in rocAL.

The rocAL code change affects how rocJPEG decode work is scheduled internally: instead of using one rocJPEG decoder instance for the full batch, the split path can use multiple dedicated rocJPEG decoder instances and divide the batch across them. To validate that type of change, we need more than a normal pass/fail test; we need a repeatable way to compare decode timing with the split path ON and OFF across C++ and Python rocAL entry points.

The scripts are organized as follows:

  • run_tests_twice_solution_on_off.sh: Main driver. Runs rocAL C++ and Python decode benchmarks with ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT=0 and =1, plus TurboJPEG comparison runs.
  • reporting_test_results.sh: Parses those logs and summarizes decoded image counts, per-shard/per-GPU decode times, average decode time, and ON-vs-OFF speedup/reduction.
  • rocal_decode_call_bench.py: Python rocAL decode benchmark equivalent to the C++ sample path, used to validate the Python API path.
  • perf_sharded_launcher.cpp: Helper to shard a dataset and launch rocJPEG jpegdecodeperf once per GPU. This is for comparing against the rocJPEG sample-level benchmark.
  • reporting_perf_sharded_results.sh: Summarizes the jpegdecodeperf per-GPU logs.

This folder gives developers a reproducible workflow for answering:

  1. Does the old single-decoder path still work?
  2. Does the new split-decoder path work?
  3. How many images were decoded in each mode?
  4. What is the per-GPU/per-shard decode time?
  5. What speedup or reduction does the split path provide?
  6. Does behavior look consistent through both C++ rocAL and Python rocAL APIs?

This is why the folder is under tests/cpp_api: it depends on the existing dataloader_multithread C++ sample/test and is intended for manual performance validation, not automatic CTest execution.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Comment thread tests/cpp_api/rocjpeg_decode_perf/perf_sharded_launcher.cpp Outdated
Comment thread rocAL/source/loaders/image/image_read_and_decode.cpp Outdated
Comment thread tests/cpp_api/rocjpeg_decode_perf/rocal_decode_call_bench.py Outdated
Comment thread tests/cpp_api/dataloader_multithread/dataloader_multithread.cpp Outdated
Comment thread tests/cpp_api/dataloader_multithread/dataloader_multithread.cpp Outdated
@codecov

codecov Bot commented May 28, 2026


Codecov Report

❌ Patch coverage is 66.66667% with 37 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...cAL/source/loaders/image/image_read_and_decode.cpp | 66.67% | 37 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #474      +/-   ##
===========================================
- Coverage    75.68%   75.67%   -0.01%     
===========================================
  Files          318      318              
  Lines        26289    26358      +69     
===========================================
+ Hits         19895    19944      +49     
- Misses        6394     6414      +20     
| Files with missing lines | Coverage Δ |
| --- | --- |
| ...ocAL/include/loaders/image/image_read_and_decode.h | 80.00% <ø> (ø) |
| ...cAL/source/loaders/image/image_read_and_decode.cpp | 80.36% <66.67%> (-2.00%) ⬇️ |

Comment thread tests/cpp_api/dataloader_multithread/dataloader_multithread.cpp Outdated
Comment thread tests/cpp_api/rocjpeg_decode_perf/README.md Outdated
Comment thread tests/cpp_api/rocjpeg_decode_perf/reporting_perf_sharded_results.sh Outdated
Comment thread tests/cpp_api/rocjpeg_decode_perf/perf_sharded_launcher.cpp Outdated
Comment thread tests/testScripts/rocal_decode_call_bench.py
@LakshmiKumar23

LakshmiKumar23 commented Jun 2, 2026

Contributor

Verified the benchmark scripts in the PR with ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT=0/1. @essamROCm let's get rid of the OMP_SPLIT=0 option since that was mainly for comparison and testing. We can keep 1 as default and don't need to check the env variable etc. You can clean up the PR while removing the 3 files mentioned.

lakshmi@ctr-s95-mi300x-3:~/rocAL/tests/cpp_api/rocjpeg_decode_perf$ export ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT=1
lakshmi@ctr-s95-mi300x-3:~/rocAL/tests/cpp_api/rocjpeg_decode_perf$ python3 rocal_decode_call_bench.py --path ~/rocAL/data/images/AMD-tinyDataSet/ --device gpu --num-threads 4 --num-gpus 1
OK: OpenVX using GPU device - 0: AMD Instinct MI300X [gfx942:sramecc+:xnack-] with 304 CUs on PCI bus 05:00.0

OK: loaded 79 kernels from libvx_rpp.so
Pipeline has been created succesfully
Requested batch size: 32
Effective rocAL batch size: 128
GPU/device id: 0
Shard id: 0
Num shards: 1
Decoding started with 4 threads, please wait!
Total decoded images: 256
Total processing time (sec): 0.110670
Average processing time per image (ms): 0.432305
Average decoded images per sec (Images/Sec): 2313.18
rocAL internal decode time (sec): 0.055007
rocAL internal load time (sec): 0.147558
rocAL internal process time (sec): 0.000601
Decoding completed!
lakshmi@ctr-s95-mi300x-3:~/rocAL/tests/cpp_api/rocjpeg_decode_perf$ export ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT=0
lakshmi@ctr-s95-mi300x-3:~/rocAL/tests/cpp_api/rocjpeg_decode_perf$ python3 rocal_decode_call_bench.py --path ~/rocAL/data/images/AMD-tinyDataSet/ --device gpu --num-threads 4 --num-gpus 1
OK: OpenVX using GPU device - 0: AMD Instinct MI300X [gfx942:sramecc+:xnack-] with 304 CUs on PCI bus 05:00.0

OK: loaded 79 kernels from libvx_rpp.so
Pipeline has been created succesfully
Requested batch size: 32
GPU/device id: 0
Shard id: 0
Num shards: 1
Decoding started with 4 threads, please wait!
Total decoded images: 256
Total processing time (sec): 0.162205
Average processing time per image (ms): 0.633612
Average decoded images per sec (Images/Sec): 1578.25
rocAL internal decode time (sec): 0.047525
rocAL internal load time (sec): 0.180833
rocAL internal process time (sec): 0.000694
Decoding completed!

@essamROCm

Contributor Author

Verified the benchmark scripts in the PR with ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT=0/1. @essamROCm let's get rid of the OMP_SPLIT=0 option since that was mainly for comparison and testing. We can keep 1 as default and don't need to check the env variable etc. You can clean up the PR while removing the 3 files mentioned.

Addressed in this commit: e20b8d0

Updated as requested. I removed the ROCAL_ROCJPEG_DEDICATED_OMP_SPLIT option and the old single-decoder comparison path, so the rocJPEG split-decoder path is now the default rocJPEG behavior without checking an environment variable.

@essamROCm

Contributor Author

Commit: e20b8d0 Validation

Verified the changes/cleanup with:

bash -n tests/cpp_api/rocjpeg_decode_perf/run_tests_twice_solution_on_off.sh
python3 -m py_compile tests/cpp_api/rocjpeg_decode_perf/rocal_decode_call_bench.py
cmake --build build --target rocal -j$(nproc)

Also built the dataloader_multithread sample separately to confirm the sample changes compile.

@essamROCm essamROCm marked this pull request as ready for review June 3, 2026 00:57
@essamROCm essamROCm requested review from a team and kiritigowda as code owners June 3, 2026 00:57
Comment thread tests/testScripts/run_dataloader_multithread.sh
Comment thread tests/testScripts/README.md

@spolifroni-amd spolifroni-amd left a comment

Contributor

This requires a changelog entry.

@LakshmiKumar23 LakshmiKumar23 left a comment

Contributor

Does not affect the TurboJPEG workflow for training runs; it only improves decode time with rocJPEG.

@essamROCm essamROCm requested a review from rrawther June 8, 2026 14:27
@essamROCm

Contributor Author

This requires a changelog entry.

The changelog file was updated in the latest commit.

@essamROCm essamROCm dismissed spolifroni-amd’s stale review June 9, 2026 18:33

Changes were made for this request; the review is showing a way to resolve the conversation.


Labels

ci:precheckin enhancement New feature or request


5 participants