Add gaudi2 component for Intel Habana Gaudi2 AI accelerator #545

Open
tokey-tahmid wants to merge 18 commits into icl-utk-edu:master from tokey-tahmid:feature/gaudi2-component

tokey-tahmid commented Jan 28, 2026

Pull Request Description

This PR adds a new component for accessing hardware performance counters on the Intel Habana Gaudi2 AI accelerator via the SPMU interface. It currently supports 189 native events covering the TPC (81), EDMA (50), MME CTRL (8), and PDMA (50) engines, with 6 simultaneous hardware counters per SPMU unit.
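Because each SPMU unit exposes only 6 hardware counters, a request for more than 6 events on the same unit cannot be measured in a single pass. The Python sketch below (matching the language of the PR's tests) checks that the per-engine counts add up to 189 and shows one way to split a request into passes that respect the 6-counter limit. The unit names, event names, and the `plan_passes` helper are hypothetical, and the greedy per-unit split is an illustrative choice, not necessarily what linux-gaudi2.c does.

```python
from collections import defaultdict

# Per-engine native event counts quoted in the PR description.
EVENTS_PER_ENGINE = {"TPC": 81, "EDMA": 50, "MME_CTRL": 8, "PDMA": 50}

# Simultaneous hardware counters available on each SPMU unit.
COUNTERS_PER_SPMU = 6


def plan_passes(requests):
    """Hypothetical helper: split (spmu_unit, event) requests into passes.

    Each returned pass is a list of (spmu_unit, event) tuples and uses at
    most COUNTERS_PER_SPMU events on any single SPMU unit.
    """
    by_unit = defaultdict(list)
    for unit, event in requests:
        by_unit[unit].append(event)

    passes = []
    for unit, events in by_unit.items():
        # Chunk this unit's events into groups of at most 6; chunk k goes
        # into pass k, so different units can share a pass.
        for start in range(0, len(events), COUNTERS_PER_SPMU):
            pass_idx = start // COUNTERS_PER_SPMU
            while len(passes) <= pass_idx:
                passes.append([])
            passes[pass_idx].extend(
                (unit, e) for e in events[start:start + COUNTERS_PER_SPMU]
            )
    return passes


if __name__ == "__main__":
    assert sum(EVENTS_PER_ENGINE.values()) == 189
    # Hypothetical request: 8 events on one TPC SPMU, 2 on one EDMA SPMU.
    reqs = [("tpc0", f"TPC_EVENT_{n}") for n in range(8)]
    reqs += [("edma0", f"EDMA_EVENT_{n}") for n in range(2)]
    for n, p in enumerate(plan_passes(reqs)):
        print(n, len(p))  # prints "0 8" then "1 2"
```

In this example the first pass holds 6 TPC events plus both EDMA events, and the two remaining TPC events spill into a second pass.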

Implementation

The component uses the hlthunk_debug() ioctl interface with the following operations:

  • HL_DEBUG_OP_SET_MODE - Enable debug mode
  • HL_DEBUG_OP_SPMU - Configure SPMU event selection
  • HL_DEBUG_OP_READBLOCK - Read counter values
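The three operations must be issued in order: enable debug mode, program the SPMU event selectors, then read the counter block. The sketch below models only that ordering with an in-memory fake device; it does not call hlthunk_debug(). `DebugOp`, `FakeSpmuDevice`, `tick`, and the selector values are hypothetical stand-ins, and the real HL_DEBUG_OP_* values come from the Habana Labs driver headers.

```python
from enum import Enum, auto


class DebugOp(Enum):
    # Stand-ins for HL_DEBUG_OP_SET_MODE / _SPMU / _READBLOCK; the real
    # numeric values are defined by the driver headers, not here.
    SET_MODE = auto()
    SPMU = auto()
    READBLOCK = auto()


class FakeSpmuDevice:
    """Hypothetical in-memory model of one SPMU unit's debug sequence."""

    NUM_COUNTERS = 6

    def __init__(self):
        self.debug_mode = False
        self.selected = []  # event selectors currently programmed
        self.counters = [0] * self.NUM_COUNTERS

    def debug(self, op, payload=None):
        if op is DebugOp.SET_MODE:
            self.debug_mode = bool(payload)
            return None
        if not self.debug_mode:
            raise RuntimeError(f"{op.name} issued before debug mode was enabled")
        if op is DebugOp.SPMU:
            if len(payload) > self.NUM_COUNTERS:
                raise ValueError("more events than hardware counters")
            self.selected = list(payload)
            self.counters = [0] * self.NUM_COUNTERS  # reprogramming resets
            return None
        if op is DebugOp.READBLOCK:
            if not self.selected:
                raise RuntimeError("READBLOCK issued before SPMU was configured")
            return self.counters[:len(self.selected)]
        raise ValueError(f"unknown op {op}")

    def tick(self, increments):
        """Simulate workload activity by adding to the active counters."""
        for i, inc in enumerate(increments):
            self.counters[i] += inc


dev = FakeSpmuDevice()
dev.debug(DebugOp.SET_MODE, True)
dev.debug(DebugOp.SPMU, [0x10, 0x11])  # hypothetical event selectors
dev.tick([5, 3])
print(dev.debug(DebugOp.READBLOCK))  # prints [5, 3]
```

Modeling the sequence this way makes the failure modes explicit: reading or configuring before debug mode is enabled, or selecting more events than the unit has counters, is rejected up front.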

File Description

Component:

  • linux-gaudi2.c - Main component implementation
  • gaudi2_events.h - Event definitions and SPMU base addresses
  • Rules.gaudi2 - Build configuration
  • README.md - Component documentation

Tests (ctypes-based):

  • tests/Makefile - Build system for tests
  • tests/python/run_tests.sh - Test runner script
  • tests/python/test_component_and_events.py - Event enumeration and component availability tests
  • tests/python/test_start_stop_read.py - Basic start/stop/read lifecycle test on TPC events
  • tests/python/test_multidevice.py - Multi-device counter isolation test
  • tests/python/test_mme_events.py - MME CTRL SPMU event test
  • tests/python/test_edma_events.py - EDMA SPMU event test
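The multi-device test checks counter isolation: a workload run on one device should advance that device's counters while leaving the other devices' counters essentially unchanged. A minimal sketch of that check follows. The `check_isolation` name, the per-device reading layout, and the `idle_tolerance` parameter are assumptions for illustration and are not taken from test_multidevice.py.

```python
def check_isolation(before, after, active, idle_tolerance=0):
    """Hypothetical isolation check over two counter snapshots.

    before/after: dict mapping device index -> list of counter values.
    active: index of the device that ran the workload.
    idle_tolerance: largest increase allowed on devices that ran nothing,
    to absorb background activity.
    Assumes no counter wraparound between the two reads.
    """
    for dev, start in before.items():
        deltas = [b - a for a, b in zip(start, after[dev])]
        if any(d < 0 for d in deltas):
            return False  # counters should not decrease between reads
        if dev == active:
            if not any(d > 0 for d in deltas):
                return False  # the workload device must show activity
        elif any(d > idle_tolerance for d in deltas):
            return False  # an idle device moved more than allowed
    return True


before = {0: [0, 0], 1: [10, 10]}
after = {0: [500, 20], 1: [10, 10]}
print(check_isolation(before, after, active=0))  # prints True
```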

Tests (cyPAPI-based):

  • tests/cypapi_tests/Makefile - Builds the cyPAPI dependency
  • tests/cypapi_tests/run_tests.sh - Test runner script
  • tests/cypapi_tests/test_start_stop_read_cypapi.py - Basic lifecycle test using cyPAPI
  • tests/cypapi_tests/test_multidevice_cypapi.py - Multi-device test using cyPAPI
  • tests/cypapi_tests/test_mme_events_cypapi.py - MME CTRL event test using cyPAPI
  • tests/cypapi_tests/test_edma_events_cypapi.py - EDMA event test using cyPAPI

Testing

Tested on the Voyager machine with Gaudi2 (HL-225).

  • The following PAPI utilities work with the component:
    • papi_component_avail - ✅
    • papi_avail - ✅
    • papi_native_avail - ✅
    • papi_command_line - ✅
  • Successfully instrumented the following workloads with the gaudi2 component to verify the events and counter values:
    • Python example - ✅
    • Custom TPC kernel - ✅

Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc.
  • Commits
    Commits are self contained and only do one thing
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
  • Tests
    The PR needs to pass all the tests

dbarry9 self-requested a review January 28, 2026 15:59
tokey-tahmid force-pushed the feature/gaudi2-component branch from 56bcc52 to 8a29c68 February 3, 2026 21:02