[Deepin-Kernel-SIG] [linux 6.18.y] [FROMLIST] [Intel] thermal: intel: Add support for Directed Package Thermal Interrupt #1797

Open
Avenger-285714 wants to merge 5 commits into deepin-community:linux-6.18.y from Avenger-285714:DPTI-6.18

Conversation

@Avenger-285714
Member

@Avenger-285714 Avenger-285714 commented Jun 2, 2026

Hi,

This is v2 of this series. The main changes are a new patch fixing a pre-
existing bug and a redesigned error rollback strategy that disables
directed thermal interrupts across all packages when enabling fails in any
one of them. Please see the changelog for details.

Package-level thermal interrupts are currently broadcast to all CPUs in a
package. Only one CPU is needed to service package-wide events.
Broadcasting creates unnecessary resource contention. Thermal interrupts
generated for Hardware Feedback Interface[1] updates are an example: all
CPUs in the package receive the interrupt and race for a lock to update a
shared data structure. Idle CPUs are needlessly woken up.

Newer Intel processors allow directing package-level thermal interrupts
only to CPUs that explicitly request them. A CPU opts in by setting a
designated bit in IA32_THERM_INTERRUPT. Hardware acknowledges the request
by setting a designated bit in IA32_PACKAGE_THERM_STATUS.

This series enables directed package-level thermal interrupts and
designates one handler CPU per package using the CPU hotplug
infrastructure. A new CPU is selected if the handler CPU goes offline.

Because CPU0's hotplug callbacks are not invoked during suspend and resume,
syscore callbacks are added to restore the handler for the boot package.
The series also disables directed delivery during kexec reboot, avoiding
stale interrupt routing when rebooting into a kernel that does not support
the feature.

This patchset introduces a change in behavior in the
/sys/devices/system/cpu/cpuN/thermal_throttle/package* sysfs files. These
files reflect per-CPU variables updated when a CPU handles a package-level
thermal interrupt. In
broadcast mode, all CPUs update their variables. When directed package-
level thermal interrupts are enabled, only the handler CPU's variables are
updated.
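
One way to observe the sysfs behavior described above is to compare the package counters across CPUs. The following is a minimal userspace sketch, not part of the series. The helper names are hypothetical, and it assumes a counter file such as package_throttle_count under each CPU's thermal_throttle directory. With directed delivery enabled, only the handler CPU's counter would be expected to advance.

```c
#include <stdio.h>

/*
 * Read one numeric counter from a sysfs-style file.
 * Returns the value, or -1 if the file is missing or cannot be parsed.
 */
long read_counter(const char *path)
{
	FILE *f = fopen(path, "r");
	long value;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/*
 * Count how many of the first ncpus CPUs report a non-zero counter.
 * path_fmt is a printf-style template with a single %d for the CPU
 * number (hypothetical helper, for illustration only).
 */
int cpus_with_nonzero_count(const char *path_fmt, int ncpus)
{
	char path[256];
	int cpu, n = 0;

	for (cpu = 0; cpu < ncpus; cpu++) {
		snprintf(path, sizeof(path), path_fmt, cpu);
		if (read_counter(path) > 0)
			n++;
	}
	return n;
}
```

A caller could pass "/sys/devices/system/cpu/cpu%d/thermal_throttle/package_throttle_count" as path_fmt. Counters accumulated before directed delivery was enabled are not reset, so a comparison of deltas over time is more meaningful than absolute values.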

Lastly, nothing changes for processors that do not support this feature:
they fall back to broadcast delivery.

[1] Intel Software Developer's Manual Vol. 3, Section 17.6, March 2026 https://www.intel.com/SDM

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Changes in v2:

  • Fixed an existing bug in the error-handling of thermal_throttle_online()
    CPU hotplug callback.
  • Redesigned the rollback mechanism to handle all packages, not only
    the boot package.
  • Fixed the handling of the return value of cpumask_any_but(), which on
    failure returns small_cpumask_bits, not nr_cpu_ids.
  • Renamed the Directed Package Thermal Interrupt CPUID and MSR bits for
    consistency and brevity. (Boris)
  • Added measurements of the latency of setup acknowledgment from hardware.
  • Removed an unused argument from directed_thermal_pkg_intr_supported().
  • Reused the global rollback mechanism in the syscore shutdown callback.
  • Link to v1: https://lore.kernel.org/r/20260309-rneri-directed-therm-intr-v1-0-2956e3000950@linux.intel.com
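
The cpumask_any_but() item in the changelog can be illustrated with a small userspace model. This is a sketch under stated assumptions: model_any_but() and pick_new_handler() are hypothetical stand-ins, and the 64-bit mask width plays the role of the bitmap size that the real helper returns on failure. The point is that the caller compares with >= against the number of CPU IDs instead of testing for equality.

```c
#include <stdint.h>

#define MODEL_MASK_BITS 64u	/* stand-in for the bitmap width */

/*
 * Return the first CPU set in mask other than cpu. On failure, return the
 * bitmap width, which may be larger than the number of possible CPU IDs.
 */
unsigned int model_any_but(uint64_t mask, unsigned int cpu)
{
	unsigned int i;

	for (i = 0; i < MODEL_MASK_BITS; i++) {
		if (i != cpu && (mask & (UINT64_C(1) << i)))
			return i;
	}
	return MODEL_MASK_BITS;
}

/*
 * Pick a replacement handler in the package. Returns the CPU number, or
 * -1 if the outgoing CPU was the last one. The >= comparison also covers
 * a failure value that differs from nr_cpu_ids.
 */
int pick_new_handler(uint64_t pkg_mask, unsigned int outgoing,
		     unsigned int nr_cpu_ids)
{
	unsigned int cpu = model_any_but(pkg_mask, outgoing);

	if (cpu >= nr_cpu_ids)
		return -1;
	return (int)cpu;
}
```

With 8 CPU IDs and only CPU 0 in the mask, model_any_but() returns 64, so an equality test against 8 would wrongly treat the result as a valid CPU.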

Ricardo Neri (5):
thermal: intel: Fix dangling resources on thermal_throttle_online() failure
x86/thermal: Add bit definitions for Intel Directed Package Thermal Interrupt
thermal: intel: Enable the Directed Package-level Thermal Interrupt
thermal: intel: Add syscore callbacks for suspend and resume
thermal: intel: Add a syscore shutdown callback for kexec reboot

arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/msr-index.h | 2 +
drivers/thermal/intel/therm_throt.c | 272 +++++++++++++++++++++++++++++++++++-
3 files changed, 272 insertions(+), 4 deletions(-)

base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d
change-id: 20260306-rneri-directed-therm-intr-9f3f8888bb3f

Best regards,

Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lore.kernel.org/linux-pm/20260528-rneri-directed-therm-intr-v2-0-8e2f9e0c1a36@linux.intel.com/

Summary by Sourcery

Add support for Intel Directed Package Thermal Interrupts and fix thermal throttle CPU hotplug error handling.

New Features:

  • Introduce per-package handler CPU support for Intel Directed Package Thermal Interrupts, enabling directed rather than broadcast delivery of package thermal events.
  • Add syscore-based suspend, resume, and shutdown handling to correctly manage directed package thermal interrupt routing across power management and kexec reboot.

Bug Fixes:

  • Fix thermal_throttle_online() hotplug callback to avoid leaving partially initialized per-CPU thermal devices on error paths.
  • Ensure global rollback of directed package thermal interrupt configuration when initialization or per-package enabling fails.

Enhancements:

  • Extend x86 CPU feature and MSR definitions to expose Intel Directed Package Thermal Interrupt capability bits and acknowledgment status.
  • Improve thermal throttling initialization to cleanly tear down directed interrupt state if CPU hotplug state setup fails.

ricardon added 5 commits June 2, 2026 20:15
…ailure

The function thermal_throttle_add_dev() may fail and abort a CPU hotplug
online operation. Since the failure occurs within the online callback,
thermal_throttle_online(), the CPU hotplug framework does not invoke the
corresponding offline callback. As a result, the hardware and software
resources set up during the failed operation are not torn down.

Since only thermal_throttle_add_dev() can fail, call it before setting up
the rest of the resources.
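
The reordering can be modeled in a few lines of userspace C. This is an illustrative sketch only: struct cpu_state, model_add_dev(), and model_online() are hypothetical names, and the real callback sets up more state than the single flag used here.

```c
#include <errno.h>
#include <stdbool.h>

/* Minimal per-CPU state tracked by the model. */
struct cpu_state {
	bool dev_added;		/* sysfs device created */
	bool hw_configured;	/* remaining setup done */
};

/* Stand-in for thermal_throttle_add_dev(); 'fail' injects an error. */
static int model_add_dev(struct cpu_state *s, bool fail)
{
	if (fail)
		return -ENOMEM;
	s->dev_added = true;
	return 0;
}

/*
 * Model of the online callback: the only step that can fail runs first,
 * so an early return leaves nothing behind that would need an offline
 * callback to undo.
 */
int model_online(struct cpu_state *s, bool fail_add_dev)
{
	int err = model_add_dev(s, fail_add_dev);

	if (err)
		return err;

	s->hw_configured = true;	/* infallible setup goes last */
	return 0;
}
```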

Fixes: f665620 ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lore.kernel.org/linux-pm/20260528-rneri-directed-therm-intr-v2-1-8e2f9e0c1a36@linux.intel.com/
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>

…nterrupt

Add CPUID and MSR bit definitions required to support Intel Directed
Package Thermal Interrupt.

A CPU requests directed package-level thermal interrupts by setting bit 25
in IA32_THERM_INTERRUPT. Hardware acknowledges by setting bit 25 in
IA32_PACKAGE_THERM_STATUS, indicating that only CPUs that opted in will
receive the interrupt. If no CPU in the package requests it, delivery
falls back to broadcast.
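
As a rough illustration of these bits, the sketch below defines them and a few helpers that operate on raw register values. It is not the patch itself. The macro names THERM_INT_DPTI_ENABLE, PACKAGE_THERM_STATUS_DPTI_ACK, and the CPUID.06H:EAX[24] enumeration bit are taken from the reviewer's guide in this conversation, the bit positions from the commit message above, and the helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR addresses as used by the existing x86 thermal code. */
#define MSR_IA32_THERM_INTERRUPT	0x19b
#define MSR_IA32_PACKAGE_THERM_STATUS	0x1b1

/* Bit 25 in each register, per the commit message. */
#define THERM_INT_DPTI_ENABLE		(UINT64_C(1) << 25)
#define PACKAGE_THERM_STATUS_DPTI_ACK	(UINT64_C(1) << 25)

/* Enumeration bit, CPUID.06H:EAX[24], per the reviewer's guide. */
#define CPUID6_EAX_DPTI			(UINT32_C(1) << 24)

/* True if the CPUID leaf 6 EAX value advertises the feature. */
bool dpti_supported(uint32_t cpuid6_eax)
{
	return (cpuid6_eax & CPUID6_EAX_DPTI) != 0;
}

/* New IA32_THERM_INTERRUPT value with this CPU opted in. */
uint64_t dpti_request(uint64_t therm_int)
{
	return therm_int | THERM_INT_DPTI_ENABLE;
}

/* New IA32_THERM_INTERRUPT value with this CPU opted out. */
uint64_t dpti_release(uint64_t therm_int)
{
	return therm_int & ~THERM_INT_DPTI_ENABLE;
}

/* True if IA32_PACKAGE_THERM_STATUS shows the hardware acknowledgment. */
bool dpti_acked(uint64_t pkg_status)
{
	return (pkg_status & PACKAGE_THERM_STATUS_DPTI_ACK) != 0;
}
```

In the kernel, the values would be read and written with the MSR accessors on the target CPU. The helpers here only show the bit manipulation.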

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
[WangYuli: Fix conflicts]
Link: https://lore.kernel.org/linux-pm/20260528-rneri-directed-therm-intr-v2-2-8e2f9e0c1a36@linux.intel.com/
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>

thermal: intel: Enable the Directed Package-level Thermal Interrupt

Package-level thermal interrupts are broadcast to all online CPUs within a
package, even though only one CPU needs to service them. This results in
unnecessary wakeups, lock contention, and corresponding performance and
power-efficiency penalties.

When supported by hardware, a CPU requests to receive directed package-
level thermal interrupts by setting a designated bit in
IA32_THERM_INTERRUPT. The operating system must then verify that hardware
has acknowledged this request by checking a designated bit in
IA32_PACKAGE_THERM_STATUS.

Enable directed package-level thermal interrupts on one CPU per package
using the CPU hotplug infrastructure. Keep track of the CPUs handling
package-level interrupts with an array.

If the handling CPU goes offline, select a new CPU. Temporarily enable
directed interrupts on both the current and new CPU until hardware
acknowledges the new selection, then disable them on the outgoing CPU.
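
The handover order can be sketched as a small userspace model. Everything here is an assumption made for illustration: struct model_pkg and model_handover() are hypothetical, the hardware acknowledgment is reduced to a boolean input, and the failure path (opting both CPUs out so the package falls back to broadcast) is a simplification that may differ from what the patch does.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_CPUS 4

/* One package: per-CPU online and opt-in state plus the tracked handler. */
struct model_pkg {
	bool online[MODEL_CPUS];
	bool enable[MODEL_CPUS];	/* per-CPU directed-interrupt opt-in */
	int handler;			/* handler CPU, or -1 for broadcast */
};

/*
 * Move the handler role away from 'outgoing' before it goes offline.
 * hw_acks models whether hardware acknowledges the new selection.
 */
int model_handover(struct model_pkg *p, int outgoing, bool hw_acks)
{
	int cpu, new_cpu = -1;

	if (p->handler != outgoing)
		return 0;		/* not the handler, nothing to do */

	for (cpu = 0; cpu < MODEL_CPUS; cpu++) {
		if (cpu != outgoing && p->online[cpu]) {
			new_cpu = cpu;
			break;
		}
	}

	if (new_cpu < 0) {		/* last CPU of the package */
		p->enable[outgoing] = false;
		p->handler = -1;
		return 0;
	}

	p->enable[new_cpu] = true;	/* both CPUs are opted in here */

	if (!hw_acks) {			/* assumed fallback: broadcast */
		p->enable[new_cpu] = false;
		p->enable[outgoing] = false;
		p->handler = -1;
		return -ETIMEDOUT;
	}

	p->handler = new_cpu;
	p->enable[outgoing] = false;	/* only now opt the old CPU out */
	return 0;
}
```

Opting the new CPU in before opting the old one out means the package never passes through a state with no opted-in CPU, which would otherwise revert delivery to broadcast in the middle of the handover.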

Systems without directed-interrupt support continue to broadcast the
package-level interrupt to all CPUs.

Also, add a rollback mechanism in the CPU hotplug online callback to
fall back to broadcast mode if the directed-interrupt acknowledgment fails
in any package. This is most important during boot, when all CPUs in a
package come online and would otherwise keep retrying on faulty hardware.
A complete rollback is not needed in the CPU hotplug offline callback since
at that point the hardware is known to work.
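
A compact model of the rollback idea follows. It is a sketch, not the patch: struct model_dpti and its functions are hypothetical, the per-package array loosely mirrors the handler-tracking array mentioned above, and the attempts counter exists only to show that no further retries happen after a failure.

```c
#include <stdbool.h>

#define MODEL_PKGS 4

struct model_dpti {
	bool active;			/* false once rollback has run */
	int handler[MODEL_PKGS];	/* handler CPU per package, or -1 */
	int attempts;			/* number of enable attempts made */
};

void model_init(struct model_dpti *d)
{
	int pkg;

	d->active = true;
	d->attempts = 0;
	for (pkg = 0; pkg < MODEL_PKGS; pkg++)
		d->handler[pkg] = -1;
}

/* Global rollback: every package returns to broadcast delivery. */
void model_disable_all(struct model_dpti *d)
{
	int pkg;

	for (pkg = 0; pkg < MODEL_PKGS; pkg++)
		d->handler[pkg] = -1;
	d->active = false;
}

/*
 * Called as each CPU comes online. hw_acks models the hardware
 * acknowledgment. Returns true while directed mode is still in use.
 */
bool model_cpu_online(struct model_dpti *d, int pkg, int cpu, bool hw_acks)
{
	if (!d->active)
		return false;		/* already rolled back: no retry */
	if (d->handler[pkg] >= 0)
		return true;		/* package already has a handler */

	d->attempts++;
	if (!hw_acks) {
		model_disable_all(d);	/* one failure disables all packages */
		return false;
	}
	d->handler[pkg] = cpu;
	return true;
}
```

After a single failed acknowledgment, later CPUs that come online skip the enable attempt entirely, which is the behavior the commit message describes as avoiding repeated retries on faulty hardware.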

While here, update an inline comment to point to the correct volume of the
Intel Software Developer's Manual.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lore.kernel.org/linux-pm/20260528-rneri-directed-therm-intr-v2-3-8e2f9e0c1a36@linux.intel.com/
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>

thermal: intel: Add syscore callbacks for suspend and resume

Directed package-level thermal interrupts are serviced by a single CPU per
package. These handler CPUs are selected at boot through the CPU hotplug
infrastructure. This mechanism is sufficient to restore the directed
interrupt configuration when resuming from suspend for non-boot packages.
It also keeps the handler-tracking array updated.

For the boot package, CPU0 is chosen during boot because its CPU hotplug
online callback runs first. However, this callback is not invoked on
resume. The directed package-level interrupt configuration for the boot
package is not restored. Add a syscore resume callback to re-enable
directed package-level interrupts for this package.

Disabling directed interrupts during suspend is required to keep the
handler-tracking array in a consistent state for the boot package,
allowing the correct configuration to be restored on resume.

The resume callback must busy-wait for hardware acknowledgment of the
directed interrupt setup. Otherwise, the handler-tracking array could be
left in an inconsistent state. This implies running with interrupts
disabled for up to 15ms, though in practice it takes less than 1ms.
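
A bounded polling loop of this kind might look like the sketch below. It is a userspace model under stated assumptions: the 15 ms bound and the acknowledgment bit come from the commit messages in this series, while the 10 us polling step, the callback type, and the mock hardware are hypothetical. The model counts elapsed time instead of delaying, where kernel code would typically use a short udelay() between reads.

```c
#include <errno.h>
#include <stdint.h>

#define PACKAGE_THERM_STATUS_DPTI_ACK	(UINT64_C(1) << 25)
#define ACK_TIMEOUT_US			15000	/* 15 ms upper bound */
#define ACK_POLL_STEP_US		10	/* hypothetical polling step */

/* Callback that returns the current package thermal status value. */
typedef uint64_t (*read_status_fn)(void *ctx);

/*
 * Poll until the acknowledgment bit is set or the timeout elapses.
 * On success, stores the modeled wait in *waited_us and returns 0.
 */
int wait_for_ack(read_status_fn read_status, void *ctx,
		 unsigned int *waited_us)
{
	unsigned int elapsed;

	for (elapsed = 0; elapsed <= ACK_TIMEOUT_US;
	     elapsed += ACK_POLL_STEP_US) {
		if (read_status(ctx) & PACKAGE_THERM_STATUS_DPTI_ACK) {
			*waited_us = elapsed;
			return 0;
		}
		/* A real implementation would delay here between reads. */
	}
	return -ETIMEDOUT;
}

/* Mock hardware that acknowledges after a fixed number of polls. */
struct mock_hw {
	unsigned int polls_until_ack;
};

uint64_t mock_read_status(void *ctx)
{
	struct mock_hw *hw = ctx;

	if (hw->polls_until_ack == 0)
		return PACKAGE_THERM_STATUS_DPTI_ACK;
	hw->polls_until_ack--;
	return 0;
}
```

Because the loop has a fixed upper bound, the time spent with interrupts disabled is bounded as well, and the caller can decide how to update its bookkeeping based on the return value.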

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lore.kernel.org/linux-pm/20260528-rneri-directed-therm-intr-v2-4-8e2f9e0c1a36@linux.intel.com/
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>

thermal: intel: Add a syscore shutdown callback for kexec reboot

A kexec reboot may load a kernel that does not support directed package-
level thermal interrupts. Without a shutdown callback, the directed
interrupt configuration remains enabled across kexec but will not be
handled correctly. In particular, if the CPU designated to receive the
directed interrupt goes offline, no other CPU in the package will receive
it.

Add a syscore shutdown callback to disable directed package-level thermal
interrupts on all packages before a kexec reboot. If the post-kexec kernel
does not enable directed interrupts, it falls back to broadcasting the
interrupt to all CPUs.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lore.kernel.org/linux-pm/20260528-rneri-directed-therm-intr-v2-5-8e2f9e0c1a36@linux.intel.com/
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
@Avenger-285714 Avenger-285714 requested review from Copilot and opsiff June 2, 2026 12:18
@sourcery-ai

sourcery-ai Bot commented Jun 2, 2026

Reviewer's Guide

Adds support for Intel Directed Package Thermal Interrupts to the x86 thermal throttling path, including per‑package handler CPU selection, hotplug- and syscore-aware enable/disable/rollback logic, and the necessary CPUID/MSR feature bits, while also fixing an existing hotplug error-handling bug.

Sequence diagram for enabling directed package thermal interrupt on CPU online

sequenceDiagram
    participant CPU_hotplug
    participant thermal_throttle_online
    participant enable_directed_thermal_pkg_intr
    participant config_directed_thermal_pkg_intr
    participant check_directed_thermal_pkg_intr_ack
    participant disable_all_directed_thermal_pkg_intr
    participant Hardware

    CPU_hotplug->>thermal_throttle_online: thermal_throttle_online(cpu)
    thermal_throttle_online->>thermal_throttle_online: thermal_throttle_add_dev(dev, cpu)
    thermal_throttle_online->>enable_directed_thermal_pkg_intr: enable_directed_thermal_pkg_intr(cpu)
    enable_directed_thermal_pkg_intr->>enable_directed_thermal_pkg_intr: topology_logical_package_id(cpu)
    enable_directed_thermal_pkg_intr->>enable_directed_thermal_pkg_intr: directed_intr_handler_cpus[pkg_id]
    alt first_handler_in_package
        enable_directed_thermal_pkg_intr->>Hardware: thermal_clear_package_intr_status(PACKAGE_LEVEL, PACKAGE_THERM_STATUS_DPTI_ACK)
        enable_directed_thermal_pkg_intr->>config_directed_thermal_pkg_intr: config_directed_thermal_pkg_intr(&enable=true)
        config_directed_thermal_pkg_intr->>Hardware: wrmsrl(MSR_IA32_THERM_INTERRUPT, THERM_INT_DPTI_ENABLE)
        enable_directed_thermal_pkg_intr->>check_directed_thermal_pkg_intr_ack: check_directed_thermal_pkg_intr_ack()
        check_directed_thermal_pkg_intr_ack->>Hardware: rdmsrl(MSR_IA32_PACKAGE_THERM_STATUS)
        alt ack_received
            check_directed_thermal_pkg_intr_ack-->>enable_directed_thermal_pkg_intr: return 0
            enable_directed_thermal_pkg_intr->>enable_directed_thermal_pkg_intr: directed_intr_handler_cpus[pkg_id] = cpu
        else ack_timeout
            check_directed_thermal_pkg_intr_ack-->>enable_directed_thermal_pkg_intr: return -ETIMEDOUT
            enable_directed_thermal_pkg_intr->>config_directed_thermal_pkg_intr: config_directed_thermal_pkg_intr(&enable=false)
            config_directed_thermal_pkg_intr->>Hardware: wrmsrl(MSR_IA32_THERM_INTERRUPT, ~THERM_INT_DPTI_ENABLE)
            enable_directed_thermal_pkg_intr->>disable_all_directed_thermal_pkg_intr: disable_all_directed_thermal_pkg_intr()
            disable_all_directed_thermal_pkg_intr->>config_directed_thermal_pkg_intr: smp_call_function_single(handler_cpu, config_directed_thermal_pkg_intr, &enable=false, wait=true)
            disable_all_directed_thermal_pkg_intr->>disable_all_directed_thermal_pkg_intr: kfree(directed_intr_handler_cpus)
        end
    else handler_already_set
        enable_directed_thermal_pkg_intr-->>thermal_throttle_online: return
    end
    thermal_throttle_online-->>CPU_hotplug: return

Sequence diagram for syscore suspend/resume/shutdown handling of directed package thermal interrupts

sequenceDiagram
    participant Syscore
    participant directed_pkg_intr_syscore_suspend
    participant directed_pkg_intr_syscore_resume
    participant directed_pkg_intr_syscore_shutdown
    participant enable_directed_thermal_pkg_intr
    participant disable_directed_thermal_pkg_intr
    participant disable_all_directed_thermal_pkg_intr

    Syscore->>directed_pkg_intr_syscore_suspend: directed_pkg_intr_syscore_suspend(data)
    directed_pkg_intr_syscore_suspend->>disable_directed_thermal_pkg_intr: disable_directed_thermal_pkg_intr(0)
    disable_directed_thermal_pkg_intr-->>directed_pkg_intr_syscore_suspend: return
    directed_pkg_intr_syscore_suspend-->>Syscore: return 0

    Syscore->>directed_pkg_intr_syscore_resume: directed_pkg_intr_syscore_resume(data)
    directed_pkg_intr_syscore_resume->>enable_directed_thermal_pkg_intr: enable_directed_thermal_pkg_intr(0)
    enable_directed_thermal_pkg_intr-->>directed_pkg_intr_syscore_resume: return
    directed_pkg_intr_syscore_resume-->>Syscore: return

    Syscore->>directed_pkg_intr_syscore_shutdown: directed_pkg_intr_syscore_shutdown(data)
    directed_pkg_intr_syscore_shutdown->>disable_all_directed_thermal_pkg_intr: disable_all_directed_thermal_pkg_intr()
    disable_all_directed_thermal_pkg_intr-->>directed_pkg_intr_syscore_shutdown: return
    directed_pkg_intr_syscore_shutdown-->>Syscore: return

File-Level Changes

Change: Introduce infrastructure to enable/disable directed package-level thermal interrupts and assign a single handler CPU per package, with global rollback on failure.
Details:
  • Add DPTI-related feature detection, including a per-package directed_intr_handler_cpus array and helper to check support.
  • Implement functions to configure IA32_THERM_INTERRUPT on a target CPU via smp_call_function_single and to wait for hardware acknowledgment via PACKAGE_THERM_STATUS_DPTI_ACK with timeout and cleanup.
  • Provide enable/disable helpers for directed interrupts that choose a handler CPU per package on CPU online, reassign on CPU offline using cpumask_any_but, and fall back to broadcast delivery with global teardown if setup or reassignment fails.
Files: drivers/thermal/intel/therm_throt.c

Change: Integrate directed package interrupt handling with CPU hotplug and syscore suspend/resume/shutdown flows, ensuring consistent behavior across boot, suspend, resume, and kexec.
Details:
  • Extend thermal_throttle_online/offline callbacks to enable/disable directed package interrupts around the existing thermal and HFI setup/teardown, while correcting error handling when thermal_throttle_add_dev() fails.
  • Add syscore ops and registration to restore CPU0 as the handler on resume, disable its handler role on suspend, and globally tear down directed interrupts on shutdown (e.g., kexec) using the common rollback path.
  • Initialize DPTI support during thermal_throttle_init_device(), including allocation/initialization of directed_intr_handler_cpus and rollback on init failure, and ensure CPU hotplug locking/interrupt constraints are respected.
Files: drivers/thermal/intel/therm_throt.c

Change: Define architectural feature and MSR bits required for Intel Directed Package Thermal Interrupt support.
Details:
  • Add X86_FEATURE_DPTI in cpufeatures for CPUID.06H:EAX[24].
  • Add THERM_INT_DPTI_ENABLE and PACKAGE_THERM_STATUS_DPTI_ACK MSR bit definitions for IA32_THERM_INTERRUPT and IA32_PACKAGE_THERM_STATUS, respectively.
  • Update thermal_intr_init_pkg_clear_mask() to conditionally clear the DPTI-related package thermal status bit when supported.
Files: arch/x86/include/asm/cpufeatures.h, arch/x86/include/asm/msr-index.h, drivers/thermal/intel/therm_throt.c


@deepin-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from avenger-285714. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@sourcery-ai sourcery-ai Bot left a comment

Hey - I've left some high level feedback:

  • disable_all_directed_thermal_pkg_intr() unconditionally uses smp_call_function_single() but is also invoked from the syscore shutdown callback where interrupts are disabled and CPU hotplug is frozen, which contradicts the function’s own calling requirements and risks deadlock; consider either avoiding SMP calls in syscore context or splitting out a variant that is safe for shutdown.
  • The comment above disable_all_directed_thermal_pkg_intr() mentions syscore resume and asserts no SMP calls will be issued in that context, but the current users are the syscore shutdown callback and cpuhp teardown path, so it would be good to update the comment to accurately describe the actual callers and constraints.


Copilot AI left a comment

Pull request overview

This PR adds support for Intel Directed Package Thermal Interrupts (DPTI) in the x86 thermal throttling path to avoid broadcasting package thermal interrupts to all CPUs, reducing contention and unnecessary wakeups. It also integrates CPU hotplug and syscore (suspend/resume/shutdown) handling to keep the directed-interrupt “handler CPU” per package consistent across lifecycle events.

Changes:

  • Add a new x86 CPU feature bit and MSR bit definitions for DPTI enable/acknowledge.
  • Extend drivers/thermal/intel/therm_throt.c to opt CPUs into directed package thermal interrupts, select a per-package handler CPU via CPU hotplug, and restore/teardown state via syscore callbacks.
  • Adjust thermal hotplug online error handling to avoid leaving partially initialized sysfs resources behind.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| drivers/thermal/intel/therm_throt.c | Implements directed package interrupt enable/disable, per-package handler selection, and syscore suspend/resume/shutdown integration. |
| arch/x86/include/asm/msr-index.h | Adds MSR bit definitions for enabling DPTI and checking its hardware acknowledgment. |
| arch/x86/include/asm/cpufeatures.h | Introduces X86_FEATURE_DPTI to gate the feature at runtime. |

Comment on lines +675 to +679
/*
 * The package-level interrupt must remain directed after this CPU goes
 * offline.
 */
new_cpu = cpumask_any_but(topology_core_cpumask(cpu), cpu);

#define X86_FEATURE_HWP_HIGHEST_PERF_CHANGE (14*32+15) /* HWP Highest perf change */
#define X86_FEATURE_HFI (14*32+19) /* "hfi" Hardware Feedback Interface */

#define X86_FEATURE_DPTI (14*32+24) /* Intel Directed Package Thermal Interrupt */

4 participants