
[WIP] [Deepin-Kernel-SIG] [linux 6.6.y] [FROMLIST] [Intel] thermal: intel: Add support for Directed Package Thermal Interrupt #1796

Draft
Avenger-285714 wants to merge 5 commits into deepin-community:linux-6.6.y from Avenger-285714:DPTI-6.6

@Avenger-285714 Avenger-285714 commented Jun 2, 2026

Hi,

This is v2 of this series. The main changes are a new patch fixing a pre-
existing bug and a redesigned error rollback strategy that disables
directed thermal interrupts across all packages when enabling fails in any
one of them. Please see the changelog for details.

Package-level thermal interrupts are currently broadcast to all CPUs in a
package. Only one CPU is needed to service package-wide events.
Broadcasting creates unnecessary resource contention. Thermal interrupts
generated for Hardware Feedback Interface[1] updates are an example: all
CPUs in the package receive the interrupt and race for a lock to update a
shared data structure. Idle CPUs are needlessly woken up.

Newer Intel processors allow directing package-level thermal interrupts
only to CPUs that explicitly request them. A CPU opts in by setting a
designated bit in IA32_THERM_INTERRUPT. Hardware acknowledges the request
by setting a designated bit in IA32_PACKAGE_THERM_STATUS.

This series enables directed package-level thermal interrupts and
designates one handler CPU per package using the CPU hotplug
infrastructure. A new CPU is selected if the handler CPU goes offline.

Because CPU0's hotplug callbacks are not invoked during suspend and resume,
syscore callbacks are added to restore the handler for the boot package.
The series also disables directed delivery during kexec reboot, avoiding
stale interrupt routing when rebooting into a kernel that does not support
the feature.

This patchset introduces a change in behavior in the
/sys/devices/system/cpu/cpuN/thermal_throttle/package* sysfs files. These
files reflect per-CPU variables updated when a CPU handles a package-level
thermal interrupt. In broadcast mode, all CPUs update their variables. When
directed package-level thermal interrupts are enabled, only the handler
CPU's variables are updated.
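
The sysfs behavior change can be pictured with a small userspace model. This is only a sketch: the array, the package size, and the function names are hypothetical stand-ins for the per-CPU variables behind the package* files, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_IN_PKG 4	/* hypothetical package size */

/* Stand-in for the per-CPU package counters exposed through sysfs. */
static unsigned int pkg_event_count[NR_CPUS_IN_PKG];

/*
 * Deliver one package-level thermal event.
 * directed == false: every CPU in the package handles the interrupt.
 * directed == true:  only handler_cpu handles it.
 */
static void deliver_pkg_event(bool directed, int handler_cpu)
{
	assert(handler_cpu >= 0 && handler_cpu < NR_CPUS_IN_PKG);

	for (int cpu = 0; cpu < NR_CPUS_IN_PKG; cpu++) {
		if (!directed || cpu == handler_cpu)
			pkg_event_count[cpu]++;
	}
}
```

After one broadcast event followed by one directed event, only the handler's counter advances twice; tools that sum or compare the per-CPU files would see that difference.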

Lastly, nothing changes for processors that do not support this feature:
they fall back to broadcast delivery.

[1] Intel Software Developer's Manual Vol. 3, Section 17.6, March 2026 https://www.intel.com/SDM

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Changes in v2:

  • Fixed an existing bug in the error-handling of thermal_throttle_online()
    CPU hotplug callback.
  • Redesigned the rollback mechanism to handle all packages, not only
    the boot package.
  • Fixed the handling of the return value of cpumask_any_but(), which on
    failure returns small_cpumask_bits, not nr_cpu_ids.
  • Renamed the Directed Package Thermal Interrupt CPUID and MSR bits for
    consistency and brevity. (Boris)
  • Added measurements of the latency of setup acknowledgment from hardware.
  • Removed an unused argument from directed_thermal_pkg_intr_supported().
  • Reused the global rollback mechanism in the syscore shutdown callback.
  • Link to v1: https://lore.kernel.org/r/20260309-rneri-directed-therm-intr-v1-0-2956e3000950@linux.intel.com

Ricardo Neri (5):
thermal: intel: Fix dangling resources on thermal_throttle_online() failure
x86/thermal: Add bit definitions for Intel Directed Package Thermal Interrupt
thermal: intel: Enable the Directed Package-level Thermal Interrupt
thermal: intel: Add syscore callbacks for suspend and resume
thermal: intel: Add a syscore shutdown callback for kexec reboot

arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/msr-index.h | 2 +
drivers/thermal/intel/therm_throt.c | 272 +++++++++++++++++++++++++++++++++++-
3 files changed, 272 insertions(+), 4 deletions(-)

base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d
change-id: 20260306-rneri-directed-therm-intr-9f3f8888bb3f

Best regards,

Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Link: https://lore.kernel.org/linux-pm/20260528-rneri-directed-therm-intr-v2-0-8e2f9e0c1a36@linux.intel.com/

Summary by Sourcery

Add support for Intel Directed Package Thermal Interrupts in the x86 thermal throttling driver and ensure correct setup, teardown, and rollback across CPU hotplug, suspend/resume, and kexec paths.

New Features:

  • Introduce a per-package handler mechanism to direct Intel package-level thermal interrupts to a single CPU when hardware supports Directed Package Thermal Interrupts.
  • Expose a new x86 CPU feature flag and MSR bit definitions for Intel Directed Package Thermal Interrupt support.

Bug Fixes:

  • Fix resource leak and error handling in the thermal_throttle_online CPU hotplug callback to avoid dangling sysfs devices on failure.

Enhancements:

  • Add centralized enable/disable and rollback logic for directed package thermal interrupts across all packages, including fallback to broadcast delivery on hardware errors.
  • Integrate syscore suspend, resume, and shutdown callbacks so directed package thermal interrupt routing is correctly restored after suspend/resume and torn down before kexec reboots.
  • Extend thermal interrupt package clear mask handling to account for Directed Package Thermal Interrupt acknowledgment bits in package thermal status MSRs.

ricardon added 5 commits June 2, 2026 19:53
…ailure

The function thermal_throttle_add_dev() may fail and abort a CPU hotplug
online operation. Since the failure occurs within the online callback,
thermal_throttle_online(), the CPU hotplug framework does not invoke the
corresponding offline callback. As a result, the hardware and software
resources set up during the failed operation are not torn down.

Since only thermal_throttle_add_dev() can fail, call it before setting up
the rest of the resources.

Fixes: f665620 ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lore.kernel.org/linux-pm/20260528-rneri-directed-therm-intr-v2-1-8e2f9e0c1a36@linux.intel.com/
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
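
The reordering described in this commit message can be sketched in plain C. The structure and helper names below are hypothetical; the sketch only illustrates why running the single fallible step first leaves nothing to tear down when it fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state standing in for what an online callback sets up. */
struct cpu_state {
	bool dev_added;		/* sysfs device registered */
	bool hw_configured;	/* interrupt and work setup done */
};

/* The only step that can fail; fail_add simulates an error. */
static int add_dev(struct cpu_state *s, bool fail_add)
{
	if (fail_add)
		return -ENOMEM;
	s->dev_added = true;
	return 0;
}

/* Setup that cannot fail. */
static void setup_hw(struct cpu_state *s)
{
	s->hw_configured = true;
}

static int cpu_online(struct cpu_state *s, bool fail_add)
{
	int ret;

	assert(s);

	/* Fallible step first: on error, nothing else has been set up. */
	ret = add_dev(s, fail_add);
	if (ret)
		return ret;

	setup_hw(s);
	return 0;
}
```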
…nterrupt

Add CPUID and MSR bit definitions required to support Intel Directed
Package Thermal Interrupt.

A CPU requests directed package-level thermal interrupts by setting bit 25
in IA32_THERM_INTERRUPT. Hardware acknowledges by setting bit 25 in
IA32_PACKAGE_THERM_STATUS, indicating that only CPUs that opted in will
receive the interrupt. If no CPU in the package requests it, delivery
falls back to broadcast.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
[WangYuli: Fix conflicts]
Link: https://lore.kernel.org/linux-pm/20260528-rneri-directed-therm-intr-v2-2-8e2f9e0c1a36@linux.intel.com/
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
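
A minimal userspace model of the opt-in handshake follows. The macro names match those listed in the reviewer's guide and the bit positions come from the commit message above; the simulated package, its size, and the immediate acknowledgment are assumptions made for illustration, since real hardware acknowledges asynchronously.

```c
#include <assert.h>
#include <stdint.h>

#define THERM_INT_DPTI_ENABLE		(UINT64_C(1) << 25)
#define PACKAGE_THERM_STATUS_DPTI_ACK	(UINT64_C(1) << 25)
#define CPUS_PER_PKG			4	/* hypothetical */

/* Simulated registers, not real MSR accesses. */
struct sim_pkg {
	uint64_t therm_interrupt[CPUS_PER_PKG];	/* per-CPU IA32_THERM_INTERRUPT */
	uint64_t pkg_therm_status;		/* IA32_PACKAGE_THERM_STATUS */
};

/* A CPU opts in; the simulated hardware acknowledges at once. */
static void sim_opt_in(struct sim_pkg *p, int cpu)
{
	assert(cpu >= 0 && cpu < CPUS_PER_PKG);
	p->therm_interrupt[cpu] |= THERM_INT_DPTI_ENABLE;
	p->pkg_therm_status |= PACKAGE_THERM_STATUS_DPTI_ACK;
}

/* Bitmask of CPUs that would receive a package-level interrupt. */
static unsigned int sim_targets(const struct sim_pkg *p)
{
	unsigned int opted = 0;

	for (int cpu = 0; cpu < CPUS_PER_PKG; cpu++) {
		if (p->therm_interrupt[cpu] & THERM_INT_DPTI_ENABLE)
			opted |= 1u << cpu;
	}

	/* Nobody opted in: delivery falls back to broadcast. */
	return opted ? opted : (1u << CPUS_PER_PKG) - 1;
}
```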
Package-level thermal interrupts are broadcast to all online CPUs within a
package, even though only one CPU needs to service them. This results in
unnecessary wakeups, lock contention, and corresponding performance and
power-efficiency penalties.

When supported by hardware, a CPU requests to receive directed package-
level thermal interrupts by setting a designated bit in
IA32_THERM_INTERRUPT. The operating system must then verify that hardware
has acknowledged this request by checking a designated bit in
IA32_PACKAGE_THERM_STATUS.

Enable directed package-level thermal interrupts on one CPU per package
using the CPU hotplug infrastructure. Keep track of the CPUs handling
package-level interrupts with an array.

If the handling CPU goes offline, select a new CPU. Temporarily enable
directed interrupts on both the current and new CPU until hardware
acknowledges the new selection, then disable them on the outgoing CPU.
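
The handoff can be sketched as a small state model. Everything here is hypothetical (the struct, the sentinel, the hw_acks parameter that stands in for the hardware acknowledgment); it illustrates the ordering described above and is not the driver's implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define CPUS_PER_PKG	4
#define NO_CPU		CPUS_PER_PKG	/* sentinel, in the spirit of nr_cpu_ids */

struct pkg {
	bool online[CPUS_PER_PKG];
	bool dpti_enabled[CPUS_PER_PKG];
	int handler;			/* NO_CPU means broadcast delivery */
};

/* Any online CPU in the package except cpu, or NO_CPU if there is none. */
static int any_online_but(const struct pkg *p, int cpu)
{
	for (int i = 0; i < CPUS_PER_PKG; i++) {
		if (i != cpu && p->online[i])
			return i;
	}
	return NO_CPU;
}

/* hw_acks simulates whether hardware acknowledges the new request. */
static void handler_offline(struct pkg *p, int cpu, bool hw_acks)
{
	int new_cpu;

	assert(cpu >= 0 && cpu < CPUS_PER_PKG);
	p->online[cpu] = false;
	if (p->handler != cpu)
		return;

	new_cpu = any_online_but(p, cpu);
	if (new_cpu < NO_CPU) {
		/* Both CPUs are opted in until the acknowledgment arrives. */
		p->dpti_enabled[new_cpu] = true;
		if (!hw_acks) {
			p->dpti_enabled[new_cpu] = false;
			new_cpu = NO_CPU;
		}
	}

	/* Only now is the outgoing CPU opted out. */
	p->dpti_enabled[cpu] = false;
	p->handler = new_cpu;
}
```

Enabling the new CPU before disabling the old one means there is no window in which the package has a pending opt-out but no acknowledged handler.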

Systems without directed-interrupt support continue to broadcast the
package-level interrupt to all CPUs.

Also, add a rollback mechanism in the CPU hotplug online callback to
fall back to broadcast mode if the directed-interrupt acknowledgment fails
in any package. This is most important during boot, when all CPUs in a
package come online and would otherwise keep retrying on faulty hardware.
A complete rollback is not needed in the CPU hotplug offline callback since
at that point the hardware is known to work.

While here, update an inline comment to point to the correct volume of the
Intel Software Developer's Manual.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lore.kernel.org/linux-pm/20260528-rneri-directed-therm-intr-v2-3-8e2f9e0c1a36@linux.intel.com/
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
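
The global rollback on the online path might look roughly like the following model. The names, the fixed-size array, and the dpti_usable flag are assumptions; per the reviewer's guide, the actual series tracks handlers in a dynamically allocated array that is freed on teardown.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PKGS	2	/* hypothetical */
#define NO_CPU		(-1)

static int handler_cpus[MAX_PKGS] = { NO_CPU, NO_CPU };
static bool dpti_usable = true;	/* cleared once hardware misbehaves */

/* Fall back to broadcast everywhere and stop further attempts. */
static void disable_all(void)
{
	for (int pkg = 0; pkg < MAX_PKGS; pkg++)
		handler_cpus[pkg] = NO_CPU;
	dpti_usable = false;
}

/* hw_acks simulates whether hardware acknowledges the request. */
static void cpu_online_enable(int cpu, int pkg, bool hw_acks)
{
	assert(pkg >= 0 && pkg < MAX_PKGS);

	/* Nothing to do after a rollback or if the package has a handler. */
	if (!dpti_usable || handler_cpus[pkg] != NO_CPU)
		return;

	if (!hw_acks) {
		disable_all();
		return;
	}

	handler_cpus[pkg] = cpu;
}
```

Once one package fails to acknowledge, later CPUs coming online skip the attempt entirely, which is the retry-avoidance the commit message calls out for boot.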
Directed package-level thermal interrupts are serviced by a single CPU per
package. These handler CPUs are selected at boot through the CPU hotplug
infrastructure. This mechanism is sufficient to restore the directed
interrupt configuration when resuming from suspend for non-boot packages.
It also keeps the handler-tracking array updated.

For the boot package, CPU0 is chosen during boot because its CPU hotplug
online callback runs first. However, this callback is not invoked on
resume. The directed package-level interrupt configuration for the boot
package is not restored. Add a syscore resume callback to re-enable
directed package-level interrupts for this package.

Disabling directed interrupts during suspend is required to keep the
handler-tracking array in a consistent state for the boot package,
allowing the correct configuration to be restored on resume.

The resume callback must busy-wait for hardware acknowledgment of the
directed interrupt setup. Otherwise, the handler-tracking array could be
left in an inconsistent state. This implies running with interrupts
disabled for up to 15ms, though in practice it takes less than 1ms.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lore.kernel.org/linux-pm/20260528-rneri-directed-therm-intr-v2-4-8e2f9e0c1a36@linux.intel.com/
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
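
The bounded busy-wait can be modeled in userspace as below. The 15 ms bound comes from the commit message and the 1 µs step from the review feedback on this PR; the simulated hardware struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define ACK_TIMEOUT_US	15000	/* 15 ms upper bound */

/* Hypothetical stand-in for hardware that acknowledges after a delay. */
struct sim_hw {
	unsigned int ack_after_us;
	unsigned int elapsed_us;
};

/* Stand-in for reading the acknowledgment bit. */
static bool sim_ack_set(const struct sim_hw *hw)
{
	return hw->elapsed_us >= hw->ack_after_us;
}

/* Stand-in for a 1 microsecond delay. */
static void sim_delay_1us(struct sim_hw *hw)
{
	hw->elapsed_us++;
}

/* Poll for the acknowledgment, giving up after ACK_TIMEOUT_US. */
static bool wait_for_ack(struct sim_hw *hw)
{
	assert(hw);

	for (unsigned int waited = 0; waited < ACK_TIMEOUT_US; waited++) {
		if (sim_ack_set(hw))
			return true;
		sim_delay_1us(hw);
	}

	/* One last look after the final delay. */
	return sim_ack_set(hw);
}
```

A hard upper bound matters here because the resume callback runs with interrupts disabled, so an unbounded wait on faulty hardware would hang resume.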
A kexec reboot may load a kernel that does not support directed package-
level thermal interrupts. Without a shutdown callback, the directed
interrupt configuration remains enabled across kexec but will not be
handled correctly. In particular, if the CPU designated to receive the
directed interrupt goes offline, no other CPU in the package will receive
it.

Add a syscore shutdown callback to disable directed package-level thermal
interrupts on all packages before a kexec reboot. If the post-kexec kernel
does not enable directed interrupts, it falls back to broadcasting the
interrupt to all CPUs.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lore.kernel.org/linux-pm/20260528-rneri-directed-therm-intr-v2-5-8e2f9e0c1a36@linux.intel.com/
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
@Avenger-285714 Avenger-285714 requested review from Copilot and opsiff June 2, 2026 12:01
@sourcery-ai

sourcery-ai Bot commented Jun 2, 2026

Reviewer's Guide

Implements support for Intel Directed Package Thermal Interrupts in the x86 thermal throttling driver, including CPUID/MSR definitions, per-package handler CPU selection with hotplug-aware enable/disable logic, syscore callbacks for suspend/resume/shutdown, and a bugfix to the CPU online path error handling and resource teardown.

Sequence diagram for enabling Directed Package Thermal Interrupt on CPU hotplug online

sequenceDiagram
    actor CPU
    participant thermal_throttle as thermal_throttle_online
    participant dpti as enable_directed_thermal_pkg_intr
    participant msr as config_directed_thermal_pkg_intr
    participant hw as MSR_IA32_PACKAGE_THERM_STATUS
    participant rollback as disable_all_directed_thermal_pkg_intr

    CPU->>thermal_throttle: thermal_throttle_online(cpu)
    thermal_throttle->>thermal_throttle: thermal_throttle_add_dev(dev, cpu)
    thermal_throttle->>dpti: enable_directed_thermal_pkg_intr(cpu)

    alt pkg_has_no_handler
        dpti->>msr: config_directed_thermal_pkg_intr(&enable=true)
        dpti->>hw: thermal_clear_package_intr_status(PACKAGE_LEVEL, PACKAGE_THERM_STATUS_DPTI_ACK)
        dpti->>dpti: check_directed_thermal_pkg_intr_ack()
        alt ack_ok
            dpti->>dpti: directed_intr_handler_cpus[pkg_id] = cpu
        else ack_timeout
            dpti->>msr: config_directed_thermal_pkg_intr(&enable=false)
            dpti->>rollback: disable_all_directed_thermal_pkg_intr()
        end
    else pkg_already_has_handler
        dpti-->>dpti: return (no change)
    end

    thermal_throttle->>thermal_throttle: apic_write(APIC_LVTTHMR, unmask)
    thermal_throttle-->>CPU: return

File-Level Changes

Change: Add x86 feature and MSR bit definitions for Directed Package Thermal Interrupt (DPTI).
Details:
  • Introduce X86_FEATURE_DPTI CPUID feature flag in cpufeatures header.
  • Define THERM_INT_DPTI_ENABLE bit in IA32_THERM_INTERRUPT MSR.
  • Define PACKAGE_THERM_STATUS_DPTI_ACK bit in IA32_PACKAGE_THERM_STATUS MSR.
  • Extend thermal package clear mask initialization to include DPTI ACK based on the new CPUID bit.
Files:
  • arch/x86/include/asm/cpufeatures.h
  • arch/x86/include/asm/msr-index.h
  • drivers/thermal/intel/therm_throt.c

Change: Implement directed package-level thermal interrupt management and integration with CPU hotplug.
Details:
  • Add global per-package directed_intr_handler_cpus array to track the CPU handling the directed package interrupt per package.
  • Implement config_directed_thermal_pkg_intr() to toggle the DPTI enable bit in IA32_THERM_INTERRUPT on a target CPU, callable via smp_call_function_single().
  • Add check_directed_thermal_pkg_intr_ack() helper to wait (up to ~15ms) for hardware acknowledgment via PACKAGE_THERM_STATUS_DPTI_ACK and clear the status bit on success.
  • Implement enable_directed_thermal_pkg_intr() to select a handler CPU per package, program directed interrupts, verify hardware ACK, and rollback globally (disabling DPTI on all packages) on failure.
  • Implement disable_directed_thermal_pkg_intr() to migrate the handler to another CPU in the same package on CPU offline, falling back to broadcast if migration or ACK fails and updating directed_intr_handler_cpus accordingly.
  • Provide disable_all_directed_thermal_pkg_intr() to tear down directed interrupts on all packages and free the handler array.
  • Add directed_thermal_pkg_intr_supported() helper to gate DPTI logic on CPUID support and successful handler array allocation.
  • Wire enable_directed_thermal_pkg_intr()/disable_directed_thermal_pkg_intr() into thermal_throttle_online()/thermal_throttle_offline() hotplug callbacks.
Files:
  • drivers/thermal/intel/therm_throt.c

Change: Add syscore-based power management and shutdown handling for DPTI to cover CPU0 and kexec paths.
Details:
  • Introduce syscore callbacks directed_pkg_intr_syscore_suspend(), _resume(), and _shutdown() to manage directed interrupts across suspend/resume and system shutdown (including kexec), explicitly handling CPU0 which lacks hotplug callbacks in these paths.
  • Register a syscore object directed_pkg_intr_pm with the new syscore_ops so DPTI is disabled before suspend/shutdown and re-enabled for the boot package on resume.
  • Ensure global DPTI teardown via disable_all_directed_thermal_pkg_intr() is reused from syscore shutdown for kexec to avoid stale routing when booting into a kernel without DPTI support.
Files:
  • drivers/thermal/intel/therm_throt.c

Change: Fix and harden thermal_throttle_online() and module init error handling.
Details:
  • Change thermal_throttle_online() to call thermal_throttle_add_dev() first, return an error directly on failure, and reuse that result instead of calling the function at the end, avoiding dangling sysfs resources when subsequent initialization fails.
  • Call enable_directed_thermal_pkg_intr() during CPU online after HFI setup and before unmasking the thermal LVT.
  • In thermal_throttle_offline(), call disable_directed_thermal_pkg_intr() before tearing down HFI and workqueues.
  • During module/device init (thermal_throttle_init_device()), call init_directed_pkg_intr() once therm_throt_en is set, and on cpuhp_setup_state() failure perform global DPTI teardown via disable_all_directed_thermal_pkg_intr() before returning the error.
Files:
  • drivers/thermal/intel/therm_throt.c

@Avenger-285714 Avenger-285714 changed the title [Deepin-Kernel-SIG] [linux 6.18.y] [FROMLIST] [Intel] thermal: intel: Add support for Directed Package Thermal Interrupt [Deepin-Kernel-SIG] [linux 6.6.y] [FROMLIST] [Intel] thermal: intel: Add support for Directed Package Thermal Interrupt Jun 2, 2026
@deepin-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from avenger-285714. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've left some high level feedback:

  • check_directed_thermal_pkg_intr_ack() busy-waits for up to 15ms using udelay(1) in a tight loop; if this runs in a sleepable context (e.g. CPU hotplug online path), consider switching to a sleep-based wait (usleep_range or similar) or at least adding cpu_relax() to reduce contention.
  • disable_all_directed_thermal_pkg_intr() assumes the caller holds cpu_hotplug_lock, but it is also invoked from the syscore shutdown path where that may not be true; it would be good to either explicitly take the lock inside this helper (or use a non-blocking mechanism) or clearly document and ensure all call sites satisfy the locking requirement.

@Avenger-285714 Avenger-285714 changed the title [Deepin-Kernel-SIG] [linux 6.6.y] [FROMLIST] [Intel] thermal: intel: Add support for Directed Package Thermal Interrupt [WIP] [Deepin-Kernel-SIG] [linux 6.6.y] [FROMLIST] [Intel] thermal: intel: Add support for Directed Package Thermal Interrupt Jun 2, 2026
@Avenger-285714 Avenger-285714 marked this pull request as draft June 2, 2026 12:06
Copilot AI left a comment

Pull request overview

This patchset adds support for Intel Directed Package Thermal Interrupts (DPTI) in the Intel thermal throttling driver so that package-level thermal interrupts can be routed to a single designated CPU per package (when supported), with integration into CPU hotplug and power-management/reboot paths.

Changes:

  • Add a new x86 CPU feature flag (X86_FEATURE_DPTI) and MSR bit definitions for enabling DPTI and detecting hardware acknowledgment.
  • Extend the Intel thermal throttling driver to enable/disable directed delivery, select/transition a per-package handler CPU via CPU hotplug, and add syscore callbacks for suspend/resume and kexec shutdown.
  • Update package thermal status clear-mask logic to include the new DPTI acknowledgment bit.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File | Description
drivers/thermal/intel/therm_throt.c | Implements directed package interrupt enable/disable, handler CPU tracking, hotplug hooks, and syscore integration; extends clear-mask handling.
arch/x86/include/asm/msr-index.h | Adds MSR bit definitions for DPTI enable and DPTI ACK status.
arch/x86/include/asm/cpufeatures.h | Introduces X86_FEATURE_DPTI feature bit for CPUID.06H:EAX[24].


Comment on lines +786 to +807
static void directed_pkg_intr_syscore_resume(void *data)
{
	enable_directed_thermal_pkg_intr(0);
}

static int directed_pkg_intr_syscore_suspend(void *data)
{
	disable_directed_thermal_pkg_intr(0);

	return 0;
}

static void directed_pkg_intr_syscore_shutdown(void *data)
{
	disable_all_directed_thermal_pkg_intr();
}

static const struct syscore_ops directed_pkg_intr_pm_ops = {
	.resume = directed_pkg_intr_syscore_resume,
	.suspend = directed_pkg_intr_syscore_suspend,
	.shutdown = directed_pkg_intr_syscore_shutdown,
};
Comment on lines +809 to +830
static struct syscore directed_pkg_intr_pm = {
	.ops = &directed_pkg_intr_pm_ops,
};

static __init void init_directed_pkg_intr(void)
{
	int i;

	if (!boot_cpu_has(X86_FEATURE_DPTI))
		return;

	directed_intr_handler_cpus = kmalloc_array(topology_max_packages(),
						   sizeof(*directed_intr_handler_cpus),
						   GFP_KERNEL);
	if (!directed_intr_handler_cpus)
		return;

	for (i = 0; i < topology_max_packages(); i++)
		directed_intr_handler_cpus[i] = nr_cpu_ids;

	register_syscore(&directed_pkg_intr_pm);
}
Comment on lines +679 to +680
new_cpu = cpumask_any_but(topology_core_cpumask(cpu), cpu);
if (new_cpu < nr_cpu_ids) {