build(deps): bump urllib3 from 2.0.7 to 2.2.2 in /drivers/gpu/drm/ci/xfails#3
Open
dependabot[bot] wants to merge 30 commits into
Open
build(deps): bump urllib3 from 2.0.7 to 2.2.2 in /drivers/gpu/drm/ci/xfails#3dependabot[bot] wants to merge 30 commits into
dependabot[bot] wants to merge 30 commits into
Conversation
This patch introduces Ponte Vecchio support in Xe driver. Please note that besides this patch, likely the force_probe still needs to be used in order to actually enable the support for PVC. This patch was separated from the rest so we can ensure compliance with DRM uAPI rules on compute platforms. Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Oded Gabbay <ogabbay@kernel.org> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1154 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Add PVC GuC FW definition. This is patch is for CI-only and it is meant for the topic/core-for-CI branch. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1155 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Add LNL GuC FW definition. This FW is not ready to be pushed to linux-firmware yet, so this patch is for CI-only and it is meant for the topic/core-for-CI branch. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1156 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
As a lockmap takes a reference for every ww_mutex used together, this can be an arbitrarily large number and under control of userspace -- easily overflowing the arbitrary limit of 4096. However, the pin_count (used for detecting unexpected lock dropping) is a full 32b despite nesting being extremely rare (see lockdep_pin_lock). References: https://gitlab.freedesktop.org/drm/intel/-/issues/8028 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190425092004.9995-33-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Converting to pin_count:11 as per addition of sync:1] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@intel.com>
We have recently turned on ftrace-dump-on-oops for i915's CI and an issue we have encountered is that the trace buffer size greatly exceeds the pstore capabilities; we get the tail of the oops but not the introduction. Currently the global buffer size is controllable on the cmdline, but at the request of our CI sysadmin, we would like to add a control to the Kconfig as well. The rationale being the cmdline carries the temporary hacks that we want to eradicate, and we want to track the permanent configuration in .config. I have kept the Kconfig option hidden from the user as the default should suffice for the majority of users; reserving the configuration for those that eschew the cmdline option. v2: Add an expert prompt to stop the default value overriding .config changes. References: https://gitlab.freedesktop.org/drm/intel/-/issues/8029 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Most systems keep the last messages from the panic, and we value the stacktrace most, so dump it last in order to preserve it for post-mortems. References: https://gitlab.freedesktop.org/drm/intel/-/issues/8030 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Martin Peres <martin.peres@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180903131745.30593-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Under CI testing, it is common for the cpus to overheat with the continuous workloads and end up being throttled. As the cpus still function, it is less of a critical error meriting urgent action, but an expected yet significant condition (pr_note). References: https://gitlab.freedesktop.org/drm/intel/-/issues/8031 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Petri Latvala <petri.latvala@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8032 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Petri Latvala <petri.latvala@intel.com> [danvet: Rebase] Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Work around the following boot time crash: [ 10.456056] CPU: 1 PID: 220 Comm: systemd-udevd Tainted: G W 4.17.0-rc7-CI-CI_DRM_4040+ #182 [ 10.465828] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS +ICLSFWR1.R00.2204.A00.1805172221 05/17/2018 [ 10.479168] RIP: 0010:acpi_ps_complete_this_op+0xa7/0x22a [ 10.484627] RSP: 0018:ffffc900003a7578 EFLAGS: 00010202 [ 10.489881] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8804abeda9c8 RCX: 0000000000000020 [ 10.497045] RDX: 0000000000000000 RSI: ffff88049e604a68 RDI: 0000000000000000 [ 10.504213] RBP: 0000000000000000 R08: ffff8804abeda9c8 R09: 0000000000000000 [ 10.511376] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000e [ 10.518542] R13: ffff88049e604a68 R14: ffff88049e604a68 R15: ffffffffa00263c2 [ 10.525713] FS: 00007ff6d85f18c0(0000) GS:ffff8804be880000(0000) knlGS:0000000000000000 [ 10.533839] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 10.539616] CR2: 00007ff6d73cff40 CR3: 000000049f794001 CR4: 0000000000760ee0 [ 10.546783] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 10.553949] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 10.561112] PKRU: 55555554 [ 10.563849] Call Trace: [ 10.566323] acpi_ps_complete_op+0x49/0x3f1 [ 10.570537] acpi_ps_parse_loop+0x94c/0x9bb [ 10.574754] ? acpi_ds_delete_walk_state+0x113/0x131 [ 10.579750] acpi_ps_parse_aml+0x1a2/0x4af [ 10.583875] acpi_ps_execute_method+0x1e9/0x2a5 [ 10.588435] acpi_ns_evaluate+0x2e4/0x42c [ 10.592473] acpi_evaluate_object+0x1fd/0x3a8 [ 10.596873] usb_acpi_find_companion+0xee/0x1f0 [usbcore] [ 10.602319] acpi_platform_notify+0x33/0xa0 [ 10.606532] device_add+0x197/0x600 [ 10.610048] ? __init_waitqueue_head+0x36/0x50 [ 10.614529] usb_hub_create_port_device+0x11d/0x340 [usbcore] [ 10.620314] hub_probe+0x9a5/0x1010 [usbcore] [ 10.624701] ? _raw_spin_unlock_irqrestore+0x51/0x60 [ 10.629730] usb_probe_interface+0x13f/0x300 [usbcore] [ 10.634900] driver_probe_device+0x302/0x470 [ 10.639198] ? __driver_attach+0xe0/0xe0 [ 10.643147] bus_for_each_drv+0x59/0x90 [ 10.647013] __device_attach+0xb7/0x130 [ 10.650878] bus_probe_device+0x9c/0xb0 [ 10.654745] device_add+0x3c5/0x600 [ 10.658270] usb_set_configuration+0x540/0x880 [usbcore] [ 10.663621] generic_probe+0x28/0x80 [usbcore] [ 10.668097] driver_probe_device+0x302/0x470 [ 10.672393] ? __driver_attach+0xe0/0xe0 [ 10.676346] bus_for_each_drv+0x59/0x90 [ 10.680211] __device_attach+0xb7/0x130 [ 10.684076] bus_probe_device+0x9c/0xb0 [ 10.687940] device_add+0x3c5/0x600 [ 10.691464] usb_new_device+0x269/0x490 [usbcore] [ 10.696206] usb_add_hcd+0x558/0x850 [usbcore] [ 10.700682] xhci_pci_probe+0x13d/0x240 [xhci_pci] [ 10.705534] pci_device_probe+0xa1/0x130 [ 10.709484] driver_probe_device+0x302/0x470 [ 10.713784] __driver_attach+0xb9/0xe0 [ 10.717562] ? driver_probe_device+0x470/0x470 [ 10.722033] ? driver_probe_device+0x470/0x470 [ 10.726505] bus_for_each_dev+0x64/0x90 [ 10.730370] ? preempt_count_sub+0x92/0xd0 [ 10.734495] bus_add_driver+0x164/0x260 [ 10.738362] ? 0xffffffffa004e000 [ 10.741704] driver_register+0x57/0xc0 [ 10.745482] ? 0xffffffffa004e000 [ 10.748824] do_one_initcall+0x4a/0x350 [ 10.752690] ? do_init_module+0x22/0x20a [ 10.756643] ? rcu_read_lock_sched_held+0x74/0x80 [ 10.761377] ? kmem_cache_alloc_trace+0x284/0x2e0 [ 10.766114] do_init_module+0x5b/0x20a [ 10.769895] load_module+0x250d/0x2b20 [ 10.773678] ? kernel_read+0x2c/0x40 [ 10.777285] ? __se_sys_finit_module+0xaa/0xc0 [ 10.781759] __se_sys_finit_module+0xaa/0xc0 [ 10.786061] do_syscall_64+0x54/0x190 [ 10.789752] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 10.794831] RIP: 0033:0x7ff6d74664d9 [ 10.798430] RSP: 002b:00007ffd91e7dd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 10.806033] RAX: ffffffffffffffda RBX: 0000560519bfae20 RCX: 00007ff6d74664d9 [ 10.813195] RDX: 0000000000000000 RSI: 00007ff6d795ce23 RDI: 000000000000000e [ 10.820360] RBP: 00007ff6d795ce23 R08: 0000000000000000 R09: 0000000000000000 [ 10.827523] R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000 [ 10.834690] R13: 0000560519bf9a30 R14: 0000000000020000 R15: 000000000aba9500 [ 10.841862] Code: c2 10 5f ea 81 48 c7 c6 f0 5e ea 81 bf 7c 00 00 00 e8 0d 7c 00 00 31 ed e9 88 01 00 00 48 8b 03 31 ed 48 85 c0 +0f 84 e9 00 00 00 <4c> 8b 60 28 4d 85 e4 0f 84 dc 00 00 00 0f b7 78 0a e8 62 fe ff [ 10.860832] RIP: acpi_ps_complete_this_op+0xa7/0x22a RSP: ffffc900003a7578 [ 10.867907] ---[ end trace 3a0d2ee1129bc71e ]--- References: https://gitlab.freedesktop.org/drm/intel/-/issues/8033 Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Imre Deak <imre.deak@intel.com> Tested-by: Tomi Sarvela <tomi.p.sarvela@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180702135756.12159-1-imre.deak@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
There's the hung_task_panic sysctl, but that's a bit an extreme measure.
As a fallback taint at least the machine.
Our CI uses this to decide when a reboot is necessary, plus to figure
out whether the kernel is still happy.
v2: Works much better when I put the else { add_taint() } at the right
place.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8034
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: "Liu, Chuansheng" <chuansheng.liu@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk> (for core-for-CI)
Link: https://patchwork.freedesktop.org/patch/msgid/20190502204648.5537-1-daniel.vetter@ffwll.ch
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
There's the soft/hardlookup_panic sysctls, but that's a bit an extreme measure. As a fallback taint at least the machine. Our CI uses this to decide when a reboot is necessary, plus to figure out whether the kernel is still happy. References: https://gitlab.freedesktop.org/drm/intel/-/issues/8035 Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Laurence Oberman <loberman@redhat.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Sinan Kaya <okaya@kernel.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> (for core-for-CI) Link: https://patchwork.freedesktop.org/patch/msgid/20190502194208.3535-2-daniel.vetter@ffwll.ch Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8036 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2781 [danvet: Rebase] Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We can't allow spam in CI. Update 26th June 2018: This is still an issue: Update 23rd May 2019: You guessed it, still ocurring. [ 224.739686] ------------[ cut here ]------------ [ 224.739712] WARNING: CPU: 3 PID: 2982 at net/sched/sch_generic.c:461 dev_watchdog+0x1fd/0x210 [ 224.739714] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_pcm i915 asix usbnet mii mei_me mei prime_numbers i2c_hid pinctrl_sunrisepoint pinctrl_intel btusb btrtl btbcm btintel bluetooth ecdh_generic [ 224.739775] CPU: 3 PID: 2982 Comm: gem_exec_suspen Tainted: G U W 4.18.0-rc2-CI-Patchwork_9414+ #1 [ 224.739777] Hardware name: Dell Inc. XPS 13 9350/, BIOS 1.4.12 11/30/2016 [ 224.739780] RIP: 0010:dev_watchdog+0x1fd/0x210 [ 224.739781] Code: 49 63 4c 24 f0 eb 92 4c 89 ef c6 05 21 46 ad 00 01 e8 77 ee fc ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 88 4c 14 82 e8 a3 fe 84 ff <0f> 0b eb be 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 c7 47 [ 224.739866] RSP: 0018:ffff88027dd83e40 EFLAGS: 00010286 [ 224.739869] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000102 [ 224.739871] RDX: 0000000080000102 RSI: ffffffff820c8c6c RDI: 00000000ffffffff [ 224.739873] RBP: ffff8802644c1540 R08: 0000000071be9b33 R09: 0000000000000000 [ 224.739874] R10: ffff88027dd83dc0 R11: 0000000000000000 R12: ffff8802644c1588 [ 224.739876] R13: ffff8802644c1160 R14: 0000000000000001 R15: ffff88026a5dc728 [ 224.739878] FS: 00007f18f4887980(0000) GS:ffff88027dd80000(0000) knlGS:0000000000000000 [ 224.739880] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 224.739881] CR2: 00007f4c627ae548 CR3: 000000022ca1a002 CR4: 00000000003606e0 [ 224.739883] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 224.739885] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 224.739886] Call Trace: [ 224.739888] <IRQ> [ 224.739892] ? qdisc_reset+0xe0/0xe0 [ 224.739894] ? qdisc_reset+0xe0/0xe0 [ 224.739897] call_timer_fn+0x93/0x360 [ 224.739903] expire_timers+0xc1/0x1d0 [ 224.739908] run_timer_softirq+0xc7/0x170 [ 224.739916] __do_softirq+0xd9/0x505 [ 224.739923] irq_exit+0xa9/0xc0 [ 224.739926] smp_apic_timer_interrupt+0x9c/0x2d0 [ 224.739929] apic_timer_interrupt+0xf/0x20 [ 224.739931] </IRQ> [ 224.739934] RIP: 0010:delay_tsc+0x2e/0xb0 [ 224.739936] Code: 49 89 fc 55 53 bf 01 00 00 00 e8 6d 2c 78 ff e8 88 9d b6 ff 41 89 c5 0f ae e8 0f 31 48 c1 e2 20 48 09 c2 48 89 d5 eb 16 f3 90 <bf> 01 00 00 00 e8 48 2c 78 ff e8 63 9d b6 ff 44 39 e8 75 36 0f ae [ 224.740021] RSP: 0018:ffffc900002f7d48 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13 [ 224.740024] RAX: 0000000080000000 RBX: 0000000649565ca9 RCX: 0000000000000001 [ 224.740026] RDX: 0000000080000001 RSI: ffffffff820c8c6c RDI: 00000000ffffffff [ 224.740027] RBP: 00000006493ea9ce R08: 000000005e81e2ee R09: 0000000000000000 [ 224.740029] R10: 0000000000000120 R11: 0000000000000000 R12: 00000000002ad8d6 [ 224.740030] R13: 0000000000000003 R14: 0000000000000004 R15: ffff88025caf5408 [ 224.740040] ? delay_tsc+0x66/0xb0 [ 224.740045] hibernation_debug_sleep+0x1c/0x30 [ 224.740048] hibernation_snapshot+0x2c1/0x690 [ 224.740053] hibernate+0x142/0x2a4 [ 224.740057] state_store+0xd0/0xe0 [ 224.740063] kernfs_fop_write+0x104/0x190 [ 224.740068] __vfs_write+0x31/0x180 [ 224.740072] ? rcu_read_lock_sched_held+0x6f/0x80 [ 224.740075] ? rcu_sync_lockdep_assert+0x29/0x50 [ 224.740078] ? __sb_start_write+0x152/0x1f0 [ 224.740080] ? __sb_start_write+0x168/0x1f0 [ 224.740084] vfs_write+0xbd/0x1a0 [ 224.740088] ksys_write+0x50/0xc0 [ 224.740094] do_syscall_64+0x55/0x190 [ 224.740097] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 224.740099] RIP: 0033:0x7f18f400a281 [ 224.740100] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53 [ 224.740186] RSP: 002b:00007fffd1f4fec8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 224.740189] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f18f400a281 [ 224.740190] RDX: 0000000000000004 RSI: 00007f18f448069a RDI: 0000000000000006 [ 224.740192] RBP: 00007fffd1f4fef0 R08: 0000000000000000 R09: 0000000000000000 [ 224.740194] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e795d03400 [ 224.740195] R13: 00007fffd1f50500 R14: 0000000000000000 R15: 0000000000000000 [ 224.740205] irq event stamp: 1582591 [ 224.740207] hardirqs last enabled at (1582590): [<ffffffff810f9f9c>] vprintk_emit+0x4bc/0x4d0 [ 224.740210] hardirqs last disabled at (1582591): [<ffffffff81a0111c>] error_entry+0x7c/0x100 [ 224.740212] softirqs last enabled at (1582568): [<ffffffff81c0034f>] __do_softirq+0x34f/0x505 [ 224.740215] softirqs last disabled at (1582571): [<ffffffff8108c959>] irq_exit+0xa9/0xc0 [ 224.740218] WARNING: CPU: 3 PID: 2982 at net/sched/sch_generic.c:461 dev_watchdog+0x1fd/0x210 [ 224.740219] ---[ end trace 6e41d690e611c338 ]--- References: https://gitlab.freedesktop.org/drm/intel/-/issues/8037 References: https://bugzilla.kernel.org/show_bug.cgi?id=196399 Acked-by: Martin Peres <martin.peres@linux.intel.com> Cc: Martin Peres <martin.peres@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170718082110.12524-1-daniel.vetter@ffwll.ch Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@intel.com>
Since the kernel now used hashed pointers for raw addresses, it is very hard to guage the relative placement within a section, and since the hash value will never match up with any contents, using it provides no information relevant for slab debugging. Show the relative offset into each section, so that some reference for the hexdump is provided. References: https://gitlab.freedesktop.org/drm/intel/-/issues/8038 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
If the MSI is already enabled, trying to enable it again results in an -EINVAL and on the first attempt a WARN. That WARN causes our CI to abort the run [on each first attempt to suspend]: <4> [463.142025] WARNING: CPU: 0 PID: 2225 at drivers/pci/msi.c:1074 __pci_enable_msi_range+0x3cb/0x420 <4> [463.142026] Modules linked in: snd_hda_intel i915 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_dspcfg ghash_clmulni_intel snd_hda_codec btusb btrtl btbcm btintel e1000e bluetooth snd_hwdep snd_hda_core ptp ecdh_generic snd_pcm ecc pps_core mei_me mei prime_numbers [last unloaded: i915] <4> [463.142045] CPU: 0 PID: 2225 Comm: kworker/u8:14 Tainted: G U 5.7.0-rc2-CI-CI_DRM_8350+ #1 <4> [463.142046] Hardware name: Intel Corporation NUC7i5BNH/NUC7i5BNB, BIOS BNKBL357.86A.0060.2017.1214.2013 12/14/2017 <4> [463.142049] Workqueue: events_unbound async_run_entry_fn <4> [463.142051] RIP: 0010:__pci_enable_msi_range+0x3cb/0x420 <4> [463.142053] Code: 76 58 49 8d 56 48 48 89 df e8 31 73 fd ff e9 20 fe ff ff 31 f6 48 89 df e8 c2 e9 fd ff e9 d6 fe ff ff 45 89 fc e9 1a ff ff ff <0f> 0b 41 bc ea ff ff ff e9 0d ff ff ff 41 bc ea ff ff ff e9 02 ff <4> [463.142054] RSP: 0018:ffffc90000593cd0 EFLAGS: 00010202 <4> [463.142056] RAX: 0000000000000010 RBX: ffff888274051000 RCX: 0000000000000000 <4> [463.142057] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff888274051000 <4> [463.142058] RBP: ffff888238aa1018 R08: 0000000000000001 R09: 0000000000000001 <4> [463.142060] R10: ffffc90000593d90 R11: 00000000c79cdfd5 R12: ffff8882740510b0 <4> [463.142061] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 <4> [463.142062] FS: 0000000000000000(0000) GS:ffff888276c00000(0000) knlGS:0000000000000000 <4> [463.142064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [463.142065] CR2: 000055706f347d80 CR3: 0000000005610003 CR4: 00000000003606f0 <4> [463.142066] Call Trace: <4> [463.142073] pci_enable_msi+0x11/0x20 <4> [463.142077] azx_resume+0x1ab/0x200 [snd_hda_intel] <4> [463.142080] ? pci_pm_thaw+0x80/0x80 <4> [463.142084] dpm_run_callback+0x64/0x280 <4> [463.142089] device_resume+0xd4/0x1c0 <4> [463.142093] ? dpm_watchdog_set+0x60/0 While this would appear to be a bug in snd-hda, it does appear inconsequential, at least for gfx-ci. Downgrade the warning to an info, like the other already-enabled error for MSI-X. References: https://gitlab.freedesktop.org/drm/intel/-/issues/8041 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1687 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200423082753.3899018-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We inadvertently create a dependency on mmap_sem with a whole chain. This breaks any user who wants to take a lock and call rcu_barrier(), while also taking that lock inside mmap_sem: <4> [604.892532] ====================================================== <4> [604.892534] WARNING: possible circular locking dependency detected <4> [604.892536] 5.6.0-rc7-CI-Patchwork_17096+ #1 Tainted: G U <4> [604.892537] ------------------------------------------------------ <4> [604.892538] kms_frontbuffer/2595 is trying to acquire lock: <4> [604.892540] ffffffff8264a558 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190 <4> [604.892547] but task is already holding lock: <4> [604.892547] ffff888484716050 (reservation_ww_class_mutex){+.+.}, at: i915_gem_object_pin_to_display_plane+0x89/0x270 [i915] <4> [604.892592] which lock already depends on the new lock. <4> [604.892593] the existing dependency chain (in reverse order) is: <4> [604.892594] -> #6 (reservation_ww_class_mutex){+.+.}: <4> [604.892597] __ww_mutex_lock.constprop.15+0xc3/0x1090 <4> [604.892598] ww_mutex_lock+0x39/0x70 <4> [604.892600] dma_resv_lockdep+0x10e/0x1f5 <4> [604.892602] do_one_initcall+0x58/0x300 <4> [604.892604] kernel_init_freeable+0x17b/0x1dc <4> [604.892605] kernel_init+0x5/0x100 <4> [604.892606] ret_from_fork+0x24/0x50 <4> [604.892607] -> #5 (reservation_ww_class_acquire){+.+.}: <4> [604.892609] dma_resv_lockdep+0xec/0x1f5 <4> [604.892610] do_one_initcall+0x58/0x300 <4> [604.892610] kernel_init_freeable+0x17b/0x1dc <4> [604.892611] kernel_init+0x5/0x100 <4> [604.892612] ret_from_fork+0x24/0x50 <4> [604.892613] -> #4 (&mm->mmap_sem#2){++++}: <4> [604.892615] __might_fault+0x63/0x90 <4> [604.892617] _copy_to_user+0x1e/0x80 <4> [604.892619] perf_read+0x200/0x2b0 <4> [604.892621] vfs_read+0x96/0x160 <4> [604.892622] ksys_read+0x9f/0xe0 <4> [604.892623] do_syscall_64+0x4f/0x220 <4> [604.892624] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4> [604.892625] -> #3 (&cpuctx_mutex){+.+.}: <4> [604.892626] __mutex_lock+0x9a/0x9c0 <4> [604.892627] perf_event_init_cpu+0xa4/0x140 <4> [604.892629] perf_event_init+0x19d/0x1cd <4> [604.892630] start_kernel+0x362/0x4e4 <4> [604.892631] secondary_startup_64+0xa4/0xb0 <4> [604.892631] -> #2 (pmus_lock){+.+.}: <4> [604.892633] __mutex_lock+0x9a/0x9c0 <4> [604.892633] perf_event_init_cpu+0x6b/0x140 <4> [604.892635] cpuhp_invoke_callback+0x9b/0x9d0 <4> [604.892636] _cpu_up+0xa2/0x140 <4> [604.892637] do_cpu_up+0x61/0xa0 <4> [604.892639] smp_init+0x57/0x96 <4> [604.892639] kernel_init_freeable+0x87/0x1dc <4> [604.892640] kernel_init+0x5/0x100 <4> [604.892642] ret_from_fork+0x24/0x50 <4> [604.892642] -> #1 (cpu_hotplug_lock.rw_sem){++++}: <4> [604.892643] cpus_read_lock+0x34/0xd0 <4> [604.892644] rcu_barrier+0xaa/0x190 <4> [604.892645] kernel_init+0x21/0x100 <4> [604.892647] ret_from_fork+0x24/0x50 <4> [604.892647] -> #0 (rcu_state.barrier_mutex){+.+.}: <4> [604.892649] __lock_acquire+0x1328/0x15d0 <4> [604.892650] lock_acquire+0xa7/0x1c0 <4> [604.892651] __mutex_lock+0x9a/0x9c0 <4> [604.892652] rcu_barrier+0x23/0x190 <4> [604.892680] i915_gem_object_unbind+0x29d/0x3f0 [i915] <4> [604.892707] i915_gem_object_pin_to_display_plane+0x141/0x270 [i915] <4> [604.892737] intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915] <4> [604.892767] intel_plane_pin_fb+0x3f/0xd0 [i915] <4> [604.892797] intel_prepare_plane_fb+0x13b/0x5c0 [i915] <4> [604.892798] drm_atomic_helper_prepare_planes+0x85/0x110 <4> [604.892827] intel_atomic_commit+0xda/0x390 [i915] <4> [604.892828] drm_atomic_helper_set_config+0x57/0xa0 <4> [604.892830] drm_mode_setcrtc+0x1c4/0x720 <4> [604.892830] drm_ioctl_kernel+0xb0/0xf0 <4> [604.892831] drm_ioctl+0x2e1/0x390 <4> [604.892833] ksys_ioctl+0x7b/0x90 <4> [604.892835] __x64_sys_ioctl+0x11/0x20 <4> [604.892835] do_syscall_64+0x4f/0x220 <4> [604.892836] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4> [604.892837] Changes since v1: - Use (*values)[n++] in perf_read_one(). Changes since v2: - Centrally allocate values. References: https://gitlab.freedesktop.org/drm/intel/-/issues/8042 Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200502171413.9133-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8043 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Audio component in i915 is not hooked up yet causing long timeouts and angry abortive CI. References: https://gitlab.freedesktop.org/drm/intel/-/issues/8045 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8046 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2805 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8047 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2874 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@intel.com>
By allowing the filter_glob parameter to be written to, it's possible to tweak the testsuites that will be executed on new module loads. This makes it easier to run specific tests without having to reload kunit and provides a way to filter tests on real HW even if kunit is builtin. Example for xe driver: 1) Run just 1 test # echo -n xe_bo > /sys/module/kunit/parameters/filter_glob # modprobe -r xe_live_test # modprobe xe_live_test # ls /sys/kernel/debug/kunit/ xe_bo 2) Run all tests # echo \* > /sys/module/kunit/parameters/filter_glob # modprobe -r xe_live_test # modprobe xe_live_test # ls /sys/kernel/debug/kunit/ xe_bo xe_dma_buf xe_migrate xe_mocs For completeness and to cover other use cases, also change filter and filter_action to rw. Link: https://lore.kernel.org/intel-xe/dzacvbdditbneiu3e3fmstjmttcbne44yspumpkd6sjn56jqpk@vxu7sksbqrp6/ Reviewed-by: Rae Moar <rmoar@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> [ This commit is targeting 6.9 through the kunit tree. Add to topic/core-for-CI so our CI infrastructure can start using it ahead of that ] References: https://gitlab.freedesktop.org/drm/intel/-/issues/10157 Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
# Conflicts: # drivers/gpu/drm/bridge/samsung-dsim.c
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.7 to 2.2.2. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](urllib3/urllib3@2.0.7...2.2.2) --- updated-dependencies: - dependency-name: urllib3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>
rodrigovivi
pushed a commit
that referenced
this pull request
Jan 27, 2025
This commit addresses a circular locking dependency issue within the GFX
isolation mechanism. The problem was identified by a warning indicating
a potential deadlock due to inconsistent lock acquisition order.
- The `amdgpu_gfx_enforce_isolation_ring_begin_use` and
`amdgpu_gfx_enforce_isolation_ring_end_use` functions previously
acquired `enforce_isolation_mutex` and called `amdgpu_gfx_kfd_sch_ctrl`,
leading to potential deadlocks. ie., If `amdgpu_gfx_kfd_sch_ctrl` is
called while `enforce_isolation_mutex` is held, and
`amdgpu_gfx_enforce_isolation_handler` is called while `kfd_sch_mutex` is
held, it can create a circular dependency.
By ensuring consistent lock usage, this fix resolves the issue:
[ 606.297333] ======================================================
[ 606.297343] WARNING: possible circular locking dependency detected
[ 606.297353] 6.10.0-amd-mlkd-610-311224-lof #19 Tainted: G OE
[ 606.297365] ------------------------------------------------------
[ 606.297375] kworker/u96:3/3825 is trying to acquire lock:
[ 606.297385] ffff9aa64e431cb8 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}, at: __flush_work+0x232/0x610
[ 606.297413]
but task is already holding lock:
[ 606.297423] ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu]
[ 606.297725]
which lock already depends on the new lock.
[ 606.297738]
the existing dependency chain (in reverse order) is:
[ 606.297749]
-> #2 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}:
[ 606.297765] __mutex_lock+0x85/0x930
[ 606.297776] mutex_lock_nested+0x1b/0x30
[ 606.297786] amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu]
[ 606.298007] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu]
[ 606.298225] amdgpu_ring_alloc+0x48/0x70 [amdgpu]
[ 606.298412] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu]
[ 606.298603] amdgpu_job_run+0xac/0x1e0 [amdgpu]
[ 606.298866] drm_sched_run_job_work+0x24f/0x430 [gpu_sched]
[ 606.298880] process_one_work+0x21e/0x680
[ 606.298890] worker_thread+0x190/0x350
[ 606.298899] kthread+0xe7/0x120
[ 606.298908] ret_from_fork+0x3c/0x60
[ 606.298919] ret_from_fork_asm+0x1a/0x30
[ 606.298929]
-> #1 (&adev->enforce_isolation_mutex){+.+.}-{3:3}:
[ 606.298947] __mutex_lock+0x85/0x930
[ 606.298956] mutex_lock_nested+0x1b/0x30
[ 606.298966] amdgpu_gfx_enforce_isolation_handler+0x87/0x370 [amdgpu]
[ 606.299190] process_one_work+0x21e/0x680
[ 606.299199] worker_thread+0x190/0x350
[ 606.299208] kthread+0xe7/0x120
[ 606.299217] ret_from_fork+0x3c/0x60
[ 606.299227] ret_from_fork_asm+0x1a/0x30
[ 606.299236]
-> #0 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}:
[ 606.299257] __lock_acquire+0x16f9/0x2810
[ 606.299267] lock_acquire+0xd1/0x300
[ 606.299276] __flush_work+0x250/0x610
[ 606.299286] cancel_delayed_work_sync+0x71/0x80
[ 606.299296] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu]
[ 606.299509] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu]
[ 606.299723] amdgpu_ring_alloc+0x48/0x70 [amdgpu]
[ 606.299909] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu]
[ 606.300101] amdgpu_job_run+0xac/0x1e0 [amdgpu]
[ 606.300355] drm_sched_run_job_work+0x24f/0x430 [gpu_sched]
[ 606.300369] process_one_work+0x21e/0x680
[ 606.300378] worker_thread+0x190/0x350
[ 606.300387] kthread+0xe7/0x120
[ 606.300396] ret_from_fork+0x3c/0x60
[ 606.300406] ret_from_fork_asm+0x1a/0x30
[ 606.300416]
other info that might help us debug this:
[ 606.300428] Chain exists of:
(work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work) --> &adev->enforce_isolation_mutex --> &adev->gfx.kfd_sch_mutex
[ 606.300458] Possible unsafe locking scenario:
[ 606.300468] CPU0 CPU1
[ 606.300476] ---- ----
[ 606.300484] lock(&adev->gfx.kfd_sch_mutex);
[ 606.300494] lock(&adev->enforce_isolation_mutex);
[ 606.300508] lock(&adev->gfx.kfd_sch_mutex);
[ 606.300521] lock((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work));
[ 606.300536]
*** DEADLOCK ***
[ 606.300546] 5 locks held by kworker/u96:3/3825:
[ 606.300555] #0: ffff9aa5aa1f5d58 ((wq_completion)comp_1.1.0){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[ 606.300577] #1: ffffaa53c3c97e40 ((work_completion)(&sched->work_run_job)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[ 606.300600] #2: ffff9aa64e463c98 (&adev->enforce_isolation_mutex){+.+.}-{3:3}, at: amdgpu_gfx_enforce_isolation_ring_begin_use+0x1c3/0x5d0 [amdgpu]
[ 606.300837] #3: ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu]
[ 606.301062] #4: ffffffff8c1a5660 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x70/0x610
[ 606.301083]
stack backtrace:
[ 606.301092] CPU: 14 PID: 3825 Comm: kworker/u96:3 Tainted: G OE 6.10.0-amd-mlkd-610-311224-lof #19
[ 606.301109] Hardware name: Gigabyte Technology Co., Ltd. X570S GAMING X/X570S GAMING X, BIOS F7 03/22/2024
[ 606.301124] Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched]
[ 606.301140] Call Trace:
[ 606.301146] <TASK>
[ 606.301154] dump_stack_lvl+0x9b/0xf0
[ 606.301166] dump_stack+0x10/0x20
[ 606.301175] print_circular_bug+0x26c/0x340
[ 606.301187] check_noncircular+0x157/0x170
[ 606.301197] ? register_lock_class+0x48/0x490
[ 606.301213] __lock_acquire+0x16f9/0x2810
[ 606.301230] lock_acquire+0xd1/0x300
[ 606.301239] ? __flush_work+0x232/0x610
[ 606.301250] ? srso_alias_return_thunk+0x5/0xfbef5
[ 606.301261] ? mark_held_locks+0x54/0x90
[ 606.301274] ? __flush_work+0x232/0x610
[ 606.301284] __flush_work+0x250/0x610
[ 606.301293] ? __flush_work+0x232/0x610
[ 606.301305] ? __pfx_wq_barrier_func+0x10/0x10
[ 606.301318] ? mark_held_locks+0x54/0x90
[ 606.301331] ? srso_alias_return_thunk+0x5/0xfbef5
[ 606.301345] cancel_delayed_work_sync+0x71/0x80
[ 606.301356] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu]
[ 606.301661] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu]
[ 606.302050] ? srso_alias_return_thunk+0x5/0xfbef5
[ 606.302069] amdgpu_ring_alloc+0x48/0x70 [amdgpu]
[ 606.302452] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu]
[ 606.302862] ? drm_sched_entity_error+0x82/0x190 [gpu_sched]
[ 606.302890] amdgpu_job_run+0xac/0x1e0 [amdgpu]
[ 606.303366] drm_sched_run_job_work+0x24f/0x430 [gpu_sched]
[ 606.303388] process_one_work+0x21e/0x680
[ 606.303409] worker_thread+0x190/0x350
[ 606.303424] ? __pfx_worker_thread+0x10/0x10
[ 606.303437] kthread+0xe7/0x120
[ 606.303449] ? __pfx_kthread+0x10/0x10
[ 606.303463] ret_from_fork+0x3c/0x60
[ 606.303476] ? __pfx_kthread+0x10/0x10
[ 606.303489] ret_from_fork_asm+0x1a/0x30
[ 606.303512] </TASK>
v2: Refactor lock handling to resolve circular dependency (Alex)
- Introduced a `sched_work` flag to defer the call to
`amdgpu_gfx_kfd_sch_ctrl` until after releasing
`enforce_isolation_mutex`.
- This change ensures that `amdgpu_gfx_kfd_sch_ctrl` is called outside
the critical section, preventing the circular dependency and deadlock.
- The `sched_work` flag is set within the mutex-protected section if
conditions are met, and the actual function call is made afterward.
- This approach ensures consistent lock acquisition order.
Fixes: afefd6f ("drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0b6b2dd)
Cc: stable@vger.kernel.org
rodrigovivi
pushed a commit
that referenced
this pull request
Jan 27, 2025
Fix a lockdep warning [1] observed during the write combining test.
The warning indicates a potential nested lock scenario that could lead
to a deadlock.
However, this is a false positive alarm because the SF lock and its
parent lock are distinct ones.
The lockdep confusion arises because the locks belong to the same object
class (i.e., struct mlx5_core_dev).
To resolve this, the code has been refactored to avoid taking both
locks. Instead, only the parent lock is acquired.
[1]
raw_ethernet_bw/2118 is trying to acquire lock:
[ 213.619032] ffff88811dd75e08 (&dev->wc_state_lock){+.+.}-{3:3}, at:
mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.620270]
[ 213.620270] but task is already holding lock:
[ 213.620943] ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3}, at:
mlx5_wc_support_get+0x10c/0x210 [mlx5_core]
[ 213.622045]
[ 213.622045] other info that might help us debug this:
[ 213.622778] Possible unsafe locking scenario:
[ 213.622778]
[ 213.623465] CPU0
[ 213.623815] ----
[ 213.624148] lock(&dev->wc_state_lock);
[ 213.624615] lock(&dev->wc_state_lock);
[ 213.625071]
[ 213.625071] *** DEADLOCK ***
[ 213.625071]
[ 213.625805] May be due to missing lock nesting notation
[ 213.625805]
[ 213.626522] 4 locks held by raw_ethernet_bw/2118:
[ 213.627019] #0: ffff88813f80d578 (&uverbs_dev->disassociate_srcu){.+.+}-{0:0},
at: ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[ 213.628088] #1: ffff88810fb23930 (&file->hw_destroy_rwsem){.+.+}-{3:3},
at: ib_init_ucontext+0x2d/0xf0 [ib_uverbs]
[ 213.629094] #2: ffff88810fb23878 (&file->ucontext_lock){+.+.}-{3:3},
at: ib_init_ucontext+0x49/0xf0 [ib_uverbs]
[ 213.630106] #3: ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3},
at: mlx5_wc_support_get+0x10c/0x210 [mlx5_core]
[ 213.631185]
[ 213.631185] stack backtrace:
[ 213.631718] CPU: 1 UID: 0 PID: 2118 Comm: raw_ethernet_bw Not tainted
6.12.0-rc7_internal_net_next_mlx5_89a0ad0 #1
[ 213.632722] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 213.633785] Call Trace:
[ 213.634099]
[ 213.634393] dump_stack_lvl+0x7e/0xc0
[ 213.634806] print_deadlock_bug+0x278/0x3c0
[ 213.635265] __lock_acquire+0x15f4/0x2c40
[ 213.635712] lock_acquire+0xcd/0x2d0
[ 213.636120] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.636722] ? mlx5_ib_enable_lb+0x24/0xa0 [mlx5_ib]
[ 213.637277] __mutex_lock+0x81/0xda0
[ 213.637697] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.638305] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.638902] ? rcu_read_lock_sched_held+0x3f/0x70
[ 213.639400] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.640016] mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.640615] set_ucontext_resp+0x68/0x2b0 [mlx5_ib]
[ 213.641144] ? debug_mutex_init+0x33/0x40
[ 213.641586] mlx5_ib_alloc_ucontext+0x18e/0x7b0 [mlx5_ib]
[ 213.642145] ib_init_ucontext+0xa0/0xf0 [ib_uverbs]
[ 213.642679] ib_uverbs_handler_UVERBS_METHOD_GET_CONTEXT+0x95/0xc0
[ib_uverbs]
[ 213.643426] ? _copy_from_user+0x46/0x80
[ 213.643878] ib_uverbs_cmd_verbs+0xa6b/0xc80 [ib_uverbs]
[ 213.644426] ? ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x130/0x130
[ib_uverbs]
[ 213.645213] ? __lock_acquire+0xa99/0x2c40
[ 213.645675] ? lock_acquire+0xcd/0x2d0
[ 213.646101] ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[ 213.646625] ? reacquire_held_locks+0xcf/0x1f0
[ 213.647102] ? do_user_addr_fault+0x45d/0x770
[ 213.647586] ib_uverbs_ioctl+0xe0/0x170 [ib_uverbs]
[ 213.648102] ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[ 213.648632] __x64_sys_ioctl+0x4d3/0xaa0
[ 213.649060] ? do_user_addr_fault+0x4a8/0x770
[ 213.649528] do_syscall_64+0x6d/0x140
[ 213.649947] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 213.650478] RIP: 0033:0x7fa179b0737b
[ 213.650893] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c
89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8
10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d
7d 2a 0f 00 f7 d8 64 89 01 48
[ 213.652619] RSP: 002b:00007ffd2e6d46e8 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 213.653390] RAX: ffffffffffffffda RBX: 00007ffd2e6d47f8 RCX:
00007fa179b0737b
[ 213.654084] RDX: 00007ffd2e6d47e0 RSI: 00000000c0181b01 RDI:
0000000000000003
[ 213.654767] RBP: 00007ffd2e6d47c0 R08: 00007fa1799be010 R09:
0000000000000002
[ 213.655453] R10: 00007ffd2e6d4960 R11: 0000000000000246 R12:
00007ffd2e6d487c
[ 213.656170] R13: 0000000000000027 R14: 0000000000000001 R15:
00007ffd2e6d4f70
Fixes: d98995b ("net/mlx5: Reimplement write combining test")
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
rodrigovivi
pushed a commit
that referenced
this pull request
Feb 10, 2025
When COWing a relocation tree path, at relocation.c:replace_path(), we can trigger a lockdep splat while we are in the btrfs_search_slot() call against the relocation root. This happens in that callchain at ctree.c:read_block_for_search() when we happen to find a child extent buffer already loaded through the fs tree with a lockdep class set to the fs tree. So when we attempt to lock that extent buffer through a relocation tree we have to reset the lockdep class to the class for a relocation tree, since a relocation tree has extent buffers that used to belong to a fs tree and may currently be already loaded (we swap extent buffers between the two trees at the end of replace_path()). However we are missing calls to btrfs_maybe_reset_lockdep_class() to reset the lockdep class at ctree.c:read_block_for_search() before we read lock an extent buffer, just like we did for btrfs_search_slot() in commit b40130b ("btrfs: fix lockdep splat with reloc root extent buffers"). So add the missing btrfs_maybe_reset_lockdep_class() calls before the attempts to read lock an extent buffer at ctree.c:read_block_for_search(). The lockdep splat was reported by syzbot and it looks like this: ====================================================== WARNING: possible circular locking dependency detected 6.13.0-rc5-syzkaller-00163-gab75170520d4 #0 Not tainted ------------------------------------------------------ syz.0.0/5335 is trying to acquire lock: ffff8880545dbc38 (btrfs-tree-01){++++}-{4:4}, at: btrfs_tree_read_lock_nested+0x2f/0x250 fs/btrfs/locking.c:146 but task is already holding lock: ffff8880545dba58 (btrfs-treloc-02/1){+.+.}-{4:4}, at: btrfs_tree_lock_nested+0x2f/0x250 fs/btrfs/locking.c:189 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (btrfs-treloc-02/1){+.+.}-{4:4}: reacquire_held_locks+0x3eb/0x690 kernel/locking/lockdep.c:5374 __lock_release kernel/locking/lockdep.c:5563 [inline] lock_release+0x396/0xa30 kernel/locking/lockdep.c:5870 up_write+0x79/0x590 kernel/locking/rwsem.c:1629 btrfs_force_cow_block+0x14b3/0x1fd0 fs/btrfs/ctree.c:660 btrfs_cow_block+0x371/0x830 fs/btrfs/ctree.c:755 btrfs_search_slot+0xc01/0x3180 fs/btrfs/ctree.c:2153 replace_path+0x1243/0x2740 fs/btrfs/relocation.c:1224 merge_reloc_root+0xc46/0x1ad0 fs/btrfs/relocation.c:1692 merge_reloc_roots+0x3b3/0x980 fs/btrfs/relocation.c:1942 relocate_block_group+0xb0a/0xd40 fs/btrfs/relocation.c:3754 btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4087 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3494 __btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4278 btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4655 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3670 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #1 (btrfs-tree-01/1){+.+.}-{4:4}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849 down_write_nested+0xa2/0x220 kernel/locking/rwsem.c:1693 btrfs_tree_lock_nested+0x2f/0x250 fs/btrfs/locking.c:189 btrfs_init_new_buffer fs/btrfs/extent-tree.c:5052 [inline] btrfs_alloc_tree_block+0x41c/0x1440 fs/btrfs/extent-tree.c:5132 btrfs_force_cow_block+0x526/0x1fd0 fs/btrfs/ctree.c:573 btrfs_cow_block+0x371/0x830 fs/btrfs/ctree.c:755 btrfs_search_slot+0xc01/0x3180 fs/btrfs/ctree.c:2153 btrfs_insert_empty_items+0x9c/0x1a0 fs/btrfs/ctree.c:4351 btrfs_insert_empty_item fs/btrfs/ctree.h:688 [inline] btrfs_insert_inode_ref+0x2bb/0xf80 fs/btrfs/inode-item.c:330 btrfs_rename_exchange fs/btrfs/inode.c:7990 [inline] btrfs_rename2+0xcb7/0x2b90 fs/btrfs/inode.c:8374 vfs_rename+0xbdb/0xf00 fs/namei.c:5067 do_renameat2+0xd94/0x13f0 fs/namei.c:5224 __do_sys_renameat2 fs/namei.c:5258 [inline] __se_sys_renameat2 fs/namei.c:5255 [inline] __x64_sys_renameat2+0xce/0xe0 fs/namei.c:5255 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (btrfs-tree-01){++++}-{4:4}: check_prev_add kernel/locking/lockdep.c:3161 [inline] check_prevs_add kernel/locking/lockdep.c:3280 [inline] validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904 __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849 down_read_nested+0xb5/0xa50 kernel/locking/rwsem.c:1649 btrfs_tree_read_lock_nested+0x2f/0x250 fs/btrfs/locking.c:146 btrfs_tree_read_lock fs/btrfs/locking.h:188 [inline] read_block_for_search+0x718/0xbb0 fs/btrfs/ctree.c:1610 btrfs_search_slot+0x1274/0x3180 fs/btrfs/ctree.c:2237 replace_path+0x1243/0x2740 fs/btrfs/relocation.c:1224 merge_reloc_root+0xc46/0x1ad0 fs/btrfs/relocation.c:1692 merge_reloc_roots+0x3b3/0x980 fs/btrfs/relocation.c:1942 relocate_block_group+0xb0a/0xd40 fs/btrfs/relocation.c:3754 btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4087 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3494 __btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4278 btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4655 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3670 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Chain exists of: btrfs-tree-01 --> btrfs-tree-01/1 --> btrfs-treloc-02/1 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(btrfs-treloc-02/1); lock(btrfs-tree-01/1); lock(btrfs-treloc-02/1); rlock(btrfs-tree-01); *** DEADLOCK *** 8 locks held by syz.0.0/5335: #0: ffff88801e3ae420 (sb_writers#13){.+.+}-{0:0}, at: mnt_want_write_file+0x5e/0x200 fs/namespace.c:559 #1: ffff888052c760d0 (&fs_info->reclaim_bgs_lock){+.+.}-{4:4}, at: __btrfs_balance+0x4c2/0x26b0 fs/btrfs/volumes.c:4183 #2: ffff888052c74850 (&fs_info->cleaner_mutex){+.+.}-{4:4}, at: btrfs_relocate_block_group+0x775/0xd90 fs/btrfs/relocation.c:4086 #3: ffff88801e3ae610 (sb_internal#2){.+.+}-{0:0}, at: merge_reloc_root+0xf11/0x1ad0 fs/btrfs/relocation.c:1659 #4: ffff888052c76470 (btrfs_trans_num_writers){++++}-{0:0}, at: join_transaction+0x405/0xda0 fs/btrfs/transaction.c:288 #5: ffff888052c76498 (btrfs_trans_num_extwriters){++++}-{0:0}, at: join_transaction+0x405/0xda0 fs/btrfs/transaction.c:288 #6: ffff8880545db878 (btrfs-tree-01/1){+.+.}-{4:4}, at: btrfs_tree_lock_nested+0x2f/0x250 fs/btrfs/locking.c:189 #7: ffff8880545dba58 (btrfs-treloc-02/1){+.+.}-{4:4}, at: btrfs_tree_lock_nested+0x2f/0x250 fs/btrfs/locking.c:189 stack backtrace: CPU: 0 UID: 0 PID: 5335 Comm: syz.0.0 Not tainted 6.13.0-rc5-syzkaller-00163-gab75170520d4 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2074 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2206 check_prev_add kernel/locking/lockdep.c:3161 [inline] check_prevs_add kernel/locking/lockdep.c:3280 [inline] validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904 __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849 down_read_nested+0xb5/0xa50 kernel/locking/rwsem.c:1649 btrfs_tree_read_lock_nested+0x2f/0x250 fs/btrfs/locking.c:146 btrfs_tree_read_lock fs/btrfs/locking.h:188 [inline] read_block_for_search+0x718/0xbb0 fs/btrfs/ctree.c:1610 btrfs_search_slot+0x1274/0x3180 fs/btrfs/ctree.c:2237 replace_path+0x1243/0x2740 fs/btrfs/relocation.c:1224 merge_reloc_root+0xc46/0x1ad0 fs/btrfs/relocation.c:1692 merge_reloc_roots+0x3b3/0x980 fs/btrfs/relocation.c:1942 relocate_block_group+0xb0a/0xd40 fs/btrfs/relocation.c:3754 btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4087 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3494 __btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4278 btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4655 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3670 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f1ac6985d29 Code: ff ff c3 (...) RSP: 002b:00007f1ac63fe038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f1ac6b76160 RCX: 00007f1ac6985d29 RDX: 0000000020000180 RSI: 00000000c4009420 RDI: 0000000000000007 RBP: 00007f1ac6a01b08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000001 R14: 00007f1ac6b76160 R15: 00007fffda145a88 </TASK> Reported-by: syzbot+63913e558c084f7f8fdc@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/677b3014.050a0220.3b53b0.0064.GAE@google.com/ Fixes: 9978599 ("btrfs: reduce lock contention when eb cache miss for btree search") Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
rodrigovivi
pushed a commit
that referenced
this pull request
Feb 18, 2025
We have several places across the kernel where we want to access another
task's syscall arguments, such as ptrace(2), seccomp(2), etc., by making
a call to syscall_get_arguments().
This works for register arguments right away by accessing the task's
`regs' member of `struct pt_regs', however for stack arguments seen with
32-bit/o32 kernels things are more complicated. Technically they ought
to be obtained from the user stack with calls to an access_remote_vm(),
but we have an easier way available already.
So as to be able to access syscall stack arguments as regular function
arguments following the MIPS calling convention we copy them over from
the user stack to the kernel stack in arch/mips/kernel/scall32-o32.S, in
handle_sys(), to the current stack frame's outgoing argument space at
the top of the stack, which is where the handler called expects to see
its incoming arguments. This area is also pointed at by the `pt_regs'
pointer obtained by task_pt_regs().
Make the o32 stack argument space a proper member of `struct pt_regs'
then, by renaming the existing member from `pad0' to `args' and using
generated offsets to access the space. No functional change though.
With the change in place the o32 kernel stack frame layout at the entry
to a syscall handler invoked by handle_sys() is therefore as follows:
$sp + 68 -> | ... | <- pt_regs.regs[9]
+---------------------+
$sp + 64 -> | $t0 | <- pt_regs.regs[8]
+---------------------+
$sp + 60 -> | $a3/argument #4 | <- pt_regs.regs[7]
+---------------------+
$sp + 56 -> | $a2/argument #3 | <- pt_regs.regs[6]
+---------------------+
$sp + 52 -> | $a1/argument #2 | <- pt_regs.regs[5]
+---------------------+
$sp + 48 -> | $a0/argument #1 | <- pt_regs.regs[4]
+---------------------+
$sp + 44 -> | $v1 | <- pt_regs.regs[3]
+---------------------+
$sp + 40 -> | $v0 | <- pt_regs.regs[2]
+---------------------+
$sp + 36 -> | $at | <- pt_regs.regs[1]
+---------------------+
$sp + 32 -> | $zero | <- pt_regs.regs[0]
+---------------------+
$sp + 28 -> | stack argument #8 | <- pt_regs.args[7]
+---------------------+
$sp + 24 -> | stack argument #7 | <- pt_regs.args[6]
+---------------------+
$sp + 20 -> | stack argument #6 | <- pt_regs.args[5]
+---------------------+
$sp + 16 -> | stack argument #5 | <- pt_regs.args[4]
+---------------------+
$sp + 12 -> | psABI space for $a3 | <- pt_regs.args[3]
+---------------------+
$sp + 8 -> | psABI space for $a2 | <- pt_regs.args[2]
+---------------------+
$sp + 4 -> | psABI space for $a1 | <- pt_regs.args[1]
+---------------------+
$sp + 0 -> | psABI space for $a0 | <- pt_regs.args[0]
+---------------------+
holding user data received and with the first 4 frame slots reserved by
the psABI for the compiler to spill the incoming arguments from $a0-$a3
registers (which it sometimes does according to its needs) and the next
4 frame slots designated by the psABI for any stack function arguments
that follow. This data is also available for other tasks to peek/poke
at as reqired and where permitted.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
rodrigovivi
pushed a commit
that referenced
this pull request
Mar 3, 2025
Since commit 6037802 ("power: supply: core: implement extension API") there is the following ABBA deadlock (simplified) between the LED trigger code and the power-supply code: 1) When registering a power-supply class device, power_supply_register() calls led_trigger_register() from power_supply_create_triggers() in a scoped_guard(rwsem_read, &psy->extensions_sem) context. led_trigger_register() then in turn takes a LED subsystem lock. So here we have the following locking order: * Read-lock extensions_sem * Lock LED subsystem lock(s) 2) When registering a LED class device, with its default trigger set to a power-supply LED trigger (which has already been registered) The LED class code calls power_supply_led_trigger_activate() when setting up the default trigger. power_supply_led_trigger_activate() calls power_supply_get_property() to determine the initial value of to assign to the LED and that read-locks extensions_sem. So now we have the following locking order: * Lock LED subsystem lock(s) * Read-lock extensions_sem Fixing this is easy, there is no need to hold the extensions_sem when calling power_supply_create_triggers() since all triggers are always created rather then checking for the presence of certain attributes as power_supply_add_hwmon_sysfs() does. Move power_supply_create_triggers() out of the guard block to fix this. Here is the lockdep report fixed by this change: [ 31.249343] ====================================================== [ 31.249378] WARNING: possible circular locking dependency detected [ 31.249413] 6.13.0-rc6+ #251 Tainted: G C E [ 31.249440] ------------------------------------------------------ [ 31.249471] (udev-worker)/553 is trying to acquire lock: [ 31.249501] ffff892adbcaf660 (&psy->extensions_sem){.+.+}-{4:4}, at: power_supply_get_property.part.0+0x22/0x150 [ 31.249574] but task is already holding lock: [ 31.249603] ffff892adbc0bad0 (&led_cdev->trigger_lock){+.+.}-{4:4}, at: led_trigger_set_default+0x34/0xe0 [ 31.249657] which lock already depends on the new lock. [ 31.249696] the existing dependency chain (in reverse order) is: [ 31.249735] -> #2 (&led_cdev->trigger_lock){+.+.}-{4:4}: [ 31.249778] down_write+0x3b/0xd0 [ 31.249803] led_trigger_set_default+0x34/0xe0 [ 31.249833] led_classdev_register_ext+0x311/0x3a0 [ 31.249863] input_leds_connect+0x1dc/0x2a0 [ 31.249889] input_attach_handler.isra.0+0x75/0x90 [ 31.249921] input_register_device.cold+0xa1/0x150 [ 31.249955] hidinput_connect+0x8a2/0xb80 [ 31.249982] hid_connect+0x582/0x5c0 [ 31.250007] hid_hw_start+0x3f/0x60 [ 31.250030] hid_device_probe+0x122/0x1f0 [ 31.250053] really_probe+0xde/0x340 [ 31.250080] __driver_probe_device+0x78/0x110 [ 31.250105] driver_probe_device+0x1f/0xa0 [ 31.250132] __device_attach_driver+0x85/0x110 [ 31.250160] bus_for_each_drv+0x78/0xc0 [ 31.250184] __device_attach+0xb0/0x1b0 [ 31.250207] bus_probe_device+0x94/0xb0 [ 31.250230] device_add+0x64a/0x860 [ 31.250252] hid_add_device+0xe5/0x240 [ 31.250279] usbhid_probe+0x4dc/0x620 [ 31.250303] usb_probe_interface+0xe4/0x2a0 [ 31.250329] really_probe+0xde/0x340 [ 31.250353] __driver_probe_device+0x78/0x110 [ 31.250377] driver_probe_device+0x1f/0xa0 [ 31.250404] __device_attach_driver+0x85/0x110 [ 31.250431] bus_for_each_drv+0x78/0xc0 [ 31.250455] __device_attach+0xb0/0x1b0 [ 31.250478] bus_probe_device+0x94/0xb0 [ 31.250501] device_add+0x64a/0x860 [ 31.250523] usb_set_configuration+0x606/0x8a0 [ 31.250552] usb_generic_driver_probe+0x3e/0x60 [ 31.250579] usb_probe_device+0x3d/0x120 [ 31.250605] really_probe+0xde/0x340 [ 31.250629] __driver_probe_device+0x78/0x110 [ 31.250653] driver_probe_device+0x1f/0xa0 [ 31.250680] __device_attach_driver+0x85/0x110 [ 31.250707] bus_for_each_drv+0x78/0xc0 [ 31.250731] __device_attach+0xb0/0x1b0 [ 31.250753] bus_probe_device+0x94/0xb0 [ 31.250776] device_add+0x64a/0x860 [ 31.250798] usb_new_device.cold+0x141/0x38f [ 31.250828] hub_event+0x1166/0x1980 [ 31.250854] process_one_work+0x20f/0x580 [ 31.250879] worker_thread+0x1d1/0x3b0 [ 31.250904] kthread+0xee/0x120 [ 31.250926] ret_from_fork+0x30/0x50 [ 31.250954] ret_from_fork_asm+0x1a/0x30 [ 31.250982] -> #1 (triggers_list_lock){++++}-{4:4}: [ 31.251022] down_write+0x3b/0xd0 [ 31.251045] led_trigger_register+0x40/0x1b0 [ 31.251074] power_supply_register_led_trigger+0x88/0x150 [ 31.251107] power_supply_create_triggers+0x55/0xe0 [ 31.251135] __power_supply_register.part.0+0x34e/0x4a0 [ 31.251164] devm_power_supply_register+0x70/0xc0 [ 31.251190] bq27xxx_battery_setup+0x1a1/0x6d0 [bq27xxx_battery] [ 31.251235] bq27xxx_battery_i2c_probe+0xe5/0x17f [bq27xxx_battery_i2c] [ 31.251272] i2c_device_probe+0x125/0x2b0 [ 31.251299] really_probe+0xde/0x340 [ 31.251324] __driver_probe_device+0x78/0x110 [ 31.251348] driver_probe_device+0x1f/0xa0 [ 31.251375] __driver_attach+0xba/0x1c0 [ 31.251398] bus_for_each_dev+0x6b/0xb0 [ 31.251421] bus_add_driver+0x111/0x1f0 [ 31.251445] driver_register+0x6e/0xc0 [ 31.251470] i2c_register_driver+0x41/0xb0 [ 31.251498] do_one_initcall+0x5e/0x3a0 [ 31.251522] do_init_module+0x60/0x220 [ 31.251550] __do_sys_init_module+0x15f/0x190 [ 31.251575] do_syscall_64+0x93/0x180 [ 31.251598] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 31.251629] -> #0 (&psy->extensions_sem){.+.+}-{4:4}: [ 31.251668] __lock_acquire+0x13ce/0x21c0 [ 31.251694] lock_acquire+0xcf/0x2e0 [ 31.251719] down_read+0x3e/0x170 [ 31.251741] power_supply_get_property.part.0+0x22/0x150 [ 31.251774] power_supply_update_leds+0x8d/0x230 [ 31.251804] power_supply_led_trigger_activate+0x18/0x20 [ 31.251837] led_trigger_set+0x1fc/0x300 [ 31.251863] led_trigger_set_default+0x90/0xe0 [ 31.251892] led_classdev_register_ext+0x311/0x3a0 [ 31.251921] devm_led_classdev_multicolor_register_ext+0x6e/0xb80 [led_class_multicolor] [ 31.251969] ktd202x_probe+0x464/0x5c0 [leds_ktd202x] [ 31.252002] i2c_device_probe+0x125/0x2b0 [ 31.252027] really_probe+0xde/0x340 [ 31.252052] __driver_probe_device+0x78/0x110 [ 31.252076] driver_probe_device+0x1f/0xa0 [ 31.252103] __driver_attach+0xba/0x1c0 [ 31.252125] bus_for_each_dev+0x6b/0xb0 [ 31.252148] bus_add_driver+0x111/0x1f0 [ 31.252172] driver_register+0x6e/0xc0 [ 31.252197] i2c_register_driver+0x41/0xb0 [ 31.252225] do_one_initcall+0x5e/0x3a0 [ 31.252248] do_init_module+0x60/0x220 [ 31.252274] __do_sys_init_module+0x15f/0x190 [ 31.253986] do_syscall_64+0x93/0x180 [ 31.255826] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 31.257614] other info that might help us debug this: [ 31.257619] Chain exists of: &psy->extensions_sem --> triggers_list_lock --> &led_cdev->trigger_lock [ 31.257630] Possible unsafe locking scenario: [ 31.257632] CPU0 CPU1 [ 31.257633] ---- ---- [ 31.257634] lock(&led_cdev->trigger_lock); [ 31.257637] lock(triggers_list_lock); [ 31.257640] lock(&led_cdev->trigger_lock); [ 31.257643] rlock(&psy->extensions_sem); [ 31.257646] *** DEADLOCK *** [ 31.289433] 4 locks held by (udev-worker)/553: [ 31.289443] #0: ffff892ad9658108 (&dev->mutex){....}-{4:4}, at: __driver_attach+0xaf/0x1c0 [ 31.289463] #1: ffff892adbc0bbc8 (&led_cdev->led_access){+.+.}-{4:4}, at: led_classdev_register_ext+0x1c7/0x3a0 [ 31.289476] #2: ffffffffad0e30b0 (triggers_list_lock){++++}-{4:4}, at: led_trigger_set_default+0x2c/0xe0 [ 31.289487] #3: ffff892adbc0bad0 (&led_cdev->trigger_lock){+.+.}-{4:4}, at: led_trigger_set_default+0x34/0xe0 Fixes: 6037802 ("power: supply: core: implement extension API") Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Armin Wolf <W_Armin@gmx.de> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250130140035.20636-1-hdegoede@redhat.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
rodrigovivi
pushed a commit
that referenced
this pull request
Mar 3, 2025
…rary mm Erhard reports the following KASAN hit on Talos II (power9) with kernel 6.13: [ 12.028126] ================================================================== [ 12.028198] BUG: KASAN: user-memory-access in copy_to_kernel_nofault+0x8c/0x1a0 [ 12.028260] Write of size 8 at addr 0000187e458f2000 by task systemd/1 [ 12.028346] CPU: 87 UID: 0 PID: 1 Comm: systemd Tainted: G T 6.13.0-P9-dirty #3 [ 12.028408] Tainted: [T]=RANDSTRUCT [ 12.028446] Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV [ 12.028500] Call Trace: [ 12.028536] [c000000008dbf3b0] [c000000001656a48] dump_stack_lvl+0xbc/0x110 (unreliable) [ 12.028609] [c000000008dbf3f0] [c0000000006e2fc8] print_report+0x6b0/0x708 [ 12.028666] [c000000008dbf4e0] [c0000000006e2454] kasan_report+0x164/0x300 [ 12.028725] [c000000008dbf600] [c0000000006e54d4] kasan_check_range+0x314/0x370 [ 12.028784] [c000000008dbf640] [c0000000006e6310] __kasan_check_write+0x20/0x40 [ 12.028842] [c000000008dbf660] [c000000000578e8c] copy_to_kernel_nofault+0x8c/0x1a0 [ 12.028902] [c000000008dbf6a0] [c0000000000acfe4] __patch_instructions+0x194/0x210 [ 12.028965] [c000000008dbf6e0] [c0000000000ade80] patch_instructions+0x150/0x590 [ 12.029026] [c000000008dbf7c0] [c0000000001159bc] bpf_arch_text_copy+0x6c/0xe0 [ 12.029085] [c000000008dbf800] [c000000000424250] bpf_jit_binary_pack_finalize+0x40/0xc0 [ 12.029147] [c000000008dbf830] [c000000000115dec] bpf_int_jit_compile+0x3bc/0x930 [ 12.029206] [c000000008dbf990] [c000000000423720] bpf_prog_select_runtime+0x1f0/0x280 [ 12.029266] [c000000008dbfa00] [c000000000434b18] bpf_prog_load+0xbb8/0x1370 [ 12.029324] [c000000008dbfb70] [c000000000436ebc] __sys_bpf+0x5ac/0x2e00 [ 12.029379] [c000000008dbfd00] [c00000000043a228] sys_bpf+0x28/0x40 [ 12.029435] [c000000008dbfd20] [c000000000038eb4] system_call_exception+0x334/0x610 [ 12.029497] [c000000008dbfe50] [c00000000000c270] system_call_vectored_common+0xf0/0x280 [ 12.029561] --- interrupt: 3000 at 0x3fff82f5cfa8 [ 12.029608] NIP: 00003fff82f5cfa8 LR: 00003fff82f5cfa8 CTR: 0000000000000000 [ 12.029660] REGS: c000000008dbfe80 TRAP: 3000 Tainted: G T (6.13.0-P9-dirty) [ 12.029735] MSR: 900000000280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 42004848 XER: 00000000 [ 12.029855] IRQMASK: 0 GPR00: 0000000000000169 00003fffdcf789a0 00003fff83067100 0000000000000005 GPR04: 00003fffdcf78a98 0000000000000090 0000000000000000 0000000000000008 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00003fff836ff7e0 c000000000010678 0000000000000000 GPR16: 0000000000000000 0000000000000000 00003fffdcf78f28 00003fffdcf78f90 GPR20: 0000000000000000 0000000000000000 0000000000000000 00003fffdcf78f80 GPR24: 00003fffdcf78f70 00003fffdcf78d10 00003fff835c7239 00003fffdcf78bd8 GPR28: 00003fffdcf78a98 0000000000000000 0000000000000000 000000011f547580 [ 12.030316] NIP [00003fff82f5cfa8] 0x3fff82f5cfa8 [ 12.030361] LR [00003fff82f5cfa8] 0x3fff82f5cfa8 [ 12.030405] --- interrupt: 3000 [ 12.030444] ================================================================== Commit c28c15b ("powerpc/code-patching: Use temporary mm for Radix MMU") is inspired from x86 but unlike x86 is doesn't disable KASAN reports during patching. This wasn't a problem at the begining because __patch_mem() is not instrumented. Commit 465cabc ("powerpc/code-patching: introduce patch_instructions()") use copy_to_kernel_nofault() to copy several instructions at once. But when using temporary mm the destination is not regular kernel memory but a kind of kernel-like memory located in user address space. Because it is not in kernel address space it is not covered by KASAN shadow memory. Since commit e4137f0 ("mm, kasan, kmsan: instrument copy_from/to_kernel_nofault") KASAN reports bad accesses from copy_to_kernel_nofault(). Here a bad access to user memory is reported because KASAN detects the lack of shadow memory and the address is below TASK_SIZE. Do like x86 in commit b3fd8e8 ("x86/alternatives: Use temporary mm for text poking") and disable KASAN reports during patching when using temporary mm. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Close: https://lore.kernel.org/all/20250201151435.48400261@yea/ Fixes: 465cabc ("powerpc/code-patching: introduce patch_instructions()") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/1c05b2a1b02ad75b981cfc45927e0b4a90441046.1738577687.git.christophe.leroy@csgroup.eu
rodrigovivi
pushed a commit
that referenced
this pull request
Mar 6, 2025
…/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.14, take #3 - Fix TCR_EL2 configuration to not use the ASID in TTBR1_EL2 and not mess-up T1SZ/PS by using the HCR_EL2.E2H==0 layout. - Bring back the VMID allocation to the vcpu_load phase, ensuring that we only setup VTTBR_EL2 once on VHE. This cures an ugly race that would lead to running with an unallocated VMID.
rodrigovivi
pushed a commit
that referenced
this pull request
Apr 3, 2025
…ate_pagetables'
This commit addresses a circular locking dependency in the
svm_range_cpu_invalidate_pagetables function. The function previously
held a lock while determining whether to perform an unmap or eviction
operation, which could lead to deadlocks.
Fixes the below:
[ 223.418794] ======================================================
[ 223.418820] WARNING: possible circular locking dependency detected
[ 223.418845] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE
[ 223.418869] ------------------------------------------------------
[ 223.418889] kfdtest/3939 is trying to acquire lock:
[ 223.418906] ffff8957552eae38 (&dqm->lock_hidden){+.+.}-{3:3}, at: evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.419302]
but task is already holding lock:
[ 223.419303] ffff8957556b83b0 (&prange->lock){+.+.}-{3:3}, at: svm_range_cpu_invalidate_pagetables+0x9d/0x850 [amdgpu]
[ 223.419447] Console: switching to colour dummy device 80x25
[ 223.419477] [IGT] amd_basic: executing
[ 223.419599]
which lock already depends on the new lock.
[ 223.419611]
the existing dependency chain (in reverse order) is:
[ 223.419621]
-> #2 (&prange->lock){+.+.}-{3:3}:
[ 223.419636] __mutex_lock+0x85/0xe20
[ 223.419647] mutex_lock_nested+0x1b/0x30
[ 223.419656] svm_range_validate_and_map+0x2f1/0x15b0 [amdgpu]
[ 223.419954] svm_range_set_attr+0xe8c/0x1710 [amdgpu]
[ 223.420236] svm_ioctl+0x46/0x50 [amdgpu]
[ 223.420503] kfd_ioctl_svm+0x50/0x90 [amdgpu]
[ 223.420763] kfd_ioctl+0x409/0x6d0 [amdgpu]
[ 223.421024] __x64_sys_ioctl+0x95/0xd0
[ 223.421036] x64_sys_call+0x1205/0x20d0
[ 223.421047] do_syscall_64+0x87/0x140
[ 223.421056] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 223.421068]
-> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 223.421084] __ww_mutex_lock.constprop.0+0xab/0x1560
[ 223.421095] ww_mutex_lock+0x2b/0x90
[ 223.421103] amdgpu_amdkfd_alloc_gtt_mem+0xcc/0x2b0 [amdgpu]
[ 223.421361] add_queue_mes+0x3bc/0x440 [amdgpu]
[ 223.421623] unhalt_cpsch+0x1ae/0x240 [amdgpu]
[ 223.421888] kgd2kfd_start_sched+0x5e/0xd0 [amdgpu]
[ 223.422148] amdgpu_amdkfd_start_sched+0x3d/0x50 [amdgpu]
[ 223.422414] amdgpu_gfx_enforce_isolation_handler+0x132/0x270 [amdgpu]
[ 223.422662] process_one_work+0x21e/0x680
[ 223.422673] worker_thread+0x190/0x330
[ 223.422682] kthread+0xe7/0x120
[ 223.422690] ret_from_fork+0x3c/0x60
[ 223.422699] ret_from_fork_asm+0x1a/0x30
[ 223.422708]
-> #0 (&dqm->lock_hidden){+.+.}-{3:3}:
[ 223.422723] __lock_acquire+0x16f4/0x2810
[ 223.422734] lock_acquire+0xd1/0x300
[ 223.422742] __mutex_lock+0x85/0xe20
[ 223.422751] mutex_lock_nested+0x1b/0x30
[ 223.422760] evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.423025] kfd_process_evict_queues+0x8a/0x1d0 [amdgpu]
[ 223.423285] kgd2kfd_quiesce_mm+0x43/0x90 [amdgpu]
[ 223.423540] svm_range_cpu_invalidate_pagetables+0x4a7/0x850 [amdgpu]
[ 223.423807] __mmu_notifier_invalidate_range_start+0x1f5/0x250
[ 223.423819] copy_page_range+0x1e94/0x1ea0
[ 223.423829] copy_process+0x172f/0x2ad0
[ 223.423839] kernel_clone+0x9c/0x3f0
[ 223.423847] __do_sys_clone+0x66/0x90
[ 223.423856] __x64_sys_clone+0x25/0x30
[ 223.423864] x64_sys_call+0x1d7c/0x20d0
[ 223.423872] do_syscall_64+0x87/0x140
[ 223.423880] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 223.423891]
other info that might help us debug this:
[ 223.423903] Chain exists of:
&dqm->lock_hidden --> reservation_ww_class_mutex --> &prange->lock
[ 223.423926] Possible unsafe locking scenario:
[ 223.423935] CPU0 CPU1
[ 223.423942] ---- ----
[ 223.423949] lock(&prange->lock);
[ 223.423958] lock(reservation_ww_class_mutex);
[ 223.423970] lock(&prange->lock);
[ 223.423981] lock(&dqm->lock_hidden);
[ 223.423990]
*** DEADLOCK ***
[ 223.423999] 5 locks held by kfdtest/3939:
[ 223.424006] #0: ffffffffb82b4fc0 (dup_mmap_sem){.+.+}-{0:0}, at: copy_process+0x1387/0x2ad0
[ 223.424026] #1: ffff89575eda81b0 (&mm->mmap_lock){++++}-{3:3}, at: copy_process+0x13a8/0x2ad0
[ 223.424046] #2: ffff89575edaf3b0 (&mm->mmap_lock/1){+.+.}-{3:3}, at: copy_process+0x13e4/0x2ad0
[ 223.424066] #3: ffffffffb82e76e0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: copy_page_range+0x1cea/0x1ea0
[ 223.424088] #4: ffff8957556b83b0 (&prange->lock){+.+.}-{3:3}, at: svm_range_cpu_invalidate_pagetables+0x9d/0x850 [amdgpu]
[ 223.424365]
stack backtrace:
[ 223.424374] CPU: 0 UID: 0 PID: 3939 Comm: kfdtest Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14
[ 223.424392] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 223.424401] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO WIFI/X570 AORUS PRO WIFI, BIOS F36a 02/16/2022
[ 223.424416] Call Trace:
[ 223.424423] <TASK>
[ 223.424430] dump_stack_lvl+0x9b/0xf0
[ 223.424441] dump_stack+0x10/0x20
[ 223.424449] print_circular_bug+0x275/0x350
[ 223.424460] check_noncircular+0x157/0x170
[ 223.424469] ? __bfs+0xfd/0x2c0
[ 223.424481] __lock_acquire+0x16f4/0x2810
[ 223.424490] ? srso_return_thunk+0x5/0x5f
[ 223.424505] lock_acquire+0xd1/0x300
[ 223.424514] ? evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.424783] __mutex_lock+0x85/0xe20
[ 223.424792] ? evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.425058] ? srso_return_thunk+0x5/0x5f
[ 223.425067] ? mark_held_locks+0x54/0x90
[ 223.425076] ? evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.425339] ? srso_return_thunk+0x5/0x5f
[ 223.425350] mutex_lock_nested+0x1b/0x30
[ 223.425358] ? mutex_lock_nested+0x1b/0x30
[ 223.425367] evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.425631] kfd_process_evict_queues+0x8a/0x1d0 [amdgpu]
[ 223.425893] kgd2kfd_quiesce_mm+0x43/0x90 [amdgpu]
[ 223.426156] svm_range_cpu_invalidate_pagetables+0x4a7/0x850 [amdgpu]
[ 223.426423] ? srso_return_thunk+0x5/0x5f
[ 223.426436] __mmu_notifier_invalidate_range_start+0x1f5/0x250
[ 223.426450] copy_page_range+0x1e94/0x1ea0
[ 223.426461] ? srso_return_thunk+0x5/0x5f
[ 223.426474] ? srso_return_thunk+0x5/0x5f
[ 223.426484] ? lock_acquire+0xd1/0x300
[ 223.426494] ? copy_process+0x1718/0x2ad0
[ 223.426502] ? srso_return_thunk+0x5/0x5f
[ 223.426510] ? sched_clock_noinstr+0x9/0x10
[ 223.426519] ? local_clock_noinstr+0xe/0xc0
[ 223.426528] ? copy_process+0x1718/0x2ad0
[ 223.426537] ? srso_return_thunk+0x5/0x5f
[ 223.426550] copy_process+0x172f/0x2ad0
[ 223.426569] kernel_clone+0x9c/0x3f0
[ 223.426577] ? __schedule+0x4c9/0x1b00
[ 223.426586] ? srso_return_thunk+0x5/0x5f
[ 223.426594] ? sched_clock_noinstr+0x9/0x10
[ 223.426602] ? srso_return_thunk+0x5/0x5f
[ 223.426610] ? local_clock_noinstr+0xe/0xc0
[ 223.426619] ? schedule+0x107/0x1a0
[ 223.426629] __do_sys_clone+0x66/0x90
[ 223.426643] __x64_sys_clone+0x25/0x30
[ 223.426652] x64_sys_call+0x1d7c/0x20d0
[ 223.426661] do_syscall_64+0x87/0x140
[ 223.426671] ? srso_return_thunk+0x5/0x5f
[ 223.426679] ? common_nsleep+0x44/0x50
[ 223.426690] ? srso_return_thunk+0x5/0x5f
[ 223.426698] ? trace_hardirqs_off+0x52/0xd0
[ 223.426709] ? srso_return_thunk+0x5/0x5f
[ 223.426717] ? syscall_exit_to_user_mode+0xcc/0x200
[ 223.426727] ? srso_return_thunk+0x5/0x5f
[ 223.426736] ? do_syscall_64+0x93/0x140
[ 223.426748] ? srso_return_thunk+0x5/0x5f
[ 223.426756] ? up_write+0x1c/0x1e0
[ 223.426765] ? srso_return_thunk+0x5/0x5f
[ 223.426775] ? srso_return_thunk+0x5/0x5f
[ 223.426783] ? trace_hardirqs_off+0x52/0xd0
[ 223.426792] ? srso_return_thunk+0x5/0x5f
[ 223.426800] ? syscall_exit_to_user_mode+0xcc/0x200
[ 223.426810] ? srso_return_thunk+0x5/0x5f
[ 223.426818] ? do_syscall_64+0x93/0x140
[ 223.426826] ? syscall_exit_to_user_mode+0xcc/0x200
[ 223.426836] ? srso_return_thunk+0x5/0x5f
[ 223.426844] ? do_syscall_64+0x93/0x140
[ 223.426853] ? srso_return_thunk+0x5/0x5f
[ 223.426861] ? irqentry_exit+0x6b/0x90
[ 223.426869] ? srso_return_thunk+0x5/0x5f
[ 223.426877] ? exc_page_fault+0xa7/0x2c0
[ 223.426888] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 223.426898] RIP: 0033:0x7f46758eab57
[ 223.426906] Code: ba 04 00 f3 0f 1e fa 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 41 89 c0 85 c0 75 2c 64 48 8b 04 25 10 00
[ 223.426930] RSP: 002b:00007fff5c3e5188 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[ 223.426943] RAX: ffffffffffffffda RBX: 00007f4675f8c040 RCX: 00007f46758eab57
[ 223.426954] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[ 223.426965] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 223.426975] R10: 00007f4675e81a50 R11: 0000000000000246 R12: 0000000000000001
[ 223.426986] R13: 00007fff5c3e5470 R14: 00007fff5c3e53e0 R15: 00007fff5c3e5410
[ 223.427004] </TASK>
v2: To resolve this issue, the allocation of the process context buffer
(`proc_ctx_bo`) has been moved from the `add_queue_mes` function to the
`pqm_create_queue` function. This change ensures that the buffer is
allocated only when the first queue for a process is created and only if
the Micro Engine Scheduler (MES) is enabled. (Felix)
v3: Fix typo s/Memory Execution Scheduler (MES)/Micro Engine Scheduler
in commit message. (Lijo)
Fixes: 438b39a ("drm/amdkfd: pause autosuspend when creating pdd")
Cc: Jesse Zhang <jesse.zhang@amd.com>
Cc: Yunxiang Li <Yunxiang.Li@amd.com>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
rodrigovivi
pushed a commit
that referenced
this pull request
Apr 3, 2025
Use raw_spinlock in order to fix spurious messages about invalid context
when spinlock debugging is enabled. The lock is only used to serialize
register access.
[ 4.239592] =============================
[ 4.239595] [ BUG: Invalid wait context ]
[ 4.239599] 6.13.0-rc7-arm64-renesas-05496-gd088502a519f #35 Not tainted
[ 4.239603] -----------------------------
[ 4.239606] kworker/u8:5/76 is trying to lock:
[ 4.239609] ffff0000091898a0 (&p->lock){....}-{3:3}, at: gpio_rcar_config_interrupt_input_mode+0x34/0x164
[ 4.239641] other info that might help us debug this:
[ 4.239643] context-{5:5}
[ 4.239646] 5 locks held by kworker/u8:5/76:
[ 4.239651] #0: ffff0000080fb148 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x190/0x62c
[ 4.250180] OF: /soc/sound@ec500000/ports/port@0/endpoint: Read of boolean property 'frame-master' with a value.
[ 4.254094] #1: ffff80008299bd80 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1b8/0x62c
[ 4.254109] #2: ffff00000920c8f8
[ 4.258345] OF: /soc/sound@ec500000/ports/port@1/endpoint: Read of boolean property 'bitclock-master' with a value.
[ 4.264803] (&dev->mutex){....}-{4:4}, at: __device_attach_async_helper+0x3c/0xdc
[ 4.264820] #3: ffff00000a50ca40 (request_class#2){+.+.}-{4:4}, at: __setup_irq+0xa0/0x690
[ 4.264840] #4:
[ 4.268872] OF: /soc/sound@ec500000/ports/port@1/endpoint: Read of boolean property 'frame-master' with a value.
[ 4.273275] ffff00000a50c8c8 (lock_class){....}-{2:2}, at: __setup_irq+0xc4/0x690
[ 4.296130] renesas_sdhi_internal_dmac ee100000.mmc: mmc1 base at 0x00000000ee100000, max clock rate 200 MHz
[ 4.304082] stack backtrace:
[ 4.304086] CPU: 1 UID: 0 PID: 76 Comm: kworker/u8:5 Not tainted 6.13.0-rc7-arm64-renesas-05496-gd088502a519f #35
[ 4.304092] Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
[ 4.304097] Workqueue: async async_run_entry_fn
[ 4.304106] Call trace:
[ 4.304110] show_stack+0x14/0x20 (C)
[ 4.304122] dump_stack_lvl+0x6c/0x90
[ 4.304131] dump_stack+0x14/0x1c
[ 4.304138] __lock_acquire+0xdfc/0x1584
[ 4.426274] lock_acquire+0x1c4/0x33c
[ 4.429942] _raw_spin_lock_irqsave+0x5c/0x80
[ 4.434307] gpio_rcar_config_interrupt_input_mode+0x34/0x164
[ 4.440061] gpio_rcar_irq_set_type+0xd4/0xd8
[ 4.444422] __irq_set_trigger+0x5c/0x178
[ 4.448435] __setup_irq+0x2e4/0x690
[ 4.452012] request_threaded_irq+0xc4/0x190
[ 4.456285] devm_request_threaded_irq+0x7c/0xf4
[ 4.459398] ata1: link resume succeeded after 1 retries
[ 4.460902] mmc_gpiod_request_cd_irq+0x68/0xe0
[ 4.470660] mmc_start_host+0x50/0xac
[ 4.474327] mmc_add_host+0x80/0xe4
[ 4.477817] tmio_mmc_host_probe+0x2b0/0x440
[ 4.482094] renesas_sdhi_probe+0x488/0x6f4
[ 4.486281] renesas_sdhi_internal_dmac_probe+0x60/0x78
[ 4.491509] platform_probe+0x64/0xd8
[ 4.495178] really_probe+0xb8/0x2a8
[ 4.498756] __driver_probe_device+0x74/0x118
[ 4.503116] driver_probe_device+0x3c/0x154
[ 4.507303] __device_attach_driver+0xd4/0x160
[ 4.511750] bus_for_each_drv+0x84/0xe0
[ 4.515588] __device_attach_async_helper+0xb0/0xdc
[ 4.520470] async_run_entry_fn+0x30/0xd8
[ 4.524481] process_one_work+0x210/0x62c
[ 4.528494] worker_thread+0x1ac/0x340
[ 4.532245] kthread+0x10c/0x110
[ 4.535476] ret_from_fork+0x10/0x20
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250121135833.3769310-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
rodrigovivi
pushed a commit
that referenced
this pull request
Apr 3, 2025
…cal section
A circular lock dependency splat has been seen involving down_trylock():
======================================================
WARNING: possible circular locking dependency detected
6.12.0-41.el10.s390x+debug
------------------------------------------------------
dd/32479 is trying to acquire lock:
0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
but task is already holding lock:
000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0
the existing dependency chain (in reverse order) is:
-> #4 (&zone->lock){-.-.}-{2:2}:
-> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
-> #2 (&rq->__lock){-.-.}-{2:2}:
-> #1 (&p->pi_lock){-.-.}-{2:2}:
-> #0 ((console_sem).lock){-.-.}-{2:2}:
The console_sem -> pi_lock dependency is due to calling try_to_wake_up()
while holding the console_sem raw_spinlock. This dependency can be broken
by using wake_q to do the wakeup instead of calling try_to_wake_up()
under the console_sem lock. This will also make the semaphore's
raw_spinlock become a terminal lock without taking any further locks
underneath it.
The hrtimer_bases.lock is a raw_spinlock while zone->lock is a
spinlock. The hrtimer_bases.lock -> zone->lock dependency happens via
the debug_objects_fill_pool() helper function in the debugobjects code.
-> #4 (&zone->lock){-.-.}-{2:2}:
__lock_acquire+0xe86/0x1cc0
lock_acquire.part.0+0x258/0x630
lock_acquire+0xb8/0xe0
_raw_spin_lock_irqsave+0xb4/0x120
rmqueue_bulk+0xac/0x8f0
__rmqueue_pcplist+0x580/0x830
rmqueue_pcplist+0xfc/0x470
rmqueue.isra.0+0xdec/0x11b0
get_page_from_freelist+0x2ee/0xeb0
__alloc_pages_noprof+0x2c2/0x520
alloc_pages_mpol_noprof+0x1fc/0x4d0
alloc_pages_noprof+0x8c/0xe0
allocate_slab+0x320/0x460
___slab_alloc+0xa58/0x12b0
__slab_alloc.isra.0+0x42/0x60
kmem_cache_alloc_noprof+0x304/0x350
fill_pool+0xf6/0x450
debug_object_activate+0xfe/0x360
enqueue_hrtimer+0x34/0x190
__run_hrtimer+0x3c8/0x4c0
__hrtimer_run_queues+0x1b2/0x260
hrtimer_interrupt+0x316/0x760
do_IRQ+0x9a/0xe0
do_irq_async+0xf6/0x160
Normally a raw_spinlock to spinlock dependency is not legitimate
and will be warned if CONFIG_PROVE_RAW_LOCK_NESTING is enabled,
but debug_objects_fill_pool() is an exception as it explicitly
allows this dependency for non-PREEMPT_RT kernel without causing
PROVE_RAW_LOCK_NESTING lockdep splat. As a result, this dependency is
legitimate and not a bug.
Anyway, semaphore is the only locking primitive left that is still
using try_to_wake_up() to do wakeup inside critical section, all the
other locking primitives had been migrated to use wake_q to do wakeup
outside of the critical section. It is also possible that there are
other circular locking dependencies involving printk/console_sem or
other existing/new semaphores lurking somewhere which may show up in
the future. Let just do the migration now to wake_q to avoid headache
like this.
Reported-by: yzbot+ed801a886dfdbfe7136d@syzkaller.appspotmail.com
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250307232717.1759087-3-boqun.feng@gmail.com
rodrigovivi
pushed a commit
that referenced
this pull request
Apr 3, 2025
When mb-xdp is set and return is XDP_PASS, packet is converted from xdp_buff to sk_buff with xdp_update_skb_shared_info() in bnxt_xdp_build_skb(). bnxt_xdp_build_skb() passes incorrect truesize argument to xdp_update_skb_shared_info(). The truesize is calculated as BNXT_RX_PAGE_SIZE * sinfo->nr_frags but the skb_shared_info was wiped by napi_build_skb() before. So it stores sinfo->nr_frags before bnxt_xdp_build_skb() and use it instead of getting skb_shared_info from xdp_get_shared_info_from_buff(). Splat looks like: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 0 at net/core/skbuff.c:6072 skb_try_coalesce+0x504/0x590 Modules linked in: xt_nat xt_tcpudp veth af_packet xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xfrm_user xt_addrtype nft_coms CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Not tainted 6.14.0-rc2+ #3 RIP: 0010:skb_try_coalesce+0x504/0x590 Code: 4b fd ff ff 49 8b 34 24 40 80 e6 40 0f 84 3d fd ff ff 49 8b 74 24 48 40 f6 c6 01 0f 84 2e fd ff ff 48 8d 4e ff e9 25 fd ff ff <0f> 0b e99 RSP: 0018:ffffb62c4120caa8 EFLAGS: 00010287 RAX: 0000000000000003 RBX: ffffb62c4120cb14 RCX: 0000000000000ec0 RDX: 0000000000001000 RSI: ffffa06e5d7dc000 RDI: 0000000000000003 RBP: ffffa06e5d7ddec0 R08: ffffa06e6120a800 R09: ffffa06e7a119900 R10: 0000000000002310 R11: ffffa06e5d7dcec0 R12: ffffe4360575f740 R13: ffffe43600000000 R14: 0000000000000002 R15: 0000000000000002 FS: 0000000000000000(0000) GS:ffffa0755f700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f147b76b0f8 CR3: 00000001615d4000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <IRQ> ? __warn+0x84/0x130 ? skb_try_coalesce+0x504/0x590 ? report_bug+0x18a/0x1a0 ? handle_bug+0x53/0x90 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? skb_try_coalesce+0x504/0x590 inet_frag_reasm_finish+0x11f/0x2e0 ip_defrag+0x37a/0x900 ip_local_deliver+0x51/0x120 ip_sublist_rcv_finish+0x64/0x70 ip_sublist_rcv+0x179/0x210 ip_list_rcv+0xf9/0x130 How to reproduce: <Node A> ip link set $interface1 xdp obj xdp_pass.o ip link set $interface1 mtu 9000 up ip a a 10.0.0.1/24 dev $interface1 <Node B> ip link set $interfac2 mtu 9000 up ip a a 10.0.0.2/24 dev $interface2 ping 10.0.0.1 -s 65000 Following ping.py patch adds xdp-mb-pass case. so ping.py is going to be able to reproduce this issue. Fixes: 1dc4c55 ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250309134219.91670-2-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rodrigovivi
pushed a commit
that referenced
this pull request
Apr 3, 2025
We have recently seen report of lockdep circular lock dependency warnings on platforms like skykale and kabylake: ====================================================== WARNING: possible circular locking dependency detected 6.14.0-rc6-CI_DRM_16276-gca2c04fe76e8+ #1 Not tainted ------------------------------------------------------ swapper/0/1 is trying to acquire lock: ffffffff8360ee48 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70 but task is already holding lock: ffff888102c7efa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xe75/0x11f0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #6 (&device->physical_node_lock){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 intel_iommu_init+0xe75/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #5 (dmar_global_lock){++++}-{3:3}: down_read+0x43/0x1d0 enable_drhd_fault_handling+0x21/0x110 cpuhp_invoke_callback+0x4c6/0x870 cpuhp_issue_call+0xbf/0x1f0 __cpuhp_setup_state_cpuslocked+0x111/0x320 __cpuhp_setup_state+0xb0/0x220 irq_remap_enable_fault_handling+0x3f/0xa0 apic_intr_mode_init+0x5c/0x110 x86_late_time_init+0x24/0x40 start_kernel+0x895/0xbd0 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0xbf/0x110 common_startup_64+0x13e/0x141 -> #4 (cpuhp_state_mutex){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 __cpuhp_setup_state_cpuslocked+0x67/0x320 __cpuhp_setup_state+0xb0/0x220 page_alloc_init_cpuhp+0x2d/0x60 mm_core_init+0x18/0x2c0 start_kernel+0x576/0xbd0 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0xbf/0x110 common_startup_64+0x13e/0x141 -> #3 (cpu_hotplug_lock){++++}-{0:0}: __cpuhp_state_add_instance+0x4f/0x220 iova_domain_init_rcaches+0x214/0x280 iommu_setup_dma_ops+0x1a4/0x710 iommu_device_register+0x17d/0x260 intel_iommu_init+0xda4/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #2 (&domain->iova_cookie->mutex){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 iommu_setup_dma_ops+0x16b/0x710 iommu_device_register+0x17d/0x260 intel_iommu_init+0xda4/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #1 (&group->mutex){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 __iommu_probe_device+0x24c/0x4e0 probe_iommu_group+0x2b/0x50 bus_for_each_dev+0x7d/0xe0 iommu_device_register+0xe1/0x260 intel_iommu_init+0xda4/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #0 (iommu_probe_device_lock){+.+.}-{3:3}: __lock_acquire+0x1637/0x2810 lock_acquire+0xc9/0x300 __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 iommu_probe_device+0x1d/0x70 intel_iommu_init+0xe90/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 other info that might help us debug this: Chain exists of: iommu_probe_device_lock --> dmar_global_lock --> &device->physical_node_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&device->physical_node_lock); lock(dmar_global_lock); lock(&device->physical_node_lock); lock(iommu_probe_device_lock); *** DEADLOCK *** This driver uses a global lock to protect the list of enumerated DMA remapping units. It is necessary due to the driver's support for dynamic addition and removal of remapping units at runtime. Two distinct code paths require iteration over this remapping unit list: - Device registration and probing: the driver iterates the list to register each remapping unit with the upper layer IOMMU framework and subsequently probe the devices managed by that unit. - Global configuration: Upper layer components may also iterate the list to apply configuration changes. The lock acquisition order between these two code paths was reversed. This caused lockdep warnings, indicating a risk of deadlock. Fix this warning by releasing the global lock before invoking upper layer interfaces for device registration. Fixes: b150654 ("iommu/vt-d: Fix suspicious RCU usage") Closes: https://lore.kernel.org/linux-iommu/SJ1PR11MB612953431F94F18C954C4A9CB9D32@SJ1PR11MB6129.namprd11.prod.outlook.com/ Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> [ for topic/core-for-CI only ] References: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13927 Link: https://lore.kernel.org/r/20250317035714.1041549-1-baolu.lu@linux.intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 9, 2025
The existing code move the VF NIC to new namespace when NETDEV_REGISTER is received on netvsc NIC. During deletion of the namespace, default_device_exit_batch() >> default_device_exit_net() is called. When netvsc NIC is moved back and registered to the default namespace, it automatically brings VF NIC back to the default namespace. This will cause the default_device_exit_net() >> for_each_netdev_safe loop unable to detect the list end, and hit NULL ptr: [ 231.449420] mana 7870:00:00.0 enP30832s1: Moved VF to namespace with: eth0 [ 231.449656] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 231.450246] #PF: supervisor read access in kernel mode [ 231.450579] #PF: error_code(0x0000) - not-present page [ 231.450916] PGD 17b8a8067 P4D 0 [ 231.451163] Oops: Oops: 0000 [#1] SMP NOPTI [ 231.451450] CPU: 82 UID: 0 PID: 1394 Comm: kworker/u768:1 Not tainted 6.16.0-rc4+ #3 VOLUNTARY [ 231.452042] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024 [ 231.452692] Workqueue: netns cleanup_net [ 231.452947] RIP: 0010:default_device_exit_batch+0x16c/0x3f0 [ 231.453326] Code: c0 0c f5 b3 e8 d5 db fe ff 48 85 c0 74 15 48 c7 c2 f8 fd ca b2 be 10 00 00 00 48 8d 7d c0 e8 7b 77 25 00 49 8b 86 28 01 00 00 <48> 8b 50 10 4c 8b 2a 4c 8d 62 f0 49 83 ed 10 4c 39 e0 0f 84 d6 00 [ 231.454294] RSP: 0018:ff75fc7c9bf9fd00 EFLAGS: 00010246 [ 231.454610] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 61c8864680b583eb [ 231.455094] RDX: ff1fa9f71462d800 RSI: ff75fc7c9bf9fd38 RDI: 0000000030766564 [ 231.455686] RBP: ff75fc7c9bf9fd78 R08: 0000000000000000 R09: 0000000000000000 [ 231.456126] R10: 0000000000000001 R11: 0000000000000004 R12: ff1fa9f70088e340 [ 231.456621] R13: ff1fa9f70088e340 R14: ffffffffb3f50c20 R15: ff1fa9f7103e6340 [ 231.457161] FS: 0000000000000000(0000) GS:ff1faa6783a08000(0000) knlGS:0000000000000000 [ 231.457707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 231.458031] CR2: 0000000000000010 CR3: 0000000179ab2006 CR4: 0000000000b73ef0 [ 231.458434] Call Trace: [ 231.458600] <TASK> [ 231.458777] ops_undo_list+0x100/0x220 [ 231.459015] cleanup_net+0x1b8/0x300 [ 231.459285] process_one_work+0x184/0x340 To fix it, move the ns change to a workqueue, and take rtnl_lock to avoid changing the netdev list when default_device_exit_net() is using it. Cc: stable@vger.kernel.org Fixes: 4c26280 ("hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1754511711-11188-1-git-send-email-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 9, 2025
…dlock
When a user creates a dualpi2 qdisc it automatically sets a timer. This
timer will run constantly and update the qdisc's probability field.
The issue is that the timer acquires the qdisc root lock and runs in
hardirq. The qdisc root lock is also acquired in dev.c whenever a packet
arrives for this qdisc. Since the dualpi2 timer callback runs in hardirq,
it may interrupt the packet processing running in softirq. If that happens
and it runs on the same CPU, it will acquire the same lock and cause a
deadlock. The following splat shows up when running a kernel compiled with
lock debugging:
[ +0.000224] WARNING: inconsistent lock state
[ +0.000224] 6.16.0+ #10 Not tainted
[ +0.000169] --------------------------------
[ +0.000029] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ +0.000000] ping/156 [HC0[0]:SC0[2]:HE1:SE0] takes:
[ +0.000000] ffff897841242110 (&sch->root_lock_key){?.-.}-{3:3}, at: __dev_queue_xmit+0x86d/0x1140
[ +0.000000] {IN-HARDIRQ-W} state was registered at:
[ +0.000000] lock_acquire.part.0+0xb6/0x220
[ +0.000000] _raw_spin_lock+0x31/0x80
[ +0.000000] dualpi2_timer+0x6f/0x270
[ +0.000000] __hrtimer_run_queues+0x1c5/0x360
[ +0.000000] hrtimer_interrupt+0x115/0x260
[ +0.000000] __sysvec_apic_timer_interrupt+0x6d/0x1a0
[ +0.000000] sysvec_apic_timer_interrupt+0x6e/0x80
[ +0.000000] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ +0.000000] pv_native_safe_halt+0xf/0x20
[ +0.000000] default_idle+0x9/0x10
[ +0.000000] default_idle_call+0x7e/0x1e0
[ +0.000000] do_idle+0x1e8/0x250
[ +0.000000] cpu_startup_entry+0x29/0x30
[ +0.000000] rest_init+0x151/0x160
[ +0.000000] start_kernel+0x6f3/0x700
[ +0.000000] x86_64_start_reservations+0x24/0x30
[ +0.000000] x86_64_start_kernel+0xc8/0xd0
[ +0.000000] common_startup_64+0x13e/0x148
[ +0.000000] irq event stamp: 6884
[ +0.000000] hardirqs last enabled at (6883): [<ffffffffa75700b3>] neigh_resolve_output+0x223/0x270
[ +0.000000] hardirqs last disabled at (6882): [<ffffffffa7570078>] neigh_resolve_output+0x1e8/0x270
[ +0.000000] softirqs last enabled at (6880): [<ffffffffa757006b>] neigh_resolve_output+0x1db/0x270
[ +0.000000] softirqs last disabled at (6884): [<ffffffffa755b533>] __dev_queue_xmit+0x73/0x1140
[ +0.000000]
other info that might help us debug this:
[ +0.000000] Possible unsafe locking scenario:
[ +0.000000] CPU0
[ +0.000000] ----
[ +0.000000] lock(&sch->root_lock_key);
[ +0.000000] <Interrupt>
[ +0.000000] lock(&sch->root_lock_key);
[ +0.000000]
*** DEADLOCK ***
[ +0.000000] 4 locks held by ping/156:
[ +0.000000] #0: ffff897842332e08 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x41e/0xf40
[ +0.000000] #1: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_output+0x2c/0x190
[ +0.000000] #2: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_finish_output2+0xad/0x950
[ +0.000000] #3: ffffffffa816f840 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x73/0x1140
I am able to reproduce it consistently when running the following:
tc qdisc add dev lo handle 1: root dualpi2
ping -f 127.0.0.1
To fix it, make the timer run in softirq.
Fixes: 320d031 ("sched: Struct definition and parsing of dualpi2 qdisc")
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250815135317.664993-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 9, 2025
In the snd_utimer_create() function, if the kasprintf() function return NULL, snd_utimer_put_id() will be called, finally use ida_free() to free the unallocated id 0. the syzkaller reported the following information: ------------[ cut here ]------------ ida_free called for id=0 which is not allocated. WARNING: CPU: 1 PID: 1286 at lib/idr.c:592 ida_free+0x1fd/0x2f0 lib/idr.c:592 Modules linked in: CPU: 1 UID: 0 PID: 1286 Comm: syz-executor164 Not tainted 6.15.8 #3 PREEMPT(lazy) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.fc42 04/01/2014 RIP: 0010:ida_free+0x1fd/0x2f0 lib/idr.c:592 Code: f8 fc 41 83 fc 3e 76 69 e8 70 b2 f8 (...) RSP: 0018:ffffc900007f79c8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 1ffff920000fef3b RCX: ffffffff872176a5 RDX: ffff88800369d200 RSI: 0000000000000000 RDI: ffff88800369d200 RBP: 0000000000000000 R08: ffffffff87ba60a5 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f6f1abc1740(0000) GS:ffff8880d76a0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6f1ad7a784 CR3: 000000007a6e2000 CR4: 00000000000006f0 Call Trace: <TASK> snd_utimer_put_id sound/core/timer.c:2043 [inline] [snd_timer] snd_utimer_create+0x59b/0x6a0 sound/core/timer.c:2184 [snd_timer] snd_utimer_ioctl_create sound/core/timer.c:2202 [inline] [snd_timer] __snd_timer_user_ioctl.isra.0+0x724/0x1340 sound/core/timer.c:2287 [snd_timer] snd_timer_user_ioctl+0x75/0xc0 sound/core/timer.c:2298 [snd_timer] vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl fs/ioctl.c:893 [inline] __x64_sys_ioctl+0x198/0x200 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x7b/0x160 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e [...] The utimer->id should be set properly before the kasprintf() function, ensures the snd_utimer_put_id() function will free the allocated id. Fixes: 3774591 ("ALSA: timer: Introduce virtual userspace-driven timers") Signed-off-by: Dewei Meng <mengdewei@cqsoftware.com.cn> Link: https://patch.msgid.link/20250821014317.40786-1-mengdewei@cqsoftware.com.cn Signed-off-by: Takashi Iwai <tiwai@suse.de>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 9, 2025
Commit 3c7ac40 ("scsi: ufs: core: Delegate the interrupt service routine to a threaded IRQ handler") introduced an IRQ lock inversion issue. Fix this lock inversion by changing the spin_lock_irq() calls into spin_lock_irqsave() calls in code that can be called either from interrupt context or from thread context. This patch fixes the following lockdep complaint: WARNING: possible irq lock inversion dependency detected 6.12.30-android16-5-maybe-dirty-4k #1 Tainted: G W OE -------------------------------------------------------- kworker/u28:0/12 just changed the state of lock: ffffff881e29dd60 (&hba->clk_gating.lock){-...}-{2:2}, at: ufshcd_release_scsi_cmd+0x60/0x110 but this lock took another, HARDIRQ-unsafe lock in the past: (shost->host_lock){+.+.}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(shost->host_lock); local_irq_disable(); lock(&hba->clk_gating.lock); lock(shost->host_lock); <Interrupt> lock(&hba->clk_gating.lock); *** DEADLOCK *** 4 locks held by kworker/u28:0/12: #0: ffffff8800ac6158 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x1bc/0x65c #1: ffffffc085c93d70 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1e4/0x65c #2: ffffff881e29c0e0 (&shost->scan_mutex){+.+.}-{3:3}, at: __scsi_add_device+0x74/0x120 #3: ffffff881960ea00 (&hwq->cq_lock){-...}-{2:2}, at: ufshcd_mcq_poll_cqe_lock+0x28/0x104 the shortest dependencies between 2nd lock and 1st lock: -> (shost->host_lock){+.+.}-{2:2} { HARDIRQ-ON-W at: lock_acquire+0x134/0x2b4 _raw_spin_lock+0x48/0x64 ufshcd_sl_intr+0x4c/0xa08 ufshcd_threaded_intr+0x70/0x12c irq_thread_fn+0x48/0xa8 irq_thread+0x130/0x1ec kthread+0x110/0x134 ret_from_fork+0x10/0x20 SOFTIRQ-ON-W at: lock_acquire+0x134/0x2b4 _raw_spin_lock+0x48/0x64 ufshcd_sl_intr+0x4c/0xa08 ufshcd_threaded_intr+0x70/0x12c irq_thread_fn+0x48/0xa8 irq_thread+0x130/0x1ec kthread+0x110/0x134 ret_from_fork+0x10/0x20 INITIAL USE at: lock_acquire+0x134/0x2b4 _raw_spin_lock+0x48/0x64 ufshcd_sl_intr+0x4c/0xa08 ufshcd_threaded_intr+0x70/0x12c irq_thread_fn+0x48/0xa8 irq_thread+0x130/0x1ec kthread+0x110/0x134 ret_from_fork+0x10/0x20 } ... key at: [<ffffffc085ba1a98>] scsi_host_alloc.__key+0x0/0x10 ... acquired at: _raw_spin_lock_irqsave+0x5c/0x80 __ufshcd_release+0x78/0x118 ufshcd_send_uic_cmd+0xe4/0x118 ufshcd_dme_set_attr+0x88/0x1c8 ufs_google_phy_initialization+0x68/0x418 [ufs] ufs_google_link_startup_notify+0x78/0x27c [ufs] ufshcd_link_startup+0x84/0x720 ufshcd_init+0xf3c/0x1330 ufshcd_pltfrm_init+0x728/0x7d8 ufs_google_probe+0x30/0x84 [ufs] platform_probe+0xa0/0xe0 really_probe+0x114/0x454 __driver_probe_device+0xa4/0x160 driver_probe_device+0x44/0x23c __driver_attach_async_helper+0x60/0xd4 async_run_entry_fn+0x4c/0x17c process_one_work+0x26c/0x65c worker_thread+0x33c/0x498 kthread+0x110/0x134 ret_from_fork+0x10/0x20 -> (&hba->clk_gating.lock){-...}-{2:2} { IN-HARDIRQ-W at: lock_acquire+0x134/0x2b4 _raw_spin_lock_irqsave+0x5c/0x80 ufshcd_release_scsi_cmd+0x60/0x110 ufshcd_compl_one_cqe+0x2c0/0x3f4 ufshcd_mcq_poll_cqe_lock+0xb0/0x104 ufs_google_mcq_intr+0x80/0xa0 [ufs] __handle_irq_event_percpu+0x104/0x32c handle_irq_event+0x40/0x9c handle_fasteoi_irq+0x170/0x2e8 generic_handle_domain_irq+0x58/0x80 gic_handle_irq+0x48/0x104 call_on_irq_stack+0x3c/0x50 do_interrupt_handler+0x7c/0xd8 el1_interrupt+0x34/0x58 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x68/0x6c _raw_spin_unlock_irqrestore+0x3c/0x6c debug_object_assert_init+0x16c/0x21c __mod_timer+0x4c/0x48c schedule_timeout+0xd4/0x16c io_schedule_timeout+0x48/0x70 do_wait_for_common+0x100/0x194 wait_for_completion_io_timeout+0x48/0x6c blk_execute_rq+0x124/0x17c scsi_execute_cmd+0x18c/0x3f8 scsi_probe_and_add_lun+0x204/0xd74 __scsi_add_device+0xbc/0x120 ufshcd_async_scan+0x80/0x3c0 async_run_entry_fn+0x4c/0x17c process_one_work+0x26c/0x65c worker_thread+0x33c/0x498 kthread+0x110/0x134 ret_from_fork+0x10/0x20 INITIAL USE at: lock_acquire+0x134/0x2b4 _raw_spin_lock_irqsave+0x5c/0x80 ufshcd_hold+0x34/0x14c ufshcd_send_uic_cmd+0x28/0x118 ufshcd_dme_set_attr+0x88/0x1c8 ufs_google_phy_initialization+0x68/0x418 [ufs] ufs_google_link_startup_notify+0x78/0x27c [ufs] ufshcd_link_startup+0x84/0x720 ufshcd_init+0xf3c/0x1330 ufshcd_pltfrm_init+0x728/0x7d8 ufs_google_probe+0x30/0x84 [ufs] platform_probe+0xa0/0xe0 really_probe+0x114/0x454 __driver_probe_device+0xa4/0x160 driver_probe_device+0x44/0x23c __driver_attach_async_helper+0x60/0xd4 async_run_entry_fn+0x4c/0x17c process_one_work+0x26c/0x65c worker_thread+0x33c/0x498 kthread+0x110/0x134 ret_from_fork+0x10/0x20 } ... key at: [<ffffffc085ba6fe8>] ufshcd_init.__key+0x0/0x10 ... acquired at: mark_lock+0x1c4/0x224 __lock_acquire+0x438/0x2e1c lock_acquire+0x134/0x2b4 _raw_spin_lock_irqsave+0x5c/0x80 ufshcd_release_scsi_cmd+0x60/0x110 ufshcd_compl_one_cqe+0x2c0/0x3f4 ufshcd_mcq_poll_cqe_lock+0xb0/0x104 ufs_google_mcq_intr+0x80/0xa0 [ufs] __handle_irq_event_percpu+0x104/0x32c handle_irq_event+0x40/0x9c handle_fasteoi_irq+0x170/0x2e8 generic_handle_domain_irq+0x58/0x80 gic_handle_irq+0x48/0x104 call_on_irq_stack+0x3c/0x50 do_interrupt_handler+0x7c/0xd8 el1_interrupt+0x34/0x58 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x68/0x6c _raw_spin_unlock_irqrestore+0x3c/0x6c debug_object_assert_init+0x16c/0x21c __mod_timer+0x4c/0x48c schedule_timeout+0xd4/0x16c io_schedule_timeout+0x48/0x70 do_wait_for_common+0x100/0x194 wait_for_completion_io_timeout+0x48/0x6c blk_execute_rq+0x124/0x17c scsi_execute_cmd+0x18c/0x3f8 scsi_probe_and_add_lun+0x204/0xd74 __scsi_add_device+0xbc/0x120 ufshcd_async_scan+0x80/0x3c0 async_run_entry_fn+0x4c/0x17c process_one_work+0x26c/0x65c worker_thread+0x33c/0x498 kthread+0x110/0x134 ret_from_fork+0x10/0x20 stack backtrace: CPU: 6 UID: 0 PID: 12 Comm: kworker/u28:0 Tainted: G W OE 6.12.30-android16-5-maybe-dirty-4k #1 ccd4020fe444bdf629efc3b86df6be920b8df7d0 Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: Spacecraft board based on MALIBU (DT) Workqueue: async async_run_entry_fn Call trace: dump_backtrace+0xfc/0x17c show_stack+0x18/0x28 dump_stack_lvl+0x40/0xa0 dump_stack+0x18/0x24 print_irq_inversion_bug+0x2fc/0x304 mark_lock_irq+0x388/0x4fc mark_lock+0x1c4/0x224 __lock_acquire+0x438/0x2e1c lock_acquire+0x134/0x2b4 _raw_spin_lock_irqsave+0x5c/0x80 ufshcd_release_scsi_cmd+0x60/0x110 ufshcd_compl_one_cqe+0x2c0/0x3f4 ufshcd_mcq_poll_cqe_lock+0xb0/0x104 ufs_google_mcq_intr+0x80/0xa0 [ufs dd6f385554e109da094ab91d5f7be18625a2222a] __handle_irq_event_percpu+0x104/0x32c handle_irq_event+0x40/0x9c handle_fasteoi_irq+0x170/0x2e8 generic_handle_domain_irq+0x58/0x80 gic_handle_irq+0x48/0x104 call_on_irq_stack+0x3c/0x50 do_interrupt_handler+0x7c/0xd8 el1_interrupt+0x34/0x58 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x68/0x6c _raw_spin_unlock_irqrestore+0x3c/0x6c debug_object_assert_init+0x16c/0x21c __mod_timer+0x4c/0x48c schedule_timeout+0xd4/0x16c io_schedule_timeout+0x48/0x70 do_wait_for_common+0x100/0x194 wait_for_completion_io_timeout+0x48/0x6c blk_execute_rq+0x124/0x17c scsi_execute_cmd+0x18c/0x3f8 scsi_probe_and_add_lun+0x204/0xd74 __scsi_add_device+0xbc/0x120 ufshcd_async_scan+0x80/0x3c0 async_run_entry_fn+0x4c/0x17c process_one_work+0x26c/0x65c worker_thread+0x33c/0x498 kthread+0x110/0x134 ret_from_fork+0x10/0x20 Cc: Neil Armstrong <neil.armstrong@linaro.org> Cc: André Draszik <andre.draszik@linaro.org> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Fixes: 3c7ac40 ("scsi: ufs: core: Delegate the interrupt service routine to a threaded IRQ handler") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20250815155842.472867-2-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 9, 2025
These iterations require the read lock, otherwise RCU
lockdep will splat:
=============================
WARNING: suspicious RCU usage
6.17.0-rc3-00014-g31419c045d64 #6 Tainted: G O
-----------------------------
drivers/base/power/main.c:1333 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
5 locks held by rtcwake/547:
#0: 00000000643ab418 (sb_writers#6){.+.+}-{0:0}, at: file_start_write+0x2b/0x3a
#1: 0000000067a0ca88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x181/0x24b
#2: 00000000631eac40 (kn->active#3){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x191/0x24b
#3: 00000000609a1308 (system_transition_mutex){+.+.}-{4:4}, at: pm_suspend+0xaf/0x30b
#4: 0000000060c0fdb0 (device_links_srcu){.+.+}-{0:0}, at: device_links_read_lock+0x75/0x98
stack backtrace:
CPU: 0 UID: 0 PID: 547 Comm: rtcwake Tainted: G O 6.17.0-rc3-00014-g31419c045d64 #6 VOLUNTARY
Tainted: [O]=OOT_MODULE
Stack:
223721b3a80 6089eac6 00000001 00000001
ffffff00 6089eac6 00000535 6086e528
721b3ac0 6003c294 00000000 60031fc0
Call Trace:
[<600407ed>] show_stack+0x10e/0x127
[<6003c294>] dump_stack_lvl+0x77/0xc6
[<6003c2fd>] dump_stack+0x1a/0x20
[<600bc2f8>] lockdep_rcu_suspicious+0x116/0x13e
[<603d8ea1>] dpm_async_suspend_superior+0x117/0x17e
[<603d980f>] device_suspend+0x528/0x541
[<603da24b>] dpm_suspend+0x1a2/0x267
[<603da837>] dpm_suspend_start+0x5d/0x72
[<600ca0c9>] suspend_devices_and_enter+0xab/0x736
[...]
Add the fourth argument to the iteration to annotate
this and avoid the splat.
Fixes: 0679963 ("PM: sleep: Make async suspend handle suppliers like parents")
Fixes: ed18738 ("PM: sleep: Make async resume handle consumers like children")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250826134348.aba79f6e6299.I9ecf55da46ccf33778f2c018a82e1819d815b348@changeid
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 9, 2025
The QuickI2C ACPI _DSD methods return ICRS and ISUB data with a trailing byte, making the actual length is one more byte than the structs defined. It caused stack-out-of-bounds and kernel crash: kernel: BUG: KASAN: stack-out-of-bounds in quicki2c_acpi_get_dsd_property.constprop.0+0x111/0x1b0 [intel_quicki2c] kernel: Write of size 12 at addr ffff888106d1f900 by task kworker/u33:2/75 kernel: kernel: CPU: 3 UID: 0 PID: 75 Comm: kworker/u33:2 Not tainted 6.16.0+ #3 PREEMPT(voluntary) kernel: Workqueue: async async_run_entry_fn kernel: Call Trace: kernel: <TASK> kernel: dump_stack_lvl+0x76/0xa0 kernel: print_report+0xd1/0x660 kernel: ? __pfx__raw_spin_lock_irqsave+0x10/0x10 kernel: ? __kasan_slab_free+0x5d/0x80 kernel: ? kasan_addr_to_slab+0xd/0xb0 kernel: kasan_report+0xe1/0x120 kernel: ? quicki2c_acpi_get_dsd_property.constprop.0+0x111/0x1b0 [intel_quicki2c] kernel: ? quicki2c_acpi_get_dsd_property.constprop.0+0x111/0x1b0 [intel_quicki2c] kernel: kasan_check_range+0x11c/0x200 kernel: __asan_memcpy+0x3b/0x80 kernel: quicki2c_acpi_get_dsd_property.constprop.0+0x111/0x1b0 [intel_quicki2c] kernel: ? __pfx_quicki2c_acpi_get_dsd_property.constprop.0+0x10/0x10 [intel_quicki2c] kernel: quicki2c_get_acpi_resources+0x237/0x730 [intel_quicki2c] [...] kernel: </TASK> kernel: kernel: The buggy address belongs to stack of task kworker/u33:2/75 kernel: and is located at offset 48 in frame: kernel: quicki2c_get_acpi_resources+0x0/0x730 [intel_quicki2c] kernel: kernel: This frame has 3 objects: kernel: [32, 36) 'hid_desc_addr' kernel: [48, 59) 'i2c_param' kernel: [80, 224) 'i2c_config' ACPI DSD methods return: \_SB.PC00.THC0.ICRS Buffer 000000003fdc947b 001 Len 0C = 0A 00 80 1A 06 00 00 00 00 00 00 00 \_SB.PC00.THC0.ISUB Buffer 00000000f2fcbdc4 001 Len 91 = 00 00 00 00 00 00 00 00 00 00 00 00 Adding reserved padding to quicki2c_subip_acpi_parameter/config. Fixes: 5282e45 ("HID: intel-thc-hid: intel-quicki2c: Add THC QuickI2C ACPI interfaces") Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Reviewed-by: Even Xu <even.xu@intel.com> Tested-by: Even Xu <even.xu@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 9, 2025
…C regs save Improper use of secondary pointer (&dev->i2c_subip_regs) caused kernel crash and out-of-bounds error: BUG: KASAN: slab-out-of-bounds in _regmap_bulk_read+0x449/0x510 Write of size 4 at addr ffff888136005dc0 by task kworker/u33:5/5107 CPU: 3 UID: 0 PID: 5107 Comm: kworker/u33:5 Not tainted 6.16.0+ #3 PREEMPT(voluntary) Workqueue: async async_run_entry_fn Call Trace: <TASK> dump_stack_lvl+0x76/0xa0 print_report+0xd1/0x660 ? __pfx__raw_spin_lock_irqsave+0x10/0x10 ? kasan_complete_mode_report_info+0x26/0x200 kasan_report+0xe1/0x120 ? _regmap_bulk_read+0x449/0x510 ? _regmap_bulk_read+0x449/0x510 __asan_report_store4_noabort+0x17/0x30 _regmap_bulk_read+0x449/0x510 ? __pfx__regmap_bulk_read+0x10/0x10 regmap_bulk_read+0x270/0x3d0 pio_complete+0x1ee/0x2c0 [intel_thc] ? __pfx_pio_complete+0x10/0x10 [intel_thc] ? __pfx_pio_wait+0x10/0x10 [intel_thc] ? regmap_update_bits_base+0x13b/0x1f0 thc_i2c_subip_pio_read+0x117/0x270 [intel_thc] thc_i2c_subip_regs_save+0xc2/0x140 [intel_thc] ? __pfx_thc_i2c_subip_regs_save+0x10/0x10 [intel_thc] [...] The buggy address belongs to the object at ffff888136005d00 which belongs to the cache kmalloc-rnd-12-192 of size 192 The buggy address is located 0 bytes to the right of allocated 192-byte region [ffff888136005d00, ffff888136005dc0) Replaced with direct array indexing (&dev->i2c_subip_regs[i]) to ensure safe memory access. Fixes: 4228966 ("HID: intel-thc-hid: intel-thc: Add THC I2C config interfaces") Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Reviewed-by: Even Xu <even.xu@intel.com> Tested-by: Even Xu <even.xu@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 9, 2025
A malicious HID device can trigger a slab out-of-bounds during mt_report_fixup() by passing in report descriptor smaller than 607 bytes. mt_report_fixup() attempts to patch byte offset 607 of the descriptor with 0x25 by first checking if byte offset 607 is 0x15 however it lacks bounds checks to verify if the descriptor is big enough before conducting this check. Fix this bug by ensuring the descriptor size is at least 608 bytes before accessing it. Below is the KASAN splat after the out of bounds access happens: [ 13.671954] ================================================================== [ 13.672667] BUG: KASAN: slab-out-of-bounds in mt_report_fixup+0x103/0x110 [ 13.673297] Read of size 1 at addr ffff888103df39df by task kworker/0:1/10 [ 13.673297] [ 13.673297] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted 6.15.0-00005-gec5d573d83f4-dirty #3 [ 13.673297] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/04 [ 13.673297] Call Trace: [ 13.673297] <TASK> [ 13.673297] dump_stack_lvl+0x5f/0x80 [ 13.673297] print_report+0xd1/0x660 [ 13.673297] kasan_report+0xe5/0x120 [ 13.673297] __asan_report_load1_noabort+0x18/0x20 [ 13.673297] mt_report_fixup+0x103/0x110 [ 13.673297] hid_open_report+0x1ef/0x810 [ 13.673297] mt_probe+0x422/0x960 [ 13.673297] hid_device_probe+0x2e2/0x6f0 [ 13.673297] really_probe+0x1c6/0x6b0 [ 13.673297] __driver_probe_device+0x24f/0x310 [ 13.673297] driver_probe_device+0x4e/0x220 [ 13.673297] __device_attach_driver+0x169/0x320 [ 13.673297] bus_for_each_drv+0x11d/0x1b0 [ 13.673297] __device_attach+0x1b8/0x3e0 [ 13.673297] device_initial_probe+0x12/0x20 [ 13.673297] bus_probe_device+0x13d/0x180 [ 13.673297] device_add+0xe3a/0x1670 [ 13.673297] hid_add_device+0x31d/0xa40 [...] Fixes: c8000de ("HID: multitouch: Add support for GT7868Q") Cc: stable@vger.kernel.org Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.com>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 9, 2025
When a large VM, specifically one that holds a significant number of PTEs, gets abruptly destroyed, the following warning is seen during the page-table walk: sched: CPU 0 need_resched set for > 100018840 ns (100 ticks) without schedule CPU: 0 UID: 0 PID: 9617 Comm: kvm_page_table_ Tainted: G O 6.16.0-smp-DEV #3 NONE Tainted: [O]=OOT_MODULE Call trace: show_stack+0x20/0x38 (C) dump_stack_lvl+0x3c/0xb8 dump_stack+0x18/0x30 resched_latency_warn+0x7c/0x88 sched_tick+0x1c4/0x268 update_process_times+0xa8/0xd8 tick_nohz_handler+0xc8/0x168 __hrtimer_run_queues+0x11c/0x338 hrtimer_interrupt+0x104/0x308 arch_timer_handler_phys+0x40/0x58 handle_percpu_devid_irq+0x8c/0x1b0 generic_handle_domain_irq+0x48/0x78 gic_handle_irq+0x1b8/0x408 call_on_irq_stack+0x24/0x30 do_interrupt_handler+0x54/0x78 el1_interrupt+0x44/0x88 el1h_64_irq_handler+0x18/0x28 el1h_64_irq+0x84/0x88 stage2_free_walker+0x30/0xa0 (P) __kvm_pgtable_walk+0x11c/0x258 __kvm_pgtable_walk+0x180/0x258 __kvm_pgtable_walk+0x180/0x258 __kvm_pgtable_walk+0x180/0x258 kvm_pgtable_walk+0xc4/0x140 kvm_pgtable_stage2_destroy+0x5c/0xf0 kvm_free_stage2_pgd+0x6c/0xe8 kvm_uninit_stage2_mmu+0x24/0x48 kvm_arch_flush_shadow_all+0x80/0xa0 kvm_mmu_notifier_release+0x38/0x78 __mmu_notifier_release+0x15c/0x250 exit_mmap+0x68/0x400 __mmput+0x38/0x1c8 mmput+0x30/0x68 exit_mm+0xd4/0x198 do_exit+0x1a4/0xb00 do_group_exit+0x8c/0x120 get_signal+0x6d4/0x778 do_signal+0x90/0x718 do_notify_resume+0x70/0x170 el0_svc+0x74/0xd8 el0t_64_sync_handler+0x60/0xc8 el0t_64_sync+0x1b0/0x1b8 The warning is seen majorly on the host kernels that are configured not to force-preempt, such as CONFIG_PREEMPT_NONE=y. To avoid this, instead of walking the entire page-table in one go, split it into smaller ranges, by checking for cond_resched() between each range. Since the path is executed during VM destruction, after the page-table structure is unlinked from the KVM MMU, relying on cond_resched_rwlock_write() isn't necessary. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Suggested-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250820162242.2624752-3-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 24, 2025
When the "proxy" option is enabled on a VXLAN device, the device will suppress ARP requests and IPv6 Neighbor Solicitation messages if it is able to reply on behalf of the remote host. That is, if a matching and valid neighbor entry is configured on the VXLAN device whose MAC address is not behind the "any" remote (0.0.0.0 / ::). The code currently assumes that the FDB entry for the neighbor's MAC address points to a valid remote destination, but this is incorrect if the entry is associated with an FDB nexthop group. This can result in a NPD [1][3] which can be reproduced using [2][4]. Fix by checking that the remote destination exists before dereferencing it. [1] BUG: kernel NULL pointer dereference, address: 0000000000000000 [...] CPU: 4 UID: 0 PID: 365 Comm: arping Not tainted 6.17.0-rc2-virtme-g2a89cb21162c #2 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014 RIP: 0010:vxlan_xmit+0xb58/0x15f0 [...] Call Trace: <TASK> dev_hard_start_xmit+0x5d/0x1c0 __dev_queue_xmit+0x246/0xfd0 packet_sendmsg+0x113a/0x1850 __sock_sendmsg+0x38/0x70 __sys_sendto+0x126/0x180 __x64_sys_sendto+0x24/0x30 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x4b/0x53 [2] #!/bin/bash ip address add 192.0.2.1/32 dev lo ip nexthop add id 1 via 192.0.2.2 fdb ip nexthop add id 10 group 1 fdb ip link add name vx0 up type vxlan id 10010 local 192.0.2.1 dstport 4789 proxy ip neigh add 192.0.2.3 lladdr 00:11:22:33:44:55 nud perm dev vx0 bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 10 arping -b -c 1 -s 192.0.2.1 -I vx0 192.0.2.3 [3] BUG: kernel NULL pointer dereference, address: 0000000000000000 [...] CPU: 13 UID: 0 PID: 372 Comm: ndisc6 Not tainted 6.17.0-rc2-virtmne-g6ee90cb26014 #3 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1v996), BIOS 1.17.0-4.fc41 04/01/2x014 RIP: 0010:vxlan_xmit+0x803/0x1600 [...] Call Trace: <TASK> dev_hard_start_xmit+0x5d/0x1c0 __dev_queue_xmit+0x246/0xfd0 ip6_finish_output2+0x210/0x6c0 ip6_finish_output+0x1af/0x2b0 ip6_mr_output+0x92/0x3e0 ip6_send_skb+0x30/0x90 rawv6_sendmsg+0xe6e/0x12e0 __sock_sendmsg+0x38/0x70 __sys_sendto+0x126/0x180 __x64_sys_sendto+0x24/0x30 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f383422ec77 [4] #!/bin/bash ip address add 2001:db8:1::1/128 dev lo ip nexthop add id 1 via 2001:db8:1::1 fdb ip nexthop add id 10 group 1 fdb ip link add name vx0 up type vxlan id 10010 local 2001:db8:1::1 dstport 4789 proxy ip neigh add 2001:db8:1::3 lladdr 00:11:22:33:44:55 nud perm dev vx0 bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 10 ndisc6 -r 1 -s 2001:db8:1::1 -w 1 2001:db8:1::3 vx0 Fixes: 1274e1c ("vxlan: ecmp support for mac fdb entries") Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250901065035.159644-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 24, 2025
Ido Schimmel says:
====================
vxlan: Fix NPDs when using nexthop objects
With FDB nexthop groups, VXLAN FDB entries do not necessarily point to
a remote destination but rather to an FDB nexthop group. This means that
first_remote_{rcu,rtnl}() can return NULL and a few places in the driver
were not ready for that, resulting in NULL pointer dereferences.
Patches #1-#2 fix these NPDs.
Note that vxlan_fdb_find_uc() still dereferences the remote returned by
first_remote_rcu() without checking that it is not NULL, but this
function is only invoked by a single driver which vetoes the creation of
FDB nexthop groups. I will patch this in net-next to make the code less
fragile.
Patch #3 adds a selftests which exercises these code paths and tests
basic Tx functionality with FDB nexthop groups. I verified that the test
crashes the kernel without the first two patches.
====================
Link: https://patch.msgid.link/20250901065035.159644-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 24, 2025
When transmitting a PTP frame which is timestamp using 2 step, the
following warning appears if CONFIG_PROVE_LOCKING is enabled:
=============================
[ BUG: Invalid wait context ]
6.17.0-rc1-00326-ge6160462704e #427 Not tainted
-----------------------------
ptp4l/119 is trying to lock:
c2a44ed4 (&vsc8531->ts_lock){+.+.}-{3:3}, at: vsc85xx_txtstamp+0x50/0xac
other info that might help us debug this:
context-{4:4}
4 locks held by ptp4l/119:
#0: c145f068 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x58/0x1440
#1: c29df974 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x5c4/0x1440
#2: c2aaaad0 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x108/0x350
#3: c2aac170 (&lan966x->tx_lock){+.-.}-{2:2}, at: lan966x_port_xmit+0xd0/0x350
stack backtrace:
CPU: 0 UID: 0 PID: 119 Comm: ptp4l Not tainted 6.17.0-rc1-00326-ge6160462704e #427 NONE
Hardware name: Generic DT based system
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x7c/0xac
dump_stack_lvl from __lock_acquire+0x8e8/0x29dc
__lock_acquire from lock_acquire+0x108/0x38c
lock_acquire from __mutex_lock+0xb0/0xe78
__mutex_lock from mutex_lock_nested+0x1c/0x24
mutex_lock_nested from vsc85xx_txtstamp+0x50/0xac
vsc85xx_txtstamp from lan966x_fdma_xmit+0xd8/0x3a8
lan966x_fdma_xmit from lan966x_port_xmit+0x1bc/0x350
lan966x_port_xmit from dev_hard_start_xmit+0xc8/0x2c0
dev_hard_start_xmit from sch_direct_xmit+0x8c/0x350
sch_direct_xmit from __dev_queue_xmit+0x680/0x1440
__dev_queue_xmit from packet_sendmsg+0xfa4/0x1568
packet_sendmsg from __sys_sendto+0x110/0x19c
__sys_sendto from sys_send+0x18/0x20
sys_send from ret_fast_syscall+0x0/0x1c
Exception stack(0xf0b05fa8 to 0xf0b05ff0)
5fa0: 00000001 0000000e 0000000e 0004b47a 0000003a 00000000
5fc0: 00000001 0000000e 00000000 00000121 0004af58 00044874 00000000 00000000
5fe0: 00000001 bee9d420 00025a10 b6e75c7c
So, instead of using the ts_lock for tx_queue, use the spinlock that
skb_buff_head has.
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Fixes: 7d272e6 ("net: phy: mscc: timestamping and PHC support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20250902121259.3257536-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 24, 2025
Problem description
===================
Lockdep reports a possible circular locking dependency (AB/BA) between
&pl->state_mutex and &phy->lock, as follows.
phylink_resolve() // acquires &pl->state_mutex
-> phylink_major_config()
-> phy_config_inband() // acquires &pl->phydev->lock
whereas all the other call sites where &pl->state_mutex and
&pl->phydev->lock have the locking scheme reversed. Everywhere else,
&pl->phydev->lock is acquired at the top level, and &pl->state_mutex at
the lower level. A clear example is phylink_bringup_phy().
The outlier is the newly introduced phy_config_inband() and the existing
lock order is the correct one. To understand why it cannot be the other
way around, it is sufficient to consider phylink_phy_change(), phylink's
callback from the PHY device's phy->phy_link_change() virtual method,
invoked by the PHY state machine.
phy_link_up() and phy_link_down(), the (indirect) callers of
phylink_phy_change(), are called with &phydev->lock acquired.
Then phylink_phy_change() acquires its own &pl->state_mutex, to
serialize changes made to its pl->phy_state and pl->link_config.
So all other instances of &pl->state_mutex and &phydev->lock must be
consistent with this order.
Problem impact
==============
I think the kernel runs a serious deadlock risk if an existing
phylink_resolve() thread, which results in a phy_config_inband() call,
is concurrent with a phy_link_up() or phy_link_down() call, which will
deadlock on &pl->state_mutex in phylink_phy_change(). Practically
speaking, the impact may be limited by the slow speed of the medium
auto-negotiation protocol, which makes it unlikely for the current state
to still be unresolved when a new one is detected, but I think the
problem is there. Nonetheless, the problem was discovered using lockdep.
Proposed solution
=================
Practically speaking, the phy_config_inband() requirement of having
phydev->lock acquired must transfer to the caller (phylink is the only
caller). There, it must bubble up until immediately before
&pl->state_mutex is acquired, for the cases where that takes place.
Solution details, considerations, notes
=======================================
This is the phy_config_inband() call graph:
sfp_upstream_ops :: connect_phy()
|
v
phylink_sfp_connect_phy()
|
v
phylink_sfp_config_phy()
|
| sfp_upstream_ops :: module_insert()
| |
| v
| phylink_sfp_module_insert()
| |
| | sfp_upstream_ops :: module_start()
| | |
| | v
| | phylink_sfp_module_start()
| | |
| v v
| phylink_sfp_config_optical()
phylink_start() | |
| phylink_resume() v v
| | phylink_sfp_set_config()
| | |
v v v
phylink_mac_initial_config()
| phylink_resolve()
| | phylink_ethtool_ksettings_set()
v v v
phylink_major_config()
|
v
phy_config_inband()
phylink_major_config() caller #1, phylink_mac_initial_config(), does not
acquire &pl->state_mutex nor do its callers. It must acquire
&pl->phydev->lock prior to calling phylink_major_config().
phylink_major_config() caller #2, phylink_resolve() acquires
&pl->state_mutex, thus also needs to acquire &pl->phydev->lock.
phylink_major_config() caller #3, phylink_ethtool_ksettings_set(), is
completely uninteresting, because it only calls phylink_major_config()
if pl->phydev is NULL (otherwise it calls phy_ethtool_ksettings_set()).
We need to change nothing there.
Other solutions
===============
The lock inversion between &pl->state_mutex and &pl->phydev->lock has
occurred at least once before, as seen in commit c718af2 ("net:
phylink: fix ethtool -A with attached PHYs"). The solution there was to
simply not call phy_set_asym_pause() under the &pl->state_mutex. That
cannot be extended to our case though, where the phy_config_inband()
call is much deeper inside the &pl->state_mutex section.
Fixes: 5fd0f1a ("net: phylink: add negotiation of in-band capabilities")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250904125238.193990-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 24, 2025
…ux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, round #3 - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when visiting from an MMU notifier - Fixes to the TLB match process and TLB invalidation range for managing the VCNR pseudo-TLB - Prevent SPE from erroneously profiling guests due to UNKNOWN reset values in PMSCR_EL1 - Fix save/restore of host MDCR_EL2 to account for eagerly programming at vcpu_load() on VHE systems - Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios where an xarray's spinlock was nested with a *raw* spinlock - Permit stage-2 read permission aborts which are possible in the case of NV depending on the guest hypervisor's stage-2 translation - Call raw_spin_unlock() instead of the internal spinlock API - Fix parameter ordering when assigning VBAR_EL1
rodrigovivi
pushed a commit
that referenced
this pull request
Sep 24, 2025
The drm_gem_for_each_gpuvm_bo() call from lookup_vma() accesses drm_gem_obj.gpuva.list, which is not initialized when the drm driver does not support DRIVER_GEM_GPUVA feature. Enable it for msm_kms drm driver to fix the splat seen when msm.separate_gpu_drm=1 modparam is set: [ 9.506020] Unable to handle kernel paging request at virtual address fffffffffffffff0 [ 9.523160] Mem abort info: [ 9.523161] ESR = 0x0000000096000006 [ 9.523163] EC = 0x25: DABT (current EL), IL = 32 bits [ 9.523165] SET = 0, FnV = 0 [ 9.523166] EA = 0, S1PTW = 0 [ 9.523167] FSC = 0x06: level 2 translation fault [ 9.523169] Data abort info: [ 9.523170] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 [ 9.523171] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 9.523172] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 9.523174] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000ad370f000 [ 9.523176] [fffffffffffffff0] pgd=0000000000000000, p4d=0000000ad4787403, pud=0000000ad4788403, pmd=0000000000000000 [ 9.523184] Internal error: Oops: 0000000096000006 [#1] SMP [ 9.592968] CPU: 9 UID: 0 PID: 448 Comm: (udev-worker) Not tainted 6.17.0-rc4-assorted-fix-00005-g0e9bb53a2282-dirty #3 PREEMPT [ 9.592970] Hardware name: Qualcomm CRD, BIOS 6.0.240718.BOOT.MXF.2.4-00515-HAMOA-1 07/18/2024 [ 9.592971] pstate: a1400005 (NzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 9.592973] pc : lookup_vma+0x28/0xe0 [msm] [ 9.592996] lr : get_vma_locked+0x2c/0x128 [msm] [ 9.763632] sp : ffff800082dab460 [ 9.763666] Call trace: [ 9.763668] lookup_vma+0x28/0xe0 [msm] (P) [ 9.763688] get_vma_locked+0x2c/0x128 [msm] [ 9.763706] msm_gem_get_and_pin_iova_range+0x68/0x11c [msm] [ 9.763723] msm_gem_get_and_pin_iova+0x18/0x24 [msm] [ 9.763740] msm_fbdev_driver_fbdev_probe+0xd0/0x258 [msm] [ 9.763760] __drm_fb_helper_initial_config_and_unlock+0x288/0x528 [drm_kms_helper] [ 9.763771] drm_fb_helper_initial_config+0x44/0x54 [drm_kms_helper] [ 9.763779] drm_fbdev_client_hotplug+0x84/0xd4 [drm_client_lib] [ 9.763782] drm_client_register+0x58/0x9c [drm] [ 9.763806] drm_fbdev_client_setup+0xe8/0xcf0 [drm_client_lib] [ 9.763809] drm_client_setup+0xb4/0xd8 [drm_client_lib] [ 9.763811] msm_drm_kms_post_init+0x2c/0x3c [msm] [ 9.763830] msm_drm_init+0x1a8/0x22c [msm] [ 9.763848] msm_drm_bind+0x30/0x3c [msm] [ 9.919273] try_to_bring_up_aggregate_device+0x168/0x1d4 [ 9.919283] __component_add+0xa4/0x170 [ 9.919286] component_add+0x14/0x20 [ 9.919288] msm_dp_display_probe_tail+0x4c/0xac [msm] [ 9.919315] msm_dp_auxbus_done_probe+0x14/0x20 [msm] [ 9.919335] dp_aux_ep_probe+0x4c/0xf0 [drm_dp_aux_bus] [ 9.919341] really_probe+0xbc/0x298 [ 9.919345] __driver_probe_device+0x78/0x12c [ 9.919348] driver_probe_device+0x40/0x160 [ 9.919350] __driver_attach+0x94/0x19c [ 9.919353] bus_for_each_dev+0x74/0xd4 [ 9.919355] driver_attach+0x24/0x30 [ 9.919358] bus_add_driver+0xe4/0x208 [ 9.919360] driver_register+0x60/0x128 [ 9.919363] __dp_aux_dp_driver_register+0x24/0x30 [drm_dp_aux_bus] [ 9.919365] atana33xc20_init+0x20/0x1000 [panel_samsung_atna33xc20] [ 9.919370] do_one_initcall+0x6c/0x1b0 [ 9.919374] do_init_module+0x58/0x234 [ 9.919377] load_module+0x19cc/0x1bd4 [ 9.919380] init_module_from_file+0x84/0xc4 [ 9.919382] __arm64_sys_finit_module+0x1b8/0x2cc [ 9.919384] invoke_syscall+0x48/0x110 [ 9.919389] el0_svc_common.constprop.0+0xc8/0xe8 [ 9.919393] do_el0_svc+0x20/0x2c [ 9.919396] el0_svc+0x34/0xf0 [ 9.919401] el0t_64_sync_handler+0xa0/0xe4 [ 9.919403] el0t_64_sync+0x198/0x19c [ 9.919407] Code: eb0000bf 54000480 d100a003 aa0303e2 (f8418c44) [ 9.919410] ---[ end trace 0000000000000000 ]--- Fixes: 217ed15 ("drm/msm: enable separate binding of GPU and display devices") Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/672257/ Link: https://lore.kernel.org/r/20250902-assorted-sept-1-v1-1-f3ec9baed513@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
23d6ae0 to
11a8456
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bumps urllib3 from 2.0.7 to 2.2.2.
Release notes
Sourced from urllib3's releases.
... (truncated)
Changelog
Sourced from urllib3's changelog.
Commits
27e2a5cRelease 2.2.2 (#3406)accff72Merge pull request from GHSA-34jh-p97f-mpxf34be4a5Pin CFFI to a new release candidate instead of a Git commit (#3398)da41058Bump browser-actions/setup-chrome from 1.6.0 to 1.7.1 (#3399)b07a669Bump github/codeql-action from 2.13.4 to 3.25.6 (#3396)b8589ecMeasure coverage with v4 of artifact actions (#3394)f3bdc55Allow triggering CI manually (#3391)5239265Fix HTTP version in debug log (#3316)b34619fBump actions/checkout to 4.1.4 (#3387)9961d14Bump browser-actions/setup-chrome from 1.5.0 to 1.6.0 (#3386)Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting
@dependabot rebase.Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
@dependabot rebasewill rebase this PR@dependabot recreatewill recreate this PR, overwriting any edits that have been made to it@dependabot mergewill merge this PR after your CI passes on it@dependabot squash and mergewill squash and merge this PR after your CI passes on it@dependabot cancel mergewill cancel a previously requested merge and block automerging@dependabot reopenwill reopen this PR if it is closed@dependabot closewill close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually@dependabot show <dependency name> ignore conditionswill show all of the ignore conditions of the specified dependency@dependabot ignore this major versionwill close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this minor versionwill close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this dependencywill close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)You can disable automated security fix PRs for this repo from the Security Alerts page.