Skip to content

fix: ls2k500sfb build error, gpio_device_find_by_label should be used #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Zeno-sole
Copy link

commit: d62fcd9

MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
Florian reported the following kernel NULL pointer dereference issue on
a BCM7250 board:
[    2.829744] Unable to handle kernel NULL pointer dereference at virtual address 0000000c when read
[    2.838740] [0000000c] *pgd=80000000004003, *pmd=00000000
[    2.844178] Internal error: Oops: 206 [#1] SMP ARM
[    2.848990] Modules linked in:
[    2.852061] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-next-20240305-gd95fcdf4961d torvalds#66
[    2.860436] Hardware name: Broadcom STB (Flattened Device Tree)
[    2.866371] PC is at brcmnand_read_by_pio+0x180/0x278
[    2.871449] LR is at __wait_for_common+0x9c/0x1b0
[    2.876178] pc : [<c094b6cc>]    lr : [<c0e66310>]    psr: 60000053
[    2.882460] sp : f0811a80  ip : 00000012  fp : 00000000
[    2.887699] r10: 00000000  r9 : 00000000  r8 : c3790000
[    2.892936] r7 : 00000000  r6 : 00000000  r5 : c35db440  r4 : ffe00000
[    2.899479] r3 : f15cb814  r2 : 00000000  r1 : 00000000  r0 : 00000000

The issue only happens when dma mode is disabled or not supported on STB
chip. The pio mode transfer calls brcmnand_read_data_bus function which
dereferences ctrl->soc->read_data_bus. But the soc member in STB chip is
NULL hence triggers the access violation. The function needs to check
the soc pointer first.

Fixes: 546e425 ("mtd: rawnand: brcmnand: Add BCMBCA read data bus interface")

Reported-by: Florian Fainelli <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Signed-off-by: William Zhang <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/linux-mtd/[email protected]
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

    BUG: unable to handle page fault for address: 000000000002a2b8
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 1470e1067 P4D 0
    Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ torvalds#57
    Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
    RIP: 0010:mutex_lock+0x2e/0x50
    ...
    Call Trace:
    <TASK>
    __die+0x24/0x70
    page_fault_oops+0x82/0x160
    do_user_addr_fault+0x65/0x6b0
    __pfx___rdmsr_safe_on_cpu+0x10/0x10
    exc_page_fault+0x7d/0x170
    asm_exc_page_fault+0x26/0x30
    mutex_lock+0x2e/0x50
    mutex_lock+0x1e/0x50
    perf_pmu_migrate_context+0x87/0x1f0
    perf_event_cpu_offline+0x76/0x90 [idxd]
    cpuhp_invoke_callback+0xa2/0x4f0
    __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
    cpuhp_thread_fun+0x98/0x150
    smpboot_thread_fn+0x27/0x260
    smpboot_thread_fn+0x1af/0x260
    __pfx_smpboot_thread_fn+0x10/0x10
    kthread+0x103/0x140
    __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30
    <TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
Doug reported [1] the following hung task:

 INFO: task swapper/0:1 blocked for more than 122 seconds.
       Not tainted 5.15.149-21875-gf795ebc40eb8 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:swapper/0       state:D stack:    0 pid:    1 ppid:     0 flags:0x00000008
 Call trace:
  __switch_to+0xf4/0x1f4
  __schedule+0x418/0xb80
  schedule+0x5c/0x10c
  rpm_resume+0xe0/0x52c
  rpm_resume+0x178/0x52c
  __pm_runtime_resume+0x58/0x98
  clk_pm_runtime_get+0x30/0xb0
  clk_disable_unused_subtree+0x58/0x208
  clk_disable_unused_subtree+0x38/0x208
  clk_disable_unused_subtree+0x38/0x208
  clk_disable_unused_subtree+0x38/0x208
  clk_disable_unused_subtree+0x38/0x208
  clk_disable_unused+0x4c/0xe4
  do_one_initcall+0xcc/0x2d8
  do_initcall_level+0xa4/0x148
  do_initcalls+0x5c/0x9c
  do_basic_setup+0x24/0x30
  kernel_init_freeable+0xec/0x164
  kernel_init+0x28/0x120
  ret_from_fork+0x10/0x20
 INFO: task kworker/u16:0:9 blocked for more than 122 seconds.
       Not tainted 5.15.149-21875-gf795ebc40eb8 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/u16:0   state:D stack:    0 pid:    9 ppid:     2 flags:0x00000008
 Workqueue: events_unbound deferred_probe_work_func
 Call trace:
  __switch_to+0xf4/0x1f4
  __schedule+0x418/0xb80
  schedule+0x5c/0x10c
  schedule_preempt_disabled+0x2c/0x48
  __mutex_lock+0x238/0x488
  __mutex_lock_slowpath+0x1c/0x28
  mutex_lock+0x50/0x74
  clk_prepare_lock+0x7c/0x9c
  clk_core_prepare_lock+0x20/0x44
  clk_prepare+0x24/0x30
  clk_bulk_prepare+0x40/0xb0
  mdss_runtime_resume+0x54/0x1c8
  pm_generic_runtime_resume+0x30/0x44
  __genpd_runtime_resume+0x68/0x7c
  genpd_runtime_resume+0x108/0x1f4
  __rpm_callback+0x84/0x144
  rpm_callback+0x30/0x88
  rpm_resume+0x1f4/0x52c
  rpm_resume+0x178/0x52c
  __pm_runtime_resume+0x58/0x98
  __device_attach+0xe0/0x170
  device_initial_probe+0x1c/0x28
  bus_probe_device+0x3c/0x9c
  device_add+0x644/0x814
  mipi_dsi_device_register_full+0xe4/0x170
  devm_mipi_dsi_device_register_full+0x28/0x70
  ti_sn_bridge_probe+0x1dc/0x2c0
  auxiliary_bus_probe+0x4c/0x94
  really_probe+0xcc/0x2c8
  __driver_probe_device+0xa8/0x130
  driver_probe_device+0x48/0x110
  __device_attach_driver+0xa4/0xcc
  bus_for_each_drv+0x8c/0xd8
  __device_attach+0xf8/0x170
  device_initial_probe+0x1c/0x28
  bus_probe_device+0x3c/0x9c
  deferred_probe_work_func+0x9c/0xd8
  process_one_work+0x148/0x518
  worker_thread+0x138/0x350
  kthread+0x138/0x1e0
  ret_from_fork+0x10/0x20

The first thread is walking the clk tree and calling
clk_pm_runtime_get() to power on devices required to read the clk
hardware via struct clk_ops::is_enabled(). This thread holds the clk
prepare_lock, and is trying to runtime PM resume a device, when it finds
that the device is in the process of resuming so the thread schedule()s
away waiting for the device to finish resuming before continuing. The
second thread is runtime PM resuming the same device, but the runtime
resume callback is calling clk_prepare(), trying to grab the
prepare_lock waiting on the first thread.

This is a classic ABBA deadlock. To properly fix the deadlock, we must
never runtime PM resume or suspend a device with the clk prepare_lock
held. Actually doing that is near impossible today because the global
prepare_lock would have to be dropped in the middle of the tree, the
device runtime PM resumed/suspended, and then the prepare_lock grabbed
again to ensure consistency of the clk tree topology. If anything
changes with the clk tree in the meantime, we've lost and will need to
start the operation all over again.

Luckily, most of the time we're simply incrementing or decrementing the
runtime PM count on an active device, so we don't have the chance to
schedule away with the prepare_lock held. Let's fix this immediate
problem that can be triggered more easily by simply booting on Qualcomm
sc7180.

Introduce a list of clk_core structures that have been registered, or
are in the process of being registered, that require runtime PM to
operate. Iterate this list and call clk_pm_runtime_get() on each of them
without holding the prepare_lock during clk_disable_unused(). This way
we can be certain that the runtime PM state of the devices will be
active and resumed so we can't schedule away while walking the clk tree
with the prepare_lock held. Similarly, call clk_pm_runtime_put() without
the prepare_lock held to properly drop the runtime PM reference. We
remove the calls to clk_pm_runtime_{get,put}() in this path because
they're superfluous now that we know the devices are runtime resumed.

Reported-by: Douglas Anderson <[email protected]>
Closes: https://lore.kernel.org/all/20220922084322.RFC.2.I375b6b9e0a0a5348962f004beb3dafee6a12dfbb@changeid/ [1]
Closes: https://issuetracker.google.com/328070191
Cc: Marek Szyprowski <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Fixes: 9a34b45 ("clk: Add support for runtime PM")
Signed-off-by: Stephen Boyd <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Douglas Anderson <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
Drop support for virtualizing adaptive PEBS, as KVM's implementation is
architecturally broken without an obvious/easy path forward, and because
exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak
host kernel addresses to the guest.

Bug #1 is that KVM doesn't account for the upper 32 bits of
IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g
fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters()
stores local variables as u8s and truncates the upper bits too, etc.

Bug #2 is that, because KVM _always_ sets precise_ip to a non-zero value
for PEBS events, perf will _always_ generate an adaptive record, even if
the guest requested a basic record.  Note, KVM will also enable adaptive
PEBS in individual *counter*, even if adaptive PEBS isn't exposed to the
guest, but this is benign as MSR_PEBS_DATA_CFG is guaranteed to be zero,
i.e. the guest will only ever see Basic records.

Bug #3 is in perf.  intel_pmu_disable_fixed() doesn't clear the upper
bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set, and
intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE
either.  I.e. perf _always_ enables ADAPTIVE counters, regardless of what
KVM requests.

Bug #4 is that adaptive PEBS *might* effectively bypass event filters set
by the host, as "Updated Memory Access Info Group" records information
that might be disallowed by userspace via KVM_SET_PMU_EVENT_FILTER.

Bug #5 is that KVM doesn't ensure LBR MSRs hold guest values (or at least
zeros) when entering a vCPU with adaptive PEBS, which allows the guest
to read host LBRs, i.e. host RIPs/addresses, by enabling "LBR Entries"
records.

Disable adaptive PEBS support as an immediate fix due to the severity of
the LBR leak in particular, and because fixing all of the bugs will be
non-trivial, e.g. not suitable for backporting to stable kernels.

Note!  This will break live migration, but trying to make KVM play nice
with live migration would be quite complicated, wouldn't be guaranteed to
work (i.e. KVM might still kill/confuse the guest), and it's not clear
that there are any publicly available VMMs that support adaptive PEBS,
let alone live migrate VMs that support adaptive PEBS, e.g. QEMU doesn't
support PEBS in any capacity.

Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
Fixes: c59a1f1 ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Cc: [email protected]
Cc: Like Xu <[email protected]>
Cc: Mingwei Zhang <[email protected]>
Cc: Zhenyu Wang <[email protected]>
Cc: Zhang Xiong <[email protected]>
Cc: Lv Zhiyuan <[email protected]>
Cc: Dapeng Mi <[email protected]>
Cc: Jim Mattson <[email protected]>
Acked-by: Like Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
The uart_handle_cts_change() function in serial_core expects the caller
to hold uport->lock. For example, I have seen the below kernel splat,
when the Bluetooth driver is loaded on an i.MX28 board.

    [   85.119255] ------------[ cut here ]------------
    [   85.124413] WARNING: CPU: 0 PID: 27 at /drivers/tty/serial/serial_core.c:3453 uart_handle_cts_change+0xb4/0xec
    [   85.134694] Modules linked in: hci_uart bluetooth ecdh_generic ecc wlcore_sdio configfs
    [   85.143314] CPU: 0 PID: 27 Comm: kworker/u3:0 Not tainted 6.6.3-00021-gd62a2f068f92 #1
    [   85.151396] Hardware name: Freescale MXS (Device Tree)
    [   85.156679] Workqueue: hci0 hci_power_on [bluetooth]
    (...)
    [   85.191765]  uart_handle_cts_change from mxs_auart_irq_handle+0x380/0x3f4
    [   85.198787]  mxs_auart_irq_handle from __handle_irq_event_percpu+0x88/0x210
    (...)

Cc: [email protected]
Fixes: 4d90bb1 ("serial: core: Document and assert lock requirements for irq helpers")
Reviewed-by: Frank Li <[email protected]>
Signed-off-by: Emil Kronborg <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
…git/netfilter/nf

netfilter pull request 24-04-11

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patches #1 and #2 add missing rcu read side lock when iterating over
expression and object type list which could race with module removal.

Patch #3 prevents promisc packet from visiting the bridge/input hook
	 to amend a recent fix to address conntrack confirmation race
	 in br_netfilter and nf_conntrack_bridge.

Patch #4 adds and uses iterate decorator type to fetch the current
	 pipapo set backend datastructure view when netlink dumps the
	 set elements.

Patch #5 fixes removal of duplicate elements in the pipapo set backend.

Patch torvalds#6 flowtable validates pppoe header before accessing it.

Patch torvalds#7 fixes flowtable datapath for pppoe packets, otherwise lookup
         fails and pppoe packets follow classic path.
====================

Signed-off-by: David S. Miller <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
When disabling aRFS under the `priv->state_lock`, any scheduled
aRFS works are canceled using the `cancel_work_sync` function,
which waits for the work to end if it has already started.
However, while waiting for the work handler, the handler will
try to acquire the `state_lock` which is already acquired.

The worker acquires the lock to delete the rules if the state
is down, which is not the worker's responsibility since
disabling aRFS deletes the rules.

Add an aRFS state variable, which indicates whether the aRFS is
enabled and prevent adding rules when the aRFS is disabled.

Kernel log:

======================================================
WARNING: possible circular locking dependency detected
6.7.0-rc4_net_next_mlx5_5483eb2 #1 Tainted: G          I
------------------------------------------------------
ethtool/386089 is trying to acquire lock:
ffff88810f21ce68 ((work_completion)(&rule->arfs_work)){+.+.}-{0:0}, at: __flush_work+0x74/0x4e0

but task is already holding lock:
ffff8884a1808cc0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_ethtool_set_channels+0x53/0x200 [mlx5_core]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&priv->state_lock){+.+.}-{3:3}:
       __mutex_lock+0x80/0xc90
       arfs_handle_work+0x4b/0x3b0 [mlx5_core]
       process_one_work+0x1dc/0x4a0
       worker_thread+0x1bf/0x3c0
       kthread+0xd7/0x100
       ret_from_fork+0x2d/0x50
       ret_from_fork_asm+0x11/0x20

-> #0 ((work_completion)(&rule->arfs_work)){+.+.}-{0:0}:
       __lock_acquire+0x17b4/0x2c80
       lock_acquire+0xd0/0x2b0
       __flush_work+0x7a/0x4e0
       __cancel_work_timer+0x131/0x1c0
       arfs_del_rules+0x143/0x1e0 [mlx5_core]
       mlx5e_arfs_disable+0x1b/0x30 [mlx5_core]
       mlx5e_ethtool_set_channels+0xcb/0x200 [mlx5_core]
       ethnl_set_channels+0x28f/0x3b0
       ethnl_default_set_doit+0xec/0x240
       genl_family_rcv_msg_doit+0xd0/0x120
       genl_rcv_msg+0x188/0x2c0
       netlink_rcv_skb+0x54/0x100
       genl_rcv+0x24/0x40
       netlink_unicast+0x1a1/0x270
       netlink_sendmsg+0x214/0x460
       __sock_sendmsg+0x38/0x60
       __sys_sendto+0x113/0x170
       __x64_sys_sendto+0x20/0x30
       do_syscall_64+0x40/0xe0
       entry_SYSCALL_64_after_hwframe+0x46/0x4e

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&priv->state_lock);
                               lock((work_completion)(&rule->arfs_work));
                               lock(&priv->state_lock);
  lock((work_completion)(&rule->arfs_work));

 *** DEADLOCK ***

3 locks held by ethtool/386089:
 #0: ffffffff82ea7210 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40
 #1: ffffffff82e94c88 (rtnl_mutex){+.+.}-{3:3}, at: ethnl_default_set_doit+0xd3/0x240
 #2: ffff8884a1808cc0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_ethtool_set_channels+0x53/0x200 [mlx5_core]

stack backtrace:
CPU: 15 PID: 386089 Comm: ethtool Tainted: G          I        6.7.0-rc4_net_next_mlx5_5483eb2 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x60/0xa0
 check_noncircular+0x144/0x160
 __lock_acquire+0x17b4/0x2c80
 lock_acquire+0xd0/0x2b0
 ? __flush_work+0x74/0x4e0
 ? save_trace+0x3e/0x360
 ? __flush_work+0x74/0x4e0
 __flush_work+0x7a/0x4e0
 ? __flush_work+0x74/0x4e0
 ? __lock_acquire+0xa78/0x2c80
 ? lock_acquire+0xd0/0x2b0
 ? mark_held_locks+0x49/0x70
 __cancel_work_timer+0x131/0x1c0
 ? mark_held_locks+0x49/0x70
 arfs_del_rules+0x143/0x1e0 [mlx5_core]
 mlx5e_arfs_disable+0x1b/0x30 [mlx5_core]
 mlx5e_ethtool_set_channels+0xcb/0x200 [mlx5_core]
 ethnl_set_channels+0x28f/0x3b0
 ethnl_default_set_doit+0xec/0x240
 genl_family_rcv_msg_doit+0xd0/0x120
 genl_rcv_msg+0x188/0x2c0
 ? ethnl_ops_begin+0xb0/0xb0
 ? genl_family_rcv_msg_dumpit+0xf0/0xf0
 netlink_rcv_skb+0x54/0x100
 genl_rcv+0x24/0x40
 netlink_unicast+0x1a1/0x270
 netlink_sendmsg+0x214/0x460
 __sock_sendmsg+0x38/0x60
 __sys_sendto+0x113/0x170
 ? do_user_addr_fault+0x53f/0x8f0
 __x64_sys_sendto+0x20/0x30
 do_syscall_64+0x40/0xe0
 entry_SYSCALL_64_after_hwframe+0x46/0x4e
 </TASK>

Fixes: 45bf454 ("net/mlx5e: Enabling aRFS mechanism")
Signed-off-by: Carolina Jubran <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
Running a lot of VK CTS in parallel against nouveau, once every
few hours you might see something like this crash.

BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ torvalds#27
Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
RSP: 0000:ffffac20c5857838 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001
RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180
RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10
R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c
R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c
FS:  00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:

...

 ? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
 ? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
 nvkm_vmm_iter+0x351/0xa20 [nouveau]
 ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 ? __lock_acquire+0x3ed/0x2170
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
 ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]

Adding any sort of useful debug usually makes it go away, so I hand
wrote the function in a line, and debugged the asm.

Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in
the nv50_instobj_acquire called from nvkm_kmap.

If Thread A and Thread B both get to nv50_instobj_acquire around
the same time, and Thread A hits the refcount_set line, and in
lockstep thread B succeeds at refcount_inc_not_zero, there is a
chance the ptrs value won't have been stored since refcount_set
is unordered. Force a memory barrier here, I picked smp_mb, since
we want it on all CPUs and it's write followed by a read.

v2: use paired smp_rmb/smp_wmb.

Cc: <[email protected]>
Fixes: be55287 ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj")
Signed-off-by: Dave Airlie <[email protected]>
Signed-off-by: Danilo Krummrich <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
Currently normal HugeTLB fault ends up crashing the kernel, as p4dp derived
from p4d_offset() is an invalid address when PGTABLE_LEVEL = 5. A p4d level
entry needs to be allocated when not available while walking the page table
during HugeTLB faults. Let's call p4d_alloc() to allocate such entries when
required instead of current p4d_offset().

 Unable to handle kernel paging request at virtual address ffffffff80000000
 Mem abort info:
   ESR = 0x0000000096000005
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x05: level 1 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
 swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000081da9000
 [ffffffff80000000] pgd=1000000082cec003, p4d=0000000082c32003, pud=0000000000000000
 Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
 Modules linked in:
 CPU: 1 PID: 108 Comm: high_addr_hugep Not tainted 6.9.0-rc4 torvalds#48
 Hardware name: Foundation-v8A (DT)
 pstate: 01402005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
 pc : huge_pte_alloc+0xd4/0x334
 lr : hugetlb_fault+0x1b8/0xc68
 sp : ffff8000833bbc20
 x29: ffff8000833bbc20 x28: fff000080080cb58 x27: ffff800082a7cc58
 x26: 0000000000000000 x25: fff0000800378e40 x24: fff00008008d6c60
 x23: 00000000de9dbf07 x22: fff0000800378e40 x21: 0004000000000000
 x20: 0004000000000000 x19: ffffffff80000000 x18: 1ffe00010011d7a1
 x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000001
 x14: 0000000000000000 x13: ffff8000816120d0 x12: ffffffffffffffff
 x11: 0000000000000000 x10: fff00008008ebd0c x9 : 0004000000000000
 x8 : 0000000000001255 x7 : fff00008003e2000 x6 : 00000000061d54b0
 x5 : 0000000000001000 x4 : ffffffff80000000 x3 : 0000000000200000
 x2 : 0000000000000004 x1 : 0000000080000000 x0 : 0000000000000000
 Call trace:
 huge_pte_alloc+0xd4/0x334
 hugetlb_fault+0x1b8/0xc68
 handle_mm_fault+0x260/0x29c
 do_page_fault+0xfc/0x47c
 do_translation_fault+0x68/0x74
 do_mem_abort+0x44/0x94
 el0_da+0x2c/0x9c
 el0t_64_sync_handler+0x70/0xc4
 el0t_64_sync+0x190/0x194
 Code: aa000084 cb010084 b24c2c84 8b130c93 (f9400260)
 ---[ end trace 0000000000000000 ]---

Cc: Will Deacon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: a6bbf5d ("arm64: mm: Add definitions to support 5 levels of paging")
Reported-by: Dev Jain <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Signed-off-by: Anshuman Khandual <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
On arm64, UBSAN traps can be decoded from the trap instruction. Add the
add, sub, and mul overflow trap codes now that CONFIG_UBSAN_SIGNED_WRAP
exists. Seen under clang 19:

  Internal error: UBSAN: unrecognized failure code: 00000000f2005515 [#1] PREEMPT SMP

Reported-by: Nathan Chancellor <[email protected]>
Closes: https://lore.kernel.org/lkml/20240411-fix-ubsan-in-hardening-config-v1-0-e0177c80ffaa@kernel.org
Fixes: 557f8c5 ("ubsan: Reintroduce signed overflow sanitizer")
Tested-by: Nathan Chancellor <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
When I did hard offline test with hugetlb pages, below deadlock occurs:

======================================================
WARNING: possible circular locking dependency detected
6.8.0-11409-gf6cef5f8c37f #1 Not tainted
------------------------------------------------------
bash/46904 is trying to acquire lock:
ffffffffabe68910 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_dec+0x16/0x60

but task is already holding lock:
ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (pcp_batch_high_lock){+.+.}-{3:3}:
       __mutex_lock+0x6c/0x770
       page_alloc_cpu_online+0x3c/0x70
       cpuhp_invoke_callback+0x397/0x5f0
       __cpuhp_invoke_callback_range+0x71/0xe0
       _cpu_up+0xeb/0x210
       cpu_up+0x91/0xe0
       cpuhp_bringup_mask+0x49/0xb0
       bringup_nonboot_cpus+0xb7/0xe0
       smp_init+0x25/0xa0
       kernel_init_freeable+0x15f/0x3e0
       kernel_init+0x15/0x1b0
       ret_from_fork+0x2f/0x50
       ret_from_fork_asm+0x1a/0x30

-> #0 (cpu_hotplug_lock){++++}-{0:0}:
       __lock_acquire+0x1298/0x1cd0
       lock_acquire+0xc0/0x2b0
       cpus_read_lock+0x2a/0xc0
       static_key_slow_dec+0x16/0x60
       __hugetlb_vmemmap_restore_folio+0x1b9/0x200
       dissolve_free_huge_page+0x211/0x260
       __page_handle_poison+0x45/0xc0
       memory_failure+0x65e/0xc70
       hard_offline_page_store+0x55/0xa0
       kernfs_fop_write_iter+0x12c/0x1d0
       vfs_write+0x387/0x550
       ksys_write+0x64/0xe0
       do_syscall_64+0xca/0x1e0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(pcp_batch_high_lock);
                               lock(cpu_hotplug_lock);
                               lock(pcp_batch_high_lock);
  rlock(cpu_hotplug_lock);

 *** DEADLOCK ***

5 locks held by bash/46904:
 #0: ffff98f6c3bb23f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0
 #1: ffff98f6c328e488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0
 #2: ffff98ef83b31890 (kn->active#113){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0
 #3: ffffffffabf9db48 (mf_mutex){+.+.}-{3:3}, at: memory_failure+0x44/0xc70
 #4: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40

stack backtrace:
CPU: 10 PID: 46904 Comm: bash Kdump: loaded Not tainted 6.8.0-11409-gf6cef5f8c37f #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0xa0
 check_noncircular+0x129/0x140
 __lock_acquire+0x1298/0x1cd0
 lock_acquire+0xc0/0x2b0
 cpus_read_lock+0x2a/0xc0
 static_key_slow_dec+0x16/0x60
 __hugetlb_vmemmap_restore_folio+0x1b9/0x200
 dissolve_free_huge_page+0x211/0x260
 __page_handle_poison+0x45/0xc0
 memory_failure+0x65e/0xc70
 hard_offline_page_store+0x55/0xa0
 kernfs_fop_write_iter+0x12c/0x1d0
 vfs_write+0x387/0x550
 ksys_write+0x64/0xe0
 do_syscall_64+0xca/0x1e0
 entry_SYSCALL_64_after_hwframe+0x6d/0x75
RIP: 0033:0x7fc862314887
Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007fff19311268 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc862314887
RDX: 000000000000000c RSI: 000056405645fe10 RDI: 0000000000000001
RBP: 000056405645fe10 R08: 00007fc8623d1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007fc86241b780 R14: 00007fc862417600 R15: 00007fc862416a00

In short, below scene breaks the lock dependency chain:

 memory_failure
  __page_handle_poison
   zone_pcp_disable -- lock(pcp_batch_high_lock)
   dissolve_free_huge_page
    __hugetlb_vmemmap_restore_folio
     static_key_slow_dec
      cpus_read_lock -- rlock(cpu_hotplug_lock)

Fix this by calling drain_all_pages() instead.

This issue won't occur until commit a6b4085 ("mm: hugetlb: replace
hugetlb_free_vmemmap_enabled with a static_key").  As it introduced
rlock(cpu_hotplug_lock) in dissolve_free_huge_page() code path while
lock(pcp_batch_high_lock) is already in the __page_handle_poison().

[[email protected]: extend comment per Oscar]
[[email protected]: reflow block comment]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: a6b4085 ("mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key")
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Oscar Salvador <[email protected]>
Reviewed-by: Jane Chu <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
vhost_worker will call tun call backs to receive packets. If too many
illegal packets arrives, tun_do_read will keep dumping packet contents.
When console is enabled, it will costs much more cpu time to dump
packet and soft lockup will be detected.

net_ratelimit mechanism can be used to limit the dumping rate.

PID: 33036    TASK: ffff949da6f20000  CPU: 23   COMMAND: "vhost-32980"
 #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
 #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
 #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
    [exception RIP: io_serial_in+20]
    RIP: ffffffff89792594  RSP: ffffa655314979e8  RFLAGS: 00000002
    RAX: ffffffff89792500  RBX: ffffffff8af428a0  RCX: 0000000000000000
    RDX: 00000000000003fd  RSI: 0000000000000005  RDI: ffffffff8af428a0
    RBP: 0000000000002710   R8: 0000000000000004   R9: 000000000000000f
    R10: 0000000000000000  R11: ffffffff8acbf64f  R12: 0000000000000020
    R13: ffffffff8acbf698  R14: 0000000000000058  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
 torvalds#6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
 torvalds#7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
 torvalds#8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
 torvalds#9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
 torvalds#10 [ffffa65531497ac8] console_unlock at ffffffff89316124
 torvalds#11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
 torvalds#12 [ffffa65531497b68] printk at ffffffff89318306
 torvalds#13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
 torvalds#14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
 torvalds#15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
 torvalds#16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
 torvalds#17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
 torvalds#18 [ffffa65531497f10] kthread at ffffffff892d2e72
 torvalds#19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f

Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors")
Signed-off-by: Lei Chen <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
…git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 amends a missing spot where the set iterator type is unset.
	 This is fixing a issue in the previous pull request.

Patch #2 fixes the delete set command abort path by restoring state
         of the elements. Reverse logic for the activate (abort) case
	 otherwise element state is not restored, this requires to move
	 the check for active/inactive elements to the set iterator
	 callback. From the deactivate path, toggle the next generation
	 bit and from the activate (abort) path, clear the next generation
	 bitmask.

Patch #3 skips elements already restored by delete set command from the
	 abort path in case there is a previous delete element command in
	 the batch. Check for the next generation bit just like it is done
	 via set iteration to restore maps.

netfilter pull request 24-04-18

* tag 'nf-24-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: fix memleak in map from abort path
  netfilter: nf_tables: restore set elements when delete set fails
  netfilter: nf_tables: missing iterator type in lookup walk
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
On arm64 machines, swsusp_save() faults if it attempts to access
MEMBLOCK_NOMAP memory ranges. This can be reproduced in QEMU using UEFI
when booting with rodata=off debug_pagealloc=off and CONFIG_KFENCE=n:

  Unable to handle kernel paging request at virtual address ffffff8000000000
  Mem abort info:
    ESR = 0x0000000096000007
    EC = 0x25: DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
    FSC = 0x07: level 3 translation fault
  Data abort info:
    ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
    CM = 0, WnR = 0, TnD = 0, TagAccess = 0
    GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
  swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000eeb0b000
  [ffffff8000000000] pgd=180000217fff9803, p4d=180000217fff9803, pud=180000217fff9803, pmd=180000217fff8803, pte=0000000000000000
  Internal error: Oops: 0000000096000007 [#1] SMP
  Internal error: Oops: 0000000096000007 [#1] SMP
  Modules linked in: xt_multiport ipt_REJECT nf_reject_ipv4 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter bpfilter rfkill at803x snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg dwmac_generic stmmac_platform snd_hda_codec stmmac joydev pcs_xpcs snd_hda_core phylink ppdev lp parport ramoops reed_solomon ip_tables x_tables nls_iso8859_1 vfat multipath linear amdgpu amdxcp drm_exec gpu_sched drm_buddy hid_generic usbhid hid radeon video drm_suballoc_helper drm_ttm_helper ttm i2c_algo_bit drm_display_helper cec drm_kms_helper drm
  CPU: 0 PID: 3663 Comm: systemd-sleep Not tainted 6.6.2+ torvalds#76
  Source Version: 4e22ed63a0a48e7a7cff9b98b7806d8d4add7dc0
  Hardware name: Greatwall GW-XXXXXX-XXX/GW-XXXXXX-XXX, BIOS KunLun BIOS V4.0 01/19/2021
  pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : swsusp_save+0x280/0x538
  lr : swsusp_save+0x280/0x538
  sp : ffffffa034a3fa40
  x29: ffffffa034a3fa40 x28: ffffff8000001000 x27: 0000000000000000
  x26: ffffff8001400000 x25: ffffffc08113e248 x24: 0000000000000000
  x23: 0000000000080000 x22: ffffffc08113e280 x21: 00000000000c69f2
  x20: ffffff8000000000 x19: ffffffc081ae2500 x18: 0000000000000000
  x17: 6666662074736420 x16: 3030303030303030 x15: 3038666666666666
  x14: 0000000000000b69 x13: ffffff9f89088530 x12: 00000000ffffffea
  x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffffc08193f0d0
  x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 0000000000000001
  x5 : ffffffa0fff09dc8 x4 : 0000000000000000 x3 : 0000000000000027
  x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000004e
  Call trace:
   swsusp_save+0x280/0x538
   swsusp_arch_suspend+0x148/0x190
   hibernation_snapshot+0x240/0x39c
   hibernate+0xc4/0x378
   state_store+0xf0/0x10c
   kobj_attr_store+0x14/0x24

The reason is swsusp_save() -> copy_data_pages() -> page_is_saveable()
-> kernel_page_present() assuming that a page is always present when
can_set_direct_map() is false (all of rodata_full,
debug_pagealloc_enabled() and arm64_kfence_can_set_direct_map() false),
irrespective of the MEMBLOCK_NOMAP ranges. Such MEMBLOCK_NOMAP regions
should not be saved during hibernation.

This problem was introduced by changes to the pfn_valid() logic in
commit a7d9f30 ("arm64: drop pfn_valid_within() and simplify
pfn_valid()").

Similar to other architectures, drop the !can_set_direct_map() check in
kernel_page_present() so that page_is_savable() skips such pages.

Fixes: a7d9f30 ("arm64: drop pfn_valid_within() and simplify pfn_valid()")
Cc: <[email protected]> # 5.14.x
Suggested-by: Mike Rapoport <[email protected]>
Suggested-by: Catalin Marinas <[email protected]>
Co-developed-by: xiongxin <[email protected]>
Signed-off-by: xiongxin <[email protected]>
Signed-off-by: Yaxiong Tian <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[[email protected]: rework commit message]
Signed-off-by: Catalin Marinas <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
Petr Machata says:

====================
mlxsw: Fixes

This patchset fixes the following issues:

- During driver de-initialization the driver unregisters the EMAD
  response trap by setting its action to DISCARD. However the manual
  only permits TRAP and FORWARD, and future firmware versions will
  enforce this.

  In patch #1, suppress the error message by aligning the driver to the
  manual and use a FORWARD (NOP) action when unregistering the trap.

- The driver queries the Management Capabilities Mask (MCAM) register
  during initialization to understand if certain features are supported.

  However, not all firmware versions support this register, leading to
  the driver failing to load.

  Patches #2 and #3 fix this issue by treating an error in the register
  query as an indication that the feature is not supported.

v2:
- Patch #2:
    - Make mlxsw_env_max_module_eeprom_len_query() void
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
syzbot was able to trigger a NULL deref in fib_validate_source()
in an old tree [1].

It appears the bug exists in latest trees.

All calls to __in_dev_get_rcu() must be checked for a NULL result.

[1]
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 2 PID: 3257 Comm: syz-executor.3 Not tainted 5.10.0-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
 RIP: 0010:fib_validate_source+0xbf/0x15a0 net/ipv4/fib_frontend.c:425
Code: 18 f2 f2 f2 f2 42 c7 44 20 23 f3 f3 f3 f3 48 89 44 24 78 42 c6 44 20 27 f3 e8 5d 88 48 fc 4c 89 e8 48 c1 e8 03 48 89 44 24 18 <42> 80 3c 20 00 74 08 4c 89 ef e8 d2 15 98 fc 48 89 5c 24 10 41 bf
RSP: 0018:ffffc900015fee40 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88800f7a4000 RCX: ffff88800f4f90c0
RDX: 0000000000000000 RSI: 0000000004001eac RDI: ffff8880160c64c0
RBP: ffffc900015ff060 R08: 0000000000000000 R09: ffff88800f7a4000
R10: 0000000000000002 R11: ffff88800f4f90c0 R12: dffffc0000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88800f7a4000
FS:  00007f938acfe6c0(0000) GS:ffff888058c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f938acddd58 CR3: 000000001248e000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  ip_route_use_hint+0x410/0x9b0 net/ipv4/route.c:2231
  ip_rcv_finish_core+0x2c4/0x1a30 net/ipv4/ip_input.c:327
  ip_list_rcv_finish net/ipv4/ip_input.c:612 [inline]
  ip_sublist_rcv+0x3ed/0xe50 net/ipv4/ip_input.c:638
  ip_list_rcv+0x422/0x470 net/ipv4/ip_input.c:673
  __netif_receive_skb_list_ptype net/core/dev.c:5572 [inline]
  __netif_receive_skb_list_core+0x6b1/0x890 net/core/dev.c:5620
  __netif_receive_skb_list net/core/dev.c:5672 [inline]
  netif_receive_skb_list_internal+0x9f9/0xdc0 net/core/dev.c:5764
  netif_receive_skb_list+0x55/0x3e0 net/core/dev.c:5816
  xdp_recv_frames net/bpf/test_run.c:257 [inline]
  xdp_test_run_batch net/bpf/test_run.c:335 [inline]
  bpf_test_run_xdp_live+0x1818/0x1d00 net/bpf/test_run.c:363
  bpf_prog_test_run_xdp+0x81f/0x1170 net/bpf/test_run.c:1376
  bpf_prog_test_run+0x349/0x3c0 kernel/bpf/syscall.c:3736
  __sys_bpf+0x45c/0x710 kernel/bpf/syscall.c:5115
  __do_sys_bpf kernel/bpf/syscall.c:5201 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:5199 [inline]
  __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5199

Fixes: 02b2494 ("ipv4: use dst hint for ipv4 list receive")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
If stack_depot_save_flags() allocates memory it always drops
__GFP_NOLOCKDEP flag.  So when KASAN tries to track __GFP_NOLOCKDEP
allocation we may end up with lockdep splat like bellow:

======================================================
 WARNING: possible circular locking dependency detected
 6.9.0-rc3+ torvalds#49 Not tainted
 ------------------------------------------------------
 kswapd0/149 is trying to acquire lock:
 ffff88811346a920
(&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590
[xfs]

 but task is already holding lock:
 ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at:
balance_pgdat+0x5d9/0xad0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:
 -> #1 (fs_reclaim){+.+.}-{0:0}:
        __lock_acquire+0x7da/0x1030
        lock_acquire+0x15d/0x400
        fs_reclaim_acquire+0xb5/0x100
 prepare_alloc_pages.constprop.0+0xc5/0x230
        __alloc_pages+0x12a/0x3f0
        alloc_pages_mpol+0x175/0x340
        stack_depot_save_flags+0x4c5/0x510
        kasan_save_stack+0x30/0x40
        kasan_save_track+0x10/0x30
        __kasan_slab_alloc+0x83/0x90
        kmem_cache_alloc+0x15e/0x4a0
        __alloc_object+0x35/0x370
        __create_object+0x22/0x90
 __kmalloc_node_track_caller+0x477/0x5b0
        krealloc+0x5f/0x110
        xfs_iext_insert_raw+0x4b2/0x6e0 [xfs]
        xfs_iext_insert+0x2e/0x130 [xfs]
        xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs]
        xfs_btree_visit_block+0xfb/0x290 [xfs]
        xfs_btree_visit_blocks+0x215/0x2c0 [xfs]
        xfs_iread_extents+0x1a2/0x2e0 [xfs]
 xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs]
        iomap_iter+0x1d1/0x2d0
 iomap_file_buffered_write+0x120/0x1a0
        xfs_file_buffered_write+0x128/0x4b0 [xfs]
        vfs_write+0x675/0x890
        ksys_write+0xc3/0x160
        do_syscall_64+0x94/0x170
 entry_SYSCALL_64_after_hwframe+0x71/0x79

Always preserve __GFP_NOLOCKDEP to fix this.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: cd11016 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
Signed-off-by: Andrey Ryabinin <[email protected]>
Reported-by: Xiubo Li <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Reported-by: Damien Le Moal <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Suggested-by: Dave Chinner <[email protected]>
Tested-by: Xiubo Li <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
Issue reported by customer during SRIOV testing, call trace:
When both i40e and the i40iw driver are loaded, a warning
in check_flush_dependency is being triggered. This seems
to be because of the i40e driver workqueue is allocated with
the WQ_MEM_RECLAIM flag, and the i40iw one is not.

Similar error was encountered on ice too and it was fixed by
removing the flag. Do the same for i40e too.

[Feb 9 09:08] ------------[ cut here ]------------
[  +0.000004] workqueue: WQ_MEM_RECLAIM i40e:i40e_service_task [i40e] is
flushing !WQ_MEM_RECLAIM infiniband:0x0
[  +0.000060] WARNING: CPU: 0 PID: 937 at kernel/workqueue.c:2966
check_flush_dependency+0x10b/0x120
[  +0.000007] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq
snd_timer snd_seq_device snd soundcore nls_utf8 cifs cifs_arc4
nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 dns_resolver netfs qrtr
rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common irdma
intel_uncore_frequency intel_uncore_frequency_common ice ipmi_ssif
isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal
intel_powerclamp gnss coretemp ib_uverbs rapl intel_cstate ib_core
iTCO_wdt iTCO_vendor_support acpi_ipmi mei_me ipmi_si intel_uncore
ioatdma i2c_i801 joydev pcspkr mei ipmi_devintf lpc_ich
intel_pch_thermal i2c_smbus ipmi_msghandler acpi_power_meter acpi_pad
xfs libcrc32c ast sd_mod drm_shmem_helper t10_pi drm_kms_helper sg ixgbe
drm i40e ahci crct10dif_pclmul libahci crc32_pclmul igb crc32c_intel
libata ghash_clmulni_intel i2c_algo_bit mdio dca wmi dm_mirror
dm_region_hash dm_log dm_mod fuse
[  +0.000050] CPU: 0 PID: 937 Comm: kworker/0:3 Kdump: loaded Not
tainted 6.8.0-rc2-Feb-net_dev-Qiueue-00279-gbd43c5687e05 #1
[  +0.000003] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS
SE5C620.86B.02.01.0013.121520200651 12/15/2020
[  +0.000001] Workqueue: i40e i40e_service_task [i40e]
[  +0.000024] RIP: 0010:check_flush_dependency+0x10b/0x120
[  +0.000003] Code: ff 49 8b 54 24 18 48 8d 8b b0 00 00 00 49 89 e8 48
81 c6 b0 00 00 00 48 c7 c7 b0 97 fa 9f c6 05 8a cc 1f 02 01 e8 35 b3 fd
ff <0f> 0b e9 10 ff ff ff 80 3d 78 cc 1f 02 00 75 94 e9 46 ff ff ff 90
[  +0.000002] RSP: 0018:ffffbd294976bcf8 EFLAGS: 00010282
[  +0.000002] RAX: 0000000000000000 RBX: ffff94d4c483c000 RCX:
0000000000000027
[  +0.000001] RDX: ffff94d47f620bc8 RSI: 0000000000000001 RDI:
ffff94d47f620bc0
[  +0.000001] RBP: 0000000000000000 R08: 0000000000000000 R09:
00000000ffff7fff
[  +0.000001] R10: ffffbd294976bb98 R11: ffffffffa0be65e8 R12:
ffff94c5451ea180
[  +0.000001] R13: ffff94c5ab5e8000 R14: ffff94c5c20b6e05 R15:
ffff94c5f1330ab0
[  +0.000001] FS:  0000000000000000(0000) GS:ffff94d47f600000(0000)
knlGS:0000000000000000
[  +0.000002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000001] CR2: 00007f9e6f1fca70 CR3: 0000000038e20004 CR4:
00000000007706f0
[  +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  +0.000001] PKRU: 55555554
[  +0.000001] Call Trace:
[  +0.000001]  <TASK>
[  +0.000002]  ? __warn+0x80/0x130
[  +0.000003]  ? check_flush_dependency+0x10b/0x120
[  +0.000002]  ? report_bug+0x195/0x1a0
[  +0.000005]  ? handle_bug+0x3c/0x70
[  +0.000003]  ? exc_invalid_op+0x14/0x70
[  +0.000002]  ? asm_exc_invalid_op+0x16/0x20
[  +0.000006]  ? check_flush_dependency+0x10b/0x120
[  +0.000002]  ? check_flush_dependency+0x10b/0x120
[  +0.000002]  __flush_workqueue+0x126/0x3f0
[  +0.000015]  ib_cache_cleanup_one+0x1c/0xe0 [ib_core]
[  +0.000056]  __ib_unregister_device+0x6a/0xb0 [ib_core]
[  +0.000023]  ib_unregister_device_and_put+0x34/0x50 [ib_core]
[  +0.000020]  i40iw_close+0x4b/0x90 [irdma]
[  +0.000022]  i40e_notify_client_of_netdev_close+0x54/0xc0 [i40e]
[  +0.000035]  i40e_service_task+0x126/0x190 [i40e]
[  +0.000024]  process_one_work+0x174/0x340
[  +0.000003]  worker_thread+0x27e/0x390
[  +0.000001]  ? __pfx_worker_thread+0x10/0x10
[  +0.000002]  kthread+0xdf/0x110
[  +0.000002]  ? __pfx_kthread+0x10/0x10
[  +0.000002]  ret_from_fork+0x2d/0x50
[  +0.000003]  ? __pfx_kthread+0x10/0x10
[  +0.000001]  ret_from_fork_asm+0x1b/0x30
[  +0.000004]  </TASK>
[  +0.000001] ---[ end trace 0000000000000000 ]---

Fixes: 4d5957c ("i40e: remove WQ_UNBOUND and the task limit of our workqueue")
Signed-off-by: Sindhu Devale <[email protected]>
Reviewed-by: Arkadiusz Kubalewski <[email protected]>
Reviewed-by: Mateusz Polchlopek <[email protected]>
Signed-off-by: Aleksandr Loktionov <[email protected]>
Tested-by: Robert Ganzynkowicz <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
9f74a3d ("ice: Fix VF Reset paths when interface in a failed over
aggregate"), the ice driver has acquired the LAG mutex in ice_reset_vf().
The commit placed this lock acquisition just prior to the acquisition of
the VF configuration lock.

If ice_reset_vf() acquires the configuration lock via the ICE_VF_RESET_LOCK
flag, this could deadlock with ice_vc_cfg_qs_msg() because it always
acquires the locks in the order of the VF configuration lock and then the
LAG mutex.

Lockdep reports this violation almost immediately on creating and then
removing 2 VF:

======================================================
WARNING: possible circular locking dependency detected
6.8.0-rc6 torvalds#54 Tainted: G        W  O
------------------------------------------------------
kworker/60:3/6771 is trying to acquire lock:
ff40d43e099380a0 (&vf->cfg_lock){+.+.}-{3:3}, at: ice_reset_vf+0x22f/0x4d0 [ice]

but task is already holding lock:
ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&pf->lag_mutex){+.+.}-{3:3}:
       __lock_acquire+0x4f8/0xb40
       lock_acquire+0xd4/0x2d0
       __mutex_lock+0x9b/0xbf0
       ice_vc_cfg_qs_msg+0x45/0x690 [ice]
       ice_vc_process_vf_msg+0x4f5/0x870 [ice]
       __ice_clean_ctrlq+0x2b5/0x600 [ice]
       ice_service_task+0x2c9/0x480 [ice]
       process_one_work+0x1e9/0x4d0
       worker_thread+0x1e1/0x3d0
       kthread+0x104/0x140
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x1b/0x30

-> #0 (&vf->cfg_lock){+.+.}-{3:3}:
       check_prev_add+0xe2/0xc50
       validate_chain+0x558/0x800
       __lock_acquire+0x4f8/0xb40
       lock_acquire+0xd4/0x2d0
       __mutex_lock+0x9b/0xbf0
       ice_reset_vf+0x22f/0x4d0 [ice]
       ice_process_vflr_event+0x98/0xd0 [ice]
       ice_service_task+0x1cc/0x480 [ice]
       process_one_work+0x1e9/0x4d0
       worker_thread+0x1e1/0x3d0
       kthread+0x104/0x140
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x1b/0x30

other info that might help us debug this:
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&pf->lag_mutex);
                               lock(&vf->cfg_lock);
                               lock(&pf->lag_mutex);
  lock(&vf->cfg_lock);

 *** DEADLOCK ***
4 locks held by kworker/60:3/6771:
 #0: ff40d43e05428b38 ((wq_completion)ice){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0
 #1: ff50d06e05197e58 ((work_completion)(&pf->serv_task)){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0
 #2: ff40d43ea1960e50 (&pf->vfs.table_lock){+.+.}-{3:3}, at: ice_process_vflr_event+0x48/0xd0 [ice]
 #3: ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice]

stack backtrace:
CPU: 60 PID: 6771 Comm: kworker/60:3 Tainted: G        W  O       6.8.0-rc6 torvalds#54
Hardware name:
Workqueue: ice ice_service_task [ice]
Call Trace:
 <TASK>
 dump_stack_lvl+0x4a/0x80
 check_noncircular+0x12d/0x150
 check_prev_add+0xe2/0xc50
 ? save_trace+0x59/0x230
 ? add_chain_cache+0x109/0x450
 validate_chain+0x558/0x800
 __lock_acquire+0x4f8/0xb40
 ? lockdep_hardirqs_on+0x7d/0x100
 lock_acquire+0xd4/0x2d0
 ? ice_reset_vf+0x22f/0x4d0 [ice]
 ? lock_is_held_type+0xc7/0x120
 __mutex_lock+0x9b/0xbf0
 ? ice_reset_vf+0x22f/0x4d0 [ice]
 ? ice_reset_vf+0x22f/0x4d0 [ice]
 ? rcu_is_watching+0x11/0x50
 ? ice_reset_vf+0x22f/0x4d0 [ice]
 ice_reset_vf+0x22f/0x4d0 [ice]
 ? process_one_work+0x176/0x4d0
 ice_process_vflr_event+0x98/0xd0 [ice]
 ice_service_task+0x1cc/0x480 [ice]
 process_one_work+0x1e9/0x4d0
 worker_thread+0x1e1/0x3d0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x104/0x140
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x31/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>

To avoid deadlock, we must acquire the LAG mutex only after acquiring the
VF configuration lock. Fix the ice_reset_vf() to acquire the LAG mutex only
after we either acquire or check that the VF configuration lock is held.

Fixes: 9f74a3d ("ice: Fix VF Reset paths when interface in a failed over aggregate")
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Dave Ertman <[email protected]>
Reviewed-by: Mateusz Polchlopek <[email protected]>
Tested-by: Przemek Kitszel <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
…nix_gc().

syzbot reported a lockdep splat regarding unix_gc_lock and
unix_state_lock().

One is called from recvmsg() for a connected socket, and another
is called from GC for TCP_LISTEN socket.

So, the splat is false-positive.

Let's add a dedicated lock class for the latter to suppress the splat.

Note that this change is not necessary for net-next.git as the issue
is only applied to the old GC impl.

[0]:
WARNING: possible circular locking dependency detected
6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Not tainted
 -----------------------------------------------------
kworker/u8:1/11 is trying to acquire lock:
ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302

but task is already holding lock:
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

 -> #1 (unix_gc_lock){+.+.}-{2:2}:
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
       __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
       _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
       spin_lock include/linux/spinlock.h:351 [inline]
       unix_notinflight+0x13d/0x390 net/unix/garbage.c:140
       unix_detach_fds net/unix/af_unix.c:1819 [inline]
       unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876
       skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188
       skb_release_all net/core/skbuff.c:1200 [inline]
       __kfree_skb net/core/skbuff.c:1216 [inline]
       kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252
       kfree_skb include/linux/skbuff.h:1262 [inline]
       manage_oob net/unix/af_unix.c:2672 [inline]
       unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749
       unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981
       do_splice_read fs/splice.c:985 [inline]
       splice_file_to_pipe+0x299/0x500 fs/splice.c:1295
       do_splice+0xf2d/0x1880 fs/splice.c:1379
       __do_splice fs/splice.c:1436 [inline]
       __do_sys_splice fs/splice.c:1652 [inline]
       __se_sys_splice+0x331/0x4a0 fs/splice.c:1634
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

 -> #0 (&u->lock){+.+.}-{2:2}:
       check_prev_add kernel/locking/lockdep.c:3134 [inline]
       check_prevs_add kernel/locking/lockdep.c:3253 [inline]
       validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
       __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
       __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
       _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
       spin_lock include/linux/spinlock.h:351 [inline]
       __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
       process_one_work kernel/workqueue.c:3254 [inline]
       process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
       worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
       kthread+0x2f0/0x390 kernel/kthread.c:388
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(unix_gc_lock);
                               lock(&u->lock);
                               lock(unix_gc_lock);
  lock(&u->lock);

 *** DEADLOCK ***

3 locks held by kworker/u8:1/11:
 #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline]
 #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335
 #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline]
 #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335
 #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
 #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261

stack backtrace:
CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Workqueue: events_unbound __unix_gc
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
 check_prev_add kernel/locking/lockdep.c:3134 [inline]
 check_prevs_add kernel/locking/lockdep.c:3253 [inline]
 validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
 _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
 spin_lock include/linux/spinlock.h:351 [inline]
 __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
 process_one_work kernel/workqueue.c:3254 [inline]
 process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
 kthread+0x2f0/0x390 kernel/kthread.c:388
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>

Fixes: 47d8ac0 ("af_unix: Fix garbage collector racing against connect()")
Reported-and-tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
…git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains two Netfilter/IPVS fixes for net:

Patch #1 fixes SCTP checksumming for IPVS with gso packets,
	 from Ismael Luceno.

Patch #2 honor dormant flag from netdev event path to fix a possible
	 double hook unregistration.

* tag 'nf-24-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: honor table dormant flag from netdev release event path
  ipvs: Fix checksumming on GSO of SCTP packets
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
MingcongBai pushed a commit that referenced this pull request Apr 30, 2024
LoongArch inclusion
category: feature

--------------------------------

The old memory should be reserved after efi_runtime_init() to avoid destroying
the EFI space and causing failure when executing svam().

Fix the following problems when executing kdump:

[    0.000000] The BIOS Version: Loongson-UDK2018-V2.0.04082-beta7
[    0.000000] CPU 0 Unable to handle kernel paging request at virtual address 00000000fdeb0e7c, era == 00000000fdeb0e7c, ra == 90000000dae6585c
[    0.000000] Oops[#1]:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.137+ torvalds#86
[    0.000000] Hardware name: Loongson Loongson-3A5000-7A1000-1w-A2101/Loongson-LS3A5000-7A1000-1w-A2101, BIOS vUDK2018-LoongArch-V2.0.pre-beta8 06/15/2022
[    0.000000] $ 0   : 0000000000000000 90000000dae6585c 90000000db200000 90000000db203840
[    0.000000] $ 4   : 0000000000000078 0000000000000028 0000000000000001 00000000db203860
[    0.000000] $ 8   : 0000000000000000 0000000000000040 90000000db203680 0000000000000000
[    0.000000] $12   : 00000000fdeb0e7c ffffffffffffffc0 00000000fbffffff 0000000020000000
[    0.000000] $16   : 000000000003e780 0000000020000000 90000000dad8c348 0000000000003fff
[    0.000000] $20   : 0000000000000018 90000000dad8bdd0 90000000db203850 0000000000000040
[    0.000000] $24   : 000000000000000f 90000000db21a570 90000000daeb07a0 90000000db217000
[    0.000000] $28   : 90000000db203858 0000000001ffffff 90000000db2171b0 0000000000000040
[    0.000000] era   : 00000000fdeb0e7c 0xfdeb0e7c
[    0.000000] ra    : 90000000dae6585c set_virtual_map.isra.0+0x23c/0x394
[    0.000000] CSR crmd: 90000000db21a570
[    0.000000] CSR prmd: 00000000
[    0.000000] CSR euen: 00000000
[    0.000000] CSR ecfg: 90000000db203850
[    0.000000] CSR estat: 90000000dae65800
[    0.000000] ExcCode : 26 (SubCode 16b)
[    0.000000] PrId  : 0014c012 (Loongson-64bit)
[    0.000000] Modules linked in:
[    0.000000] Process swapper (pid: 0, threadinfo=(____ptrval____), task=(____ptrval____))
[    0.000000] Stack : 0000000000000001 00000000fdeb0e7c 0000000000036780 000000000003e780
[    0.000000]         0000000000000006 0000000010000000 8000000010000000 0000000000010000
[    0.000000]         8000000000000001 0000000000000005 00000000fde40000 90000000fde40000
[    0.000000]         0000000000000100 800000000000000f 0000000000000006 00000000fdf40000
[    0.000000]         90000000fdf40000 0000000000000300 800000000000000f 00000000000000b0
[    0.000000]         0000000000000001 90000000da094cf0 0000000000000000 ffffffffffffffea
[    0.000000]         90000000db2039b8 ffff0a1000000609 0000000000000035 0000000000000030
[    0.000000]         90000000dad7b258 0000000000000400 00000000000000b0 ffff0a1000000609
[    0.000000]         90000000db2039a8 90000000db095730 000000007fffffff ffff0a1000000609
[    0.000000]         90000000db203a90 90000000db203a30 90000000db2039d8 90000000db09570b
[    0.000000]         ...
[    0.000000] Call Trace:
[    0.000000]
[    0.000000] Code: (Bad address in era)
[    0.000000]
[    0.000000]

Signed-off-by: Youling Tang <[email protected]>
Signed-off-by: Tiezhu Yang <[email protected]>
@MingcongBai
Copy link

Picked on the rc6 branch, thank you!

MingcongBai pushed a commit that referenced this pull request May 6, 2024
Lockdep detects a possible deadlock as listed below. This is because it
detects the IA55 interrupt controller .irq_eoi() API is called from
interrupt context while configuration-specific API (e.g., .irq_enable())
could be called from process context on resume path (by calling
rzg2l_gpio_irq_restore()). To avoid this, protect the call of
rzg2l_gpio_irq_enable() with spin_lock_irqsave()/spin_unlock_irqrestore().
With this the same approach that is available in __setup_irq() is mimicked
to pinctrl IRQ resume function.

Below is the lockdep report:

    WARNING: inconsistent lock state
    6.8.0-rc5-next-20240219-arm64-renesas-00030-gb17a289abf1f torvalds#90 Not tainted
    --------------------------------
    inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
    str_rwdt_t_001./159 [HC0[0]:SC0[0]:HE1:SE1] takes:
    ffff00000b001d70 (&rzg2l_irqc_data->lock){?...}-{2:2}, at: rzg2l_irqc_irq_enable+0x60/0xa4
    {IN-HARDIRQ-W} state was registered at:
    lock_acquire+0x1e0/0x310
    _raw_spin_lock+0x44/0x58
    rzg2l_irqc_eoi+0x2c/0x130
    irq_chip_eoi_parent+0x18/0x20
    rzg2l_gpio_irqc_eoi+0xc/0x14
    handle_fasteoi_irq+0x134/0x230
    generic_handle_domain_irq+0x28/0x3c
    gic_handle_irq+0x4c/0xbc
    call_on_irq_stack+0x24/0x34
    do_interrupt_handler+0x78/0x7c
    el1_interrupt+0x30/0x5c
    el1h_64_irq_handler+0x14/0x1c
    el1h_64_irq+0x64/0x68
    _raw_spin_unlock_irqrestore+0x34/0x70
    __setup_irq+0x4d4/0x6b8
    request_threaded_irq+0xe8/0x1a0
    request_any_context_irq+0x60/0xb8
    devm_request_any_context_irq+0x74/0x104
    gpio_keys_probe+0x374/0xb08
    platform_probe+0x64/0xcc
    really_probe+0x140/0x2ac
    __driver_probe_device+0x74/0x124
    driver_probe_device+0x3c/0x15c
    __driver_attach+0xec/0x1c4
    bus_for_each_dev+0x70/0xcc
    driver_attach+0x20/0x28
    bus_add_driver+0xdc/0x1d0
    driver_register+0x5c/0x118
    __platform_driver_register+0x24/0x2c
    gpio_keys_init+0x18/0x20
    do_one_initcall+0x70/0x290
    kernel_init_freeable+0x294/0x504
    kernel_init+0x20/0x1cc
    ret_from_fork+0x10/0x20
    irq event stamp: 69071
    hardirqs last enabled at (69071): [<ffff800080e0dafc>] _raw_spin_unlock_irqrestore+0x6c/0x70
    hardirqs last disabled at (69070): [<ffff800080e0cfec>] _raw_spin_lock_irqsave+0x7c/0x80
    softirqs last enabled at (67654): [<ffff800080010614>] __do_softirq+0x494/0x4dc
    softirqs last disabled at (67645): [<ffff800080015238>] ____do_softirq+0xc/0x14

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&rzg2l_irqc_data->lock);
    <Interrupt>
    lock(&rzg2l_irqc_data->lock);

    *** DEADLOCK ***

    4 locks held by str_rwdt_t_001./159:
    #0: ffff00000b10f3f0 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0x1a4/0x35c
    #1: ffff00000e43ba88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xe8/0x1a8
    #2: ffff00000aa21dc8 (kn->active#40){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf0/0x1a8
    #3: ffff80008179d970 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend+0x9c/0x278

    stack backtrace:
    CPU: 0 PID: 159 Comm: str_rwdt_t_001. Not tainted 6.8.0-rc5-next-20240219-arm64-renesas-00030-gb17a289abf1f torvalds#90
    Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT)
    Call trace:
    dump_backtrace+0x94/0xe8
    show_stack+0x14/0x1c
    dump_stack_lvl+0x88/0xc4
    dump_stack+0x14/0x1c
    print_usage_bug.part.0+0x294/0x348
    mark_lock+0x6b0/0x948
    __lock_acquire+0x750/0x20b0
    lock_acquire+0x1e0/0x310
    _raw_spin_lock+0x44/0x58
    rzg2l_irqc_irq_enable+0x60/0xa4
    irq_chip_enable_parent+0x1c/0x34
    rzg2l_gpio_irq_enable+0xc4/0xd8
    rzg2l_pinctrl_resume_noirq+0x4cc/0x520
    pm_generic_resume_noirq+0x28/0x3c
    genpd_finish_resume+0xc0/0xdc
    genpd_resume_noirq+0x14/0x1c
    dpm_run_callback+0x34/0x90
    device_resume_noirq+0xa8/0x268
    dpm_noirq_resume_devices+0x13c/0x160
    dpm_resume_noirq+0xc/0x1c
    suspend_devices_and_enter+0x2c8/0x570
    pm_suspend+0x1ac/0x278
    state_store+0x88/0x124
    kobj_attr_store+0x14/0x24
    sysfs_kf_write+0x48/0x6c
    kernfs_fop_write_iter+0x118/0x1a8
    vfs_write+0x270/0x35c
    ksys_write+0x64/0xec
    __arm64_sys_write+0x18/0x20
    invoke_syscall+0x44/0x108
    el0_svc_common.constprop.0+0xb4/0xd4
    do_el0_svc+0x18/0x20
    el0_svc+0x3c/0xb8
    el0t_64_sync_handler+0xb8/0xbc
    el0t_64_sync+0x14c/0x150

Fixes: 254203f ("pinctrl: renesas: rzg2l: Add suspend/resume support")
Signed-off-by: Claudiu Beznea <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Geert Uytterhoeven <[email protected]>
MingcongBai pushed a commit that referenced this pull request May 6, 2024
Commit 1548036 ("nfs: make the rpc_stat per net namespace") added
functionality to specify rpc_stats function but missed adding it to the
TCP TLS functionality. As the result, mounting with xprtsec=tls lead to
the following kernel oops.

[  128.984192] Unable to handle kernel NULL pointer dereference at
virtual address 000000000000001c
[  128.985058] Mem abort info:
[  128.985372]   ESR = 0x0000000096000004
[  128.985709]   EC = 0x25: DABT (current EL), IL = 32 bits
[  128.986176]   SET = 0, FnV = 0
[  128.986521]   EA = 0, S1PTW = 0
[  128.986804]   FSC = 0x04: level 0 translation fault
[  128.987229] Data abort info:
[  128.987597]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  128.988169]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  128.988811]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  128.989302] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000106c84000
[  128.990048] [000000000000001c] pgd=0000000000000000, p4d=0000000000000000
[  128.990736] Internal error: Oops: 0000000096000004 [#1] SMP
[  128.991168] Modules linked in: nfs_layout_nfsv41_files
rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs
uinput dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill
ip_set nf_tables nfnetlink qrtr vsock_loopback
vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
sunrpc vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops uvc
videobuf2_v4l2 videodev videobuf2_common mc vmw_vmci xfs libcrc32c
e1000e crct10dif_ce ghash_ce sha2_ce vmwgfx nvme sha256_arm64
nvme_core sr_mod cdrom sha1_ce drm_ttm_helper ttm drm_kms_helper drm
sg fuse
[  128.996466] CPU: 0 PID: 179 Comm: kworker/u4:26 Kdump: loaded Not
tainted 6.8.0-rc6+ torvalds#12
[  128.997226] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
VMW201.00V.21805430.BA64.2305221830 05/22/2023
[  128.998084] Workqueue: xprtiod xs_tcp_tls_setup_socket [sunrpc]
[  128.998701] pstate: 81400005 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[  128.999384] pc : call_start+0x74/0x138 [sunrpc]
[  128.999809] lr : __rpc_execute+0xb8/0x3e0 [sunrpc]
[  129.000244] sp : ffff8000832b3a00
[  129.000508] x29: ffff8000832b3a00 x28: ffff800081ac79c0 x27: ffff800081ac7000
[  129.001111] x26: 0000000004248060 x25: 0000000000000000 x24: ffff800081596008
[  129.001757] x23: ffff80007b087240 x22: ffff00009a509d30 x21: 0000000000000000
[  129.002345] x20: ffff000090075600 x19: ffff00009a509d00 x18: ffffffffffffffff
[  129.002912] x17: 733d4d4554535953 x16: 42555300312d746e x15: ffff8000832b3a88
[  129.003464] x14: ffffffffffffffff x13: ffff8000832b3a7d x12: 0000000000000008
[  129.004021] x11: 0101010101010101 x10: ffff8000150cb560 x9 : ffff80007b087c00
[  129.004577] x8 : ffff00009a509de0 x7 : 0000000000000000 x6 : 00000000be8c4ee3
[  129.005026] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff000094d56680
[  129.005425] x2 : ffff80007b0637f8 x1 : ffff000090075600 x0 : ffff00009a509d00
[  129.005824] Call trace:
[  129.005967]  call_start+0x74/0x138 [sunrpc]
[  129.006233]  __rpc_execute+0xb8/0x3e0 [sunrpc]
[  129.006506]  rpc_execute+0x160/0x1d8 [sunrpc]
[  129.006778]  rpc_run_task+0x148/0x1f8 [sunrpc]
[  129.007204]  tls_probe+0x80/0xd0 [sunrpc]
[  129.007460]  rpc_ping+0x28/0x80 [sunrpc]
[  129.007715]  rpc_create_xprt+0x134/0x1a0 [sunrpc]
[  129.007999]  rpc_create+0x128/0x2a0 [sunrpc]
[  129.008264]  xs_tcp_tls_setup_socket+0xdc/0x508 [sunrpc]
[  129.008583]  process_one_work+0x174/0x3c8
[  129.008813]  worker_thread+0x2c8/0x3e0
[  129.009033]  kthread+0x100/0x110
[  129.009225]  ret_from_fork+0x10/0x20
[  129.009432] Code: f0ffffc2 911fe042 aa1403e1 aa1303e0 (b9401c83)

Fixes: 1548036 ("nfs: make the rpc_stat per net namespace")
Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
MingcongBai pushed a commit that referenced this pull request May 6, 2024
At the time of LPAR boot up, partition firmware provides Open Firmware
property ibm,dma-window for the PE. This property is provided on the PCI
bus the PE is attached to.

There are execptions where the partition firmware might not provide this
property for the PE at the time of LPAR boot up. One of the scenario is
where the firmware has frozen the PE due to some error condition. This
PE is frozen for 24 hours or unless the whole system is reinitialized.

Within this time frame, if the LPAR is booted, the frozen PE will be
presented to the LPAR but ibm,dma-window property could be missing.

Today, under these circumstances, the LPAR oopses with NULL pointer
dereference, when configuring the PCI bus the PE is attached to.

  BUG: Kernel NULL pointer dereference on read at 0x000000c8
  Faulting instruction address: 0xc0000000001024c0
  Oops: Kernel access of bad area, sig: 7 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  Supported: Yes
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
  Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries
  NIP:  c0000000001024c0 LR: c0000000001024b0 CTR: c000000000102450
  REGS: c0000000037db5c0 TRAP: 0300   Not tainted  (6.4.0-150600.9-default)
  MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000822  XER: 00000000
  CFAR: c00000000010254c DAR: 00000000000000c8 DSISR: 00080000 IRQMASK: 0
  ...
  NIP [c0000000001024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
  LR [c0000000001024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
  Call Trace:
    pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
    pcibios_setup_bus_self+0x1c0/0x370
    __of_scan_bus+0x2f8/0x330
    pcibios_scan_phb+0x280/0x3d0
    pcibios_init+0x88/0x12c
    do_one_initcall+0x60/0x320
    kernel_init_freeable+0x344/0x3e4
    kernel_init+0x34/0x1d0
    ret_from_kernel_user_thread+0x14/0x1c

Fixes: b1fc44e ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
Signed-off-by: Gaurav Batra <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
MingcongBai pushed a commit that referenced this pull request May 6, 2024
…active

The default nna (node_nr_active) is used when the pool isn't tied to a
specific NUMA node. This can happen in the following cases:

 1. On NUMA, if per-node pwq init failure and the fallback pwq is used.
 2. On NUMA, if a pool is configured to span multiple nodes.
 3. On single node setups.

5797b1c ("workqueue: Implement system-wide nr_active enforcement for
unbound workqueues") set the default nna->max to min_active because only #1
was being considered. For #2 and #3, using min_active means that the max
concurrency in normal operation is pushed down to min_active which is
currently 8, which can obviously lead to performance issues.

exact value nna->max is set to doesn't really matter. #2 can only happen if
the workqueue is intentionally configured to ignore NUMA boundaries and
there's no good way to distribute max_active in this case. #3 is the default
behavior on single node machines.

Let's set it the default nna->max to max_active. This fixes the artificially
lowered concurrency problem on single node machines and shouldn't hurt
anything for other cases.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Shinichiro Kawasaki <[email protected]>
Fixes: 5797b1c ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues")
Link: https://lore.kernel.org/dm-devel/[email protected]/
Signed-off-by: Tejun Heo <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 9, 2025
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G           O        6.15.0-rc2-00635-g521139ac3840 torvalds#272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS:  0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
 <TASK>
 unlock_ovpn+0x8b/0xe0 [ovpn]
 ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
 ? ovpn_peers_free+0x780/0x780 [ovpn]
 ? lock_acquire+0x56/0x70
 ? process_one_work+0x888/0x1740
 process_one_work+0x933/0x1740
 ? pwq_dec_nr_in_flight+0x10b0/0x10b0
 ? move_linked_works+0x12d/0x2c0
 ? assign_work+0x163/0x270
 worker_thread+0x4d6/0xd90
 ? preempt_count_sub+0x4c/0x70
 ? process_one_work+0x1740/0x1740
 kthread+0x36c/0x710
 ? trace_preempt_on+0x8c/0x1e0
 ? kthread_is_per_cpu+0xc0/0xc0
 ? preempt_count_sub+0x4c/0x70
 ? _raw_spin_unlock_irq+0x36/0x60
 ? calculate_sigpending+0x7b/0xa0
 ? kthread_is_per_cpu+0xc0/0xc0
 ret_from_fork+0x3a/0x80
 ? kthread_is_per_cpu+0xc0/0xc0
 ret_from_fork_asm+0x11/0x20
 </TASK>
Modules linked in: ovpn(O)

This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.

After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.

The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.

This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.

To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.

This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".

While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).

By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.

ovpn_socket_release() will ultimately decrease the reference
counter.

Cc: Oleksandr Natalenko <[email protected]>
Fixes: 11851cb ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <[email protected]>
Closes: OpenVPN#1
Tested-by: Gert Doering <[email protected]>
Link: https://www.mail-archive.com/[email protected]/msg31575.html
Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 9, 2025
According to OpenMDK, bit 2 of the RGMII register has a different
meaning for BCM53115 [1]:

"DLL_IQQD         1: In the IDDQ mode, power is down0: Normal function
                  mode"

Configuring RGMII delay works without setting this bit, so let's keep it
at the default. For other chips, we always set it, so not clearing it
is not an issue.

One would assume BCM53118 works the same, but OpenMDK is not quite sure
what this bit actually means [2]:

"BYPASS_IMP_2NS_DEL #1: In the IDDQ mode, power is down#0: Normal
                    function mode1: Bypass dll65_2ns_del IP0: Use
                    dll65_2ns_del IP"

So lets keep setting it for now.

[1] https://github.com/Broadcom-Network-Switching-Software/OpenMDK/blob/master/cdk/PKG/chip/bcm53115/bcm53115_a0_defs.h#L19871
[2] https://github.com/Broadcom-Network-Switching-Software/OpenMDK/blob/master/cdk/PKG/chip/bcm53118/bcm53118_a0_defs.h#L14392

Fixes: 967dd82 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Jonas Gorski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 9, 2025
Kernel user spaces accesses to not exported pages in atomic context
incorrectly try to resolve the page fault.
With debug options enabled call traces like this can be seen:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1523
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 419074, name: qemu-system-s39
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
Preemption disabled at:
[<00000383ea47cfa2>] copy_page_from_iter_atomic+0xa2/0x8a0
CPU: 12 UID: 0 PID: 419074 Comm: qemu-system-s39
Tainted: G        W           6.16.0-20250531.rc0.git0.69b3a602feac.63.fc42.s390x+debug #1 PREEMPT
Tainted: [W]=WARN
Hardware name: IBM 3931 A01 703 (LPAR)
Call Trace:
 [<00000383e990d282>] dump_stack_lvl+0xa2/0xe8
 [<00000383e99bf152>] __might_resched+0x292/0x2d0
 [<00000383eaa7c374>] down_read+0x34/0x2d0
 [<00000383e99432f8>] do_secure_storage_access+0x108/0x360
 [<00000383eaa724b0>] __do_pgm_check+0x130/0x220
 [<00000383eaa842e4>] pgm_check_handler+0x114/0x160
 [<00000383ea47d028>] copy_page_from_iter_atomic+0x128/0x8a0
([<00000383ea47d016>] copy_page_from_iter_atomic+0x116/0x8a0)
 [<00000383e9c45eae>] generic_perform_write+0x16e/0x310
 [<00000383e9eb87f4>] ext4_buffered_write_iter+0x84/0x160
 [<00000383e9da0de4>] vfs_write+0x1c4/0x460
 [<00000383e9da123c>] ksys_write+0x7c/0x100
 [<00000383eaa7284e>] __do_syscall+0x15e/0x280
 [<00000383eaa8417e>] system_call+0x6e/0x90
INFO: lockdep is turned off.

It is not allowed to take the mmap_lock while in atomic context. Therefore
handle such a secure storage access fault as if the accessed page is not
mapped: the uaccess function will return -EFAULT, and the caller has to
deal with this. Usually this means that the access is retried in process
context, which allows to resolve the page fault (or in this case export the
page).

Reviewed-by: Claudio Imbrenda <[email protected]>
Acked-by: Alexander Gordeev <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 9, 2025
KexyBiscuit pushed a commit that referenced this pull request Jun 9, 2025
When threads/tasks are switched we need to ensure the old execution's
SR_SUM state is saved and the new thread has the old SR_SUM state
restored.

The issue was seen under heavy load especially with the syz-stress tool
running, with crashes as follows in schedule_tail:

Unable to handle kernel access to user memory without uaccess routines
at virtual address 000000002749f0d0
Oops [#1]
Modules linked in:
CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
Hardware name: riscv-virtio,qemu (DT)
epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
 ra : task_pid_vnr include/linux/sched.h:1421 [inline]
 ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
 gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
 t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
 s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
 a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
 a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
 s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
 s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
 s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
 s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
 t5 : ffffffc4043cafba t6 : 0000000000040000
status: 0000000000000120 badaddr: 000000002749f0d0 cause:
000000000000000f
Call Trace:
[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
[<ffffffe000005570>] ret_from_exception+0x0/0x14
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace b5f8f9231dc87dda ]---

The issue comes from the put_user() in schedule_tail
(kernel/sched/core.c) doing the following:

asmlinkage __visible void schedule_tail(struct task_struct *prev)
{
...
        if (current->set_child_tid)
                put_user(task_pid_vnr(current), current->set_child_tid);
...
}

the put_user() macro causes the code sequence to come out as follows:

1:	__enable_user_access()
2:	reg = task_pid_vnr(current);
3:	*current->set_child_tid = reg;
4:	__disable_user_access()

The problem is that we may have a sleeping function as argument which
could clear SR_SUM causing the panic above. This was fixed by
evaluating the argument of the put_user() macro outside the user-enabled
section in commit 285a76b ("riscv: evaluate put_user() arg before
enabling user access")"

In order for riscv to take advantage of unsafe_get/put_XXX() macros and
to avoid the same issue we had with put_user() and sleeping functions we
must ensure code flow can go through switch_to() from within a region of
code with SR_SUM enabled and come back with SR_SUM still enabled. This
patch addresses the problem allowing future work to enable full use of
unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
on every access. Make switch_to() save and restore SR_SUM.

Reported-by: [email protected]
Signed-off-by: Ben Dooks <[email protected]>
Signed-off-by: Cyril Bur <[email protected]>
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexandre Ghiti <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 9, 2025
When threads/tasks are switched we need to ensure the old execution's
SR_SUM state is saved and the new thread has the old SR_SUM state
restored.

The issue was seen under heavy load especially with the syz-stress tool
running, with crashes as follows in schedule_tail:

Unable to handle kernel access to user memory without uaccess routines
at virtual address 000000002749f0d0
Oops [#1]
Modules linked in:
CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
Hardware name: riscv-virtio,qemu (DT)
epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
 ra : task_pid_vnr include/linux/sched.h:1421 [inline]
 ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
 gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
 t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
 s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
 a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
 a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
 s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
 s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
 s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
 s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
 t5 : ffffffc4043cafba t6 : 0000000000040000
status: 0000000000000120 badaddr: 000000002749f0d0 cause:
000000000000000f
Call Trace:
[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
[<ffffffe000005570>] ret_from_exception+0x0/0x14
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace b5f8f9231dc87dda ]---

The issue comes from the put_user() in schedule_tail
(kernel/sched/core.c) doing the following:

asmlinkage __visible void schedule_tail(struct task_struct *prev)
{
...
        if (current->set_child_tid)
                put_user(task_pid_vnr(current), current->set_child_tid);
...
}

the put_user() macro causes the code sequence to come out as follows:

1:	__enable_user_access()
2:	reg = task_pid_vnr(current);
3:	*current->set_child_tid = reg;
4:	__disable_user_access()

The problem is that we may have a sleeping function as argument which
could clear SR_SUM causing the panic above. This was fixed by
evaluating the argument of the put_user() macro outside the user-enabled
section in commit 285a76b ("riscv: evaluate put_user() arg before
enabling user access")"

In order for riscv to take advantage of unsafe_get/put_XXX() macros and
to avoid the same issue we had with put_user() and sleeping functions we
must ensure code flow can go through switch_to() from within a region of
code with SR_SUM enabled and come back with SR_SUM still enabled. This
patch addresses the problem allowing future work to enable full use of
unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
on every access. Make switch_to() save and restore SR_SUM.

Reported-by: [email protected]
Signed-off-by: Ben Dooks <[email protected]>
Signed-off-by: Cyril Bur <[email protected]>
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexandre Ghiti <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
The bo/ttm interfaces with kernel memory mapping from dedicated GPU
memory. It is not correct to assume that SZ_4K would suffice for page
alignment as there are a few hardware platforms that commonly uses non-4K
pages - for instance, currently, Loongson 3A5000/6000 devices (of the
LoongArch architecture) commonly uses 16K kernel pages.

Per my testing Intel Xe/Arc families of GPUs works on at least
Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable
BAR" were enabled in the EFI firmware settings. I tested this patch series
on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.

Without this fix, the kernel will hang at a kernel BUG():

[    7.425445] ------------[ cut here ]------------
[    7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181!
[    7.435330] Oops - BUG[#1]:
[    7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G            E      6.13.3-aosc-main-00336-g60829239b300-dirty #3
[    7.449511] Tainted: [E]=UNSIGNED_MODULE
[    7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[    7.467144] Workqueue: events work_for_cpu_fn
[    7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960
[    7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000
[    7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001
[    7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000
[    7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[    7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000
[    7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001
[    7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000
[    7.537855]    ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe]
[    7.544893]   ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0
[    7.551639]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[    7.557785]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[    7.562111]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[    7.566870]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[    7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[    7.577163]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[    7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E)
[    7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707)
[    7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000
[    7.629563]         0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877
[    7.637519]         000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000
[    7.645475]         ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000
[    7.653431]         0000000000000001 ffff800002533660 0000000000022022 9000000000234470
[    7.661386]         90000001010cba28 0000000000001000 0000000000000000 000000000005c300
[    7.669342]         900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000
[    7.677298]         ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14
[    7.685254]         0000000000022022 0000000000000000 900000000209c000 90000000010589e0
[    7.693209]         90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000
[    7.701165]         ...
[    7.703588] Call Trace:
[    7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0
[    7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe]
[    7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe]
[    7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe]
[    7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe]
[    7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe]
[    7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe]
[    7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe]
[    7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe]
[    7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe]
[    7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe]
[    7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe]
[    7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4
[    7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40
[    7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458
[    7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc
[    7.805591] [<90000000002bacac>] kthread+0x114/0x138
[    7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4
[    7.816489]
[    7.817961] Code: 4c000020  29c3e2f9  53ff93ff <002a0001> 0015002c  03400000  02ff8063  29c04077  001500f7
[    7.827651]
[    7.829140] ---[ end trace 0000000000000000 ]---

Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to
`drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call
before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to
fix the above error.

Cc: [email protected]
Fixes: 4e03b58 ("drm/xe/uapi: Reject bo creation of unaligned size")
Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
…to the MPAM registers

maillist inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8T2RT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.7-rc2

---------------------------

commit 011e5f5 ("arm64/cpufeature: Add remaining feature bits in
ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests,
but didn't add trap handling.

If you are unlucky, this results in an MPAM aware guest being delivered
an undef during boot. The host prints:
| kvm [97]: Unsupported guest sys_reg access at: ffff800080024c64 [00000005]
| { Op0( 3), Op1( 0), CRn(10), CRm( 5), Op2( 0), func_read },

Which results in:
| Internal error: Oops - Undefined instruction: 0000000002000000 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc7-00559-gd89c186d50b2 #14616
| Hardware name: linux,dummy-virt (DT)
| pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : test_has_mpam+0x18/0x30
| lr : test_has_mpam+0x10/0x30
| sp : ffff80008000bd90
...
| Call trace:
|  test_has_mpam+0x18/0x30
|  update_cpu_capabilities+0x7c/0x11c
|  setup_cpu_features+0x14/0xd8
|  smp_cpus_done+0x24/0xb8
|  smp_init+0x7c/0x8c
|  kernel_init_freeable+0xf8/0x280
|  kernel_init+0x24/0x1e0
|  ret_from_fork+0x10/0x20
| Code: 910003fd 97ffffde 72001c0 5400008 (d538a500)
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
| ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Add the support to enable the traps, and handle the three guest accessible
registers as RAZ/WI. This allows guests to keep the invariant id-register
value, while advertising that MPAM isn't really supported.

With MPAM v1.0 we can trap the MPAMIDR_EL1 register only if
ARM64_HAS_MPAM_HCR, with v1.1 an additional MPAM2_EL2.TIDR bit traps
MPAMIDR_EL1 on platforms that don't have MPAMHCR_EL2. Enable one of
these if either is supported. If neither is supported, the guest can
discover that the CPU has MPAM support, and how many PARTID etc the
host has ... but it can't influence anything, so its harmless.

Full support for the feature would only expose MPAM to the guest
if a psuedo-device has been created to describe the virt->phys partid
mapping the VMM expects. This will depend on ARM64_HAS_MPAM_HCR.

Fixes: 011e5f5 ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register")
CC: Anshuman Khandual <[email protected]>
Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Zeng Heng <[email protected]>

Link: https://gitee.com/openeuler/kernel/commit/c55eb641d53ac3d0a7ab9f1d1fae8a6460fc4f98
[Kexy: Dropped changes in arch/arm64/include/asm/kvm_arm.h,
 arch/arm64/kernel/image-vars.h,
 arch/arm64/kvm/hyp/include/hyp/switch.h, and arch/arm64/kvm/sys_regs.c]
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
…ocation"

When this change was introduced between v6.10.4 and v6.10.5, the Broadcom
Tigon3 Ethernet interface (tg3) found on Apple MacBook Pro (15'',
Mid 2010) would throw many rcu stall errors during boot up, causing
peripherals such as the wireless card to misbehave.

[   24.153855] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 2-.... } 21 jiffies s: 973 root: 0x4/.
[   24.166938] rcu: blocking rcu_node structures (internal RCU debug):
[   24.177800] Sending NMI from CPU 3 to CPUs 2:
[   24.183113] NMI backtrace for cpu 2
[   24.183119] CPU: 2 PID: 1049 Comm: NetworkManager Not tainted 6.10.5-aosc-main #1
[   24.183123] Hardware name: Apple Inc. MacBookPro6,2/Mac-F22586C8, BIOS    MBP61.88Z.005D.B00.1804100943 04/10/18
[   24.183125] RIP: 0010:__this_module+0x2d3d1/0x4f310 [tg3]
[   24.183135] Code: c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 89 f6 48 03 77 30 8b 06 <31> f6 31 ff c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90
[   24.183138] RSP: 0018:ffffbf1a011d75e8 EFLAGS: 00000082
[   24.183141] RAX: 0000000000000000 RBX: ffffa04ec78f8a00 RCX: 0000000000000000
[   24.183143] RDX: 0000000000000000 RSI: ffffbf1a00fb007c RDI: ffffa04ec78f8a00
[   24.183145] RBP: 0000000000000b50 R08: 0000000000000000 R09: 0000000000000000
[   24.183147] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000216
[   24.183148] R13: ffffbf1a011d7624 R14: ffffa04ec78f8a08 R15: ffffa04ec78f8b40
[   24.183151] FS:  00007f4c524b2140(0000) GS:ffffa05007d00000(0000) knlGS:0000000000000000
[   24.183153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   24.183155] CR2: 00007f7025eae3e8 CR3: 00000001040f8000 CR4: 00000000000006f0
[   24.183157] Call Trace:
[   24.183162]  <NMI>
[   24.183167]  ? nmi_cpu_backtrace+0xbf/0x140
[   24.183175]  ? nmi_cpu_backtrace_handler+0x11/0x20
[   24.183181]  ? nmi_handle+0x61/0x160
[   24.183186]  ? default_do_nmi+0x42/0x110
[   24.183191]  ? exc_nmi+0x1bd/0x290
[   24.183194]  ? end_repeat_nmi+0xf/0x53
[   24.183203]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183207]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183210]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183213]  </NMI>
[   24.183214]  <TASK>
[   24.183215]  __this_module+0x31828/0x4f310 [tg3]
[   24.183218]  ? __this_module+0x2d390/0x4f310 [tg3]
[   24.183221]  __this_module+0x398e6/0x4f310 [tg3]
[   24.183225]  __this_module+0x3baf8/0x4f310 [tg3]
[   24.183229]  __this_module+0x4733f/0x4f310 [tg3]
[   24.183233]  ? _raw_spin_unlock_irqrestore+0x25/0x70
[   24.183237]  ? __this_module+0x398e6/0x4f310 [tg3]
[   24.183241]  __this_module+0x4b943/0x4f310 [tg3]
[   24.183244]  ? delay_tsc+0x89/0xf0
[   24.183249]  ? preempt_count_sub+0x51/0x60
[   24.183254]  __this_module+0x4be4b/0x4f310 [tg3]
[   24.183258]  __dev_open+0x103/0x1c0
[   24.183265]  __dev_change_flags+0x1bd/0x230
[   24.183269]  ? rtnl_getlink+0x362/0x400
[   24.183276]  dev_change_flags+0x26/0x70
[   24.183280]  do_setlink+0xe16/0x11f0
[   24.183286]  ? __nla_validate_parse+0x61/0xd40
[   24.183295]  __rtnl_newlink+0x63d/0x9f0
[   24.183301]  ? kmem_cache_alloc_node_noprof+0x12b/0x360
[   24.183308]  ? kmalloc_trace_noprof+0x11e/0x350
[   24.183312]  ? rtnl_newlink+0x2e/0x70
[   24.183316]  rtnl_newlink+0x47/0x70
[   24.183320]  rtnetlink_rcv_msg+0x152/0x400
[   24.183324]  ? __netlink_sendskb+0x68/0x90
[   24.183329]  ? netlink_unicast+0x237/0x290
[   24.183333]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[   24.183336]  netlink_rcv_skb+0x5b/0x110
[   24.183343]  netlink_unicast+0x1a4/0x290
[   24.183347]  netlink_sendmsg+0x222/0x4a0
[   24.183350]  ? proc_get_long.constprop.0+0x116/0x210
[   24.183358]  ____sys_sendmsg+0x379/0x3b0
[   24.183363]  ? copy_msghdr_from_user+0x6d/0xb0
[   24.183368]  ___sys_sendmsg+0x86/0xe0
[   24.183372]  ? addrconf_sysctl_forward+0xf3/0x270
[   24.183378]  ? _copy_from_iter+0x8b/0x570
[   24.183384]  ? __pfx_addrconf_sysctl_forward+0x10/0x10
[   24.183388]  ? _raw_spin_unlock+0x19/0x50
[   24.183392]  ? proc_sys_call_handler+0xf3/0x2f0
[   24.183397]  ? trace_hardirqs_on+0x29/0x90
[   24.183401]  ? __fdget+0xc2/0xf0
[   24.183405]  __sys_sendmsg+0x5b/0xc0
[   24.183410]  ? syscall_trace_enter+0x110/0x1b0
[   24.183416]  do_syscall_64+0x64/0x150
[   24.183423]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

I have bisected the error to this commit. Reverting it caused no new or
perceivable issues on both the MacBook and a Zen4-based laptop. Revert
this commit as a workaround.

This reverts commit aa162aa.

Upstream report: https://bugzilla.kernel.org/show_bug.cgi?id=219390
Signed-off-by: Mingcong Bai <[email protected]>

Bug: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
There is no disagreement that we should check both ptp->is_virtual_clock
and ptp->n_vclocks to check if the ptp virtual clock is in use.

However, when we acquire ptp->n_vclocks_mux to read ptp->n_vclocks in
ptp_vclock_in_use(), we observe a recursive lock in the call trace
starting from n_vclocks_store().

============================================
WARNING: possible recursive locking detected
6.15.0-rc6 #1 Not tainted
--------------------------------------------
syz.0.1540/13807 is trying to acquire lock:
ffff888035a24868 (&ptp->n_vclocks_mux){+.+.}-{4:4}, at:
 ptp_vclock_in_use drivers/ptp/ptp_private.h:103 [inline]
ffff888035a24868 (&ptp->n_vclocks_mux){+.+.}-{4:4}, at:
 ptp_clock_unregister+0x21/0x250 drivers/ptp/ptp_clock.c:415

but task is already holding lock:
ffff888030704868 (&ptp->n_vclocks_mux){+.+.}-{4:4}, at:
 n_vclocks_store+0xf1/0x6d0 drivers/ptp/ptp_sysfs.c:215

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&ptp->n_vclocks_mux);
  lock(&ptp->n_vclocks_mux);

 *** DEADLOCK ***
....
============================================

The best way to solve this is to remove the logic that checks
ptp->n_vclocks in ptp_vclock_in_use().

The reason why this is appropriate is that any path that uses
ptp->n_vclocks must unconditionally check if ptp->n_vclocks is greater
than 0 before unregistering vclocks, and all functions are already
written this way. And in the function that uses ptp->n_vclocks, we
already get ptp->n_vclocks_mux before unregistering vclocks.

Therefore, we need to remove the redundant check for ptp->n_vclocks in
ptp_vclock_in_use() to prevent recursive locking.

Fixes: 73f3706 ("ptp: support ptp physical/virtual clocks conversion")
Signed-off-by: Jeongjun Park <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
Some architectures (e.g. arm64) only support memory hotplug operations on
a restricted set of physical addresses. This applies even when we are
faking some CXL fixed memory windows for the purposes of cxl_test.
That range can be queried with mhp_get_pluggable_range(true). Use the
minimum of that the top of that range and iomem_resource.end to establish
the 64GiB region used by cxl_test.

From thread #2 which was related to the issue in #1.

[ dj: Add CONFIG_MEMORY_HOTPLUG config check, from Alison ]

Link: https://lore.kernel.org/linux-cxl/[email protected]/ #2
Reported-by: Itaru Kitayama <[email protected]>
Closes: pmem/ndctl#278 #1
Reviewed-by: Dan Williams <[email protected]>
Tested-by: Itaru Kitayama <[email protected] <mailto:[email protected]>
Tested-by: Marc Herbert <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Dave Jiang <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
commit e3975aa upstream.

No device was set which caused serial_base_ctrl_add to crash.

 BUG: kernel NULL pointer dereference, address: 0000000000000050
 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 16 UID: 0 PID: 368 Comm: (udev-worker) Not tainted 6.12.25-amd64 #1  Debian 6.12.25-1
 RIP: 0010:serial_base_ctrl_add+0x96/0x120
 Call Trace:
  <TASK>
  serial_core_register_port+0x1a0/0x580
  ? __setup_irq+0x39c/0x660
  ? __kmalloc_cache_noprof+0x111/0x310
  jsm_uart_port_init+0xe8/0x180 [jsm]
  jsm_probe_one+0x1f4/0x410 [jsm]
  local_pci_probe+0x42/0x90
  pci_device_probe+0x22f/0x270
  really_probe+0xdb/0x340
  ? pm_runtime_barrier+0x54/0x90
  ? __pfx___driver_attach+0x10/0x10
  __driver_probe_device+0x78/0x110
  driver_probe_device+0x1f/0xa0
  __driver_attach+0xba/0x1c0
  bus_for_each_dev+0x8c/0xe0
  bus_add_driver+0x112/0x1f0
  driver_register+0x72/0xd0
  jsm_init_module+0x36/0xff0 [jsm]
  ? __pfx_jsm_init_module+0x10/0x10 [jsm]
  do_one_initcall+0x58/0x310
  do_init_module+0x60/0x230

Tested with Digi Neo PCIe 8 port card.

Fixes: 84a9582 ("serial: core: Start managing serial controllers to enable runtime PM")
Cc: stable <[email protected]>
Signed-off-by: Dustin Lundquist <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
commit e3975aa upstream.

No device was set which caused serial_base_ctrl_add to crash.

 BUG: kernel NULL pointer dereference, address: 0000000000000050
 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 16 UID: 0 PID: 368 Comm: (udev-worker) Not tainted 6.12.25-amd64 #1  Debian 6.12.25-1
 RIP: 0010:serial_base_ctrl_add+0x96/0x120
 Call Trace:
  <TASK>
  serial_core_register_port+0x1a0/0x580
  ? __setup_irq+0x39c/0x660
  ? __kmalloc_cache_noprof+0x111/0x310
  jsm_uart_port_init+0xe8/0x180 [jsm]
  jsm_probe_one+0x1f4/0x410 [jsm]
  local_pci_probe+0x42/0x90
  pci_device_probe+0x22f/0x270
  really_probe+0xdb/0x340
  ? pm_runtime_barrier+0x54/0x90
  ? __pfx___driver_attach+0x10/0x10
  __driver_probe_device+0x78/0x110
  driver_probe_device+0x1f/0xa0
  __driver_attach+0xba/0x1c0
  bus_for_each_dev+0x8c/0xe0
  bus_add_driver+0x112/0x1f0
  driver_register+0x72/0xd0
  jsm_init_module+0x36/0xff0 [jsm]
  ? __pfx_jsm_init_module+0x10/0x10 [jsm]
  do_one_initcall+0x58/0x310
  do_init_module+0x60/0x230

Tested with Digi Neo PCIe 8 port card.

Fixes: 84a9582 ("serial: core: Start managing serial controllers to enable runtime PM")
Cc: stable <[email protected]>
Signed-off-by: Dustin Lundquist <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
commit e3975aa upstream.

No device was set which caused serial_base_ctrl_add to crash.

 BUG: kernel NULL pointer dereference, address: 0000000000000050
 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 16 UID: 0 PID: 368 Comm: (udev-worker) Not tainted 6.12.25-amd64 #1  Debian 6.12.25-1
 RIP: 0010:serial_base_ctrl_add+0x96/0x120
 Call Trace:
  <TASK>
  serial_core_register_port+0x1a0/0x580
  ? __setup_irq+0x39c/0x660
  ? __kmalloc_cache_noprof+0x111/0x310
  jsm_uart_port_init+0xe8/0x180 [jsm]
  jsm_probe_one+0x1f4/0x410 [jsm]
  local_pci_probe+0x42/0x90
  pci_device_probe+0x22f/0x270
  really_probe+0xdb/0x340
  ? pm_runtime_barrier+0x54/0x90
  ? __pfx___driver_attach+0x10/0x10
  __driver_probe_device+0x78/0x110
  driver_probe_device+0x1f/0xa0
  __driver_attach+0xba/0x1c0
  bus_for_each_dev+0x8c/0xe0
  bus_add_driver+0x112/0x1f0
  driver_register+0x72/0xd0
  jsm_init_module+0x36/0xff0 [jsm]
  ? __pfx_jsm_init_module+0x10/0x10 [jsm]
  do_one_initcall+0x58/0x310
  do_init_module+0x60/0x230

Tested with Digi Neo PCIe 8 port card.

Fixes: 84a9582 ("serial: core: Start managing serial controllers to enable runtime PM")
Cc: stable <[email protected]>
Signed-off-by: Dustin Lundquist <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
The bo/ttm interfaces with kernel memory mapping from dedicated GPU
memory. It is not correct to assume that SZ_4K would suffice for page
alignment as there are a few hardware platforms that commonly uses non-4K
pages - for instance, currently, Loongson 3A5000/6000 devices (of the
LoongArch architecture) commonly uses 16K kernel pages.

Per my testing Intel Xe/Arc families of GPUs works on at least
Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable
BAR" were enabled in the EFI firmware settings. I tested this patch series
on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.

Without this fix, the kernel will hang at a kernel BUG():

[    7.425445] ------------[ cut here ]------------
[    7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181!
[    7.435330] Oops - BUG[#1]:
[    7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G            E      6.13.3-aosc-main-00336-g60829239b300-dirty #3
[    7.449511] Tainted: [E]=UNSIGNED_MODULE
[    7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[    7.467144] Workqueue: events work_for_cpu_fn
[    7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960
[    7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000
[    7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001
[    7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000
[    7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[    7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000
[    7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001
[    7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000
[    7.537855]    ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe]
[    7.544893]   ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0
[    7.551639]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[    7.557785]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[    7.562111]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[    7.566870]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[    7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[    7.577163]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[    7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E)
[    7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707)
[    7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000
[    7.629563]         0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877
[    7.637519]         000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000
[    7.645475]         ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000
[    7.653431]         0000000000000001 ffff800002533660 0000000000022022 9000000000234470
[    7.661386]         90000001010cba28 0000000000001000 0000000000000000 000000000005c300
[    7.669342]         900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000
[    7.677298]         ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14
[    7.685254]         0000000000022022 0000000000000000 900000000209c000 90000000010589e0
[    7.693209]         90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000
[    7.701165]         ...
[    7.703588] Call Trace:
[    7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0
[    7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe]
[    7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe]
[    7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe]
[    7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe]
[    7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe]
[    7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe]
[    7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe]
[    7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe]
[    7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe]
[    7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe]
[    7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe]
[    7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4
[    7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40
[    7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458
[    7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc
[    7.805591] [<90000000002bacac>] kthread+0x114/0x138
[    7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4
[    7.816489]
[    7.817961] Code: 4c000020  29c3e2f9  53ff93ff <002a0001> 0015002c  03400000  02ff8063  29c04077  001500f7
[    7.827651]
[    7.829140] ---[ end trace 0000000000000000 ]---

Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to
`drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call
before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to
fix the above error.

Cc: [email protected]
Fixes: 4e03b58 ("drm/xe/uapi: Reject bo creation of unaligned size")
Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
…to the MPAM registers

maillist inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8T2RT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.7-rc2

---------------------------

commit 011e5f5 ("arm64/cpufeature: Add remaining feature bits in
ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests,
but didn't add trap handling.

If you are unlucky, this results in an MPAM aware guest being delivered
an undef during boot. The host prints:
| kvm [97]: Unsupported guest sys_reg access at: ffff800080024c64 [00000005]
| { Op0( 3), Op1( 0), CRn(10), CRm( 5), Op2( 0), func_read },

Which results in:
| Internal error: Oops - Undefined instruction: 0000000002000000 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc7-00559-gd89c186d50b2 #14616
| Hardware name: linux,dummy-virt (DT)
| pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : test_has_mpam+0x18/0x30
| lr : test_has_mpam+0x10/0x30
| sp : ffff80008000bd90
...
| Call trace:
|  test_has_mpam+0x18/0x30
|  update_cpu_capabilities+0x7c/0x11c
|  setup_cpu_features+0x14/0xd8
|  smp_cpus_done+0x24/0xb8
|  smp_init+0x7c/0x8c
|  kernel_init_freeable+0xf8/0x280
|  kernel_init+0x24/0x1e0
|  ret_from_fork+0x10/0x20
| Code: 910003fd 97ffffde 72001c0 5400008 (d538a500)
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
| ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Add the support to enable the traps, and handle the three guest accessible
registers as RAZ/WI. This allows guests to keep the invariant id-register
value, while advertising that MPAM isn't really supported.

With MPAM v1.0 we can trap the MPAMIDR_EL1 register only if
ARM64_HAS_MPAM_HCR, with v1.1 an additional MPAM2_EL2.TIDR bit traps
MPAMIDR_EL1 on platforms that don't have MPAMHCR_EL2. Enable one of
these if either is supported. If neither is supported, the guest can
discover that the CPU has MPAM support, and how many PARTID etc the
host has ... but it can't influence anything, so its harmless.

Full support for the feature would only expose MPAM to the guest
if a psuedo-device has been created to describe the virt->phys partid
mapping the VMM expects. This will depend on ARM64_HAS_MPAM_HCR.

Fixes: 011e5f5 ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register")
CC: Anshuman Khandual <[email protected]>
Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Zeng Heng <[email protected]>

Link: https://gitee.com/openeuler/kernel/commit/c55eb641d53ac3d0a7ab9f1d1fae8a6460fc4f98
[Kexy: Dropped changes in arch/arm64/include/asm/kvm_arm.h,
 arch/arm64/kernel/image-vars.h,
 arch/arm64/kvm/hyp/include/hyp/switch.h, and arch/arm64/kvm/sys_regs.c]
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
…ocation"

When this change was introduced between v6.10.4 and v6.10.5, the Broadcom
Tigon3 Ethernet interface (tg3) found on Apple MacBook Pro (15'',
Mid 2010) would throw many rcu stall errors during boot up, causing
peripherals such as the wireless card to misbehave.

[   24.153855] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 2-.... } 21 jiffies s: 973 root: 0x4/.
[   24.166938] rcu: blocking rcu_node structures (internal RCU debug):
[   24.177800] Sending NMI from CPU 3 to CPUs 2:
[   24.183113] NMI backtrace for cpu 2
[   24.183119] CPU: 2 PID: 1049 Comm: NetworkManager Not tainted 6.10.5-aosc-main #1
[   24.183123] Hardware name: Apple Inc. MacBookPro6,2/Mac-F22586C8, BIOS    MBP61.88Z.005D.B00.1804100943 04/10/18
[   24.183125] RIP: 0010:__this_module+0x2d3d1/0x4f310 [tg3]
[   24.183135] Code: c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 89 f6 48 03 77 30 8b 06 <31> f6 31 ff c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90
[   24.183138] RSP: 0018:ffffbf1a011d75e8 EFLAGS: 00000082
[   24.183141] RAX: 0000000000000000 RBX: ffffa04ec78f8a00 RCX: 0000000000000000
[   24.183143] RDX: 0000000000000000 RSI: ffffbf1a00fb007c RDI: ffffa04ec78f8a00
[   24.183145] RBP: 0000000000000b50 R08: 0000000000000000 R09: 0000000000000000
[   24.183147] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000216
[   24.183148] R13: ffffbf1a011d7624 R14: ffffa04ec78f8a08 R15: ffffa04ec78f8b40
[   24.183151] FS:  00007f4c524b2140(0000) GS:ffffa05007d00000(0000) knlGS:0000000000000000
[   24.183153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   24.183155] CR2: 00007f7025eae3e8 CR3: 00000001040f8000 CR4: 00000000000006f0
[   24.183157] Call Trace:
[   24.183162]  <NMI>
[   24.183167]  ? nmi_cpu_backtrace+0xbf/0x140
[   24.183175]  ? nmi_cpu_backtrace_handler+0x11/0x20
[   24.183181]  ? nmi_handle+0x61/0x160
[   24.183186]  ? default_do_nmi+0x42/0x110
[   24.183191]  ? exc_nmi+0x1bd/0x290
[   24.183194]  ? end_repeat_nmi+0xf/0x53
[   24.183203]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183207]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183210]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183213]  </NMI>
[   24.183214]  <TASK>
[   24.183215]  __this_module+0x31828/0x4f310 [tg3]
[   24.183218]  ? __this_module+0x2d390/0x4f310 [tg3]
[   24.183221]  __this_module+0x398e6/0x4f310 [tg3]
[   24.183225]  __this_module+0x3baf8/0x4f310 [tg3]
[   24.183229]  __this_module+0x4733f/0x4f310 [tg3]
[   24.183233]  ? _raw_spin_unlock_irqrestore+0x25/0x70
[   24.183237]  ? __this_module+0x398e6/0x4f310 [tg3]
[   24.183241]  __this_module+0x4b943/0x4f310 [tg3]
[   24.183244]  ? delay_tsc+0x89/0xf0
[   24.183249]  ? preempt_count_sub+0x51/0x60
[   24.183254]  __this_module+0x4be4b/0x4f310 [tg3]
[   24.183258]  __dev_open+0x103/0x1c0
[   24.183265]  __dev_change_flags+0x1bd/0x230
[   24.183269]  ? rtnl_getlink+0x362/0x400
[   24.183276]  dev_change_flags+0x26/0x70
[   24.183280]  do_setlink+0xe16/0x11f0
[   24.183286]  ? __nla_validate_parse+0x61/0xd40
[   24.183295]  __rtnl_newlink+0x63d/0x9f0
[   24.183301]  ? kmem_cache_alloc_node_noprof+0x12b/0x360
[   24.183308]  ? kmalloc_trace_noprof+0x11e/0x350
[   24.183312]  ? rtnl_newlink+0x2e/0x70
[   24.183316]  rtnl_newlink+0x47/0x70
[   24.183320]  rtnetlink_rcv_msg+0x152/0x400
[   24.183324]  ? __netlink_sendskb+0x68/0x90
[   24.183329]  ? netlink_unicast+0x237/0x290
[   24.183333]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[   24.183336]  netlink_rcv_skb+0x5b/0x110
[   24.183343]  netlink_unicast+0x1a4/0x290
[   24.183347]  netlink_sendmsg+0x222/0x4a0
[   24.183350]  ? proc_get_long.constprop.0+0x116/0x210
[   24.183358]  ____sys_sendmsg+0x379/0x3b0
[   24.183363]  ? copy_msghdr_from_user+0x6d/0xb0
[   24.183368]  ___sys_sendmsg+0x86/0xe0
[   24.183372]  ? addrconf_sysctl_forward+0xf3/0x270
[   24.183378]  ? _copy_from_iter+0x8b/0x570
[   24.183384]  ? __pfx_addrconf_sysctl_forward+0x10/0x10
[   24.183388]  ? _raw_spin_unlock+0x19/0x50
[   24.183392]  ? proc_sys_call_handler+0xf3/0x2f0
[   24.183397]  ? trace_hardirqs_on+0x29/0x90
[   24.183401]  ? __fdget+0xc2/0xf0
[   24.183405]  __sys_sendmsg+0x5b/0xc0
[   24.183410]  ? syscall_trace_enter+0x110/0x1b0
[   24.183416]  do_syscall_64+0x64/0x150
[   24.183423]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

I have bisected the error to this commit. Reverting it caused no new or
perceivable issues on both the MacBook and a Zen4-based laptop. Revert
this commit as a workaround.

This reverts commit aa162aa.

Upstream report: https://bugzilla.kernel.org/show_bug.cgi?id=219390
Signed-off-by: Mingcong Bai <[email protected]>

Bug: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
The bo/ttm interfaces with kernel memory mapping from dedicated GPU
memory. It is not correct to assume that SZ_4K would suffice for page
alignment as there are a few hardware platforms that commonly uses non-4K
pages - for instance, currently, Loongson 3A5000/6000 devices (of the
LoongArch architecture) commonly uses 16K kernel pages.

Per my testing Intel Xe/Arc families of GPUs works on at least
Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable
BAR" were enabled in the EFI firmware settings. I tested this patch series
on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.

Without this fix, the kernel will hang at a kernel BUG():

[    7.425445] ------------[ cut here ]------------
[    7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181!
[    7.435330] Oops - BUG[#1]:
[    7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G            E      6.13.3-aosc-main-00336-g60829239b300-dirty #3
[    7.449511] Tainted: [E]=UNSIGNED_MODULE
[    7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[    7.467144] Workqueue: events work_for_cpu_fn
[    7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960
[    7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000
[    7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001
[    7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000
[    7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[    7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000
[    7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001
[    7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000
[    7.537855]    ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe]
[    7.544893]   ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0
[    7.551639]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[    7.557785]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[    7.562111]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[    7.566870]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[    7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[    7.577163]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[    7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E)
[    7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707)
[    7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000
[    7.629563]         0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877
[    7.637519]         000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000
[    7.645475]         ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000
[    7.653431]         0000000000000001 ffff800002533660 0000000000022022 9000000000234470
[    7.661386]         90000001010cba28 0000000000001000 0000000000000000 000000000005c300
[    7.669342]         900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000
[    7.677298]         ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14
[    7.685254]         0000000000022022 0000000000000000 900000000209c000 90000000010589e0
[    7.693209]         90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000
[    7.701165]         ...
[    7.703588] Call Trace:
[    7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0
[    7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe]
[    7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe]
[    7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe]
[    7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe]
[    7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe]
[    7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe]
[    7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe]
[    7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe]
[    7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe]
[    7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe]
[    7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe]
[    7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4
[    7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40
[    7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458
[    7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc
[    7.805591] [<90000000002bacac>] kthread+0x114/0x138
[    7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4
[    7.816489]
[    7.817961] Code: 4c000020  29c3e2f9  53ff93ff <002a0001> 0015002c  03400000  02ff8063  29c04077  001500f7
[    7.827651]
[    7.829140] ---[ end trace 0000000000000000 ]---

Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to
`drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call
before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to
fix the above error.

Cc: [email protected]
Fixes: 4e03b58 ("drm/xe/uapi: Reject bo creation of unaligned size")
Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
…to the MPAM registers

maillist inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8T2RT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.7-rc2

---------------------------

commit 011e5f5 ("arm64/cpufeature: Add remaining feature bits in
ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests,
but didn't add trap handling.

If you are unlucky, this results in an MPAM aware guest being delivered
an undef during boot. The host prints:
| kvm [97]: Unsupported guest sys_reg access at: ffff800080024c64 [00000005]
| { Op0( 3), Op1( 0), CRn(10), CRm( 5), Op2( 0), func_read },

Which results in:
| Internal error: Oops - Undefined instruction: 0000000002000000 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc7-00559-gd89c186d50b2 #14616
| Hardware name: linux,dummy-virt (DT)
| pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : test_has_mpam+0x18/0x30
| lr : test_has_mpam+0x10/0x30
| sp : ffff80008000bd90
...
| Call trace:
|  test_has_mpam+0x18/0x30
|  update_cpu_capabilities+0x7c/0x11c
|  setup_cpu_features+0x14/0xd8
|  smp_cpus_done+0x24/0xb8
|  smp_init+0x7c/0x8c
|  kernel_init_freeable+0xf8/0x280
|  kernel_init+0x24/0x1e0
|  ret_from_fork+0x10/0x20
| Code: 910003fd 97ffffde 72001c0 5400008 (d538a500)
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
| ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Add the support to enable the traps, and handle the three guest accessible
registers as RAZ/WI. This allows guests to keep the invariant id-register
value, while advertising that MPAM isn't really supported.

With MPAM v1.0 we can trap the MPAMIDR_EL1 register only if
ARM64_HAS_MPAM_HCR, with v1.1 an additional MPAM2_EL2.TIDR bit traps
MPAMIDR_EL1 on platforms that don't have MPAMHCR_EL2. Enable one of
these if either is supported. If neither is supported, the guest can
discover that the CPU has MPAM support, and how many PARTID etc the
host has ... but it can't influence anything, so its harmless.

Full support for the feature would only expose MPAM to the guest
if a psuedo-device has been created to describe the virt->phys partid
mapping the VMM expects. This will depend on ARM64_HAS_MPAM_HCR.

Fixes: 011e5f5 ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register")
CC: Anshuman Khandual <[email protected]>
Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Zeng Heng <[email protected]>

Link: https://gitee.com/openeuler/kernel/commit/c55eb641d53ac3d0a7ab9f1d1fae8a6460fc4f98
[Kexy: Dropped changes in arch/arm64/include/asm/kvm_arm.h,
 arch/arm64/kernel/image-vars.h,
 arch/arm64/kvm/hyp/include/hyp/switch.h, and arch/arm64/kvm/sys_regs.c]
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
…ocation"

When this change was introduced between v6.10.4 and v6.10.5, the Broadcom
Tigon3 Ethernet interface (tg3) found on Apple MacBook Pro (15'',
Mid 2010) would throw many rcu stall errors during boot up, causing
peripherals such as the wireless card to misbehave.

[   24.153855] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 2-.... } 21 jiffies s: 973 root: 0x4/.
[   24.166938] rcu: blocking rcu_node structures (internal RCU debug):
[   24.177800] Sending NMI from CPU 3 to CPUs 2:
[   24.183113] NMI backtrace for cpu 2
[   24.183119] CPU: 2 PID: 1049 Comm: NetworkManager Not tainted 6.10.5-aosc-main #1
[   24.183123] Hardware name: Apple Inc. MacBookPro6,2/Mac-F22586C8, BIOS    MBP61.88Z.005D.B00.1804100943 04/10/18
[   24.183125] RIP: 0010:__this_module+0x2d3d1/0x4f310 [tg3]
[   24.183135] Code: c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 89 f6 48 03 77 30 8b 06 <31> f6 31 ff c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90
[   24.183138] RSP: 0018:ffffbf1a011d75e8 EFLAGS: 00000082
[   24.183141] RAX: 0000000000000000 RBX: ffffa04ec78f8a00 RCX: 0000000000000000
[   24.183143] RDX: 0000000000000000 RSI: ffffbf1a00fb007c RDI: ffffa04ec78f8a00
[   24.183145] RBP: 0000000000000b50 R08: 0000000000000000 R09: 0000000000000000
[   24.183147] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000216
[   24.183148] R13: ffffbf1a011d7624 R14: ffffa04ec78f8a08 R15: ffffa04ec78f8b40
[   24.183151] FS:  00007f4c524b2140(0000) GS:ffffa05007d00000(0000) knlGS:0000000000000000
[   24.183153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   24.183155] CR2: 00007f7025eae3e8 CR3: 00000001040f8000 CR4: 00000000000006f0
[   24.183157] Call Trace:
[   24.183162]  <NMI>
[   24.183167]  ? nmi_cpu_backtrace+0xbf/0x140
[   24.183175]  ? nmi_cpu_backtrace_handler+0x11/0x20
[   24.183181]  ? nmi_handle+0x61/0x160
[   24.183186]  ? default_do_nmi+0x42/0x110
[   24.183191]  ? exc_nmi+0x1bd/0x290
[   24.183194]  ? end_repeat_nmi+0xf/0x53
[   24.183203]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183207]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183210]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183213]  </NMI>
[   24.183214]  <TASK>
[   24.183215]  __this_module+0x31828/0x4f310 [tg3]
[   24.183218]  ? __this_module+0x2d390/0x4f310 [tg3]
[   24.183221]  __this_module+0x398e6/0x4f310 [tg3]
[   24.183225]  __this_module+0x3baf8/0x4f310 [tg3]
[   24.183229]  __this_module+0x4733f/0x4f310 [tg3]
[   24.183233]  ? _raw_spin_unlock_irqrestore+0x25/0x70
[   24.183237]  ? __this_module+0x398e6/0x4f310 [tg3]
[   24.183241]  __this_module+0x4b943/0x4f310 [tg3]
[   24.183244]  ? delay_tsc+0x89/0xf0
[   24.183249]  ? preempt_count_sub+0x51/0x60
[   24.183254]  __this_module+0x4be4b/0x4f310 [tg3]
[   24.183258]  __dev_open+0x103/0x1c0
[   24.183265]  __dev_change_flags+0x1bd/0x230
[   24.183269]  ? rtnl_getlink+0x362/0x400
[   24.183276]  dev_change_flags+0x26/0x70
[   24.183280]  do_setlink+0xe16/0x11f0
[   24.183286]  ? __nla_validate_parse+0x61/0xd40
[   24.183295]  __rtnl_newlink+0x63d/0x9f0
[   24.183301]  ? kmem_cache_alloc_node_noprof+0x12b/0x360
[   24.183308]  ? kmalloc_trace_noprof+0x11e/0x350
[   24.183312]  ? rtnl_newlink+0x2e/0x70
[   24.183316]  rtnl_newlink+0x47/0x70
[   24.183320]  rtnetlink_rcv_msg+0x152/0x400
[   24.183324]  ? __netlink_sendskb+0x68/0x90
[   24.183329]  ? netlink_unicast+0x237/0x290
[   24.183333]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[   24.183336]  netlink_rcv_skb+0x5b/0x110
[   24.183343]  netlink_unicast+0x1a4/0x290
[   24.183347]  netlink_sendmsg+0x222/0x4a0
[   24.183350]  ? proc_get_long.constprop.0+0x116/0x210
[   24.183358]  ____sys_sendmsg+0x379/0x3b0
[   24.183363]  ? copy_msghdr_from_user+0x6d/0xb0
[   24.183368]  ___sys_sendmsg+0x86/0xe0
[   24.183372]  ? addrconf_sysctl_forward+0xf3/0x270
[   24.183378]  ? _copy_from_iter+0x8b/0x570
[   24.183384]  ? __pfx_addrconf_sysctl_forward+0x10/0x10
[   24.183388]  ? _raw_spin_unlock+0x19/0x50
[   24.183392]  ? proc_sys_call_handler+0xf3/0x2f0
[   24.183397]  ? trace_hardirqs_on+0x29/0x90
[   24.183401]  ? __fdget+0xc2/0xf0
[   24.183405]  __sys_sendmsg+0x5b/0xc0
[   24.183410]  ? syscall_trace_enter+0x110/0x1b0
[   24.183416]  do_syscall_64+0x64/0x150
[   24.183423]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

I have bisected the error to this commit. Reverting it caused no new or
perceivable issues on both the MacBook and a Zen4-based laptop. Revert
this commit as a workaround.

This reverts commit aa162aa.

Upstream report: https://bugzilla.kernel.org/show_bug.cgi?id=219390
Signed-off-by: Mingcong Bai <[email protected]>

Bug: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
The bo/ttm interfaces with kernel memory mapping from dedicated GPU
memory. It is not correct to assume that SZ_4K would suffice for page
alignment as there are a few hardware platforms that commonly uses non-
4KiB pages - for instance, 16KiB is the most commonly used kernel page
size used on Loongson devices (of the LoongArch architecture).

Per our testing, Intel Xe/Alchemist/Battlemage families of GPUs works on
Loongson platforms so long as "Above 4G Decoding" was enabled and
"Resizable BAR" was set to auto in the UEFI firmware settings.

Without this fix, the kernel will hang at a kernel BUG():

[    7.425445] ------------[ cut here ]------------
[    7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181!
[    7.435330] Oops - BUG[#1]:
[    7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G            E      6.13.3-aosc-main-00336-g60829239b300-dirty #3
[    7.449511] Tainted: [E]=UNSIGNED_MODULE
[    7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[    7.467144] Workqueue: events work_for_cpu_fn
[    7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960
[    7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000
[    7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001
[    7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000
[    7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[    7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000
[    7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001
[    7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000
[    7.537855]    ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe]
[    7.544893]   ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0
[    7.551639]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[    7.557785]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[    7.562111]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[    7.566870]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[    7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[    7.577163]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[    7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E)
[    7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707)
[    7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000
[    7.629563]         0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877
[    7.637519]         000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000
[    7.645475]         ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000
[    7.653431]         0000000000000001 ffff800002533660 0000000000022022 9000000000234470
[    7.661386]         90000001010cba28 0000000000001000 0000000000000000 000000000005c300
[    7.669342]         900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000
[    7.677298]         ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14
[    7.685254]         0000000000022022 0000000000000000 900000000209c000 90000000010589e0
[    7.693209]         90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000
[    7.701165]         ...
[    7.703588] Call Trace:
[    7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0
[    7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe]
[    7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe]
[    7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe]
[    7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe]
[    7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe]
[    7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe]
[    7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe]
[    7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe]
[    7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe]
[    7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe]
[    7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe]
[    7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4
[    7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40
[    7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458
[    7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc
[    7.805591] [<90000000002bacac>] kthread+0x114/0x138
[    7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4
[    7.816489]
[    7.817961] Code: 4c000020  29c3e2f9  53ff93ff <002a0001> 0015002c  03400000  02ff8063  29c04077  001500f7
[    7.827651]
[    7.829140] ---[ end trace 0000000000000000 ]---

Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to
`drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call
before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to
fix the above error.

Cc: [email protected]
Fixes: 4e03b58 ("drm/xe/uapi: Reject bo creation of unaligned size")
Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Wenbin Fang <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Jianfeng Liu <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Link: https://t.me/c/1109254909/768552
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
…to the MPAM registers

maillist inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8T2RT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.7-rc2

---------------------------

commit 011e5f5 ("arm64/cpufeature: Add remaining feature bits in
ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests,
but didn't add trap handling.

If you are unlucky, this results in an MPAM aware guest being delivered
an undef during boot. The host prints:
| kvm [97]: Unsupported guest sys_reg access at: ffff800080024c64 [00000005]
| { Op0( 3), Op1( 0), CRn(10), CRm( 5), Op2( 0), func_read },

Which results in:
| Internal error: Oops - Undefined instruction: 0000000002000000 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc7-00559-gd89c186d50b2 #14616
| Hardware name: linux,dummy-virt (DT)
| pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : test_has_mpam+0x18/0x30
| lr : test_has_mpam+0x10/0x30
| sp : ffff80008000bd90
...
| Call trace:
|  test_has_mpam+0x18/0x30
|  update_cpu_capabilities+0x7c/0x11c
|  setup_cpu_features+0x14/0xd8
|  smp_cpus_done+0x24/0xb8
|  smp_init+0x7c/0x8c
|  kernel_init_freeable+0xf8/0x280
|  kernel_init+0x24/0x1e0
|  ret_from_fork+0x10/0x20
| Code: 910003fd 97ffffde 72001c0 5400008 (d538a500)
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
| ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Add the support to enable the traps, and handle the three guest accessible
registers as RAZ/WI. This allows guests to keep the invariant id-register
value, while advertising that MPAM isn't really supported.

With MPAM v1.0 we can trap the MPAMIDR_EL1 register only if
ARM64_HAS_MPAM_HCR, with v1.1 an additional MPAM2_EL2.TIDR bit traps
MPAMIDR_EL1 on platforms that don't have MPAMHCR_EL2. Enable one of
these if either is supported. If neither is supported, the guest can
discover that the CPU has MPAM support, and how many PARTID etc the
host has ... but it can't influence anything, so its harmless.

Full support for the feature would only expose MPAM to the guest
if a psuedo-device has been created to describe the virt->phys partid
mapping the VMM expects. This will depend on ARM64_HAS_MPAM_HCR.

Fixes: 011e5f5 ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register")
CC: Anshuman Khandual <[email protected]>
Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Zeng Heng <[email protected]>

Link: https://gitee.com/openeuler/kernel/commit/c55eb641d53ac3d0a7ab9f1d1fae8a6460fc4f98
[Kexy: Dropped changes in arch/arm64/include/asm/kvm_arm.h,
 arch/arm64/kernel/image-vars.h,
 arch/arm64/kvm/hyp/include/hyp/switch.h, and arch/arm64/kvm/sys_regs.c]
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 10, 2025
…ocation"

When this change was introduced between v6.10.4 and v6.10.5, the Broadcom
Tigon3 Ethernet interface (tg3) found on Apple MacBook Pro (15'',
Mid 2010) would throw many rcu stall errors during boot up, causing
peripherals such as the wireless card to misbehave.

[   24.153855] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 2-.... } 21 jiffies s: 973 root: 0x4/.
[   24.166938] rcu: blocking rcu_node structures (internal RCU debug):
[   24.177800] Sending NMI from CPU 3 to CPUs 2:
[   24.183113] NMI backtrace for cpu 2
[   24.183119] CPU: 2 PID: 1049 Comm: NetworkManager Not tainted 6.10.5-aosc-main #1
[   24.183123] Hardware name: Apple Inc. MacBookPro6,2/Mac-F22586C8, BIOS    MBP61.88Z.005D.B00.1804100943 04/10/18
[   24.183125] RIP: 0010:__this_module+0x2d3d1/0x4f310 [tg3]
[   24.183135] Code: c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 89 f6 48 03 77 30 8b 06 <31> f6 31 ff c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90
[   24.183138] RSP: 0018:ffffbf1a011d75e8 EFLAGS: 00000082
[   24.183141] RAX: 0000000000000000 RBX: ffffa04ec78f8a00 RCX: 0000000000000000
[   24.183143] RDX: 0000000000000000 RSI: ffffbf1a00fb007c RDI: ffffa04ec78f8a00
[   24.183145] RBP: 0000000000000b50 R08: 0000000000000000 R09: 0000000000000000
[   24.183147] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000216
[   24.183148] R13: ffffbf1a011d7624 R14: ffffa04ec78f8a08 R15: ffffa04ec78f8b40
[   24.183151] FS:  00007f4c524b2140(0000) GS:ffffa05007d00000(0000) knlGS:0000000000000000
[   24.183153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   24.183155] CR2: 00007f7025eae3e8 CR3: 00000001040f8000 CR4: 00000000000006f0
[   24.183157] Call Trace:
[   24.183162]  <NMI>
[   24.183167]  ? nmi_cpu_backtrace+0xbf/0x140
[   24.183175]  ? nmi_cpu_backtrace_handler+0x11/0x20
[   24.183181]  ? nmi_handle+0x61/0x160
[   24.183186]  ? default_do_nmi+0x42/0x110
[   24.183191]  ? exc_nmi+0x1bd/0x290
[   24.183194]  ? end_repeat_nmi+0xf/0x53
[   24.183203]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183207]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183210]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183213]  </NMI>
[   24.183214]  <TASK>
[   24.183215]  __this_module+0x31828/0x4f310 [tg3]
[   24.183218]  ? __this_module+0x2d390/0x4f310 [tg3]
[   24.183221]  __this_module+0x398e6/0x4f310 [tg3]
[   24.183225]  __this_module+0x3baf8/0x4f310 [tg3]
[   24.183229]  __this_module+0x4733f/0x4f310 [tg3]
[   24.183233]  ? _raw_spin_unlock_irqrestore+0x25/0x70
[   24.183237]  ? __this_module+0x398e6/0x4f310 [tg3]
[   24.183241]  __this_module+0x4b943/0x4f310 [tg3]
[   24.183244]  ? delay_tsc+0x89/0xf0
[   24.183249]  ? preempt_count_sub+0x51/0x60
[   24.183254]  __this_module+0x4be4b/0x4f310 [tg3]
[   24.183258]  __dev_open+0x103/0x1c0
[   24.183265]  __dev_change_flags+0x1bd/0x230
[   24.183269]  ? rtnl_getlink+0x362/0x400
[   24.183276]  dev_change_flags+0x26/0x70
[   24.183280]  do_setlink+0xe16/0x11f0
[   24.183286]  ? __nla_validate_parse+0x61/0xd40
[   24.183295]  __rtnl_newlink+0x63d/0x9f0
[   24.183301]  ? kmem_cache_alloc_node_noprof+0x12b/0x360
[   24.183308]  ? kmalloc_trace_noprof+0x11e/0x350
[   24.183312]  ? rtnl_newlink+0x2e/0x70
[   24.183316]  rtnl_newlink+0x47/0x70
[   24.183320]  rtnetlink_rcv_msg+0x152/0x400
[   24.183324]  ? __netlink_sendskb+0x68/0x90
[   24.183329]  ? netlink_unicast+0x237/0x290
[   24.183333]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[   24.183336]  netlink_rcv_skb+0x5b/0x110
[   24.183343]  netlink_unicast+0x1a4/0x290
[   24.183347]  netlink_sendmsg+0x222/0x4a0
[   24.183350]  ? proc_get_long.constprop.0+0x116/0x210
[   24.183358]  ____sys_sendmsg+0x379/0x3b0
[   24.183363]  ? copy_msghdr_from_user+0x6d/0xb0
[   24.183368]  ___sys_sendmsg+0x86/0xe0
[   24.183372]  ? addrconf_sysctl_forward+0xf3/0x270
[   24.183378]  ? _copy_from_iter+0x8b/0x570
[   24.183384]  ? __pfx_addrconf_sysctl_forward+0x10/0x10
[   24.183388]  ? _raw_spin_unlock+0x19/0x50
[   24.183392]  ? proc_sys_call_handler+0xf3/0x2f0
[   24.183397]  ? trace_hardirqs_on+0x29/0x90
[   24.183401]  ? __fdget+0xc2/0xf0
[   24.183405]  __sys_sendmsg+0x5b/0xc0
[   24.183410]  ? syscall_trace_enter+0x110/0x1b0
[   24.183416]  do_syscall_64+0x64/0x150
[   24.183423]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

I have bisected the error to this commit. Reverting it caused no new or
perceivable issues on both the MacBook and a Zen4-based laptop. Revert
this commit as a workaround.

This reverts commit aa162aa.

Upstream report: https://bugzilla.kernel.org/show_bug.cgi?id=219390
Signed-off-by: Mingcong Bai <[email protected]>

Bug: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 12, 2025
When building the free space tree with the block group tree feature
enabled, we can hit an assertion failure like this:

  BTRFS info (device loop0 state M): rebuilding free space tree
  assertion failed: ret == 0, in fs/btrfs/free-space-tree.c:1102
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/free-space-tree.c:1102!
  Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
  Modules linked in:
  CPU: 1 UID: 0 PID: 6592 Comm: syz-executor322 Not tainted 6.15.0-rc7-syzkaller-gd7fa1af5b33e #0 PREEMPT
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : populate_free_space_tree+0x514/0x518 fs/btrfs/free-space-tree.c:1102
  lr : populate_free_space_tree+0x514/0x518 fs/btrfs/free-space-tree.c:1102
  sp : ffff8000a4ce7600
  x29: ffff8000a4ce76e0 x28: ffff0000c9bc6000 x27: ffff0000ddfff3d8
  x26: ffff0000ddfff378 x25: dfff800000000000 x24: 0000000000000001
  x23: ffff8000a4ce7660 x22: ffff70001499cecc x21: ffff0000e1d8c160
  x20: ffff0000e1cb7800 x19: ffff0000e1d8c0b0 x18: 00000000ffffffff
  x17: ffff800092f39000 x16: ffff80008ad27e48 x15: ffff700011e740c0
  x14: 1ffff00011e740c0 x13: 0000000000000004 x12: ffffffffffffffff
  x11: ffff700011e740c0 x10: 0000000000ff0100 x9 : 94ef24f55d2dbc00
  x8 : 94ef24f55d2dbc00 x7 : 0000000000000001 x6 : 0000000000000001
  x5 : ffff8000a4ce6f98 x4 : ffff80008f415ba0 x3 : ffff800080548ef0
  x2 : 0000000000000000 x1 : 0000000100000000 x0 : 000000000000003e
  Call trace:
   populate_free_space_tree+0x514/0x518 fs/btrfs/free-space-tree.c:1102 (P)
   btrfs_rebuild_free_space_tree+0x14c/0x54c fs/btrfs/free-space-tree.c:1337
   btrfs_start_pre_rw_mount+0xa78/0xe10 fs/btrfs/disk-io.c:3074
   btrfs_remount_rw fs/btrfs/super.c:1319 [inline]
   btrfs_reconfigure+0x828/0x2418 fs/btrfs/super.c:1543
   reconfigure_super+0x1d4/0x6f0 fs/super.c:1083
   do_remount fs/namespace.c:3365 [inline]
   path_mount+0xb34/0xde0 fs/namespace.c:4200
   do_mount fs/namespace.c:4221 [inline]
   __do_sys_mount fs/namespace.c:4432 [inline]
   __se_sys_mount fs/namespace.c:4409 [inline]
   __arm64_sys_mount+0x3e8/0x468 fs/namespace.c:4409
   __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
   invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
   el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
   do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
   el0_svc+0x58/0x17c arch/arm64/kernel/entry-common.c:767
   el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
   el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
  Code: f0047182 91178042 528089c3 9771d47 (d4210000)
  ---[ end trace 0000000000000000 ]---

This happens because we are processing an empty block group, which has
no extents allocated from it, there are no items for this block group,
including the block group item since block group items are stored in a
dedicated tree when using the block group tree feature. It also means
this is the block group with the highest start offset, so there are no
higher keys in the extent root, hence btrfs_search_slot_for_read()
returns 1 (no higher key found).

Fix this by asserting 'ret' is 0 only if the block group tree feature
is not enabled, in which case we should find a block group item for
the block group since it's stored in the extent root and block group
item keys are greater than extent item keys (the value for
BTRFS_BLOCK_GROUP_ITEM_KEY is 192 and for BTRFS_EXTENT_ITEM_KEY and
BTRFS_METADATA_ITEM_KEY the values are 168 and 169 respectively).
In case 'ret' is 1, we just need to add a record to the free space
tree which spans the whole block group, and we can achieve this by
making 'ret == 0' as the while loop's condition.

Reported-by: [email protected]
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 12, 2025
pert script tests fails with segmentation fault as below:

  92: perf script tests:
  --- start ---
  test child forked, pid 103769
  DB test
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB /tmp/perf-test-script.7rbftEpOzX/perf.data (9 samples) ]
  /usr/libexec/perf-core/tests/shell/script.sh: line 35:
  103780 Segmentation fault      (core dumped)
  perf script -i "${perfdatafile}" -s "${db_test}"
  --- Cleaning up ---
  ---- end(-1) ----
  92: perf script tests                                               : FAILED!

Backtrace pointed to :
	#0  0x0000000010247dd0 in maps.machine ()
	#1  0x00000000101d178c in db_export.sample ()
	#2  0x00000000103412c8 in python_process_event ()
	#3  0x000000001004eb28 in process_sample_event ()
	#4  0x000000001024fcd0 in machines.deliver_event ()
	#5  0x000000001025005c in perf_session.deliver_event ()
	torvalds#6  0x00000000102568b0 in __ordered_events__flush.part.0 ()
	torvalds#7  0x0000000010251618 in perf_session.process_events ()
	torvalds#8  0x0000000010053620 in cmd_script ()
	torvalds#9  0x00000000100b5a28 in run_builtin ()
	torvalds#10 0x00000000100b5f94 in handle_internal_command ()
	torvalds#11 0x0000000010011114 in main ()

Further investigation reveals that this occurs in the `perf script tests`,
because it uses `db_test.py` script. This script sets `perf_db_export_mode = True`.

With `perf_db_export_mode` enabled, if a sample originates from a hypervisor,
perf doesn't set maps for "[H]" sample in the code. Consequently, `al->maps` remains NULL
when `maps__machine(al->maps)` is called from `db_export__sample`.

As al->maps can be NULL in case of Hypervisor samples , use thread->maps
because even for Hypervisor sample, machine should exist.
If we don't have machine for some reason, return -1 to avoid segmentation fault.

Reported-by: Disha Goel <[email protected]>
Signed-off-by: Aditya Bodkhe <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Tested-by: Disha Goel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Suggested-by: Adrian Hunter <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 12, 2025
The mailbox framework has a single inflight request at a time. If
a request is sent while another is still active, it will be queued
to the mailbox core ring buffer.

ACPM protocol did not serialize the calls to the mailbox subsystem so we
could start the timeout ticks in parallel for multiple requests, while
just one was being inflight.

Consider a hypothetical case where the xfer timeout is 100ms and an ACPM
transaction takes 90ms:
      | 0ms: Message #0 is queued in mailbox layer and sent out, then sits
      |      at acpm_dequeue_by_polling() with a timeout of 100ms
      | 1ms: Message #1 is queued in mailbox layer but not sent out yet.
      |      Since send_message() doesn't block, it also sits at
      |      acpm_dequeue_by_polling() with a timeout of 100ms
      |  ...
      | 90ms: Message #0 is completed, txdone is called and message #1 is sent
      | 101ms: Message #1 times out since the count started at 1ms. Even though
      |       it has only been inflight for 11ms.

Fix the problem by moving mbox_send_message() and mbox_client_txdone()
immediately after the message has been written to the TX queue and while
still keeping the ACPM TX queue lock. We thus tie together the TX write
with the doorbell ring and mark the TX as done after the doorbell has
been rung. This guarantees that the doorbell has been rang before
starting the timeout ticks. We should also see some performance
improvement as we no longer wait to receive a response before ringing
the doorbell for the next request, so the ACPM firmware shall be able to
drain faster the TX queue. Another benefit is that requests are no
longer able to ring the doorbell one for the other, so it eases
debugging. Finally, the mailbox software queue will always contain a
single doorbell request due to the serialization done at the ACPM TX
queue level. Protocols like ACPM, that handle their own hardware queues
need a passthrough mailbox API, where they are able to just ring the
doorbell or flip a bit directly into the mailbox controller. The mailbox
software queue mechanism, the locking done into the mailbox core is not
really needed, so hopefully this lays the foundation for a passthrough
mailbox API.

Reported-by: André Draszik <[email protected]>
Fixes: a88927b ("firmware: add Exynos ACPM protocol driver")
Signed-off-by: Tudor Ambarus <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Krzysztof Kozlowski <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 12, 2025
Some architectures (e.g. arm64) only support memory hotplug operations on
a restricted set of physical addresses. This applies even when we are
faking some CXL fixed memory windows for the purposes of cxl_test.
That range can be queried with mhp_get_pluggable_range(true). Use the
minimum of that the top of that range and iomem_resource.end to establish
the 64GiB region used by cxl_test.

From thread #2 which was related to the issue in #1.

[ dj: Add CONFIG_MEMORY_HOTPLUG config check, from Alison ]

Link: https://lore.kernel.org/linux-cxl/[email protected]/ #2
Reported-by: Itaru Kitayama <[email protected]>
Closes: pmem/ndctl#278 #1
Reviewed-by: Dan Williams <[email protected]>
Tested-by: Itaru Kitayama <[email protected] <mailto:[email protected]>
Tested-by: Marc Herbert <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Dave Jiang <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 12, 2025
syzbot reports:

BUG: KASAN: slab-use-after-free in getrusage+0x1109/0x1a60
Read of size 8 at addr ffff88810de2d2c8 by task a.out/304

CPU: 0 UID: 0 PID: 304 Comm: a.out Not tainted 6.16.0-rc1 #1 PREEMPT(voluntary)
Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x53/0x70
 print_report+0xd0/0x670
 ? __pfx__raw_spin_lock_irqsave+0x10/0x10
 ? getrusage+0x1109/0x1a60
 kasan_report+0xce/0x100
 ? getrusage+0x1109/0x1a60
 getrusage+0x1109/0x1a60
 ? __pfx_getrusage+0x10/0x10
 __io_uring_show_fdinfo+0x9fe/0x1790
 ? ksys_read+0xf7/0x1c0
 ? do_syscall_64+0xa4/0x260
 ? vsnprintf+0x591/0x1100
 ? __pfx___io_uring_show_fdinfo+0x10/0x10
 ? __pfx_vsnprintf+0x10/0x10
 ? mutex_trylock+0xcf/0x130
 ? __pfx_mutex_trylock+0x10/0x10
 ? __pfx_show_fd_locks+0x10/0x10
 ? io_uring_show_fdinfo+0x57/0x80
 io_uring_show_fdinfo+0x57/0x80
 seq_show+0x38c/0x690
 seq_read_iter+0x3f7/0x1180
 ? inode_set_ctime_current+0x160/0x4b0
 seq_read+0x271/0x3e0
 ? __pfx_seq_read+0x10/0x10
 ? __pfx__raw_spin_lock+0x10/0x10
 ? __mark_inode_dirty+0x402/0x810
 ? selinux_file_permission+0x368/0x500
 ? file_update_time+0x10f/0x160
 vfs_read+0x177/0xa40
 ? __pfx___handle_mm_fault+0x10/0x10
 ? __pfx_vfs_read+0x10/0x10
 ? mutex_lock+0x81/0xe0
 ? __pfx_mutex_lock+0x10/0x10
 ? fdget_pos+0x24d/0x4b0
 ksys_read+0xf7/0x1c0
 ? __pfx_ksys_read+0x10/0x10
 ? do_user_addr_fault+0x43b/0x9c0
 do_syscall_64+0xa4/0x260
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f0f74170fc9
Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 8
RSP: 002b:00007fffece049e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0f74170fc9
RDX: 0000000000001000 RSI: 00007fffece049f0 RDI: 0000000000000004
RBP: 00007fffece05ad0 R08: 0000000000000000 R09: 00007fffece04d90
R10: 0000000000000000 R11: 0000000000000206 R12: 00005651720a1100
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 </TASK>

Allocated by task 298:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 __kasan_slab_alloc+0x6e/0x70
 kmem_cache_alloc_node_noprof+0xe8/0x330
 copy_process+0x376/0x5e00
 create_io_thread+0xab/0xf0
 io_sq_offload_create+0x9ed/0xf20
 io_uring_setup+0x12b0/0x1cc0
 do_syscall_64+0xa4/0x260
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 22:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3b/0x60
 __kasan_slab_free+0x37/0x50
 kmem_cache_free+0xc4/0x360
 rcu_core+0x5ff/0x19f0
 handle_softirqs+0x18c/0x530
 run_ksoftirqd+0x20/0x30
 smpboot_thread_fn+0x287/0x6c0
 kthread+0x30d/0x630
 ret_from_fork+0xef/0x1a0
 ret_from_fork_asm+0x1a/0x30

Last potentially related work creation:
 kasan_save_stack+0x33/0x60
 kasan_record_aux_stack+0x8c/0xa0
 __call_rcu_common.constprop.0+0x68/0x940
 __schedule+0xff2/0x2930
 __cond_resched+0x4c/0x80
 mutex_lock+0x5c/0xe0
 io_uring_del_tctx_node+0xe1/0x2b0
 io_uring_clean_tctx+0xb7/0x160
 io_uring_cancel_generic+0x34e/0x760
 do_exit+0x240/0x2350
 do_group_exit+0xab/0x220
 __x64_sys_exit_group+0x39/0x40
 x64_sys_call+0x1243/0x1840
 do_syscall_64+0xa4/0x260
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff88810de2cb00
 which belongs to the cache task_struct of size 3712
The buggy address is located 1992 bytes inside of
 freed 3712-byte region [ffff88810de2cb00, ffff88810de2d980)

which is caused by the task_struct pointed to by sq->thread being
released while it is being used in the function
__io_uring_show_fdinfo(). Holding ctx->uring_lock does not prevent ehre
relase or exit of sq->thread.

Fix this by assigning and looking up ->thread under RCU, and grabbing a
reference to the task_struct. This ensures that it cannot get released
while fdinfo is using it.

Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]
Fixes: 3fcb9d1 ("io_uring/sqpoll: statistics of the true utilization of sq threads")
Signed-off-by: Penglei Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[axboe: massage commit message]
Signed-off-by: Jens Axboe <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 12, 2025
…ing"

This reverts commit 5a9f299.

The following crash/regression was seen with the reverted commit
on a specific BMG SKU with no display capabilities:

[  115.582833] BUG: kernel NULL pointer dereference, address: 00000000000005d0
[  115.589775] #PF: supervisor write access in kernel mode
[  115.594976] #PF: error_code(0x0002) - not-present page
[  115.600088] PGD 0 P4D 0
[  115.602617] Oops: Oops: 0002 [#1] SMP
[  115.606267] CPU: 14 UID: 0 PID: 1547 Comm: kworker/14:3 Tainted: G 	U      E       6.15.0-local+ torvalds#62 PREEMPT(voluntary)
[  115.617332] Tainted: [U]=USER, [E]=UNSIGNED_MODULE
[  115.622100] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P DDR5 SODIMM SBS RVP, BIOS MTLPEMI1.R00.3471.D49.2401260852 01/26/2024
[  115.635314] Workqueue: pm pm_runtime_work
[  115.639309] RIP: 0010:_raw_spin_lock+0x17/0x30
[  115.662382] RSP: 0018:ffffd13f82e7bc30 EFLAGS: 00010246
[  115.667581] RAX: 0000000000000000 RBX: ffff8be919076000 RCX: 0000000000000002
[  115.674675] RDX: 0000000000000001 RSI: 000000000000004b RDI: 00000000000005d0
[  115.681775] RBP: ffffd13f82e7bc60 R08: ffffd13f82e7bb00 R09: ffff8beb0c1b06c0
[  115.688869] R10: ffff8be7c034f4c0 R11: fefefefefefefeff R12: fffffffffffffff0
[  115.695965] R13: ffff8be9190762e8 R14: ffff8be919077798 R15: 00000000000005d0
[  115.703062] FS:  0000000000000000(0000) GS:ffff8beb552b6000(0000) knlGS:0000000000000000
[  115.711106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  115.716826] CR2: 00000000000005d0 CR3: 000000024c68d002 CR4: 0000000000f72ef0
[  115.723921] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  115.731015] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[  115.738113] PKRU: 55555554
[  115.740816] Call Trace:
[  115.743258]  <TASK>
[  115.745363]  ? xe_display_flush_cleanup_work+0x92/0x120 [xe]
[  115.751102]  xe_display_pm_runtime_suspend+0x42/0x80 [xe]
[  115.756542]  xe_pm_runtime_suspend+0x11b/0x1b0 [xe]
[  115.761463]  xe_pci_runtime_suspend+0x23/0xd0 [xe]
[  115.766291]  pci_pm_runtime_suspend+0x6b/0x1a0
[  115.770717]  ? pci_pm_thaw_noirq+0xa0/0xa0
[  115.774797]  __rpm_callback+0x48/0x1e0
[  115.778531]  ? pci_pm_thaw_noirq+0xa0/0xa0
[  115.782614]  rpm_callback+0x66/0x70
[  115.786090]  ? pci_pm_thaw_noirq+0xa0/0xa0
[  115.790173]  rpm_suspend+0xe1/0x5e0
[  115.793647]  ? psi_task_switch+0xb8/0x200
[  115.797643]  ? finish_task_switch.isra.0+0x8d/0x270
[  115.802502]  pm_runtime_work+0xa6/0xc0
[  115.806238]  process_one_work+0x186/0x350
[  115.810234]  worker_thread+0x33a/0x480
[  115.813968]  ? process_one_work+0x350/0x350
[  115.818132]  kthread+0x10c/0x220
[  115.821350]  ? kthreads_online_cpu+0x120/0x120
[  115.825774]  ret_from_fork+0x3a/0x60
[  115.829339]  ? kthreads_online_cpu+0x120/0x120
[  115.833768]  ret_from_fork_asm+0x11/0x20
[  115.829339]  ? kthreads_online_cpu+0x120/0x120
[  115.833768]  ret_from_fork_asm+0x11/0x20
[  115.837680]  </TASK>
[  115.839907]  acpi_tad(E) drm(E)
[  115.931629] CR2: 00000000000005d0
[  115.934935] ---[ end trace 0000000000000000 ]---
[  115.939531] RIP: 0010:_raw_spin_lock+0x17/0x30

We cannot yet use xe->display to determine whether display hardware
has been successfully probed/initialized or not. This is because
xe->display would not be set to NULL even with GPUs with no display
capabilities (e.g, GMD_ID_DISPLAY = 0). However, this might change
in the future as Xe and i915 code is unified to deal with no display
cases.

Therefore, for now we have to continue to rely on xe->info.probe_display
(which would be set to false with display-less GPUs) to decide
whether to invoke any display related functions or not.

Cc: Jani Nikula <[email protected]>
Cc: Matt Roper <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Signed-off-by: Vivek Kasireddy <[email protected]>
Reviewed-by: Jani Nikula <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jani Nikula <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jun 12, 2025
[BUG]
There is syzbot based reproducer that can crash the kernel, with the
following call trace: (With some debug output added)

 DEBUG: rescue=ibadroots parsed
 BTRFS: device fsid 14d642db-7b15-43e4-81e6-4b8fac6a25f8 devid 1 transid 8 /dev/loop0 (7:0) scanned by repro (1010)
 BTRFS info (device loop0): first mount of filesystem 14d642db-7b15-43e4-81e6-4b8fac6a25f8
 BTRFS info (device loop0): using blake2b (blake2b-256-generic) checksum algorithm
 BTRFS info (device loop0): using free-space-tree
 BTRFS warning (device loop0): checksum verify failed on logical 5312512 mirror 1 wanted 0xb043382657aede36608fd3386d6b001692ff406164733d94e2d9a180412c6003 found 0x810ceb2bacb7f0f9eb2bf3b2b15c02af867cb35ad450898169f3b1f0bd818651 level 0
 DEBUG: read tree root path failed for tree csum, ret=-5
 BTRFS warning (device loop0): checksum verify failed on logical 5328896 mirror 1 wanted 0x51be4e8b303da58e6340226815b70e3a93592dac3f30dd510c7517454de8567a found 0x51be4e8b303da58e634022a315b70e3a93592dac3f30dd510c7517454de8567a level 0
 BTRFS warning (device loop0): checksum verify failed on logical 5292032 mirror 1 wanted 0x1924ccd683be9efc2fa98582ef58760e3848e9043db8649ee382681e220cdee4 found 0x0cb6184f6e8799d9f8cb335dccd1d1832da1071d12290dab3b85b587ecacca6e level 0
 process 'repro' launched './file2' with NULL argv: empty string added
 DEBUG: no csum root, idatacsums=0 ibadroots=134217728
 Oops: general protection fault, probably for non-canonical address 0xdffffc0000000041: 0000 [#1] SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000208-0x000000000000020f]
 CPU: 5 UID: 0 PID: 1010 Comm: repro Tainted: G           OE       6.15.0-custom+ torvalds#249 PREEMPT(full)
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
 RIP: 0010:btrfs_lookup_csum+0x93/0x3d0 [btrfs]
 Call Trace:
  <TASK>
  btrfs_lookup_bio_sums+0x47a/0xdf0 [btrfs]
  btrfs_submit_bbio+0x43e/0x1a80 [btrfs]
  submit_one_bio+0xde/0x160 [btrfs]
  btrfs_readahead+0x498/0x6a0 [btrfs]
  read_pages+0x1c3/0xb20
  page_cache_ra_order+0x4b5/0xc20
  filemap_get_pages+0x2d3/0x19e0
  filemap_read+0x314/0xde0
  __kernel_read+0x35b/0x900
  bprm_execve+0x62e/0x1140
  do_execveat_common.isra.0+0x3fc/0x520
  __x64_sys_execveat+0xdc/0x130
  do_syscall_64+0x54/0x1d0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 ---[ end trace 0000000000000000 ]---

[CAUSE]
Firstly the fs has a corrupted csum tree root, thus to mount the fs we
have to go "ro,rescue=ibadroots" mount option.

Normally with that mount option, a bad csum tree root should set
BTRFS_FS_STATE_NO_DATA_CSUMS flag, so that any future data read will
ignore csum search.

But in this particular case, we have the following call trace that
caused NULL csum root, but not setting BTRFS_FS_STATE_NO_DATA_CSUMS:

load_global_roots_objectid():

		ret = btrfs_search_slot();
		/* Succeeded */
		btrfs_item_key_to_cpu()
		found = true;
		/* We found the root item for csum tree. */
		root = read_tree_root_path();
		if (IS_ERR(root)) {
			if (!btrfs_test_opt(fs_info, IGNOREBADROOTS))
			/*
			 * Since we have rescue=ibadroots mount option,
			 * @ret is still 0.
			 */
			break;
	if (!found || ret) {
		/* @found is true, @ret is 0, error handling for csum
		 * tree is skipped.
		 */
	}

This means we completely skipped to set BTRFS_FS_STATE_NO_DATA_CSUMS if
the csum tree is corrupted, which results unexpected later csum lookup.

[FIX]
If read_tree_root_path() failed, always populate @ret to the error
number.

As at the end of the function, we need @ret to determine if we need to
do the extra error handling for csum tree.

Fixes: abed4aa ("btrfs: track the csum, extent, and free space trees in a rb tree")
Reported-by: Zhiyu Zhang <[email protected]>
Reported-by: Longxing Li <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants