tcp: increase flexibility of EBPF congestion control initialization #27

Closed
wants to merge 5 commits into from

Conversation

kernel-patches-bot

Pull request for series with
subject: tcp: increase flexibility of EBPF congestion control initialization
version: 1
url: https://patchwork.ozlabs.org/project/netdev/list/?series=200608

kernel-patches-bot and others added 5 commits September 9, 2020 13:12
has not been initialized already.

With this new approach, we can arrange things so that if the EBPF code
sets the congestion control by calling setsockopt(TCP_CONGESTION) then
tcp_init_transfer() will not re-initialize the CC module.

This is an approach that has the following beneficial properties:

(1) This allows CC module customizations made by the EBPF called in
    tcp_init_transfer() to persist, and not be wiped out by a later
    call to tcp_init_congestion_control() in tcp_init_transfer().

(2) Does not flip the order of EBPF and CC init, to avoid causing bugs
    for existing code upstream that depends on the current order.

(3) Does not cause 2 initializations for CC in the case where the
    EBPF called in tcp_init_transfer() wants to set the CC to a new CC
    algorithm.

(4) Allows follow-on simplifications to the code in net/core/filter.c
    and net/ipv4/tcp_cong.c, which currently both have some complexity
    to special-case CC initialization to avoid double CC
    initialization if EBPF sets the CC.

Signed-off-by: Neal Cardwell <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Acked-by: Kevin Yang <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Lawrence Brakmo <[email protected]>
---
 include/net/inet_connection_sock.h | 3 ++-
 net/ipv4/tcp.c                     | 1 +
 net/ipv4/tcp_cong.c                | 3 ++-
 net/ipv4/tcp_input.c               | 4 +++-
 4 files changed, 8 insertions(+), 3 deletions(-)
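
For reference, a minimal sketch of the approach described above; this is illustrative only, and the flag name (icsk_ca_initialized) is an assumption inferred from the inet_connection_sock.h diffstat rather than a quote of the patch:

  /* Sketch only: initialize congestion control in tcp_init_transfer()
   * only if nothing (e.g. an EBPF setsockopt(TCP_CONGESTION) call) has
   * initialized it already. The flag name is assumed, not quoted.
   */
  static void tcp_init_transfer_cc_sketch(struct sock *sk)
  {
          struct inet_connection_sock *icsk = inet_csk(sk);

          if (!icsk->icsk_ca_initialized)
                  tcp_init_congestion_control(sk);
  }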
control twice, when EBPF sets the congestion control algorithm at
connection establishment, we can simplify the code by simply
initializing the congestion control module at that time.

Signed-off-by: Neal Cardwell <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Acked-by: Kevin Yang <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Lawrence Brakmo <[email protected]>
---
 net/core/filter.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)
tcp_set_congestion_control() want to initialize congestion control, we
can simplify tcp_set_congestion_control() by removing the reinit
argument and the code to support it.

Signed-off-by: Neal Cardwell <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Acked-by: Kevin Yang <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Lawrence Brakmo <[email protected]>
---
 include/net/tcp.h   |  2 +-
 net/core/filter.c   |  3 +--
 net/ipv4/tcp.c      |  2 +-
 net/ipv4/tcp_cong.c | 11 ++---------
 4 files changed, 5 insertions(+), 13 deletions(-)
flags argument to _bpf_setsockopt(), we can remove that argument.

Signed-off-by: Neal Cardwell <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Acked-by: Kevin Yang <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Lawrence Brakmo <[email protected]>
---
 net/core/filter.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)
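
To illustrate the user-visible effect of the series, the sketch below shows the kind of EBPF sockops program whose congestion control choice now persists; the program body and the chosen module name ("bbr") are illustrative assumptions, not part of the series:

  // SPDX-License-Identifier: GPL-2.0
  /* Illustrative sockops program: pick the CC at connection establishment.
   * With this series, the CC chosen here is initialized exactly once and
   * is not re-initialized later by tcp_init_transfer().
   */
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  #ifndef SOL_TCP
  #define SOL_TCP 6
  #endif
  #ifndef TCP_CONGESTION
  #define TCP_CONGESTION 13
  #endif

  char _license[] SEC("license") = "GPL";

  SEC("sockops")
  int set_cc(struct bpf_sock_ops *skops)
  {
          char cc[] = "bbr";      /* illustrative choice of CC module */

          switch (skops->op) {
          case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
          case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
                  bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION, cc, sizeof(cc));
                  break;
          }
          return 1;
  }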
@kernel-patches-bot
Author

At least one diff in series https://patchwork.ozlabs.org/project/netdev/list/?series=200608 expired. Closing PR.

kernel-patches-bot pushed a commit that referenced this pull request Sep 16, 2020
Recently nvme_dev.q_depth was changed from an int to u16 type.

This falls over for the queue depth calculation in nvme_pci_enable(),
where NVME_CAP_MQES(dev->ctrl.cap) + 1 may overflow as a u16, as
NVME_CAP_MQES() is a 16b number also. That happens for me, and this is the
result:

root@ubuntu:/home/john# [148.272996] Unable to handle kernel NULL pointer
dereference at virtual address 0000000000000010
Mem abort info:
ESR = 0x96000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=00000a27bf3c9000
[0000000000000010] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in: nvme nvme_core
CPU: 56 PID: 256 Comm: kworker/u195:0 Not tainted
5.8.0-next-20200812 #27
Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 -
V1.16.01 03/15/2019
Workqueue: nvme-reset-wq nvme_reset_work [nvme]
pstate: 80c00009 (Nzcv daif +PAN +UAO BTYPE=--)
pc : __sg_alloc_table_from_pages+0xec/0x238
lr : __sg_alloc_table_from_pages+0xc8/0x238
sp : ffff800013ccbad0
x29: ffff800013ccbad0 x28: ffff0a27b3d380a8
x27: 0000000000000000 x26: 0000000000002dc2
x25: 0000000000000dc0 x24: 0000000000000000
x23: 0000000000000000 x22: ffff800013ccbbe8
x21: 0000000000000010 x20: 0000000000000000
x19: 00000000fffff000 x18: ffffffffffffffff
x17: 00000000000000c0 x16: fffffe289eaf6380
x15: ffff800011b59948 x14: ffff002bc8fe98f8
x13: ff00000000000000 x12: ffff8000114ca000
x11: 0000000000000000 x10: ffffffffffffffff
x9 : ffffffffffffffc0 x8 : ffff0a27b5f9b6a0
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0a27b5f9b680 x4 : 0000000000000000
x3 : ffff0a27b5f9b680 x2 : 0000000000000000
 x1 : 0000000000000001 x0 : 0000000000000000
 Call trace:
__sg_alloc_table_from_pages+0xec/0x238
sg_alloc_table_from_pages+0x18/0x28
iommu_dma_alloc+0x474/0x678
dma_alloc_attrs+0xd8/0xf0
nvme_alloc_queue+0x114/0x160 [nvme]
nvme_reset_work+0xb34/0x14b4 [nvme]
process_one_work+0x1e8/0x360
worker_thread+0x44/0x478
kthread+0x150/0x158
ret_from_fork+0x10/0x34
 Code: f94002c3 6b01017f 540007c2 11000486 (f8645aa5)
---[ end trace 89bb2b72d59bf925 ]---

Fix by making it a u32.

Also use u32 for nvme_dev.q_depth, as we assign this value from
nvme_dev.q_depth, and nvme_dev.q_depth will possibly hold 65536 - this
avoids the same crash as above.

Fixes: 61f3b89 ("nvme-pci: use unsigned for io queue depth")
Signed-off-by: John Garry <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
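
The 16-bit wraparound described above is easy to reproduce in plain C; the standalone demonstration below uses hypothetical values and is not part of the patch:

  #include <stdint.h>
  #include <stdio.h>

  /* "MQES + 1" is 65536 when MQES is 0xffff, which does not fit in a u16
   * and wraps to 0, while a u32 keeps the intended value.
   */
  int main(void)
  {
          uint16_t mqes = 0xffff;                 /* 16-bit field, maximum value */
          uint16_t depth16 = mqes + 1;            /* wraps to 0 */
          uint32_t depth32 = (uint32_t)mqes + 1;  /* 65536, as intended */

          printf("u16: %u, u32: %u\n", (unsigned)depth16, (unsigned)depth32);
          return 0;
  }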
kernel-patches-bot pushed a commit that referenced this pull request Dec 14, 2020
If cm_create_timewait_info() fails, the timewait_info pointer will contain
an error value and will be used in cm_remove_remote() later.

  general protection fault, probably for non-canonical address 0xdffffc0000000024: 0000 [#1] SMP KASAN PTI
  KASAN: null-ptr-deref in range [0x0000000000000120-0x0000000000000127]
  CPU: 2 PID: 12446 Comm: syz-executor.3 Not tainted 5.10.0-rc5-5d4c0742a60e #27
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:cm_remove_remote.isra.0+0x24/0x170 drivers/infiniband/core/cm.c:978
  Code: 84 00 00 00 00 00 41 54 55 53 48 89 fb 48 8d ab 2d 01 00 00 e8 7d bf 4b fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 04 02 48 89 ea 83 e2 07 38 d0 7f 08 84 c0 0f 85 fc 00 00 00
  RSP: 0018:ffff888013127918 EFLAGS: 00010006
  RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: ffffc9000a18b000
  RDX: 0000000000000024 RSI: ffffffff82edc573 RDI: fffffffffffffff4
  RBP: 0000000000000121 R08: 0000000000000001 R09: ffffed1002624f1d
  R10: 0000000000000003 R11: ffffed1002624f1c R12: ffff888107760c70
  R13: ffff888107760c40 R14: fffffffffffffff4 R15: ffff888107760c9c
  FS:  00007fe1ffcc1700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000001b2ff21000 CR3: 000000010f504001 CR4: 0000000000370ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   cm_destroy_id+0x189/0x15b0 drivers/infiniband/core/cm.c:1155
   cma_connect_ib drivers/infiniband/core/cma.c:4029 [inline]
   rdma_connect_locked+0x1100/0x17c0 drivers/infiniband/core/cma.c:4107
   rdma_connect+0x2a/0x40 drivers/infiniband/core/cma.c:4140
   ucma_connect+0x277/0x340 drivers/infiniband/core/ucma.c:1069
   ucma_write+0x236/0x2f0 drivers/infiniband/core/ucma.c:1724
   vfs_write+0x220/0x830 fs/read_write.c:603
   ksys_write+0x1df/0x240 fs/read_write.c:658
   do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: a977049 ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Maor Gottlieb <[email protected]>
Reported-by: Amit Matityahu <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
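
The shape of the problem is the classic error-pointer pattern; the sketch below uses the names mentioned in the report but is an assumption about the surrounding structure, not the actual patch:

  /* Sketch of the error-pointer handling implied above: check the return
   * value of cm_create_timewait_info() immediately, and never let an
   * ERR_PTR() value linger where cm_remove_remote() could use it.
   */
  static int cm_setup_timewait_sketch(struct cm_id_private *cm_id_priv)
  {
          cm_id_priv->timewait_info =
                  cm_create_timewait_info(cm_id_priv->id.local_id);
          if (IS_ERR(cm_id_priv->timewait_info)) {
                  int ret = PTR_ERR(cm_id_priv->timewait_info);

                  cm_id_priv->timewait_info = NULL;  /* nothing to remove later */
                  return ret;
          }
          return 0;
  }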
kernel-patches-bot pushed a commit that referenced this pull request Sep 14, 2021
Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.

Before fix:

  # ./test_progs -t cgroup_v1v2
  test_cgroup_v1v2:PASS:server_fd 0 nsec
  test_cgroup_v1v2:PASS:client_fd 0 nsec
  test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
  test_cgroup_v1v2:PASS:server_fd 0 nsec
  run_test:PASS:skel_open 0 nsec
  run_test:PASS:prog_attach 0 nsec
  test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
  run_test:PASS:skel_open 0 nsec
  run_test:PASS:prog_attach 0 nsec
  run_test:PASS:join_classid 0 nsec
  (network_helpers.c:219: errno: None) Unexpected success to connect to server
  test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
  #27 cgroup_v1v2:FAIL
  Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

After fix:

  # ./test_progs -t cgroup_v1v2
  #27 cgroup_v1v2:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
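
A hedged sketch of the kind of policy program the selftest describes follows; the names are illustrative and this is not the selftest's actual source. Returning 0 from a cgroup/connect hook makes connect(2) fail with EPERM, returning 1 allows the call:

  // SPDX-License-Identifier: GPL-2.0
  /* Illustrative cgroup/connect4 program: reject TCP connects to port 60123. */
  #include <linux/bpf.h>
  #include <linux/in.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  char _license[] SEC("license") = "GPL";

  SEC("cgroup/connect4")
  int reject_port_60123(struct bpf_sock_addr *ctx)
  {
          if (ctx->protocol == IPPROTO_TCP &&
              ctx->user_port == bpf_htons(60123))
                  return 0;       /* reject -> connect(2) returns -EPERM */
          return 1;               /* allow */
  }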
kernel-patches-bot pushed a commit that referenced this pull request Oct 4, 2021
The change of devlink_alloc() to accept device makes sure that device
is fully initialized and device_register() does nothing except allowing
users to use that devlink instance.

Such change ensures that no user input will be usable till that point and
it eliminates the need to worry about internal locking as long as devlink_register
is called last since all accesses to the devlink are during initialization.

This change fixes the following lockdep warning.

 ======================================================
 WARNING: possible circular locking dependency detected
 5.14.0-rc2+ #27 Not tainted
 ------------------------------------------------------
 devlink/265 is trying to acquire lock:
 ffff8880133c2bc0 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_unload_one+0x1e/0xa0 [mlx5_core]
 but task is already holding lock:
 ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0
 which lock already depends on the new lock.
 the existing dependency chain (in reverse order) is:

 -> #1 (devlink_mutex){+.+.}-{3:3}:
        __mutex_lock+0x149/0x1310
        devlink_register+0xe7/0x280
        mlx5_devlink_register+0x118/0x480 [mlx5_core]
        mlx5_init_one+0x34b/0x440 [mlx5_core]
        probe_one+0x480/0x6e0 [mlx5_core]
        pci_device_probe+0x2a0/0x4a0
        really_probe+0x1cb/0xba0
        __driver_probe_device+0x18f/0x470
        driver_probe_device+0x49/0x120
        __driver_attach+0x1ce/0x400
        bus_for_each_dev+0x11e/0x1a0
        bus_add_driver+0x309/0x570
        driver_register+0x20f/0x390
        0xffffffffa04a0062
        do_one_initcall+0xd5/0x400
        do_init_module+0x1c8/0x760
        load_module+0x7d9d/0xa4b0
        __do_sys_finit_module+0x118/0x1a0
        do_syscall_64+0x3d/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
        __lock_acquire+0x2999/0x5a40
        lock_acquire+0x1a9/0x4a0
        __mutex_lock+0x149/0x1310
        mlx5_unload_one+0x1e/0xa0 [mlx5_core]
        mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core]
        devlink_reload+0x1f2/0x640
        devlink_nl_cmd_reload+0x6c3/0x10d0
        genl_family_rcv_msg_doit+0x1e9/0x2f0
        genl_rcv_msg+0x27f/0x4a0
        netlink_rcv_skb+0x11e/0x340
        genl_rcv+0x24/0x40
        netlink_unicast+0x433/0x700
        netlink_sendmsg+0x6fb/0xbe0
        sock_sendmsg+0xb0/0xe0
        __sys_sendto+0x192/0x240
        __x64_sys_sendto+0xdc/0x1b0
        do_syscall_64+0x3d/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(devlink_mutex);
                                lock(&dev->intf_state_mutex);
                                lock(devlink_mutex);
   lock(&dev->intf_state_mutex);

  *** DEADLOCK ***

 3 locks held by devlink/265:
  #0: ffffffff836371d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40
  #1: ffffffff83637288 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x31a/0x4a0
  #2: ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0

 stack backtrace:
 CPU: 0 PID: 265 Comm: devlink Not tainted 5.14.0-rc2+ #27
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Call Trace:
  dump_stack_lvl+0x45/0x59
  check_noncircular+0x268/0x310
  ? print_circular_bug+0x460/0x460
  ? __kernel_text_address+0xe/0x30
  ? alloc_chain_hlocks+0x1e6/0x5a0
  __lock_acquire+0x2999/0x5a40
  ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
  ? add_lock_to_list.constprop.0+0x6c/0x530
  lock_acquire+0x1a9/0x4a0
  ? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  ? lock_release+0x6c0/0x6c0
  ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
  ? lock_is_held_type+0x98/0x110
  __mutex_lock+0x149/0x1310
  ? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  ? lock_is_held_type+0x98/0x110
  ? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  ? find_held_lock+0x2d/0x110
  ? mutex_lock_io_nested+0x1160/0x1160
  ? mlx5_lag_is_active+0x72/0x90 [mlx5_core]
  ? lock_downgrade+0x6d0/0x6d0
  ? do_raw_spin_lock+0x12e/0x270
  ? rwlock_bug.part.0+0x90/0x90
  ? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core]
  ? netlink_broadcast_filtered+0x308/0xac0
  ? mlx5_devlink_info_get+0x1f0/0x1f0 [mlx5_core]
  ? __build_skb_around+0x110/0x2b0
  ? __alloc_skb+0x113/0x2b0
  devlink_reload+0x1f2/0x640
  ? devlink_unregister+0x1e0/0x1e0
  ? security_capable+0x51/0x90
  devlink_nl_cmd_reload+0x6c3/0x10d0
  ? devlink_nl_cmd_get_doit+0x1e0/0x1e0
  ? devlink_nl_pre_doit+0x72/0x8d0
  genl_family_rcv_msg_doit+0x1e9/0x2f0
  ? __lock_acquire+0x15e2/0x5a40
  ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240
  ? mutex_lock_io_nested+0x1160/0x1160
  ? security_capable+0x51/0x90
  genl_rcv_msg+0x27f/0x4a0
  ? genl_get_cmd+0x3c0/0x3c0
  ? lock_acquire+0x1a9/0x4a0
  ? devlink_nl_cmd_get_doit+0x1e0/0x1e0
  ? lock_release+0x6c0/0x6c0
  netlink_rcv_skb+0x11e/0x340
  ? genl_get_cmd+0x3c0/0x3c0
  ? netlink_ack+0x930/0x930
  genl_rcv+0x24/0x40
  netlink_unicast+0x433/0x700
  ? netlink_attachskb+0x750/0x750
  ? __alloc_skb+0x113/0x2b0
  netlink_sendmsg+0x6fb/0xbe0
  ? netlink_unicast+0x700/0x700
  ? netlink_unicast+0x700/0x700
  sock_sendmsg+0xb0/0xe0
  __sys_sendto+0x192/0x240
  ? __x64_sys_getpeername+0xb0/0xb0
  ? do_sys_openat2+0x10a/0x370
  ? down_write_nested+0x150/0x150
  ? do_user_addr_fault+0x215/0xd50
  ? __x64_sys_openat+0x11f/0x1d0
  ? __x64_sys_open+0x1a0/0x1a0
  __x64_sys_sendto+0xdc/0x1b0
  ? syscall_enter_from_user_mode+0x1d/0x50
  do_syscall_64+0x3d/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f50b50b6b3a
 Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
 RSP: 002b:00007fff6c0d3f38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f50b50b6b3a
 RDX: 0000000000000038 RSI: 000055763ac08440 RDI: 0000000000000003
 RBP: 000055763ac08410 R08: 00007f50b5192200 R09: 000000000000000c
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
 R13: 0000000000000000 R14: 000055763ac08410 R15: 000055763ac08440
 mlx5_core 0000:00:09.0: firmware version: 4.8.9999
 mlx5_core 0000:00:09.0: 0.000 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x255 link)
 mlx5_core 0000:00:09.0 eth1: Link up

Fixes: a6f3b62 ("net/mlx5: Move devlink registration before interfaces load")
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Oct 21, 2021
The perf_buffer fails on system with offline cpus:

  # test_progs -t perf_buffer
  test_perf_buffer:PASS:nr_cpus 0 nsec
  test_perf_buffer:PASS:nr_on_cpus 0 nsec
  test_perf_buffer:PASS:skel_load 0 nsec
  test_perf_buffer:PASS:attach_kprobe 0 nsec
  test_perf_buffer:PASS:perf_buf__new 0 nsec
  test_perf_buffer:PASS:epoll_fd 0 nsec
  skipping offline CPU #24
  skipping offline CPU #25
  skipping offline CPU #26
  skipping offline CPU #27
  skipping offline CPU #28
  skipping offline CPU #29
  skipping offline CPU #30
  skipping offline CPU #31
  test_perf_buffer:PASS:perf_buffer__poll 0 nsec
  test_perf_buffer:PASS:seen_cpu_cnt 0 nsec
  test_perf_buffer:FAIL:buf_cnt got 24, expected 32
  Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

Changing the test to check online cpus instead of possible.

Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
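
For readers who want to check the online/possible distinction locally, the standalone sketch below (not part of the selftest) counts the CPUs listed in the standard sysfs masks:

  #include <stdio.h>
  #include <stdlib.h>

  /* Count CPUs in a sysfs mask such as /sys/devices/system/cpu/online,
   * which lists ranges like "0-23" or "0-23,32-47".
   */
  static int count_cpus_in_mask(const char *path)
  {
          FILE *f = fopen(path, "r");
          char buf[4096];
          char *p, *end;
          int count = 0;

          if (!f)
                  return -1;
          if (!fgets(buf, sizeof(buf), f)) {
                  fclose(f);
                  return -1;
          }
          fclose(f);

          p = buf;
          while (*p && *p != '\n') {
                  long a = strtol(p, &end, 10), b = a;

                  if (*end == '-')
                          b = strtol(end + 1, &end, 10);
                  count += (int)(b - a + 1);
                  if (*end != ',')
                          break;
                  p = end + 1;
          }
          return count;
  }

  int main(void)
  {
          printf("online: %d, possible: %d\n",
                 count_cpus_in_mask("/sys/devices/system/cpu/online"),
                 count_cpus_in_mask("/sys/devices/system/cpu/possible"));
          return 0;
  }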
kernel-patches-bot pushed a commit that referenced this pull request Nov 15, 2021
…_fini()

When the amt module is being removed, it calls flush_delayed_work() to exit
source_gc_wq. But it wouldn't be exited properly because
amt_source_gc_work(), the callback function of source_gc_wq,
internally calls mod_delayed_work() again.
So, amt_source_gc_work() would be called after the amt module is removed.
Therefore kernel panic would occur.
In order to avoid it, cancel_delayed_work() should be used instead of
flush_delayed_work().

Test commands:
   modprobe amt
   modprobe -rv amt

Splat looks like:
 BUG: unable to handle page fault for address: fffffbfff80f50db
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 1237ee067 P4D 1237ee067 PUD 1237b2067 PMD 100c11067 PTE 0
 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0+ #27
 5a0ebebc29fe5c40c68bea90197606c3a832b09f
 RIP: 0010:run_timer_softirq+0x221/0xfc0
 Code: 00 00 4c 89 e1 4c 8b 30 48 c1 e9 03 80 3c 29 00 0f 85 ed 0b 00 00
 4d 89 34 24 4d 85 f6 74 19 49 8d 7e 08 48 89 f9 48 c1 e9 03 <80> 3c 29 00
 0f 85 fa 0b 00 00 4d 89 66 08 83 04 24 01 49 89 d4 48
 RSP: 0018:ffff888119009e50 EFLAGS: 00010806
 RAX: ffff8881191f8a80 RBX: 00000000007ffe2a RCX: 1ffffffff80f50db
 RDX: ffff888119009ed0 RSI: 0000000000000008 RDI: ffffffffc07a86d8
 RBP: dffffc0000000000 R08: ffff8881191f8280 R09: ffffed102323f061
 R10: ffff8881191f8307 R11: ffffed102323f060 R12: ffff888119009ec8
 R13: 00000000000000c0 R14: ffffffffc07a86d0 R15: ffff8881191f82e8
 FS:  0000000000000000(0000) GS:ffff888119000000(0000)
 knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: fffffbfff80f50db CR3: 00000001062dc002 CR4: 00000000003706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <IRQ>
  ? add_timer+0x650/0x650
  ? kvm_clock_read+0x14/0x30
  ? ktime_get+0xb9/0x180
  ? rcu_read_lock_held_common+0xe/0xa0
  ? rcu_read_lock_sched_held+0x56/0xc0
  ? rcu_read_lock_bh_held+0xa0/0xa0
  ? hrtimer_interrupt+0x271/0x790
  __do_softirq+0x1d0/0x88f
  irq_exit_rcu+0xe7/0x120
  sysvec_apic_timer_interrupt+0x8a/0xb0
  </IRQ>
  <TASK>
[ ... ]

Fixes: bc54e49 ("amt: add multicast(IGMP) report message handler")
Signed-off-by: Taehee Yoo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
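
A generic sketch of why flush vs. cancel matters for a self-rearming delayed work is shown below; it is not the amt driver's code, and the names are illustrative:

  #include <linux/module.h>
  #include <linux/workqueue.h>
  #include <linux/jiffies.h>

  /* A delayed work that re-arms itself from its own callback:
   * flush_delayed_work() only waits for the current execution, which may
   * already have queued the next run, so the callback can still fire after
   * module removal. Cancelling breaks the re-arm cycle instead.
   */
  static void gc_work_fn(struct work_struct *work);
  static DECLARE_DELAYED_WORK(gc_work, gc_work_fn);

  static void gc_work_fn(struct work_struct *work)
  {
          /* ... do garbage collection ... */
          mod_delayed_work(system_wq, &gc_work, msecs_to_jiffies(1000)); /* re-arm */
  }

  static int __init demo_init(void)
  {
          mod_delayed_work(system_wq, &gc_work, msecs_to_jiffies(1000));
          return 0;
  }

  static void __exit demo_exit(void)
  {
          cancel_delayed_work_sync(&gc_work); /* cancel, not flush_delayed_work() */
  }

  module_init(demo_init);
  module_exit(demo_exit);
  MODULE_LICENSE("GPL");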
kernel-patches-bot pushed a commit that referenced this pull request Mar 18, 2022
high_memory used to be initialized in mem_init, way after setup_bootmem.
But a call to dma_contiguous_reserve in this function gives rise to the
below warning because high_memory is equal to 0 and is used at the very
beginning of cma_declare_contiguous_nid.

It went unnoticed since the move of the kasan region redefined
KERN_VIRT_SIZE so that it does not encompass -1 anymore.

Fix this by initializing high_memory in setup_bootmem.

------------[ cut here ]------------
virt_to_phys used for non-linear address: ffffffffffffffff (0xffffffffffffffff)
WARNING: CPU: 0 PID: 0 at arch/riscv/mm/physaddr.c:14 __virt_to_phys+0xac/0x1b8
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc1-00007-ga68b89289e26 #27
Hardware name: riscv-virtio,qemu (DT)
epc : __virt_to_phys+0xac/0x1b8
 ra : __virt_to_phys+0xac/0x1b8
epc : ffffffff80014922 ra : ffffffff80014922 sp : ffffffff84a03c30
 gp : ffffffff85866c80 tp : ffffffff84a3f180 t0 : ffffffff86bce657
 t1 : fffffffef09406e8 t2 : 0000000000000000 s0 : ffffffff84a03c70
 s1 : ffffffffffffffff a0 : 000000000000004f a1 : 00000000000f0000
 a2 : 0000000000000002 a3 : ffffffff8011f408 a4 : 0000000000000000
 a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffff84a03747
 s2 : ffffffd800000000 s3 : ffffffff86ef4000 s4 : ffffffff8467f828
 s5 : fffffff800000000 s6 : 8000000000006800 s7 : 0000000000000000
 s8 : 0000000480000000 s9 : 0000000080038ea0 s10: 0000000000000000
 s11: ffffffffffffffff t3 : ffffffff84a035c0 t4 : fffffffef09406e8
 t5 : fffffffef09406e9 t6 : ffffffff84a03758
status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
[<ffffffff8322ef4c>] cma_declare_contiguous_nid+0xf2/0x64a
[<ffffffff83212a58>] dma_contiguous_reserve_area+0x46/0xb4
[<ffffffff83212c3a>] dma_contiguous_reserve+0x174/0x18e
[<ffffffff83208fc2>] paging_init+0x12c/0x35e
[<ffffffff83206bd2>] setup_arch+0x120/0x74e
[<ffffffff83201416>] start_kernel+0xce/0x68c
irq event stamp: 0
hardirqs last  enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<0000000000000000>] 0x0
softirqs last  enabled at (0): [<0000000000000000>] 0x0
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 0000000000000000 ]---

Fixes: f7ae023 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <[email protected]>
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Apr 9, 2022
When fiemap is requested starting from MAX_LFS_FILESIZE, (maxbytes - *len) < start
is always true, so *len is set to zero. Because the start offset is beyond the
file size, erofs will always return an iomap with length zero, and the iomap
iteration will enter an infinite loop. It is necessary to cover this corner
case to avoid this situation.

------------[ cut here ]------------
WARNING: CPU: 7 PID: 905 at fs/iomap/iter.c:35 iomap_iter+0x97f/0xc70
Modules linked in: xfs erofs
CPU: 7 PID: 905 Comm: iomap Tainted: G        W         5.17.0-rc8 #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:iomap_iter+0x97f/0xc70
Code: 85 a1 fc ff ff e8 71 be 9c ff 0f 1f 44 00 00 e9 92 fc ff ff e8 62 be 9c ff 0f 0b b8 fb ff ff ff e9 fc f8 ff ff e8 51 be 9c ff <0f> 0b e9 2b fc ff ff e8 45 be 9c ff 0f 0b e9 e1 fb ff ff e8 39 be
RSP: 0018:ffff888060a37ab0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888060a37bb0 RCX: 0000000000000000
RDX: ffff88807e19a900 RSI: ffffffff81a7da7f RDI: ffff888060a37be0
RBP: 7fffffffffffffff R08: 0000000000000000 R09: ffff888060a37c20
R10: ffff888060a37c67 R11: ffffed100c146f8c R12: 7fffffffffffffff
R13: 0000000000000000 R14: ffff888060a37bd8 R15: ffff888060a37c20
FS:  00007fd3cca01540(0000) GS:ffff888108780000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020010820 CR3: 0000000054b92000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 iomap_fiemap+0x1c9/0x2f0
 erofs_fiemap+0x64/0x90 [erofs]
 do_vfs_ioctl+0x40d/0x12e0
 __x64_sys_ioctl+0xaa/0x1c0
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
 </TASK>
---[ end trace 0000000000000000 ]---
watchdog: BUG: soft lockup - CPU#7 stuck for 26s! [iomap:905]

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Guo Xuenan <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
[djwong: fix some typos]
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
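
An illustrative bounds check in plain C follows; the function and variable names are hypothetical, and this is a sketch of the idea rather than the actual VFS patch:

  #include <errno.h>
  #include <stdint.h>

  /* Reject a start offset at or beyond the maximum file size up front, so
   * the requested length is never clamped down to zero and fed back into
   * the iteration loop forever.
   */
  static int fiemap_check_range(uint64_t start, uint64_t *len, uint64_t maxbytes)
  {
          if (start >= maxbytes)
                  return -EFBIG;                  /* start beyond any possible extent */
          if (*len > maxbytes || maxbytes - *len < start)
                  *len = maxbytes - start;        /* clamp instead of zeroing */
          return 0;
  }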
kernel-patches-bot pushed a commit that referenced this pull request Apr 27, 2022
+new file mode 100644
+WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
+#27: FILE: Documentation/virt/kvm/x86/errata.rst:1:

Opportunistically update all other non-added KVM documents and
remove a new extra blank line at EOF for x86/errata.rst.

Signed-off-by: Like Xu <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Apr 27, 2022
Given a sufficiently large number of actions, while copying and
reserving memory for a new action of a new flow, if next_offset is
greater than MAX_ACTIONS_BUFSIZE, the function reserve_sfa_size() does
not return -EMSGSIZE as expected, but it allocates MAX_ACTIONS_BUFSIZE
bytes increasing actions_len by req_size. This can then lead to an OOB
write access, especially when further actions need to be copied.

Fix it by rearranging the flow action size check.

KASAN splat below:

==================================================================
BUG: KASAN: slab-out-of-bounds in reserve_sfa_size+0x1ba/0x380 [openvswitch]
Write of size 65360 at addr ffff888147e4001c by task handler15/836

CPU: 1 PID: 836 Comm: handler15 Not tainted 5.18.0-rc1+ #27
...
Call Trace:
 <TASK>
 dump_stack_lvl+0x45/0x5a
 print_report.cold+0x5e/0x5db
 ? __lock_text_start+0x8/0x8
 ? reserve_sfa_size+0x1ba/0x380 [openvswitch]
 kasan_report+0xb5/0x130
 ? reserve_sfa_size+0x1ba/0x380 [openvswitch]
 kasan_check_range+0xf5/0x1d0
 memcpy+0x39/0x60
 reserve_sfa_size+0x1ba/0x380 [openvswitch]
 __add_action+0x24/0x120 [openvswitch]
 ovs_nla_add_action+0xe/0x20 [openvswitch]
 ovs_ct_copy_action+0x29d/0x1130 [openvswitch]
 ? __kernel_text_address+0xe/0x30
 ? unwind_get_return_address+0x56/0xa0
 ? create_prof_cpu_mask+0x20/0x20
 ? ovs_ct_verify+0xf0/0xf0 [openvswitch]
 ? prep_compound_page+0x198/0x2a0
 ? __kasan_check_byte+0x10/0x40
 ? kasan_unpoison+0x40/0x70
 ? ksize+0x44/0x60
 ? reserve_sfa_size+0x75/0x380 [openvswitch]
 __ovs_nla_copy_actions+0xc26/0x2070 [openvswitch]
 ? __zone_watermark_ok+0x420/0x420
 ? validate_set.constprop.0+0xc90/0xc90 [openvswitch]
 ? __alloc_pages+0x1a9/0x3e0
 ? __alloc_pages_slowpath.constprop.0+0x1da0/0x1da0
 ? unwind_next_frame+0x991/0x1e40
 ? __mod_node_page_state+0x99/0x120
 ? __mod_lruvec_page_state+0x2e3/0x470
 ? __kasan_kmalloc_large+0x90/0xe0
 ovs_nla_copy_actions+0x1b4/0x2c0 [openvswitch]
 ovs_flow_cmd_new+0x3cd/0xb10 [openvswitch]
 ...

Cc: [email protected]
Fixes: f28cd2a ("openvswitch: fix flow actions reallocation")
Signed-off-by: Paolo Valerio <[email protected]>
Acked-by: Eelco Chaudron <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
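
The sketch below illustrates the ordering of the size check that the fix describes; the helper name and the MAX_ACTIONS_BUFSIZE value are assumptions for illustration, not the openvswitch patch itself:

  #include <errno.h>
  #include <stddef.h>

  /* Verify that the grown buffer still fits under the hard cap *before*
   * clamping the allocation size, so an oversized request fails with
   * -EMSGSIZE instead of silently truncating the buffer while the caller
   * keeps writing past its end.
   */
  #define MAX_ACTIONS_BUFSIZE (32 * 1024)         /* value assumed for illustration */

  static int check_actions_growth(size_t next_offset, size_t req_size,
                                  size_t *new_size)
  {
          if (next_offset + req_size > MAX_ACTIONS_BUFSIZE)
                  return -EMSGSIZE;               /* refuse instead of over-filling */
          if (*new_size > MAX_ACTIONS_BUFSIZE)
                  *new_size = MAX_ACTIONS_BUFSIZE; /* clamp only when it still fits */
          return 0;
  }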
kernel-patches-bot pushed a commit that referenced this pull request Mar 7, 2023
Add instruction dump (Code:) output to RISC-V splats. Dump 16b
parcels.

An example:
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
  Oops [#1]
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc3-00302-g840ff44c571d-dirty #27
  Hardware name: riscv-virtio,qemu (DT)
  epc : kernel_init+0xc8/0x10e
   ra : kernel_init+0x70/0x10e
  epc : ffffffff80bd9a40 ra : ffffffff80bd99e8 sp : ff2000000060bec0
   gp : ffffffff81730b28 tp : ff6000007ff00000 t0 : 7974697275636573
   t1 : 0000000000000000 t2 : 3030303270393d6e s0 : ff2000000060bee0
   s1 : ffffffff81732028 a0 : 0000000000000000 a1 : ff60000080dd1780
   a2 : 0000000000000002 a3 : ffffffff8176a470 a4 : 0000000000000000
   a5 : 000000000000000a a6 : 0000000000000081 a7 : ff60000080dd1780
   s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
   s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
   s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
   s11: 0000000000000000 t3 : ffffffff81186018 t4 : 0000000000000022
   t5 : 000000000000003d t6 : 0000000000000000
  status: 0000000200000120 badaddr: 0000000000000000 cause: 000000000000000f
  [<ffffffff80003528>] ret_from_exception+0x0/0x16
  Code: 862a d179 608c a517 0069 0513 2be5 d0ef db2e 47a9 (c11c) a517
  ---[ end trace 0000000000000000 ]---
  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  SMP: stopping secondary CPUs
  ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Signed-off-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
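
A hedged sketch of the idea follows (not the actual arch/riscv patch): walk memory around the faulting epc in 16-bit parcels with a fault-tolerant copy so the dumper itself cannot oops, and mark the parcel at epc with parentheses:

  #include <linux/kernel.h>
  #include <linux/uaccess.h>

  static void dump_code_sketch(unsigned long epc)
  {
          const int parcels_before = 10, parcels_after = 2;
          unsigned long addr;

          pr_cont("Code:");
          for (addr = epc - parcels_before * 2;
               addr <= epc + parcels_after * 2; addr += 2) {
                  u16 parcel;

                  if (copy_from_kernel_nofault(&parcel, (void *)addr,
                                               sizeof(parcel))) {
                          pr_cont(" ????");
                          continue;
                  }
                  if (addr == epc)
                          pr_cont(" (%04x)", parcel);
                  else
                          pr_cont(" %04x", parcel);
          }
          pr_cont("\n");
  }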
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request May 30, 2023
The cited commit adds a completion to remove the dependency on the rtnl
lock. But it causes a deadlock for multiple encapsulations:

 crash> bt ffff8aece8a64000
 PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
 #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
 #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
 #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
 #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
 #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
 #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
 #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
 crash> bt 0xffff8aeb07544000
 PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
 #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, flow will be added to encap
entry's flows list. If neigh update is running at this time, the
following encaps of the flow can't hold the encap_tbl_lock and
sleep. If neigh update thread is waiting for that flow's init_done,
deadlock happens.

Fix it by holding lock outside of the for loop. If neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove unnecessary wait_for_completion call for res_ready.

Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Sep 21, 2023
The following processes run into a deadlock. CPU 41 was waiting for CPU 29
to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29
was hung by that spinlock with IRQs disabled.

  PID: 17360    TASK: ffff95c1090c5c40  CPU: 41  COMMAND: "mrdiagd"
  !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0
  !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0
  !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0
   # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0
   # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0
   # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0
   # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0
   # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0
   # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0
   # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0
   #10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0
   #11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0
   #12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0
   #13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0
   #14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0
   #15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0
   #16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0
   #17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0
   #18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0
   #19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
   #20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
   #21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
   #22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
   #23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
   #24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
   #25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
   #26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
   #27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0

  PID: 17355    TASK: ffff95c1090c3d80  CPU: 29  COMMAND: "mrdiagd"
  !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0
  !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0
   # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0
   # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0
   # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0
   # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0
   # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0
   # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0
   # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0
   # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
   #10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
   #11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
   #12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
   #13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
   #14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
   #15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
   #16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
   #17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0

The lock is only used to synchronize different sysfs operations; it doesn't
protect any resource that is touched from interrupt context, so there is no
need to disable IRQs while holding it. Replace the spinlock with a mutex to
fix the deadlock.
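
A minimal sketch of that conversion, assuming made-up names (the real change
is to the megaraid_sas instance lock used by the sysfs store handlers shown
in the traces above; this is not the actual diff):

  /* Sketch only: 'crash_lock' and crash_state_store() stand in for the
   * driver's real members. */
  #include <linux/mutex.h>
  #include <linux/types.h>

  struct my_instance {
          struct mutex crash_lock;        /* was: spinlock_t taken with irqsave */
  };

  static ssize_t crash_state_store(struct my_instance *inst,
                                   const char *buf, size_t count)
  {
          /* sysfs stores run in process context and nothing here is touched
           * from IRQ context, so a sleeping lock is fine even though the
           * critical section may block, e.g. inside vfree(). */
          mutex_lock(&inst->crash_lock);
          /* ... free crash buffers, update driver state ... */
          mutex_unlock(&inst->crash_lock);
          return count;
  }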

Signed-off-by: Junxiao Bi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Christie <[email protected]>
Cc: [email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Mar 13, 2024
It appears the client object tree has no locking unless I've missed
something else. Fix races around adding/removing client objects,
mostly vram bar mappings.
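
A hedged sketch of the general shape of such a fix, not the actual nouveau
diff (the obj_lock and objects members below are invented; only
nvkm_object_search() comes from the trace): serialize add, remove and lookup
of client objects under one lock so a concurrent lookup never walks a
half-removed entry:

  /* illustrative only */
  spin_lock(&client->obj_lock);
  list_add(&object->head, &client->objects);           /* add/remove under the lock */
  spin_unlock(&client->obj_lock);

  spin_lock(&client->obj_lock);
  object = nvkm_object_search(client, handle, func);   /* lookup under the lock */
  spin_unlock(&client->obj_lock);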

[ 4562.099306] general protection fault, probably for non-canonical address 0x6677ed422bceb80c: 0000 [#1] PREEMPT SMP PTI
[ 4562.099314] CPU: 2 PID: 23171 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
[ 4562.099324] Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
[ 4562.099330] RIP: 0010:nvkm_object_search+0x1d/0x70 [nouveau]
[ 4562.099503] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 48 89 f8 48 85 f6 74 39 48 8b 87 a0 00 00 00 48 85 c0 74 12 <48> 8b 48 f8 48 39 ce 73 15 48 8b 40 10 48 85 c0 75 ee 48 c7 c0 fe
[ 4562.099506] RSP: 0000:ffffa94cc420bbf8 EFLAGS: 00010206
[ 4562.099512] RAX: 6677ed422bceb814 RBX: ffff98108791f400 RCX: ffff9810f26b8f58
[ 4562.099517] RDX: 0000000000000000 RSI: ffff9810f26b9158 RDI: ffff98108791f400
[ 4562.099519] RBP: ffff9810f26b9158 R08: 0000000000000000 R09: 0000000000000000
[ 4562.099521] R10: ffffa94cc420bc48 R11: 0000000000000001 R12: ffff9810f02a7cc0
[ 4562.099526] R13: 0000000000000000 R14: 00000000000000ff R15: 0000000000000007
[ 4562.099528] FS:  00007f629c5017c0(0000) GS:ffff98142c700000(0000) knlGS:0000000000000000
[ 4562.099534] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4562.099536] CR2: 00007f629a882000 CR3: 000000017019e004 CR4: 00000000003706f0
[ 4562.099541] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4562.099542] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4562.099544] Call Trace:
[ 4562.099555]  <TASK>
[ 4562.099573]  ? die_addr+0x36/0x90
[ 4562.099583]  ? exc_general_protection+0x246/0x4a0
[ 4562.099593]  ? asm_exc_general_protection+0x26/0x30
[ 4562.099600]  ? nvkm_object_search+0x1d/0x70 [nouveau]
[ 4562.099730]  nvkm_ioctl+0xa1/0x250 [nouveau]
[ 4562.099861]  nvif_object_map_handle+0xc8/0x180 [nouveau]
[ 4562.099986]  nouveau_ttm_io_mem_reserve+0x122/0x270 [nouveau]
[ 4562.100156]  ? dma_resv_test_signaled+0x26/0xb0
[ 4562.100163]  ttm_bo_vm_fault_reserved+0x97/0x3c0 [ttm]
[ 4562.100182]  ? __mutex_unlock_slowpath+0x2a/0x270
[ 4562.100189]  nouveau_ttm_fault+0x69/0xb0 [nouveau]
[ 4562.100356]  __do_fault+0x32/0x150
[ 4562.100362]  do_fault+0x7c/0x560
[ 4562.100369]  __handle_mm_fault+0x800/0xc10
[ 4562.100382]  handle_mm_fault+0x17c/0x3e0
[ 4562.100388]  do_user_addr_fault+0x208/0x860
[ 4562.100395]  exc_page_fault+0x7f/0x200
[ 4562.100402]  asm_exc_page_fault+0x26/0x30
[ 4562.100412] RIP: 0033:0x9b9870
[ 4562.100419] Code: 85 a8 f7 ff ff 8b 8d 80 f7 ff ff 89 08 e9 18 f2 ff ff 0f 1f 84 00 00 00 00 00 44 89 32 e9 90 fa ff ff 0f 1f 84 00 00 00 00 00 <44> 89 32 e9 f8 f1 ff ff 0f 1f 84 00 00 00 00 00 66 44 89 32 e9 e7
[ 4562.100422] RSP: 002b:00007fff9ba2dc70 EFLAGS: 00010246
[ 4562.100426] RAX: 0000000000000004 RBX: 000000000dd65e10 RCX: 000000fff0000000
[ 4562.100428] RDX: 00007f629a882000 RSI: 00007f629a882000 RDI: 0000000000000066
[ 4562.100432] RBP: 00007fff9ba2e570 R08: 0000000000000000 R09: 0000000123ddf000
[ 4562.100434] R10: 0000000000000001 R11: 0000000000000246 R12: 000000007fffffff
[ 4562.100436] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 4562.100446]  </TASK>
[ 4562.100448] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink cmac bnep sunrpc iwlmvm intel_rapl_msr intel_rapl_common snd_sof_pci_intel_cnl x86_pkg_temp_thermal intel_powerclamp snd_sof_intel_hda_common mac80211 coretemp snd_soc_acpi_intel_match kvm_intel snd_soc_acpi snd_soc_hdac_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof kvm snd_sof_utils snd_soc_core snd_hda_codec_realtek libarc4 snd_hda_codec_generic snd_compress snd_hda_ext_core vfat fat snd_hda_intel snd_intel_dspcfg irqbypass iwlwifi snd_hda_codec snd_hwdep snd_hda_core btusb btrtl mei_hdcp iTCO_wdt rapl mei_pxp btintel snd_seq iTCO_vendor_support btbcm snd_seq_device intel_cstate bluetooth snd_pcm cfg80211 intel_wmi_thunderbolt wmi_bmof intel_uncore snd_timer mei_me snd ecdh_generic i2c_i801
[ 4562.100541]  ecc mei i2c_smbus soundcore rfkill intel_pch_thermal acpi_pad zram nouveau drm_ttm_helper ttm gpu_sched i2c_algo_bit drm_gpuvm drm_exec mxm_wmi drm_display_helper drm_kms_helper drm crct10dif_pclmul crc32_pclmul nvme e1000e crc32c_intel nvme_core ghash_clmulni_intel video wmi pinctrl_cannonlake ip6_tables ip_tables fuse
[ 4562.100616] ---[ end trace 0000000000000000 ]---

Signed-off-by: Dave Airlie <[email protected]>
Cc: [email protected]
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Apr 26, 2024
Running a lot of VK CTS in parallel against nouveau, once every
few hours you might see something like this crash.

BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
RSP: 0000:ffffac20c5857838 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001
RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180
RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10
R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c
R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c
FS:  00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:

...

 ? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
 ? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
 nvkm_vmm_iter+0x351/0xa20 [nouveau]
 ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 ? __lock_acquire+0x3ed/0x2170
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
 ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]

Adding any sort of useful debugging usually makes it go away, so I hand
wrote the function inline and debugged the asm.

Every so often pt->memory->ptrs is NULL. This ptrs pointer is set in
nv50_instobj_acquire(), called from nvkm_kmap().

If Thread A and Thread B both get to nv50_instobj_acquire around
the same time, Thread A hits the refcount_set line, and in
lockstep Thread B succeeds at refcount_inc_not_zero, there is a
chance the ptrs value won't have been stored yet, since refcount_set
is unordered. Force a memory barrier here; I picked smp_mb, since
we want it on all CPUs and it's a write followed by a read.

v2: use paired smp_rmb/smp_wmb.
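
A minimal sketch of the paired-barrier version, using simplified names
(obj->ptrs and obj->refs mirror the description above, not the exact nvkm
structures):

  /* Writer (first acquirer): publish the payload before the refcount can
   * be observed as non-zero; refcount_set() itself is unordered. */
  obj->ptrs = map;
  smp_wmb();                      /* order the ptrs store before the refcount store */
  refcount_set(&obj->refs, 1);

  /* Reader (concurrent acquirer): pair with the writer's barrier before
   * dereferencing the payload. */
  if (refcount_inc_not_zero(&obj->refs)) {
          smp_rmb();              /* order the refcount read before the ptrs read */
          use(obj->ptrs);
  }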

Cc: <[email protected]>
Fixes: be55287 ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj")
Signed-off-by: Dave Airlie <[email protected]>
Signed-off-by: Danilo Krummrich <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request May 23, 2024
When request_irq() fails, the error path calls vp_del_vqs(). There, because
the vq is already present in the list, free_irq() is called for the same
vector. That causes the following splat:

[    0.414355] Trying to free already-free IRQ 27
[    0.414403] WARNING: CPU: 1 PID: 1 at kernel/irq/manage.c:1899 free_irq+0x1a1/0x2d0
[    0.414510] Modules linked in:
[    0.414540] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc4+ #27
[    0.414540] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
[    0.414540] RIP: 0010:free_irq+0x1a1/0x2d0
[    0.414540] Code: 1e 00 48 83 c4 08 48 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 90 8b 74 24 04 48 c7 c7 98 80 6c b1 e8 00 c9 f7 ff 90 <0f> 0b 90 90 48 89 ee 4c 89 ef e8 e0 20 b8 00 49 8b 47 40 48 8b 40
[    0.414540] RSP: 0000:ffffb71480013ae0 EFLAGS: 00010086
[    0.414540] RAX: 0000000000000000 RBX: ffffa099c2722000 RCX: 0000000000000000
[    0.414540] RDX: 0000000000000000 RSI: ffffb71480013998 RDI: 0000000000000001
[    0.414540] RBP: 0000000000000246 R08: 00000000ffffdfff R09: 0000000000000001
[    0.414540] R10: 00000000ffffdfff R11: ffffffffb18729c0 R12: ffffa099c1c91760
[    0.414540] R13: ffffa099c1c916a4 R14: ffffa099c1d2f200 R15: ffffa099c1c91600
[    0.414540] FS:  0000000000000000(0000) GS:ffffa099fec40000(0000) knlGS:0000000000000000
[    0.414540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.414540] CR2: 0000000000000000 CR3: 0000000008e3e001 CR4: 0000000000370ef0
[    0.414540] Call Trace:
[    0.414540]  <TASK>
[    0.414540]  ? __warn+0x80/0x120
[    0.414540]  ? free_irq+0x1a1/0x2d0
[    0.414540]  ? report_bug+0x164/0x190
[    0.414540]  ? handle_bug+0x3b/0x70
[    0.414540]  ? exc_invalid_op+0x17/0x70
[    0.414540]  ? asm_exc_invalid_op+0x1a/0x20
[    0.414540]  ? free_irq+0x1a1/0x2d0
[    0.414540]  vp_del_vqs+0xc1/0x220
[    0.414540]  vp_find_vqs_msix+0x305/0x470
[    0.414540]  vp_find_vqs+0x3e/0x1a0
[    0.414540]  vp_modern_find_vqs+0x1b/0x70
[    0.414540]  init_vqs+0x387/0x600
[    0.414540]  virtnet_probe+0x50a/0xc80
[    0.414540]  virtio_dev_probe+0x1e0/0x2b0
[    0.414540]  really_probe+0xc0/0x2c0
[    0.414540]  ? __pfx___driver_attach+0x10/0x10
[    0.414540]  __driver_probe_device+0x73/0x120
[    0.414540]  driver_probe_device+0x1f/0xe0
[    0.414540]  __driver_attach+0x88/0x180
[    0.414540]  bus_for_each_dev+0x85/0xd0
[    0.414540]  bus_add_driver+0xec/0x1f0
[    0.414540]  driver_register+0x59/0x100
[    0.414540]  ? __pfx_virtio_net_driver_init+0x10/0x10
[    0.414540]  virtio_net_driver_init+0x90/0xb0
[    0.414540]  do_one_initcall+0x58/0x230
[    0.414540]  kernel_init_freeable+0x1a3/0x2d0
[    0.414540]  ? __pfx_kernel_init+0x10/0x10
[    0.414540]  kernel_init+0x1a/0x1c0
[    0.414540]  ret_from_fork+0x31/0x50
[    0.414540]  ? __pfx_kernel_init+0x10/0x10
[    0.414540]  ret_from_fork_asm+0x1a/0x30
[    0.414540]  </TASK>

Fix this by deleting the current vq when request_irq() fails.
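
A hedged sketch of the shape of the fix (vp_del_vq() here is assumed to be
the single-vq counterpart of the vp_del_vqs() seen in the trace; this is not
the exact virtio_pci diff):

  err = request_irq(pci_irq_vector(pci_dev, msix_vec), handler, 0,
                    name, vq);
  if (err) {
          vp_del_vq(vq);          /* drop the half-initialized vq first ... */
          goto error;             /* ... so the common error path never calls
                                   * free_irq() for a vector that was never
                                   * successfully requested */
  }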

Fixes: 0b0f9dc ("Revert "virtio_pci: use shared interrupts for virtqueues"")
Signed-off-by: Jiri Pirko <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jul 26, 2024
When a packet is redirected through a cpu_map, the pointer to the
bpf_cpu_map_entry is first copied, the entry can then be freed, and the
stale pointer is later read from the copy. To fix this, introduce the
refcount cpu_map_parent, held across redirections, to prevent the use
after free.
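
A minimal sketch of the refcounting idea around the cpu_map_parent field
named above (helper names are invented; this is illustrative, not the actual
bpf/cpumap.c change):

  /* redirect path: pin the entry before its pointer is stashed */
  if (!refcount_inc_not_zero(&rcpu->cpu_map_parent))
          return -ENOENT;         /* entry is already being torn down */

  /* ... the flush path later dereferences the stashed rcpu pointer ... */

  /* once the enqueue is done, drop the pin; teardown may free the entry
   * only after the last reference is gone */
  if (refcount_dec_and_test(&rcpu->cpu_map_parent))
          cpu_map_entry_release(rcpu);    /* hypothetical release helper */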

syzbot reported:

[   61.581464][T11670] ==================================================================
[   61.583323][T11670] BUG: KASAN: slab-use-after-free in cpu_map_enqueue+0xba/0x370
[   61.585419][T11670] Read of size 8 at addr ffff888122d75208 by task syzbot-repro/11670
[   61.587541][T11670]
[   61.588237][T11670] CPU: 1 PID: 11670 Comm: syzbot-repro Not tainted 6.9.0-rc6-00053-g0106679839f7 #27
[   61.590542][T11670] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.1 11/11/2019
[   61.592798][T11670] Call Trace:
[   61.593885][T11670]  <TASK>
[   61.594805][T11670]  dump_stack_lvl+0x241/0x360
[   61.595974][T11670]  ? tcp_gro_dev_warn+0x260/0x260
[   61.598242][T11670]  ? __wake_up_klogd+0xcc/0x100
[   61.599407][T11670]  ? panic+0x850/0x850
[   61.600516][T11670]  ? __virt_addr_valid+0x182/0x510
[   61.602073][T11670]  ? __virt_addr_valid+0x182/0x510
[   61.603496][T11670]  print_address_description+0x7b/0x360
[   61.605170][T11670]  print_report+0xfd/0x210
[   61.606370][T11670]  ? __virt_addr_valid+0x182/0x510
[   61.607925][T11670]  ? __virt_addr_valid+0x182/0x510
[   61.609577][T11670]  ? __virt_addr_valid+0x43d/0x510
[   61.610948][T11670]  ? __phys_addr+0xb9/0x170
[   61.612103][T11670]  ? cpu_map_enqueue+0xba/0x370
[   61.613448][T11670]  kasan_report+0x143/0x180
[   61.615000][T11670]  ? cpu_map_enqueue+0xba/0x370
[   61.616181][T11670]  cpu_map_enqueue+0xba/0x370
[   61.617620][T11670]  xdp_do_redirect+0x685/0xbf0
[   61.618787][T11670]  tun_xdp_act+0xe7/0x9e0
[   61.619856][T11670]  ? __tun_build_skb+0x2e0/0x2e0
[   61.621356][T11670]  tun_build_skb+0xac6/0x1140
[   61.622602][T11670]  ? tun_build_skb+0xb4/0x1140
[   61.623880][T11670]  ? tun_get_user+0x2760/0x2760
[   61.625341][T11670]  tun_get_user+0x7fa/0x2760
[   61.626532][T11670]  ? rcu_read_unlock+0xa0/0xa0
[   61.627725][T11670]  ? tun_get+0x1e/0x2f0
[   61.629147][T11670]  ? tun_get+0x1e/0x2f0
[   61.630265][T11670]  ? tun_get+0x27d/0x2f0
[   61.631486][T11670]  tun_chr_write_iter+0x111/0x1f0
[   61.632855][T11670]  vfs_write+0xa84/0xcb0
[   61.634185][T11670]  ? __lock_acquire+0x1f60/0x1f60
[   61.635501][T11670]  ? kernel_write+0x330/0x330
[   61.636757][T11670]  ? lockdep_hardirqs_on_prepare+0x43c/0x780
[   61.638445][T11670]  ? __fget_files+0x3ea/0x460
[   61.639448][T11670]  ? seqcount_lockdep_reader_access+0x157/0x220
[   61.641217][T11670]  ? __fdget_pos+0x19e/0x320
[   61.642426][T11670]  ksys_write+0x19f/0x2c0
[   61.643576][T11670]  ? __ia32_sys_read+0x90/0x90
[   61.644841][T11670]  ? ktime_get_coarse_real_ts64+0x10b/0x120
[   61.646549][T11670]  do_syscall_64+0xec/0x210
[   61.647832][T11670]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
[   61.649485][T11670] RIP: 0033:0x472a4f
[   61.650539][T11670] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 c9 d8 02 00 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 0c d9 02 00 48
[   61.655476][T11670] RSP: 002b:00007f7a7a90f5c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[   61.657675][T11670] RAX: ffffffffffffffda RBX: 00007f7a7a911640 RCX: 0000000000472a4f
[   61.659658][T11670] RDX: 0000000000000066 RSI: 0000000020000440 RDI: 00000000000000c8
[   61.661980][T11670] RBP: 00007f7a7a90f620 R08: 0000000000000000 R09: 0000000100000000
[   61.663982][T11670] R10: 0000000100000000 R11: 0000000000000293 R12: 00007f7a7a911640
[   61.666425][T11670] R13: 000000000000006e R14: 000000000042f2f0 R15: 00007f7a7a8f1000
[   61.668443][T11670]  </TASK>
[   61.669233][T11670]
[   61.669754][T11670] Allocated by task 11643:
[   61.670855][T11670]  kasan_save_track+0x3f/0x70
[   61.672094][T11670]  __kasan_kmalloc+0x98/0xb0
[   61.673466][T11670]  __kmalloc_node+0x259/0x4f0
[   61.674687][T11670]  bpf_map_kmalloc_node+0xd3/0x1c0
[   61.676069][T11670]  cpu_map_update_elem+0x2f0/0x1000
[   61.677619][T11670]  bpf_map_update_value+0x1b2/0x540
[   61.679006][T11670]  map_update_elem+0x52f/0x6e0
[   61.680076][T11670]  __sys_bpf+0x7a9/0x850
[   61.681610][T11670]  __x64_sys_bpf+0x7c/0x90
[   61.682772][T11670]  do_syscall_64+0xec/0x210
[   61.683967][T11670]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
[   61.685648][T11670]
[   61.686282][T11670] Freed by task 1064:
[   61.687296][T11670]  kasan_save_track+0x3f/0x70
[   61.688498][T11670]  kasan_save_free_info+0x40/0x50
[   61.689786][T11670]  poison_slab_object+0xa6/0xe0
[   61.691059][T11670]  __kasan_slab_free+0x37/0x60
[   61.692336][T11670]  kfree+0x136/0x2f0
[   61.693549][T11670]  __cpu_map_entry_free+0x6f3/0x770
[   61.695004][T11670]  cpu_map_free+0xc0/0x180
[   61.696191][T11670]  bpf_map_free_deferred+0xe3/0x100
[   61.697703][T11670]  process_scheduled_works+0x9cb/0x14a0
[   61.699330][T11670]  worker_thread+0x85c/0xd50
[   61.700546][T11670]  kthread+0x2ef/0x390
[   61.701791][T11670]  ret_from_fork+0x4d/0x80
[   61.702942][T11670]  ret_from_fork_asm+0x11/0x20
[   61.704195][T11670]
[   61.704825][T11670] The buggy address belongs to the object at ffff888122d75200
[   61.704825][T11670]  which belongs to the cache kmalloc-cg-256 of size 256
[   61.708516][T11670] The buggy address is located 8 bytes inside of
[   61.708516][T11670]  freed 256-byte region [ffff888122d75200, ffff888122d75300)
[   61.712215][T11670]
[   61.712824][T11670] The buggy address belongs to the physical page:
[   61.714883][T11670] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x122d74
[   61.717300][T11670] head: order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[   61.719037][T11670] memcg:ffff888120d85f01
[   61.720006][T11670] flags: 0x17ff00000000840(slab|head|node=0|zone=2|lastcpupid=0x7ff)
[   61.722181][T11670] page_type: 0xffffffff()
[   61.723318][T11670] raw: 017ff00000000840 ffff88810004dcc0 dead000000000122 0000000000000000
[   61.725650][T11670] raw: 0000000000000000 0000000080100010 00000001ffffffff ffff888120d85f01
[   61.727943][T11670] head: 017ff00000000840 ffff88810004dcc0 dead000000000122 0000000000000000
[   61.730237][T11670] head: 0000000000000000 0000000080100010 00000001ffffffff ffff888120d85f01
[   61.732671][T11670] head: 017ff00000000001 ffffea00048b5d01 dead000000000122 00000000ffffffff
[   61.735029][T11670] head: 0000000200000000 0000000000000000 00000000ffffffff 0000000000000000
[   61.737400][T11670] page dumped because: kasan: bad access detected
[   61.740100][T11670] page_owner tracks the page as allocated
[   61.743121][T11670] page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 8343, tgid -2092279795 (syzbot-repro), ts 8343, free_ts 43505720198
[   61.754038][T11670]  post_alloc_hook+0x1e6/0x210
[   61.756046][T11670]  get_page_from_freelist+0x7d2/0x850
[   61.759460][T11670]  __alloc_pages+0x25e/0x580
[   61.761428][T11670]  alloc_slab_page+0x6b/0x1a0
[   61.764199][T11670]  allocate_slab+0x5d/0x200
[   61.766122][T11670]  ___slab_alloc+0xac5/0xf20
[   61.767195][T11670]  __kmalloc+0x2e0/0x4b0
[   61.769028][T11670]  fib_default_rule_add+0x4a/0x350
[   61.770394][T11670]  fib6_rules_net_init+0x42/0x100
[   61.771731][T11670]  ops_init+0x39d/0x670
[   61.773061][T11670]  setup_net+0x3bc/0xae0
[   61.774102][T11670]  copy_net_ns+0x399/0x5e0
[   61.775628][T11670]  create_new_namespaces+0x4de/0x8d0
[   61.776950][T11670]  unshare_nsproxy_namespaces+0x127/0x190
[   61.778352][T11670]  ksys_unshare+0x5e6/0xbf0
[   61.779741][T11670]  __x64_sys_unshare+0x38/0x40
[   61.781302][T11670] page last free pid 4619 tgid 4619 stack trace:
[   61.783542][T11670]  free_unref_page_prepare+0x72f/0x7c0
[   61.785018][T11670]  free_unref_page+0x37/0x3f0
[   61.786030][T11670]  __slab_free+0x351/0x3f0
[   61.786991][T11670]  qlist_free_all+0x60/0xd0
[   61.788827][T11670]  kasan_quarantine_reduce+0x15a/0x170
[   61.789951][T11670]  __kasan_slab_alloc+0x23/0x70
[   61.790999][T11670]  kmem_cache_alloc_node+0x193/0x390
[   61.792331][T11670]  kmalloc_reserve+0xa7/0x2a0
[   61.793345][T11670]  __alloc_skb+0x1ec/0x430
[   61.794435][T11670]  netlink_sendmsg+0x615/0xc80
[   61.796439][T11670]  __sock_sendmsg+0x21f/0x270
[   61.797467][T11670]  ____sys_sendmsg+0x540/0x860
[   61.798505][T11670]  __sys_sendmsg+0x2b7/0x3a0
[   61.799512][T11670]  do_syscall_64+0xec/0x210
[   61.800674][T11670]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
[   61.802021][T11670]
[   61.802526][T11670] Memory state around the buggy address:
[   61.803701][T11670]  ffff888122d75100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   61.805694][T11670]  ffff888122d75180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   61.808104][T11670] >ffff888122d75200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   61.809769][T11670]                       ^
[   61.810672][T11670]  ffff888122d75280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   61.812532][T11670]  ffff888122d75300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   61.814846][T11670] ==================================================================
[   61.816914][T11670] Kernel panic - not syncing: KASAN: panic_on_warn set ...
[   61.818415][T11670] CPU: 1 PID: 11670 Comm: syzbot-repro Not tainted 6.9.0-rc6-00053-g0106679839f7 #27
[   61.821191][T11670] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.1 11/11/2019
[   61.822911][T11670] Call Trace:
[   61.823632][T11670]  <TASK>
[   61.824525][T11670]  dump_stack_lvl+0x241/0x360
[   61.825545][T11670]  ? tcp_gro_dev_warn+0x260/0x260
[   61.826706][T11670]  ? panic+0x850/0x850
[   61.828594][T11670]  ? lock_release+0x85/0x860
[   61.829749][T11670]  ? vscnprintf+0x5d/0x80
[   61.830951][T11670]  panic+0x335/0x850
[   61.832316][T11670]  ? check_panic_on_warn+0x21/0xa0
[   61.834475][T11670]  ? __memcpy_flushcache+0x2c0/0x2c0
[   61.835809][T11670]  ? _raw_spin_unlock_irqrestore+0xd8/0x140
[   61.838063][T11670]  ? _raw_spin_unlock_irqrestore+0xdd/0x140
[   61.842056][T11670]  ? _raw_spin_unlock+0x40/0x40
[   61.843116][T11670]  ? print_report+0x1cc/0x210
[   61.844527][T11670]  check_panic_on_warn+0x82/0xa0
[   61.845336][T11670]  ? cpu_map_enqueue+0xba/0x370
[   61.846117][T11670]  end_report+0x48/0xa0
[   61.846790][T11670]  kasan_report+0x154/0x180
[   61.847520][T11670]  ? cpu_map_enqueue+0xba/0x370
[   61.848471][T11670]  cpu_map_enqueue+0xba/0x370
[   61.849968][T11670]  xdp_do_redirect+0x685/0xbf0
[   61.850994][T11670]  tun_xdp_act+0xe7/0x9e0
[   61.851703][T11670]  ? __tun_build_skb+0x2e0/0x2e0
[   61.852598][T11670]  tun_build_skb+0xac6/0x1140
[   61.853362][T11670]  ? tun_build_skb+0xb4/0x1140
[   61.854454][T11670]  ? tun_get_user+0x2760/0x2760
[   61.855806][T11670]  tun_get_user+0x7fa/0x2760
[   61.856734][T11670]  ? rcu_read_unlock+0xa0/0xa0
[   61.857502][T11670]  ? tun_get+0x1e/0x2f0
[   61.858171][T11670]  ? tun_get+0x1e/0x2f0
[   61.858952][T11670]  ? tun_get+0x27d/0x2f0
[   61.859637][T11670]  tun_chr_write_iter+0x111/0x1f0
[   61.860913][T11670]  vfs_write+0xa84/0xcb0
[   61.861578][T11670]  ? __lock_acquire+0x1f60/0x1f60
[   61.862376][T11670]  ? kernel_write+0x330/0x330
[   61.863221][T11670]  ? lockdep_hardirqs_on_prepare+0x43c/0x780
[   61.864230][T11670]  ? __fget_files+0x3ea/0x460
[   61.864955][T11670]  ? seqcount_lockdep_reader_access+0x157/0x220
[   61.866571][T11670]  ? __fdget_pos+0x19e/0x320
[   61.867414][T11670]  ksys_write+0x19f/0x2c0
[   61.868263][T11670]  ? __ia32_sys_read+0x90/0x90
[   61.868996][T11670]  ? ktime_get_coarse_real_ts64+0x10b/0x120
[   61.869896][T11670]  do_syscall_64+0xec/0x210
[   61.870592][T11670]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
[   61.871595][T11670] RIP: 0033:0x472a4f
[   61.873158][T11670] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 c9 d8 02 00 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 0c d9 02 00 48
[   61.876447][T11670] RSP: 002b:00007f7a7a90f5c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[   61.877944][T11670] RAX: ffffffffffffffda RBX: 00007f7a7a911640 RCX: 0000000000472a4f
[   61.879751][T11670] RDX: 0000000000000066 RSI: 0000000020000440 RDI: 00000000000000c8
[   61.881100][T11670] RBP: 00007f7a7a90f620 R08: 0000000000000000 R09: 0000000100000000
[   61.882298][T11670] R10: 0000000100000000 R11: 0000000000000293 R12: 00007f7a7a911640
[   61.883501][T11670] R13: 000000000000006e R14: 000000000042f2f0 R15: 00007f7a7a8f1000
[   61.885999][T11670]  </TASK>

Signed-off-by: Radoslaw Zielonek <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jul 29, 2024
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jul 29, 2024
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jul 30, 2024
When cpu_map has been redirected, first the pointer to the
bpf_cpu_map_entry has been copied, then freed, and read from the copy.
To fix it, this commit introduced the refcount cpu_map_parent during
redirections to prevent use after free.

syzbot reported:

[   61.581464][T11670] ==================================================================
[   61.583323][T11670] BUG: KASAN: slab-use-after-free in cpu_map_enqueue+0xba/0x370
[   61.585419][T11670] Read of size 8 at addr ffff888122d75208 by task syzbot-repro/11670
[   61.587541][T11670]
[   61.588237][T11670] CPU: 1 PID: 11670 Comm: syzbot-repro Not tainted 6.9.0-rc6-00053-g0106679839f7 #27
[   61.590542][T11670] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.1 11/11/2019
[   61.592798][T11670] Call Trace:
[   61.593885][T11670]  <TASK>
[   61.594805][T11670]  dump_stack_lvl+0x241/0x360
[   61.595974][T11670]  ? tcp_gro_dev_warn+0x260/0x260
[   61.598242][T11670]  ? __wake_up_klogd+0xcc/0x100
[   61.599407][T11670]  ? panic+0x850/0x850
[   61.600516][T11670]  ? __virt_addr_valid+0x182/0x510
[   61.602073][T11670]  ? __virt_addr_valid+0x182/0x510
[   61.603496][T11670]  print_address_description+0x7b/0x360
[   61.605170][T11670]  print_report+0xfd/0x210
[   61.606370][T11670]  ? __virt_addr_valid+0x182/0x510
[   61.607925][T11670]  ? __virt_addr_valid+0x182/0x510
[   61.609577][T11670]  ? __virt_addr_valid+0x43d/0x510
[   61.610948][T11670]  ? __phys_addr+0xb9/0x170
[   61.612103][T11670]  ? cpu_map_enqueue+0xba/0x370
[   61.613448][T11670]  kasan_report+0x143/0x180
[   61.615000][T11670]  ? cpu_map_enqueue+0xba/0x370
[   61.616181][T11670]  cpu_map_enqueue+0xba/0x370
[   61.617620][T11670]  xdp_do_redirect+0x685/0xbf0
[   61.618787][T11670]  tun_xdp_act+0xe7/0x9e0
[   61.619856][T11670]  ? __tun_build_skb+0x2e0/0x2e0
[   61.621356][T11670]  tun_build_skb+0xac6/0x1140
[   61.622602][T11670]  ? tun_build_skb+0xb4/0x1140
[   61.623880][T11670]  ? tun_get_user+0x2760/0x2760
[   61.625341][T11670]  tun_get_user+0x7fa/0x2760
[   61.626532][T11670]  ? rcu_read_unlock+0xa0/0xa0
[   61.627725][T11670]  ? tun_get+0x1e/0x2f0
[   61.629147][T11670]  ? tun_get+0x1e/0x2f0
[   61.630265][T11670]  ? tun_get+0x27d/0x2f0
[   61.631486][T11670]  tun_chr_write_iter+0x111/0x1f0
[   61.632855][T11670]  vfs_write+0xa84/0xcb0
[   61.634185][T11670]  ? __lock_acquire+0x1f60/0x1f60
[   61.635501][T11670]  ? kernel_write+0x330/0x330
[   61.636757][T11670]  ? lockdep_hardirqs_on_prepare+0x43c/0x780
[   61.638445][T11670]  ? __fget_files+0x3ea/0x460
[   61.639448][T11670]  ? seqcount_lockdep_reader_access+0x157/0x220
[   61.641217][T11670]  ? __fdget_pos+0x19e/0x320
[   61.642426][T11670]  ksys_write+0x19f/0x2c0
[   61.643576][T11670]  ? __ia32_sys_read+0x90/0x90
[   61.644841][T11670]  ? ktime_get_coarse_real_ts64+0x10b/0x120
[   61.646549][T11670]  do_syscall_64+0xec/0x210
[   61.647832][T11670]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
[   61.649485][T11670] RIP: 0033:0x472a4f
[   61.650539][T11670] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 c9 d8 02 00 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 0c d9 02 00 48
[   61.655476][T11670] RSP: 002b:00007f7a7a90f5c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[   61.657675][T11670] RAX: ffffffffffffffda RBX: 00007f7a7a911640 RCX: 0000000000472a4f
[   61.659658][T11670] RDX: 0000000000000066 RSI: 0000000020000440 RDI: 00000000000000c8
[   61.661980][T11670] RBP: 00007f7a7a90f620 R08: 0000000000000000 R09: 0000000100000000
[   61.663982][T11670] R10: 0000000100000000 R11: 0000000000000293 R12: 00007f7a7a911640
[   61.666425][T11670] R13: 000000000000006e R14: 000000000042f2f0 R15: 00007f7a7a8f1000
[   61.668443][T11670]  </TASK>
[   61.669233][T11670]
[   61.669754][T11670] Allocated by task 11643:
[   61.670855][T11670]  kasan_save_track+0x3f/0x70
[   61.672094][T11670]  __kasan_kmalloc+0x98/0xb0
[   61.673466][T11670]  __kmalloc_node+0x259/0x4f0
[   61.674687][T11670]  bpf_map_kmalloc_node+0xd3/0x1c0
[   61.676069][T11670]  cpu_map_update_elem+0x2f0/0x1000
[   61.677619][T11670]  bpf_map_update_value+0x1b2/0x540
[   61.679006][T11670]  map_update_elem+0x52f/0x6e0
[   61.680076][T11670]  __sys_bpf+0x7a9/0x850
[   61.681610][T11670]  __x64_sys_bpf+0x7c/0x90
[   61.682772][T11670]  do_syscall_64+0xec/0x210
[   61.683967][T11670]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
[   61.685648][T11670]
[   61.686282][T11670] Freed by task 1064:
[   61.687296][T11670]  kasan_save_track+0x3f/0x70
[   61.688498][T11670]  kasan_save_free_info+0x40/0x50
[   61.689786][T11670]  poison_slab_object+0xa6/0xe0
[   61.691059][T11670]  __kasan_slab_free+0x37/0x60
[   61.692336][T11670]  kfree+0x136/0x2f0
[   61.693549][T11670]  __cpu_map_entry_free+0x6f3/0x770
[   61.695004][T11670]  cpu_map_free+0xc0/0x180
[   61.696191][T11670]  bpf_map_free_deferred+0xe3/0x100
[   61.697703][T11670]  process_scheduled_works+0x9cb/0x14a0
[   61.699330][T11670]  worker_thread+0x85c/0xd50
[   61.700546][T11670]  kthread+0x2ef/0x390
[   61.701791][T11670]  ret_from_fork+0x4d/0x80
[   61.702942][T11670]  ret_from_fork_asm+0x11/0x20
[   61.704195][T11670]
[   61.704825][T11670] The buggy address belongs to the object at ffff888122d75200
[   61.704825][T11670]  which belongs to the cache kmalloc-cg-256 of size 256
[   61.708516][T11670] The buggy address is located 8 bytes inside of
[   61.708516][T11670]  freed 256-byte region [ffff888122d75200, ffff888122d75300)
[   61.712215][T11670]
[   61.712824][T11670] The buggy address belongs to the physical page:
[   61.714883][T11670] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x122d74
[   61.717300][T11670] head: order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[   61.719037][T11670] memcg:ffff888120d85f01
[   61.720006][T11670] flags: 0x17ff00000000840(slab|head|node=0|zone=2|lastcpupid=0x7ff)
[   61.722181][T11670] page_type: 0xffffffff()
[   61.723318][T11670] raw: 017ff00000000840 ffff88810004dcc0 dead000000000122 0000000000000000
[   61.725650][T11670] raw: 0000000000000000 0000000080100010 00000001ffffffff ffff888120d85f01
[   61.727943][T11670] head: 017ff00000000840 ffff88810004dcc0 dead000000000122 0000000000000000
[   61.730237][T11670] head: 0000000000000000 0000000080100010 00000001ffffffff ffff888120d85f01
[   61.732671][T11670] head: 017ff00000000001 ffffea00048b5d01 dead000000000122 00000000ffffffff
[   61.735029][T11670] head: 0000000200000000 0000000000000000 00000000ffffffff 0000000000000000
[   61.737400][T11670] page dumped because: kasan: bad access detected
[   61.740100][T11670] page_owner tracks the page as allocated
[   61.743121][T11670] page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 8343, tgid -2092279795 (syzbot-repro), ts 8343, free_ts 43505720198
[   61.754038][T11670]  post_alloc_hook+0x1e6/0x210
[   61.756046][T11670]  get_page_from_freelist+0x7d2/0x850
[   61.759460][T11670]  __alloc_pages+0x25e/0x580
[   61.761428][T11670]  alloc_slab_page+0x6b/0x1a0
[   61.764199][T11670]  allocate_slab+0x5d/0x200
[   61.766122][T11670]  ___slab_alloc+0xac5/0xf20
[   61.767195][T11670]  __kmalloc+0x2e0/0x4b0
[   61.769028][T11670]  fib_default_rule_add+0x4a/0x350
[   61.770394][T11670]  fib6_rules_net_init+0x42/0x100
[   61.771731][T11670]  ops_init+0x39d/0x670
[   61.773061][T11670]  setup_net+0x3bc/0xae0
[   61.774102][T11670]  copy_net_ns+0x399/0x5e0
[   61.775628][T11670]  create_new_namespaces+0x4de/0x8d0
[   61.776950][T11670]  unshare_nsproxy_namespaces+0x127/0x190
[   61.778352][T11670]  ksys_unshare+0x5e6/0xbf0
[   61.779741][T11670]  __x64_sys_unshare+0x38/0x40
[   61.781302][T11670] page last free pid 4619 tgid 4619 stack trace:
[   61.783542][T11670]  free_unref_page_prepare+0x72f/0x7c0
[   61.785018][T11670]  free_unref_page+0x37/0x3f0
[   61.786030][T11670]  __slab_free+0x351/0x3f0
[   61.786991][T11670]  qlist_free_all+0x60/0xd0
[   61.788827][T11670]  kasan_quarantine_reduce+0x15a/0x170
[   61.789951][T11670]  __kasan_slab_alloc+0x23/0x70
[   61.790999][T11670]  kmem_cache_alloc_node+0x193/0x390
[   61.792331][T11670]  kmalloc_reserve+0xa7/0x2a0
[   61.793345][T11670]  __alloc_skb+0x1ec/0x430
[   61.794435][T11670]  netlink_sendmsg+0x615/0xc80
[   61.796439][T11670]  __sock_sendmsg+0x21f/0x270
[   61.797467][T11670]  ____sys_sendmsg+0x540/0x860
[   61.798505][T11670]  __sys_sendmsg+0x2b7/0x3a0
[   61.799512][T11670]  do_syscall_64+0xec/0x210
[   61.800674][T11670]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
[   61.802021][T11670]
[   61.802526][T11670] Memory state around the buggy address:
[   61.803701][T11670]  ffff888122d75100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   61.805694][T11670]  ffff888122d75180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   61.808104][T11670] >ffff888122d75200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   61.809769][T11670]                       ^
[   61.810672][T11670]  ffff888122d75280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   61.812532][T11670]  ffff888122d75300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   61.814846][T11670] ==================================================================
[   61.816914][T11670] Kernel panic - not syncing: KASAN: panic_on_warn set ...
[   61.818415][T11670] CPU: 1 PID: 11670 Comm: syzbot-repro Not tainted 6.9.0-rc6-00053-g0106679839f7 #27
[   61.821191][T11670] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.1 11/11/2019
[   61.822911][T11670] Call Trace:
[   61.823632][T11670]  <TASK>
[   61.824525][T11670]  dump_stack_lvl+0x241/0x360
[   61.825545][T11670]  ? tcp_gro_dev_warn+0x260/0x260
[   61.826706][T11670]  ? panic+0x850/0x850
[   61.828594][T11670]  ? lock_release+0x85/0x860
[   61.829749][T11670]  ? vscnprintf+0x5d/0x80
[   61.830951][T11670]  panic+0x335/0x850
[   61.832316][T11670]  ? check_panic_on_warn+0x21/0xa0
[   61.834475][T11670]  ? __memcpy_flushcache+0x2c0/0x2c0
[   61.835809][T11670]  ? _raw_spin_unlock_irqrestore+0xd8/0x140
[   61.838063][T11670]  ? _raw_spin_unlock_irqrestore+0xdd/0x140
[   61.842056][T11670]  ? _raw_spin_unlock+0x40/0x40
[   61.843116][T11670]  ? print_report+0x1cc/0x210
[   61.844527][T11670]  check_panic_on_warn+0x82/0xa0
[   61.845336][T11670]  ? cpu_map_enqueue+0xba/0x370
[   61.846117][T11670]  end_report+0x48/0xa0
[   61.846790][T11670]  kasan_report+0x154/0x180
[   61.847520][T11670]  ? cpu_map_enqueue+0xba/0x370
[   61.848471][T11670]  cpu_map_enqueue+0xba/0x370
[   61.849968][T11670]  xdp_do_redirect+0x685/0xbf0
[   61.850994][T11670]  tun_xdp_act+0xe7/0x9e0
[   61.851703][T11670]  ? __tun_build_skb+0x2e0/0x2e0
[   61.852598][T11670]  tun_build_skb+0xac6/0x1140
[   61.853362][T11670]  ? tun_build_skb+0xb4/0x1140
[   61.854454][T11670]  ? tun_get_user+0x2760/0x2760
[   61.855806][T11670]  tun_get_user+0x7fa/0x2760
[   61.856734][T11670]  ? rcu_read_unlock+0xa0/0xa0
[   61.857502][T11670]  ? tun_get+0x1e/0x2f0
[   61.858171][T11670]  ? tun_get+0x1e/0x2f0
[   61.858952][T11670]  ? tun_get+0x27d/0x2f0
[   61.859637][T11670]  tun_chr_write_iter+0x111/0x1f0
[   61.860913][T11670]  vfs_write+0xa84/0xcb0
[   61.861578][T11670]  ? __lock_acquire+0x1f60/0x1f60
[   61.862376][T11670]  ? kernel_write+0x330/0x330
[   61.863221][T11670]  ? lockdep_hardirqs_on_prepare+0x43c/0x780
[   61.864230][T11670]  ? __fget_files+0x3ea/0x460
[   61.864955][T11670]  ? seqcount_lockdep_reader_access+0x157/0x220
[   61.866571][T11670]  ? __fdget_pos+0x19e/0x320
[   61.867414][T11670]  ksys_write+0x19f/0x2c0
[   61.868263][T11670]  ? __ia32_sys_read+0x90/0x90
[   61.868996][T11670]  ? ktime_get_coarse_real_ts64+0x10b/0x120
[   61.869896][T11670]  do_syscall_64+0xec/0x210
[   61.870592][T11670]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
[   61.871595][T11670] RIP: 0033:0x472a4f
[   61.873158][T11670] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 c9 d8 02 00 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 0c d9 02 00 48
[   61.876447][T11670] RSP: 002b:00007f7a7a90f5c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[   61.877944][T11670] RAX: ffffffffffffffda RBX: 00007f7a7a911640 RCX: 0000000000472a4f
[   61.879751][T11670] RDX: 0000000000000066 RSI: 0000000020000440 RDI: 00000000000000c8
[   61.881100][T11670] RBP: 00007f7a7a90f620 R08: 0000000000000000 R09: 0000000100000000
[   61.882298][T11670] R10: 0000000100000000 R11: 0000000000000293 R12: 00007f7a7a911640
[   61.883501][T11670] R13: 000000000000006e R14: 000000000042f2f0 R15: 00007f7a7a8f1000
[   61.885999][T11670]  </TASK>

Signed-off-by: Radoslaw Zielonek <[email protected]>
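
To make the refcounting idea concrete, here is a minimal kernel-style C sketch of the pattern the fix describes: pin the map entry for the duration of a redirect and free it only on the last put. The struct, field, and helper names below are invented for illustration; this is not the actual patch.

/* Illustrative sketch only: names are made up, not the real patch. */
#include <linux/refcount.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct demo_cpu_map_entry {
        refcount_t refcnt;      /* pins the entry while a redirect uses it */
        struct rcu_head rcu;
        /* ... the real bpf_cpu_map_entry fields would live here ... */
};

/* Take a reference before stashing the entry pointer for the redirect. */
static inline bool demo_entry_get(struct demo_cpu_map_entry *e)
{
        return refcount_inc_not_zero(&e->refcnt); /* fails if already being freed */
}

/* Drop the reference after the enqueue; free only on the last put. */
static inline void demo_entry_put(struct demo_cpu_map_entry *e)
{
        if (refcount_dec_and_test(&e->refcnt))
                kfree_rcu(e, rcu);
}

With this pattern the redirect path becomes get, enqueue, put, so a concurrent map teardown can no longer free the entry while the copied pointer is still in use.
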
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jul 30, 2024
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jul 31, 2024
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Oct 10, 2024
Wesley reported an issue:

==================================================================
EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
------------[ cut here ]------------
kernel BUG at fs/ext4/resize.c:324!
CPU: 9 UID: 0 PID: 3576 Comm: resize2fs Not tainted 6.11.0+ #27
RIP: 0010:ext4_resize_fs+0x1212/0x12d0
Call Trace:
 __ext4_ioctl+0x4e0/0x1800
 ext4_ioctl+0x12/0x20
 __x64_sys_ioctl+0x99/0xd0
 x64_sys_call+0x1206/0x20d0
 do_syscall_64+0x72/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
==================================================================

While reviewing the patch, Honza found that when adjusting resize_bg in
alloc_flex_gd(), it was possible for flex_gd->resize_bg to be bigger than
flexbg_size.

The reproduction of the problem requires the following:

 o_group = flexbg_size * 2 * n;
 o_size = (o_group + 1) * group_size;
 n_group: [o_group + flexbg_size, o_group + flexbg_size * 2)
 n_size = (n_group + 1) * group_size;

Take n=0,flexbg_size=16 as an example:

              last:15
|o---------------|--------------n-|
o_group:0    resize to      n_group:30

The corresponding reproducer is:

img=test.img
rm -f $img
truncate -s 600M $img
mkfs.ext4 -F $img -b 1024 -G 16 8M
dev=`losetup -f --show $img`
mkdir -p /tmp/test
mount $dev /tmp/test
resize2fs $dev 248M

Delete the problematic plus 1 to fix the issue, and add a WARN_ON_ONCE()
to prevent the issue from happening again.

[ Note: another reproducer which this commit fixes is:

  img=test.img
  rm -f $img
  truncate -s 25MiB $img
  mkfs.ext4 -b 4096 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $img
  truncate -s 3GiB $img
  dev=`losetup -f --show $img`
  mkdir -p /tmp/test
  mount $dev /tmp/test
  resize2fs $dev 3G
  umount $dev
  losetup -d $dev

  -- TYT ]

Reported-by: Wesley Hershberger <[email protected]>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231
Reported-by: Stéphane Graber <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Tested-by: Alexander Mikhalitsyn <[email protected]>
Tested-by: Eric Sandeen <[email protected]>
Fixes: 665d3e0 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
Cc: [email protected]
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Mar 10, 2025
The bnxt_rx_pkt() updates ip_summed value at the end if checksum offload
is enabled.
When the XDP-MB program is attached and it returns XDP_PASS, the
bnxt_xdp_build_skb() is called to update skb_shared_info.
The main purpose of bnxt_xdp_build_skb() is to update skb_shared_info,
but it updates ip_summed value too if checksum offload is enabled.
This is actually duplicate work.

When the bnxt_rx_pkt() updates ip_summed value, it checks if ip_summed
is CHECKSUM_NONE or not.
It means that ip_summed should be CHECKSUM_NONE at this moment.
But ip_summed may already be updated to CHECKSUM_UNNECESSARY in the
XDP-MB-PASS path.
So skb_checksum_none_assert() WARNs about it.

This is duplicate work and updating ip_summed in the
bnxt_xdp_build_skb() is not needed.

Splat looks like:
WARNING: CPU: 3 PID: 5782 at ./include/linux/skbuff.h:5155 bnxt_rx_pkt+0x479b/0x7610 [bnxt_en]
Modules linked in: bnxt_re bnxt_en rdma_ucm rdma_cm iw_cm ib_cm ib_uverbs veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_]
CPU: 3 UID: 0 PID: 5782 Comm: socat Tainted: G        W          6.14.0-rc4+ #27
Tainted: [W]=WARN
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
RIP: 0010:bnxt_rx_pkt+0x479b/0x7610 [bnxt_en]
Code: 54 24 0c 4c 89 f1 4c 89 ff c1 ea 1f ff d3 0f 1f 00 49 89 c6 48 85 c0 0f 84 4c e5 ff ff 48 89 c7 e8 ca 3d a0 c8 e9 8f f4 ff ff <0f> 0b f
RSP: 0018:ffff88881ba09928 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000c7590303 RCX: 0000000000000000
RDX: 1ffff1104e7d1610 RSI: 0000000000000001 RDI: ffff8881c91300b8
RBP: ffff88881ba09b28 R08: ffff888273e8b0d0 R09: ffff888273e8b070
R10: ffff888273e8b010 R11: ffff888278b0f000 R12: ffff888273e8b080
R13: ffff8881c9130e00 R14: ffff8881505d3800 R15: ffff888273e8b000
FS:  00007f5a2e7be080(0000) GS:ffff88881ba00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff2e708ff8 CR3: 000000013e3b0000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
 <IRQ>
 ? __warn+0xcd/0x2f0
 ? bnxt_rx_pkt+0x479b/0x7610
 ? report_bug+0x326/0x3c0
 ? handle_bug+0x53/0xa0
 ? exc_invalid_op+0x14/0x50
 ? asm_exc_invalid_op+0x16/0x20
 ? bnxt_rx_pkt+0x479b/0x7610
 ? bnxt_rx_pkt+0x3e41/0x7610
 ? __pfx_bnxt_rx_pkt+0x10/0x10
 ? napi_complete_done+0x2cf/0x7d0
 __bnxt_poll_work+0x4e8/0x1220
 ? __pfx___bnxt_poll_work+0x10/0x10
 ? __pfx_mark_lock.part.0+0x10/0x10
 bnxt_poll_p5+0x36a/0xfa0
 ? __pfx_bnxt_poll_p5+0x10/0x10
 __napi_poll.constprop.0+0xa0/0x440
 net_rx_action+0x899/0xd00
...

The following ping.py patch adds an xdp-mb-pass case, so ping.py will be
able to reproduce this issue.

Fixes: 1dc4c55 ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff")
Signed-off-by: Taehee Yoo <[email protected]>
Reviewed-by: Somnath Kotur <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
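
For illustration only (this is not the bnxt driver code): the point above boils down to writing ip_summed once, guarded on it still being CHECKSUM_NONE, so a second path cannot update it again. The helper name and the csum_ok parameter are assumed for the sketch.

#include <linux/skbuff.h>

/* Sketch of the "set ip_summed only once" idea; not bnxt_en code. */
void demo_set_rx_csum(struct sk_buff *skb, bool csum_ok)
{
        /* Leave ip_summed alone if another path (e.g. the XDP multi-buffer
         * PASS path) already marked the checksum as verified. */
        if (csum_ok && skb->ip_summed == CHECKSUM_NONE)
                skb->ip_summed = CHECKSUM_UNNECESSARY;
}
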
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jun 1, 2025
The commit 59c68ac ("iw_cm: free cm_id resources on the last
deref") simplified cm_id resource management by freeing cm_id once all
references to the cm_id were removed. The references are removed either
upon completion of iw_cm event handlers or when the application destroys
the cm_id. This commit introduced the use-after-free condition where
cm_id_private object could still be in use by event handler works during
the destruction of cm_id. The commit aee2424 ("RDMA/iwcm: Fix a
use-after-free related to destroying CM IDs") addressed this use-after-
free by flushing all pending works at the cm_id destruction.

However, still another use-after-free possibility remained. It happens
with the work objects allocated for each cm_id_priv within
alloc_work_entries() during cm_id creation, and subsequently freed in
dealloc_work_entries() once all references to the cm_id are removed.
If the cm_id's last reference is decremented in the event handler work,
the work object for the work itself gets removed, and causes the use-
after-free BUG below:

  BUG: KASAN: slab-use-after-free in __pwq_activate_work+0x1ff/0x250
  Read of size 8 at addr ffff88811f9cf800 by task kworker/u16:1/147091

  CPU: 2 UID: 0 PID: 147091 Comm: kworker/u16:1 Not tainted 6.15.0-rc2+ #27 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
  Workqueue:  0x0 (iw_cm_wq)
  Call Trace:
   <TASK>
   dump_stack_lvl+0x6a/0x90
   print_report+0x174/0x554
   ? __virt_addr_valid+0x208/0x430
   ? __pwq_activate_work+0x1ff/0x250
   kasan_report+0xae/0x170
   ? __pwq_activate_work+0x1ff/0x250
   __pwq_activate_work+0x1ff/0x250
   pwq_dec_nr_in_flight+0x8c5/0xfb0
   process_one_work+0xc11/0x1460
   ? __pfx_process_one_work+0x10/0x10
   ? assign_work+0x16c/0x240
   worker_thread+0x5ef/0xfd0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0x3b0/0x770
   ? __pfx_kthread+0x10/0x10
   ? rcu_is_watching+0x11/0xb0
   ? _raw_spin_unlock_irq+0x24/0x50
   ? rcu_is_watching+0x11/0xb0
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x30/0x70
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>

  Allocated by task 147416:
   kasan_save_stack+0x2c/0x50
   kasan_save_track+0x10/0x30
   __kasan_kmalloc+0xa6/0xb0
   alloc_work_entries+0xa9/0x260 [iw_cm]
   iw_cm_connect+0x23/0x4a0 [iw_cm]
   rdma_connect_locked+0xbfd/0x1920 [rdma_cm]
   nvme_rdma_cm_handler+0x8e5/0x1b60 [nvme_rdma]
   cma_cm_event_handler+0xae/0x320 [rdma_cm]
   cma_work_handler+0x106/0x1b0 [rdma_cm]
   process_one_work+0x84f/0x1460
   worker_thread+0x5ef/0xfd0
   kthread+0x3b0/0x770
   ret_from_fork+0x30/0x70
   ret_from_fork_asm+0x1a/0x30

  Freed by task 147091:
   kasan_save_stack+0x2c/0x50
   kasan_save_track+0x10/0x30
   kasan_save_free_info+0x37/0x60
   __kasan_slab_free+0x4b/0x70
   kfree+0x13a/0x4b0
   dealloc_work_entries+0x125/0x1f0 [iw_cm]
   iwcm_deref_id+0x6f/0xa0 [iw_cm]
   cm_work_handler+0x136/0x1ba0 [iw_cm]
   process_one_work+0x84f/0x1460
   worker_thread+0x5ef/0xfd0
   kthread+0x3b0/0x770
   ret_from_fork+0x30/0x70
   ret_from_fork_asm+0x1a/0x30

  Last potentially related work creation:
   kasan_save_stack+0x2c/0x50
   kasan_record_aux_stack+0xa3/0xb0
   __queue_work+0x2ff/0x1390
   queue_work_on+0x67/0xc0
   cm_event_handler+0x46a/0x820 [iw_cm]
   siw_cm_upcall+0x330/0x650 [siw]
   siw_cm_work_handler+0x6b9/0x2b20 [siw]
   process_one_work+0x84f/0x1460
   worker_thread+0x5ef/0xfd0
   kthread+0x3b0/0x770
   ret_from_fork+0x30/0x70
   ret_from_fork_asm+0x1a/0x30

This BUG is reproducible by repeating the blktests test case nvme/061
for the rdma transport and the siw driver.

To avoid the use-after-free of cm_id_private work objects, ensure that
the last reference to the cm_id is decremented not in the event handler
works, but in the cm_id destruction context. For that purpose, move
iwcm_deref_id() call from destroy_cm_id() to the callers of
destroy_cm_id(). In iw_destroy_cm_id(), call iwcm_deref_id() after
flushing the pending works.

During the fix work, I noticed that iw_destroy_cm_id() is called from
cm_work_handler() and process_event() context. However, the comment of
iw_destroy_cm_id() notes that the function "cannot be called by the
event thread". Drop the false comment.

Closes: https://lore.kernel.org/linux-rdma/r5676e754sv35aq7cdsqrlnvyhiq5zktteaurl7vmfih35efko@z6lay7uypy3c/
Fixes: 59c68ac ("iw_cm: free cm_id resources on the last deref")
Cc: [email protected]
Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Link: https://patch.msgid.link/[email protected]
Reviewed-by: Zhu Yanjun <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
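
As a rough sketch of the ordering the fix establishes (invented names, not the iw_cm code): the destroy path first flushes any queued event-handler work and only then drops the final reference, so no work item can free the per-id work objects it is executing from.

#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct demo_cm_id_priv {
        struct kref ref;
        void *work_entries;     /* per-id work objects, freed with the object */
};

static void demo_release(struct kref *ref)
{
        struct demo_cm_id_priv *p = container_of(ref, struct demo_cm_id_priv, ref);

        kfree(p->work_entries);
        kfree(p);
}

void demo_destroy(struct demo_cm_id_priv *p, struct workqueue_struct *wq)
{
        /* Event-handler works queued on wq must all have completed ... */
        flush_workqueue(wq);
        /* ... before the last reference goes away, so a running handler can
         * never free the work array out from under itself. */
        kref_put(&p->ref, demo_release);
}
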