Skip to content

TEST1 THIS IS JUST A TEST #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

TEST1 THIS IS JUST A TEST #4

wants to merge 1 commit into from

Conversation

LinuxMinion
Copy link
Member

Test update process

Test update process
@LinuxMinion LinuxMinion self-assigned this Jul 21, 2018
@LinuxMinion
Copy link
Member Author

This is a test of the pull request.

gregmarsden pushed a commit that referenced this pull request Aug 31, 2018
commit 2546da9 upstream.

The RX SGL in processing is already registered with the RX SGL tracking
list to support proper cleanup. The cleanup code path uses the
sg_num_bytes variable which must therefore be always initialized, even
in the error code path.

Signed-off-by: Stephan Mueller <[email protected]>
Reported-by: [email protected]
#syz test: https://github.com/google/kmsan.git master
CC: <[email protected]> #4.14
Fixes: e870456 ("crypto: algif_skcipher - overhaul memory management")
Fixes: d887c52 ("crypto: algif_aead - overhaul memory management")
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Aug 31, 2018
commit 3f6e698 upstream.

Since commit 1bb8866 ("mtd: nand: denali: handle timing parameters
by setup_data_interface()"), denali_dt.c gets the clock rate from the
clock driver.  The driver expects the frequency of the bus interface
clock, whereas the clock driver of SOCFPGA provides the core clock.
Thus, the setup_data_interface() hook calculates timing parameters
based on a wrong frequency.

To make it work without relying on the clock driver, hard-code the clock
frequency, 200MHz.  This is fine for existing DT of UniPhier, and also
fixes the issue of SOCFPGA because both platforms use 200 MHz for the
bus interface clock.

Fixes: 1bb8866 ("mtd: nand: denali: handle timing parameters by setup_data_interface()")
Cc: linux-stable <[email protected]> #4.14+
Reported-by: Philipp Rosenberger <[email protected]>
Suggested-by: Boris Brezillon <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Tested-by: Richard Weinberger <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Aug 31, 2018
commit af1fc5b upstream.

This manifsted as strace segfaulting on HSDK because gcc was targetting
the accumulator registers as GPRs, which kernek was not saving/restoring
by default.

Cc: [email protected]   #4.14+
Signed-off-by: Vineet Gupta <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Aug 31, 2018
[ Upstream commit 4f4616c ]

Similar to what we do when we remove a PCI function, set the
QEDF_UNLOADING flag to prevent any requests from being queued while a
vport is being deleted.  This prevents any requests from getting stuck
in limbo when the vport is unloaded or deleted.

Fixes the crash:

PID: 106676  TASK: ffff9a436aa90000  CPU: 12  COMMAND: "multipathd"
 #0 [ffff9a43567d3550] machine_kexec+522 at ffffffffaca60b2a
 #1 [ffff9a43567d35b0] __crash_kexec+114 at ffffffffacb13512
 #2 [ffff9a43567d3680] crash_kexec+48 at ffffffffacb13600
 #3 [ffff9a43567d3698] oops_end+168 at ffffffffad117768
 #4 [ffff9a43567d36c0] no_context+645 at ffffffffad106f52
 #5 [ffff9a43567d3710] __bad_area_nosemaphore+116 at ffffffffad106fe9
 #6 [ffff9a43567d3760] bad_area+70 at ffffffffad107379
 #7 [ffff9a43567d3788] __do_page_fault+1247 at ffffffffad11a8cf
 #8 [ffff9a43567d37f0] do_page_fault+53 at ffffffffad11a915
 #9 [ffff9a43567d3820] page_fault+40 at ffffffffad116768
    [exception RIP: qedf_init_task+61]
    RIP: ffffffffc0e13c2d  RSP: ffff9a43567d38d0  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffffbe920472c738  RCX: ffff9a434fa0e3e8
    RDX: ffff9a434f695280  RSI: ffffbe920472c738  RDI: ffff9a43aa359c80
    RBP: ffff9a43567d3950   R8: 0000000000000c15   R9: ffff9a3fb09b9880
    R10: ffff9a434fa0e3e8  R11: ffff9a43567d35ce  R12: 0000000000000000
    R13: ffff9a434f695280  R14: ffff9a43aa359c80  R15: ffff9a3fb9e005c0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

Signed-off-by: Chad Dupuis <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Aug 31, 2018
commit e8d4bfe upstream.

When executing filesystem sync or umount on overlayfs,
dirty data does not get synced as expected on upper filesystem.
This patch fixes sync filesystem method to keep data consistency
for overlayfs.

Signed-off-by: Chengguang Xu <[email protected]>
Fixes: e593b2b ("ovl: properly implement sync_filesystem()")
Cc: <[email protected]> #4.11
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Sep 7, 2018
commit 89da619 upstream.

Kernel panic when with high memory pressure, calltrace looks like,

PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java"
 #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb
 #1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942
 #2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30
 #3 [ffff881ec7ed7778] oops_end at ffffffff816902c8
 #4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46
 #5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc
 #6 [ffff881ec7ed7838] __node_set at ffffffff81680300
 #7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f
 #8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5
 #9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8
    [exception RIP: _raw_spin_lock_irqsave+47]
    RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046
    RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8
    RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008
    RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098
    R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

It happens in the pagefault and results in double pagefault
during compacting pages when memory allocation fails.

Analysed the vmcore, the page leads to second pagefault is corrupted
with _mapcount=-256, but private=0.

It's caused by the race between migration and ballooning, and lock
missing in virtballoon_migratepage() of virtio_balloon driver.
This patch fix the bug.

Fixes: e225042 ("virtio_balloon: introduce migration primitives to balloon pages")
Cc: [email protected]
Signed-off-by: Jiang Biao <[email protected]>
Signed-off-by: Huang Chong <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Sep 7, 2018
commit 147b27e upstream.

It may cause race by setting 'nvmeq' in nvme_init_request()
because .init_request is called inside switching io scheduler, which
may happen when the NVMe device is being resetted and its nvme queues
are being freed and created. We don't have any sync between the two
pathes.

This patch changes the nvmeq allocation to occur at probe time so
there is no way we can dereference it at init_request.

[   93.268391] kernel BUG at drivers/nvme/host/pci.c:408!
[   93.274146] invalid opcode: 0000 [#1] SMP
[   93.278618] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss
nfsv4 dns_resolver nfs lockd grace fscache sunrpc ipmi_ssif vfat fat
intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt
intel_cstate ipmi_si iTCO_vendor_support intel_uncore mxm_wmi mei_me
ipmi_devintf intel_rapl_perf pcspkr sg ipmi_msghandler lpc_ich dcdbas mei
shpchp acpi_power_meter wmi dm_multipath ip_tables xfs libcrc32c sd_mod
mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops ttm drm ahci libahci nvme libata crc32c_intel nvme_core tg3
megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[   93.349071] CPU: 5 PID: 1842 Comm: sh Not tainted 4.15.0-rc2.ming+ #4
[   93.356256] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
[   93.364801] task: 00000000fb8abf2a task.stack: 0000000028bd82d1
[   93.371408] RIP: 0010:nvme_init_request+0x36/0x40 [nvme]
[   93.377333] RSP: 0018:ffffc90002537ca8 EFLAGS: 00010246
[   93.383161] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000008
[   93.391122] RDX: 0000000000000000 RSI: ffff880276ae0000 RDI: ffff88047bae9008
[   93.399084] RBP: ffff88047bae9008 R08: ffff88047bae9008 R09: 0000000009dabc00
[   93.407045] R10: 0000000000000004 R11: 000000000000299c R12: ffff880186bc1f00
[   93.415007] R13: ffff880276ae0000 R14: 0000000000000000 R15: 0000000000000071
[   93.422969] FS:  00007f33cf288740(0000) GS:ffff88047ba80000(0000) knlGS:0000000000000000
[   93.431996] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   93.438407] CR2: 00007f33cf28e000 CR3: 000000047e5bb006 CR4: 00000000001606e0
[   93.446368] Call Trace:
[   93.449103]  blk_mq_alloc_rqs+0x231/0x2a0
[   93.453579]  blk_mq_sched_alloc_tags.isra.8+0x42/0x80
[   93.459214]  blk_mq_init_sched+0x7e/0x140
[   93.463687]  elevator_switch+0x5a/0x1f0
[   93.467966]  ? elevator_get.isra.17+0x52/0xc0
[   93.472826]  elv_iosched_store+0xde/0x150
[   93.477299]  queue_attr_store+0x4e/0x90
[   93.481580]  kernfs_fop_write+0xfa/0x180
[   93.485958]  __vfs_write+0x33/0x170
[   93.489851]  ? __inode_security_revalidate+0x4c/0x60
[   93.495390]  ? selinux_file_permission+0xda/0x130
[   93.500641]  ? _cond_resched+0x15/0x30
[   93.504815]  vfs_write+0xad/0x1a0
[   93.508512]  SyS_write+0x52/0xc0
[   93.512113]  do_syscall_64+0x61/0x1a0
[   93.516199]  entry_SYSCALL64_slow_path+0x25/0x25
[   93.521351] RIP: 0033:0x7f33ce96aab0
[   93.525337] RSP: 002b:00007ffe57570238 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   93.533785] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f33ce96aab0
[   93.541746] RDX: 0000000000000006 RSI: 00007f33cf28e000 RDI: 0000000000000001
[   93.549707] RBP: 00007f33cf28e000 R08: 000000000000000a R09: 00007f33cf288740
[   93.557669] R10: 00007f33cf288740 R11: 0000000000000246 R12: 00007f33cec42400
[   93.565630] R13: 0000000000000006 R14: 0000000000000001 R15: 0000000000000000
[   93.573592] Code: 4c 8d 40 08 4c 39 c7 74 16 48 8b 00 48 8b 04 08 48 85 c0
74 16 48 89 86 78 01 00 00 31 c0 c3 8d 4a 01 48 63 c9 48 c1 e1 03 eb de <0f>
0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 f6 53 48 89
[   93.594676] RIP: nvme_init_request+0x36/0x40 [nvme] RSP: ffffc90002537ca8
[   93.602273] ---[ end trace 810dde3993e5f14e ]---

Reported-by: Yi Zhang <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jon Derrick <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Sep 21, 2018
The writeback thread would exit with a lock held when the cache device
is detached via sysfs interface, fix it by releasing the held lock
before exiting the while-loop.

Fixes: fadd94e (bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set)
Signed-off-by: Shan Hai <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Tested-by: Shenghui Wang <[email protected]>
Cc: [email protected] #4.17+
Signed-off-by: Jens Axboe <[email protected]>
(cherry picked from commit 3943b04)

Orabug: 28335202

Signed-off-by: Shan Hai <[email protected]>
Reviewed-by: Darren Kenny <[email protected]>
gregmarsden pushed a commit that referenced this pull request Sep 21, 2018
Orabug: 28518694

We're consistently hitting deadlocks here with XFS on recent kernels.
After some digging through the crash files, it looks like everyone in
the system is waiting for XFS to reclaim memory.

Something like this:

PID: 2733434  TASK: ffff8808cd242800  CPU: 19  COMMAND: "java"
 #0 [ffff880019c53588] __schedule at ffffffff818c4df2
 #1 [ffff880019c535d8] schedule at ffffffff818c5517
 #2 [ffff880019c535f8] _xfs_log_force_lsn at ffffffff81316348
 #3 [ffff880019c53688] xfs_log_force_lsn at ffffffff813164fb
 #4 [ffff880019c536b8] xfs_iunpin_wait at ffffffff8130835e
 #5 [ffff880019c53728] xfs_reclaim_inode at ffffffff812fd453
 #6 [ffff880019c53778] xfs_reclaim_inodes_ag at ffffffff812fd8c7
 #7 [ffff880019c53928] xfs_reclaim_inodes_nr at ffffffff812fe433
 #8 [ffff880019c53958] xfs_fs_free_cached_objects at ffffffff8130d3b9
 #9 [ffff880019c53968] super_cache_scan at ffffffff811a6f73

xfs_log_force_lsn is waiting for logs to get cleaned, which is waiting
for IO, which is waiting for workers to complete the IO which is waiting
for worker threads that don't exist yet:

PID: 2752451  TASK: ffff880bd6bdda00  CPU: 37  COMMAND: "kworker/37:1"
 #0 [ffff8808d20abbb0] __schedule at ffffffff818c4df2
 #1 [ffff8808d20abc00] schedule at ffffffff818c5517
 #2 [ffff8808d20abc20] schedule_timeout at ffffffff818c7c6c
 #3 [ffff8808d20abcc0] wait_for_completion_killable at ffffffff818c6495
 #4 [ffff8808d20abd30] kthread_create_on_node at ffffffff8106ec82
 #5 [ffff8808d20abdf0] create_worker at ffffffff8106752f
 #6 [ffff8808d20abe40] worker_thread at ffffffff810699be
 #7 [ffff8808d20abec0] kthread at ffffffff8106ef59
 #8 [ffff8808d20abf50] ret_from_fork at ffffffff818c8ac8

I think we should be using WQ_MEM_RECLAIM to make sure this thread
pool makes progress when we're not able to allocate new workers.

[dchinner: make all workqueues WQ_MEM_RECLAIM]

Signed-off-by: Chris Mason <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>

(cherry picked from commit 7a29ac4)

Signed-off-by: Junxiao Bi <[email protected]>
Reviewed-by: Shan Hai <[email protected]>
Signed-off-by: Brian Maly <[email protected]>
gregmarsden pushed a commit that referenced this pull request Sep 28, 2018
commit 9952509 upstream.

Avoid that KASAN reports the following:

BUG: KASAN: use-after-free in srpt_close_ch+0x4f/0x1b0 [ib_srpt]
Read of size 4 at addr ffff880151180cb8 by task check/4681

CPU: 15 PID: 4681 Comm: check Not tainted 4.18.0-rc2-dbg+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0xa4/0xf5
 print_address_description+0x6f/0x270
 kasan_report+0x241/0x360
 __asan_load4+0x78/0x80
 srpt_close_ch+0x4f/0x1b0 [ib_srpt]
 srpt_set_enabled+0xf7/0x1e0 [ib_srpt]
 srpt_tpg_enable_store+0xb8/0x120 [ib_srpt]
 configfs_write_file+0x14e/0x1d0 [configfs]
 __vfs_write+0xd2/0x3b0
 vfs_write+0x101/0x270
 ksys_write+0xab/0x120
 __x64_sys_write+0x43/0x50
 do_syscall_64+0x77/0x230
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: aaf45bd ("IB/srpt: Detect session shutdown reliably")
Signed-off-by: Bart Van Assche <[email protected]>
Cc: <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Sep 28, 2018
commit a5ba1d9 upstream.

We have reports of the following crash:

    PID: 7 TASK: ffff88085c6d61c0 CPU: 1 COMMAND: "kworker/u25:0"
    #0 [ffff88085c6db710] machine_kexec at ffffffff81046239
    #1 [ffff88085c6db760] crash_kexec at ffffffff810fc248
    #2 [ffff88085c6db830] oops_end at ffffffff81008ae7
    #3 [ffff88085c6db860] no_context at ffffffff81050b8f
    #4 [ffff88085c6db8b0] __bad_area_nosemaphore at ffffffff81050d75
    #5 [ffff88085c6db900] bad_area_nosemaphore at ffffffff81050e83
    #6 [ffff88085c6db910] __do_page_fault at ffffffff8105132e
    #7 [ffff88085c6db9b0] do_page_fault at ffffffff8105152c
    #8 [ffff88085c6db9c0] page_fault at ffffffff81a3f122
    [exception RIP: uart_put_char+149]
    RIP: ffffffff814b67b5 RSP: ffff88085c6dba78 RFLAGS: 00010006
    RAX: 0000000000000292 RBX: ffffffff827c5120 RCX: 0000000000000081
    RDX: 0000000000000000 RSI: 000000000000005f RDI: ffffffff827c5120
    RBP: ffff88085c6dba98 R8: 000000000000012c R9: ffffffff822ea320
    R10: ffff88085fe4db04 R11: 0000000000000001 R12: ffff881059f9c000
    R13: 0000000000000001 R14: 000000000000005f R15: 0000000000000fba
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    #9 [ffff88085c6dbaa0] tty_put_char at ffffffff81497544
    #10 [ffff88085c6dbac0] do_output_char at ffffffff8149c91c
    #11 [ffff88085c6dbae0] __process_echoes at ffffffff8149cb8b
    #12 [ffff88085c6dbb30] commit_echoes at ffffffff8149cdc2
    #13 [ffff88085c6dbb60] n_tty_receive_buf_fast at ffffffff8149e49b
    #14 [ffff88085c6dbbc0] __receive_buf at ffffffff8149ef5a
    #15 [ffff88085c6dbc20] n_tty_receive_buf_common at ffffffff8149f016
    #16 [ffff88085c6dbca0] n_tty_receive_buf2 at ffffffff8149f194
    #17 [ffff88085c6dbcb0] flush_to_ldisc at ffffffff814a238a
    #18 [ffff88085c6dbd50] process_one_work at ffffffff81090be2
    #19 [ffff88085c6dbe20] worker_thread at ffffffff81091b4d
    #20 [ffff88085c6dbeb0] kthread at ffffffff81096384
    #21 [ffff88085c6dbf50] ret_from_fork at ffffffff81a3d69f​

after slogging through some dissasembly:

ffffffff814b6720 <uart_put_char>:
ffffffff814b6720:	55                   	push   %rbp
ffffffff814b6721:	48 89 e5             	mov    %rsp,%rbp
ffffffff814b6724:	48 83 ec 20          	sub    $0x20,%rsp
ffffffff814b6728:	48 89 1c 24          	mov    %rbx,(%rsp)
ffffffff814b672c:	4c 89 64 24 08       	mov    %r12,0x8(%rsp)
ffffffff814b6731:	4c 89 6c 24 10       	mov    %r13,0x10(%rsp)
ffffffff814b6736:	4c 89 74 24 18       	mov    %r14,0x18(%rsp)
ffffffff814b673b:	e8 b0 8e 58 00       	callq  ffffffff81a3f5f0 <mcount>
ffffffff814b6740:	4c 8b a7 88 02 00 00 	mov    0x288(%rdi),%r12
ffffffff814b6747:	45 31 ed             	xor    %r13d,%r13d
ffffffff814b674a:	41 89 f6             	mov    %esi,%r14d
ffffffff814b674d:	49 83 bc 24 70 01 00 	cmpq   $0x0,0x170(%r12)
ffffffff814b6754:	00 00
ffffffff814b6756:	49 8b 9c 24 80 01 00 	mov    0x180(%r12),%rbx
ffffffff814b675d:	00
ffffffff814b675e:	74 2f                	je     ffffffff814b678f <uart_put_char+0x6f>
ffffffff814b6760:	48 89 df             	mov    %rbx,%rdi
ffffffff814b6763:	e8 a8 67 58 00       	callq  ffffffff81a3cf10 <_raw_spin_lock_irqsave>
ffffffff814b6768:	41 8b 8c 24 78 01 00 	mov    0x178(%r12),%ecx
ffffffff814b676f:	00
ffffffff814b6770:	89 ca                	mov    %ecx,%edx
ffffffff814b6772:	f7 d2                	not    %edx
ffffffff814b6774:	41 03 94 24 7c 01 00 	add    0x17c(%r12),%edx
ffffffff814b677b:	00
ffffffff814b677c:	81 e2 ff 0f 00 00    	and    $0xfff,%edx
ffffffff814b6782:	75 23                	jne    ffffffff814b67a7 <uart_put_char+0x87>
ffffffff814b6784:	48 89 c6             	mov    %rax,%rsi
ffffffff814b6787:	48 89 df             	mov    %rbx,%rdi
ffffffff814b678a:	e8 e1 64 58 00       	callq  ffffffff81a3cc70 <_raw_spin_unlock_irqrestore>
ffffffff814b678f:	44 89 e8             	mov    %r13d,%eax
ffffffff814b6792:	48 8b 1c 24          	mov    (%rsp),%rbx
ffffffff814b6796:	4c 8b 64 24 08       	mov    0x8(%rsp),%r12
ffffffff814b679b:	4c 8b 6c 24 10       	mov    0x10(%rsp),%r13
ffffffff814b67a0:	4c 8b 74 24 18       	mov    0x18(%rsp),%r14
ffffffff814b67a5:	c9                   	leaveq
ffffffff814b67a6:	c3                   	retq
ffffffff814b67a7:	49 8b 94 24 70 01 00 	mov    0x170(%r12),%rdx
ffffffff814b67ae:	00
ffffffff814b67af:	48 63 c9             	movslq %ecx,%rcx
ffffffff814b67b2:	41 b5 01             	mov    $0x1,%r13b
ffffffff814b67b5:	44 88 34 0a          	mov    %r14b,(%rdx,%rcx,1)
ffffffff814b67b9:	41 8b 94 24 78 01 00 	mov    0x178(%r12),%edx
ffffffff814b67c0:	00
ffffffff814b67c1:	83 c2 01             	add    $0x1,%edx
ffffffff814b67c4:	81 e2 ff 0f 00 00    	and    $0xfff,%edx
ffffffff814b67ca:	41 89 94 24 78 01 00 	mov    %edx,0x178(%r12)
ffffffff814b67d1:	00
ffffffff814b67d2:	eb b0                	jmp    ffffffff814b6784 <uart_put_char+0x64>
ffffffff814b67d4:	66 66 66 2e 0f 1f 84 	data32 data32 nopw %cs:0x0(%rax,%rax,1)
ffffffff814b67db:	00 00 00 00 00

for our build, this is crashing at:

    circ->buf[circ->head] = c;

Looking in uart_port_startup(), it seems that circ->buf (state->xmit.buf)
protected by the "per-port mutex", which based on uart_port_check() is
state->port.mutex. Indeed, the lock acquired in uart_put_char() is
uport->lock, i.e. not the same lock.

Anyway, since the lock is not acquired, if uart_shutdown() is called, the
last chunk of that function may release state->xmit.buf before its assigned
to null, and cause the race above.

To fix it, let's lock uport->lock when allocating/deallocating
state->xmit.buf in addition to the per-port mutex.

v2: switch to locking uport->lock on allocation/deallocation instead of
    locking the per-port mutex in uart_put_char. Note that since
    uport->lock is a spin lock, we have to switch the allocation to
    GFP_ATOMIC.
v3: move the allocation outside the lock, so we can switch back to
    GFP_KERNEL

Signed-off-by: Tycho Andersen <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Sep 28, 2018
commit 3943b04 upstream.

The writeback thread would exit with a lock held when the cache device
is detached via sysfs interface, fix it by releasing the held lock
before exiting the while-loop.

Fixes: fadd94e (bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set)
Signed-off-by: Shan Hai <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Tested-by: Shenghui Wang <[email protected]>
Cc: [email protected] #4.17+
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Oct 5, 2018
The customer hit this crash few times.

PID: 31556  TASK: ffff880f823caa00  CPU: 1   COMMAND: "cellsrv"
 #0 [ffff880f823db850] machine_kexec at ffffffff8105d93c
 #1 [ffff880f823db8b0] crash_kexec at ffffffff811103b3
 #2 [ffff880f823db980] oops_end at ffffffff8101a788
 #3 [ffff880f823db9b0] no_context at ffffffff8106b9cf
 #4 [ffff880f823dba20] __bad_area_nosemaphore at ffffffff8106bc9d
 #5 [ffff880f823dba70] bad_area at ffffffff8106be97
 #6 [ffff880f823dbaa0] __do_page_fault at ffffffff8106c71e
 #7 [ffff880f823dbb00] do_page_fault at ffffffff8106c81f
 #8 [ffff880f823dbb40] page_fault at ffffffff816b5a9f
    [exception RIP: rds_ib_inc_copy_to_user+104]
    RIP: ffffffffa04607b8  RSP: ffff880f823dbbf8  RFLAGS: 00010287
    RAX: 0000000000000340  RBX: 0000000000001000  RCX: 0000000000004000
    RDX: 0000000000001000  RSI: ffff88176cea2000  RDI: ffff8817d291f520
    RBP: ffff880f823dbc48   R8: 0000000000001340   R9: 0000000000001000
    R10: 0000000000001200  R11: ffff880f823dc000  R12: ffff880f823dbed0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880f823dbc50] rds_recvmsg at ffffffffa041d837 [rds]

int rds_ib_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to)
...
...
        ibinc = container_of(inc, struct rds_ib_incoming, ii_inc);
        frag = list_entry(ibinc->ii_frags.next, struct rds_page_frag, f_item);
        len = be32_to_cpu(inc->i_hdr.h_len);
        sg = frag->f_sg;

        while (iov_iter_count(to) && copied < len) {
                to_copy = min_t(unsigned long, iov_iter_count(to),
                                sg->length - frag_off);
                ...

sg is NULL and it crashes accessing sg->length above.

The cause looks like is due to ic->i_frag_sz returning incorrect value.
16KB when 4KB was expected.

                if (copied % ic->i_frag_sz == 0) {
                        frag = list_entry(frag->f_item.next,
                                          struct rds_page_frag, f_item);
                        frag_off = 0;
                        sg = frag->f_sg;
                }

The other end is using 4KB RDS fragsize (Solaris Super Cluster).
This end is UEK4 (4.1.12-94.8.4.el6uek.x86_64).

The message being copied arrived over 4KB RDS frag size connection.
But during the above check ic->i_frag_sz is 16KB.
This can happen during a reconnect at the connection setup phase.
We start off with ic->i_frag_sz as 16KB. Then settle down at 4KB.

Failing this check
  if (copied % ic->i_frag_sz == 0) {
can result in sg not getting set correctly.

Say, "copied" = 4KB but ic->i_frag_sz is 16KB when it should be 4KB.

During race condition with a reconnect, ic->i_frag_sz can be 16KB
even though once the connection is set up it settled down to 4KB.
It can change from 4KB to 16KB and back to 4KB during connection setup
due to reconnect.

We started seeing this crash after bug 26848749.
But prior to that the same scenario could result in data copied to user
from incorrect "sg" resulting in data corruption.

Orabug: 28506569

Signed-off-by: Venkat Venkatsubra <[email protected]>
Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Brian Maly <[email protected]>
gregmarsden pushed a commit that referenced this pull request Nov 9, 2018
This reverts commit 2059e52.

This commit causes reboot to fail on imx6 wandboard, so let's
revert it.

Cc: <[email protected]> #4.14
Reported-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Fabio Estevam <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Nov 16, 2018
Commit 822fb18 ("xen-netfront: wait xenbus state change when load
module manually") added a new wait queue to wait on for a state change
when the module is loaded manually. Unfortunately there is no wakeup
anywhere to stop that waiting.

Instead of introducing a new wait queue rename the existing
module_unload_q to module_wq and use it for both purposes (loading and
unloading).

As any state change of the backend might be intended to stop waiting
do the wake_up_all() in any case when netback_changed() is called.

Fixes: 822fb18 ("xen-netfront: wait xenbus state change when load module manually")
Cc: <[email protected]> #4.18
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 8edfe2e)

Orabug: 28671425

Signed-off-by: Liam Merwick <[email protected]>
Reviewed-by: Darren Kenny <[email protected]>
Tested-by: Boris Ostrovsky <[email protected]>
gregmarsden pushed a commit that referenced this pull request Nov 16, 2018
When netvsc device is removed it can call reschedule in RCU context.
This happens because canceling the subchannel setup work could (in theory)
cause a reschedule when manipulating the timer.

To reproduce, run with lockdep enabled kernel and unbind
a network device from hv_netvsc (via sysfs).

[  160.682011] WARNING: suspicious RCU usage
[  160.707466] 4.19.0-rc3-uio+ #2 Not tainted
[  160.709937] -----------------------------
[  160.712352] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
[  160.723691]
[  160.723691] other info that might help us debug this:
[  160.723691]
[  160.730955]
[  160.730955] rcu_scheduler_active = 2, debug_locks = 1
[  160.762813] 5 locks held by rebind-eth.sh/1812:
[  160.766851]  #0: 000000008befa37a (sb_writers#6){.+.+}, at: vfs_write+0x184/0x1b0
[  160.773416]  #1: 00000000b097f236 (&of->mutex){+.+.}, at: kernfs_fop_write+0xe2/0x1a0
[  160.783766]  #2: 0000000041ee6889 (kn->count#3){++++}, at: kernfs_fop_write+0xeb/0x1a0
[  160.787465]  #3: 0000000056d92a74 (&dev->mutex){....}, at: device_release_driver_internal+0x39/0x250
[  160.816987]  #4: 0000000030f6031e (rcu_read_lock){....}, at: netvsc_remove+0x1e/0x250 [hv_netvsc]
[  160.828629]
[  160.828629] stack backtrace:
[  160.831966] CPU: 1 PID: 1812 Comm: rebind-eth.sh Not tainted 4.19.0-rc3-uio+ #2
[  160.832952] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
[  160.832952] Call Trace:
[  160.832952]  dump_stack+0x85/0xcb
[  160.832952]  ___might_sleep+0x1a3/0x240
[  160.832952]  __flush_work+0x57/0x2e0
[  160.832952]  ? __mutex_lock+0x83/0x990
[  160.832952]  ? __kernfs_remove+0x24f/0x2e0
[  160.832952]  ? __kernfs_remove+0x1b2/0x2e0
[  160.832952]  ? mark_held_locks+0x50/0x80
[  160.832952]  ? get_work_pool+0x90/0x90
[  160.832952]  __cancel_work_timer+0x13c/0x1e0
[  160.832952]  ? netvsc_remove+0x1e/0x250 [hv_netvsc]
[  160.832952]  ? __lock_is_held+0x55/0x90
[  160.832952]  netvsc_remove+0x9a/0x250 [hv_netvsc]
[  160.832952]  vmbus_remove+0x26/0x30
[  160.832952]  device_release_driver_internal+0x18a/0x250
[  160.832952]  unbind_store+0xb4/0x180
[  160.832952]  kernfs_fop_write+0x113/0x1a0
[  160.832952]  __vfs_write+0x36/0x1a0
[  160.832952]  ? rcu_read_lock_sched_held+0x6b/0x80
[  160.832952]  ? rcu_sync_lockdep_assert+0x2e/0x60
[  160.832952]  ? __sb_start_write+0x141/0x1a0
[  160.832952]  ? vfs_write+0x184/0x1b0
[  160.832952]  vfs_write+0xbe/0x1b0
[  160.832952]  ksys_write+0x55/0xc0
[  160.832952]  do_syscall_64+0x60/0x1b0
[  160.832952]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  160.832952] RIP: 0033:0x7fe48f4c8154

Resolve this by getting RTNL earlier. This is safe because the subchannel
work queue does trylock on RTNL and will detect the race.

Fixes: 7b2ee50 ("hv_netvsc: common detach logic")
Signed-off-by: Stephen Hemminger <[email protected]>
Reviewed-by: Haiyang Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 018349d)

Orabug: 28671425

Signed-off-by: Liam Merwick <[email protected]>
Reviewed-by: Darren Kenny <[email protected]>
Reviewed-by: Alejandro Jimenez <[email protected]>
Tested-by: Vijay Balakrishna <[email protected]>
gregmarsden pushed a commit that referenced this pull request Nov 23, 2018
The customer hit this crash few times.

PID: 31556  TASK: ffff880f823caa00  CPU: 1   COMMAND: "cellsrv"
 #0 [ffff880f823db850] machine_kexec at ffffffff8105d93c
 #1 [ffff880f823db8b0] crash_kexec at ffffffff811103b3
 #2 [ffff880f823db980] oops_end at ffffffff8101a788
 #3 [ffff880f823db9b0] no_context at ffffffff8106b9cf
 #4 [ffff880f823dba20] __bad_area_nosemaphore at ffffffff8106bc9d
 #5 [ffff880f823dba70] bad_area at ffffffff8106be97
 #6 [ffff880f823dbaa0] __do_page_fault at ffffffff8106c71e
 #7 [ffff880f823dbb00] do_page_fault at ffffffff8106c81f
 #8 [ffff880f823dbb40] page_fault at ffffffff816b5a9f
    [exception RIP: rds_ib_inc_copy_to_user+104]
    RIP: ffffffffa04607b8  RSP: ffff880f823dbbf8  RFLAGS: 00010287
    RAX: 0000000000000340  RBX: 0000000000001000  RCX: 0000000000004000
    RDX: 0000000000001000  RSI: ffff88176cea2000  RDI: ffff8817d291f520
    RBP: ffff880f823dbc48   R8: 0000000000001340   R9: 0000000000001000
    R10: 0000000000001200  R11: ffff880f823dc000  R12: ffff880f823dbed0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880f823dbc50] rds_recvmsg at ffffffffa041d837 [rds]

int rds_ib_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to)
...
...
        ibinc = container_of(inc, struct rds_ib_incoming, ii_inc);
        frag = list_entry(ibinc->ii_frags.next, struct rds_page_frag, f_item);
        len = be32_to_cpu(inc->i_hdr.h_len);
        sg = frag->f_sg;

        while (iov_iter_count(to) && copied < len) {
                to_copy = min_t(unsigned long, iov_iter_count(to),
                                sg->length - frag_off);
                ...

sg is NULL and it crashes accessing sg->length above.

The cause looks like is due to ic->i_frag_sz returning incorrect value.
16KB when 4KB was expected.

                if (copied % ic->i_frag_sz == 0) {
                        frag = list_entry(frag->f_item.next,
                                          struct rds_page_frag, f_item);
                        frag_off = 0;
                        sg = frag->f_sg;
                }

The other end is using 4KB RDS fragsize (Solaris Super Cluster).
This end is UEK4 (4.1.12-94.8.4.el6uek.x86_64).

The message being copied arrived over 4KB RDS frag size connection.
But during the above check ic->i_frag_sz is 16KB.
This can happen during a reconnect at the connection setup phase.
We start off with ic->i_frag_sz as 16KB. Then settle down at 4KB.

Failing this check
  if (copied % ic->i_frag_sz == 0) {
can result in sg not getting set correctly.

Say, "copied" = 4KB but ic->i_frag_sz is 16KB when it should be 4KB.

During race condition with a reconnect, ic->i_frag_sz can be 16KB
even though once the connection is set up it settled down to 4KB.
It can change from 4KB to 16KB and back to 4KB during connection setup
due to reconnect.

We started seeing this crash after bug 26848749.
But prior to that the same scenario could result in data copied to user
from incorrect "sg" resulting in data corruption.

Orabug: 28748008

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>
gregmarsden pushed a commit that referenced this pull request Dec 14, 2018
[ Upstream commit c77ec61 ]

This patch adds to do sanity check with {sit,nat}_ver_bitmap_bytesize
during mount, in order to avoid accessing across cache boundary with
this abnormal bitmap size.

- Overview
buffer overrun in build_sit_info() when mounting a crafted f2fs image

- Reproduce

- Kernel message
[  548.580867] F2FS-fs (loop0): Invalid log blocks per segment (8201)

[  548.580877] F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
[  548.584979] ==================================================================
[  548.586568] BUG: KASAN: use-after-free in kmemdup+0x36/0x50
[  548.587715] Read of size 64 at addr ffff8801e9c265ff by task mount/1295

[  548.589428] CPU: 1 PID: 1295 Comm: mount Not tainted 4.18.0-rc1+ #4
[  548.589432] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  548.589438] Call Trace:
[  548.589474]  dump_stack+0x7b/0xb5
[  548.589487]  print_address_description+0x70/0x290
[  548.589492]  kasan_report+0x291/0x390
[  548.589496]  ? kmemdup+0x36/0x50
[  548.589509]  check_memory_region+0x139/0x190
[  548.589514]  memcpy+0x23/0x50
[  548.589518]  kmemdup+0x36/0x50
[  548.589545]  f2fs_build_segment_manager+0x8fa/0x3410
[  548.589551]  ? __asan_loadN+0xf/0x20
[  548.589560]  ? f2fs_sanity_check_ckpt+0x1be/0x240
[  548.589566]  ? f2fs_flush_sit_entries+0x10c0/0x10c0
[  548.589587]  ? __put_user_ns+0x40/0x40
[  548.589604]  ? find_next_bit+0x57/0x90
[  548.589610]  f2fs_fill_super+0x194b/0x2b40
[  548.589617]  ? f2fs_commit_super+0x1b0/0x1b0
[  548.589637]  ? set_blocksize+0x90/0x140
[  548.589651]  mount_bdev+0x1c5/0x210
[  548.589655]  ? f2fs_commit_super+0x1b0/0x1b0
[  548.589667]  f2fs_mount+0x15/0x20
[  548.589672]  mount_fs+0x60/0x1a0
[  548.589683]  ? alloc_vfsmnt+0x309/0x360
[  548.589688]  vfs_kern_mount+0x6b/0x1a0
[  548.589699]  do_mount+0x34a/0x18c0
[  548.589710]  ? lockref_put_or_lock+0xcf/0x160
[  548.589716]  ? copy_mount_string+0x20/0x20
[  548.589728]  ? memcg_kmem_put_cache+0x1b/0xa0
[  548.589734]  ? kasan_check_write+0x14/0x20
[  548.589740]  ? _copy_from_user+0x6a/0x90
[  548.589744]  ? memdup_user+0x42/0x60
[  548.589750]  ksys_mount+0x83/0xd0
[  548.589755]  __x64_sys_mount+0x67/0x80
[  548.589781]  do_syscall_64+0x78/0x170
[  548.589797]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  548.589820] RIP: 0033:0x7f76fc331b9a
[  548.589821] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[  548.589880] RSP: 002b:00007ffd4f0a0e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[  548.589890] RAX: ffffffffffffffda RBX: 000000000146c030 RCX: 00007f76fc331b9a
[  548.589892] RDX: 000000000146c210 RSI: 000000000146df30 RDI: 0000000001474ec0
[  548.589895] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[  548.589897] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001474ec0
[  548.589900] R13: 000000000146c210 R14: 0000000000000000 R15: 0000000000000003

[  548.590242] The buggy address belongs to the page:
[  548.591243] page:ffffea0007a70980 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[  548.592886] flags: 0x2ffff0000000000()
[  548.593665] raw: 02ffff0000000000 dead000000000100 dead000000000200 0000000000000000
[  548.595258] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[  548.603713] page dumped because: kasan: bad access detected

[  548.605203] Memory state around the buggy address:
[  548.606198]  ffff8801e9c26480: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  548.607676]  ffff8801e9c26500: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  548.609157] >ffff8801e9c26580: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  548.610629]                                                                 ^
[  548.612088]  ffff8801e9c26600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  548.613674]  ffff8801e9c26680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  548.615141] ==================================================================
[  548.616613] Disabling lock debugging due to kernel taint
[  548.622871] WARNING: CPU: 1 PID: 1295 at mm/page_alloc.c:4065 __alloc_pages_slowpath+0xe4a/0x1420
[  548.622878] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[  548.623217] CPU: 1 PID: 1295 Comm: mount Tainted: G    B             4.18.0-rc1+ #4
[  548.623219] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  548.623226] RIP: 0010:__alloc_pages_slowpath+0xe4a/0x1420
[  548.623227] Code: ff ff 01 89 85 c8 fe ff ff e9 91 fc ff ff 41 89 c5 e9 5c fc ff ff 0f 0b 89 f8 25 ff ff f7 ff 89 85 8c fe ff ff e9 d5 f2 ff ff <0f> 0b e9 65 f2 ff ff 65 8b 05 38 81 d2 47 f6 c4 01 74 1c 65 48 8b
[  548.623281] RSP: 0018:ffff8801f28c7678 EFLAGS: 00010246
[  548.623284] RAX: 0000000000000000 RBX: 00000000006040c0 RCX: ffffffffb82f73b7
[  548.623287] RDX: 1ffff1003e518eeb RSI: 000000000000000c RDI: 0000000000000000
[  548.623290] RBP: ffff8801f28c7880 R08: 0000000000000000 R09: ffffed0047fff2c5
[  548.623292] R10: 0000000000000001 R11: ffffed0047fff2c4 R12: ffff8801e88de040
[  548.623295] R13: 00000000006040c0 R14: 000000000000000c R15: ffff8801f28c7938
[  548.623299] FS:  00007f76fca51840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[  548.623302] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  548.623304] CR2: 00007f19b9171760 CR3: 00000001ed952000 CR4: 00000000000006e0
[  548.623317] Call Trace:
[  548.623325]  ? kasan_check_read+0x11/0x20
[  548.623330]  ? __zone_watermark_ok+0x92/0x240
[  548.623336]  ? get_page_from_freelist+0x1c3/0x1d90
[  548.623347]  ? _raw_spin_lock_irqsave+0x2a/0x60
[  548.623353]  ? warn_alloc+0x250/0x250
[  548.623358]  ? save_stack+0x46/0xd0
[  548.623361]  ? kasan_kmalloc+0xad/0xe0
[  548.623366]  ? __isolate_free_page+0x2a0/0x2a0
[  548.623370]  ? mount_fs+0x60/0x1a0
[  548.623374]  ? vfs_kern_mount+0x6b/0x1a0
[  548.623378]  ? do_mount+0x34a/0x18c0
[  548.623383]  ? ksys_mount+0x83/0xd0
[  548.623387]  ? __x64_sys_mount+0x67/0x80
[  548.623391]  ? do_syscall_64+0x78/0x170
[  548.623396]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  548.623401]  __alloc_pages_nodemask+0x3c5/0x400
[  548.623407]  ? __alloc_pages_slowpath+0x1420/0x1420
[  548.623412]  ? __mutex_lock_slowpath+0x20/0x20
[  548.623417]  ? kvmalloc_node+0x31/0x80
[  548.623424]  alloc_pages_current+0x75/0x110
[  548.623436]  kmalloc_order+0x24/0x60
[  548.623442]  kmalloc_order_trace+0x24/0xb0
[  548.623448]  __kmalloc_track_caller+0x207/0x220
[  548.623455]  ? f2fs_build_node_manager+0x399/0xbb0
[  548.623460]  kmemdup+0x20/0x50
[  548.623465]  f2fs_build_node_manager+0x399/0xbb0
[  548.623470]  f2fs_fill_super+0x195e/0x2b40
[  548.623477]  ? f2fs_commit_super+0x1b0/0x1b0
[  548.623481]  ? set_blocksize+0x90/0x140
[  548.623486]  mount_bdev+0x1c5/0x210
[  548.623489]  ? f2fs_commit_super+0x1b0/0x1b0
[  548.623495]  f2fs_mount+0x15/0x20
[  548.623498]  mount_fs+0x60/0x1a0
[  548.623503]  ? alloc_vfsmnt+0x309/0x360
[  548.623508]  vfs_kern_mount+0x6b/0x1a0
[  548.623513]  do_mount+0x34a/0x18c0
[  548.623518]  ? lockref_put_or_lock+0xcf/0x160
[  548.623523]  ? copy_mount_string+0x20/0x20
[  548.623528]  ? memcg_kmem_put_cache+0x1b/0xa0
[  548.623533]  ? kasan_check_write+0x14/0x20
[  548.623537]  ? _copy_from_user+0x6a/0x90
[  548.623542]  ? memdup_user+0x42/0x60
[  548.623547]  ksys_mount+0x83/0xd0
[  548.623552]  __x64_sys_mount+0x67/0x80
[  548.623557]  do_syscall_64+0x78/0x170
[  548.623562]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  548.623566] RIP: 0033:0x7f76fc331b9a
[  548.623567] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[  548.623632] RSP: 002b:00007ffd4f0a0e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[  548.623636] RAX: ffffffffffffffda RBX: 000000000146c030 RCX: 00007f76fc331b9a
[  548.623639] RDX: 000000000146c210 RSI: 000000000146df30 RDI: 0000000001474ec0
[  548.623641] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[  548.623643] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001474ec0
[  548.623646] R13: 000000000146c210 R14: 0000000000000000 R15: 0000000000000003
[  548.623650] ---[ end trace 4ce02f25ff7d3df5 ]---
[  548.623656] F2FS-fs (loop0): Failed to initialize F2FS node manager
[  548.627936] F2FS-fs (loop0): Invalid log blocks per segment (8201)

[  548.627940] F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
[  548.635835] F2FS-fs (loop0): Failed to initialize F2FS node manager

- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.c#L3578

	sit_i->sit_bitmap = kmemdup(src_bitmap, bitmap_size, GFP_KERNEL);

Buffer overrun happens when doing memcpy. I suspect there is missing (inconsistent) checks on bitmap_size.

Reported by Wen Xu ([email protected]) from SSLab, Gatech.

Reported-by: Wen Xu <[email protected]>
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Dec 14, 2018
[ Upstream commit 46b3722 ]

We occasionaly hit following assert failure in 'perf top', when processing the
/proc info in multiple threads.

  perf: ...include/linux/refcount.h:109: refcount_inc:
        Assertion `!(!refcount_inc_not_zero(r))' failed.

The gdb backtrace looks like this:

  [Switching to Thread 0x7ffff11ba700 (LWP 13749)]
  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  #1  0x00007ffff5085800 in abort () from /lib64/libc.so.6
  #2  0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
  #3  0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
  #4  0x0000000000535373 in refcount_inc (r=0x7fffdc009be0)
      at ...include/linux/refcount.h:109
  #5  0x00000000005354f1 in comm_str__get (cs=0x7fffdc009bc0)
      at util/comm.c:24
  #6  0x00000000005356bd in __comm_str__findnew (str=0x7fffd000b260 ":2",
      root=0xbed5c0 <comm_str_root>) at util/comm.c:72
  #7  0x000000000053579e in comm_str__findnew (str=0x7fffd000b260 ":2",
      root=0xbed5c0 <comm_str_root>) at util/comm.c:95
  #8  0x000000000053582e in comm__new (str=0x7fffd000b260 ":2",
      timestamp=0, exec=false) at util/comm.c:111
  #9  0x00000000005363bc in thread__new (pid=2, tid=2) at util/thread.c:57
  #10 0x0000000000523da0 in ____machine__findnew_thread (machine=0xbfde38,
      threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:457
  #11 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
  ...

The failing assertion is this one:

  REFCOUNT_WARN(!refcount_inc_not_zero(r), ...

The problem is that we keep global comm_str_root list, which
is accessed by multiple threads during the 'perf top' startup
and following 2 paths can race:

  thread 1:
    ...
    thread__new
      comm__new
        comm_str__findnew
          down_write(&comm_str_lock);
          __comm_str__findnew
            comm_str__get

  thread 2:
    ...
    comm__override or comm__free
      comm_str__put
        refcount_dec_and_test
          down_write(&comm_str_lock);
          rb_erase(&cs->rb_node, &comm_str_root);

Because thread 2 first decrements the refcnt and only after then it removes the
struct comm_str from the list, the thread 1 can find this object on the list
with refcnt equls to 0 and hit the assert.

This patch fixes the thread 1 __comm_str__findnew path, by ignoring objects
that already dropped the refcnt to 0. For the rest of the objects we take the
refcnt before comparing its name and release it afterwards with comm_str__put,
which can also release the object completely.

Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Lukasz Odzioba <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/20180720101740.GA27176@krava
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Dec 14, 2018
commit 8edfe2e upstream.

Commit 822fb18 ("xen-netfront: wait xenbus state change when load
module manually") added a new wait queue to wait on for a state change
when the module is loaded manually. Unfortunately there is no wakeup
anywhere to stop that waiting.

Instead of introducing a new wait queue rename the existing
module_unload_q to module_wq and use it for both purposes (loading and
unloading).

As any state change of the backend might be intended to stop waiting
do the wake_up_all() in any case when netback_changed() is called.

Fixes: 822fb18 ("xen-netfront: wait xenbus state change when load module manually")
Cc: <[email protected]> #4.18
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Dec 14, 2018
[ Upstream commit b71c69c ]

Fixes this warning that was provoked by a pairing:

[60258.016221] WARNING: possible recursive locking detected
[60258.021558] 4.15.0-RD1812-BSP #1 Tainted: G           O
[60258.027146] --------------------------------------------
[60258.032464] kworker/u5:0/70 is trying to acquire lock:
[60258.037609]  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}, at: [<87759073>] bt_accept_enqueue+0x3c/0x74
[60258.046863]
[60258.046863] but task is already holding lock:
[60258.052704]  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}, at: [<d22d7106>] l2cap_sock_new_connection_cb+0x1c/0x88
[60258.062905]
[60258.062905] other info that might help us debug this:
[60258.069441]  Possible unsafe locking scenario:
[60258.069441]
[60258.075368]        CPU0
[60258.077821]        ----
[60258.080272]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP);
[60258.085510]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP);
[60258.090748]
[60258.090748]  *** DEADLOCK ***
[60258.090748]
[60258.096676]  May be due to missing lock nesting notation
[60258.096676]
[60258.103472] 5 locks held by kworker/u5:0/70:
[60258.107747]  #0:  ((wq_completion)%shdev->name#2){+.+.}, at: [<9460d092>] process_one_work+0x130/0x4fc
[60258.117263]  #1:  ((work_completion)(&hdev->rx_work)){+.+.}, at: [<9460d092>] process_one_work+0x130/0x4fc
[60258.126942]  #2:  (&conn->chan_lock){+.+.}, at: [<7877c8c3>] l2cap_connect+0x80/0x4f8
[60258.134806]  #3:  (&chan->lock/2){+.+.}, at: [<2e16c724>] l2cap_connect+0x8c/0x4f8
[60258.142410]  #4:  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}, at: [<d22d7106>] l2cap_sock_new_connection_cb+0x1c/0x88
[60258.153043]
[60258.153043] stack backtrace:
[60258.157413] CPU: 1 PID: 70 Comm: kworker/u5:0 Tainted: G           O     4.15.0-RD1812-BSP #1
[60258.165945] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[60258.172485] Workqueue: hci0 hci_rx_work
[60258.176331] Backtrace:
[60258.178797] [<8010c9fc>] (dump_backtrace) from [<8010ccbc>] (show_stack+0x18/0x1c)
[60258.186379]  r7:80e55fe4 r6:80e55fe4 r5:20050093 r4:00000000
[60258.192058] [<8010cca4>] (show_stack) from [<809864e8>] (dump_stack+0xb0/0xdc)
[60258.199301] [<80986438>] (dump_stack) from [<8016ecc8>] (__lock_acquire+0xffc/0x11d4)
[60258.207144]  r9:5e2bb019 r8:630f974c r7:ba8a5940 r6:ba8a5ed8 r5:815b5220 r4:80fa081c
[60258.214901] [<8016dccc>] (__lock_acquire) from [<8016f620>] (lock_acquire+0x78/0x98)
[60258.222655]  r10:00000040 r9:00000040 r8:808729f0 r7:00000001 r6:00000000 r5:60050013
[60258.230491]  r4:00000000
[60258.233045] [<8016f5a8>] (lock_acquire) from [<806ee974>] (lock_sock_nested+0x64/0x88)
[60258.240970]  r7:00000000 r6:b796e870 r5:00000001 r4:b796e800
[60258.246643] [<806ee910>] (lock_sock_nested) from [<808729f0>] (bt_accept_enqueue+0x3c/0x74)
[60258.255004]  r8:00000001 r7:ba7d3c00 r6:ba7d3ea4 r5:ba7d2000 r4:b796e800
[60258.261717] [<808729b4>] (bt_accept_enqueue) from [<808aa39c>] (l2cap_sock_new_connection_cb+0x68/0x88)
[60258.271117]  r5:b796e800 r4:ba7d2000
[60258.274708] [<808aa334>] (l2cap_sock_new_connection_cb) from [<808a294c>] (l2cap_connect+0x190/0x4f8)
[60258.283933]  r5:00000001 r4:ba6dce00
[60258.287524] [<808a27bc>] (l2cap_connect) from [<808a4a14>] (l2cap_recv_frame+0x744/0x2cf8)
[60258.295800]  r10:ba6dcf24 r9:00000004 r8:b78d8014 r7:00000004 r6:bb05d000 r5:00000004
[60258.303635]  r4:bb05d008
[60258.306183] [<808a42d0>] (l2cap_recv_frame) from [<808a7808>] (l2cap_recv_acldata+0x210/0x214)
[60258.314805]  r10:b78e7800 r9:bb05d960 r8:00000001 r7:bb05d000 r6:0000000c r5:b7957a80
[60258.322641]  r4:ba6dce00
[60258.325188] [<808a75f8>] (l2cap_recv_acldata) from [<8087630c>] (hci_rx_work+0x35c/0x4e8)
[60258.333374]  r6:80e5743c r5:bb05d7c8 r4:b7957a80
[60258.338004] [<80875fb0>] (hci_rx_work) from [<8013dc7c>] (process_one_work+0x1a4/0x4fc)
[60258.346018]  r10:00000001 r9:00000000 r8:baabfef8 r7:ba997500 r6:baaba800 r5:baaa5d00
[60258.353853]  r4:bb05d7c8
[60258.356401] [<8013dad8>] (process_one_work) from [<8013e028>] (worker_thread+0x54/0x5cc)
[60258.364503]  r10:baabe038 r9:baaba834 r8:80e05900 r7:00000088 r6:baaa5d18 r5:baaba800
[60258.372338]  r4:baaa5d00
[60258.374888] [<8013dfd4>] (worker_thread) from [<801448f8>] (kthread+0x134/0x160)
[60258.382295]  r10:ba8310b8 r9:bb07dbfc r8:8013dfd4 r7:baaa5d00 r6:00000000 r5:baaa8ac0
[60258.390130]  r4:ba831080
[60258.392682] [<801447c4>] (kthread) from [<801080b4>] (ret_from_fork+0x14/0x20)
[60258.399915]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:801447c4
[60258.407751]  r4:baaa8ac0 r3:baabe000

Signed-off-by: Philipp Puschmann <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Jan 18, 2019
commit 3e1a127 upstream.

When we disable hotplugging on the GPU, we need to be able to
synchronize with each connector's hotplug interrupt handler before the
interrupt is finally disabled. This can be a problem however, since
nouveau_connector_detect() currently grabs a runtime power reference
when handling connector probing. This will deadlock the runtime suspend
handler like so:

[  861.480896] INFO: task kworker/0:2:61 blocked for more than 120 seconds.
[  861.483290]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.485158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.486332] kworker/0:2     D    0    61      2 0x80000000
[  861.487044] Workqueue: events nouveau_display_hpd_work [nouveau]
[  861.487737] Call Trace:
[  861.488394]  __schedule+0x322/0xaf0
[  861.489070]  schedule+0x33/0x90
[  861.489744]  rpm_resume+0x19c/0x850
[  861.490392]  ? finish_wait+0x90/0x90
[  861.491068]  __pm_runtime_resume+0x4e/0x90
[  861.491753]  nouveau_display_hpd_work+0x22/0x60 [nouveau]
[  861.492416]  process_one_work+0x231/0x620
[  861.493068]  worker_thread+0x44/0x3a0
[  861.493722]  kthread+0x12b/0x150
[  861.494342]  ? wq_pool_ids_show+0x140/0x140
[  861.494991]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.495648]  ret_from_fork+0x3a/0x50
[  861.496304] INFO: task kworker/6:2:320 blocked for more than 120 seconds.
[  861.496968]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.497654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.498341] kworker/6:2     D    0   320      2 0x80000080
[  861.499045] Workqueue: pm pm_runtime_work
[  861.499739] Call Trace:
[  861.500428]  __schedule+0x322/0xaf0
[  861.501134]  ? wait_for_completion+0x104/0x190
[  861.501851]  schedule+0x33/0x90
[  861.502564]  schedule_timeout+0x3a5/0x590
[  861.503284]  ? mark_held_locks+0x58/0x80
[  861.503988]  ? _raw_spin_unlock_irq+0x2c/0x40
[  861.504710]  ? wait_for_completion+0x104/0x190
[  861.505417]  ? trace_hardirqs_on_caller+0xf4/0x190
[  861.506136]  ? wait_for_completion+0x104/0x190
[  861.506845]  wait_for_completion+0x12c/0x190
[  861.507555]  ? wake_up_q+0x80/0x80
[  861.508268]  flush_work+0x1c9/0x280
[  861.508990]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[  861.509735]  nvif_notify_put+0xb1/0xc0 [nouveau]
[  861.510482]  nouveau_display_fini+0xbd/0x170 [nouveau]
[  861.511241]  nouveau_display_suspend+0x67/0x120 [nouveau]
[  861.511969]  nouveau_do_suspend+0x5e/0x2d0 [nouveau]
[  861.512715]  nouveau_pmops_runtime_suspend+0x47/0xb0 [nouveau]
[  861.513435]  pci_pm_runtime_suspend+0x6b/0x180
[  861.514165]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.514897]  __rpm_callback+0x7a/0x1d0
[  861.515618]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.516313]  rpm_callback+0x24/0x80
[  861.517027]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.517741]  rpm_suspend+0x142/0x6b0
[  861.518449]  pm_runtime_work+0x97/0xc0
[  861.519144]  process_one_work+0x231/0x620
[  861.519831]  worker_thread+0x44/0x3a0
[  861.520522]  kthread+0x12b/0x150
[  861.521220]  ? wq_pool_ids_show+0x140/0x140
[  861.521925]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.522622]  ret_from_fork+0x3a/0x50
[  861.523299] INFO: task kworker/6:0:1329 blocked for more than 120 seconds.
[  861.523977]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.524644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.525349] kworker/6:0     D    0  1329      2 0x80000000
[  861.526073] Workqueue: events nvif_notify_work [nouveau]
[  861.526751] Call Trace:
[  861.527411]  __schedule+0x322/0xaf0
[  861.528089]  schedule+0x33/0x90
[  861.528758]  rpm_resume+0x19c/0x850
[  861.529399]  ? finish_wait+0x90/0x90
[  861.530073]  __pm_runtime_resume+0x4e/0x90
[  861.530798]  nouveau_connector_detect+0x7e/0x510 [nouveau]
[  861.531459]  ? ww_mutex_lock+0x47/0x80
[  861.532097]  ? ww_mutex_lock+0x47/0x80
[  861.532819]  ? drm_modeset_lock+0x88/0x130 [drm]
[  861.533481]  drm_helper_probe_detect_ctx+0xa0/0x100 [drm_kms_helper]
[  861.534127]  drm_helper_hpd_irq_event+0xa4/0x120 [drm_kms_helper]
[  861.534940]  nouveau_connector_hotplug+0x98/0x120 [nouveau]
[  861.535556]  nvif_notify_work+0x2d/0xb0 [nouveau]
[  861.536221]  process_one_work+0x231/0x620
[  861.536994]  worker_thread+0x44/0x3a0
[  861.537757]  kthread+0x12b/0x150
[  861.538463]  ? wq_pool_ids_show+0x140/0x140
[  861.539102]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.539815]  ret_from_fork+0x3a/0x50
[  861.540521]
               Showing all locks held in the system:
[  861.541696] 2 locks held by kworker/0:2/61:
[  861.542406]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.543071]  #1: 0000000076868126 ((work_completion)(&drm->hpd_work)){+.+.}, at: process_one_work+0x1b3/0x620
[  861.543814] 1 lock held by khungtaskd/64:
[  861.544535]  #0: 0000000059db4b53 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[  861.545160] 3 locks held by kworker/6:2/320:
[  861.545896]  #0: 00000000d9e1bc59 ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.546702]  #1: 00000000c9f92d84 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[  861.547443]  #2: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: nouveau_display_fini+0x96/0x170 [nouveau]
[  861.548146] 1 lock held by dmesg/983:
[  861.548889] 2 locks held by zsh/1250:
[  861.549605]  #0: 00000000348e3cf6 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[  861.550393]  #1: 000000007009a7a8 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870
[  861.551122] 6 locks held by kworker/6:0/1329:
[  861.551957]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.552765]  #1: 00000000ddb499ad ((work_completion)(&notify->work)#2){+.+.}, at: process_one_work+0x1b3/0x620
[  861.553582]  #2: 000000006e013cbe (&dev->mode_config.mutex){+.+.}, at: drm_helper_hpd_irq_event+0x6c/0x120 [drm_kms_helper]
[  861.554357]  #3: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: drm_helper_hpd_irq_event+0x78/0x120 [drm_kms_helper]
[  861.555227]  #4: 0000000044f294d9 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x3d/0x100 [drm_kms_helper]
[  861.556133]  #5: 00000000db193642 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x4b/0x130 [drm]

[  861.557864] =============================================

[  861.559507] NMI backtrace for cpu 2
[  861.560363] CPU: 2 PID: 64 Comm: khungtaskd Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.561197] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018
[  861.561948] Call Trace:
[  861.562757]  dump_stack+0x8e/0xd3
[  861.563516]  nmi_cpu_backtrace.cold.3+0x14/0x5a
[  861.564269]  ? lapic_can_unplug_cpu.cold.27+0x42/0x42
[  861.565029]  nmi_trigger_cpumask_backtrace+0xa1/0xae
[  861.565789]  arch_trigger_cpumask_backtrace+0x19/0x20
[  861.566558]  watchdog+0x316/0x580
[  861.567355]  kthread+0x12b/0x150
[  861.568114]  ? reset_hung_task_detector+0x20/0x20
[  861.568863]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.569598]  ret_from_fork+0x3a/0x50
[  861.570370] Sending NMI from CPU 2 to CPUs 0-1,3-7:
[  861.571426] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x7f/0x120
[  861.571429] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x7f/0x120
[  861.571432] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x7f/0x120
[  861.571464] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x7f/0x120
[  861.571467] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x7f/0x120
[  861.571469] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x7f/0x120
[  861.571472] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x7f/0x120
[  861.572428] Kernel panic - not syncing: hung_task: blocked tasks

So: fix this by making it so that normal hotplug handling /only/ happens
so long as the GPU is currently awake without any pending runtime PM
requests. In the event that a hotplug occurs while the device is
suspending or resuming, we can simply defer our response until the GPU
is fully runtime resumed again.

Changes since v4:
- Use a new trick I came up with using pm_runtime_get() instead of the
  hackish junk we had before

Signed-off-by: Lyude Paul <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
Acked-by: Daniel Vetter <[email protected]>
Cc: [email protected]
Cc: Lukas Wunner <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Jan 18, 2019
…error

The sequence that leads to this state is as follows.

1) First we see CQ error logged.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core
0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2
vendor_error_syndrome=0x0

2) That is followed by the drop of the associated RDS connection.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection
<192.168.54.43,192.168.54.1,0> dropped due to 'qp event'

3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that.

4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection.

crash64> bt 62577
PID: 62577  TASK: ffff88143f045400  CPU: 4   COMMAND: "kworker/u224:1"
 #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b
 #1 [ffff8813663bbbb0] schedule at ffffffff816abca7
 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71
 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma]
 #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds]
 #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds]
 #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1
 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b
 #8 [ffff8813663bbec0] kthread at ffffffff810a304b
 #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752
crash64>

It was stuck here in rds_ib_conn_shutdown for ever:

                /* quiesce tx and rx completion before tearing down */
                while (!wait_event_timeout(rds_ib_ring_empty_wait,
                                rds_ib_ring_empty(&ic->i_recv_ring) &&
                                (atomic_read(&ic->i_signaled_sends) == 0),
                                msecs_to_jiffies(5000))) {

                        /* Try to reap pending RX completions every 5 secs */
                        if (!rds_ib_ring_empty(&ic->i_recv_ring)) {
                                spin_lock_bh(&ic->i_rx_lock);
                                rds_ib_rx(ic);
                                spin_unlock_bh(&ic->i_rx_lock);
                        }
                }

The recv ring was not empty.
w_alloc_ptr = 560
w_free_ptr  = 256

This is what Mellanox had to say:
When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will
generate Async event to notify this error, also the QPs that tries to access
this CQ will be put to error state but will not be flushed since we must not
post CQEs to a broken CQ. The QP that tries to access will also issue an
Async catas event.

In summary we cannot wait for any more WR_FLUSH_ERRs in that state.

Orabug: 29180452

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>
gregmarsden pushed a commit that referenced this pull request Jan 18, 2019
The sequence that leads to this state is as follows.

1) First we see CQ error logged.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core
0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2
vendor_error_syndrome=0x0

2) That is followed by the drop of the associated RDS connection.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection
<192.168.54.43,192.168.54.1,0> dropped due to 'qp event'

3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that.

4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection.

crash64> bt 62577
PID: 62577  TASK: ffff88143f045400  CPU: 4   COMMAND: "kworker/u224:1"
 #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b
 #1 [ffff8813663bbbb0] schedule at ffffffff816abca7
 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71
 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma]
 #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds]
 #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds]
 #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1
 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b
 #8 [ffff8813663bbec0] kthread at ffffffff810a304b
 #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752
crash64>

It was stuck here in rds_ib_conn_shutdown for ever:

                /* quiesce tx and rx completion before tearing down */
                while (!wait_event_timeout(rds_ib_ring_empty_wait,
                                rds_ib_ring_empty(&ic->i_recv_ring) &&
                                (atomic_read(&ic->i_signaled_sends) == 0),
                                msecs_to_jiffies(5000))) {

                        /* Try to reap pending RX completions every 5 secs */
                        if (!rds_ib_ring_empty(&ic->i_recv_ring)) {
                                spin_lock_bh(&ic->i_rx_lock);
                                rds_ib_rx(ic);
                                spin_unlock_bh(&ic->i_rx_lock);
                        }
                }

The recv ring was not empty.
w_alloc_ptr = 560
w_free_ptr  = 256

This is what Mellanox had to say:
When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will
generate Async event to notify this error, also the QPs that tries to access
this CQ will be put to error state but will not be flushed since we must not
post CQEs to a broken CQ. The QP that tries to access will also issue an
Async catas event.

In summary we cannot wait for any more WR_FLUSH_ERRs in that state.

Orabug: 28733324

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>
Signed-off-by: Brian Maly <[email protected]>
gregmarsden pushed a commit that referenced this pull request Feb 1, 2019
commit 764baba upstream.

Commit 31747ed ("ovl: hash directory inodes for fsnotify")
fixed an issue of inotify watch on directory that stops getting
events after dropping dentry caches.

A similar issue exists for non-dir non-upper files, for example:

$ mkdir -p lower upper work merged
$ touch lower/foo
$ mount -t overlay -o
lowerdir=lower,workdir=work,upperdir=upper none merged
$ inotifywait merged/foo &
$ echo 2 > /proc/sys/vm/drop_caches
$ cat merged/foo

inotifywait doesn't get the OPEN event, because ovl_lookup() called
from 'cat' allocates a new overlay inode and does not reuse the
watched inode.

Fix this by hashing non-dir overlay inodes by lower real inode in
the following cases that were not hashed before this change:
 - A non-upper overlay mount
 - A lower non-hardlink when index=off

A helper ovl_hash_bylower() was added to put all the logic and
documentation about which real inode an overlay inode is hashed by
into one place.

The issue dates back to initial version of overlayfs, but this
patch depends on ovl_inode code that was introduced in kernel v4.13.

Cc: <[email protected]> #v4.13
Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Mark Salyzyn <[email protected]> #4.14
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Feb 8, 2019
…_values fails

[ Upstream commit 7915919 ]

Fixes a use-after-free reported by KASAN when later
iscsi_target_login_sess_out gets called and it tries to access
conn->sess->se_sess:

Disabling lock debugging due to kernel taint
iSCSI Login timeout on Network Portal [::]:3260
iSCSI Login negotiation failed.
==================================================================
BUG: KASAN: use-after-free in
iscsi_target_login_sess_out.cold.12+0x58/0xff [iscsi_target_mod]
Read of size 8 at addr ffff880109d070c8 by task iscsi_np/980

CPU: 1 PID: 980 Comm: iscsi_np Tainted: G           O
4.17.8kasan.sess.connops+ #4
Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB,
BIOS 5.6.5 05/19/2014
Call Trace:
 dump_stack+0x71/0xac
 print_address_description+0x65/0x22e
 ? iscsi_target_login_sess_out.cold.12+0x58/0xff [iscsi_target_mod]
 kasan_report.cold.6+0x241/0x2fd
 iscsi_target_login_sess_out.cold.12+0x58/0xff [iscsi_target_mod]
 iscsi_target_login_thread+0x1086/0x1710 [iscsi_target_mod]
 ? __sched_text_start+0x8/0x8
 ? iscsi_target_login_sess_out+0x250/0x250 [iscsi_target_mod]
 ? __kthread_parkme+0xcc/0x100
 ? parse_args.cold.14+0xd3/0xd3
 ? iscsi_target_login_sess_out+0x250/0x250 [iscsi_target_mod]
 kthread+0x1a0/0x1c0
 ? kthread_bind+0x30/0x30
 ret_from_fork+0x35/0x40

Allocated by task 980:
 kasan_kmalloc+0xbf/0xe0
 kmem_cache_alloc_trace+0x112/0x210
 iscsi_target_login_thread+0x816/0x1710 [iscsi_target_mod]
 kthread+0x1a0/0x1c0
 ret_from_fork+0x35/0x40

Freed by task 980:
 __kasan_slab_free+0x125/0x170
 kfree+0x90/0x1d0
 iscsi_target_login_thread+0x1577/0x1710 [iscsi_target_mod]
 kthread+0x1a0/0x1c0
 ret_from_fork+0x35/0x40

The buggy address belongs to the object at ffff880109d06f00
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 456 bytes inside of
 512-byte region [ffff880109d06f00, ffff880109d07100)
The buggy address belongs to the page:
page:ffffea0004274180 count:1 mapcount:0 mapping:0000000000000000
index:0x0 compound_mapcount: 0
flags: 0x17fffc000008100(slab|head)
raw: 017fffc000008100 0000000000000000 0000000000000000 00000001000c000c
raw: dead000000000100 dead000000000200 ffff88011b002e00 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880109d06f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff880109d07000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff880109d07080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                              ^
 ffff880109d07100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff880109d07180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

Signed-off-by: Vincent Pelletier <[email protected]>
[rebased against idr/ida changes and to handle ret review comments from Matthew]
Signed-off-by: Mike Christie <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Reviewed-by: Matthew Wilcox <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
The customer hit this crash few times.

PID: 31556  TASK: ffff880f823caa00  CPU: 1   COMMAND: "cellsrv"
 #0 [ffff880f823db850] machine_kexec at ffffffff8105d93c
 #1 [ffff880f823db8b0] crash_kexec at ffffffff811103b3
 #2 [ffff880f823db980] oops_end at ffffffff8101a788
 #3 [ffff880f823db9b0] no_context at ffffffff8106b9cf
 #4 [ffff880f823dba20] __bad_area_nosemaphore at ffffffff8106bc9d
 #5 [ffff880f823dba70] bad_area at ffffffff8106be97
 #6 [ffff880f823dbaa0] __do_page_fault at ffffffff8106c71e
 #7 [ffff880f823dbb00] do_page_fault at ffffffff8106c81f
 #8 [ffff880f823dbb40] page_fault at ffffffff816b5a9f
    [exception RIP: rds_ib_inc_copy_to_user+104]
    RIP: ffffffffa04607b8  RSP: ffff880f823dbbf8  RFLAGS: 00010287
    RAX: 0000000000000340  RBX: 0000000000001000  RCX: 0000000000004000
    RDX: 0000000000001000  RSI: ffff88176cea2000  RDI: ffff8817d291f520
    RBP: ffff880f823dbc48   R8: 0000000000001340   R9: 0000000000001000
    R10: 0000000000001200  R11: ffff880f823dc000  R12: ffff880f823dbed0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880f823dbc50] rds_recvmsg at ffffffffa041d837 [rds]

int rds_ib_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to)
...
...
        ibinc = container_of(inc, struct rds_ib_incoming, ii_inc);
        frag = list_entry(ibinc->ii_frags.next, struct rds_page_frag, f_item);
        len = be32_to_cpu(inc->i_hdr.h_len);
        sg = frag->f_sg;

        while (iov_iter_count(to) && copied < len) {
                to_copy = min_t(unsigned long, iov_iter_count(to),
                                sg->length - frag_off);
                ...

sg is NULL and it crashes accessing sg->length above.

The cause looks like is due to ic->i_frag_sz returning incorrect value.
16KB when 4KB was expected.

                if (copied % ic->i_frag_sz == 0) {
                        frag = list_entry(frag->f_item.next,
                                          struct rds_page_frag, f_item);
                        frag_off = 0;
                        sg = frag->f_sg;
                }

The other end is using 4KB RDS fragsize (Solaris Super Cluster).
This end is UEK4 (4.1.12-94.8.4.el6uek.x86_64).

The message being copied arrived over 4KB RDS frag size connection.
But during the above check ic->i_frag_sz is 16KB.
This can happen during a reconnect at the connection setup phase.
We start off with ic->i_frag_sz as 16KB. Then settle down at 4KB.

Failing this check
  if (copied % ic->i_frag_sz == 0) {
can result in sg not getting set correctly.

Say, "copied" = 4KB but ic->i_frag_sz is 16KB when it should be 4KB.

During race condition with a reconnect, ic->i_frag_sz can be 16KB
even though once the connection is set up it settled down to 4KB.
It can change from 4KB to 16KB and back to 4KB during connection setup
due to reconnect.

We started seeing this crash after bug 26848749.
But prior to that the same scenario could result in data copied to user
from incorrect "sg" resulting in data corruption.

Orabug: 28748008

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>

Orabug: 33590097

UEK6 => UEK7

(cherry picked from commit 14858a3)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>

Orabug: 33590087

UEK7 => LUCI

(cherry picked from commit e86878f)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
…error

The sequence that leads to this state is as follows.

1) First we see CQ error logged.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core
0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2
vendor_error_syndrome=0x0

2) That is followed by the drop of the associated RDS connection.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection
<192.168.54.43,192.168.54.1,0> dropped due to 'qp event'

3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that.

4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection.

crash64> bt 62577
PID: 62577  TASK: ffff88143f045400  CPU: 4   COMMAND: "kworker/u224:1"
 #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b
 #1 [ffff8813663bbbb0] schedule at ffffffff816abca7
 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71
 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma]
 #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds]
 #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds]
 #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1
 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b
 #8 [ffff8813663bbec0] kthread at ffffffff810a304b
 #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752
crash64>

It was stuck here in rds_ib_conn_shutdown for ever:

                /* quiesce tx and rx completion before tearing down */
                while (!wait_event_timeout(rds_ib_ring_empty_wait,
                                rds_ib_ring_empty(&ic->i_recv_ring) &&
                                (atomic_read(&ic->i_signaled_sends) == 0),
                                msecs_to_jiffies(5000))) {

                        /* Try to reap pending RX completions every 5 secs */
                        if (!rds_ib_ring_empty(&ic->i_recv_ring)) {
                                spin_lock_bh(&ic->i_rx_lock);
                                rds_ib_rx(ic);
                                spin_unlock_bh(&ic->i_rx_lock);
                        }
                }

The recv ring was not empty.
w_alloc_ptr = 560
w_free_ptr  = 256

This is what Mellanox had to say:
When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will
generate Async event to notify this error, also the QPs that tries to access
this CQ will be put to error state but will not be flushed since we must not
post CQEs to a broken CQ. The QP that tries to access will also issue an
Async catas event.

In summary we cannot wait for any more WR_FLUSH_ERRs in that state.

Orabug: 29180452

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>

Orabug: 33590097

UEK6 => UEK7

(cherry picked from commit 964cad6)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>

Orabug: 33590087

UEK7 => LUCI

(cherry picked from commit e40c8e4)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
Scenario:
1. Port down and do fail over
2. Ap do rds_bind syscall

PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
 #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
 #3 [ffff898e35f15b60] no_context at ffffffff8104854c
 #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
    RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX:ffffffff81c99d88
    RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI:ffff889b77f6fc00
    RBP: ffff898e35f15df0   R8: ffff896019ee08c8  R9:0000000000000000
    R10: 0000000000000400  R11: 0000000000000000  R12:ffff896019ee08c0
    R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
    ORIG_RAX: ffffffffffffffff  CS: 0010 SS: 0018
 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
 #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6

PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
 #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
 #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
 #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
 #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
 #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
 #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
 #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670

PID: 45659                          PID: 47039
rds_ib_laddr_check
  /* create id_priv with a null event_handler */
  rdma_create_id
  rdma_bind_addr
    cma_acquire_dev
      /* add id_priv to cma_dev->id_list */
      cma_attach_to_dev
                                    cma_ndev_work_handler
                                      /* event_hanlder is null */
                                      id_priv->id.event_handler

Orabug: 27530931

Signed-off-by: Guanglei Li <[email protected]>
Signed-off-by: Honglei Wang <[email protected]>
Reviewed-by: Junxiao Bi <[email protected]>
Reviewed-by: Yanjun Zhu <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Acked-by: Doug Ledford <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 2c0aa08)
Reviewed-by: Håkon Bugge <[email protected]>
Signed-off-by: Somasundaram Krishnasamy <[email protected]>

Orabug: 33590097

UEK6 => UEK7

(cherry picked from commit 39e0939)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>

Orabug: 33590087

UEK7 => LUCI

(cherry picked from commit 7d342f8)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
Scenario:
1. Port down and do fail over
2. Ap do rds_bind syscall

PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
 #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
 #3 [ffff898e35f15b60] no_context at ffffffff8104854c
 #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
    RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX:ffffffff81c99d88
    RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI:ffff889b77f6fc00
    RBP: ffff898e35f15df0   R8: ffff896019ee08c8  R9:0000000000000000
    R10: 0000000000000400  R11: 0000000000000000  R12:ffff896019ee08c0
    R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
    ORIG_RAX: ffffffffffffffff  CS: 0010 SS: 0018
 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
 #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6

PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
 #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
 #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
 #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
 #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
 #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
 #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
 #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670

PID: 45659                          PID: 47039
rds_ib_laddr_check
  /* create id_priv with a null event_handler */
  rdma_create_id
  rdma_bind_addr
    cma_acquire_dev
      /* add id_priv to cma_dev->id_list */
      cma_attach_to_dev
                                    cma_ndev_work_handler
                                      /* event_hanlder is null */
                                      id_priv->id.event_handler

Orabug: 27530931

Signed-off-by: Guanglei Li <[email protected]>
Signed-off-by: Honglei Wang <[email protected]>
Reviewed-by: Junxiao Bi <[email protected]>
Reviewed-by: Yanjun Zhu <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Acked-by: Doug Ledford <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 2c0aa08)
Reviewed-by: Håkon Bugge <[email protected]>
Signed-off-by: Somasundaram Krishnasamy <[email protected]>

Orabug: 33590097

UEK6 => UEK7

(cherry picked from commit 39e0939)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>

Orabug: 33590087

UEK7 => LUCI

(cherry picked from commit 7d342f8)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
The customer hit this crash few times.

PID: 31556  TASK: ffff880f823caa00  CPU: 1   COMMAND: "cellsrv"
 #0 [ffff880f823db850] machine_kexec at ffffffff8105d93c
 #1 [ffff880f823db8b0] crash_kexec at ffffffff811103b3
 #2 [ffff880f823db980] oops_end at ffffffff8101a788
 #3 [ffff880f823db9b0] no_context at ffffffff8106b9cf
 #4 [ffff880f823dba20] __bad_area_nosemaphore at ffffffff8106bc9d
 #5 [ffff880f823dba70] bad_area at ffffffff8106be97
 #6 [ffff880f823dbaa0] __do_page_fault at ffffffff8106c71e
 #7 [ffff880f823dbb00] do_page_fault at ffffffff8106c81f
 #8 [ffff880f823dbb40] page_fault at ffffffff816b5a9f
    [exception RIP: rds_ib_inc_copy_to_user+104]
    RIP: ffffffffa04607b8  RSP: ffff880f823dbbf8  RFLAGS: 00010287
    RAX: 0000000000000340  RBX: 0000000000001000  RCX: 0000000000004000
    RDX: 0000000000001000  RSI: ffff88176cea2000  RDI: ffff8817d291f520
    RBP: ffff880f823dbc48   R8: 0000000000001340   R9: 0000000000001000
    R10: 0000000000001200  R11: ffff880f823dc000  R12: ffff880f823dbed0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880f823dbc50] rds_recvmsg at ffffffffa041d837 [rds]

int rds_ib_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to)
...
...
        ibinc = container_of(inc, struct rds_ib_incoming, ii_inc);
        frag = list_entry(ibinc->ii_frags.next, struct rds_page_frag, f_item);
        len = be32_to_cpu(inc->i_hdr.h_len);
        sg = frag->f_sg;

        while (iov_iter_count(to) && copied < len) {
                to_copy = min_t(unsigned long, iov_iter_count(to),
                                sg->length - frag_off);
                ...

sg is NULL and it crashes accessing sg->length above.

The cause looks like is due to ic->i_frag_sz returning incorrect value.
16KB when 4KB was expected.

                if (copied % ic->i_frag_sz == 0) {
                        frag = list_entry(frag->f_item.next,
                                          struct rds_page_frag, f_item);
                        frag_off = 0;
                        sg = frag->f_sg;
                }

The other end is using 4KB RDS fragsize (Solaris Super Cluster).
This end is UEK4 (4.1.12-94.8.4.el6uek.x86_64).

The message being copied arrived over 4KB RDS frag size connection.
But during the above check ic->i_frag_sz is 16KB.
This can happen during a reconnect at the connection setup phase.
We start off with ic->i_frag_sz as 16KB. Then settle down at 4KB.

Failing this check
  if (copied % ic->i_frag_sz == 0) {
can result in sg not getting set correctly.

Say, "copied" = 4KB but ic->i_frag_sz is 16KB when it should be 4KB.

During race condition with a reconnect, ic->i_frag_sz can be 16KB
even though once the connection is set up it settled down to 4KB.
It can change from 4KB to 16KB and back to 4KB during connection setup
due to reconnect.

We started seeing this crash after bug 26848749.
But prior to that the same scenario could result in data copied to user
from incorrect "sg" resulting in data corruption.

Orabug: 28748008

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>

Orabug: 33590097

UEK6 => UEK7

(cherry picked from commit 14858a3)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>

Orabug: 33590087

UEK7 => LUCI

(cherry picked from commit e86878f)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
The customer hit this crash few times.

PID: 31556  TASK: ffff880f823caa00  CPU: 1   COMMAND: "cellsrv"
 #0 [ffff880f823db850] machine_kexec at ffffffff8105d93c
 #1 [ffff880f823db8b0] crash_kexec at ffffffff811103b3
 #2 [ffff880f823db980] oops_end at ffffffff8101a788
 #3 [ffff880f823db9b0] no_context at ffffffff8106b9cf
 #4 [ffff880f823dba20] __bad_area_nosemaphore at ffffffff8106bc9d
 #5 [ffff880f823dba70] bad_area at ffffffff8106be97
 #6 [ffff880f823dbaa0] __do_page_fault at ffffffff8106c71e
 #7 [ffff880f823dbb00] do_page_fault at ffffffff8106c81f
 #8 [ffff880f823dbb40] page_fault at ffffffff816b5a9f
    [exception RIP: rds_ib_inc_copy_to_user+104]
    RIP: ffffffffa04607b8  RSP: ffff880f823dbbf8  RFLAGS: 00010287
    RAX: 0000000000000340  RBX: 0000000000001000  RCX: 0000000000004000
    RDX: 0000000000001000  RSI: ffff88176cea2000  RDI: ffff8817d291f520
    RBP: ffff880f823dbc48   R8: 0000000000001340   R9: 0000000000001000
    R10: 0000000000001200  R11: ffff880f823dc000  R12: ffff880f823dbed0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880f823dbc50] rds_recvmsg at ffffffffa041d837 [rds]

int rds_ib_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to)
...
...
        ibinc = container_of(inc, struct rds_ib_incoming, ii_inc);
        frag = list_entry(ibinc->ii_frags.next, struct rds_page_frag, f_item);
        len = be32_to_cpu(inc->i_hdr.h_len);
        sg = frag->f_sg;

        while (iov_iter_count(to) && copied < len) {
                to_copy = min_t(unsigned long, iov_iter_count(to),
                                sg->length - frag_off);
                ...

sg is NULL and it crashes accessing sg->length above.

The cause looks like is due to ic->i_frag_sz returning incorrect value.
16KB when 4KB was expected.

                if (copied % ic->i_frag_sz == 0) {
                        frag = list_entry(frag->f_item.next,
                                          struct rds_page_frag, f_item);
                        frag_off = 0;
                        sg = frag->f_sg;
                }

The other end is using 4KB RDS fragsize (Solaris Super Cluster).
This end is UEK4 (4.1.12-94.8.4.el6uek.x86_64).

The message being copied arrived over 4KB RDS frag size connection.
But during the above check ic->i_frag_sz is 16KB.
This can happen during a reconnect at the connection setup phase.
We start off with ic->i_frag_sz as 16KB. Then settle down at 4KB.

Failing this check
  if (copied % ic->i_frag_sz == 0) {
can result in sg not getting set correctly.

Say, "copied" = 4KB but ic->i_frag_sz is 16KB when it should be 4KB.

During race condition with a reconnect, ic->i_frag_sz can be 16KB
even though once the connection is set up it settled down to 4KB.
It can change from 4KB to 16KB and back to 4KB during connection setup
due to reconnect.

We started seeing this crash after bug 26848749.
But prior to that the same scenario could result in data copied to user
from incorrect "sg" resulting in data corruption.

Orabug: 28748008

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>

Orabug: 33590097

UEK6 => UEK7

(cherry picked from commit 14858a3)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>

Orabug: 33590087

UEK7 => LUCI

(cherry picked from commit e86878f)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
…error

The sequence that leads to this state is as follows.

1) First we see CQ error logged.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core
0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2
vendor_error_syndrome=0x0

2) That is followed by the drop of the associated RDS connection.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection
<192.168.54.43,192.168.54.1,0> dropped due to 'qp event'

3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that.

4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection.

crash64> bt 62577
PID: 62577  TASK: ffff88143f045400  CPU: 4   COMMAND: "kworker/u224:1"
 #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b
 #1 [ffff8813663bbbb0] schedule at ffffffff816abca7
 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71
 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma]
 #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds]
 #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds]
 #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1
 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b
 #8 [ffff8813663bbec0] kthread at ffffffff810a304b
 #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752
crash64>

It was stuck here in rds_ib_conn_shutdown for ever:

                /* quiesce tx and rx completion before tearing down */
                while (!wait_event_timeout(rds_ib_ring_empty_wait,
                                rds_ib_ring_empty(&ic->i_recv_ring) &&
                                (atomic_read(&ic->i_signaled_sends) == 0),
                                msecs_to_jiffies(5000))) {

                        /* Try to reap pending RX completions every 5 secs */
                        if (!rds_ib_ring_empty(&ic->i_recv_ring)) {
                                spin_lock_bh(&ic->i_rx_lock);
                                rds_ib_rx(ic);
                                spin_unlock_bh(&ic->i_rx_lock);
                        }
                }

The recv ring was not empty.
w_alloc_ptr = 560
w_free_ptr  = 256

This is what Mellanox had to say:
When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will
generate Async event to notify this error, also the QPs that tries to access
this CQ will be put to error state but will not be flushed since we must not
post CQEs to a broken CQ. The QP that tries to access will also issue an
Async catas event.

In summary we cannot wait for any more WR_FLUSH_ERRs in that state.

Orabug: 29180452

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>

Orabug: 33590097

UEK6 => UEK7

(cherry picked from commit 964cad6)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>

Orabug: 33590087

UEK7 => LUCI

(cherry picked from commit e40c8e4)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
One of our customers reported the following stack.

crash-7.3.0> bt
PID: 250515  TASK: ffff888189482f80  CPU: 1   COMMAND: "vmbackup"
 #0 [ffffc90025017878] die at ffffffff81033c22
 #1 [ffffc900250178a8] do_trap at ffffffff81030990
 #2 [ffffc900250178f8] do_error_trap at ffffffff810311d7
 #3 [ffffc900250179c0] do_invalid_op at ffffffff81031310
 #4 [ffffc900250179d0] invalid_op at ffffffff81a01f2a
    [exception RIP: ocfs2_truncate_rec+1914]
    RIP: ffffffffc1e73b4a  RSP: ffffc90025017a80  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: 0000000000053a75  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff8882d385be08  RDI: ffff8882d385be08
    RBP: ffffc90025017b10   R8: 0000000000000000   R9: 0000000000005900
    R10: 0000000000000001  R11: 0000000000aaaaaa  R12: 0000000000000001
    R13: ffff88829e5a9900  R14: ffffc90025017cf0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: e030  SS: e02b
 #5 [ffffc90025017b18] ocfs2_remove_extent at ffffffffc1e73e6c [ocfs2]
 #6 [ffffc90025017bc8] ocfs2_remove_btree_range at ffffffffc1e745f2 [ocfs2]
 #7 [ffffc90025017c60] ocfs2_commit_truncate at ffffffffc1e75b1f [ocfs2]
 #8 [ffffc90025017d68] __dta_ocfs2_wipe_inode_606 at ffffffffc1e9a3e0 [ocfs2]
 #9 [ffffc90025017dd8] ocfs2_evict_inode at ffffffffc1e9ac10 [ocfs2]
    RIP: 00007f9b26ec8307  RSP: 00007ffc5a193f68  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000ddd0a0  RCX: 00007f9b26ec8307
    RDX: 0000000000000001  RSI: 00007f9b2719e770  RDI: 0000000001010400
    RBP: 0000000001263d80   R8: 0000000000000000   R9: 00000000012146a0
    R10: 000000000000000d  R11: 0000000000000246  R12: 0000000000ddd0a0
    R13: 00007f9b27ba9595  R14: 00007f9b27ca4a50  R15: 00000000ffffffff
    ORIG_RAX: 0000000000000057  CS: 0033  SS: 002b crash-7.3.0>

This crash resulted due to invalid extent record selected for truncate.

At the top of the function ocfs2_truncate_rec(), the code checks if the
first extent record at the leaf extent list corresponding to the input
path is still empty. In that case the tree is rotated left to get rid of
the empty extent record but this rotation did not happen.

But the function ocfs2_truncate_rec() assumes that the top level call
to ocfs2_rotate_tree_left() to get rid of the empty extent always
succeeds and hence it decrements the input "index" value. This results
in selection of a wrong record for truncate that causes to hit a call to
BUG() with the message "Owner %llu: Invalid record truncate: (%u, %u) ".
The stack above is the panic stack caused due to hitting BUG().

Though the function ocfs2_rotate_tree_left() was intended to get rid of
the first empty record in the extent block, it did not call the function
ocfs2_rotate_rightmost_leaf_left() as it did not find h_next_leaf_blk
in the extentleaf block to be zero, instead, it proceeded to call
__ocfs2_rotate_tree_left(). However the input "index" value was indeed
pointing to the last extent record in the leaf block. The macro
path_leaf_bh() was returning rightmost extent block as per the tree-depth.
and the function ocfs2_find_cpos_for_right_leaf() also found out that
the extent block in question is indeed the rightmost and hence there is
nothing to rotate at the last extent record pointed by the input "index"
value. Hence the extent tree in the leaf block was not totated at all.

Hence, the real reason for the above panic is that the value of the field
h_next_leaf_blk in the right most leaf block was non-zero that caused
the tree not to rotate left resulting in selection of invalid record for
truncate.

The reason why h_next_leaf_blk was not cleared for the last extent block
is still not known and the code changes here is a workaround to avoid
the panic by verifying that the extent block in question is indeed the
rightmost leaf block in the tree and then correcting the invalid
h_next_leaf_blk value. These changes have been verified by the customer
by running the provided rpm in their env.

Orabug: 34393593

Signed-off-by: Gautham Ananthakrishna <[email protected]>
Reviewed-by: Junxiao Bi <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
…error

The sequence that leads to this state is as follows.

1) First we see CQ error logged.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core
0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2
vendor_error_syndrome=0x0

2) That is followed by the drop of the associated RDS connection.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection
<192.168.54.43,192.168.54.1,0> dropped due to 'qp event'

3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that.

4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection.

crash64> bt 62577
PID: 62577  TASK: ffff88143f045400  CPU: 4   COMMAND: "kworker/u224:1"
 #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b
 #1 [ffff8813663bbbb0] schedule at ffffffff816abca7
 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71
 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma]
 #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds]
 #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds]
 #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1
 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b
 #8 [ffff8813663bbec0] kthread at ffffffff810a304b
 #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752
crash64>

It was stuck here in rds_ib_conn_shutdown for ever:

                /* quiesce tx and rx completion before tearing down */
                while (!wait_event_timeout(rds_ib_ring_empty_wait,
                                rds_ib_ring_empty(&ic->i_recv_ring) &&
                                (atomic_read(&ic->i_signaled_sends) == 0),
                                msecs_to_jiffies(5000))) {

                        /* Try to reap pending RX completions every 5 secs */
                        if (!rds_ib_ring_empty(&ic->i_recv_ring)) {
                                spin_lock_bh(&ic->i_rx_lock);
                                rds_ib_rx(ic);
                                spin_unlock_bh(&ic->i_rx_lock);
                        }
                }

The recv ring was not empty.
w_alloc_ptr = 560
w_free_ptr  = 256

This is what Mellanox had to say:
When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will
generate Async event to notify this error, also the QPs that tries to access
this CQ will be put to error state but will not be flushed since we must not
post CQEs to a broken CQ. The QP that tries to access will also issue an
Async catas event.

In summary we cannot wait for any more WR_FLUSH_ERRs in that state.

Orabug: 29180452

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>

Orabug: 33590097

UEK6 => UEK7

(cherry picked from commit 964cad6)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>

Orabug: 33590087

UEK7 => LUCI

(cherry picked from commit e40c8e4)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed
packet, with shorter length can eventually cause system panic further
down in the code path. Avoid it by validating the length and dropping it
at the earliest.

Following is seen in our env with shorter skb->len

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia and they are in agreement.

Orabug: 36879156
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <[email protected]>
Signed-off-by: Manjunath Patil <[email protected]>
Reviewed-by: Si-Wei Liu <[email protected]>
Reviewed-by: Jack Vogel <[email protected]>
Signed-off-by: Brian Maly <[email protected]>
(cherry picked from commit 0dd4b99)

Orabug: 36879126
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Signed-off-by: Harshvardhan Jha <[email protected]>
Reviewed-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
One of our customers reported the following stack.

crash-7.3.0> bt
PID: 250515  TASK: ffff888189482f80  CPU: 1   COMMAND: "vmbackup"
 #0 [ffffc90025017878] die at ffffffff81033c22
 #1 [ffffc900250178a8] do_trap at ffffffff81030990
 #2 [ffffc900250178f8] do_error_trap at ffffffff810311d7
 #3 [ffffc900250179c0] do_invalid_op at ffffffff81031310
 #4 [ffffc900250179d0] invalid_op at ffffffff81a01f2a
    [exception RIP: ocfs2_truncate_rec+1914]
    RIP: ffffffffc1e73b4a  RSP: ffffc90025017a80  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: 0000000000053a75  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff8882d385be08  RDI: ffff8882d385be08
    RBP: ffffc90025017b10   R8: 0000000000000000   R9: 0000000000005900
    R10: 0000000000000001  R11: 0000000000aaaaaa  R12: 0000000000000001
    R13: ffff88829e5a9900  R14: ffffc90025017cf0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: e030  SS: e02b
 #5 [ffffc90025017b18] ocfs2_remove_extent at ffffffffc1e73e6c [ocfs2]
 #6 [ffffc90025017bc8] ocfs2_remove_btree_range at ffffffffc1e745f2 [ocfs2]
 #7 [ffffc90025017c60] ocfs2_commit_truncate at ffffffffc1e75b1f [ocfs2]
 #8 [ffffc90025017d68] __dta_ocfs2_wipe_inode_606 at ffffffffc1e9a3e0 [ocfs2]
 #9 [ffffc90025017dd8] ocfs2_evict_inode at ffffffffc1e9ac10 [ocfs2]
    RIP: 00007f9b26ec8307  RSP: 00007ffc5a193f68  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000ddd0a0  RCX: 00007f9b26ec8307
    RDX: 0000000000000001  RSI: 00007f9b2719e770  RDI: 0000000001010400
    RBP: 0000000001263d80   R8: 0000000000000000   R9: 00000000012146a0
    R10: 000000000000000d  R11: 0000000000000246  R12: 0000000000ddd0a0
    R13: 00007f9b27ba9595  R14: 00007f9b27ca4a50  R15: 00000000ffffffff
    ORIG_RAX: 0000000000000057  CS: 0033  SS: 002b crash-7.3.0>

This crash resulted due to invalid extent record selected for truncate.

At the top of the function ocfs2_truncate_rec(), the code checks if the
first extent record at the leaf extent list corresponding to the input
path is still empty. In that case the tree is rotated left to get rid of
the empty extent record but this rotation did not happen.

But the function ocfs2_truncate_rec() assumes that the top level call
to ocfs2_rotate_tree_left() to get rid of the empty extent always
succeeds and hence it decrements the input "index" value. This results
in selection of a wrong record for truncate that causes to hit a call to
BUG() with the message "Owner %llu: Invalid record truncate: (%u, %u) ".
The stack above is the panic stack caused due to hitting BUG().

Though the function ocfs2_rotate_tree_left() was intended to get rid of
the first empty record in the extent block, it did not call the function
ocfs2_rotate_rightmost_leaf_left() as it did not find h_next_leaf_blk
in the extentleaf block to be zero, instead, it proceeded to call
__ocfs2_rotate_tree_left(). However the input "index" value was indeed
pointing to the last extent record in the leaf block. The macro
path_leaf_bh() was returning rightmost extent block as per the tree-depth.
and the function ocfs2_find_cpos_for_right_leaf() also found out that
the extent block in question is indeed the rightmost and hence there is
nothing to rotate at the last extent record pointed by the input "index"
value. Hence the extent tree in the leaf block was not totated at all.

Hence, the real reason for the above panic is that the value of the field
h_next_leaf_blk in the right most leaf block was non-zero that caused
the tree not to rotate left resulting in selection of invalid record for
truncate.

The reason why h_next_leaf_blk was not cleared for the last extent block
is still not known and the code changes here is a workaround to avoid
the panic by verifying that the extent block in question is indeed the
rightmost leaf block in the tree and then correcting the invalid
h_next_leaf_blk value. These changes have been verified by the customer
by running the provided rpm in their env.

Orabug: 34393593

Signed-off-by: Gautham Ananthakrishna <[email protected]>
Reviewed-by: Junxiao Bi <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
One of our customers reported the following stack.

crash-7.3.0> bt
PID: 250515  TASK: ffff888189482f80  CPU: 1   COMMAND: "vmbackup"
 #0 [ffffc90025017878] die at ffffffff81033c22
 #1 [ffffc900250178a8] do_trap at ffffffff81030990
 #2 [ffffc900250178f8] do_error_trap at ffffffff810311d7
 #3 [ffffc900250179c0] do_invalid_op at ffffffff81031310
 #4 [ffffc900250179d0] invalid_op at ffffffff81a01f2a
    [exception RIP: ocfs2_truncate_rec+1914]
    RIP: ffffffffc1e73b4a  RSP: ffffc90025017a80  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: 0000000000053a75  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff8882d385be08  RDI: ffff8882d385be08
    RBP: ffffc90025017b10   R8: 0000000000000000   R9: 0000000000005900
    R10: 0000000000000001  R11: 0000000000aaaaaa  R12: 0000000000000001
    R13: ffff88829e5a9900  R14: ffffc90025017cf0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: e030  SS: e02b
 #5 [ffffc90025017b18] ocfs2_remove_extent at ffffffffc1e73e6c [ocfs2]
 #6 [ffffc90025017bc8] ocfs2_remove_btree_range at ffffffffc1e745f2 [ocfs2]
 #7 [ffffc90025017c60] ocfs2_commit_truncate at ffffffffc1e75b1f [ocfs2]
 #8 [ffffc90025017d68] __dta_ocfs2_wipe_inode_606 at ffffffffc1e9a3e0 [ocfs2]
 #9 [ffffc90025017dd8] ocfs2_evict_inode at ffffffffc1e9ac10 [ocfs2]
    RIP: 00007f9b26ec8307  RSP: 00007ffc5a193f68  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000ddd0a0  RCX: 00007f9b26ec8307
    RDX: 0000000000000001  RSI: 00007f9b2719e770  RDI: 0000000001010400
    RBP: 0000000001263d80   R8: 0000000000000000   R9: 00000000012146a0
    R10: 000000000000000d  R11: 0000000000000246  R12: 0000000000ddd0a0
    R13: 00007f9b27ba9595  R14: 00007f9b27ca4a50  R15: 00000000ffffffff
    ORIG_RAX: 0000000000000057  CS: 0033  SS: 002b crash-7.3.0>

This crash resulted due to invalid extent record selected for truncate.

At the top of the function ocfs2_truncate_rec(), the code checks if the
first extent record at the leaf extent list corresponding to the input
path is still empty. In that case the tree is rotated left to get rid of
the empty extent record but this rotation did not happen.

But the function ocfs2_truncate_rec() assumes that the top level call
to ocfs2_rotate_tree_left() to get rid of the empty extent always
succeeds and hence it decrements the input "index" value. This results
in selection of a wrong record for truncate that causes to hit a call to
BUG() with the message "Owner %llu: Invalid record truncate: (%u, %u) ".
The stack above is the panic stack caused due to hitting BUG().

Though the function ocfs2_rotate_tree_left() was intended to get rid of
the first empty record in the extent block, it did not call the function
ocfs2_rotate_rightmost_leaf_left() as it did not find h_next_leaf_blk
in the extentleaf block to be zero, instead, it proceeded to call
__ocfs2_rotate_tree_left(). However the input "index" value was indeed
pointing to the last extent record in the leaf block. The macro
path_leaf_bh() was returning rightmost extent block as per the tree-depth.
and the function ocfs2_find_cpos_for_right_leaf() also found out that
the extent block in question is indeed the rightmost and hence there is
nothing to rotate at the last extent record pointed by the input "index"
value. Hence the extent tree in the leaf block was not totated at all.

Hence, the real reason for the above panic is that the value of the field
h_next_leaf_blk in the right most leaf block was non-zero that caused
the tree not to rotate left resulting in selection of invalid record for
truncate.

The reason why h_next_leaf_blk was not cleared for the last extent block
is still not known and the code changes here is a workaround to avoid
the panic by verifying that the extent block in question is indeed the
rightmost leaf block in the tree and then correcting the invalid
h_next_leaf_blk value. These changes have been verified by the customer
by running the provided rpm in their env.

Orabug: 34393593

Signed-off-by: Gautham Ananthakrishna <[email protected]>
Reviewed-by: Junxiao Bi <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed
packet, with shorter length can eventually cause system panic further
down in the code path. Avoid it by validating the length and dropping it
at the earliest.

Following is seen in our env with shorter skb->len

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia and they are in agreement.

Orabug: 36879156
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <[email protected]>
Signed-off-by: Manjunath Patil <[email protected]>
Reviewed-by: Si-Wei Liu <[email protected]>
Reviewed-by: Jack Vogel <[email protected]>
Signed-off-by: Brian Maly <[email protected]>
(cherry picked from commit 0dd4b99)

Orabug: 36879126
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Signed-off-by: Harshvardhan Jha <[email protected]>
Reviewed-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 18, 2025
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed
packet, with shorter length can eventually cause system panic further
down in the code path. Avoid it by validating the length and dropping it
at the earliest.

Following is seen in our env with shorter skb->len

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia and they are in agreement.

Orabug: 36879156
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <[email protected]>
Signed-off-by: Manjunath Patil <[email protected]>
Reviewed-by: Si-Wei Liu <[email protected]>
Reviewed-by: Jack Vogel <[email protected]>
Signed-off-by: Brian Maly <[email protected]>
(cherry picked from commit 0dd4b99)

Orabug: 36879126
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Signed-off-by: Harshvardhan Jha <[email protected]>
Reviewed-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Apr 25, 2025
Attempt to enable IPsec packet offload in tunnel mode in debug kernel
generates the following kernel panic, which is happening due to two
issues:
1. In SA add section, the should be _bh() variant when marking SA mode.
2. There is not needed flush_workqueue in SA delete routine. It is not
needed as at this stage as it is removed from SADB and the running work
will be canceled later in SA free.

 =====================================================
 WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
 6.12.0+ #4 Not tainted
 -----------------------------------------------------
 charon/1337 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
 ffff88810f365020 (&xa->xa_lock#24){+.+.}-{3:3}, at: mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]

 and this task is already holding:
 ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30
 which would create a new lock dependency:
  (&x->lock){+.-.}-{3:3} -> (&xa->xa_lock#24){+.+.}-{3:3}

 but this new dependency connects a SOFTIRQ-irq-safe lock:
  (&x->lock){+.-.}-{3:3}

 ... which became SOFTIRQ-irq-safe at:
   lock_acquire+0x1be/0x520
   _raw_spin_lock_bh+0x34/0x40
   xfrm_timer_handler+0x91/0xd70
   __hrtimer_run_queues+0x1dd/0xa60
   hrtimer_run_softirq+0x146/0x2e0
   handle_softirqs+0x266/0x860
   irq_exit_rcu+0x115/0x1a0
   sysvec_apic_timer_interrupt+0x6e/0x90
   asm_sysvec_apic_timer_interrupt+0x16/0x20
   default_idle+0x13/0x20
   default_idle_call+0x67/0xa0
   do_idle+0x2da/0x320
   cpu_startup_entry+0x50/0x60
   start_secondary+0x213/0x2a0
   common_startup_64+0x129/0x138

 to a SOFTIRQ-irq-unsafe lock:
  (&xa->xa_lock#24){+.+.}-{3:3}

 ... which became SOFTIRQ-irq-unsafe at:
 ...
   lock_acquire+0x1be/0x520
   _raw_spin_lock+0x2c/0x40
   xa_set_mark+0x70/0x110
   mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core]
   xfrm_dev_state_add+0x3bb/0xd70
   xfrm_add_sa+0x2451/0x4a90
   xfrm_user_rcv_msg+0x493/0x880
   netlink_rcv_skb+0x12e/0x380
   xfrm_netlink_rcv+0x6d/0x90
   netlink_unicast+0x42f/0x740
   netlink_sendmsg+0x745/0xbe0
   __sock_sendmsg+0xc5/0x190
   __sys_sendto+0x1fe/0x2c0
   __x64_sys_sendto+0xdc/0x1b0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

 other info that might help us debug this:

  Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&xa->xa_lock#24);
                                local_irq_disable();
                                lock(&x->lock);
                                lock(&xa->xa_lock#24);
   <Interrupt>
     lock(&x->lock);

  *** DEADLOCK ***

 2 locks held by charon/1337:
  #0: ffffffff87f8f858 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x5e/0x90
  #1: ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30

 the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
 -> (&x->lock){+.-.}-{3:3} ops: 29 {
    HARDIRQ-ON-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock_bh+0x34/0x40
                     xfrm_alloc_spi+0xc0/0xe60
                     xfrm_alloc_userspi+0x5f6/0xbc0
                     xfrm_user_rcv_msg+0x493/0x880
                     netlink_rcv_skb+0x12e/0x380
                     xfrm_netlink_rcv+0x6d/0x90
                     netlink_unicast+0x42f/0x740
                     netlink_sendmsg+0x745/0xbe0
                     __sock_sendmsg+0xc5/0x190
                     __sys_sendto+0x1fe/0x2c0
                     __x64_sys_sendto+0xdc/0x1b0
                     do_syscall_64+0x6d/0x140
                     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    IN-SOFTIRQ-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock_bh+0x34/0x40
                     xfrm_timer_handler+0x91/0xd70
                     __hrtimer_run_queues+0x1dd/0xa60
                     hrtimer_run_softirq+0x146/0x2e0
                     handle_softirqs+0x266/0x860
                     irq_exit_rcu+0x115/0x1a0
                     sysvec_apic_timer_interrupt+0x6e/0x90
                     asm_sysvec_apic_timer_interrupt+0x16/0x20
                     default_idle+0x13/0x20
                     default_idle_call+0x67/0xa0
                     do_idle+0x2da/0x320
                     cpu_startup_entry+0x50/0x60
                     start_secondary+0x213/0x2a0
                     common_startup_64+0x129/0x138
    INITIAL USE at:
                    lock_acquire+0x1be/0x520
                    _raw_spin_lock_bh+0x34/0x40
                    xfrm_alloc_spi+0xc0/0xe60
                    xfrm_alloc_userspi+0x5f6/0xbc0
                    xfrm_user_rcv_msg+0x493/0x880
                    netlink_rcv_skb+0x12e/0x380
                    xfrm_netlink_rcv+0x6d/0x90
                    netlink_unicast+0x42f/0x740
                    netlink_sendmsg+0x745/0xbe0
                    __sock_sendmsg+0xc5/0x190
                    __sys_sendto+0x1fe/0x2c0
                    __x64_sys_sendto+0xdc/0x1b0
                    do_syscall_64+0x6d/0x140
                    entry_SYSCALL_64_after_hwframe+0x4b/0x53
  }
  ... key      at: [<ffffffff87f9cd20>] __key.18+0x0/0x40

 the dependencies between the lock to be acquired
  and SOFTIRQ-irq-unsafe lock:
 -> (&xa->xa_lock#24){+.+.}-{3:3} ops: 9 {
    HARDIRQ-ON-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock_bh+0x34/0x40
                     mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core]
                     xfrm_dev_state_add+0x3bb/0xd70
                     xfrm_add_sa+0x2451/0x4a90
                     xfrm_user_rcv_msg+0x493/0x880
                     netlink_rcv_skb+0x12e/0x380
                     xfrm_netlink_rcv+0x6d/0x90
                     netlink_unicast+0x42f/0x740
                     netlink_sendmsg+0x745/0xbe0
                     __sock_sendmsg+0xc5/0x190
                     __sys_sendto+0x1fe/0x2c0
                     __x64_sys_sendto+0xdc/0x1b0
                     do_syscall_64+0x6d/0x140
                     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    SOFTIRQ-ON-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock+0x2c/0x40
                     xa_set_mark+0x70/0x110
                     mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core]
                     xfrm_dev_state_add+0x3bb/0xd70
                     xfrm_add_sa+0x2451/0x4a90
                     xfrm_user_rcv_msg+0x493/0x880
                     netlink_rcv_skb+0x12e/0x380
                     xfrm_netlink_rcv+0x6d/0x90
                     netlink_unicast+0x42f/0x740
                     netlink_sendmsg+0x745/0xbe0
                     __sock_sendmsg+0xc5/0x190
                     __sys_sendto+0x1fe/0x2c0
                     __x64_sys_sendto+0xdc/0x1b0
                     do_syscall_64+0x6d/0x140
                     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    INITIAL USE at:
                    lock_acquire+0x1be/0x520
                    _raw_spin_lock_bh+0x34/0x40
                    mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core]
                    xfrm_dev_state_add+0x3bb/0xd70
                    xfrm_add_sa+0x2451/0x4a90
                    xfrm_user_rcv_msg+0x493/0x880
                    netlink_rcv_skb+0x12e/0x380
                    xfrm_netlink_rcv+0x6d/0x90
                    netlink_unicast+0x42f/0x740
                    netlink_sendmsg+0x745/0xbe0
                    __sock_sendmsg+0xc5/0x190
                    __sys_sendto+0x1fe/0x2c0
                    __x64_sys_sendto+0xdc/0x1b0
                    do_syscall_64+0x6d/0x140
                    entry_SYSCALL_64_after_hwframe+0x4b/0x53
  }
  ... key      at: [<ffffffffa078ff60>] __key.48+0x0/0xfffffffffff210a0 [mlx5_core]
  ... acquired at:
    __lock_acquire+0x30a0/0x5040
    lock_acquire+0x1be/0x520
    _raw_spin_lock_bh+0x34/0x40
    mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
    xfrm_dev_state_delete+0x90/0x160
    __xfrm_state_delete+0x662/0xae0
    xfrm_state_delete+0x1e/0x30
    xfrm_del_sa+0x1c2/0x340
    xfrm_user_rcv_msg+0x493/0x880
    netlink_rcv_skb+0x12e/0x380
    xfrm_netlink_rcv+0x6d/0x90
    netlink_unicast+0x42f/0x740
    netlink_sendmsg+0x745/0xbe0
    __sock_sendmsg+0xc5/0x190
    __sys_sendto+0x1fe/0x2c0
    __x64_sys_sendto+0xdc/0x1b0
    do_syscall_64+0x6d/0x140
    entry_SYSCALL_64_after_hwframe+0x4b/0x53

 stack backtrace:
 CPU: 7 UID: 0 PID: 1337 Comm: charon Not tainted 6.12.0+ #4
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x74/0xd0
  check_irq_usage+0x12e8/0x1d90
  ? print_shortest_lock_dependencies_backwards+0x1b0/0x1b0
  ? check_chain_key+0x1bb/0x4c0
  ? __lockdep_reset_lock+0x180/0x180
  ? check_path.constprop.0+0x24/0x50
  ? mark_lock+0x108/0x2fb0
  ? print_circular_bug+0x9b0/0x9b0
  ? mark_lock+0x108/0x2fb0
  ? print_usage_bug.part.0+0x670/0x670
  ? check_prev_add+0x1c4/0x2310
  check_prev_add+0x1c4/0x2310
  __lock_acquire+0x30a0/0x5040
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  lock_acquire+0x1be/0x520
  ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  ? __xfrm_state_delete+0x5f0/0xae0
  ? lock_downgrade+0x6b0/0x6b0
  _raw_spin_lock_bh+0x34/0x40
  ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
  mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
  xfrm_dev_state_delete+0x90/0x160
  __xfrm_state_delete+0x662/0xae0
  xfrm_state_delete+0x1e/0x30
  xfrm_del_sa+0x1c2/0x340
  ? xfrm_get_sa+0x250/0x250
  ? check_chain_key+0x1bb/0x4c0
  xfrm_user_rcv_msg+0x493/0x880
  ? copy_sec_ctx+0x270/0x270
  ? check_chain_key+0x1bb/0x4c0
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  netlink_rcv_skb+0x12e/0x380
  ? copy_sec_ctx+0x270/0x270
  ? netlink_ack+0xd90/0xd90
  ? netlink_deliver_tap+0xcd/0xb60
  xfrm_netlink_rcv+0x6d/0x90
  netlink_unicast+0x42f/0x740
  ? netlink_attachskb+0x730/0x730
  ? lock_acquire+0x1be/0x520
  netlink_sendmsg+0x745/0xbe0
  ? netlink_unicast+0x740/0x740
  ? __might_fault+0xbb/0x170
  ? netlink_unicast+0x740/0x740
  __sock_sendmsg+0xc5/0x190
  ? fdget+0x163/0x1d0
  __sys_sendto+0x1fe/0x2c0
  ? __x64_sys_getpeername+0xb0/0xb0
  ? do_user_addr_fault+0x856/0xe30
  ? lock_acquire+0x1be/0x520
  ? __task_pid_nr_ns+0x117/0x410
  ? lock_downgrade+0x6b0/0x6b0
  __x64_sys_sendto+0xdc/0x1b0
  ? lockdep_hardirqs_on_prepare+0x284/0x400
  do_syscall_64+0x6d/0x140
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 RIP: 0033:0x7f7d31291ba4
 Code: 7d e8 89 4d d4 e8 4c 42 f7 ff 44 8b 4d d0 4c 8b 45 c8 89 c3 44 8b 55 d4 8b 7d e8 b8 2c 00 00 00 48 8b 55 d8 48 8b 75 e0 0f 05 <48> 3d 00 f0 ff ff 77 34 89 df 48 89 45 e8 e8 99 42 f7 ff 48 8b 45
 RSP: 002b:00007f7d2ccd94f0 EFLAGS: 00000297 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7d31291ba4
 RDX: 0000000000000028 RSI: 00007f7d2ccd96a0 RDI: 000000000000000a
 RBP: 00007f7d2ccd9530 R08: 00007f7d2ccd9598 R09: 000000000000000c
 R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000028
 R13: 00007f7d2ccd9598 R14: 00007f7d2ccd96a0 R15: 00000000000000e1
  </TASK>

Fixes: 4c24272 ("net/mlx5e: Listen to ARP events to update IPsec L2 headers in tunnel mode")
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

Orabug: 37710815

(cherry picked from commit 2c36880)
cherry-pick-repo=kernel/git/torvalds/linux.git
unmodified-from-upstream: 2c36880

Signed-off-by: Mikhael Goikhman <[email protected]>
Signed-off-by: Qing Huang <[email protected]>
Reviewed-by: Sharath Srinivasan <[email protected]>
Signed-off-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request May 2, 2025
[ Upstream commit b61e69bb1c049cf507e3c654fa3dc1568231bd07 ]

syzbot report a deadlock in diFree. [1]

When calling "ioctl$LOOP_SET_STATUS64", the offset value passed in is 4,
which does not match the mounted loop device, causing the mapping of the
mounted loop device to be invalidated.

When creating the directory and creating the inode of iag in diReadSpecial(),
read the page of fixed disk inode (AIT) in raw mode in read_metapage(), the
metapage data it returns is corrupted, which causes the nlink value of 0 to be
assigned to the iag inode when executing copy_from_dinode(), which ultimately
causes a deadlock when entering diFree().

To avoid this, first check the nlink value of dinode before setting iag inode.

[1]
WARNING: possible recursive locking detected
6.12.0-rc7-syzkaller-00212-g4a5df3796467 #0 Not tainted
--------------------------------------------
syz-executor301/5309 is trying to acquire lock:
ffff888044548920 (&(imap->im_aglock[index])){+.+.}-{3:3}, at: diFree+0x37c/0x2fb0 fs/jfs/jfs_imap.c:889

but task is already holding lock:
ffff888044548920 (&(imap->im_aglock[index])){+.+.}-{3:3}, at: diAlloc+0x1b6/0x1630

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(imap->im_aglock[index]));
  lock(&(imap->im_aglock[index]));

 *** DEADLOCK ***

 May be due to missing lock nesting notation

5 locks held by syz-executor301/5309:
 #0: ffff8880422a4420 (sb_writers#9){.+.+}-{0:0}, at: mnt_want_write+0x3f/0x90 fs/namespace.c:515
 #1: ffff88804755b390 (&type->i_mutex_dir_key#6/1){+.+.}-{3:3}, at: inode_lock_nested include/linux/fs.h:850 [inline]
 #1: ffff88804755b390 (&type->i_mutex_dir_key#6/1){+.+.}-{3:3}, at: filename_create+0x260/0x540 fs/namei.c:4026
 #2: ffff888044548920 (&(imap->im_aglock[index])){+.+.}-{3:3}, at: diAlloc+0x1b6/0x1630
 #3: ffff888044548890 (&imap->im_freelock){+.+.}-{3:3}, at: diNewIAG fs/jfs/jfs_imap.c:2460 [inline]
 #3: ffff888044548890 (&imap->im_freelock){+.+.}-{3:3}, at: diAllocExt fs/jfs/jfs_imap.c:1905 [inline]
 #3: ffff888044548890 (&imap->im_freelock){+.+.}-{3:3}, at: diAllocAG+0x4b7/0x1e50 fs/jfs/jfs_imap.c:1669
 #4: ffff88804755a618 (&jfs_ip->rdwrlock/1){++++}-{3:3}, at: diNewIAG fs/jfs/jfs_imap.c:2477 [inline]
 #4: ffff88804755a618 (&jfs_ip->rdwrlock/1){++++}-{3:3}, at: diAllocExt fs/jfs/jfs_imap.c:1905 [inline]
 #4: ffff88804755a618 (&jfs_ip->rdwrlock/1){++++}-{3:3}, at: diAllocAG+0x869/0x1e50 fs/jfs/jfs_imap.c:1669

stack backtrace:
CPU: 0 UID: 0 PID: 5309 Comm: syz-executor301 Not tainted 6.12.0-rc7-syzkaller-00212-g4a5df3796467 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 print_deadlock_bug+0x483/0x620 kernel/locking/lockdep.c:3037
 check_deadlock kernel/locking/lockdep.c:3089 [inline]
 validate_chain+0x15e2/0x5920 kernel/locking/lockdep.c:3891
 __lock_acquire+0x1384/0x2050 kernel/locking/lockdep.c:5202
 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
 __mutex_lock_common kernel/locking/mutex.c:608 [inline]
 __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
 diFree+0x37c/0x2fb0 fs/jfs/jfs_imap.c:889
 jfs_evict_inode+0x32d/0x440 fs/jfs/inode.c:156
 evict+0x4e8/0x9b0 fs/inode.c:725
 diFreeSpecial fs/jfs/jfs_imap.c:552 [inline]
 duplicateIXtree+0x3c6/0x550 fs/jfs/jfs_imap.c:3022
 diNewIAG fs/jfs/jfs_imap.c:2597 [inline]
 diAllocExt fs/jfs/jfs_imap.c:1905 [inline]
 diAllocAG+0x17dc/0x1e50 fs/jfs/jfs_imap.c:1669
 diAlloc+0x1d2/0x1630 fs/jfs/jfs_imap.c:1590
 ialloc+0x8f/0x900 fs/jfs/jfs_inode.c:56
 jfs_mkdir+0x1c5/0xba0 fs/jfs/namei.c:225
 vfs_mkdir+0x2f9/0x4f0 fs/namei.c:4257
 do_mkdirat+0x264/0x3a0 fs/namei.c:4280
 __do_sys_mkdirat fs/namei.c:4295 [inline]
 __se_sys_mkdirat fs/namei.c:4293 [inline]
 __x64_sys_mkdirat+0x87/0xa0 fs/namei.c:4293
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=355da3b3a74881008e8f
Signed-off-by: Edward Adam Davis <[email protected]>
Signed-off-by: Dave Kleikamp <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit aeb926e605f97857504bdf748f575e40617e2ef9)
Signed-off-by: Jack Vogel <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request May 2, 2025
commit 93ae6e68b6d6b62d92b3a89d1c253d4a1721a1d3 upstream.

We have recently seen report of lockdep circular lock dependency warnings
on platforms like Skylake and Kabylake:

 ======================================================
 WARNING: possible circular locking dependency detected
 6.14.0-rc6-CI_DRM_16276-gca2c04fe76e8+ #1 Not tainted
 ------------------------------------------------------
 swapper/0/1 is trying to acquire lock:
 ffffffff8360ee48 (iommu_probe_device_lock){+.+.}-{3:3},
   at: iommu_probe_device+0x1d/0x70

 but task is already holding lock:
 ffff888102c7efa8 (&device->physical_node_lock){+.+.}-{3:3},
   at: intel_iommu_init+0xe75/0x11f0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #6 (&device->physical_node_lock){+.+.}-{3:3}:
        __mutex_lock+0xb4/0xe40
        mutex_lock_nested+0x1b/0x30
        intel_iommu_init+0xe75/0x11f0
        pci_iommu_init+0x13/0x70
        do_one_initcall+0x62/0x3f0
        kernel_init_freeable+0x3da/0x6a0
        kernel_init+0x1b/0x200
        ret_from_fork+0x44/0x70
        ret_from_fork_asm+0x1a/0x30

 -> #5 (dmar_global_lock){++++}-{3:3}:
        down_read+0x43/0x1d0
        enable_drhd_fault_handling+0x21/0x110
        cpuhp_invoke_callback+0x4c6/0x870
        cpuhp_issue_call+0xbf/0x1f0
        __cpuhp_setup_state_cpuslocked+0x111/0x320
        __cpuhp_setup_state+0xb0/0x220
        irq_remap_enable_fault_handling+0x3f/0xa0
        apic_intr_mode_init+0x5c/0x110
        x86_late_time_init+0x24/0x40
        start_kernel+0x895/0xbd0
        x86_64_start_reservations+0x18/0x30
        x86_64_start_kernel+0xbf/0x110
        common_startup_64+0x13e/0x141

 -> #4 (cpuhp_state_mutex){+.+.}-{3:3}:
        __mutex_lock+0xb4/0xe40
        mutex_lock_nested+0x1b/0x30
        __cpuhp_setup_state_cpuslocked+0x67/0x320
        __cpuhp_setup_state+0xb0/0x220
        page_alloc_init_cpuhp+0x2d/0x60
        mm_core_init+0x18/0x2c0
        start_kernel+0x576/0xbd0
        x86_64_start_reservations+0x18/0x30
        x86_64_start_kernel+0xbf/0x110
        common_startup_64+0x13e/0x141

 -> #3 (cpu_hotplug_lock){++++}-{0:0}:
        __cpuhp_state_add_instance+0x4f/0x220
        iova_domain_init_rcaches+0x214/0x280
        iommu_setup_dma_ops+0x1a4/0x710
        iommu_device_register+0x17d/0x260
        intel_iommu_init+0xda4/0x11f0
        pci_iommu_init+0x13/0x70
        do_one_initcall+0x62/0x3f0
        kernel_init_freeable+0x3da/0x6a0
        kernel_init+0x1b/0x200
        ret_from_fork+0x44/0x70
        ret_from_fork_asm+0x1a/0x30

 -> #2 (&domain->iova_cookie->mutex){+.+.}-{3:3}:
        __mutex_lock+0xb4/0xe40
        mutex_lock_nested+0x1b/0x30
        iommu_setup_dma_ops+0x16b/0x710
        iommu_device_register+0x17d/0x260
        intel_iommu_init+0xda4/0x11f0
        pci_iommu_init+0x13/0x70
        do_one_initcall+0x62/0x3f0
        kernel_init_freeable+0x3da/0x6a0
        kernel_init+0x1b/0x200
        ret_from_fork+0x44/0x70
        ret_from_fork_asm+0x1a/0x30

 -> #1 (&group->mutex){+.+.}-{3:3}:
        __mutex_lock+0xb4/0xe40
        mutex_lock_nested+0x1b/0x30
        __iommu_probe_device+0x24c/0x4e0
        probe_iommu_group+0x2b/0x50
        bus_for_each_dev+0x7d/0xe0
        iommu_device_register+0xe1/0x260
        intel_iommu_init+0xda4/0x11f0
        pci_iommu_init+0x13/0x70
        do_one_initcall+0x62/0x3f0
        kernel_init_freeable+0x3da/0x6a0
        kernel_init+0x1b/0x200
        ret_from_fork+0x44/0x70
        ret_from_fork_asm+0x1a/0x30

 -> #0 (iommu_probe_device_lock){+.+.}-{3:3}:
        __lock_acquire+0x1637/0x2810
        lock_acquire+0xc9/0x300
        __mutex_lock+0xb4/0xe40
        mutex_lock_nested+0x1b/0x30
        iommu_probe_device+0x1d/0x70
        intel_iommu_init+0xe90/0x11f0
        pci_iommu_init+0x13/0x70
        do_one_initcall+0x62/0x3f0
        kernel_init_freeable+0x3da/0x6a0
        kernel_init+0x1b/0x200
        ret_from_fork+0x44/0x70
        ret_from_fork_asm+0x1a/0x30

 other info that might help us debug this:

 Chain exists of:
   iommu_probe_device_lock --> dmar_global_lock -->
     &device->physical_node_lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&device->physical_node_lock);
                                lock(dmar_global_lock);
                                lock(&device->physical_node_lock);
   lock(iommu_probe_device_lock);

  *** DEADLOCK ***

This driver uses a global lock to protect the list of enumerated DMA
remapping units. It is necessary due to the driver's support for dynamic
addition and removal of remapping units at runtime.

Two distinct code paths require iteration over this remapping unit list:

- Device registration and probing: the driver iterates the list to
  register each remapping unit with the upper layer IOMMU framework
  and subsequently probe the devices managed by that unit.
- Global configuration: Upper layer components may also iterate the list
  to apply configuration changes.

The lock acquisition order between these two code paths was reversed. This
caused lockdep warnings, indicating a risk of deadlock. Fix this warning
by releasing the global lock before invoking upper layer interfaces for
device registration.

Fixes: b150654 ("iommu/vt-d: Fix suspicious RCU usage")
Closes: https://lore.kernel.org/linux-iommu/SJ1PR11MB612953431F94F18C954C4A9CB9D32@SJ1PR11MB6129.namprd11.prod.outlook.com/
Tested-by: Chaitanya Kumar Borah <[email protected]>
Cc: [email protected]
Signed-off-by: Lu Baolu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 6722a0cb818691e63c613fa801d6ccfba3f7d38c)
Signed-off-by: Jack Vogel <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request May 2, 2025
commit 2ccd42b959aaf490333dbd3b9b102eaf295c036a upstream.

If we finds a vq without a name in our input array in
virtio_ccw_find_vqs(), we treat it as "non-existing" and set the vq pointer
to NULL; we will not call virtio_ccw_setup_vq() to allocate/setup a vq.

Consequently, we create only a queue if it actually exists (name != NULL)
and assign an incremental queue index to each such existing queue.

However, in virtio_ccw_register_adapter_ind()->get_airq_indicator() we
will not ignore these "non-existing queues", but instead assign an airq
indicator to them.

Besides never releasing them in virtio_ccw_drop_indicators() (because
there is no virtqueue), the bigger issue seems to be that there will be a
disagreement between the device and the Linux guest about the airq
indicator to be used for notifying a queue, because the indicator bit
for adapter I/O interrupt is derived from the queue index.

The virtio spec states under "Setting Up Two-Stage Queue Indicators":

	... indicator contains the guest address of an area wherein the
	indicators for the devices are contained, starting at bit_nr, one
	bit per virtqueue of the device.

And further in "Notification via Adapter I/O Interrupts":

	For notifying the driver of virtqueue buffers, the device sets the
	bit in the guest-provided indicator area at the corresponding
	offset.

For example, QEMU uses in virtio_ccw_notify() the queue index (passed as
"vector") to select the relevant indicator bit. If a queue does not exist,
it does not have a corresponding indicator bit assigned, because it
effectively doesn't have a queue index.

Using a virtio-balloon-ccw device under QEMU with free-page-hinting
disabled ("free-page-hint=off") but free-page-reporting enabled
("free-page-reporting=on") will result in free page reporting
not working as expected: in the virtio_balloon driver, we'll be stuck
forever in virtballoon_free_page_report()->wait_event(), because the
waitqueue will not be woken up as the notification from the device is
lost: it would use the wrong indicator bit.

Free page reporting stops working and we get splats (when configured to
detect hung wqs) like:

 INFO: task kworker/1:3:463 blocked for more than 61 seconds.
       Not tainted 6.14.0 #4
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/1:3 [...]
 Workqueue: events page_reporting_process
 Call Trace:
  [<000002f404e6dfb2>] __schedule+0x402/0x1640
  [<000002f404e6f22e>] schedule+0x3e/0xe0
  [<000002f3846a88fa>] virtballoon_free_page_report+0xaa/0x110 [virtio_balloon]
  [<000002f40435c8a4>] page_reporting_process+0x2e4/0x740
  [<000002f403fd3ee2>] process_one_work+0x1c2/0x400
  [<000002f403fd4b96>] worker_thread+0x296/0x420
  [<000002f403fe10b4>] kthread+0x124/0x290
  [<000002f403f4e0dc>] __ret_from_fork+0x3c/0x60
  [<000002f404e77272>] ret_from_fork+0xa/0x38

There was recently a discussion [1] whether the "holes" should be
treated differently again, effectively assigning also non-existing
queues a queue index: that should also fix the issue, but requires other
workarounds to not break existing setups.

Let's fix it without affecting existing setups for now by properly ignoring
the non-existing queues, so the indicator bits will match the queue
indexes.

[1] https://lore.kernel.org/all/[email protected]/

Fixes: a229989 ("virtio: don't allocate vqs when names[i] = NULL")
Reported-by: Chandra Merla <[email protected]>
Cc: [email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Tested-by: Thomas Huth <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit f268ee2fbb53bc7d8c30f36d91bc2a213c43662b)
Signed-off-by: Jack Vogel <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request May 16, 2025
commit 366e77cd4923c3aa45341e15dcaf3377af9b042f upstream.

Commit 7da55c2 ("drm/amd/display: Remove incorrect FP context
start") removes the FP context protection of dml2_create(), and it said
"All the DC_FP_START/END should be used before call anything from DML2".

However, dml2_validate()/dml21_validate() are not protected from their
callers, causing such errors:

 do_fpu invoked from kernel context![#1]:
 CPU: 10 UID: 0 PID: 331 Comm: kworker/10:1H Not tainted 6.14.0-rc6+ #4
 Workqueue: events_highpri dm_irq_work_func [amdgpu]
 pc ffff800003191eb0 ra ffff800003191e60 tp 9000000107a94000 sp 9000000107a975b0
 a0 9000000140ce4910 a1 0000000000000000 a2 9000000140ce49b0 a3 9000000140ce49a8
 a4 9000000140ce49a8 a5 0000000100000000 a6 0000000000000001 a7 9000000107a97660
 t0 ffff800003790000 t1 9000000140ce5000 t2 0000000000000001 t3 0000000000000000
 t4 0000000000000004 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
 t8 0000000100000000 u0 ffff8000031a3b9c s9 9000000130bc0000 s0 9000000132400000
 s1 9000000140ec0000 s2 9000000132400000 s3 9000000140ce0000 s4 90000000057f8b88
 s5 9000000140ec0000 s6 9000000140ce4910 s7 0000000000000001 s8 9000000130d45010
 ra: ffff800003191e60 dml21_map_dc_state_into_dml_display_cfg+0x40/0x1140 [amdgpu]
   ERA: ffff800003191eb0 dml21_map_dc_state_into_dml_display_cfg+0x90/0x1140 [amdgpu]
  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
  PRMD: 00000004 (PPLV0 +PIE -PWE)
  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
 ESTAT: 000f0000 [FPD] (IS= ECode=15 EsubCode=0)
  PRID: 0014d010 (Loongson-64bit, Loongson-3C6000/S)
 Process kworker/10:1H (pid: 331, threadinfo=000000007bf9ddb0, task=00000000cc4ab9f3)
 Stack : 0000000100000000 0000043800000780 0000000100000001 0000000100000001
         0000000000000000 0000078000000000 0000000000000438 0000078000000000
         0000000000000438 0000078000000000 0000000000000438 0000000100000000
         0000000100000000 0000000100000000 0000000100000000 0000000100000000
         0000000000000001 9000000140ec0000 9000000132400000 9000000132400000
         ffff800003408000 ffff800003408000 9000000132400000 9000000140ce0000
         9000000140ce0000 ffff800003193850 0000000000000001 9000000140ec0000
         9000000132400000 9000000140ec0860 9000000140ec0738 0000000000000001
         90000001405e8000 9000000130bc0000 9000000140ec02a8 ffff8000031b5db8
         0000000000000000 0000043800000780 0000000000000003 ffff8000031b79cc
         ...
 Call Trace:
 [<ffff800003191eb0>] dml21_map_dc_state_into_dml_display_cfg+0x90/0x1140 [amdgpu]
 [<ffff80000319384c>] dml21_validate+0xcc/0x520 [amdgpu]
 [<ffff8000031b8948>] dc_validate_global_state+0x2e8/0x460 [amdgpu]
 [<ffff800002e94034>] create_validate_stream_for_sink+0x3d4/0x420 [amdgpu]
 [<ffff800002e940e4>] amdgpu_dm_connector_mode_valid+0x64/0x240 [amdgpu]
 [<900000000441d6b8>] drm_connector_mode_valid+0x38/0x80
 [<900000000441d824>] __drm_helper_update_and_validate+0x124/0x3e0
 [<900000000441ddc0>] drm_helper_probe_single_connector_modes+0x2e0/0x620
 [<90000000044050dc>] drm_client_modeset_probe+0x23c/0x1780
 [<9000000004420384>] __drm_fb_helper_initial_config_and_unlock+0x44/0x5a0
 [<9000000004403acc>] drm_client_dev_hotplug+0xcc/0x140
 [<ffff800002e9ab50>] handle_hpd_irq_helper+0x1b0/0x1e0 [amdgpu]
 [<90000000038f5da0>] process_one_work+0x160/0x300
 [<90000000038f6718>] worker_thread+0x318/0x440
 [<9000000003901b8c>] kthread+0x12c/0x220
 [<90000000038b1484>] ret_from_kernel_thread+0x8/0xa4

Unfortunately, protecting dml2_validate()/dml21_validate() out of DML2
causes "sleeping function called from invalid context", so protect them
with DC_FP_START() and DC_FP_END() inside.

Fixes: 7da55c2 ("drm/amd/display: Remove incorrect FP context start")
Cc: [email protected]
Signed-off-by: Huacai Chen <[email protected]>
Tested-by: Dongyan Qian <[email protected]>
Reviewed-by: Aurabindo Pillai <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 74d6fba60f05ca6b298702233b6e6cc7629eeb5a)
Signed-off-by: Jack Vogel <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request May 16, 2025
…cal section

[ Upstream commit 85b2b9c ]

A circular lock dependency splat has been seen involving down_trylock():

  ======================================================
  WARNING: possible circular locking dependency detected
  6.12.0-41.el10.s390x+debug
  ------------------------------------------------------
  dd/32479 is trying to acquire lock:
  0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90

  but task is already holding lock:
  000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0

  the existing dependency chain (in reverse order) is:
  -> #4 (&zone->lock){-.-.}-{2:2}:
  -> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
  -> #2 (&rq->__lock){-.-.}-{2:2}:
  -> #1 (&p->pi_lock){-.-.}-{2:2}:
  -> #0 ((console_sem).lock){-.-.}-{2:2}:

The console_sem -> pi_lock dependency is due to calling try_to_wake_up()
while holding the console_sem raw_spinlock. This dependency can be broken
by using wake_q to do the wakeup instead of calling try_to_wake_up()
under the console_sem lock. This will also make the semaphore's
raw_spinlock become a terminal lock without taking any further locks
underneath it.

The hrtimer_bases.lock is a raw_spinlock while zone->lock is a
spinlock. The hrtimer_bases.lock -> zone->lock dependency happens via
the debug_objects_fill_pool() helper function in the debugobjects code.

  -> #4 (&zone->lock){-.-.}-{2:2}:
         __lock_acquire+0xe86/0x1cc0
         lock_acquire.part.0+0x258/0x630
         lock_acquire+0xb8/0xe0
         _raw_spin_lock_irqsave+0xb4/0x120
         rmqueue_bulk+0xac/0x8f0
         __rmqueue_pcplist+0x580/0x830
         rmqueue_pcplist+0xfc/0x470
         rmqueue.isra.0+0xdec/0x11b0
         get_page_from_freelist+0x2ee/0xeb0
         __alloc_pages_noprof+0x2c2/0x520
         alloc_pages_mpol_noprof+0x1fc/0x4d0
         alloc_pages_noprof+0x8c/0xe0
         allocate_slab+0x320/0x460
         ___slab_alloc+0xa58/0x12b0
         __slab_alloc.isra.0+0x42/0x60
         kmem_cache_alloc_noprof+0x304/0x350
         fill_pool+0xf6/0x450
         debug_object_activate+0xfe/0x360
         enqueue_hrtimer+0x34/0x190
         __run_hrtimer+0x3c8/0x4c0
         __hrtimer_run_queues+0x1b2/0x260
         hrtimer_interrupt+0x316/0x760
         do_IRQ+0x9a/0xe0
         do_irq_async+0xf6/0x160

Normally a raw_spinlock to spinlock dependency is not legitimate
and will be warned if CONFIG_PROVE_RAW_LOCK_NESTING is enabled,
but debug_objects_fill_pool() is an exception as it explicitly
allows this dependency for non-PREEMPT_RT kernel without causing
PROVE_RAW_LOCK_NESTING lockdep splat. As a result, this dependency is
legitimate and not a bug.

Anyway, semaphore is the only locking primitive left that is still
using try_to_wake_up() to do wakeup inside critical section, all the
other locking primitives had been migrated to use wake_q to do wakeup
outside of the critical section. It is also possible that there are
other circular locking dependencies involving printk/console_sem or
other existing/new semaphores lurking somewhere which may show up in
the future. Let just do the migration now to wake_q to avoid headache
like this.

Reported-by: [email protected]
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit 46c66d975a58a9fc04cb340001b815d930643aa6)
Signed-off-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request May 16, 2025
…cal section

[ Upstream commit 85b2b9c ]

A circular lock dependency splat has been seen involving down_trylock():

  ======================================================
  WARNING: possible circular locking dependency detected
  6.12.0-41.el10.s390x+debug
  ------------------------------------------------------
  dd/32479 is trying to acquire lock:
  0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90

  but task is already holding lock:
  000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0

  the existing dependency chain (in reverse order) is:
  -> #4 (&zone->lock){-.-.}-{2:2}:
  -> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
  -> #2 (&rq->__lock){-.-.}-{2:2}:
  -> #1 (&p->pi_lock){-.-.}-{2:2}:
  -> #0 ((console_sem).lock){-.-.}-{2:2}:

The console_sem -> pi_lock dependency is due to calling try_to_wake_up()
while holding the console_sem raw_spinlock. This dependency can be broken
by using wake_q to do the wakeup instead of calling try_to_wake_up()
under the console_sem lock. This will also make the semaphore's
raw_spinlock become a terminal lock without taking any further locks
underneath it.

The hrtimer_bases.lock is a raw_spinlock while zone->lock is a
spinlock. The hrtimer_bases.lock -> zone->lock dependency happens via
the debug_objects_fill_pool() helper function in the debugobjects code.

  -> #4 (&zone->lock){-.-.}-{2:2}:
         __lock_acquire+0xe86/0x1cc0
         lock_acquire.part.0+0x258/0x630
         lock_acquire+0xb8/0xe0
         _raw_spin_lock_irqsave+0xb4/0x120
         rmqueue_bulk+0xac/0x8f0
         __rmqueue_pcplist+0x580/0x830
         rmqueue_pcplist+0xfc/0x470
         rmqueue.isra.0+0xdec/0x11b0
         get_page_from_freelist+0x2ee/0xeb0
         __alloc_pages_noprof+0x2c2/0x520
         alloc_pages_mpol_noprof+0x1fc/0x4d0
         alloc_pages_noprof+0x8c/0xe0
         allocate_slab+0x320/0x460
         ___slab_alloc+0xa58/0x12b0
         __slab_alloc.isra.0+0x42/0x60
         kmem_cache_alloc_noprof+0x304/0x350
         fill_pool+0xf6/0x450
         debug_object_activate+0xfe/0x360
         enqueue_hrtimer+0x34/0x190
         __run_hrtimer+0x3c8/0x4c0
         __hrtimer_run_queues+0x1b2/0x260
         hrtimer_interrupt+0x316/0x760
         do_IRQ+0x9a/0xe0
         do_irq_async+0xf6/0x160

Normally a raw_spinlock to spinlock dependency is not legitimate
and will be warned if CONFIG_PROVE_RAW_LOCK_NESTING is enabled,
but debug_objects_fill_pool() is an exception as it explicitly
allows this dependency for non-PREEMPT_RT kernel without causing
PROVE_RAW_LOCK_NESTING lockdep splat. As a result, this dependency is
legitimate and not a bug.

Anyway, semaphore is the only locking primitive left that is still
using try_to_wake_up() to do wakeup inside critical section, all the
other locking primitives had been migrated to use wake_q to do wakeup
outside of the critical section. It is also possible that there are
other circular locking dependencies involving printk/console_sem or
other existing/new semaphores lurking somewhere which may show up in
the future. Let just do the migration now to wake_q to avoid headache
like this.

Reported-by: [email protected]
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit d6ae75c3ba1fb0dc388b0fed11b962b8aeed21e2)
Signed-off-by: Alok Tiwari <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 13, 2025
commit be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7 upstream.

When migrating a THP, concurrent access to the PMD migration entry during
a deferred split scan can lead to an invalid address access, as
illustrated below.  To prevent this invalid access, it is necessary to
check the PMD migration entry and return early.  In this context, there is
no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the
equality of the target folio.  Since the PMD migration entry is locked, it
cannot be served as the target.

Mailing list discussion and explanation from Hugh Dickins: "An anon_vma
lookup points to a location which may contain the folio of interest, but
might instead contain another folio: and weeding out those other folios is
precisely what the "folio != pmd_folio((*pmd)" check (and the "risk of
replacing the wrong folio" comment a few lines above it) is for."

BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
try_to_migrate_one+0x28c/0x3730
rmap_walk_anon+0x4f6/0x770
unmap_folio+0x196/0x1f0
split_huge_page_to_list_to_order+0x9f6/0x1560
deferred_split_scan+0xac5/0x12a0
shrinker_debugfs_scan_write+0x376/0x470
full_proxy_write+0x15c/0x220
vfs_write+0x2fc/0xcb0
ksys_write+0x146/0x250
do_syscall_64+0x6a/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e

The bug is found by syzkaller on an internal kernel, then confirmed on
upstream.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Fixes: 84c3fc4 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Gavin Guo <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Acked-by: Zi Yan <[email protected]>
Reviewed-by: Gavin Shan <[email protected]>
Cc: Florent Revest <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 6166c3cf405441f7147b322980144feb3cefc617)
Signed-off-by: Jack Vogel <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 13, 2025
commit 78ab4be549533432d97ea8989d2f00b508fa68d8 upstream.

A warning on driver removal started occurring after commit 9dd05df8403b
("net: warn if NAPI instance wasn't shut down"). Disable tx napi before
deleting it in mt76_dma_cleanup().

 WARNING: CPU: 4 PID: 18828 at net/core/dev.c:7288 __netif_napi_del_locked+0xf0/0x100
 CPU: 4 UID: 0 PID: 18828 Comm: modprobe Not tainted 6.15.0-rc4 #4 PREEMPT(lazy)
 Hardware name: ASUS System Product Name/PRIME X670E-PRO WIFI, BIOS 3035 09/05/2024
 RIP: 0010:__netif_napi_del_locked+0xf0/0x100
 Call Trace:
 <TASK>
 mt76_dma_cleanup+0x54/0x2f0 [mt76]
 mt7921_pci_remove+0xd5/0x190 [mt7921e]
 pci_device_remove+0x47/0xc0
 device_release_driver_internal+0x19e/0x200
 driver_detach+0x48/0x90
 bus_remove_driver+0x6d/0xf0
 pci_unregister_driver+0x2e/0xb0
 __do_sys_delete_module.isra.0+0x197/0x2e0
 do_syscall_64+0x7b/0x160
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Tested with mt7921e but the same pattern can be actually applied to other
mt76 drivers calling mt76_dma_cleanup() during removal. Tx napi is enabled
in their *_dma_init() functions and only toggled off and on again inside
their suspend/resume/reset paths. So it should be okay to disable tx
napi in such a generic way.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 2ac515a ("mt76: mt76x02: use napi polling for tx cleanup")
Cc: [email protected]
Signed-off-by: Fedor Pchelkin <[email protected]>
Tested-by: Ming Yen Hsieh <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 2b81e76db3667d1f7f2ad44e9835cdaf8dea95a8)
Signed-off-by: Jack Vogel <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 13, 2025
[ Upstream commit 88f7f56d16f568f19e1a695af34a7f4a6ce537a6 ]

When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4, similar problem also exists in upstream:

    crash> bt 2091206
    PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
     #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
     #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
     #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
     #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
     #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
     #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
     #6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
     #7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
     #8 [ffff800084a2fa60] generic_make_request at ffff800040570138
     #9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
    #10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
    #11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
    #12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
    #13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
    #14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
    #15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
    #16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
    #17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
    #18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng <[email protected]>
Reviewed-by: Tianxiang Peng <[email protected]>
Reviewed-by: Hao Peng <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit b55a97d1bd4083729a60d19beffe85d4c96680de)
Signed-off-by: Jack Vogel <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 13, 2025
The Multi-Gen LRU (MGLRU) is enabled by default via
CONFIG_LRU_GEN_ENABLED=y, but it can be dynamically disabled by writing
n to /sys/kernel/mm/lru_gen/enabled. While attempting to disable the
MGLRU and switching over to the classic LRU (anon/file), the following
crash is seen:

>> trace
 #0  crash_setup_regs (./arch/x86/include/asm/kexec.h:111:15)
 #1  __crash_kexec (kernel/crash_core.c:122:4)
 #2  panic (kernel/panic.c:399:3)
 #3  oops_end (arch/x86/kernel/dumpstack.c:382:3)
 #4  do_trap_no_signal (arch/x86/kernel/traps.c:156:3)
 #5  do_trap (arch/x86/kernel/traps.c:197:7)
 #6  do_error_trap (arch/x86/kernel/traps.c:217:3)
 #7  handle_invalid_op (arch/x86/kernel/traps.c:254:2)
 #8  exc_invalid_op (arch/x86/kernel/traps.c:314:2)
 #9  asm_exc_invalid_op+0x1a/0x1f (./arch/x86/include/asm/idtentry.h:621)
 #10 __list_del_entry_valid_or_report (lib/list_debug.c:62:6)
 #11 __list_del_entry_valid (./include/linux/list.h:124:9)
 #12 __list_del_entry (./include/linux/list.h:215:7)
 #13 list_del (./include/linux/list.h:229:2)
 #14 lruvec_del_folio (./include/linux/mm_inline.h:361:3)
 #15 lru_activate (mm/swap.c:342:2)
 #16 folio_batch_move_lru (mm/swap.c:199:3)
 #17 lru_add_drain (mm/swap.c:734:2)
 #18 wp_can_reuse_anon_folio (mm/memory.c:3788:3)
 #19 do_wp_page (mm/memory.c:3900:39)
 #20 __handle_mm_fault (mm/memory.c:6090:9)
 #21 handle_mm_fault (mm/memory.c:6258:9)
 #22 do_user_addr_fault (arch/x86/mm/fault.c:1338:10)
 #23 handle_page_fault (arch/x86/mm/fault.c:1481:3)
 #24 exc_page_fault (arch/x86/mm/fault.c:1539:2)
 #25 asm_exc_page_fault+0x26/0x2b (./arch/x86/include/asm/idtentry.h:623)
 #26 0x7f82f0896ed2

The issue occurs because the MGLRU-specific version of
lruvec_reparent_relocate() is invoked even when the system is using the
classic LRU. Fix this, by ensuring the call is properly gated by
lru_gen_enabled(), which checks whether MGLRU is enabled either
statically via Kconfig or dynamically via the /sys interface.

Orabug: 37920452

Fixes: cc65a5d ("mm: memcontrol: use obj_cgroup APIs to charge the LRU pages")
Cc: Tom Hromatka <[email protected]>
Reported-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Harry Yoo <[email protected]>
Tested-by: Harry Yoo <[email protected]>
Reviewed-by: Imran Khan <[email protected]>
Signed-off-by: Kamalesh Babulal <[email protected]>
Signed-off-by: Harry Yoo <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 13, 2025
[ Upstream commit b61e69bb1c049cf507e3c654fa3dc1568231bd07 ]

syzbot report a deadlock in diFree. [1]

When calling "ioctl$LOOP_SET_STATUS64", the offset value passed in is 4,
which does not match the mounted loop device, causing the mapping of the
mounted loop device to be invalidated.

When creating the directory and creating the inode of iag in diReadSpecial(),
read the page of fixed disk inode (AIT) in raw mode in read_metapage(), the
metapage data it returns is corrupted, which causes the nlink value of 0 to be
assigned to the iag inode when executing copy_from_dinode(), which ultimately
causes a deadlock when entering diFree().

To avoid this, first check the nlink value of dinode before setting iag inode.

[1]
WARNING: possible recursive locking detected
6.12.0-rc7-syzkaller-00212-g4a5df3796467 #0 Not tainted
--------------------------------------------
syz-executor301/5309 is trying to acquire lock:
ffff888044548920 (&(imap->im_aglock[index])){+.+.}-{3:3}, at: diFree+0x37c/0x2fb0 fs/jfs/jfs_imap.c:889

but task is already holding lock:
ffff888044548920 (&(imap->im_aglock[index])){+.+.}-{3:3}, at: diAlloc+0x1b6/0x1630

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(imap->im_aglock[index]));
  lock(&(imap->im_aglock[index]));

 *** DEADLOCK ***

 May be due to missing lock nesting notation

5 locks held by syz-executor301/5309:
 #0: ffff8880422a4420 (sb_writers#9){.+.+}-{0:0}, at: mnt_want_write+0x3f/0x90 fs/namespace.c:515
 #1: ffff88804755b390 (&type->i_mutex_dir_key#6/1){+.+.}-{3:3}, at: inode_lock_nested include/linux/fs.h:850 [inline]
 #1: ffff88804755b390 (&type->i_mutex_dir_key#6/1){+.+.}-{3:3}, at: filename_create+0x260/0x540 fs/namei.c:4026
 #2: ffff888044548920 (&(imap->im_aglock[index])){+.+.}-{3:3}, at: diAlloc+0x1b6/0x1630
 #3: ffff888044548890 (&imap->im_freelock){+.+.}-{3:3}, at: diNewIAG fs/jfs/jfs_imap.c:2460 [inline]
 #3: ffff888044548890 (&imap->im_freelock){+.+.}-{3:3}, at: diAllocExt fs/jfs/jfs_imap.c:1905 [inline]
 #3: ffff888044548890 (&imap->im_freelock){+.+.}-{3:3}, at: diAllocAG+0x4b7/0x1e50 fs/jfs/jfs_imap.c:1669
 #4: ffff88804755a618 (&jfs_ip->rdwrlock/1){++++}-{3:3}, at: diNewIAG fs/jfs/jfs_imap.c:2477 [inline]
 #4: ffff88804755a618 (&jfs_ip->rdwrlock/1){++++}-{3:3}, at: diAllocExt fs/jfs/jfs_imap.c:1905 [inline]
 #4: ffff88804755a618 (&jfs_ip->rdwrlock/1){++++}-{3:3}, at: diAllocAG+0x869/0x1e50 fs/jfs/jfs_imap.c:1669

stack backtrace:
CPU: 0 UID: 0 PID: 5309 Comm: syz-executor301 Not tainted 6.12.0-rc7-syzkaller-00212-g4a5df3796467 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 print_deadlock_bug+0x483/0x620 kernel/locking/lockdep.c:3037
 check_deadlock kernel/locking/lockdep.c:3089 [inline]
 validate_chain+0x15e2/0x5920 kernel/locking/lockdep.c:3891
 __lock_acquire+0x1384/0x2050 kernel/locking/lockdep.c:5202
 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
 __mutex_lock_common kernel/locking/mutex.c:608 [inline]
 __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
 diFree+0x37c/0x2fb0 fs/jfs/jfs_imap.c:889
 jfs_evict_inode+0x32d/0x440 fs/jfs/inode.c:156
 evict+0x4e8/0x9b0 fs/inode.c:725
 diFreeSpecial fs/jfs/jfs_imap.c:552 [inline]
 duplicateIXtree+0x3c6/0x550 fs/jfs/jfs_imap.c:3022
 diNewIAG fs/jfs/jfs_imap.c:2597 [inline]
 diAllocExt fs/jfs/jfs_imap.c:1905 [inline]
 diAllocAG+0x17dc/0x1e50 fs/jfs/jfs_imap.c:1669
 diAlloc+0x1d2/0x1630 fs/jfs/jfs_imap.c:1590
 ialloc+0x8f/0x900 fs/jfs/jfs_inode.c:56
 jfs_mkdir+0x1c5/0xba0 fs/jfs/namei.c:225
 vfs_mkdir+0x2f9/0x4f0 fs/namei.c:4257
 do_mkdirat+0x264/0x3a0 fs/namei.c:4280
 __do_sys_mkdirat fs/namei.c:4295 [inline]
 __se_sys_mkdirat fs/namei.c:4293 [inline]
 __x64_sys_mkdirat+0x87/0xa0 fs/namei.c:4293
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=355da3b3a74881008e8f
Signed-off-by: Edward Adam Davis <[email protected]>
Signed-off-by: Dave Kleikamp <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit 86bfeaa18f9e4615b97f2d613e0fcc4ced196527)
Signed-off-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 13, 2025
commit 78cfd17 upstream.

Undefined behavior is triggered when bnxt_qplib_alloc_init_hwq is called
with hwq_attr->aux_depth != 0 and hwq_attr->aux_stride == 0.
In that case, "roundup_pow_of_two(hwq_attr->aux_stride)" gets called.
roundup_pow_of_two is documented as undefined for 0.

Fix it in the one caller that had this combination.

The undefined behavior was detected by UBSAN:
  UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
  shift exponent 64 is too large for 64-bit type 'long unsigned int'
  CPU: 24 PID: 1075 Comm: (udev-worker) Not tainted 6.9.0-rc6+ #4
  Hardware name: Abacus electric, s.r.o. - [email protected] Super Server/H12SSW-iN, BIOS 2.7 10/25/2023
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5d/0x80
   ubsan_epilogue+0x5/0x30
   __ubsan_handle_shift_out_of_bounds.cold+0x61/0xec
   __roundup_pow_of_two+0x25/0x35 [bnxt_re]
   bnxt_qplib_alloc_init_hwq+0xa1/0x470 [bnxt_re]
   bnxt_qplib_create_qp+0x19e/0x840 [bnxt_re]
   bnxt_re_create_qp+0x9b1/0xcd0 [bnxt_re]
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __kmalloc+0x1b6/0x4f0
   ? create_qp.part.0+0x128/0x1c0 [ib_core]
   ? __pfx_bnxt_re_create_qp+0x10/0x10 [bnxt_re]
   create_qp.part.0+0x128/0x1c0 [ib_core]
   ib_create_qp_kernel+0x50/0xd0 [ib_core]
   create_mad_qp+0x8e/0xe0 [ib_core]
   ? __pfx_qp_event_handler+0x10/0x10 [ib_core]
   ib_mad_init_device+0x2be/0x680 [ib_core]
   add_client_context+0x10d/0x1a0 [ib_core]
   enable_device_and_get+0xe0/0x1d0 [ib_core]
   ib_register_device+0x53c/0x630 [ib_core]
   ? srso_alias_return_thunk+0x5/0xfbef5
   bnxt_re_probe+0xbd8/0xe50 [bnxt_re]
   ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re]
   auxiliary_bus_probe+0x49/0x80
   ? driver_sysfs_add+0x57/0xc0
   really_probe+0xde/0x340
   ? pm_runtime_barrier+0x54/0x90
   ? __pfx___driver_attach+0x10/0x10
   __driver_probe_device+0x78/0x110
   driver_probe_device+0x1f/0xa0
   __driver_attach+0xba/0x1c0
   bus_for_each_dev+0x8f/0xe0
   bus_add_driver+0x146/0x220
   driver_register+0x72/0xd0
   __auxiliary_driver_register+0x6e/0xd0
   ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
   bnxt_re_mod_init+0x3e/0xff0 [bnxt_re]
   ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
   do_one_initcall+0x5b/0x310
   do_init_module+0x90/0x250
   init_module_from_file+0x86/0xc0
   idempotent_init_module+0x121/0x2b0
   __x64_sys_finit_module+0x5e/0xb0
   do_syscall_64+0x82/0x160
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? syscall_exit_to_user_mode_prepare+0x149/0x170
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? syscall_exit_to_user_mode+0x75/0x230
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? do_syscall_64+0x8e/0x160
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __count_memcg_events+0x69/0x100
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? count_memcg_events.constprop.0+0x1a/0x30
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? handle_mm_fault+0x1f0/0x300
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? do_user_addr_fault+0x34e/0x640
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f4e5132821d
  Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 db 0c 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffca9c906a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
  RAX: ffffffffffffffda RBX: 0000563ec8a8f130 RCX: 00007f4e5132821d
  RDX: 0000000000000000 RSI: 00007f4e518fa07d RDI: 000000000000003b
  RBP: 00007ffca9c90760 R08: 00007f4e513f6b20 R09: 00007ffca9c906f0
  R10: 0000563ec8a8faa0 R11: 0000000000000246 R12: 00007f4e518fa07d
  R13: 0000000000020000 R14: 0000563ec8409e90 R15: 0000563ec8a8fa60
   </TASK>
  ---[ end trace ]---

Fixes: 0c4dcd6 ("RDMA/bnxt_re: Refactor hardware queue memory allocation")
Signed-off-by: Michal Schmidt <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Selvin Xavier <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Xiangyu Chen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
[Harshit: backport to 5.15.y, this is a clean cherrypick from 6.1.y
commit ]
Signed-off-by: Harshit Mogalapalli <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 66a9937187ac9b5c5ffff07b8b284483e56804d1)
Signed-off-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 13, 2025
commit fd7b4f9 upstream.

When CONFIG_KASAN_SW_TAGS and CONFIG_KASAN_STACK are enabled, the
object_is_on_stack() function may produce incorrect results due to the
presence of tags in the obj pointer, while the stack pointer does not have
tags.  This discrepancy can lead to incorrect stack object detection and
subsequently trigger warnings if CONFIG_DEBUG_OBJECTS is also enabled.

Example of the warning:

ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:557 __debug_object_init+0x330/0x364
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc5 #4
Hardware name: linux,dummy-virt (DT)
pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __debug_object_init+0x330/0x364
lr : __debug_object_init+0x330/0x364
sp : ffff800082ea7b40
x29: ffff800082ea7b40 x28: 98ff0000c0164518 x27: 98ff0000c0164534
x26: ffff800082d93ec8 x25: 0000000000000001 x24: 1cff0000c00172a0
x23: 0000000000000000 x22: ffff800082d93ed0 x21: ffff800081a24418
x20: 3eff800082ea7bb0 x19: efff800000000000 x18: 0000000000000000
x17: 00000000000000ff x16: 0000000000000047 x15: 206b63617473206e
x14: 0000000000000018 x13: ffff800082ea7780 x12: 0ffff800082ea78e
x11: 0ffff800082ea790 x10: 0ffff800082ea79d x9 : 34d77febe173e800
x8 : 34d77febe173e800 x7 : 0000000000000001 x6 : 0000000000000001
x5 : feff800082ea74b8 x4 : ffff800082870a90 x3 : ffff80008018d3c4
x2 : 0000000000000001 x1 : ffff800082858810 x0 : 0000000000000050
Call trace:
 __debug_object_init+0x330/0x364
 debug_object_init_on_stack+0x30/0x3c
 schedule_hrtimeout_range_clock+0xac/0x26c
 schedule_hrtimeout+0x1c/0x30
 wait_task_inactive+0x1d4/0x25c
 kthread_bind_mask+0x28/0x98
 init_rescuer+0x1e8/0x280
 workqueue_init+0x1a0/0x3cc
 kernel_init_freeable+0x118/0x200
 kernel_init+0x28/0x1f0
 ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated.
------------[ cut here ]------------

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Qun-Wei Lin <[email protected]>
Cc: Andrew Yang <[email protected]>
Cc: AngeloGioacchino Del Regno <[email protected]>
Cc: Casper Li <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chinwen Chang <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Matthias Brugger <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
[Minor context change fixed]
Signed-off-by: Zhi Yang <[email protected]>
Signed-off-by: He Zhe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 82e813b12b10ff705f3f5d600d8492fc5248618b)
Signed-off-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 13, 2025
commit 30ba2d0 upstream.

Commit c14f7cc ("PCI: Assign PCI domain IDs by ida_alloc()")
introduced a use-after-free bug in the bus removal cleanup. The issue was
found with kfence:

  [   19.293351] BUG: KFENCE: use-after-free read in pci_bus_release_domain_nr+0x10/0x70

  [   19.302817] Use-after-free read at 0x000000007f3b80eb (in kfence-#115):
  [   19.309677]  pci_bus_release_domain_nr+0x10/0x70
  [   19.309691]  dw_pcie_host_deinit+0x28/0x78
  [   19.309702]  tegra_pcie_deinit_controller+0x1c/0x38 [pcie_tegra194]
  [   19.309734]  tegra_pcie_dw_probe+0x648/0xb28 [pcie_tegra194]
  [   19.309752]  platform_probe+0x90/0xd8
  ...

  [   19.311457] kfence-#115: 0x00000000063a155a-0x00000000ba698da8, size=1072, cache=kmalloc-2k

  [   19.311469] allocated by task 96 on cpu 10 at 19.279323s:
  [   19.311562]  __kmem_cache_alloc_node+0x260/0x278
  [   19.311571]  kmalloc_trace+0x24/0x30
  [   19.311580]  pci_alloc_bus+0x24/0xa0
  [   19.311590]  pci_register_host_bridge+0x48/0x4b8
  [   19.311601]  pci_scan_root_bus_bridge+0xc0/0xe8
  [   19.311613]  pci_host_probe+0x18/0xc0
  [   19.311623]  dw_pcie_host_init+0x2c0/0x568
  [   19.311630]  tegra_pcie_dw_probe+0x610/0xb28 [pcie_tegra194]
  [   19.311647]  platform_probe+0x90/0xd8
  ...

  [   19.311782] freed by task 96 on cpu 10 at 19.285833s:
  [   19.311799]  release_pcibus_dev+0x30/0x40
  [   19.311808]  device_release+0x30/0x90
  [   19.311814]  kobject_put+0xa8/0x120
  [   19.311832]  device_unregister+0x20/0x30
  [   19.311839]  pci_remove_bus+0x78/0x88
  [   19.311850]  pci_remove_root_bus+0x5c/0x98
  [   19.311860]  dw_pcie_host_deinit+0x28/0x78
  [   19.311866]  tegra_pcie_deinit_controller+0x1c/0x38 [pcie_tegra194]
  [   19.311883]  tegra_pcie_dw_probe+0x648/0xb28 [pcie_tegra194]
  [   19.311900]  platform_probe+0x90/0xd8
  ...

  [   19.313579] CPU: 10 PID: 96 Comm: kworker/u24:2 Not tainted 6.2.0 #4
  [   19.320171] Hardware name:  /, BIOS 1.0-d7fb19b 08/10/2022
  [   19.325852] Workqueue: events_unbound deferred_probe_work_func

The stack trace is a bit misleading as dw_pcie_host_deinit() doesn't
directly call pci_bus_release_domain_nr(). The issue turns out to be in
pci_remove_root_bus() which first calls pci_remove_bus() which frees the
struct pci_bus when its struct device is released. Then
pci_bus_release_domain_nr() is called and accesses the freed struct
pci_bus. Reordering these fixes the issue.

Fixes: c14f7cc ("PCI: Assign PCI domain IDs by ida_alloc()")
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Jon Hunter <[email protected]>
Tested-by: Jon Hunter <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
Cc: [email protected]	# v6.2+
Cc: Pali Rohár <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit ad367516b1c09317111255ecfbf5e42c33e31918)
Signed-off-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 13, 2025
commit 78ab4be549533432d97ea8989d2f00b508fa68d8 upstream.

A warning on driver removal started occurring after commit 9dd05df8403b
("net: warn if NAPI instance wasn't shut down"). Disable tx napi before
deleting it in mt76_dma_cleanup().

 WARNING: CPU: 4 PID: 18828 at net/core/dev.c:7288 __netif_napi_del_locked+0xf0/0x100
 CPU: 4 UID: 0 PID: 18828 Comm: modprobe Not tainted 6.15.0-rc4 #4 PREEMPT(lazy)
 Hardware name: ASUS System Product Name/PRIME X670E-PRO WIFI, BIOS 3035 09/05/2024
 RIP: 0010:__netif_napi_del_locked+0xf0/0x100
 Call Trace:
 <TASK>
 mt76_dma_cleanup+0x54/0x2f0 [mt76]
 mt7921_pci_remove+0xd5/0x190 [mt7921e]
 pci_device_remove+0x47/0xc0
 device_release_driver_internal+0x19e/0x200
 driver_detach+0x48/0x90
 bus_remove_driver+0x6d/0xf0
 pci_unregister_driver+0x2e/0xb0
 __do_sys_delete_module.isra.0+0x197/0x2e0
 do_syscall_64+0x7b/0x160
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Tested with mt7921e but the same pattern can be actually applied to other
mt76 drivers calling mt76_dma_cleanup() during removal. Tx napi is enabled
in their *_dma_init() functions and only toggled off and on again inside
their suspend/resume/reset paths. So it should be okay to disable tx
napi in such a generic way.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 2ac515a ("mt76: mt76x02: use napi polling for tx cleanup")
Cc: [email protected]
Signed-off-by: Fedor Pchelkin <[email protected]>
Tested-by: Ming Yen Hsieh <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit ca5b213bf4b4224335a8131a26805d16503fca5f)
Signed-off-by: Vijayendra Suman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant