Closed
Description
Traces from the latest UEK6 kernel, 5.4.17-2136.334.6.1.el8uek.x86_64.
kernel NULL pointer dereference
[ 1924.256982] BUG: kernel NULL pointer dereference, address: 0000000000000058
[ 1924.260317] #PF: supervisor read access in kernel mode
[ 1924.260317] #PF: error_code(0x0000) - not-present page
[ 1924.260317] PGD 897b1e067 P4D 0
[ 1924.260317] Oops: 0000 [#1] SMP NOPTI
[ 1924.260317] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.4.17-2136.334.6.1.el8uek.x86_64 #3
[ 1924.260317] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/13/2024
[ 1924.260317] RIP: 0010:dm_softirq_done+0x4f/0x240 [dm_mod]
[ 1924.260317] Code: 51 01 00 00 44 0f b6 bf 60 01 00 00 4d 8b ac 24 10 01 00 00 45 89 fe f6 47 1d 04 75 58 49 8b 7d 08 48 85 ff 74 4f 48 8b 47 08 <48> 8b 40 58 48 85 c0 74 42 49 8d 4d 50 44 89 fa 4c 89 e6 e8 69 ff
[ 1924.260317] RSP: 0018:ff66070e00210ee0 EFLAGS: 00010282
[ 1924.260317] RAX: 0000000000000000 RBX: ff3464adca1f0540 RCX: dead000000000122
[ 1924.260317] RDX: ff66070e00210f20 RSI: ff3464adca1f0598 RDI: ff66070e0009b040
[ 1924.260317] RBP: ff66070e00210f10 R08: ff3464ae1fbedfc0 R09: 0000000000000100
[ 1924.260317] R10: 0000000000000001 R11: 0000000000000230 R12: ff3464adc3ea0a80
[ 1924.260317] R13: ff3464adca1f0658 R14: 0000000000000000 R15: 0000000000000000
[ 1924.260317] FS: 0000000000000000(0000) GS:ff3464ae1fbc0000(0000) knlGS:0000000000000000
[ 1924.260317] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1924.260317] CR2: 0000000000000058 CR3: 000000087fd7e003 CR4: 0000000000361ee0
[ 1924.260317] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1924.260317] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1924.260317] Call Trace:
[ 1924.260317] <IRQ>
[ 1924.260317] ? show_regs.cold.12+0x1a/0x1c
[ 1924.260317] ? __die+0x86/0xd2
[ 1924.260317] ? no_context.isra.25+0x13f/0x552
[ 1924.260317] ? ftrace_ops_assist_func+0x78/0x112
[ 1924.260317] ? __bad_area_nosemaphore+0x43/0x1d8
[ 1924.260317] ? bad_area_nosemaphore+0x16/0x1c
[ 1924.260317] ? __do_page_fault+0x2c8/0x4b8
[ 1924.260317] ? do_page_fault+0x36/0x122
[ 1924.358485] ? page_fault+0x13d/0x142
[ 1924.358485] ? dm_softirq_done+0x4f/0x240 [dm_mod]
[ 1924.363490] blk_done_softirq+0xa5/0xd1
[ 1924.363490] __do_softirq+0xd4/0x2cc
[ 1924.368482] irq_exit+0x103/0x108
[ 1924.370487] do_IRQ+0x59/0xe4
[ 1924.373481] common_interrupt+0xf/0x1d2
[ 1924.373481] </IRQ>
[ 1924.373481] RIP: 0010:native_safe_halt+0x12/0x18
[ 1924.373481] Code: 48 02 20 48 8b 00 a8 08 75 bc e9 60 ff ff ff cc cc cc cc cc cc cc cc cc 55 48 89 e5 0f 1f 44 00 00 0f 00 2d b2 c3 57 00 fb f4 <5d> c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00
[ 1924.388482] RSP: 0018:ff66070e000d3e70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[ 1924.390490] RAX: ffffffffa5a8fc90 RBX: 0000000000000007 RCX: 0000000000000001
[ 1924.390490] RDX: 000000000048dd7a RSI: ff66070e000d3e60 RDI: 0000000000000000
[ 1924.400415] RBP: ff66070e000d3e70 R08: fffffffffff396a8 R09: 00ea9b5c0bd41f3f
[ 1924.402483] R10: 00000000000000ec R11: 000000000000075a R12: 0000000000000007
[ 1924.405481] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1924.405481] ? __sched_text_end+0x1/0x0
[ 1924.405481] default_idle+0x22/0x151
[ 1924.416481] arch_cpu_idle+0x15/0x1b
[ 1924.417974] default_idle_call+0x30/0x36
[ 1924.420489] do_idle+0x1e3/0x25a
[ 1924.422485] cpu_startup_entry+0x1d/0x1f
[ 1924.422485] start_secondary+0x177/0x1cb
[ 1924.422485] secondary_startup_64+0xb6/0xb6
[ 1924.430482] Modules linked in: dm_queue_length iscsi_tcp libiscsi_tcp libiscsi target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod dm_multipath vxlan ip6_udp_tunnel udp_tunnel act_mirred sch_ingress ifb cls_u32 act_gact cls_bpf sch_hfsc rfkill scsi_transport_iscsi nft_counter nft_chain_nat xt_nat nf_nat nft_compat sunrpc intel_rapl_msr intel_rapl_common nfit libnvdimm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel vfat fat mlx5_ib ib_uverbs aesni_intel crypto_simd cryptd glue_helper ib_core pcspkr joydev hv_utils sch_fq_codel binfmt_misc xfs mlx5_core mlxfw tls psample sr_mod cdrom sd_mod pci_hyperv pci_hyperv_intf sg serio_raw hv_storvsc hv_netvsc hyperv_keyboard scsi_transport_fc hid_hyperv hv_vmbus dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c
[ 1924.466487] CR2: 0000000000000058
[ 1924.469483] ---[ end trace 7b55ee81713bcee6 ]---
[ 1924.469483] RIP: 0010:dm_softirq_done+0x4f/0x240 [dm_mod]
[ 1924.469483] Code: 51 01 00 00 44 0f b6 bf 60 01 00 00 4d 8b ac 24 10 01 00 00 45 89 fe f6 47 1d 04 75 58 49 8b 7d 08 48 85 ff 74 4f 48 8b 47 08 <48> 8b 40 58 48 85 c0 74 42 49 8d 4d 50 44 89 fa 4c 89 e6 e8 69 ff
[ 1924.486483] RSP: 0018:ff66070e00210ee0 EFLAGS: 00010282
[ 1924.486483] RAX: 0000000000000000 RBX: ff3464adca1f0540 RCX: dead000000000122
[ 1924.491484] RDX: ff66070e00210f20 RSI: ff3464adca1f0598 RDI: ff66070e0009b040
[ 1924.495487] RBP: ff66070e00210f10 R08: ff3464ae1fbedfc0 R09: 0000000000000100
[ 1924.500484] R10: 0000000000000001 R11: 0000000000000230 R12: ff3464adc3ea0a80
[ 1924.505277] R13: ff3464adca1f0658 R14: 0000000000000000 R15: 0000000000000000
[ 1924.505277] FS: 0000000000000000(0000) GS:ff3464ae1fbc0000(0000) knlGS:0000000000000000
[ 1924.505277] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1924.514485] CR2: 0000000000000058 CR3: 000000087fd7e003 CR4: 0000000000361ee0
[ 1924.518489] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1924.524494] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1924.524494] Kernel panic - not syncing: Fatal exception in interrupt
[ 1924.531484] Kernel Offset: 0x24000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1924.531484] Rebooting in 1 seconds..
unable to handle page fault for address
[ 4499.650701] BUG: unable to handle page fault for address: ff6dba89c0223048
[ 4499.657016] #PF: supervisor read access in kernel mode
[ 4499.660694] #PF: error_code(0x0000) - not-present page
[ 4499.661044] scsi 76:0:0:0: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 5
[ 4499.661853] PGD 107d65067 P4D 107d66067 PUD 107d67067 PMD 107499067 PTE 0
[ 4499.661853] Oops: 0000 [#1] SMP NOPTI
[ 4499.661853] CPU: 5 PID: 1897 Comm: flashgrid_initi Not tainted 5.4.17-2136.334.6.1.el8uek.x86_64 #3
[ 4499.661853] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/13/2024
[ 4499.661853] RIP: 0010:dm_softirq_done+0x4b/0x240 [dm_mod]
[ 4499.661853] Code: 85 e4 0f 84 51 01 00 00 44 0f b6 bf 60 01 00 00 4d 8b ac 24 10 01 00 00 45 89 fe f6 47 1d 04 75 58 49 8b 7d 08 48 85 ff 74 4f <48> 8b 47 08 48 8b 40 58 48 85 c0 74 42 49 8d 4d 50 44 89 fa 4c 89
[ 4499.661853] RSP: 0000:ff6dba89c01b8ee0 EFLAGS: 00010282
[ 4499.661853] RAX: ffffffffc0248c90 RBX: ff4862149ae40000 RCX: dead000000000122
[ 4499.661853] RDX: ff6dba89c01b8f20 RSI: ff4862149ae40058 RDI: ff6dba89c0223040
[ 4499.661853] RBP: ff6dba89c01b8f10 R08: ff4862149fb6dfc0 R09: 0000000000000100
[ 4499.661853] R10: 0000000000000001 R11: 00000000000004d0 R12: ff48621476b40a80
[ 4499.661853] R13: ff4862149ae40118 R14: 0000000000000000 R15: 0000000000000000
[ 4499.661853] FS: 00007f604bc61740(0000) GS:ff4862149fb40000(0000) knlGS:0000000000000000
[ 4499.661853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4499.661853] CR2: ff6dba89c0223048 CR3: 000000088430e004 CR4: 0000000000361ee0
[ 4499.661853] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4499.661853] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4499.661853] Call Trace:
[ 4499.661853] <IRQ>
[ 4499.661853] ? show_regs.cold.12+0x1a/0x1c
[ 4499.661853] ? __die+0x86/0xd2
[ 4499.661853] ? no_context.isra.25+0x13f/0x552
[ 4499.661853] ? kprobe_ftrace_handler+0xa1/0xff
[ 4499.668591] scsi 76:0:0:0: alua: supports implicit and explicit TPGS
[ 4499.668563] ? __bad_area_nosemaphore+0x43/0x1d8
[ 4499.668563] ? bad_area_nosemaphore+0x16/0x1c
[ 4499.668563] ? do_kern_addr_fault+0x72/0x81
[ 4499.668563] ? __do_page_fault+0x276/0x4b8
[ 4499.668563] ? do_page_fault+0x36/0x122
[ 4499.668563] ? page_fault+0x13d/0x142
[ 4499.674446] scsi 76:0:0:0: alua: device naa.60014059cc3a0c2d09041e1bea47f0bf port group 0 rel port 1
[ 4499.668563] ? dm_mq_queue_rq+0x410/0x410 [dm_mod]
[ 4499.668563] ? dm_softirq_done+0x4b/0x240 [dm_mod]
[ 4499.668563] blk_done_softirq+0xa5/0xd1
[ 4499.668563] __do_softirq+0xd4/0x2cc
[ 4499.668563] irq_exit+0x103/0x108
[ 4499.684898] sd 76:0:0:0: Attached scsi generic sg92 type 0
[ 4499.685696] sd 76:0:0:0: [sdcn] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 4499.685698] sd 76:0:0:0: [sdcn] 4096-byte physical blocks
[ 4499.685838] sd 76:0:0:0: [sdcn] Write Protect is off
[ 4499.685840] sd 76:0:0:0: [sdcn] Mode Sense: 43 00 00 08
[ 4499.686113] sd 76:0:0:0: [sdcn] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4499.686479] do_IRQ+0x59/0xe4
[ 4499.686479] common_interrupt+0xf/0x1d2
[ 4499.686479] </IRQ>
[ 4499.686479] RIP: 0033:0x7f604b3c0940
[ 4499.686479] Code: 25 7d 9f 50 00 0f 1f 44 00 00 f3 0f 1e fa f2 ff 25 75 9f 50 00 0f 1f 44 00 00 f3 0f 1e fa f2 ff 25 6d 9f 50 00 0f 1f 44 00 00 <f3> 0f 1e fa f2 ff 25 65 9f 50 00 0f 1f 44 00 00 f3 0f 1e fa f2 ff
[ 4499.686479] RSP: 002b:00007ffdc0c3ac78 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[ 4499.686479] RAX: 0000000000c79d12 RBX: 00007f604bb91260 RCX: 00007f603d639660
[ 4499.686479] RDX: 0000000000d28270 RSI: 0000000000d28270 RDI: 00007f603d639660
[ 4499.686479] RBP: 00007f604ba9ed58 R08: 0000000000c79d06 R09: 0000000000000002
[ 4499.686479] R10: b6152e7475dc8841 R11: 000000000000000f R12: 00007f604ba9ec80
[ 4499.686479] R13: 00007f604ba9ed50 R14: 00007f604b9fca58 R15: 0000000000c79d04
[ 4499.686479] Modules linked in: dm_queue_length iscsi_tcp libiscsi_tcp libiscsi target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod dm_multipath vxlan ip6_udp_tunnel udp_tunnel act_mirred sch_ingress ifb cls_u32 act_gact cls_bpf sch_hfsc rfkill scsi_transport_iscsi nft_counter nft_chain_nat xt_nat nf_nat nft_compat sunrpc intel_rapl_msr intel_rapl_common nfit libnvdimm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul vfat fat ghash_clmulni_intel mlx5_ib aesni_intel crypto_simd ib_uverbs cryptd ib_core hv_utils glue_helper pcspkr joydev sch_fq_codel binfmt_misc xfs mlx5_core sr_mod mlxfw cdrom tls sd_mod psample sg pci_hyperv pci_hyperv_intf hv_storvsc serio_raw hv_netvsc scsi_transport_fc hid_hyperv hyperv_keyboard hv_vmbus dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c
[ 4499.691301] sd 76:0:0:0: [sdcn] Optimal transfer size 524288 bytes
[ 4499.883451] CR2: ff6dba89c0223048
[ 4499.892959] ---[ end trace 48e68afa59564894 ]---
[ 4499.892959] RIP: 0010:dm_softirq_done+0x4b/0x240 [dm_mod]
[ 4499.892959] Code: 85 e4 0f 84 51 01 00 00 44 0f b6 bf 60 01 00 00 4d 8b ac 24 10 01 00 00 45 89 fe f6 47 1d 04 75 58 49 8b 7d 08 48 85 ff 74 4f <48> 8b 47 08 48 8b 40 58 48 85 c0 74 42 49 8d 4d 50 44 89 fa 4c 89
[ 4499.892959] RSP: 0000:ff6dba89c01b8ee0 EFLAGS: 00010282
[ 4499.892959] RAX: ffffffffc0248c90 RBX: ff4862149ae40000 RCX: dead000000000122
[ 4499.892959] RDX: ff6dba89c01b8f20 RSI: ff4862149ae40058 RDI: ff6dba89c0223040
[ 4499.892959] RBP: ff6dba89c01b8f10 R08: ff4862149fb6dfc0 R09: 0000000000000100
[ 4499.892959] R10: 0000000000000001 R11: 00000000000004d0 R12: ff48621476b40a80
[ 4499.892959] R13: ff4862149ae40118 R14: 0000000000000000 R15: 0000000000000000
[ 4499.892959] FS: 00007f604bc61740(0000) GS:ff4862149fb40000(0000) knlGS:0000000000000000
[ 4499.892959] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4499.892959] CR2: ff6dba89c0223048 CR3: 000000088430e004 CR4: 0000000000361ee0
[ 4499.892959] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4499.892959] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4499.892959] Kernel panic - not syncing: Fatal exception in interrupt
[ 4499.892959] Kernel Offset: 0x15800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 4499.892959] Rebooting in 1 seconds..
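The two traces look like the same bug hit one instruction apart. Below is a minimal sketch that checks this arithmetic using only values copied from the oops reports above. The instruction decodings in the comments are a manual reading of the bytes marked `<48>` in the `Code:` lines, not disassembler output, and the dictionary layout and function names are purely illustrative.

```python
# Values copied from the two oops reports above.
MASK64 = 0xFFFFFFFFFFFFFFFF

TRACES = {
    "NULL pointer dereference": {
        "rip_offset": 0x4F,            # dm_softirq_done+0x4f
        "cr2": 0x0000000000000058,     # faulting address
        "base_reg": "RAX",
        "base": 0x0000000000000000,
        "disp": 0x58,                  # 48 8b 40 58 -> mov 0x58(%rax),%rax
    },
    "page fault": {
        "rip_offset": 0x4B,            # dm_softirq_done+0x4b
        "cr2": 0xFF6DBA89C0223048,     # faulting address
        "base_reg": "RDI",
        "base": 0xFF6DBA89C0223040,
        "disp": 0x08,                  # 48 8b 47 08 -> mov 0x8(%rdi),%rax
    },
}


def faulting_address(base: int, disp: int) -> int:
    """Effective address of a base+displacement memory operand."""
    return (base + disp) & MASK64


def matches_cr2(trace: dict) -> bool:
    """True if base register + displacement reproduces the reported CR2."""
    return faulting_address(trace["base"], trace["disp"]) == trace["cr2"]


for name, trace in TRACES.items():
    print(f"{name}: {trace['base_reg']}+{trace['disp']:#x} "
          f"matches CR2: {matches_cr2(trace)}")

# The two RIP offsets differ by the length of the 4-byte load at +0x4b,
# so the loads at +0x4b and +0x4f are consecutive instructions.
print("consecutive instructions:",
      TRACES["NULL pointer dereference"]["rip_offset"]
      - TRACES["page fault"]["rip_offset"] == 4)
```

In both traces RDI holds a non-NULL pointer. In the second trace the memory it points to is not mapped. In the first, the load through RDI returned NULL (RAX is 0), and the next load then faulted at 0x58. This would be consistent with a stale pointer being followed in `dm_softirq_done`, but that is an interpretation of the register dump and has not been confirmed against the source.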
Details
It happens on the initiator side on OL8 UEK6 when the target server is unexpectedly rebooted. We tested different operating systems and values of the iSCSI node.session.nr_sessions parameter and came to the conclusion that only UEK6 with nr_sessions > 1 is affected. The failure rate is about 5 failures per 25 reboots (20%).
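For anyone trying to reproduce this, the sketch below shows one way the nr_sessions value could be changed on an existing node record, assuming the initiator uses open-iscsi's `iscsiadm`. The target IQN and portal are hypothetical placeholders, and the helper only builds the command line without running it.

```python
import shlex


def nr_sessions_update_argv(target_iqn: str, portal: str, nr_sessions: int) -> list:
    """Build an iscsiadm command that updates node.session.nr_sessions
    on an existing node record. Nothing is executed here."""
    if nr_sessions < 1:
        raise ValueError("nr_sessions must be at least 1")
    return [
        "iscsiadm", "-m", "node",
        "-T", target_iqn,          # hypothetical target IQN
        "-p", portal,              # hypothetical portal address
        "-o", "update",
        "-n", "node.session.nr_sessions",
        "-v", str(nr_sessions),
    ]


# Hypothetical target and portal, for illustration only.
argv = nr_sessions_update_argv("iqn.2003-01.org.example:target0",
                               "192.0.2.10:3260", 2)
print(shlex.join(argv))
```

The changed value should apply the next time the initiator logs in to the target, so an existing session would need a logout and login first.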
The summary of the configurations we tested is below:
OS | Kernel | Kernel Acronym | node.session.nr_sessions | Kernel panic? |
---|---|---|---|---|
OL7 | 3.10.0-1160.119.1.0.1.el7.x86_64 | RHCK | 4 | 😃No |
OL8 | 5.4.17-2136.334.6.1.el8uek.x86_64 | UEK6 | 1 | 😃No |
OL8 | 5.4.17-2136.334.6.1.el8uek.x86_64 | UEK6 | 2 | 😡Yes |
OL8 | 5.4.17-2136.334.6.1.el8uek.x86_64 | UEK6 | 4 | 😡Yes |
RHEL8 | 4.18.0-553.8.1.el8_10.x86_64 | RHCK | 4 | 😃No |
OL9 | 5.15.0-205.149.5.4.el9uek.x86_64 | UEK7 | 4 | 😃No |
RHEL9 | 5.14.0-427.16.1.el9_4.x86_64 | RHCK | 4 | 😃No |
These specific kernel traces are from Azure, but we have encountered this issue on AWS too, so it is not Azure-specific.