A server running Oracle Linux 7 crashed with the following error:
WARNING: CPU: 17 PID: 89073 at lib/list_debug.c:56 __list_del_entry_valid+0x6c/0xa0
...
general protection fault: 0000 [#1] SMP PTI
Strangely, the log shows no kernel panic message, yet the server rebooted anyway.
Environment
- Hardware: HP ProLiant DL580 Gen9
- OS: Oracle Linux Server release 7.6
- Kernel: 4.14.35-1902.0.18.el7uek.x86_64
Log
[174650.482576] list_del corruption. next->prev should be ffff88e098f2bc00, but was 0000000000007f68
[174650.482600] ------------[ cut here ]------------
[174650.482618] WARNING: CPU: 17 PID: 89073 at lib/list_debug.c:56 __list_del_entry_valid+0x6c/0xa0
[174650.482620] Modules linked in: cmac arc4 ecb md4 macsec sctp_diag sctp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 cifs ccm dns_resolver fscache bonding sunrpc sb_edac intel_powerclamp coretemp kvm_intel vfat fat kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd iTCO_wdt raid10 raid1 iTCO_vendor_support ipmi_ssif dm_multipath pcspkr dm_mod sg hpilo hpwdt ioatdma lpc_ich shpchp dca wmi ipmi_si ipmi_devintf ipmi_msghandler binfmt_misc ip_tables ext4 mbcache jbd2 fscrypto sd_mod mgag200 i2c_algo_bit bnx2x drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mdio libcrc32c hpsa nvme crc32c_intel ptp nvme_core scsi_transport_sas pps_core
[174650.482729] CPU: 17 PID: 89073 Comm: kworker/u256:0 Not tainted 4.14.35-1902.0.18.el7uek.x86_64 #2
[174650.482731] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 10/21/2019
[174650.482739] Workqueue: writeback wb_workfn (flush-8:0)
[174650.482743] task: ffff8907c27e97c0 task.stack: ffffae24b672c000
[174650.482748] RIP: 0010:__list_del_entry_valid+0x6c/0xa0
[174650.482750] RSP: 0018:ffff890dbda43c78 EFLAGS: 00010046
[174650.482753] RAX: 0000000000000054 RBX: ffff88e098f2bc00 RCX: 0000000000000000
[174650.482755] RDX: 0000000000000000 RSI: ffff890dbda56938 RDI: ffff890dbda56938
[174650.482757] RBP: ffff890dbda43c78 R08: 0000000000000000 R09: 0000000000001035
[174650.482759] R10: 0000000000000004 R11: 0000000000001034 R12: ffff88c08167a6c0
[174650.482761] R13: 0000000000000007 R14: ffff88e098f2b948 R15: ffff8907c1de8880
[174650.482764] FS: 0000000000000000(0000) GS:ffff890dbda40000(0000) knlGS:0000000000000000
[174650.482767] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[174650.482769] CR2: 00007f84d0662978 CR3: 000000494240a001 CR4: 00000000003606e0
[174650.482772] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[174650.482774] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[174650.482775] Call Trace:
[174650.482778] <IRQ>
[174650.482791] deadline_remove_request.isra.4+0x1d/0x86
[174650.482795] deadline_dispatch_requests+0x87/0x12f
[174650.482804] blk_peek_request+0xa0/0x2cf
[174650.482814] scsi_request_fn+0x3e/0x6ef
[174650.482819] __blk_run_queue+0x46/0x61
[174650.482823] blk_run_queue+0x30/0x48
[174650.482827] scsi_run_queue+0x28e/0x314
[174650.482832] scsi_end_request+0x16a/0x1fd
[174650.482836] scsi_io_completion+0x2db/0x665
[174650.482840] scsi_finish_command+0xdc/0x125
[174650.482845] scsi_softirq_done+0x145/0x163
[174650.482851] blk_done_softirq+0xa4/0xcc
[174650.482859] __do_softirq+0xd9/0x28d
[174650.482866] irq_exit+0xdf/0xe5
[174650.482869] do_IRQ+0x59/0xdb
[174650.482878] common_interrupt+0x1ba/0x372
[174650.482879] </IRQ>
[174650.482883] RIP: 0010:_raw_spin_unlock_irqrestore+0x1e/0x26
[174650.482885] RSP: 0018:ffffae24b672f858 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff5c
[174650.482888] RAX: fffffaf127dcea00 RBX: 0000000000000287 RCX: 0000000000000000
[174650.482890] RDX: 0000000000000002 RSI: 0000000000000287 RDI: 0000000000000287
[174650.482891] RBP: ffffae24b672f860 R08: fffffaf127dcea00 R09: 0000000000000046
[174650.482893] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000
[174650.482895] R13: 0000000000000000 R14: ffff88c9e0f8ecb8 R15: ffff88c9e0f8eca0
[174650.482904] __test_set_page_writeback+0x1e5/0x34e
[174650.482966] ext4_bio_write_page+0x239/0x4c0 [ext4]
[174650.482988] mpage_submit_page+0x57/0x70 [ext4]
[174650.483007] mpage_map_and_submit_buffers+0x148/0x230 [ext4]
[174650.483028] ext4_writepages+0x87e/0xf70 [ext4]
[174650.483038] ? intel_map_sg+0x1a6/0x1f8
[174650.483046] ? fprop_fraction_percpu+0x2f/0x72
[174650.483051] do_writepages+0x1f/0x65
[174650.483054] __writeback_single_inode+0x53/0x395
[174650.483057] ? wb_calc_thresh+0x4f/0x65
[174650.483061] writeback_sb_inodes+0x2dc/0x5d3
[174650.483066] __writeback_inodes_wb+0x8c/0xbb
[174650.483069] wb_writeback+0x2a1/0x32d
[174650.483073] wb_workfn+0x1aa/0x3b0
[174650.483081] process_one_work+0x169/0x395
[174650.483087] worker_thread+0x4d/0x3e5
[174650.483092] kthread+0x105/0x138
[174650.483096] ? rescuer_thread+0x380/0x375
[174650.483099] ? kthread_bind+0x20/0x15
[174650.483104] ret_from_fork+0x3e/0x49
[174650.483106] Code: 48 89 c2 48 89 fe 31 c0 48 c7 c7 20 62 20 ac e8 4e 9e d0 ff 0f 0b 31 c0 5d c3 48 89 fe 31 c0 48 c7 c7 d0 62 20 ac e8 37 9e d0 ff <0f> 0b 31 c0 5d c3 48 89 fe 31 c0 48 c7 c7 90 62 20 ac e8 20 9e
[174650.483166] ---[ end trace 67e4a96375a60cc1 ]---
[174650.483582] general protection fault: 0000 [#1] SMP PTI
[174650.486351] Modules linked in: cmac arc4 ecb md4 macsec sctp_diag sctp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 cifs ccm dns_resolver fscache bonding sunrpc sb_edac intel_powerclamp coretemp kvm_intel vfat fat kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd iTCO_wdt raid10 raid1 iTCO_vendor_support ipmi_ssif dm_multipath pcspkr dm_mod sg hpilo hpwdt ioatdma lpc_ich shpchp dca wmi ipmi_si ipmi_devintf ipmi_msghandler binfmt_misc ip_tables ext4 mbcache jbd2 fscrypto sd_mod mgag200 i2c_algo_bit bnx2x drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mdio libcrc32c hpsa nvme crc32c_intel ptp nvme_core scsi_transport_sas pps_core
[174650.502372] CPU: 17 PID: 89073 Comm: kworker/u256:0 Tainted: G W 4.14.35-1902.0.18.el7uek.x86_64 #2
[174650.505025] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 10/21/2019
[174650.507689] Workqueue: writeback wb_workfn (flush-8:0)
[174650.510310] task: ffff8907c27e97c0 task.stack: ffffae24b672c000
[174650.512906] RIP: 0010:deadline_dispatch_requests+0x7e/0x12f
[174650.515478] RSP: 0018:ffff890dbda43cb0 EFLAGS: 00010002
[174650.518022] RAX: ffff88e098f2c488 RBX: 202c000000000000 RCX: ffff88e098f2c400
[174650.520546] RDX: ffff88e098f2c400 RSI: ffff88e098f2c000 RDI: ffff88e098f2c088
[174650.523041] RBP: ffff890dbda43cc8 R08: 0000000000000020 R09: 0000000000000000
[174650.525517] R10: 0000000000000001 R11: 00000000038af9cb R12: ffff88e098f2c000
[174650.527951] R13: 0000000000000007 R14: ffff88e098f2bd48 R15: ffff8907c1de8880
[174650.530353] FS: 0000000000000000(0000) GS:ffff890dbda40000(0000) knlGS:0000000000000000
[174650.532731] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[174650.535094] CR2: 00007f84d0662978 CR3: 000000494240a001 CR4: 00000000003606e0
[174650.537438] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[174650.539741] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[174650.542015] Call Trace:
[174650.544232] <IRQ>
[174650.546426] blk_peek_request+0xa0/0x2cf
[174650.548573] scsi_request_fn+0x3e/0x6ef
[174650.550696] __blk_run_queue+0x46/0x61
[174650.552772] blk_run_queue+0x30/0x48
[174650.554827] scsi_run_queue+0x28e/0x314
[174650.556829] scsi_end_request+0x16a/0x1fd
[174650.558797] scsi_io_completion+0x2db/0x665
[174650.560726] scsi_finish_command+0xdc/0x125
[174650.562623] scsi_softirq_done+0x145/0x163
[174650.565004] blk_done_softirq+0xa4/0xcc
[174650.566820] __do_softirq+0xd9/0x28d
[174650.568598] irq_exit+0xdf/0xe5
[174650.570330] do_IRQ+0x59/0xdb
[174650.572019] common_interrupt+0x1ba/0x372
[174650.573689] </IRQ>
[174650.575307] RIP: 0010:_raw_spin_unlock_irqrestore+0x1e/0x26
[174650.576911] RSP: 0018:ffffae24b672f858 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff5c
[174650.578507] RAX: fffffaf127dcea00 RBX: 0000000000000287 RCX: 0000000000000000
[174650.580072] RDX: 0000000000000002 RSI: 0000000000000287 RDI: 0000000000000287
[174650.581619] RBP: ffffae24b672f860 R08: fffffaf127dcea00 R09: 0000000000000046
[174650.583121] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000
[174650.584612] R13: 0000000000000000 R14: ffff88c9e0f8ecb8 R15: ffff88c9e0f8eca0
[174650.586062] __test_set_page_writeback+0x1e5/0x34e
[174650.587509] ext4_bio_write_page+0x239/0x4c0 [ext4]
[174650.588913] mpage_submit_page+0x57/0x70 [ext4]
[174650.590261] mpage_map_and_submit_buffers+0x148/0x230 [ext4]
[174650.591588] ext4_writepages+0x87e/0xf70 [ext4]
[174650.592870] ? intel_map_sg+0x1a6/0x1f8
[174650.594093] ? fprop_fraction_percpu+0x2f/0x72
[174650.595259] do_writepages+0x1f/0x65
[174650.596397] __writeback_single_inode+0x53/0x395
[174650.597501] ? wb_calc_thresh+0x4f/0x65
[174650.598588] writeback_sb_inodes+0x2dc/0x5d3
[174650.599659] __writeback_inodes_wb+0x8c/0xbb
[174650.600719] wb_writeback+0x2a1/0x32d
[174650.601785] wb_workfn+0x1aa/0x3b0
[174650.602822] process_one_work+0x169/0x395
[174650.603871] worker_thread+0x4d/0x3e5
[174650.604892] kthread+0x105/0x138
[174650.605910] ? rescuer_thread+0x380/0x375
[174650.606932] ? kthread_bind+0x20/0x15
[174650.607932] ret_from_fork+0x3e/0x49
[174650.608901] Code: b9 35 48 00 41 83 e5 01 48 8d 88 78 ff ff ff 31 d2 4d 63 ed 48 85 c0 4c 89 e6 48 0f 45 d1 49 83 c5 06 4a 89 14 eb 49 8b 5c 24 30 <48> 8b 7b 18 e8 e9 fe ff ff 4c 89 e6 48 89 df e8 0e 25 fd ff b8
[174650.611057] RIP: deadline_dispatch_requests+0x7e/0x12f RSP: ffff890dbda43cb0
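To make the first message in the log easier to read, here is a simplified Python model of the kind of consistency check a doubly linked list can run before unlinking an entry. It is only a sketch: the names Node, list_add, list_del_entry_valid and list_del are illustrative stand-ins, not the kernel's actual C code, and the garbage value 0x7f68 is taken from the log purely for illustration.

```python
class Node:
    """Illustrative stand-in for a doubly linked list entry (not kernel code)."""

    def __init__(self, name):
        self.name = name
        self.next = self  # an empty list points at itself
        self.prev = self


def _label(obj):
    """Describe a pointer: a node name, or a hex value for garbage."""
    return obj.name if isinstance(obj, Node) else hex(obj)


def list_add(new, head):
    """Insert `new` right after `head`."""
    nxt = head.next
    new.prev, new.next = head, nxt
    nxt.prev = new
    head.next = new


def list_del_entry_valid(entry):
    """Return an error message if the neighbours do not point back at `entry`."""
    if entry.prev.next is not entry:
        return ("list_del corruption. prev->next should be "
                f"{_label(entry)}, but was {_label(entry.prev.next)}")
    if entry.next.prev is not entry:
        return ("list_del corruption. next->prev should be "
                f"{_label(entry)}, but was {_label(entry.next.prev)}")
    return ""


def list_del(entry):
    """Unlink `entry` only if the list looks sane; return the warning text."""
    message = list_del_entry_valid(entry)
    if message:
        return message  # assumption: warn and skip the unlink
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.next = entry.prev = entry
    return ""


# Build head -> b -> a, then simulate memory corruption of a.prev.
head, a, b = Node("head"), Node("a"), Node("b")
list_add(a, head)
list_add(b, head)
a.prev = 0x7F68  # garbage instead of a pointer back to b
print(list_del(b))
```

In this model the warning fires because the entry after b no longer points back at b, which mirrors the "next->prev should be ..., but was ..." wording in the log. The check only detects the damage; it says nothing about what overwrote the pointer.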
I found similar errors at the same place in the same function: one reported by the Ceph developers and another by an engineer from Fujitsu.
Something is going wrong at the boundary between hardware, drivers, and the kernel. Most likely a kernel rollback or update will help. If it happens again, we will dig deeper.
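If the crash does recur, it may help to reduce each log to a short signature and compare it with this one. Below is a minimal Python sketch for that, assuming the log is available as plain text; the function name crash_signature and the regular expressions are my own and cover only the two message types seen in this log.

```python
import re

# "WARNING: CPU: N PID: N at <file:line> <function>+0x.../0x..."
WARN_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (\S+) ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# "general protection fault: ... [#N]" followed by the first "RIP: 0010:<function>+0x..."
FAULT_RE = re.compile(
    r"general protection fault: \d+ \[#\d+\].*?RIP: \d+:([\w.]+)\+0x[0-9a-f]+",
    re.S,
)


def crash_signature(log_text):
    """Extract warning locations and faulting functions from a kernel log."""
    return {
        "warnings": WARN_RE.findall(log_text),  # list of (location, function)
        "faults": FAULT_RE.findall(log_text),   # list of function names
    }


# Three lines copied from the log above.
SAMPLE = (
    "[174650.482618] WARNING: CPU: 17 PID: 89073 at lib/list_debug.c:56 "
    "__list_del_entry_valid+0x6c/0xa0\n"
    "[174650.483582] general protection fault: 0000 [#1] SMP PTI\n"
    "[174650.512906] RIP: 0010:deadline_dispatch_requests+0x7e/0x12f\n"
)

if __name__ == "__main__":
    print(crash_signature(SAMPLE))
```

Two crashes with the same signature would suggest the same code path is involved, although that alone would not tell hardware faults apart from software bugs.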