Server crash with kernel 4.1

german

#1

Hello,

our server stopped responding: it no longer reacted to keyboard or mouse input, and none of its network services responded either, although it could still be pinged.
The system had to be switched off with the front-panel power switch and restarted.

Hardware: Intel® Server System P4308RPLSHDR Barebone

Linux fs-pgd3 4.1.0-ucs167-amd64 #1 SMP Debian 4.1.6-1.167.201601252247 (2016-01-25) x86_64 GNU/Linux
The problem first appeared a few days after the update to kernel 4.1; before that, there had been no such problems for months.

For troubleshooting, the info file is attached, and here are the suspicious syslog entries:

Feb 4 16:30:50 fs-pgd3 kernel: [241444.659626] INFO: rcu_sched self-detected stall on CPU { 2} (t=5250 jiffies g=4714608 c=4714607 q=7668)
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659640] Task dump for CPU 2:
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659642] smbd R running task 0 29234 29221 0x00000008
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659644] 0000000000000003 ffffffff81851340 ffffffff810d3c84 000000000047f070
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659645] ffff88043f497100 ffffffff81851340 ffffffff81851340 ffffffff818f6c60
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659646] ffffffff810d7659 0000000000000000 0000000000000000 00003d0906785140
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659647] Call Trace:
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659649] [] ? rcu_dump_cpu_stacks+0x84/0xc0
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659655] [] ? rcu_check_callbacks+0x449/0x740
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659657] [] ? update_wall_time+0x23d/0x670
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659668] [] ? tick_sched_do_timer+0x40/0x40
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659670] [] ? update_process_times+0x34/0x70
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659671] [] ? tick_sched_handle.isra.12+0x2c/0x70
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659673] [] ? tick_sched_timer+0x49/0x80
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659674] [] ? __run_hrtimer+0x6d/0x1b0
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659676] [] ? read_tsc+0x5/0x10
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659677] [] ? hrtimer_interrupt+0xed/0x210
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659679] [] ? smp_apic_timer_interrupt+0x39/0x50
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659682] [] ? apic_timer_interrupt+0x6e/0x80
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659682] [] ? _raw_spin_lock+0x21/0x50
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659685] [] ? unix_dgram_connect+0x93/0x200
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659688] [] ? SYSC_connect+0xe8/0x100
Feb 4 16:30:50 fs-pgd3 kernel: [241444.659690] [] ? system_call_fast_compare_end+0xc/0x6b
Feb 4 16:31:18 fs-pgd3 kernel: [241472.050960] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [smbd:29234]
Feb 4 16:31:18 fs-pgd3 kernel: [241472.050963] Modules linked in: arc4 ecb md4 hmac nls_utf8 cifs dns_resolver ppdev lp xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 bridge stp llc overlay ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc quota_v2 quota_tree joydev hid_generic usbhid hid uas usb_storage x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper iTCO_wdt iTCO_vendor_support lpc_ich mfd_core xhci_pci xhci_hcd ie31200_edac winbond_cir ipmi_si int3402_thermal int3400_thermal int3403_thermal ipmi_msghandler edac_core i2c_i801 battery int340x_thermal_zone rc_core acpi_thermal_rel parport_pc tpm_t
Feb 4 16:31:18 fs-pgd3 kernel: is tpm parport processor ablk_helper cryptd 8250_fintek pcspkr evdev ext4 crc16 mbcache jbd2 dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sg sr_mod cdrom sd_mod ahci libahci fan video libata thermal_sys ttm megaraid_sas crc32c_intel drm_kms_helper igb scsi_mod drm ehci_pci ehci_hcd usbcore dca ptp button pps_core i2c_algo_bit usb_common
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051009] CPU: 2 PID: 29234 Comm: smbd Not tainted 4.1.0-ucs167-amd64 #1 Debian 4.1.6-1.167.201601252247
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051010] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.03.02.0003.070120151022 07/01/2015
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051011] task: ffff8803df57cca0 ti: ffff880438208000 task.ti: ffff880438208000
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051012] RIP: 0010:[] [] _raw_spin_lock+0x2a/0x50
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051017] RSP: 0018:ffff88043820be40 EFLAGS: 00000202
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051018] RAX: 0000000000004117 RBX: 0000000000000002 RCX: 00000000000013d2
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051018] RDX: 00000000000013c8 RSI: 00000000000013c8 RDI: ffff88043c27fb28
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051019] RBP: 0000000000000028 R08: 000000000000001a R09: 0000000000000000
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051020] R10: ffffffffffffffff R11: 4500003433323932 R12: ffff88043820be08
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051020] R13: 0000000056b36e95 R14: ffffffff811ebf0e R15: ffff8803b893f4b0
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051021] FS: 00007febf42ad720(0000) GS:ffff88043f480000(0000) knlGS:0000000000000000
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051023] CR2: 000055960f9be168 CR3: 00000004399d6000 CR4: 00000000001406e0
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051023] Stack:
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051024] ffffffff8153b343 ffffffff8166ce60 ffff88043820be64 ffff88043cc07000
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051025] 000000289a95ecf0 ffff88010ab32c18 ffff88009a95ecc0 000000000000006e
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051026] 00007ffd761fd360 00007ffd761fd360 00007ffd761fd240 0000000000000006
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051027] Call Trace:
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051031] [] ? unix_dgram_connect+0x93/0x200
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051034] [] ? SYSC_connect+0xe8/0x100
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051035] [] ? system_call_fast_compare_end+0xc/0x6b
Feb 4 16:31:18 fs-pgd3 kernel: [241472.051036] Code: 90 0f 1f 44 00 00 b8 00 00 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 01 c3 0f b7 f2 b8 00 80 00 00 0f b7 0f 41 89 c8 41 31 d0 <41> 81 e0 fe ff 00 00 74 10 f3 90 83 e8 01 75 e7 0f 1f 80 00 00
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062551] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [smbd:29234]
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062554] Modules linked in: arc4 ecb md4 hmac nls_utf8 cifs dns_resolver ppdev lp xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 bridge stp llc overlay ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc quota_v2 quota_tree joydev hid_generic usbhid hid uas usb_storage x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper iTCO_wdt iTCO_vendor_support lpc_ich mfd_core xhci_pci xhci_hcd ie31200_edac winbond_cir ipmi_si int3402_thermal int3400_thermal int3403_thermal ipmi_msghandler edac_core i2c_i801 battery int340x_thermal_zone rc_core acpi_thermal_rel parport_pc tpm_t
Feb 4 16:31:46 fs-pgd3 kernel: is tpm parport processor ablk_helper cryptd 8250_fintek pcspkr evdev ext4 crc16 mbcache jbd2 dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sg sr_mod cdrom sd_mod ahci libahci fan video libata thermal_sys ttm megaraid_sas crc32c_intel drm_kms_helper igb scsi_mod drm ehci_pci ehci_hcd usbcore dca ptp button pps_core i2c_algo_bit usb_common
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062599] CPU: 2 PID: 29234 Comm: smbd Tainted: G L 4.1.0-ucs167-amd64 #1 Debian 4.1.6-1.167.201601252247
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062600] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.03.02.0003.070120151022 07/01/2015
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062601] task: ffff8803df57cca0 ti: ffff880438208000 task.ti: ffff880438208000
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062602] RIP: 0010:[] [] _raw_spin_lock+0x21/0x50
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062607] RSP: 0018:ffff88043820be40 EFLAGS: 00000202
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062608] RAX: 000000000000284a RBX: 0000000000000002 RCX: 00000000000013d2
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062608] RDX: 00000000000013c8 RSI: 00000000000013c8 RDI: ffff88043c27fb28
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062609] RBP: 0000000000000028 R08: 000000000000001a R09: 0000000000000000
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062610] R10: ffffffffffffffff R11: 4500003433323932 R12: ffff88043820be08
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062610] R13: 0000000056b36e95 R14: ffffffff811ebf0e R15: ffff8803b893f4b0
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062611] FS: 00007febf42ad720(0000) GS:ffff88043f480000(0000) knlGS:0000000000000000
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062612] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062613] CR2: 000055960f9be168 CR3: 00000004399d6000 CR4: 00000000001406e0
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062613] Stack:
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062614] ffffffff8153b343 ffffffff8166ce60 ffff88043820be64 ffff88043cc07000
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062615] 000000289a95ecf0 ffff88010ab32c18 ffff88009a95ecc0 000000000000006e
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062616] 00007ffd761fd360 00007ffd761fd360 00007ffd761fd240 0000000000000006
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062618] Call Trace:
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062621] [] ? unix_dgram_connect+0x93/0x200
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062624] [] ? SYSC_connect+0xe8/0x100
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062626] [] ? system_call_fast_compare_end+0xc/0x6b
Feb 4 16:31:46 fs-pgd3 kernel: [241500.062626] Code: f6 0f 1f 80 00 00 00 00 c3 90 0f 1f 44 00 00 b8 00 00 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 01 c3 0f b7 f2 b8 00 80 00 00 <0f> b7 0f 41 89 c8 41 31 d0 41 81 e0 fe ff 00 00 74 10 f3 90 83
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697710] INFO: rcu_sched self-detected stall on CPU { 2} (t=21003 jiffies g=4714608 c=4714607 q=30466)
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697713] Task dump for CPU 2:
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697714] smbd R running task 0 29234 29221 0x00000008
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697716] 0000000000000003 ffffffff81851340 ffffffff810d3c84 000000000047f070
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697718] ffff88043f497100 ffffffff81851340 ffffffff81851340 ffffffff818f6c60
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697719] ffffffff810d7659 0000000000000000 0000000000000000 00003d0906785140
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697720] Call Trace:
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697721] [] ? rcu_dump_cpu_stacks+0x84/0xc0
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697728] [] ? rcu_check_callbacks+0x449/0x740
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697729] [] ? update_wall_time+0x23d/0x670
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697732] [] ? tick_sched_do_timer+0x40/0x40
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697733] [] ? update_process_times+0x34/0x70
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697735] [] ? tick_sched_handle.isra.12+0x2c/0x70
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697736] [] ? tick_sched_timer+0x49/0x80
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697737] [] ? __run_hrtimer+0x6d/0x1b0
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697739] [] ? read_tsc+0x5/0x10
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697741] [] ? hrtimer_interrupt+0xed/0x210
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697743] [] ? smp_apic_timer_interrupt+0x39/0x50
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697745] [] ? apic_timer_interrupt+0x6e/0x80
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697746] [] ? _raw_spin_lock+0x35/0x50
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697749] [] ? unix_dgram_connect+0x93/0x200
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697752] [] ? SYSC_connect+0xe8/0x100
Feb 4 16:31:53 fs-pgd3 kernel: [241507.697754] [] ? system_call_fast_compare_end+0xc/0x6b
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075797] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [smbd:29234]
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075800] Modules linked in: arc4 ecb md4 hmac nls_utf8 cifs dns_resolver ppdev lp xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 bridge stp llc overlay ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc quota_v2 quota_tree joydev hid_generic usbhid hid uas usb_storage x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper iTCO_wdt iTCO_vendor_support lpc_ich mfd_core xhci_pci xhci_hcd ie31200_edac winbond_cir ipmi_si int3402_thermal int3400_thermal int3403_thermal ipmi_msghandler edac_core i2c_i801 battery int340x_thermal_zone rc_core acpi_thermal_rel parport_pc tpm_t
Feb 4 16:32:18 fs-pgd3 kernel: is tpm parport processor ablk_helper cryptd 8250_fintek pcspkr evdev ext4 crc16 mbcache jbd2 dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sg sr_mod cdrom sd_mod ahci libahci fan video libata thermal_sys ttm megaraid_sas crc32c_intel drm_kms_helper igb scsi_mod drm ehci_pci ehci_hcd usbcore dca ptp button pps_core i2c_algo_bit usb_common
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075876] CPU: 2 PID: 29234 Comm: smbd Tainted: G L 4.1.0-ucs167-amd64 #1 Debian 4.1.6-1.167.201601252247
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075877] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.03.02.0003.070120151022 07/01/2015
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075878] task: ffff8803df57cca0 ti: ffff880438208000 task.ti: ffff880438208000
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075879] RIP: 0010:[] [] _raw_spin_lock+0x21/0x50
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075884] RSP: 0018:ffff88043820be40 EFLAGS: 00000206
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075884] RAX: 00000000000060cc RBX: 0000000000000002 RCX: 00000000000013da
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075885] RDX: 00000000000013c8 RSI: 00000000000013c8 RDI: ffff88043c27fb28
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075886] RBP: 0000000000000028 R08: 0000000000000012 R09: 0000000000000000
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075886] R10: ffffffffffffffff R11: 4500003433323932 R12: ffff88043820be08
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075887] R13: 0000000056b36e95 R14: ffffffff811ebf0e R15: ffff8803b893f4b0
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075888] FS: 00007febf42ad720(0000) GS:ffff88043f480000(0000) knlGS:0000000000000000
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075889] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075889] CR2: 000055960f9be168 CR3: 00000004399d6000 CR4: 00000000001406e0
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075890] Stack:
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075891] ffffffff8153b343 ffffffff8166ce60 ffff88043820be64 ffff88043cc07000
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075892] 000000289a95ecf0 ffff88010ab32c18 ffff88009a95ecc0 000000000000006e
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075893] 00007ffd761fd360 00007ffd761fd360 00007ffd761fd240 0000000000000006
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075894] Call Trace:
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075898] [] ? unix_dgram_connect+0x93/0x200
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075901] [] ? SYSC_connect+0xe8/0x100
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075903] [] ? system_call_fast_compare_end+0xc/0x6b
Feb 4 16:32:18 fs-pgd3 kernel: [241532.075903] Code: f6 0f 1f 80 00 00 00 00 c3 90 0f 1f 44 00 00 b8 00 00 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 01 c3 0f b7 f2 b8 00 80 00 00 <0f> b7 0f 41 89 c8 41 31 d0 41 81 e0 fe ff 00 00 74 10 f3 90 83
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087387] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [smbd:29234]
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087401] Modules linked in: arc4 ecb md4 hmac nls_utf8 cifs dns_resolver ppdev lp xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 bridge stp llc overlay ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc quota_v2 quota_tree joydev hid_generic usbhid hid uas usb_storage x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper iTCO_wdt iTCO_vendor_support lpc_ich mfd_core xhci_pci xhci_hcd ie31200_edac winbond_cir ipmi_si int3402_thermal int3400_thermal int3403_thermal ipmi_msghandler edac_core i2c_i801 battery int340x_thermal_zone rc_core acpi_thermal_rel parport_pc tpm_t
Feb 4 16:32:46 fs-pgd3 kernel: is tpm parport processor ablk_helper cryptd 8250_fintek pcspkr evdev ext4 crc16 mbcache jbd2 dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sg sr_mod cdrom sd_mod ahci libahci fan video libata thermal_sys ttm megaraid_sas crc32c_intel drm_kms_helper igb scsi_mod drm ehci_pci ehci_hcd usbcore dca ptp button pps_core i2c_algo_bit usb_common
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087456] CPU: 2 PID: 29234 Comm: smbd Tainted: G L 4.1.0-ucs167-amd64 #1 Debian 4.1.6-1.167.201601252247
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087457] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.03.02.0003.070120151022 07/01/2015
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087457] task: ffff8803df57cca0 ti: ffff880438208000 task.ti: ffff880438208000
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087458] RIP: 0010:[] [] _raw_spin_lock+0x24/0x50
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087463] RSP: 0018:ffff88043820be40 EFLAGS: 00000202
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087463] RAX: 000000000000053b RBX: 0000000000000002 RCX: 00000000000013da
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087464] RDX: 00000000000013c8 RSI: 00000000000013c8 RDI: ffff88043c27fb28
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087465] RBP: 0000000000000028 R08: 0000000000000012 R09: 0000000000000000
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087465] R10: ffffffffffffffff R11: 4500003433323932 R12: ffff88043820be08
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087466] R13: 0000000056b36e95 R14: ffffffff811ebf0e R15: ffff8803b893f4b0
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087467] FS: 00007febf42ad720(0000) GS:ffff88043f480000(0000) knlGS:0000000000000000
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087467] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087468] CR2: 000055960f9be168 CR3: 00000004399d6000 CR4: 00000000001406e0
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087469] Stack:
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087470] ffffffff8153b343 ffffffff8166ce60 ffff88043820be64 ffff88043cc07000
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087471] 000000289a95ecf0 ffff88010ab32c18 ffff88009a95ecc0 000000000000006e
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087472] 00007ffd761fd360 00007ffd761fd360 00007ffd761fd240 0000000000000006
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087473] Call Trace:
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087476] [] ? unix_dgram_connect+0x93/0x200
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087480] [] ? SYSC_connect+0xe8/0x100
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087481] [] ? system_call_fast_compare_end+0xc/0x6b
Feb 4 16:32:46 fs-pgd3 kernel: [241560.087482] Code: 80 00 00 00 00 c3 90 0f 1f 44 00 00 b8 00 00 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 01 c3 0f b7 f2 b8 00 80 00 00 0f b7 0f <41> 89 c8 41 31 d0 41 81 e0 fe ff 00 00 74 10 f3 90 83 e8 01 75
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735793] INFO: rcu_sched self-detected stall on CPU { 2} (t=36756 jiffies g=4714608 c=4714607 q=38591)
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735797] Task dump for CPU 2:
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735798] smbd R running task 0 29234 29221 0x00000008
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735800] 0000000000000003 ffffffff81851340 ffffffff810d3c84 000000000047f070
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735801] ffff88043f497100 ffffffff81851340 ffffffff81851340 ffffffff818f6c60
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735802] ffffffff810d7659 0000000000000000 0000000000000000 00003d0906785140
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735804] Call Trace:
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735805] [] ? rcu_dump_cpu_stacks+0x84/0xc0
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735811] [] ? rcu_check_callbacks+0x449/0x740
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735813] [] ? update_wall_time+0x23d/0x670
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735815] [] ? tick_sched_do_timer+0x40/0x40
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735816] [] ? update_process_times+0x34/0x70
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735818] [] ? tick_sched_handle.isra.12+0x2c/0x70
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735819] [] ? tick_sched_timer+0x49/0x80
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735820] [] ? __run_hrtimer+0x6d/0x1b0
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735822] [] ? read_tsc+0x5/0x10
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735824] [] ? hrtimer_interrupt+0xed/0x210
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735826] [] ? smp_apic_timer_interrupt+0x39/0x50
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735828] [] ? apic_timer_interrupt+0x6e/0x80
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735828] [] ? _raw_spin_lock+0x21/0x50
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735832] [] ? unix_dgram_connect+0x93/0x200
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735835] [] ? SYSC_connect+0xe8/0x100
Feb 4 16:32:56 fs-pgd3 kernel: [241570.735836] [] ? system_call_fast_compare_end+0xc/0x6b
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102290] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [smbd:29234]
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102293] Modules linked in: arc4 ecb md4 hmac nls_utf8 cifs dns_resolver ppdev lp xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 bridge stp llc overlay ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc quota_v2 quota_tree joydev hid_generic usbhid hid uas usb_storage x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper iTCO_wdt iTCO_vendor_support lpc_ich mfd_core xhci_pci xhci_hcd ie31200_edac winbond_cir ipmi_si int3402_thermal int3400_thermal int3403_thermal ipmi_msghandler edac_core i2c_i801 battery int340x_thermal_zone rc_core acpi_thermal_rel parport_pc tpm_t
Feb 4 16:33:22 fs-pgd3 kernel: is tpm parport processor ablk_helper cryptd 8250_fintek pcspkr evdev ext4 crc16 mbcache jbd2 dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sg sr_mod cdrom sd_mod ahci libahci fan video libata thermal_sys ttm megaraid_sas crc32c_intel drm_kms_helper igb scsi_mod drm ehci_pci ehci_hcd usbcore dca ptp button pps_core i2c_algo_bit usb_common
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102339] CPU: 2 PID: 29234 Comm: smbd Tainted: G L 4.1.0-ucs167-amd64 #1 Debian 4.1.6-1.167.201601252247
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102340] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.03.02.0003.070120151022 07/01/2015
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102341] task: ffff8803df57cca0 ti: ffff880438208000 task.ti: ffff880438208000
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102342] RIP: 0010:[] [] _raw_spin_lock+0x33/0x50
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102347] RSP: 0018:ffff88043820be40 EFLAGS: 00000202
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102348] RAX: 0000000000006657 RBX: 0000000000000002 RCX: 00000000000013e2
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102349] RDX: 00000000000013c8 RSI: 00000000000013c8 RDI: ffff88043c27fb28
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102349] RBP: 0000000000000028 R08: 000000000000002a R09: 0000000000000000
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102350] R10: ffffffffffffffff R11: 4500003433323932 R12: ffff88043820be08
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102350] R13: 0000000056b36e95 R14: ffffffff811ebf0e R15: ffff8803b893f4b0
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102351] FS: 00007febf42ad720(0000) GS:ffff88043f480000(0000) knlGS:0000000000000000
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102352] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102353] CR2: 000055960f9be168 CR3: 00000004399d6000 CR4: 00000000001406e0
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102354] Stack:
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102354] ffffffff8153b343 ffffffff8166ce60 ffff88043820be64 ffff88043cc07000
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102356] 000000289a95ecf0 ffff88010ab32c18 ffff88009a95ecc0 000000000000006e
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102357] 00007ffd761fd360 00007ffd761fd360 00007ffd761fd240 0000000000000006
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102358] Call Trace:
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102361] [] ? unix_dgram_connect+0x93/0x200
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102365] [] ? SYSC_connect+0xe8/0x100
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102366] [] ? system_call_fast_compare_end+0xc/0x6b
Feb 4 16:33:22 fs-pgd3 kernel: [241596.102367] Code: 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 01 c3 0f b7 f2 b8 00 80 00 00 0f b7 0f 41 89 c8 41 31 d0 41 81 e0 fe ff 00 00 74 10 90 83 e8 01 75 e7 0f 1f 80 00 00 00 00 eb d9 0f b7 f1 e8 08
Feb 4 16:33:50 fs-pgd3 kernel: [241624.113880] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [smbd:29234]
Feb 4 16:33:50 fs-pgd3 kernel: [241624.113882] Modules linked in: arc4 ecb md4 hmac nls_utf8 cifs dns_resolver ppdev lp xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 bridge stp llc overlay ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc quota_v2 quota_tree joydev hid_generic usbhid hid uas usb_storage x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper iTCO_wdt iTCO_vendor_support lpc_ich mfd_core xhci_pci xhci_hcd ie31200_edac winbond_cir ipmi_si int3402_thermal int3400_thermal int3403_thermal ipmi_msghandler edac_core i2c_i801 battery int340x_thermal_zone rc_core acpi_thermal_rel parport_pc tpm_t
Feb 4 16:33:50 fs-pgd3 kernel: is tpm parport processor ablk_helper cryptd 8250_fintek pcspkr evdev ext4 crc16 mbcache jbd2 dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sg sr_mod cdrom sd_mod ahci libahci fan video libata thermal_sys ttm megaraid_sas crc32c_intel drm_kms_helper igb scsi_mod drm ehci_pci ehci_hcd usbcore dca ptp button pps_core i2c_algo_bit usb_common
A8969FD1-D213-E511-BC66-000E0C68D362.tar.gz (10.6 KB)
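When the syslog is long, a small filter can list just the soft lockup reports shown above. This is a minimal sketch added for illustration, not part of the original post; the function name `summarize_soft_lockups` is hypothetical, and it assumes the message format seen in this log (`soft lockup - CPU#N stuck for Ns! [comm:pid]`).

```shell
#!/bin/sh
# Hypothetical helper (not from this thread): print one line per
# "soft lockup" report read from stdin, as "CPU<n> <seconds>s <comm>:<pid>".
# Assumes the kernel message format seen in the log above.
summarize_soft_lockups() {
  sed -n 's/.*soft lockup - CPU#\([0-9]*\) stuck for \([0-9]*\)s! \[\([^]]*\)].*/CPU\1 \2s \3/p'
}

# Usage (assumes Debian's default log file /var/log/syslog):
#   summarize_soft_lockups < /var/log/syslog
```

For the excerpt above, every report would come out as `CPU2 <n>s smbd:29234`, which makes it easy to see that the same task stayed stuck on the same CPU.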


#2

Today the server again shut down its services one after another; it seems to me as if the CPU cores are dropping out one by one…
Today I installed all available patches and, for testing, booted it with kernel 3.16.

-> Linux fs-pgd3 3.16.0-ucs135-amd64 #1 SMP Debian 3.16.7-ckt11-1~bpo70+1.135.201507161851 (2015-07-1 x86_64 GNU/Linux

Attached is the output of dmidecode.

Does anyone here know of a permanent fix?
dmidecode.txt (17 KB)


#3

Not yet, we are working on it: Bug 40558.

Workaround: install linux-image-4.1.0-ucs163-amd64 and select that kernel at boot.
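The workaround amounts to installing a second kernel package next to the current one and picking it in the boot menu. The sketch below shows the install command from this post (commented out) and a hypothetical helper, `pkg_installed`, that checks `dpkg-query` output for the package. It assumes Debian's status string `install ok installed` and is an illustration only, not an official Univention tool.

```shell
#!/bin/sh
# Workaround from this post: install the older kernel package, then select it
# in the GRUB boot menu on the next reboot.
#   univention-install linux-image-4.1.0-ucs163-amd64

# Hypothetical helper (not from this thread): read "<package> <status>" lines
# on stdin and succeed only if package $1 is fully installed.
pkg_installed() {
  awk -v pkg="$1" '
    $1 == pkg && / install ok installed$/ { found = 1 }
    END { exit !found }
  '
}

# Usage on a real system:
#   dpkg-query -W -f='${Package} ${Status}\n' 'linux-image-*' \
#     | pkg_installed linux-image-4.1.0-ucs163-amd64 && echo "fallback kernel present"
```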


#4

Request for testers

We’ve built a patched kernel, but we don’t know yet if it fixes the problem.
Willing testers can add the repository containing the patched kernel and check whether the bug still occurs. Feedback is very much appreciated.

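# Enable the "glibc" component repository; the brace expansion also sets its parts=unmaintained
ucr set repository/online/component/glibc{=yes,/parts=unmaintained}
# Install the patched kernel that matches the machine architecture
case "$(uname -m)" in
  x86_64) univention-install linux-image-4.1.0-ucs170-amd64 ;;
  i?86) univention-install linux-image-4.1.0-ucs170-686-pae ;;
esac
reboot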
The kernel is not signed for UEFI Secure Boot!
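After the reboot, testers may want to confirm that the patched build is actually running before reporting back. A minimal sketch, assuming `uname -r` reports the same suffix as the package name (the next post shows `4.1.0-ucs170-amd64` for the amd64 build; the 686-pae string is an assumption); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical check (not from this thread): succeed if the given kernel
# release string is one of the patched ucs170 test builds.
is_patched_test_kernel() {
  case "$1" in
    4.1.0-ucs170-amd64 | 4.1.0-ucs170-686-pae) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   if is_patched_test_kernel "$(uname -r)"; then
#     echo "patched test kernel is running"
#   else
#     echo "still on another kernel: $(uname -r)"
#   fi
```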


#5

Could it be related to the patched kernel that the latest update now fails to complete?

uname -a:
Linux ucs-srv 4.1.0-ucs170-amd64 #1 SMP Debian 4.1.6-1.170.201602051653 (2016-02-05) x86_64 GNU/Linux

/var/log/univention/updater.log:

Starting dist-upgrade at Mi 10. Feb 18:35:32 CET 2016
Paketlisten werden gelesen…
Abhängigkeitsbaum wird aufgebaut…
Statusinformationen werden eingelesen…
Die folgenden Pakete werden aktualisiert (Upgrade):
linux-libc-dev
1 aktualisiert, 0 neu installiert, 0 zu entfernen und 0 nicht aktualisiert.
Es müssen 1.043 kB an Archiven heruntergeladen werden.
Nach dieser Operation werden 0 B Plattenplatz zusätzlich benutzt.
WARNUNG: Die folgenden Pakete können nicht authentifiziert werden!
linux-libc-dev
E: Es gab Probleme und -y wurde ohne --force-yes verwendet.

ERROR: An error occurred during update. Please check the logfiles.
Mi 10. Feb 18:35:33 CET 2016

Starting univention-upgrade. Current UCS version is 4.1-0 errata100

Checking for local repository: skipped
Checking for release updates: none
Checking for package updates: found
Please rerun command without --check argument to install.


#6

After removing the ‘glibc’ repository, the update in question no longer appeared in the list, so that is resolved.


#7

I am seeing the same effect on two Fujitsu Primergy RX 2530M1 servers. Apparently the error only occurs after the update to version 4.1-0 errata100. smbd is involved every time:

kernel:[77511.209102] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [smbd:3935]

Meanwhile, another update is being offered. Will it fix the problem?

Regards,
Martin


#8

One of the two servers has just frozen with all six CPUs hung. Processes other than smbd were involved as well this time. What can I do to solve this problem?

Martin
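To see which processes the soft lockup reports name, they can be counted per command name. A minimal sketch with a hypothetical function name, assuming the `[comm:pid]` suffix seen in the logs in this thread. Note that the named task is simply the one that was running on the stuck CPU when the watchdog fired, so it is not necessarily the cause.

```shell
#!/bin/sh
# Hypothetical helper (not from this thread): read syslog lines on stdin and
# print "<comm> <count>" for every process named in a soft lockup report,
# sorted by process name. Assumes reports end in "[comm:pid]".
lockups_by_process() {
  awk '
    /soft lockup - CPU#/ {
      comm = $NF                    # e.g. "[smbd:29234]"
      gsub(/^\[|\]$/, "", comm)     # strip the surrounding brackets
      sub(/:[0-9]+$/, "", comm)     # strip the trailing ":pid"
      count[comm]++
    }
    END { for (c in count) print c, count[c] }
  ' | sort
}

# Usage (assumes Debian's default log file):
#   lockups_by_process < /var/log/syslog
```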


#9

Reboot the server and select kernel 4.1.0-ucs163… or older in the boot menu. That is how I got it under control for the time being.
A new kernel update is evidently in the works and is supposed to fix the problem soon.
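Before relying on this workaround, it can help to confirm which kernel actually booted. A minimal sketch, assuming UCS kernel release strings of the form `4.1.0-ucsNNN-amd64` as seen in this thread and that a lower build number means an older build; both function names are hypothetical, and the check deliberately covers only the 4.1.0 series.

```shell
#!/bin/sh
# Hypothetical helper (not from this thread): print the UCS build number
# from a release string, e.g. "4.1.0-ucs167-amd64" -> "167".
ucs_kernel_build() {
  printf '%s\n' "$1" | sed -n 's/^[0-9.]*-ucs\([0-9]*\)-.*/\1/p'
}

# Succeed if the release is a 4.1.0 UCS kernel with build 163 or lower,
# i.e. the workaround kernel or an older 4.1.0 build (assumption: build
# numbers increase with each release).
is_workaround_kernel() {
  case "$1" in
    4.1.0-ucs*) ;;
    *) return 1 ;;
  esac
  _build=$(ucs_kernel_build "$1")
  [ -n "$_build" ] && [ "$_build" -le 163 ]
}

# Usage:
#   is_workaround_kernel "$(uname -r)" && echo "running ucs163 or older"
```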


#10

With 4.1-0 errata111 the problem is probably not fixed yet, because no new kernel has been included so far.


#11

We have just released the update: errata.software-univention.de/ucs/4.1/114.html


#12

Thanks.

Martin