[ClusterLabs] Pacemaker crashed and produce a coredump file

Strahil Nikolov hunter86_bg at yahoo.com
Thu Jul 30 04:23:49 EDT 2020


Early systemd bugs caused D-Bus issues and left session files that were not cleaned up properly. At least EL 7.4 and older were affected.

What is your OS and version?

P.S.: I know your pain. I am still fighting to explain that without planned downtime, the end users will definitely get unplanned downtime.

Best Regards,
Strahil Nikolov

On 29 July 2020 12:46:16 GMT+03:00, lkxjtu <lkxjtu at 163.com> wrote:
>Hi Reid Wahl,
>
>
>There is more log information below. The cause seems to be that
>communication with D-Bus timed out. Any suggestions?
>
>
>1672712 Jul 24 21:20:17 [3945305] B0610011       lrmd:     info:
>pcmk_dbus_timeout_dispatch:    Timeout 0x147bbd0 expired
>1672713 Jul 24 21:20:17 [3945305] B0610011       lrmd:     info:
>pcmk_dbus_find_error:  LoadUnit error
>'org.freedesktop.DBus.Error.NoReply': Did not receive a reply.
>Possible causes include: the remote application did not send a reply,
>the message bus security policy blocked the reply, the reply timeout
>expired, or the network connection was broken.
>1672714 Jul 24 21:20:17 [3945305] B0610011       lrmd:    error:
>systemd_loadunit_result:       Unexepcted DBus type, expected o in 's'
>instead of s
>1672715 Jul 24 21:20:17 [3945305] B0610011       lrmd:    error:
>crm_abort:     systemd_unit_exec_with_unit: Triggered fatal assert at
>systemd.c:514 : unit
>1672716 2020-07-24T21:20:17.701484+08:00 B0610011 lrmd[3945305]:   
>error: systemd_loadunit_result: Unexepcted DBus type, expected o in 's'
>instead of s
>1672717 2020-07-24T21:20:17.701517+08:00 B0610011 lrmd[3945305]:   
>error: crm_abort: systemd_unit_exec_with_unit: Triggered fatal assert
>at systemd.c:514 : unit
>1672718 Jul 24 21:20:17 [3945306] B0610011       crmd:    error:
>crm_ipc_read:  Connection to lrmd failed
>
>
>
>> Hi,
>>
>> It looks like this is a bug that was fixed in later releases. The `path`
>> variable was a null pointer when it was passed to
>> `systemd_unit_exec_with_unit` as the `unit` argument. Commit 62a0d26a
>> <https://github.com/ClusterLabs/pacemaker/commit/62a0d26a8f85fbcee9b56524ea3f1ae0171cbe52#diff-00b989f66499e2081134c17c06d2b359R201>
>> adds a null check to the `path` variable before using it to call
>> `systemd_unit_exec_with_unit`.
>>
>> I believe pacemaker-1.1.15-11.el7 is the first RHEL pacemaker release that
>> contains the fix. Can you upgrade and see if the issue is resolved?
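
For readers following the thread, the failure mode and the kind of guard described above can be sketched roughly as follows. This is only an illustration, not the actual Pacemaker code: `exec_with_unit` and `handle_loadunit_result` are hypothetical stand-ins, and the real commit may be structured differently. The assumption is that a timed-out or malformed LoadUnit reply leaves the unit path as NULL, and that the fix amounts to checking for NULL before the call that asserts on it.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the function that asserts on its unit argument.
 * In the affected release, a NULL unit here triggered the fatal assert. */
static int exec_with_unit(const char *unit)
{
    printf("executing action on unit %s\n", unit);
    return 0;
}

/* Hypothetical stand-in for the LoadUnit reply handler.
 * Assumption: path is NULL when the D-Bus call timed out or returned
 * an unexpected type, as in the log excerpt above. */
static int handle_loadunit_result(const char *path)
{
    if (path == NULL) {
        /* Report a failure to the caller instead of aborting the daemon. */
        fprintf(stderr, "LoadUnit returned no unit path; failing the action\n");
        return -1;
    }
    return exec_with_unit(path);
}
```

With a guard like this, a D-Bus timeout would show up as a failed resource action that the cluster can handle, rather than an lrmd abort and a coredump.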

