[ClusterLabs] Antw: [EXT] Re: Pacemaker crashed and produce a coredump file
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Thu Jul 30 04:39:38 EDT 2020
>>> Strahil Nikolov <hunter86_bg at yahoo.com> wrote on 30.07.2020 at 10:23 in
message <B346C97F-F4CB-4CE1-85EF-EA1EB8CFD788 at yahoo.com>:
> Early systemd bugs caused dbus issues and session files not being cleaned
> up properly. At least EL 7.4 or older were affected.
>
> What is your OS and version?
>
> P.S.: I know your pain. I am still fighting to explain that without planned
> downtime, the end users will definitely get unplanned downtime.
Well, it seems the only way to ensure no unplanned downtime in the long run is
to test (and fix) it periodically, probably causing unplanned downtime
periodically, but (if things go well) with decreasing probability ;-)
>
> Best Regards,
> Strahil Nikolov
>
> On 29 July 2020 at 12:46:16 GMT+03:00, lkxjtu <lkxjtu at 163.com> wrote:
>>Hi Reid Wahl,
>>
>>
>>There is more log information below. The reason seems to be that
>>communication with DBUS timed out. Any suggestions?
>>
>>
>>1672712 Jul 24 21:20:17 [3945305] B0610011 lrmd: info:
>>pcmk_dbus_timeout_dispatch: Timeout 0x147bbd0 expired
>>1672713 Jul 24 21:20:17 [3945305] B0610011 lrmd: info:
>>pcmk_dbus_find_error: LoadUnit error
>>'org.freedesktop.DBus.Error.NoReply': Did not receive a reply.
>>Possible causes include: the remote application did not send a reply,
>>the message bus security policy blocked the reply, the reply timeout
>>expired, or the network connection was broken.
>>1672714 Jul 24 21:20:17 [3945305] B0610011 lrmd: error:
>>systemd_loadunit_result: Unexepcted DBus type, expected o in 's'
>>instead of s
>>1672715 Jul 24 21:20:17 [3945305] B0610011 lrmd: error:
>>crm_abort: systemd_unit_exec_with_unit: Triggered fatal assert at
>>systemd.c:514 : unit
>>1672716 2020-07-24T21:20:17.701484+08:00 B0610011 lrmd[3945305]:
>>error: systemd_loadunit_result: Unexepcted DBus type, expected o in 's'
>>instead of s
>>1672717 2020-07-24T21:20:17.701517+08:00 B0610011 lrmd[3945305]:
>>error: crm_abort: systemd_unit_exec_with_unit: Triggered fatal assert
>>at systemd.c:514 : unit
>>1672718 Jul 24 21:20:17 [3945306] B0610011 crmd: error:
>>crm_ipc_read: Connection to lrmd failed
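
The sequence in the log above (a LoadUnit call that ends in a D-Bus NoReply error, followed by the fatal assert in systemd.c) can be searched for mechanically in a larger log file. The following is a minimal Python sketch and not a Pacemaker tool: the function name `has_loadunit_crash_signature` and the `sample` text are assumptions for illustration, and only the message fragments come from the log quoted above. It assumes unquoted log text, without the `>>` mail prefixes.

```python
import re

# Message fragments copied from the lrmd log quoted above.
NOREPLY = re.compile(
    r"pcmk_dbus_find_error: LoadUnit error "
    r"'org\.freedesktop\.DBus\.Error\.NoReply'"
)
FATAL_ASSERT = re.compile(
    r"systemd_unit_exec_with_unit: Triggered fatal assert"
)


def has_loadunit_crash_signature(log_text):
    """Return True if a LoadUnit NoReply error is later followed by the assert."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    noreply = NOREPLY.search(flat)
    if noreply is None:
        return False
    # Only count an assert that appears after the NoReply error.
    return FATAL_ASSERT.search(flat, noreply.end()) is not None


# Hypothetical sample built from two of the entries quoted above.
sample = """
Jul 24 21:20:17 [3945305] B0610011 lrmd: info: pcmk_dbus_find_error: LoadUnit error
'org.freedesktop.DBus.Error.NoReply': Did not receive a reply.
Jul 24 21:20:17 [3945305] B0610011 lrmd: error: crm_abort:
systemd_unit_exec_with_unit: Triggered fatal assert at systemd.c:514 : unit
"""

print(has_loadunit_crash_signature(sample))
```

A match only says the log shows the same pattern as in this thread; it does not by itself confirm the root cause.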
>>
>>
>>
>>> Hi,
>>>
>>> It looks like this is a bug that was fixed in later releases. The `path`
>>> variable was a null pointer when it was passed to
>>> `systemd_unit_exec_with_unit` as the `unit` argument. Commit 62a0d26a
>>> <https://github.com/ClusterLabs/pacemaker/commit/62a0d26a8f85fbcee9b56524ea3f1ae0171cbe52#diff-00b989f66499e2081134c17c06d2b359R201>
>>> adds a null check to the `path` variable before using it to call
>>> `systemd_unit_exec_with_unit`.
>>>
>>> I believe pacemaker-1.1.15-11.el7 is the first RHEL pacemaker release that
>>> contains the fix. Can you upgrade and see if the issue is resolved?
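
The failure mode described in the quoted explanation can be modeled in a few lines. This is a Python model of the control flow only, not Pacemaker's C code: `None` stands in for the null `path` pointer, and the names `loadunit_result`, `exec_with_unit`, and `start_unit_guarded`, as well as the reply dictionaries, are hypothetical stand-ins. The actual change is the one in the commit referenced in the quoted message.

```python
def loadunit_result(reply):
    """Return the unit path from a reply dict, or None for an error reply."""
    if reply.get("error"):
        # For example a NoReply error after the D-Bus call timed out.
        return None
    return reply.get("path")


def exec_with_unit(unit):
    """Stand-in for a callee that asserts its unit argument is set."""
    assert unit is not None, "fatal assert: unit"
    return f"started {unit}"


def start_unit_unguarded(reply):
    """Model of the old behavior: pass the result on without checking it."""
    return exec_with_unit(loadunit_result(reply))


def start_unit_guarded(reply):
    """Model of the fix as described: check path before making the call."""
    path = loadunit_result(reply)
    if path is None:
        return "failed: no unit path"
    return exec_with_unit(path)
```

With an error reply, `start_unit_unguarded` trips the assertion, while `start_unit_guarded` returns a failure result and the process keeps running.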