[ClusterLabs] Antw: [EXT] Re: Q: wrong "unexpected shutdown of DC" detected
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Thu Jan 28 05:23:22 EST 2021
Ken,
thanks for analyzing the logs! See comments inline...
>>> Ken Gaillot <kgaillot at redhat.com> wrote on 27.01.2021 at 19:55 in
message
<644fc719a2e8870c332db859bcdef275d986249a.camel at redhat.com>:
> On Wed, 2021-01-27 at 12:36 +0100, Ulrich Windl wrote:
...
>> Jan 27 10:43:48 h16 pacemaker-execd[25960]: warning:
>> prm_CFS_VMI_stop_0[11502] timed out after 90000ms
>> Jan 27 10:43:48 h16 pacemaker-execd[25960]: notice: prm_CFS_VMI stop
>> (call 129, PID 11502) exited with status 1 (execution time 90007ms,
>> queue time 0ms)
>> Jan 27 10:43:48 h16 pacemaker-controld[25963]: error: Result of stop
>> operation for prm_CFS_VMI on h16: Timed Out
>
> This stop timeout is why h16 correctly needs to be fenced. The only
> question is why the stop timed out.
The resource is OCFS2, which needs DLM. DLM in turn requires quorum, right?
So: no quorum, no action, hence the stop timeout. Is that right?
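To check whether other operations ran into the same kind of timeout, the timed-out operations could be pulled out of the log with a small script. This is only a sketch: the regular expression assumes the pacemaker-execd message format shown in the quoted log, the log lines are assumed to be unwrapped (one message per line), and the name find_timeouts is hypothetical.

```python
import re

# Assumed message format, taken from the pacemaker-execd warning quoted above:
#   "<resource>_<action>_<interval>[<pid>] timed out after <N>ms"
TIMEOUT_RE = re.compile(
    r"(?P<rsc>\S+)_(?P<action>[a-z]+)_(?P<interval>\d+)\[(?P<pid>\d+)\] "
    r"timed out after (?P<ms>\d+)ms"
)


def find_timeouts(lines):
    """Return (resource, action, timeout_ms) for each timed-out operation."""
    hits = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            hits.append((m.group("rsc"), m.group("action"), int(m.group("ms"))))
    return hits


# Two of the messages from the log above, joined back into single lines.
sample = [
    "Jan 27 10:43:48 h16 pacemaker-execd[25960]: warning: "
    "prm_CFS_VMI_stop_0[11502] timed out after 90000ms",
    "Jan 27 10:43:48 h16 pacemaker-controld[25963]: error: "
    "Result of stop operation for prm_CFS_VMI on h16: Timed Out",
]
print(find_timeouts(sample))
```

Only the pacemaker-execd warning is matched. The pacemaker-controld "Timed Out" line reports the same event and is skipped on purpose, so each timeout is counted once.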
...
>> Finally: ;-)
>>
>> Jan 27 11:35:14 h19 pacemaker-fenced[2099]: notice: Versions did not
>> change in patch 0.250.39
>> Jan 27 11:36:43 h19 pacemaker-fenced[2099]: notice: Operation
>> 'reboot' targeting h18 on h16 for
>> pacemaker-controld.7467 at h16.46c6f6cc: OK
>> Jan 27 11:36:43 h19 pacemaker-fenced[2099]: error:
>> stonith_construct_reply: Triggered assert at fenced_commands.c:2363 :
>> request != NULL
You did not comment on this assert; is that expected behavior? ;-)
>> Jan 27 11:36:43 h19 pacemaker-fenced[2099]: warning: Can't create a
>> sane reply
>>
>> Regards,
>> Ulrich
> --
> Ken Gaillot <kgaillot at redhat.com>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/