[ClusterLabs] Antw: [EXT] Re: Q: wrong "unexpected shutdown of DC" detected

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Jan 28 05:23:22 EST 2021


Ken,

thanks for analyzing the logs! See comments inline...

>>> Ken Gaillot <kgaillot at redhat.com> wrote on 27.01.2021 at 19:55 in message
<644fc719a2e8870c332db859bcdef275d986249a.camel at redhat.com>:
> On Wed, 2021-01-27 at 12:36 +0100, Ulrich Windl wrote:
...
>> Jan 27 10:43:48 h16 pacemaker-execd[25960]:  warning:
>> prm_CFS_VMI_stop_0[11502] timed out after 90000ms
>> Jan 27 10:43:48 h16 pacemaker-execd[25960]:  notice: prm_CFS_VMI stop
>> (call 129, PID 11502) exited with status 1 (execution time 90007ms,
>> queue time 0ms)
>> Jan 27 10:43:48 h16 pacemaker-controld[25963]:  error: Result of stop
>> operation for prm_CFS_VMI on h16: Timed Out
> 
> This stop timeout is why h16 correctly needs to be fenced. The only
> question is why the stop timed out.

The resource is OCFS2, needing DLM. DLM in turn wants a quorum, right?
So: No quorum, no action -> timeout. Is that right?
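Ken's point that a timed-out stop means the node must be fenced can be modeled as a tiny decision function. This is a hypothetical Python sketch, not Pacemaker code; the names `OnFail`, `default_on_fail` and `needs_fencing` are illustrative. It assumes no explicit `on-fail` is configured for the operation and models only the default behavior: a failed stop escalates to fencing when STONITH is enabled (otherwise the resource is blocked), while start and monitor default to restart.

```python
from enum import Enum


class OnFail(Enum):
    """Subset of Pacemaker's on-fail values used in this sketch."""
    FENCE = "fence"
    BLOCK = "block"
    RESTART = "restart"


def default_on_fail(operation: str, stonith_enabled: bool) -> OnFail:
    """Hypothetical model of the default on-fail policy (no explicit on-fail set)."""
    if operation == "stop":
        # A resource whose stop failed may still be active on the node, so the
        # node is fenced if STONITH is enabled; otherwise the resource is blocked.
        return OnFail.FENCE if stonith_enabled else OnFail.BLOCK
    # Assumption: the other operations considered here (start, monitor) restart.
    return OnFail.RESTART


def needs_fencing(operation: str, failed: bool, stonith_enabled: bool) -> bool:
    """True if a failed operation would escalate to fencing the node it ran on."""
    return failed and default_on_fail(operation, stonith_enabled) is OnFail.FENCE


# The case from the logs: the prm_CFS_VMI stop on h16 timed out (a failure).
print(needs_fencing("stop", failed=True, stonith_enabled=True))  # True
```

An explicitly configured `on-fail` on the stop operation would override these defaults; the sketch does not model that.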

...
>> Finally: ;-)
>> 
>> Jan 27 11:35:14 h19 pacemaker-fenced[2099]:  notice: Versions did not
>> change in patch 0.250.39
>> Jan 27 11:36:43 h19 pacemaker-fenced[2099]:  notice: Operation
>> 'reboot' targeting h18 on h16 for 
>> pacemaker-controld.7467 at h16.46c6f6cc: OK
>> Jan 27 11:36:43 h19 pacemaker-fenced[2099]:  error:
>> stonith_construct_reply: Triggered assert at fenced_commands.c:2363 :
>> request != NULL

You did not comment on that; is that expected behavior? ;-)

>> Jan 27 11:36:43 h19 pacemaker-fenced[2099]:  warning: Can't create a
>> sane reply
>> 
>> Regards,
>> Ulrich
> -- 
> Ken Gaillot <kgaillot at redhat.com>
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 
