[ClusterLabs] Antw: [EXT] Re: Q: wrong "unexpected shutdown of DC" detected
Ken Gaillot
kgaillot at redhat.com
Thu Jan 28 10:03:18 EST 2021
On Thu, 2021-01-28 at 11:23 +0100, Ulrich Windl wrote:
> Ken,
>
> thanks for analyzing the logs! See comments inline...
>
> >>> Ken Gaillot <kgaillot at redhat.com> wrote on 27.01.2021 at 19:55 in
> message <644fc719a2e8870c332db859bcdef275d986249a.camel at redhat.com>:
> > On Wed, 2021-01-27 at 12:36 +0100, Ulrich Windl wrote:
>
> ...
> > > Jan 27 10:43:48 h16 pacemaker-execd[25960]: warning:
> > > prm_CFS_VMI_stop_0[11502] timed out after 90000ms
> > > Jan 27 10:43:48 h16 pacemaker-execd[25960]: notice: prm_CFS_VMI
> > > stop
> > > (call 129, PID 11502) exited with status 1 (execution time
> > > 90007ms,
> > > queue time 0ms)
> > > Jan 27 10:43:48 h16 pacemaker-controld[25963]: error: Result of
> > > stop
> > > operation for prm_CFS_VMI on h16: Timed Out
> >
> > This stop timeout is why h16 correctly needs to be fenced. The only
> > question is why the stop timed out.
>
> The resource is OCFS2, needing DLM. DLM in turn wants a quorum,
> right?
> So: No quorum, no action -> timeout. Is that right?
>
> ...
> > > Finally: ;-)
> > >
> > > Jan 27 11:35:14 h19 pacemaker-fenced[2099]: notice: Versions did
> > > not
> > > change in patch 0.250.39
> > > Jan 27 11:36:43 h19 pacemaker-fenced[2099]: notice: Operation
> > > 'reboot' targeting h18 on h16 for
> > > pacemaker-controld.7467 at h16.46c6f6cc: OK
> > > Jan 27 11:36:43 h19 pacemaker-fenced[2099]: error:
> > > stonith_construct_reply: Triggered assert at
> > > fenced_commands.c:2363 :
> > > request != NULL
>
> You did not comment on that; is that expected behavior? ;-)
Sort of ;)
This was changed to a more reasonable log warning in the 2.0.5 release:

  Missing request information for client notifications for operation
  with result <N> (initiated before we came up?)

It can happen (and is perfectly OK) when a node is coming up while a
fencing operation is already in flight. Ideally we'd synchronize
in-flight operation information when a node comes up. That wouldn't
change the behavior; it would just let us distinguish that situation
from an actual error when this message appears.
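
If you want to pick these messages out of a log, something like the
following could work. It is only a rough sketch: the two patterns are
based on the message texts quoted in this thread, each log entry is
assumed to be on a single line, and the function name and labels are
made up for illustration. It cannot tell the harmless "node was coming
up" case from a real error; it only finds candidates to check against
the node's startup time.

```python
import re

# Patterns derived from the log text quoted in this thread.
# Assumption: each log entry sits on one line in the real log file.
OLD_ASSERT = re.compile(
    r"stonith_construct_reply: Triggered assert at "
    r"fenced_commands\.c:\d+\s*:\s*request != NULL"
)
NEW_WARNING = re.compile(
    r"Missing request information for client notifications "
    r"for operation with result"
)


def classify_fenced_line(line):
    """Return a label for a pacemaker-fenced log line (hypothetical helper).

    "old-assert"  : the pre-2.0.5 assert form of the message
    "new-warning" : the reworded warning from 2.0.5 on
    None          : anything else
    """
    if OLD_ASSERT.search(line):
        return "old-assert"
    if NEW_WARNING.search(line):
        return "new-warning"
    return None


if __name__ == "__main__":
    import sys

    # Print only the candidate lines, prefixed with their label.
    for raw in sys.stdin:
        label = classify_fenced_line(raw)
        if label is not None:
            print(f"{label}: {raw.rstrip()}")
```

Feed it the log on stdin and compare the timestamps of any hits with the
time pacemaker-fenced started on that node.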
> > > Jan 27 11:36:43 h19 pacemaker-fenced[2099]: warning: Can't
> > > create a
> > > sane reply
> > >
> > > Regards,
> > > Ulrich
--
Ken Gaillot <kgaillot at redhat.com>