[ClusterLabs] Pending Fencing Actions shown in pcs status
Steffen Vinther Sørensen
svinther at gmail.com
Tue Jan 12 05:23:53 EST 2021
Hello Hideo.
I am impressed by how seriously this group takes care of issues.
For your information, the 'pending fencing action' status disappeared after
I took the nodes offline. While doing that I found some gfs2 errors that
were fixed by fsck.gfs2, and since then my cluster has been very
stable.
If I can provide more info, let me know.
/Steffen
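
A minimal sketch of how one could check, after a maintenance step, whether `pcs status` still lists pending fencing actions. The section header and entry layout are copied from the `pcs status` output quoted later in this thread; the function name `pending_fencing_actions` and the way wrapped lines are joined are assumptions made for illustration and are not part of pcs or Pacemaker.

```python
import re

# Sample text copied from the pcs status output quoted in this thread.
SAMPLE = """\
Pending Fencing Actions:
* reboot of kvm03-node02.avigol-gcs.dk pending: client=crmd.37819,
origin=kvm03-node03.avigol-gcs.dk

Failed Fencing Actions:
* reboot of kvm03-node02.avigol-gcs.dk failed: delegate=,
client=crmd.37819, origin=kvm03-node03.avigol-gcs.dk
"""

# Hypothetical pattern for one joined entry line (an assumption, not a pcs format spec).
ENTRY = re.compile(r"^\*\s+(?P<action>\S+) of (?P<target>\S+) pending:\s*(?P<rest>.*)$")


def pending_fencing_actions(status_text):
    """Return one dict per entry under 'Pending Fencing Actions:'."""
    joined = []
    in_section = False
    for raw in status_text.splitlines():
        line = raw.strip()
        # A line ending in ':' that is not an entry starts a new section.
        if line.endswith(":") and not line.startswith("*"):
            in_section = line == "Pending Fencing Actions:"
            continue
        if not in_section or not line:
            continue
        if line.startswith("*"):
            joined.append(line)
        elif joined:
            # Continuation of a wrapped entry.
            joined[-1] += " " + line

    results = []
    for entry in joined:
        match = ENTRY.match(entry)
        if not match:
            continue
        fields = {"action": match["action"], "target": match["target"]}
        for pair in re.split(r",\s*", match["rest"]):
            if "=" in pair:
                key, value = pair.split("=", 1)
                fields[key] = value
        results.append(fields)
    return results


if __name__ == "__main__":
    for item in pending_fencing_actions(SAMPLE):
        print(item)
```

An empty list would mean the status text no longer shows a pending action; the text itself would have to come from running `pcs status` on each node.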
On Tue, Jan 12, 2021 at 3:45 AM <renayama19661014 at ybb.ne.jp> wrote:
> Hi Steffen,
>
> I've been experimenting with it since last weekend, but I haven't been
> able to reproduce the same situation.
> It seems I have not been able to narrow down the steps that reproduce it.
>
> Could you attach a log from when the problem occurred?
>
> Best Regards,
> Hideo Yamauchi.
>
>
> ----- Original Message -----
> > From: Klaus Wenninger <kwenning at redhat.com>
> > To: Steffen Vinther Sørensen <svinther at gmail.com>; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> > Cc:
> > Date: 2021/1/7, Thu 21:42
> > Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >
> > On 1/7/21 1:13 PM, Steffen Vinther Sørensen wrote:
> >> Hi Klaus,
> >>
> >> Yes, then the status does sync to the other nodes. It also looks like
> >> there are some hostname resolution problems in play here, which may be
> >> causing trouble. Here are my notes from restarting pacemaker etc.
> > I don't think there are hostname resolution problems.
> > The messages that look that way are caused by using
> > -EHOSTUNREACH as the error code to fail a pending
> > fence action when a node that is just coming up sees
> > a pending action that is claimed to be handled by itself.
> > Back then I chose that error code because none of the
> > available ones really matched, and it was urgent for
> > some reason, so introducing something new was too
> > risky at that stage.
> > It would probably make sense to introduce something
> > more descriptive.
> > Back then the issue was triggered by fenced crashing and
> > being restarted - so not a node restart, just fenced
> > restarting.
> > And it looks as if building the failed-message failed somehow.
> > That could be the reason why the pending action persists.
> > That would be something other than what we solved with Bug 5401.
> > But what triggers the logs below might as well just be a
> > follow-up issue after the Bug 5401 thing.
> > I will try to find time for a deeper look later today.
> >
> > Klaus
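
A small illustration of the point above: the "No route to host" text in the logs is simply the standard message attached to the EHOSTUNREACH error code, so it can appear without any actual routing or name-resolution fault. This sketch only uses Python's standard `errno` and `os` modules; the helper name `describe` is made up for the example, and it does not call into Pacemaker.

```python
import errno
import os


def describe(err):
    """Return (symbolic name, human-readable message) for an errno value."""
    return errno.errorcode.get(err, "UNKNOWN"), os.strerror(err)


# The fencer fails the orphaned pending action with -EHOSTUNREACH,
# according to the explanation quoted above.
name, message = describe(errno.EHOSTUNREACH)

if __name__ == "__main__":
    # On Linux the message is typically "No route to host".
    print(name, "->", message)
```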
> >>
> >> pcs cluster standby kvm03-node02.avigol-gcs.dk
> >> pcs cluster stop kvm03-node02.avigol-gcs.dk
> >> pcs status
> >>
> >> Pending Fencing Actions:
> >> * reboot of kvm03-node02.avigol-gcs.dk pending: client=crmd.37819,
> >> origin=kvm03-node03.avigol-gcs.dk
> >>
> >> # From logs on all 3 nodes:
> >> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: warning: received
> >> pending action we are supposed to be the owner but it's not in our
> >> records -> fail it
> >> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: error: Operation
> >> 'reboot' targeting kvm03-node02.avigol-gcs.dk on <no-one> for
> >> crmd.37819 at kvm03-node03.avigol-gcs.dk.56a3018c: No route to host
> >> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: error:
> >> stonith_construct_reply: Triggered assert at commands.c:2406 : request
> >> != NULL
> >> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: warning: Can't create
> >> a sane reply
> >> Jan 07 12:48:18 kvm03-node03 crmd[37819]: notice: Peer
> >> kvm03-node02.avigol-gcs.dk was not terminated (reboot) by <anyone> on
> >> behalf of crmd.37819: No route to host
> >>
> >> pcs cluster start kvm03-node02.avigol-gcs.dk
> >> pcs status (now outputs the same on all 3 nodes)
> >>
> >> Failed Fencing Actions:
> >> * reboot of kvm03-node02.avigol-gcs.dk failed: delegate=,
> >> client=crmd.37819, origin=kvm03-node03.avigol-gcs.dk,
> >> last-failed='Thu Jan 7 12:48:18 2021'
> >>
> >>
> >> pcs cluster unstandby kvm03-node02.avigol-gcs.dk
> >>
> >> # Now libvirtd refuses to start
> >>
> >> Jan 07 12:51:44 kvm03-node02 dnsmasq[20884]: read /etc/hosts - 8 addresses
> >> Jan 07 12:51:44 kvm03-node02 dnsmasq[20884]: read
> >> /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
> >> Jan 07 12:51:44 kvm03-node02 dnsmasq-dhcp[20884]: read
> >> /var/lib/libvirt/dnsmasq/default.hostsfile
> >> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
> >> 11:51:44.729+0000: 24160: info : libvirt version: 4.5.0, package:
> >> 36.el7_9.3 (CentOS BuildSystem <http://bugs.centos.org >,
> >> 2020-11-16-16:25:20, x86-01.bsys.centos.org)
> >> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
> >> 11:51:44.729+0000: 24160: info : hostname: kvm03-node02
> >> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
> >> 11:51:44.729+0000: 24160: error : qemuMonitorOpenUnix:392 : failed to
> >> connect to monitor socket: Connection refused
> >> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
> >> 11:51:44.729+0000: 24159: error : qemuMonitorOpenUnix:392 : failed to
> >> connect to monitor socket: Connection refused
> >> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
> >> 11:51:44.730+0000: 24161: error : qemuMonitorOpenUnix:392 : failed to
> >> connect to monitor socket: Connection refused
> >> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
> >> 11:51:44.730+0000: 24162: error : qemuMonitorOpenUnix:392 : failed to
> >> connect to monitor socket: Connection refused
> >>
> >> pcs status
> >>
> >> Failed Resource Actions:
> >> * libvirtd_start_0 on kvm03-node02.avigol-gcs.dk 'unknown error' (1):
> >> call=142, status=complete, exitreason='',
> >> last-rc-change='Thu Jan 7 12:51:44 2021', queued=0ms, exec=2157ms
> >>
> >> Failed Fencing Actions:
> >> * reboot of kvm03-node02.avigol-gcs.dk failed: delegate=,
> >> client=crmd.37819, origin=kvm03-node03.avigol-gcs.dk,
> >> last-failed='Thu Jan 7 12:48:18 2021'
> >>
> >>
> >> # from /etc/hosts on all 3 nodes:
> >>
> >> 172.31.0.31 kvm03-node01 kvm03-node01.avigol-gcs.dk
> >> 172.31.0.32 kvm03-node02 kvm03-node02.avigol-gcs.dk
> >> 172.31.0.33 kvm03-node03 kvm03-node03.avigol-gcs.dk
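
A minimal sketch, assuming the three `/etc/hosts` lines quoted above are the only relevant entries, of how the short and fully qualified node names map to addresses. The function `parse_hosts` is a hypothetical helper written for this example; it mimics the usual "first matching line wins" lookup but is not the system resolver.

```python
# Entries copied from the /etc/hosts excerpt quoted above.
HOSTS = """\
172.31.0.31 kvm03-node01 kvm03-node01.avigol-gcs.dk
172.31.0.32 kvm03-node02 kvm03-node02.avigol-gcs.dk
172.31.0.33 kvm03-node03 kvm03-node03.avigol-gcs.dk
"""


def parse_hosts(text):
    """Map every host name and alias to its address, skipping comments and blanks."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first address seen for a name.
            table.setdefault(name, address)
    return table


if __name__ == "__main__":
    table = parse_hosts(HOSTS)
    for name in ("kvm03-node02", "kvm03-node02.avigol-gcs.dk"):
        print(name, "->", table.get(name, "not found"))
```

Both name forms used in the logs appear in the file, which is consistent with the view that name resolution is not the cause of the "No route to host" messages.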
> >>
> >> On Thu, Jan 7, 2021 at 11:15 AM Klaus Wenninger <kwenning at redhat.com> wrote:
> >>> Hi Steffen,
> >>>
> >>> If you just see the leftover pending-action on one node
> >>> it would be interesting if restarting of pacemaker on
> >>> one of the other nodes does sync it to all of the
> >>> nodes.
> >>>
> >>> Regards,
> >>> Klaus
> >>>
> >>> On 1/7/21 9:54 AM, renayama19661014 at ybb.ne.jp wrote:
> >>>> Hi Steffen,
> >>>>
> >>>>> Unfortunately I am not sure about the exact scenario. But I have been
> >>>>> doing some recent experiments with node standby/unstandby and stop/start,
> >>>>> to get the procedures right for updating node rpms etc.
> >>>>>
> >>>>> Later I noticed the unsettling "pending fencing actions" status msg.
> >>>> Okay!
> >>>>
> >>>> I will repeat the standby and unstandby steps in the same way to check.
> >>>> We will start checking after tomorrow, so I think it will take until
> >>>> next week.
> >>>>
> >>>>
> >>>> Many thanks,
> >>>> Hideo Yamauchi.
> >>>>
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
> >>>>> To: Reid Wahl <nwahl at redhat.com>; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> >>>>> Cc:
> >>>>> Date: 2021/1/7, Thu 17:51
> >>>>> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >>>>>
> >>>>> Hi Steffen,
> >>>>> Hi Reid,
> >>>>>
> >>>>> The fencing history is kept inside stonith-ng and is not written to the CIB.
> >>>>> However, getting the entire CIB and sending it will help us to reproduce
> >>>>> the problem.
> >>>>>
> >>>>> Best Regards,
> >>>>> Hideo Yamauchi.
> >>>>>
> >>>>>
> >>>>> ----- Original Message -----
> >>>>>> From: Reid Wahl <nwahl at redhat.com>
> >>>>>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> >>>>>> Date: 2021/1/7, Thu 17:39
> >>>>>> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >>>>>>
> >>>>>>
> >>>>>> Hi, Steffen. Those attachments don't contain the CIB. They contain the
> >>>>>> `pcs config` output. You can get the CIB with
> >>>>>> `pcs cluster cib > $(hostname).cib.xml`.
> >>>>>> Granted, it's possible that this fence action information wouldn't
> >>>>>> be in the CIB at all. It might be stored in fencer memory.
> >>>>>> On Thu, Jan 7, 2021 at 12:26 AM <renayama19661014 at ybb.ne.jp> wrote:
> >>>>>>
> >>>>>> Hi Steffen,
> >>>>>>>> Here CIB settings attached (pcs config show) for all 3 of my nodes
> >>>>>>>> (all 3 seems 100% identical), node03 is the DC.
> >>>>>>> Thank you for the attachment.
> >>>>>>>
> >>>>>>> What is the scenario when this situation occurs?
> >>>>>>> In what steps did the problem appear when fencing was performed (or
> >>>>>>> failed)?
> >>>>>>> Best Regards,
> >>>>>>> Hideo Yamauchi.
> >>>>>>>
> >>>>>>>
> >>>>>>> ----- Original Message -----
> >>>>>>>> From: Steffen Vinther Sørensen <svinther at gmail.com>
> >>>>>>>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> >>>>>>>> Cc:
> >>>>>>>> Date: 2021/1/7, Thu 17:05
> >>>>>>>> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >>>>>>>> Hi Hideo,
> >>>>>>>>
> >>>>>>>> Here CIB settings attached (pcs config show) for all 3 of my nodes
> >>>>>>>> (all 3 seems 100% identical), node03 is the DC.
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> Steffen
> >>>>>>>>
> >>>>>>>> On Thu, Jan 7, 2021 at 8:06 AM <renayama19661014 at ybb.ne.jp> wrote:
> >>>>>>>>> Hi Steffen,
> >>>>>>>>> Hi Reid,
> >>>>>>>>>
> >>>>>>>>> I also checked the CentOS source rpm and it seems to include a fix for
> >>>>>>>>> the problem.
> >>>>>>>>> As Steffen suggested, if you share your CIB settings, I might learn
> >>>>>>>>> something.
> >>>>>>>>> If this is the same issue that the fix addresses, the pending action
> >>>>>>>>> will only be displayed on the DC node and will not affect operation.
> >>>>>>>>> The pending actions shown will remain for a long time, but will not
> >>>>>>>>> have a negative impact on the cluster.
> >>>>>>>>> Best Regards,
> >>>>>>>>> Hideo Yamauchi.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ----- Original Message -----
> >>>>>>>>> > From: Reid Wahl <nwahl at redhat.com>
> >>>>>>>>> > To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> >>>>>>>>> > Cc:
> >>>>>>>>> > Date: 2021/1/7, Thu 15:58
> >>>>>>>>> > Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >>>>>>>>> >
> >>>>>>>>> > It's supposedly fixed in that version.
> >>>>>>>>> > - https://bugzilla.redhat.com/show_bug.cgi?id=1787749
> >>>>>>>>> > - https://access.redhat.com/solutions/4713471
> >>>>>>>>> >
> >>>>>>>>> > So you may be hitting a different issue (unless there's a bug in the
> >>>>>>>>> > pcmk 1.1 backport of the fix).
> >>>>>>>>> >
> >>>>>>>>> > I may be a little bit out of my area of knowledge here, but can you
> >>>>>>>>> > share the CIBs from nodes 1 and 3? Maybe Hideo, Klaus, or Ken has some
> >>>>>>>>> > insight.
> >>>>>>>>> >
> >>>>>>>>> > On Wed, Jan 6, 2021 at 10:53 PM Steffen Vinther Sørensen
> >>>>>>>>> > <svinther at gmail.com> wrote:
> >>>>>>>>> >>
> >>>>>>>>> >> Hi Hideo,
> >>>>>>>>> >>
> >>>>>>>>> >> If the fix is not going to make it into the CentOS7 pacemaker version,
> >>>>>>>>> >> I guess the stable approach to take advantage of it is to build the
> >>>>>>>>> >> cluster on another OS than CentOS7? A little late for that in this
> >>>>>>>>> >> case though :)
> >>>>>>>>> >>
> >>>>>>>>> >> Regards
> >>>>>>>>> >> Steffen
> >>>>>>>>> >>
> >>>>>>>>> >>
> >>>>>>>>> >>
> >>>>>>>>> >>
> >>>>>>>>> >> On Thu, Jan 7, 2021 at 7:27 AM <renayama19661014 at ybb.ne.jp> wrote:
> >>>>>>>>> >> >
> >>>>>>>>> >> > Hi Steffen,
> >>>>>>>>> >> >
> >>>>>>>>> >> > The fix pointed out by Reid is affecting it.
> >>>>>>>>> >> >
> >>>>>>>>> >> > Since the fencing action requested by the DC node exists only in the
> >>>>>>>>> >> > DC node, such an event occurs.
> >>>>>>>>> >> > You will need to take advantage of the modified pacemaker to resolve
> >>>>>>>>> >> > the issue.
> >>>>>>>>> >> >
> >>>>>>>>> >> > Best Regards,
> >>>>>>>>> >> > Hideo Yamauchi.
> >>>>>>>>> >> >
> >>>>>>>>> >> >
> >>>>>>>>> >> >
> >>>>>>>>> >> > ----- Original Message -----
> >>>>>>>>> >> > > From: Reid Wahl <nwahl at redhat.com>
> >>>>>>>>> >> > > To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> >>>>>>>>> >> > > Cc:
> >>>>>>>>> >> > > Date: 2021/1/7, Thu 15:07
> >>>>>>>>> >> > > Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >>>>>>>>> >> > >
> >>>>>>>>> >> > > Hi, Steffen. Are your cluster nodes all running the same Pacemaker
> >>>>>>>>> >> > > versions? This looks like Bug 5401[1], which is fixed by upstream
> >>>>>>>>> >> > > commit df71a07[2]. I'm a little bit confused about why it only shows
> >>>>>>>>> >> > > up on one out of three nodes though.
> >>>>>>>>> >> > >
> >>>>>>>>> >> > > [1] https://bugs.clusterlabs.org/show_bug.cgi?id=5401
> >>>>>>>>> >> > > [2] https://github.com/ClusterLabs/pacemaker/commit/df71a07
> >>>>>>>>> >> > >
> >>>>>>>>> >> > > On Tue, Jan 5, 2021 at 8:31 AM Steffen Vinther Sørensen
> >>>>>>>>> >> > > <svinther at gmail.com> wrote:
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> Hello
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> node 1 is showing this in 'pcs status'
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> Pending Fencing Actions:
> >>>>>>>>> >> > >> * reboot of kvm03-node02.avigol-gcs.dk pending: client=crmd.37819,
> >>>>>>>>> >> > >> origin=kvm03-node03.avigol-gcs.dk
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> node 2 and node 3 outputs no such thing (node 3 is DC)
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> Google is not much help, how to investigate this further and get rid
> >>>>>>>>> >> > >> of such terrifying status message ?
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> Regards
> >>>>>>>>> >> > >> Steffen
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> _______________________________________________
> >>>>>>>>> >> > >> Manage your subscription:
> >>>>>>>>> >> > >> https://lists.clusterlabs.org/mailman/listinfo/users
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> ClusterLabs home: https://www.clusterlabs.org/
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >
> >>>>>>>>> >> > >
> >>>>>>>>> >> > > --
> >>>>>>>>> >> > > Regards,
> >>>>>>>>> >> > >
> >>>>>>>>> >> > > Reid Wahl, RHCA
> >>>>>>>>> >> > > Senior Software Maintenance Engineer, Red Hat
> >>>>>>>>> >> > > CEE - Platform Support Delivery - ClusterHA
> >>>>>>>>> >> > >
> >>>>>>>>> >> > >
> >>>>>>>>> >> > >
> >>>>>>>>> >> >
> >>>>>>>>> >> >
> >>>>>>>>> >>
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >
> >
>
>