[ClusterLabs] Pending Fencing Actions shown in pcs status

Steffen Vinther Sørensen svinther at gmail.com
Thu Jan 7 07:13:43 EST 2021


Hi Klaus,

Yes, after that the status does sync to the other nodes. It also looks
like there are some hostname resolution problems in play here, possibly
causing trouble; here are my notes from restarting pacemaker etc.


pcs cluster standby kvm03-node02.avigol-gcs.dk
pcs cluster stop kvm03-node02.avigol-gcs.dk
pcs status

Pending Fencing Actions:
* reboot of kvm03-node02.avigol-gcs.dk pending: client=crmd.37819,
origin=kvm03-node03.avigol-gcs.dk
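
# To dump the fencer's own view of this -- the history lives in stonith-ng
# memory, not in the CIB. I believe this works with the 1.1 stonith_admin,
# though I am not 100% sure of the exact option set:
stonith_admin --history kvm03-node02.avigol-gcs.dk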

# From logs on all 3 nodes:
Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]:  warning: received
pending action we are supposed to be the owner but it's not in our
records -> fail it
Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]:    error: Operation
'reboot' targeting kvm03-node02.avigol-gcs.dk on <no-one> for
crmd.37819 at kvm03-node03.avigol-gcs.dk.56a3018c: No route to host
Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]:    error:
stonith_construct_reply: Triggered assert at commands.c:2406 : request
!= NULL
Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]:  warning: Can't create
a sane reply
Jan 07 12:48:18 kvm03-node03 crmd[37819]:   notice: Peer
kvm03-node02.avigol-gcs.dk was not terminated (reboot) by <anyone> on
behalf of crmd.37819: No route to host

pcs cluster start kvm03-node02.avigol-gcs.dk
pcs status (now outputs the same on all 3 nodes)

Failed Fencing Actions:
* reboot of kvm03-node02.avigol-gcs.dk failed: delegate=,
client=crmd.37819, origin=kvm03-node03.avigol-gcs.dk,
    last-failed='Thu Jan  7 12:48:18 2021'
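
# Newer pacemaker can apparently clear this stale history entry
# (stonith_admin --cleanup, 2.0.3+ as far as I can tell); the flag may
# not exist in the CentOS7 1.1.x packages:
stonith_admin --history '*' --cleanup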


pcs cluster unstandby kvm03-node02.avigol-gcs.dk

# Now libvirtd refuses to start

Jan 07 12:51:44 kvm03-node02 dnsmasq[20884]: read /etc/hosts - 8 addresses
Jan 07 12:51:44 kvm03-node02 dnsmasq[20884]: read
/var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Jan 07 12:51:44 kvm03-node02 dnsmasq-dhcp[20884]: read
/var/lib/libvirt/dnsmasq/default.hostsfile
Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
11:51:44.729+0000: 24160: info : libvirt version: 4.5.0, package:
36.el7_9.3 (CentOS BuildSystem <http://bugs.centos.org>,
2020-11-16-16:25:20, x86-01.bsys.centos.org)
Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
11:51:44.729+0000: 24160: info : hostname: kvm03-node02
Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
11:51:44.729+0000: 24160: error : qemuMonitorOpenUnix:392 : failed to
connect to monitor socket: Connection refused
Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
11:51:44.729+0000: 24159: error : qemuMonitorOpenUnix:392 : failed to
connect to monitor socket: Connection refused
Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
11:51:44.730+0000: 24161: error : qemuMonitorOpenUnix:392 : failed to
connect to monitor socket: Connection refused
Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
11:51:44.730+0000: 24162: error : qemuMonitorOpenUnix:392 : failed to
connect to monitor socket: Connection refused
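
# My guess: libvirtd finds stale domain state from before the node went
# down and tries to reconnect to qemu monitors that are gone. Paths are
# the CentOS7 defaults, if I remember correctly:
ls /var/run/libvirt/qemu/
virsh list --all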

pcs status

Failed Resource Actions:
* libvirtd_start_0 on kvm03-node02.avigol-gcs.dk 'unknown error' (1):
call=142, status=complete, exitreason='',
    last-rc-change='Thu Jan  7 12:51:44 2021', queued=0ms, exec=2157ms

Failed Fencing Actions:
* reboot of kvm03-node02.avigol-gcs.dk failed: delegate=,
client=crmd.37819, origin=kvm03-node03.avigol-gcs.dk,
    last-failed='Thu Jan  7 12:48:18 2021'
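
# Once libvirtd starts cleanly, the failed action should clear with
# the usual cleanup (resource name inferred from libvirtd_start_0 above):
pcs resource cleanup libvirtd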


# From /etc/hosts on all 3 nodes:

172.31.0.31    kvm03-node01 kvm03-node01.avigol-gcs.dk
172.31.0.32    kvm03-node02 kvm03-node02.avigol-gcs.dk
172.31.0.33    kvm03-node03 kvm03-node03.avigol-gcs.dk
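
# Note the short names come first, so the hosts file makes them the
# canonical names while the cluster uses the FQDNs. Maybe reorder, e.g.:
#   172.31.0.32    kvm03-node02.avigol-gcs.dk kvm03-node02
# and compare what the cluster and the resolver actually report:
crm_node -n
getent hosts kvm03-node02.avigol-gcs.dk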

On Thu, Jan 7, 2021 at 11:15 AM Klaus Wenninger <kwenning at redhat.com> wrote:
>
> Hi Steffen,
>
> If you just see the leftover pending action on one node,
> it would be interesting to see whether restarting pacemaker
> on one of the other nodes syncs it to all of the
> nodes.
>
> Regards,
> Klaus
>
> On 1/7/21 9:54 AM, renayama19661014 at ybb.ne.jp wrote:
> > Hi Steffen,
> >
> >> Unfortunately I'm not sure about the exact scenario. But I have been doing
> >> some recent experiments with node standby/unstandby and stop/start, this
> >> to get the procedures right for updating node rpms etc.
> >>
> >> Later I noticed the uncomforting "pending fencing actions" status msg.
> > Okay!
> >
> > We will repeat the standby and unstandby steps in the same way to check.
> > We will start checking after tomorrow, so I think it will take until some time next week.
> >
> >
> > Many thanks,
> > Hideo Yamauchi.
> >
> >
> >
> > ----- Original Message -----
> >> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
> >> To: Reid Wahl <nwahl at redhat.com>; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> >> Cc:
> >> Date: 2021/1/7, Thu 17:51
> >> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >>
> >> Hi Steffen,
> >> Hi Reid,
> >>
> >> The fencing history is kept inside stonith-ng and is not written to the CIB.
> >> However, capturing the entire CIB and sending it will still help with
> >> reproducing the problem.
> >>
> >> Best Regards,
> >> Hideo Yamauchi.
> >>
> >>
> >> ----- Original Message -----
> >>> From: Reid Wahl <nwahl at redhat.com>
> >>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to
> >> open-source clustering welcomed <users at clusterlabs.org>
> >>> Date: 2021/1/7, Thu 17:39
> >>> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >>>
> >>>
> >>> Hi, Steffen. Those attachments don't contain the CIB. They contain the
> >>> `pcs config` output. You can get the CIB with `pcs cluster cib >
> >>> $(hostname).cib.xml`.
> >>>
> >>> Granted, it's possible that this fence action information wouldn't
> >>> be in the CIB at all. It might be stored in fencer memory.
> >>>
> >>> On Thu, Jan 7, 2021 at 12:26 AM <renayama19661014 at ybb.ne.jp> wrote:
> >>>
> >>> Hi Steffen,
> >>>>>  Here are the CIB settings attached (pcs config show) for all 3 of my nodes
> >>>>>  (all 3 seem 100% identical); node03 is the DC.
> >>>>
> >>>> Thank you for the attachment.
> >>>>
> >>>> What is the scenario when this situation occurs?
> >>>> In what steps did the problem appear when fencing was performed (or
> >>>> failed)?
> >>>>
> >>>> Best Regards,
> >>>> Hideo Yamauchi.
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>>  From: Steffen Vinther Sørensen <svinther at gmail.com>
> >>>>>  To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related
> >> to open-source clustering welcomed <users at clusterlabs.org>
> >>>>>  Cc:
> >>>>>  Date: 2021/1/7, Thu 17:05
> >>>>>  Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs
> >> status
> >>>>>  Hi Hideo,
> >>>>>
> >>>>>  Here are the CIB settings attached (pcs config show) for all 3 of my nodes
> >>>>>  (all 3 seem 100% identical); node03 is the DC.
> >>>>>
> >>>>>  Regards
> >>>>>  Steffen
> >>>>>
> >>>>>  On Thu, Jan 7, 2021 at 8:06 AM <renayama19661014 at ybb.ne.jp>
> >> wrote:
> >>>>>>   Hi Steffen,
> >>>>>>   Hi Reid,
> >>>>>>
> >>>>>>   I also checked the CentOS source rpm and it seems to include a
> >>>>>>   fix for the problem.
> >>>>>>   As Steffen suggested, if you share your CIB settings, I might
> >>>>>>   know something.
> >>>>>>   If this issue is the same one that fix addresses, the entry will
> >>>>>>   only be displayed on the DC node and will not affect operation.
> >>>>>>   The pending actions shown will remain for a long time, but
> >>>>>>   will not have a negative impact on the cluster.
> >>>>>>   Best Regards,
> >>>>>>   Hideo Yamauchi.
> >>>>>>
> >>>>>>
> >>>>>>   ----- Original Message -----
> >>>>>>   > From: Reid Wahl <nwahl at redhat.com>
> >>>>>>   > To: Cluster Labs - All topics related to open-source
> >> clustering
> >>>>>  welcomed <users at clusterlabs.org>
> >>>>>>   > Cc:
> >>>>>>   > Date: 2021/1/7, Thu 15:58
> >>>>>>   > Subject: Re: [ClusterLabs] Pending Fencing Actions shown
> >> in pcs status
> >>>>>>   >
> >>>>>>   > It's supposedly fixed in that version.
> >>>>>>   >   - https://bugzilla.redhat.com/show_bug.cgi?id=1787749
> >>>>>>   >   - https://access.redhat.com/solutions/4713471
> >>>>>>   >
> >>>>>>   > So you may be hitting a different issue (unless there's
> >>>>>>   > a bug in the pcmk 1.1 backport of the fix).
> >>>>>>   >
> >>>>>>   > I may be a little bit out of my area of knowledge here, but
> >>>>>>   > can you share the CIBs from nodes 1 and 3? Maybe Hideo,
> >>>>>>   > Klaus, or Ken has some insight.
> >>>>>>   >
> >>>>>>   > On Wed, Jan 6, 2021 at 10:53 PM Steffen Vinther Sørensen
> >>>>>>   > <svinther at gmail.com> wrote:
> >>>>>>   >>
> >>>>>>   >>  Hi Hideo,
> >>>>>>   >>
> >>>>>>   >>  If the fix is not going to make it into the CentOS 7
> >>>>>>   >>  pacemaker version, I guess the stable way to take
> >>>>>>   >>  advantage of it is to build the cluster on an OS other
> >>>>>>   >>  than CentOS 7? A little late for that in this case though :)
> >>>>>>   >>
> >>>>>>   >>  Regards
> >>>>>>   >>  Steffen
> >>>>>>   >>
> >>>>>>   >>
> >>>>>>   >>
> >>>>>>   >>
> >>>>>>   >>  On Thu, Jan 7, 2021 at 7:27 AM
> >> <renayama19661014 at ybb.ne.jp>
> >>>>>  wrote:
> >>>>>>   >>  >
> >>>>>>   >>  > Hi Steffen,
> >>>>>>   >>  >
> >>>>>>   >>  > The fix pointed out by Reid is affecting it.
> >>>>>>   >>  >
> >>>>>>   >>  > Since the fencing action requested by the DC node
> >>>>>>   >>  > exists only on the DC node, such an event occurs.
> >>>>>>   >>  > You will need to use the fixed pacemaker to resolve
> >>>>>>   >>  > the issue.
> >>>>>>   >>  >
> >>>>>>   >>  > Best Regards,
> >>>>>>   >>  > Hideo Yamauchi.
> >>>>>>   >>  >
> >>>>>>   >>  >
> >>>>>>   >>  >
> >>>>>>   >>  > ----- Original Message -----
> >>>>>>   >>  > > From: Reid Wahl <nwahl at redhat.com>
> >>>>>>   >>  > > To: Cluster Labs - All topics related to
> >> open-source
> >>>>>  clustering
> >>>>>>   > welcomed <users at clusterlabs.org>
> >>>>>>   >>  > > Cc:
> >>>>>>   >>  > > Date: 2021/1/7, Thu 15:07
> >>>>>>   >>  > > Subject: Re: [ClusterLabs] Pending Fencing
> >> Actions
> >>>>>  shown in pcs
> >>>>>>   > status
> >>>>>>   >>  > >
> >>>>>>   >>  > > Hi, Steffen. Are your cluster nodes all running the
> >>>>>>   >>  > > same Pacemaker versions? This looks like Bug 5401[1],
> >>>>>>   >>  > > which is fixed by upstream commit df71a07[2]. I'm a
> >>>>>>   >>  > > little bit confused about why it only shows up on one
> >>>>>>   >>  > > out of three nodes though.
> >>>>>>   >>  > >
> >>>>>>   >>  > > [1] https://bugs.clusterlabs.org/show_bug.cgi?id=5401
> >>>>>>   >>  > > [2] https://github.com/ClusterLabs/pacemaker/commit/df71a07
> >>>>>>   >>  > >
> >>>>>>   >>  > > On Tue, Jan 5, 2021 at 8:31 AM Steffen
> >> Vinther Sørensen
> >>>>>>   >>  > > <svinther at gmail.com> wrote:
> >>>>>>   >>  > >>
> >>>>>>   >>  > >>  Hello
> >>>>>>   >>  > >>
> >>>>>>   >>  > >>  node 1 is showing this in 'pcs status'
> >>>>>>   >>  > >>
> >>>>>>   >>  > >>  Pending Fencing Actions:
> >>>>>>   >>  > >>  * reboot of kvm03-node02.avigol-gcs.dk pending:
> >>>>>>   >>  > >>  client=crmd.37819, origin=kvm03-node03.avigol-gcs.dk
> >>>>>>   >>  > >>
> >>>>>>   >>  > >>  node 2 and node 3 output no such thing (node 3 is
> >>>>>>   >>  > >>  the DC)
> >>>>>>   >>  > >>
> >>>>>>   >>  > >>  Google is not much help; how do I investigate this
> >>>>>>   >>  > >>  further and get rid of such a terrifying status
> >>>>>>   >>  > >>  message?
> >>>>>>   >>  > >>
> >>>>>>   >>  > >>  Regards
> >>>>>>   >>  > >>  Steffen
> >>>>>>   >>  > >>
> >>>>>>   >>  > >
> >>>>>>   >>  > >
> >>>>>>   >>  > > --
> >>>>>>   >>  > > Regards,
> >>>>>>   >>  > >
> >>>>>>   >>  > > Reid Wahl, RHCA
> >>>>>>   >>  > > Senior Software Maintenance Engineer, Red
> >> Hat
> >>>>>>   >>  > > CEE - Platform Support Delivery -
> >> ClusterHA
> >>>>>>   >>  > >
> >>>>>>   >>  > >
> >>>>>>   >>  >
> >>>>>>   >
> >>>>>>   >
> >>>>>>   >
> >>>>>>   > --
> >>>>>>   > Regards,
> >>>>>>   >
> >>>>>>   > Reid Wahl, RHCA
> >>>>>>   > Senior Software Maintenance Engineer, Red Hat
> >>>>>>   > CEE - Platform Support Delivery - ClusterHA
> >>>>>>   >
> >>>>>>
> >>>>
> >>> --
> >>>
> >>> Regards,
> >>>
> >>> Reid Wahl, RHCA
> >>>
> >>> Senior Software Maintenance Engineer, Red Hat
> >>> CEE - Platform Support Delivery - ClusterHA
> >>>
> >>>
> >>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/


More information about the Users mailing list