[ClusterLabs] Pending Fencing Actions shown in pcs status
renayama19661014 at ybb.ne.jp
Mon Jan 11 21:45:01 EST 2021
Hi Steffen,
I've been experimenting with it since last weekend, but I haven't been able to reproduce the same situation.
It seems I cannot narrow down the steps needed to reproduce it.
Could you attach a log of the problem?
Best Regards,
Hideo Yamauchi.
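For reference, a minimal sketch of commands that should collect the relevant state for such a report (this assumes the Pacemaker 1.1 tooling on CentOS 7; the time window and output paths are illustrative, so adjust as needed):

```shell
# Sketch: gather fencing history, the CIB, and logs for analysis.
# Run on each cluster node.

# Fencing history as the fencer (stonith-ng) keeps it in memory;
# this is where a stale "pending" entry lives, not in the CIB.
stonith_admin --history '*' --verbose

# Dump the full CIB anyway, in case the configuration matters.
pcs cluster cib > "$(hostname).cib.xml"

# Collect logs from all nodes around the time the entry appeared.
crm_report --from "2021-01-07 12:00:00" --to "2021-01-07 13:00:00" /tmp/pending-fence-report
```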
----- Original Message -----
> From: Klaus Wenninger <kwenning at redhat.com>
> To: Steffen Vinther Sørensen <svinther at gmail.com>; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Cc:
> Date: 2021/1/7, Thu 21:42
> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
>
> On 1/7/21 1:13 PM, Steffen Vinther Sørensen wrote:
>> Hi Klaus,
>>
>> Yes, then the status does sync to the other nodes. It also looks like
>> there are some hostname resolution problems in play here, possibly causing
>> trouble; here are my notes from restarting pacemaker etc.
> I don't think there are hostname resolution problems.
> The messages you are seeing, which look as if there were,
> are caused by using -EHOSTUNREACH as the error code to fail
> a pending fence action when a node that is just coming up
> sees a pending action that is claimed to be handled by itself.
> Back then I chose that error code because none of the available
> ones really matched, and it was urgent for some reason, so
> introducing something new was too risky at that point.
> It would probably make sense to introduce something
> more descriptive.
> Back then the issue was triggered by fenced crashing and
> being restarted - so not a node restart, just fenced
> restarting.
> And it looks as if building the failed-message failed somehow,
> which could be the reason why the pending action persists.
> That would be something else than what we solved with Bug 5401,
> but what triggers the logs below might well just be a
> follow-up issue of the Bug 5401 problem.
> I will try to find time for a deeper look later today.
>
> Klaus
>>
>> pcs cluster standby kvm03-node02.avigol-gcs.dk
>> pcs cluster stop kvm03-node02.avigol-gcs.dk
>> pcs status
>>
>> Pending Fencing Actions:
>> * reboot of kvm03-node02.avigol-gcs.dk pending: client=crmd.37819,
>> origin=kvm03-node03.avigol-gcs.dk
>>
>> # From logs on all 3 nodes:
>> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: warning: received
>> pending action we are supposed to be the owner but it's not in our
>> records -> fail it
>> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: error: Operation
>> 'reboot' targeting kvm03-node02.avigol-gcs.dk on <no-one> for
>> crmd.37819 at kvm03-node03.avigol-gcs.dk.56a3018c: No route to host
>> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: error:
>> stonith_construct_reply: Triggered assert at commands.c:2406 : request
>> != NULL
>> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: warning: Can't create
>> a sane reply
>> Jan 07 12:48:18 kvm03-node03 crmd[37819]: notice: Peer
>> kvm03-node02.avigol-gcs.dk was not terminated (reboot) by <anyone> on
>> behalf of crmd.37819: No route to host
>>
>> pcs cluster start kvm03-node02.avigol-gcs.dk
>> pcs status (now outputs the same on all 3 nodes)
>>
>> Failed Fencing Actions:
>> * reboot of kvm03-node02.avigol-gcs.dk failed: delegate=,
>> client=crmd.37819, origin=kvm03-node03.avigol-gcs.dk,
>> last-failed='Thu Jan 7 12:48:18 2021'
>>
>>
>> pcs cluster unstandby kvm03-node02.avigol-gcs.dk
>>
>> # Now libvirtd refuses to start
>>
>> Jan 07 12:51:44 kvm03-node02 dnsmasq[20884]: read /etc/hosts - 8 addresses
>> Jan 07 12:51:44 kvm03-node02 dnsmasq[20884]: read
>> /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
>> Jan 07 12:51:44 kvm03-node02 dnsmasq-dhcp[20884]: read
>> /var/lib/libvirt/dnsmasq/default.hostsfile
>> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
>> 11:51:44.729+0000: 24160: info : libvirt version: 4.5.0, package:
>> 36.el7_9.3 (CentOS BuildSystem <http://bugs.centos.org >,
>> 2020-11-16-16:25:20, x86-01.bsys.centos.org)
>> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
>> 11:51:44.729+0000: 24160: info : hostname: kvm03-node02
>> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
>> 11:51:44.729+0000: 24160: error : qemuMonitorOpenUnix:392 : failed to
>> connect to monitor socket: Connection refused
>> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
>> 11:51:44.729+0000: 24159: error : qemuMonitorOpenUnix:392 : failed to
>> connect to monitor socket: Connection refused
>> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
>> 11:51:44.730+0000: 24161: error : qemuMonitorOpenUnix:392 : failed to
>> connect to monitor socket: Connection refused
>> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07
>> 11:51:44.730+0000: 24162: error : qemuMonitorOpenUnix:392 : failed to
>> connect to monitor socket: Connection refused
>>
>> pcs status
>>
>> Failed Resource Actions:
>> * libvirtd_start_0 on kvm03-node02.avigol-gcs.dk 'unknown error' (1):
>>     call=142, status=complete, exitreason='',
>>     last-rc-change='Thu Jan 7 12:51:44 2021', queued=0ms, exec=2157ms
>>
>> Failed Fencing Actions:
>> * reboot of kvm03-node02.avigol-gcs.dk failed: delegate=,
>> client=crmd.37819, origin=kvm03-node03.avigol-gcs.dk,
>> last-failed='Thu Jan 7 12:48:18 2021'
>>
>>
>> # from /etc/hosts on all 3 nodes:
>>
>> 172.31.0.31 kvm03-node01 kvm03-node01.avigol-gcs.dk
>> 172.31.0.32 kvm03-node02 kvm03-node02.avigol-gcs.dk
>> 172.31.0.33 kvm03-node03 kvm03-node03.avigol-gcs.dk
>>
>> On Thu, Jan 7, 2021 at 11:15 AM Klaus Wenninger <kwenning at redhat.com> wrote:
>>> Hi Steffen,
>>>
>>> If you just see the leftover pending action on one node,
>>> it would be interesting to know whether restarting pacemaker on
>>> one of the other nodes syncs it to all of the nodes.
>>>
>>> Regards,
>>> Klaus
>>>
>>> On 1/7/21 9:54 AM, renayama19661014 at ybb.ne.jp wrote:
>>>> Hi Steffen,
>>>>
>>>>> Unfortunately I'm not sure about the exact scenario, but I have been doing
>>>>> some recent experiments with node standby/unstandby and stop/start,
>>>>> to get the procedures right for updating node rpms etc.
>>>>>
>>>>> Later I noticed the uncomforting "Pending Fencing Actions" status message.
>>>> Okay!
>>>>
>>>> I will repeat the standby and unstandby steps in the same way to check.
>>>> We will start checking after tomorrow, so I think it will take until sometime next week.
>>>>
>>>>
>>>> Many thanks,
>>>> Hideo Yamauchi.
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>>> To: Reid Wahl <nwahl at redhat.com>; Cluster Labs - All topics related to
>>>>> open-source clustering welcomed <users at clusterlabs.org>
>>>>> Cc:
>>>>> Date: 2021/1/7, Thu 17:51
>>>>> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
>>>>>
>>>>> Hi Steffen,
>>>>> Hi Reid,
>>>>>
>>>>> The fencing history is kept inside stonith-ng and is not written to the CIB.
>>>>> However, getting the entire CIB and having it sent to us will help
>>>>> to reproduce the problem.
>>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: Reid Wahl <nwahl at redhat.com>
>>>>>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to
>>>>> open-source clustering welcomed <users at clusterlabs.org>
>>>>>> Date: 2021/1/7, Thu 17:39
>>>>>> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
>>>>>>
>>>>>>
>>>>>> Hi, Steffen. Those attachments don't contain the CIB. They contain the
>>>>>> `pcs config` output. You can get the CIB with
>>>>>> `pcs cluster cib > $(hostname).cib.xml`.
>>>>>> Granted, it's possible that this fence action information wouldn't
>>>>>> be in the CIB at all. It might be stored in fencer memory.
>>>>>> On Thu, Jan 7, 2021 at 12:26 AM <renayama19661014 at ybb.ne.jp> wrote:
>>>>>>
>>>>>> Hi Steffen,
>>>>>>>> Here are the CIB settings (pcs config show) attached for all 3 of my nodes
>>>>>>>> (all 3 seem 100% identical); node03 is the DC.
>>>>>>> Thank you for the attachment.
>>>>>>>
>>>>>>> What is the scenario when this situation occurs?
>>>>>>> In what steps did the problem appear when fencing was performed (or failed)?
>>>>>>> Best Regards,
>>>>>>> Hideo Yamauchi.
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: Steffen Vinther Sørensen <svinther at gmail.com>
>>>>>>>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to
>>>>>>>> open-source clustering welcomed <users at clusterlabs.org>
>>>>>>>> Cc:
>>>>>>>> Date: 2021/1/7, Thu 17:05
>>>>>>>> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
>>>>>>>> Hi Hideo,
>>>>>>>>
>>>>>>>> Here are the CIB settings (pcs config show) attached for all 3 of my nodes
>>>>>>>> (all 3 seem 100% identical); node03 is the DC.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Steffen
>>>>>>>>
>>>>>>>> On Thu, Jan 7, 2021 at 8:06 AM <renayama19661014 at ybb.ne.jp> wrote:
>>>>>>>>> Hi Steffen,
>>>>>>>>> Hi Reid,
>>>>>>>>>
>>>>>>>>> I also checked the CentOS source rpm, and it seems to include a fix for
>>>>>>>>> the problem.
>>>>>>>>> As Steffen suggested, if you share your CIB settings, I might know something.
>>>>>>>>> If this issue is the same as the one fixed, the entry will only be displayed
>>>>>>>>> on the DC node and will not affect operation.
>>>>>>>>> The pending actions shown will remain for a long time, but will not have a
>>>>>>>>> negative impact on the cluster.
>>>>>>>>> Best Regards,
>>>>>>>>> Hideo Yamauchi.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>> > From: Reid Wahl <nwahl at redhat.com>
>>>>>>>>> > To: Cluster Labs - All topics related to open-source clustering
>>>>>>>>> > welcomed <users at clusterlabs.org>
>>>>>>>>> > Cc:
>>>>>>>>> > Date: 2021/1/7, Thu 15:58
>>>>>>>>> > Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
>>>>>>>>> >
>>>>>>>>> > It's supposedly fixed in that version.
>>>>>>>>> > - https://bugzilla.redhat.com/show_bug.cgi?id=1787749
>>>>>>>>> > - https://access.redhat.com/solutions/4713471
>>>>>>>>> >
>>>>>>>>> > So you may be hitting a different issue (unless there's a bug in the
>>>>>>>>> > pcmk 1.1 backport of the fix).
>>>>>>>>> >
>>>>>>>>> > I may be a little bit out of my area of knowledge here, but can you
>>>>>>>>> > share the CIBs from nodes 1 and 3? Maybe Hideo, Klaus, or Ken has some
>>>>>>>>> > insight.
>>>>>>>>> >
>>>>>>>>> > On Wed, Jan 6, 2021 at 10:53 PM Steffen Vinther Sørensen
>>>>>>>>> > <svinther at gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Hi Hideo,
>>>>>>>>> >>
>>>>>>>>> >> If the fix is not going to make it into the CentOS7 pacemaker version,
>>>>>>>>> >> I guess the stable approach to take advantage of it is to build the
>>>>>>>>> >> cluster on another OS than CentOS7? A little late for that in this
>>>>>>>>> >> case though :)
>>>>>>>>> >>
>>>>>>>>> >> Regards
>>>>>>>>> >> Steffen
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Thu, Jan 7, 2021 at 7:27 AM <renayama19661014 at ybb.ne.jp> wrote:
>>>>>>>>> >> >
>>>>>>>>> >> > Hi Steffen,
>>>>>>>>> >> >
>>>>>>>>> >> > The fix pointed out by Reid is relevant here.
>>>>>>>>> >> >
>>>>>>>>> >> > Since the fencing action requested by the DC node exists only in the
>>>>>>>>> >> > DC node, such an event occurs.
>>>>>>>>> >> > You will need to use the fixed pacemaker to resolve the issue.
>>>>>>>>> >> >
>>>>>>>>> >> > Best Regards,
>>>>>>>>> >> > Hideo Yamauchi.
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> > ----- Original Message -----
>>>>>>>>> >> > > From: Reid Wahl <nwahl at redhat.com>
>>>>>>>>> >> > > To: Cluster Labs - All topics related to open-source clustering
>>>>>>>>> >> > > welcomed <users at clusterlabs.org>
>>>>>>>>> >> > > Cc:
>>>>>>>>> >> > > Date: 2021/1/7, Thu 15:07
>>>>>>>>> >> > > Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
>>>>>>>>> >> > >
>>>>>>>>> >> > > Hi, Steffen. Are your cluster nodes all running the same Pacemaker
>>>>>>>>> >> > > versions? This looks like Bug 5401[1], which is fixed by upstream
>>>>>>>>> >> > > commit df71a07[2]. I'm a little bit confused about why it only shows
>>>>>>>>> >> > > up on one out of three nodes though.
>>>>>>>>> >> > >
>>>>>>>>> >> > > [1] https://bugs.clusterlabs.org/show_bug.cgi?id=5401
>>>>>>>>> >> > > [2] https://github.com/ClusterLabs/pacemaker/commit/df71a07
>>>>>>>>> >> > >
>>>>>>>>> >> > > On Tue, Jan 5, 2021 at 8:31 AM Steffen Vinther Sørensen
>>>>>>>>> >> > > <svinther at gmail.com> wrote:
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> Hello
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> node 1 is showing this in 'pcs status':
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> Pending Fencing Actions:
>>>>>>>>> >> > >> * reboot of kvm03-node02.avigol-gcs.dk pending: client=crmd.37819,
>>>>>>>>> >> > >> origin=kvm03-node03.avigol-gcs.dk
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> node 2 and node 3 output no such thing (node 3 is the DC).
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> Google is not much help; how can I investigate this further and
>>>>>>>>> >> > >> get rid of such a terrifying status message?
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> Regards
>>>>>>>>> >> > >> Steffen
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> _______________________________________________
>>>>>>>>> >> > >> Manage your subscription:
>>>>>>>>> >> > >> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> ClusterLabs home: https://www.clusterlabs.org/
>>>>>>>>> >> > >
>>>>>>>>> >> > >
>>>>>>>>> >> > > --
>>>>>>>>> >> > > Regards,
>>>>>>>>> >> > >
>>>>>>>>> >> > > Reid Wahl, RHCA
>>>>>>>>> >> > > Senior Software
> Maintenance Engineer, Red
>>>>> Hat
>>>>>>>>> >> > > CEE - Platform Support
> Delivery -
>>>>> ClusterHA
>>>>>>>>> >> > >
>>>>>>>>> >> > >
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >>
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > Regards,
>>>>>>>>> >
>>>>>>>>> > Reid Wahl, RHCA
>>>>>>>>> > Senior Software Maintenance Engineer,
> Red Hat
>>>>>>>>> > CEE - Platform Support Delivery -
> ClusterHA
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>> --
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Reid Wahl, RHCA
>>>>>>
>>>>>> Senior Software Maintenance Engineer, Red Hat
>>>>>> CEE - Platform Support Delivery - ClusterHA
>>>>>>
>>>>>>
>>>>>
>
>