[ClusterLabs] Antw: Re: Antw: [EXT] Failed fencing monitor process (fence_vmware_soap) RHEL 8

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Fri Jun 19 02:27:47 EDT 2020


>>> Howard <hmoneta at gmail.com> wrote on 19.06.2020 at 00:13 in message
<CAO51vj7RniJZ60kAkmcgqZTouBjBwgkaKDqDnqSvh32+JDbd7Q at mail.gmail.com>:
> Thanks for all the help so far.  With your assistance, I'm very close to
> stable.
> 
> Made the following changes to the vmfence stonith resource:
> 
> Meta Attrs: failure-timeout=30m migration-threshold=10
>   Operations: monitor interval=60s (vmfence-monitor-interval-60s)
> 
> If I understand this correctly, it will check if the fencing device is
> online every 60 seconds. It will try 10 times and then mark the node
> ineligible.  After 30 minutes it will start trying again.

Did you add "meta failure-timeout=30m" to the stonith resource?

Maybe you could also set the stonith timeout to a higher value, the
migration-threshold to a lower value (like 3), and the failure-timeout to a
higher value (like several hours or days).

(The idea is that if you have, say, one failure every second day, you don't
want the resource to be disabled after a week or two because the failure count
has accumulated.)
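
To make the accumulation concrete, here is a minimal sketch in Python. It is
not Pacemaker code, only a simplified model under these assumptions: the fail
count is forgotten once the previous failure is older than failure-timeout,
and the resource is banned from the node as soon as the count reaches
migration-threshold. The function name and arguments are hypothetical, and the
real cluster only expires failures when it next re-evaluates its state.

```python
from datetime import timedelta


def banned_at(failure_gaps, migration_threshold, failure_timeout=None):
    """Return the 1-based index of the failure that triggers a ban, or None.

    failure_gaps: time elapsed before each failure since the previous one.
    failure_timeout: None models a resource with no failure-timeout set.
    """
    fail_count = 0
    for index, gap in enumerate(failure_gaps, start=1):
        # Assumed expiry rule: old failures are forgotten if no new failure
        # arrived within failure_timeout.
        if failure_timeout is not None and fail_count > 0 and gap > failure_timeout:
            fail_count = 0
        fail_count += 1
        if fail_count >= migration_threshold:
            return index
    return None


# One failure every second day for 30 days.
sporadic = [timedelta(days=2)] * 15
# Three failures one minute apart.
burst = [timedelta(minutes=1)] * 3

no_timeout = banned_at(sporadic, migration_threshold=10)
with_timeout = banned_at(sporadic, migration_threshold=3,
                         failure_timeout=timedelta(days=1))
real_outage = banned_at(burst, migration_threshold=3,
                        failure_timeout=timedelta(days=1))
print(no_timeout, with_timeout, real_outage)  # prints: 10 None 3
```

In this model, sporadic failures without a failure-timeout ban the resource
at the tenth failure (after about 20 days), while a low threshold combined
with a failure-timeout ignores them and still reacts to a burst of failures.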

Of course while testing you may use lower values for the impatient ;-)

Regards,
Ulrich

> 
> On Thu, Jun 18, 2020 at 12:29 PM Ken Gaillot <kgaillot at redhat.com> wrote:
> 
>> On Thu, 2020-06-18 at 21:32 +0300, Andrei Borzenkov wrote:
>> > 18.06.2020 18:24, Ken Gaillot wrote:
>> > > Note that a failed start of a stonith device will not prevent the
>> > > cluster from using that device for fencing. It just prevents the
>> > > cluster from monitoring the device.
>> > >
>> >
>> > My understanding is that if a stonith resource cannot run anywhere, it
>> > also won't be used for stonith. When the failcount exceeds the threshold,
>> > the resource is banned from the node. If that happens on all nodes, the
>> > resource cannot run anywhere and so won't be used for stonith. A start
>> > failure automatically sets the failcount to INFINITY.
>> >
>> > Or do I misunderstand something?
>>
>> I had to test to confirm, but a stonith resource stopped due to
>> failures can indeed be used. Only stonith resources stopped via
>> location constraints (bans) or target-role=Stopped are prevented from
>> being used.
>> --
>> Ken Gaillot <kgaillot at redhat.com>
>>




