[ClusterLabs] Antw: [EXT] Failed fencing monitor process (fence_vmware_soap) RHEL 8

Andrei Borzenkov arvidjaar at gmail.com
Sat Jun 20 01:47:51 EDT 2020


On 19.06.2020 01:13, Howard wrote:
> Thanks for all the help so far.  With your assistance, I'm very close to
> stable.
> 
> Made the following changes to the vmfence stonith resource:
> 
> Meta Attrs: failure-timeout=30m migration-threshold=10
>   Operations: monitor interval=60s (vmfence-monitor-interval-60s)
> 
> If I understand this correctly, it will check if the fencing device is
> online every 60 seconds. It will try 10 times and then mark the node
> ineligible.

No. That's the main problem - a stonith resource failure on a node does
not affect whether this node can be selected to perform stonith. The node
becomes ineligible for the *monitoring* operation, that's all.

The resource could be marked as failed on all nodes and fencing would
still be attempted.

That is very counter-intuitive; OTOH it allows fencing to work even in
case of transient issues.

I wonder if pacemaker will cycle through available nodes though.
Consider a three-node cluster: nodeA, nodeB, nodeC. nodeA is lost; nodeB
is selected to perform stonith but cannot, for whatever reason. Will
pacemaker retry on nodeC? Under which conditions (number of retries on
nodeB, whatever)? If nodeC fails too, will pacemaker restart the cycle
from the beginning?

Also, does a stonith resource failure on a node affect selecting this
node to perform stonith? Is there any sort of priority list? If yes, how
is it ordered?

>  After 30 minutes it will start trying again.
>

... resume monitoring. Nothing more.
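
To make the distinction concrete, here is a toy model of the behaviour as
described in this reply. It is only a sketch, not Pacemaker code: the class
and method names are made up for illustration, the real failure expiry is
evaluated by the cluster on its own schedule, and can_fence() simply
encodes the claim above that monitor failures do not remove a node from
fencing.

```python
from dataclasses import dataclass, field


@dataclass
class StonithResourceModel:
    """Hypothetical model of the semantics described above; not Pacemaker code."""

    migration_threshold: int = 10     # failures before the node is banned for monitoring
    failure_timeout: float = 30 * 60  # seconds since the last failure until failures expire
    fail_count: dict = field(default_factory=dict)    # node -> monitor failures
    last_failure: dict = field(default_factory=dict)  # node -> time of last failure

    def record_monitor_failure(self, node, now):
        self.fail_count[node] = self.fail_count.get(node, 0) + 1
        self.last_failure[node] = now

    def can_monitor(self, node, now):
        # Failures are forgotten once failure_timeout has passed since the last one.
        last = self.last_failure.get(node)
        if last is not None and now - last >= self.failure_timeout:
            self.fail_count[node] = 0
        return self.fail_count.get(node, 0) < self.migration_threshold

    def can_fence(self, node, now):
        # Assumption taken from this reply: monitor failures never affect
        # whether the node may be asked to perform fencing.
        return True


model = StonithResourceModel()
for minute in range(10):  # ten failed monitors, one per 60s interval
    model.record_monitor_failure("nodeB", now=minute * 60.0)

banned_for_monitor = not model.can_monitor("nodeB", now=600.0)
still_fences = model.can_fence("nodeB", now=600.0)
monitor_resumed = model.can_monitor("nodeB", now=540.0 + 30 * 60)
```

With the values from the quoted configuration, nodeB stops being a
candidate for the monitor after the tenth failure, is eligible again 30
minutes after the last failure, and in this model remains usable for
fencing the whole time.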

