[ClusterLabs] STONITH forever?

Ken Gaillot kgaillot at redhat.com
Tue Apr 10 14:52:17 UTC 2018


On Tue, 2018-04-10 at 07:26 +0000, Stefan Schlösser wrote:
> Hi,
>  
> I have a 3 node setup on ubuntu 16.04. Corosync/Pacemaker services
> are not started automatically.
>  
> If I put all 3 nodes into offline mode, with 1 node in an "unclean"
> state, I get a never-ending STONITH.
>  
> What happens is that the STONITH causes a reboot of the unclean node.
>  
> 1) I would have thought with all nodes in standby no STONITH can
> occur. Why does it?

Standby prevents a node from running resources, but it still
participates in quorum voting. I suspect *starting* a node in standby
mode would prevent it from using fence devices, but *changing* a node
to standby will have no effect on whether it can fence.

> 2) Why does it keep on killing the unclean node?

Good question. The DC's logs will have the most useful information --
each pengine run should say why fencing is being scheduled.
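
As a rough sketch of how to pull those messages out of the detail log:
the Python below parses lines laid out like the ones quoted in this
thread and keeps entries from one daemon whose message contains a
keyword. The keyword list and the function name are my own assumptions,
not anything Pacemaker defines, and the log file location varies by
distribution, so adjust both. Judging from the quoted extract, the fencing
request seems to come from crmd on xxx-b, which suggests xxx-b was the DC
at that time, so that is probably the node whose log to search.

```python
import re
import sys

# Layout of the detail-log lines quoted in this thread, e.g.
#   "Apr 10 09:08:30 [2276] xxx-c stonith-ng:   notice: log_operation:  <message>"
LINE_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d+ \d\d:\d\d:\d\d) "
    r"\[(?P<pid>\d+)\] (?P<host>\S+)\s+(?P<proc>[\w-]+):\s+"
    r"(?P<level>\w+):\s+(?P<func>\w+):\s*(?P<msg>.*)$"
)

# Assumption: substrings worth searching for; tune to what your logs contain.
KEYWORDS = ("fenc", "stonith", "unclean")


def matching_entries(lines, proc="pengine", keywords=KEYWORDS):
    """Yield (time, host, message) for entries logged by `proc` whose
    message contains any of `keywords` (case-insensitive)."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("proc") != proc:
            continue  # unparseable line, or a different daemon
        msg = m.group("msg")
        if any(k in msg.lower() for k in keywords):
            yield m.group("time"), m.group("host"), msg


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 fence_reasons.py <path to the DC's detail log>
    with open(sys.argv[1], errors="replace") as log:
        for time, host, msg in matching_entries(log):
            print(time, host, msg)
```

Passing proc="stonith-ng" instead would pick up lines such as the
log_operation entry in the extract below, which helps to line up each
reboot with the pengine run that scheduled it.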

>  
> The only way to stop it is to temporarily disable stonith, bring
> the unclean node back online manually, and then enable it again.
>  
> Here is a log extract of node c killing node a:
> Apr 10 09:08:30 [2276] xxx-c stonith-ng:   notice: log_operation:  
> Operation 'reboot' [2428] (call 5 from crmd.2175) for host 'xxx-a'
> with device 'stonith_a' returned: 0 (OK)
> Apr 10 09:08:30 [2276] xxx-c stonith-ng:   notice: remote_op_done: 
> Operation reboot of xxx-a by xxx-c for crmd.2175@xxx-b.20531831: OK
> Apr 10 09:08:30 [2275] xxx-c        cib:     info:
> cib_process_request:     Completed cib_modify operation for section
> status: OK (rc=0, origin=xxx-b/crmd/83, version=0.164.37)
> Apr 10 09:08:30 [2275] xxx-c        cib:     info:
> cib_process_request:     Completed cib_delete operation for section
> //node_state[@uname='xxx-a']/lrm: OK (rc=0, origin=xxx-b/crmd/84,
> version=0.164.37)
> Apr 10 09:08:30 [2275] xxx-c        cib:     info:
> cib_process_request:     Completed cib_delete operation for section
> //node_state[@uname='xxx-a']/transient_attributes: OK (rc=0,
> origin=xxx-b/crmd/85, version=0.164.37)
> Apr 10 09:08:30 [2275] xxx-c        cib:     info:
> cib_process_request:     Completed cib_modify operation for section
> status: OK (rc=0, origin=xxx-b/crmd/86, version=0.164.37)
> Apr 10 09:08:30 [2275] xxx-c        cib:     info:
> cib_process_request:     Completed cib_delete operation for section
> //node_state[@uname='xxx-a']/lrm: OK (rc=0, origin=xxx-b/crmd/87,
> version=0.164.37)
> Apr 10 09:08:30 [2275] xxx-c        cib:     info:
> cib_process_request:     Completed cib_delete operation for section
> //node_state[@uname='xxx-a']/transient_attributes: OK (rc=0,
> origin=xxx-b/crmd/88, version=0.164.37)
>  
> This then repeats forevermore ...
>  
> Thanks for any hints,
>  
> cheers,
>  
> Stefan
-- 
Ken Gaillot <kgaillot at redhat.com>

