[ClusterLabs] Antw: Fencing one node kill others

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Jan 2 03:30:38 EST 2017


Hi!

Seeing the detailed log of events would be helpful. Apart from that, we had a similar issue when using multicast (and after adding a new node to an existing cluster). Switching to UDPU (unicast UDP) helped in our case, but unless we see the details, it's all just guessing...
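As a rough illustration of what switching to UDPU involves, here is a minimal sketch that renders a corosync 2.x `totem`/`nodelist` stanza with `transport: udpu`. With unicast UDP there is no multicast group to discover members through, so every node has to be listed explicitly. The helper `render_udpu_config` is hypothetical. The node names come from the quoted configuration and are assumed to resolve to each node's ring0 address. `nodeid` is omitted, which corosync 2.x allows for IPv4 rings. Other totem settings (crypto, interface, timeouts) are left out.

```python
def render_udpu_config(cluster_name, nodes):
    """Return a corosync 2.x totem/nodelist stanza that uses unicast UDP.

    Hypothetical helper for illustration only. `nodes` is a list of
    ring0 addresses (host names or IPs). nodeid is not written, so
    corosync derives it from the IPv4 address.
    """
    lines = [
        "totem {",
        "    version: 2",
        f"    cluster_name: {cluster_name}",
        "    # udpu: send unicast UDP to each listed member instead of multicast",
        "    transport: udpu",
        "}",
        "",
        "nodelist {",
    ]
    # With udpu, every member must appear in the nodelist.
    for addr in nodes:
        lines += [
            "    node {",
            f"        ring0_addr: {addr}",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Node names taken from the quoted CIB; assumed to resolve to ring0 IPs.
    print(render_udpu_config("test-cluster", ["e1b03", "e1b07", "e1b12", "e1b13"]))
```

The generated text would replace the multicast settings in /etc/corosync/corosync.conf on every node, followed by a full cluster restart, since all members must agree on the transport.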

Ulrich
P.S. Happy New Year to everyone!

>>> Alfonso Ali <alfonso.ali at gmail.com> wrote on 30.12.2016 at 21:40 in message
<CANeoTMcuNGw_T9e4WNEEK-nmHnV-NwiX2Ck0UBDnVeuoiC=r8A at mail.gmail.com>:
> Hi,
> 
> I have a four-node cluster that uses iLO as the fencing agent. When I simulate
> a node crash (either by killing corosync or with echo c > /proc/sysrq-trigger), the
> node is marked as UNCLEAN and a reboot is requested through the stonith
> agent, but every time that happens another node in the cluster is also
> marked as UNCLEAN and rebooted. After the nodes are rebooted, they
> are marked as online again and the cluster resumes operation without problems.
> 
> I have reviewed the corosync and pacemaker logs but found nothing that explains
> why the other node is also rebooted.
> 
> Any hint of what to check or what to look for would be appreciated.
> 
> -----------------Cluster conf----------------------------------
> node 1239211542: e1b12 \
>     attributes standby=off
> node 1239211543: e1b13
> node 1239211581: e1b03 \
>     attributes standby=off
> node 1239211582: e1b07 \
>     attributes standby=off
> primitive fence-e1b03 stonith:fence_ilo \
>     params ipaddr=e1b03-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
>     op monitor interval=300 timeout=120 \
>     meta migration-threshold=2 target-role=Started
> primitive fence-e1b07 stonith:fence_ilo \
>     params ipaddr=e1b07-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
>     op monitor interval=300 timeout=120 \
>     meta migration-threshold=2 target-role=Started
> primitive fence-e1b12 stonith:fence_ilo \
>     params ipaddr=e1b12-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
>     op monitor interval=300 timeout=120 \
>     meta migration-threshold=2 target-role=Started
> primitive fence-e1b13 stonith:fence_ilo \
>     params ipaddr=e1b13-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
>     op monitor interval=300 timeout=120 \
>     meta migration-threshold=2 target-role=Started
> ..... extra resources ......
> location l-f-e1b03 fence-e1b03 \
>     rule -inf: #uname eq e1b03 \
>     rule 10000: #uname eq e1b07
> location l-f-e1b07 fence-e1b07 \
>     rule -inf: #uname eq e1b07 \
>     rule 10000: #uname eq e1b03
> location l-f-e1b12 fence-e1b12 \
>     rule -inf: #uname eq e1b12 \
>     rule 10000: #uname eq e1b13
> location l-f-e1b13 fence-e1b13 \
>     rule -inf: #uname eq e1b13 \
>     rule 10000: #uname eq e1b12
> property cib-bootstrap-options: \
>     have-watchdog=false \
>     dc-version=1.1.15-e174ec8 \
>     cluster-infrastructure=corosync \
>     stonith-enabled=true \
>     cluster-name=test-cluster \
>     no-quorum-policy=freeze \
>     last-lrm-refresh=1483125286
> ----------------------------------------------------------------
> 
> Regards,
>   Ali
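To make the intent of the location constraints in the quoted configuration explicit, here is a toy model of how they score nodes for each fence device: -INFINITY on the node the device fences, 10000 on its partner, and 0 everywhere else. It is a simplification under stated assumptions: `node_scores` and `preferred_node` are hypothetical helpers, -INFINITY is modelled with Python's `-math.inf`, and stickiness, other constraints and Pacemaker's real tie-breaking are ignored (ties here simply go to the first listed node).

```python
import math

NEG_INF = -math.inf  # stands in for Pacemaker's -INFINITY score

# Rules transcribed from the location constraints above: device -> [(score, node)].
CONSTRAINTS = {
    "fence-e1b03": [(NEG_INF, "e1b03"), (10000, "e1b07")],
    "fence-e1b07": [(NEG_INF, "e1b07"), (10000, "e1b03")],
    "fence-e1b12": [(NEG_INF, "e1b12"), (10000, "e1b13")],
    "fence-e1b13": [(NEG_INF, "e1b13"), (10000, "e1b12")],
}
NODES = ["e1b03", "e1b07", "e1b12", "e1b13"]


def node_scores(rules, online):
    """Sum the rule scores per online node (hypothetical, simplified model)."""
    scores = {n: 0 for n in online}
    for score, node in rules:
        if node in scores:  # rules naming offline nodes have no effect
            scores[node] += score
    return scores


def preferred_node(rules, online):
    """Return the highest-scoring allowed node, or None if all are banned."""
    allowed = {n: s for n, s in node_scores(rules, online).items() if s != NEG_INF}
    if not allowed:
        return None
    # max() keeps the first maximal entry, i.e. ties go to the earliest node.
    return max(allowed, key=allowed.get)


if __name__ == "__main__":
    for device, rules in CONSTRAINTS.items():
        print(f"{device} -> {preferred_node(rules, NODES)}")
```

In this model each device prefers the partner of the node it fences (fence-e1b03 on e1b07, fence-e1b07 on e1b03, and likewise for the e1b12/e1b13 pair), and when the partner is offline it falls back to any remaining node except the one it is meant to fence.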