[ClusterLabs] Failure to configure iface-bridge resource causes cluster node fence action.

Scott Greenlese swgreenl at us.ibm.com
Thu Feb 2 15:14:49 EST 2017


Hi folks,

I'm testing iface-bridge resource support on a Linux KVM on System Z
pacemaker cluster.

pacemaker-1.1.13-10.el7_2.ibm.1.s390x
corosync-2.3.4-7.el7_2.ibm.1.s390x

I created an iface-bridge resource, but specified a bridge_slaves value,
vlan1292, that names an interface which doesn't exist.

[root@zs95kj VD]# date;pcs resource create br0_r1 ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op monitor timeout="20s" interval="10s" --disabled
Wed Feb  1 17:49:16 EST 2017
[root@zs95kj VD]#

[root@zs95kj VD]# pcs resource show |grep br0
 br0_r1 (ocf::heartbeat:iface-bridge):  FAILED zs93kjpcs1
[root@zs95kj VD]#

As you can see, the resource was created, but failed to start on the target
node zs93kjpcs1.

To my surprise, the target node zs93kjpcs1 was unceremoniously fenced.

pacemaker.log shows a fence (off) action initiated against that target
node, "because of resource failure(s)" :

Feb 01 17:55:56 [52941] zs95kj crm_resource: (    unpack.c:2719  )   debug:
determine_op_status:        br0_r1_stop_0 on zs93kjpcs1 returned 'not
configured' (6) instead of the expected value: 'ok' (0)
Feb 01 17:55:56 [52941] zs95kj crm_resource: (    unpack.c:2602  ) warning:
unpack_rsc_op_failure:      Processing failed op stop for br0_r1 on
zs93kjpcs1: not configured (6)
Feb 01 17:55:56 [52941] zs95kj crm_resource: (    unpack.c:3244  )   error:
unpack_rsc_op:      Preventing br0_r1 from re-starting anywhere: operation
stop failed 'not configured' (6)
Feb 01 17:55:56 [52941] zs95kj crm_resource: (    unpack.c:2719  )   debug:
determine_op_status:        br0_r1_stop_0 on zs93kjpcs1 returned 'not
configured' (6) instead of the expected value: 'ok' (0)
Feb 01 17:55:56 [52941] zs95kj crm_resource: (    unpack.c:2602  ) warning:
unpack_rsc_op_failure:      Processing failed op stop for br0_r1 on
zs93kjpcs1: not configured (6)
Feb 01 17:55:56 [52941] zs95kj crm_resource: (    unpack.c:3244  )   error:
unpack_rsc_op:      Preventing br0_r1 from re-starting anywhere: operation
stop failed 'not configured' (6)
Feb 01 17:55:56 [52941] zs95kj crm_resource: (    unpack.c:96    ) warning:
pe_fence_node:      Node zs93kjpcs1 will be fenced because of resource
failure(s)


Thankfully, I was able to successfully create an iface-bridge resource when
I changed the bridge_slaves value to an existing vlan interface.
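
For what it's worth, a pre-check along these lines, run on each node before
creating the resource, would catch a missing slave interface up front. This
is only a sketch: the helper name iface_exists is hypothetical, and it
assumes Linux sysfs exposes interfaces under /sys/class/net.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named network interface exists.
# Assumes Linux sysfs; the optional second argument overrides the
# directory that is searched (default /sys/class/net).
iface_exists() {
    [ -e "${2:-/sys/class/net}/$1" ]
}

# Check every interface intended for bridge_slaves before using it.
for slave in vlan1292; do
    if iface_exists "$slave"; then
        echo "$slave: present"
    else
        echo "$slave: missing, do not use it in bridge_slaves" >&2
    fi
done
```

This only verifies that the interface is present on the node where it runs;
it does not change how the cluster reacts to a failed operation.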

My main concern is, why would the response to a failed bridge config
operation warrant a node fence (off) action?  Isn't it enough to just fail
the resource and try another cluster node,
or at most, give up if it can't be started / configured on any node?

Is there any way to control this harsh recovery action in the cluster?

Thanks much..


Scott Greenlese ... IBM KVM on System Z Solutions Test,  Poughkeepsie, N.Y.
  INTERNET:  swgreenl at us.ibm.com
