[ClusterLabs] Failure to configure iface-bridge resource causes cluster node fence action.

Ken Gaillot kgaillot at redhat.com
Mon Feb 6 10:51:24 EST 2017


On 02/06/2017 09:00 AM, Scott Greenlese wrote:
> Further explanation for my concern about --disabled not taking effect
> until after the iface-bridge was configured ...
> 
> The reason I wanted to create the iface-bridge resource "disabled", was
> to allow me the opportunity to impose
> a location constraint / rule on the resource to prevent it from being
> started on certain cluster nodes,
> where the specified slave vlan did not exist.
> 
> In my case, pacemaker assigned the resource to a cluster node where the
> specified slave vlan did not exist, which in turn
> triggered a fence (off) action against that node (apparently, because
> the device could not be stopped, per Ken's reply earlier).
> 
> Again, my cluster is configured as "symmetric", so I would have to "opt
> out" my new resource from
> certain cluster nodes via location constraint.
> 
> So, if this really is how --disabled is designed to work, is there any
> way to impose a location constraint rule BEFORE
> the iface-bridge resource gets assigned, configured and started on a
> cluster node in a symmetrical cluster?

I would expect --disabled to already keep the resource from being
assigned or started anywhere; I'm not sure what's happening there.

But, you can add a resource and any constraints that apply to it
simultaneously. How to do this depends on whether you want to do it
interactively or scripted, and whether you prefer the low-level tools,
crm shell, or pcs.

If you want to script it via pcs, you can save the current CIB to a file
with pcs cluster cib $SOME_FILE, make your changes against that file with
pcs -f $SOME_FILE <whatever commands you want>, then load them all at
once with pcs cluster cib-push $SOME_FILE --config.
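
As a minimal sketch of that sequence, assuming the resource and vlan
names from your earlier command and zs93kjpcs1 as the node that lacks
the vlan (the scratch file path, the DRY_RUN switch and the run helper
are my own illustration, not part of pcs), something like this stages
the resource and an opt-out constraint together. It only prints the
commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch: stage a resource plus its location constraint in a CIB file,
# then push both to the cluster in one step.
# DRY_RUN=1 (the default here) only prints the pcs commands.
DRY_RUN="${DRY_RUN:-1}"
CIB_FILE="${CIB_FILE:-/tmp/br0_r1.cib.xml}"   # hypothetical scratch file
AVOID_NODE="${AVOID_NODE:-zs93kjpcs1}"        # assumed node without the slave vlan

# Illustrative helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stage_bridge() {
    # 1. Save the live CIB to a file.
    run pcs cluster cib "$CIB_FILE" &&
    # 2. Create the resource in the file only, still disabled.
    run pcs -f "$CIB_FILE" resource create br0_r1 ocf:heartbeat:iface-bridge \
        bridge_name=br0 bridge_slaves=vlan1292 \
        op monitor timeout=20s interval=10s --disabled &&
    # 3. Opt the resource out of the node that lacks the vlan.
    run pcs -f "$CIB_FILE" constraint location br0_r1 avoids "$AVOID_NODE" &&
    # 4. Push resource and constraint to the cluster together.
    run pcs cluster cib-push "$CIB_FILE" --config
}

stage_bridge
```

Once the push succeeds and the constraint is in place, pcs resource
enable br0_r1 should let the cluster start it on one of the remaining
nodes.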

> 
> Thanks,
> 
> Scott Greenlese ... IBM KVM on System Z - Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgreenl at us.ibm.com
> 
> 
> 
> 
> From: Scott Greenlese/Poughkeepsie/IBM at IBMUS
> To: kgaillot at redhat.com, Cluster Labs - All topics related to
> open-source clustering welcomed <users at clusterlabs.org>
> Date: 02/03/2017 03:23 PM
> Subject: Re: [ClusterLabs] Failure to configure iface-bridge resource
> causes cluster node fence action.
> 
> ------------------------------------------------------------------------
> 
> 
> 
> Ken,
> 
> Thanks for the explanation.
> 
> One other thing, relating to the iface-bridge resource creation. I
> specified --disabled flag:
> 
>> [root@zs95kj VD]# date;pcs resource create br0_r1
>> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
>> monitor timeout="20s" interval="10s" --disabled
> 
> Does the bridge device have to be successfully configured by pacemaker
> before disabling the resource? It seems
> that that was the behavior, since it failed the resource and fenced the
> node instead of disabling the resource.
> Just checking with you to be sure.
> 
> Thanks again..
> 
> Scott Greenlese ... IBM KVM on System Z Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgreenl at us.ibm.com
> 
> 
> 
> 
> From: Ken Gaillot <kgaillot at redhat.com>
> To: users at clusterlabs.org
> Date: 02/02/2017 03:29 PM
> Subject: Re: [ClusterLabs] Failure to configure iface-bridge resource
> causes cluster node fence action.
> ------------------------------------------------------------------------
> 
> 
> 
> On 02/02/2017 02:14 PM, Scott Greenlese wrote:
>> Hi folks,
>>
>> I'm testing iface-bridge resource support on a Linux KVM on System Z
>> pacemaker cluster.
>>
>> pacemaker-1.1.13-10.el7_2.ibm.1.s390x
>> corosync-2.3.4-7.el7_2.ibm.1.s390x
>>
>> I created an iface-bridge resource, but specified a non-existent
>> bridge_slaves value, vlan1292 (i.e. vlan1292 doesn't exist).
>>
>> [root@zs95kj VD]# date;pcs resource create br0_r1
>> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
>> monitor timeout="20s" interval="10s" --disabled
>> Wed Feb 1 17:49:16 EST 2017
>> [root@zs95kj VD]#
>>
>> [root@zs95kj VD]# pcs resource show |grep br0
>> br0_r1 (ocf::heartbeat:iface-bridge): FAILED zs93kjpcs1
>> [root@zs95kj VD]#
>>
>> As you can see, the resource was created, but failed to start on the
>> target node zs93kjpcs1.
>>
>> To my surprise, the target node zs93kjpcs1 was unceremoniously fenced.
>>
>> pacemaker.log shows a fence (off) action initiated against that target
>> node, "because of resource failure(s)" :
>>
>> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2719 ) debug:
>> determine_op_status: br0_r1_stop_0 on zs93kjpcs1 returned 'not
>> configured' (6) instead of the expected value: 'ok' (0)
>> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2602 ) warning:
>> unpack_rsc_op_failure: Processing failed op stop for br0_r1 on
>> zs93kjpcs1: not configured (6)
>> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:3244 ) error:
>> unpack_rsc_op: Preventing br0_r1 from re-starting anywhere: operation
>> stop failed 'not configured' (6)
>> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2719 ) debug:
>> determine_op_status: br0_r1_stop_0 on zs93kjpcs1 returned 'not
>> configured' (6) instead of the expected value: 'ok' (0)
>> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2602 ) warning:
>> unpack_rsc_op_failure: Processing failed op stop for br0_r1 on
>> zs93kjpcs1: not configured (6)
>> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:3244 ) error:
>> unpack_rsc_op: Preventing br0_r1 from re-starting anywhere: operation
>> stop failed 'not configured' (6)
>> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:96 ) warning:
>> pe_fence_node: Node zs93kjpcs1 will be fenced because of resource
>> failure(s)
>>
>>
>> Thankfully, I was able to successfully create an iface-bridge resource
>> when I changed the bridge_slaves value to an existing vlan interface.
>>
>> My main concern is, why would the response to a failed bridge config
>> operation warrant a node fence (off) action? Isn't it enough to just
>> fail the resource and try another cluster node,
>> or at most, give up if it can't be started / configured on any node?
>>
>> Is there any way to control this harsh recovery action in the cluster?
>>
>> Thanks much..
>>
>>
>> Scott Greenlese ... IBM KVM on System Z Solutions Test, Poughkeepsie, N.Y.
>> INTERNET: swgreenl at us.ibm.com
> 
> It's actually the stop operation failure that leads to the fence.
> 
> If a resource fails to stop, fencing is the only way pacemaker can
> recover the resource elsewhere. Consider a database master -- if it
> doesn't stop, starting the master elsewhere could lead to severe data
> inconsistency.
> 
> You can tell pacemaker to not attempt recovery, by setting on-fail=block
> on the stop operation, so it doesn't need to fence. Obviously, that
> prevents high availability, as manual intervention is required to do
> anything further with the service.
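
For reference, a minimal sketch of setting on-fail=block on the stop
operation with pcs, assuming the br0_r1 resource from this thread (the
DRY_RUN switch and the run helper are illustrative only, not part of
pcs). It prints the command unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch: make a failed stop block the resource instead of fencing the node.
DRY_RUN="${DRY_RUN:-1}"
RESOURCE="${RESOURCE:-br0_r1}"   # resource name taken from this thread

# Illustrative helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

block_on_stop_failure() {
    # Stop is a non-recurring operation, hence interval=0s.
    run pcs resource update "$RESOURCE" op stop interval=0s on-fail=block
}

block_on_stop_failure
```

With this set, a failed stop leaves the resource blocked on that node
until someone investigates and clears the failure, for example with
pcs resource cleanup br0_r1.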



