[ClusterLabs] resource on remote node not failing over

Sun Mar 15 18:10:06 EDT 2015

On Friday, March 13, 2015 11:21 PM GMT, David Vossel <dvossel at redhat.com> wrote: 

> 
> 
> ----- Original Message -----
> > Hi
> > 
> > I have a two-node cluster (CentOS 6.6/Pacemaker 1.1.12/CMAN/Corosync 1.4.7).
> > 
> > It is an asymmetric cluster (symmetric-cluster: false)
> > 
> > There are two VMs (running CentOS) in the cluster acting as remote
> > nodes (lx16mx & lx17mx). These are on DRBD backing storage.
> > Other VMs (running other services) will be added in due course.
> > 
> > The BIND named service is installed and configured on both remote nodes,
> > starts up on both nodes, and is managed by an RA (ocf:heartbeat:named).
> > I have location constraints to allow the resource to run on these nodes
> > and I can move this resource between these nodes.
> >
> > However, if I disable the named service on the live node (by renaming
> > the main config file and shutting down the service), the resource does not
> > fail over to the other remote node.
> 
> Is Pacemaker detecting this as a failure? I would expect Pacemaker to
> attempt to recover the DNS server on the node it failed on first. If the
> DNS server can't be started on that node, I'd expect it to fail over somewhere
> else.
> 

Pacemaker was definitely detecting it as a failure: the status of the resource was FAILED.

I have since run exactly the same test on a pair of nodes running CentOS 7 / Pacemaker 1.1.10 (VMs configured exactly as in the first setup, with the same constraints, etc.) and saw the behaviour I was expecting, i.e. the resource failed over to the other remote node.
STONITH was disabled in both cases.
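
For reference, the recovery order David describes (try to recover on the failed node first, then move elsewhere if the start fails) can be modelled roughly as below. This is only a simplified sketch, not Pacemaker's actual placement code: the class and function names are hypothetical, and it assumes the documented defaults of migration-threshold=INFINITY and start-failure-is-fatal=true while ignoring stickiness and constraint scores.

```python
from dataclasses import dataclass, field

# Pacemaker treats scores at or above 1000000 as infinite.
INFINITY = 1_000_000


@dataclass
class ResourceState:
    """Hypothetical, simplified view of one resource's recovery state."""
    allowed_nodes: list                    # nodes permitted by location constraints
    current: str                           # node the resource last ran on
    migration_threshold: int = INFINITY    # assumed default
    start_failure_is_fatal: bool = True    # assumed default
    failcount: dict = field(default_factory=dict)


def record_failure(state, node, during_start=False):
    """Bump the fail count; a fatal start failure jumps straight to INFINITY."""
    if during_start and state.start_failure_is_fatal:
        state.failcount[node] = INFINITY
    else:
        state.failcount[node] = min(INFINITY, state.failcount.get(node, 0) + 1)


def choose_node(state):
    """Return the node to recover on, or None if every allowed node is banned."""
    def banned(node):
        return state.failcount.get(node, 0) >= state.migration_threshold

    candidates = [n for n in state.allowed_nodes if not banned(n)]
    if not candidates:
        return None
    # Prefer recovering in place; otherwise take the first remaining node.
    return state.current if state.current in candidates else candidates[0]


if __name__ == "__main__":
    named = ResourceState(allowed_nodes=["lx16mx", "lx17mx"], current="lx16mx")
    record_failure(named, "lx16mx")                     # monitor detects the failure
    print(choose_node(named))                           # restart attempted in place
    record_failure(named, "lx16mx", during_start=True)  # restart fails
    print(choose_node(named))                           # resource moves elsewhere
```

In this model a monitor failure alone leaves the resource on lx16mx, and only the failed start bans that node so that lx17mx is chosen. On a real cluster the per-node fail counts can be inspected with "crm_mon -1 -f".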