[ClusterLabs] FLoating IP failing over but not failing back with active/active LDAP (dirsrv)

Thu Mar 10 10:38:16 EST 2016

Hi Ken,
Thanks for your response. I've now corrected the constraint order, but the
behaviour is still the same: the IP does not fail over (after the first
time) unless I issue a "pcs resource cleanup" command on dirsrv-daemon.

Also, I'm not sure why you advise against using is-managed=false in
production. We are trying to use pacemaker purely to fail over the IP on
detection of a failure, not to control starting or stopping of the
instances. It is essential that in normal operation we have both instances
up, as we are using MMR (multi-master replication).

Thanks,
Bernie

-----Original Message-----
From: Ken Gaillot [mailto:kgaillot at redhat.com] 
Sent: 10 March 2016 15:01
To: users at clusterlabs.org
Subject: Re: [ClusterLabs] FLoating IP failing over but not failing back
with active/active LDAP (dirsrv)

On 03/10/2016 08:48 AM, Bernie Jones wrote:
> A bit more info..
> 
>  
> 
> If, after I restart the failed dirsrv instance, I then perform a "pcs
> resource cleanup dirsrv-daemon" to clear the FAIL messages, then the
> failover will work OK.
> 
> So it's as if the cleanup is changing the status in some way..
> 
>  
> 
> From: Bernie Jones [mailto:bernie at securityconsulting.ltd.uk] 
> Sent: 10 March 2016 08:47
> To: 'Cluster Labs - All topics related to open-source clustering welcomed'
> Subject: [ClusterLabs] FLoating IP failing over but not failing back with
> active/active LDAP (dirsrv)
> 
>  
> 
> Hi all, could you advise please?
> 
>  
> 
> I'm trying to configure a floating IP with an active/active deployment of
> 389 directory server. I don't want pacemaker to manage LDAP but just to
> monitor and switch the IP as required to provide resilience. I've seen some
> other similar threads and based my solution on those.
> 
>  
> 
> I've amended the OCF resource agent for slapd to work with 389 DS (as
> dirsrv), and this tests out OK.
> 
>  
> 
> I've then created my resources as below:
> 
>  
> 
> pcs resource create dirsrv-ip ocf:heartbeat:IPaddr2 ip="192.168.26.100"
> cidr_netmask="32" op monitor timeout="20s" interval="5s" op start
> interval="0" timeout="20" op stop interval="0" timeout="20"
> 
> pcs resource create dirsrv-daemon ocf:heartbeat:dirsrv op monitor
> interval="10" timeout="5" op start interval="0" timeout="5" op stop
> interval="0" timeout="5" meta "is-managed=false"

is-managed=false means the cluster will not try to start or stop the
service. It should never be used in regular production, only when doing
maintenance on the service.
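
As a rough sketch of that maintenance-only use, assuming the resource name
dirsrv-daemon from the commands above: the helper below (with_maintenance is
a hypothetical name, not part of pcs) unmanages the resource, runs whatever
maintenance command is passed to it, and hands the resource back to the
cluster afterwards. Setting PCS=echo prints the pcs commands instead of
running them.

```shell
#!/bin/sh
# Sketch only. PCS may be overridden (e.g. PCS=echo) to preview the commands.
with_maintenance() {
    resource="$1"; shift
    # Tell the cluster to stop starting/stopping the resource.
    "${PCS:-pcs}" resource unmanage "$resource" || return 1
    # Run the maintenance command given as the remaining arguments.
    "$@"
    status=$?
    # Hand the resource back to the cluster even if maintenance failed.
    "${PCS:-pcs}" resource manage "$resource" || return 1
    return "$status"
}
```

For example, "with_maintenance dirsrv-daemon service dirsrv restart", where
the service name is only a guess at the local init script.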

> pcs resource clone dirsrv-daemon meta globally-unique="false"
> interleave="true" target-role="Started" "master-max=2"
> 
> pcs constraint colocation add dirsrv-daemon-clone with dirsrv-ip
> score=INFINITY

This constraint means that dirsrv is only allowed to run where dirsrv-ip
is. I suspect you want the reverse, dirsrv-ip with dirsrv-daemon-clone,
which means keep the IP with a working dirsrv instance.
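
A possible way to swap the constraint around, assuming the resource IDs from
the commands above and that the original constraint is still in place.
swap_colocation is only an illustrative name, and the score=INFINITY form
simply mirrors the one used in the original post. PCS=echo previews the
commands without touching the cluster.

```shell
#!/bin/sh
# Sketch only. PCS may be overridden (e.g. PCS=echo) to preview the commands.
swap_colocation() {
    # Drop the constraint that ties the clone to the IP's location.
    "${PCS:-pcs}" constraint colocation remove dirsrv-daemon-clone dirsrv-ip || return 1
    # Add the reverse: keep the IP where a dirsrv instance is running.
    "${PCS:-pcs}" constraint colocation add dirsrv-ip with dirsrv-daemon-clone score=INFINITY
}
```

Running "PCS=echo swap_colocation" first shows the two commands that would
be issued.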

> pcs property set no-quorum-policy=ignore

If you're using corosync 2, you generally don't need or want this.
Instead, ensure corosync.conf has two_node: 1 (which will be done
automatically if you used pcs cluster setup).
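
A quick way to check for that setting, assuming the usual
/etc/corosync/corosync.conf location (has_two_node is an illustrative
helper, not a corosync tool). It only applies to corosync 2; the status
output further down reports "Stack: cman", where this file may not exist.

```shell
#!/bin/sh
# Sketch only: succeeds if the given corosync.conf contains "two_node: 1".
has_two_node() {
    conf="${1:-/etc/corosync/corosync.conf}"
    grep -Eq '^[[:space:]]*two_node:[[:space:]]*1[[:space:]]*$' "$conf"
}
```

Usage might look like: has_two_node && echo "two_node is set".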

> pcs resource defaults migration-threshold=1
> 
> pcs property set stonith-enabled=false
> 
>  
> 
> On startup all looks well:
> 
> ____________________________________________________________________________
>
> Last updated: Thu Mar 10 08:28:03 2016
> Last change: Thu Mar 10 08:26:14 2016
> Stack: cman
> Current DC: ga2.idam.com - partition with quorum
> Version: 1.1.11-97629de
> 2 Nodes configured
> 3 Resources configured
>
> Online: [ ga1.idam.com ga2.idam.com ]
>
> dirsrv-ip   (ocf::heartbeat:IPaddr2):     Started ga1.idam.com
>  Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>      dirsrv-daemon      (ocf::heartbeat:dirsrv):        Started ga2.idam.com (unmanaged)
>      dirsrv-daemon      (ocf::heartbeat:dirsrv):        Started ga1.idam.com (unmanaged)
>
> ____________________________________________________________________________
> 
>  
> 
> Stop dirsrv on ga1:
> 
>  
> 
> Last updated: Thu Mar 10 08:28:43 2016
> Last change: Thu Mar 10 08:26:14 2016
> Stack: cman
> Current DC: ga2.idam.com - partition with quorum
> Version: 1.1.11-97629de
> 2 Nodes configured
> 3 Resources configured
>
> Online: [ ga1.idam.com ga2.idam.com ]
>
> dirsrv-ip   (ocf::heartbeat:IPaddr2):     Started ga2.idam.com
>  Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>      dirsrv-daemon      (ocf::heartbeat:dirsrv):        Started ga2.idam.com (unmanaged)
>      dirsrv-daemon      (ocf::heartbeat:dirsrv):        FAILED ga1.idam.com (unmanaged)
>
> Failed actions:
>     dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12, status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms, exec=0ms
> 
>  
> 
> IP fails over to ga2 OK:
> 
>  
> 
> ____________________________________________________________________________
>
> Restart dirsrv on ga1
>
> Last updated: Thu Mar 10 08:30:01 2016
> Last change: Thu Mar 10 08:26:14 2016
> Stack: cman
> Current DC: ga2.idam.com - partition with quorum
> Version: 1.1.11-97629de
> 2 Nodes configured
> 3 Resources configured
>
> Online: [ ga1.idam.com ga2.idam.com ]
>
> dirsrv-ip   (ocf::heartbeat:IPaddr2):     Started ga2.idam.com
>  Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>      dirsrv-daemon      (ocf::heartbeat:dirsrv):        Started ga2.idam.com (unmanaged)
>      dirsrv-daemon      (ocf::heartbeat:dirsrv):        Started ga1.idam.com (unmanaged)
>
> Failed actions:
>     dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12, status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms, exec=0ms
>
> ____________________________________________________________________________
> 
>  
> 
> Stop dirsrv on ga2:
> 
>  
> 
> Last updated: Thu Mar 10 08:31:14 2016
> Last change: Thu Mar 10 08:26:14 2016
> Stack: cman
> Current DC: ga2.idam.com - partition with quorum
> Version: 1.1.11-97629de
> 2 Nodes configured
> 3 Resources configured
>
> Online: [ ga1.idam.com ga2.idam.com ]
>
> dirsrv-ip   (ocf::heartbeat:IPaddr2):     Started ga2.idam.com
>  Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>      dirsrv-daemon      (ocf::heartbeat:dirsrv):        FAILED ga2.idam.com (unmanaged)
>      dirsrv-daemon      (ocf::heartbeat:dirsrv):        Started ga1.idam.com (unmanaged)
>
> Failed actions:
>     dirsrv-daemon_monitor_10000 on ga2.idam.com 'not running' (7): call=11, status=complete, last-rc-change='Thu Mar 10 08:31:12 2016', queued=0ms, exec=0ms
>     dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12, status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms, exec=0ms
> 
>  
> 
> But IP stays on failed node
> 
> Looking in the logs it seems that the cluster is not aware that ga1 is
> available even though the status output shows it is.
> 
>  
> 
> If I repeat the tests but with ga2 started up first, the behaviour is
> similar, i.e. it fails over to ga1 but not back to ga2.
> 
>  
> 
> Many thanks,
> 
> Bernie

_______________________________________________
Users mailing list: Users at clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
