[Pacemaker] question about interface failover

Sat May 18 14:23:11 EDT 2013

On Fri, 2013-05-17 at 10:41 +0200, Florian Crouzat wrote:
> On 16/05/2013 21:45, christopher barry wrote:
> > Greetings,
> >
> > I've set up a new 2-node mysql cluster using
> > * drbd 8.3.1.3
> > * corosync 1.4.2
> > * pacemaker 1.1.7
> > on Debian Wheezy nodes.
> >
> > Failover seems to be working fine for everything except the IPs manually
> > configured on the interfaces.
> 
> This sentence makes no sense to me.
> The cluster will not fail over something that is not managed by the
> cluster (a 'manually' configured IP...)
> 
> What are you trying to achieve exactly?
> Also, could you pastebin the output of "crm_mon -Arf1"? I find it
> easier to read.
> 
> 
> >
> > see config here:
> > http://pastebin.aquilenet.fr/?9eb51f6fb7d65fda#/YvSiYFocOzogAmPU9g+g09RcJvhHbgrY1JuN7D+gA4=
> >
> > If I bring down an interface, when the cluster restarts it, it only
> > starts it with the vip - the original ip and route have been removed.
> 
> Makes sense if you added the 'original' IP manually...
> You should have the non-VIP addresses in /etc/sysconfig/network/ifcfg-*
> But then again, please clarify what you are trying to achieve.
> 
> >
> > Not sure what to do to make sure the permanent IP and the routes get
> > restored. I'm not all that versed in the cluster command line yet, and
> > I'm using LCMC for most of my usage.
> 
> 

(@howard2.rjmetrics.com)-(14:00 / Sat May 18)
[-][~]# crm_mon -Arf1
============
Last updated: Sat May 18 14:00:27 2013
Last change: Thu May 16 17:33:07 2013 via crm_attribute on howard3.rjmetrics.com
Stack: openais
Current DC: howard3.rjmetrics.com - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
6 Resources configured.
============

Online: [ howard3.rjmetrics.com howard2.rjmetrics.com ]

Full list of resources:

 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ howard2.rjmetrics.com ]
     Slaves: [ howard3.rjmetrics.com ]
 Resource Group: g_mysql
     p_fs_mysql	(ocf::heartbeat:Filesystem):	Started howard2.rjmetrics.com
     ClusterPrivateIP	(ocf::heartbeat:IPaddr2):	Started howard2.rjmetrics.com
     ClusterPublicIP	(ocf::heartbeat:IPaddr2):	Started howard2.rjmetrics.com
     p_mysql	(ocf::heartbeat:mysql):	Started howard2.rjmetrics.com

Node Attributes:
* Node howard3.rjmetrics.com:
    + master-p_drbd_mysql:0           	: 1000      
* Node howard2.rjmetrics.com:
    + master-p_drbd_mysql:1           	: 10000     

Migration summary:
* Node howard3.rjmetrics.com: 
   p_drbd_mysql:1: migration-threshold=1000000 fail-count=1
* Node howard2.rjmetrics.com: 
   ClusterPublicIP: migration-threshold=1000000 fail-count=1

Failed actions:
    p_drbd_mysql:1_promote_0 (node=howard3.rjmetrics.com, call=29, rc=-2, status=Timed Out): unknown exec error
    ClusterPublicIP_monitor_30000 (node=howard2.rjmetrics.com, call=122, rc=7, status=complete): not running

howard2 and howard3 are the two clustered servers.

During testing, when I ifdown either eth0 or eth1, the cluster brings
the VIP back up, but the other non-VIP IPs and routes are not restored.
I'm running Debian, so these are configured in /etc/network/interfaces.
Saying 'manually' configured was misleading on my part, sorry about that.

eth0 is the public interface, and eth1 is the private interface. eth2
and eth3 are bonded as bond0, use jumbo frames, and are crossover cabled
between the nodes.

The test I was doing was to pull the cables from eth0 and eth1, which hung
the cluster. My assumption is that I need to add more configuration
elements to manage the other IPs, and also set up some ping hosts that
will initiate failover when they become unreachable. What would help me,
I think, is an example config or pointers on how to add these elements.
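
A minimal sketch of the ping-based connectivity check described above,
written as crm shell commands. The names p_ping, cl_ping and
l_mysql_needs_ping and the address 192.0.2.1 are placeholders and are not
taken from the pastebin config; g_mysql is the group shown in the crm_mon
output. It assumes the ocf:pacemaker:ping agent is installed and that
g_mysql is already tied to the DRBD master by existing constraints.

```shell
# Ping a host that is only reachable through the interface under test
# (192.0.2.1 is a placeholder; several hosts can be listed, space separated).
crm configure primitive p_ping ocf:pacemaker:ping \
    params host_list="192.0.2.1" multiplier="1000" dampen="5s" \
    op monitor interval="10s" timeout="60s"

# Run the ping resource on both nodes so each one scores its own connectivity.
crm configure clone cl_ping p_ping

# Keep g_mysql off any node where the "pingd" attribute is missing or zero.
crm configure location l_mysql_needs_ping g_mysql \
    rule -inf: not_defined pingd or pingd lte 0
```

Note that this only moves the group away from a node that has lost
connectivity. It does not restore the static addresses and routes from
/etc/network/interfaces; after an ifdown those would still have to come
back through ifup on the interface.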

On another note, the test made the drbd link disconnect, with both disks
now marked as StandAlone in the LCMC GUI. Right-clicking the disks or
the connection does not allow any action other than viewing the logs,
which say:

May 16 17:33:08 howard3 kernel: [781360.146362] block drbd0: Split-Brain detected but unresolved, dropping connection!
May 16 17:33:08 howard3 kernel: [781360.146451] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
May 16 17:33:08 howard3 kernel: [781360.149042] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
May 16 17:33:08 howard3 kernel: [781360.149051] block drbd0: conn( WFReportParams -> Disconnecting )
May 16 17:33:08 howard3 kernel: [781360.149060] block drbd0: error receiving ReportState, l: 4!
May 16 17:33:08 howard3 kernel: [781360.149154] block drbd0: asender terminated
May 16 17:33:08 howard3 kernel: [781360.149159] block drbd0: Terminating drbd0_asender
May 16 17:33:08 howard3 kernel: [781360.149609] block drbd0: Connection closed
May 16 17:33:08 howard3 kernel: [781360.149619] block drbd0: conn( Disconnecting -> StandAlone )
May 16 17:33:08 howard3 kernel: [781360.149811] block drbd0: receiver terminated
May 16 17:33:08 howard3 kernel: [781360.149815] block drbd0: Terminating drbd0_receiver
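
For reference, a hedged sketch of the usual manual split-brain recovery
for DRBD 8.3, matching the StandAlone state in the log above. The
resource name r0 is a placeholder (the real name is whatever drbd.conf
defines), and the run helper and the victim/survivor function names are
made up for this example. Which node's changes get discarded is a
decision for the administrator; nothing in the log settles it. The
script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of manual DRBD 8.3 split-brain recovery. By default the commands
# are only printed for review; set RUN=1 to actually execute them.
RES="${RES:-r0}"   # placeholder resource name; use the one from drbd.conf

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# On the node whose changes will be thrown away (the split-brain "victim"):
victim_steps() {
    run drbdadm secondary "$RES"
    run drbdadm -- --discard-my-data connect "$RES"
}

# On the node whose data is kept; only needed if it is StandAlone as well:
survivor_steps() {
    run drbdadm connect "$RES"
}

case "${1:-}" in
    victim)   victim_steps ;;
    survivor) survivor_steps ;;
    *)        echo "usage: $0 victim|survivor" >&2 ;;
esac
```

Since Pacemaker manages the DRBD resource here, it is probably safest to
stop the cluster from acting on it while doing this by hand, and to clear
the failed promote afterwards so the fail-count does not linger.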

I'm really not sure how to proceed. Please let me know any additional
information you may need.

Thanks for your time Florian, it's much appreciated.

Regards,
-C