[ClusterLabs] DRBD promotion based on ping
Victor
vixtor at gmail.com
Fri Mar 17 10:23:38 EDT 2017
Hello,
This is the status (when not failed):
Last updated: Fri Mar 17 16:17:44 2017
Last change: Fri Mar 17 14:21:24 2017 by root via cibadmin on db-main
Stack: corosync
Current DC: db-main (version 1.1.14-70404b0) - partition with quorum
2 nodes and 8 resources configured
Online: [ db-main db-slave ]
Resource Group: SERVICES
FSDATA (ocf::heartbeat:Filesystem): Started db-main
IP (ocf::heartbeat:IPaddr2): Started db-main
MYSQLD (ocf::heartbeat:mysql): Started db-main
Master/Slave Set: DRBD_MASTER [DRBD0]
Masters: [ db-main ]
Slaves: [ db-slave ]
Clone Set: CL_PING [PING]
Started: [ db-main db-slave ]
I am not sure how to check (apart from the status output above) whether the
ping resource is really running.
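Presumably the transient node attribute written by ocf:pacemaker:ping could be
checked like this (untested on my side; the attribute name `ping` comes from
the `name=ping` parameter in my configuration, and with both hosts in
host_list reachable and multiplier=1000 the value should be 2000, dropping to
0 when all pings fail):

```shell
# show transient node attributes together with the resource status (one-shot)
crm_mon -A1

# query the ping attribute on each node; --lifetime reboot selects the
# transient (status-section) attribute that the ping agent maintains
crm_attribute --node db-main --name ping --query --lifetime reboot
crm_attribute --node db-slave --name ping --query --lifetime reboot
```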
It may be important to mention that the servers have a direct link between
them, so when I cut ICMP with iptables, the cluster communication continues
to work. My feeling is that my configuration has a means to demote db-main
but nothing to promote db-slave, and I am not sure what to add.
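For reference, the second rule I tried was added to the location constraint
roughly like this (crm shell syntax; the rule text is the one quoted below,
the layout is reconstructed):

```
location LOC_DRBD_MASTER_ON_PING DRBD_MASTER \
        rule $role=Master -inf: not_defined ping or ping number:lte 0 \
        rule $role=Master ping: defined ping
```

With multiplier=1000 and two ping targets, a healthy node should score 2000
for promotion, which ought to outweigh resource-stickiness=100;
`crm_simulate -sL` should show the actual promotion scores.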
When I drop the ICMP traffic (so the ping on the main node fails), the status
looks like this:
Last updated: Fri Mar 17 16:21:02 2017
Last change: Fri Mar 17 14:21:24 2017 by root via cibadmin on db-main
Stack: corosync
Current DC: db-main (version 1.1.14-70404b0) - partition with quorum
2 nodes and 8 resources configured
Online: [ db-main db-slave ]
Master/Slave Set: DRBD_MASTER [DRBD0]
Slaves: [ db-main db-slave ] !!! no master here, only 2 slaves
Clone Set: CL_PING [PING]
Started: [ db-main db-slave ]
Just to make it clear, normal failover works: if, instead of cutting ICMP, I
reboot db-main, db-slave takes over correctly and completely. Also, if I
remove the iptables ICMP rule, db-main starts the services again.
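For completeness, the iptables rule I use to cut ICMP looks roughly like this
(an illustrative reconstruction, not necessarily the exact command):

```shell
# on db-main: drop outgoing echo requests so the ping targets become unreachable
iptables -A OUTPUT -p icmp --icmp-type echo-request -j DROP

# remove the rule again to restore the ping
iptables -D OUTPUT -p icmp --icmp-type echo-request -j DROP
```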
On Fri, Mar 17, 2017 at 3:22 PM, Klaus Wenninger <kwenning at redhat.com>
wrote:
> On 03/17/2017 01:17 PM, Victor wrote:
> > Hello,
> >
> > I have implemented the following Pacemaker configuration, and I have a
> > problem which I don't understand (all my net searches were in vain,
> > probably because I am not using the right keywords). If the ping fails
> > on the Master node, it is demoted to Slave, but the other node also
> > remains a Slave (it is not promoted). Can somebody tell me what I'm
> > doing wrong? I have also tried adding a second rule to the location,
> > "rule $role=Master ping: defined ping", so my location had two rules
> > instead of one, but it still didn't work.
>
> Did you check whether the clones of PING are running on all your nodes?
> What do the node attributes on the nodes look like?
>
> >
> > node 1084803074: db-main \
> > attributes standby=off
> > node 1084803195: db-slave \
> > attributes standby=off
> > primitive DRBD0 ocf:linbit:drbd \
> > params drbd_resource=drbd0 \
> > op monitor role=Master interval=15s \
> > op monitor role=Slave interval=30s \
> > op start interval=0 timeout=240s \
> > op stop interval=0 timeout=100s
> > primitive FSDATA Filesystem \
> > params device="/dev/drbd0" directory="/data" fstype=ext4 \
> > meta target-role=Started
> > primitive IP IPaddr2 \
> > params ip=5.35.208.178 cidr_netmask=32 nic=eth0
> > primitive MYSQLD mysql \
> > params binary="/usr/sbin/mysqld" config="/etc/mysql/my.cnf"
> > datadir="/var/lib/mysql" pid="/var/run/mysqld/mysqld.pid"
> > socket="/var/run/mysqld/mysqld.sock" user=mysql group=mysql \
> > op start timeout=120s interval=0 \
> > op stop timeout=120s interval=0 \
> > op monitor interval=20s timeout=30s
> > primitive PING ocf:pacemaker:ping \
> > params name=ping multiplier=1000 host_list="192.168.1.1
> > 192.168.1.2" \
> > op monitor interval=15s timeout=60s start
> > group SERVICES FSDATA IP MYSQLD
> > ms DRBD_MASTER DRBD0 \
> > meta notify=true master-max=1 master-node-max=1 clone-max=2
> > clone-node-max=1 target-role=Master
> > clone CL_PING PING \
> > meta interleave=true
> > location LOC_DRBD_MASTER_ON_PING DRBD_MASTER \
> > rule $role=Master -inf: not_defined ping or ping number:lte 0
> > order SRV_ORDER Mandatory: DRBD_MASTER:promote SERVICES:start
> > colocation SRV_RULE inf: DRBD_MASTER:Master SERVICES
> > property cib-bootstrap-options: \
> > have-watchdog=false \
> > dc-version=1.1.14-70404b0 \
> > cluster-infrastructure=corosync \
> > cluster-name=debian \
> > stonith-enabled=false \
> > no-quorum-policy=ignore
> > rsc_defaults rsc-options: \
> > resource-stickiness=100
> >
> > Thanks,
> > Victor
> >
> >
> > _______________________________________________
> > Users mailing list: Users at clusterlabs.org
> > http://lists.clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
>