[ClusterLabs] Re: group resources not grouped ?!?

Wed Oct 7 14:46:41 UTC 2015

>>> zulucloud <zulucloud at mailbox.org> wrote on 07.10.2015 at 16:12 in message
<5615284E.8050406 at mailbox.org>:
> Hi,
> i got a problem i don't understand, maybe someone can give me a hint.
> 
> My 2-node cluster (named ali and baba) is configured to run mysql, an IP 
> for mysql and the filesystem resource (on drbd master) together as a 
> GROUP. After doing some crash-tests i ended up having filesystem and 
> mysql running happily on one host (ali), and the related IP on the other 
> (baba) .... although, the IP's not really up and running, crm_mon just 
> SHOWS it as started there. In fact it's nowhere up, neither on ali nor 
> on baba.

Then it's most likely a bug in the resource agent. To make sure, run "crm resource reprobe", wait a few seconds, and then recheck the displayed status.
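
A minimal shell sketch of that check, assuming the crm shell and iproute2 are installed on both nodes. The helper names has_addr and recheck_ip are made up for illustration, and the 10-second wait is an arbitrary guess. Pass the real service address, which is masked in your configuration:

```shell
# has_addr ADDRESS
# Reads `ip -o -4 addr show` style text on stdin and succeeds only if
# ADDRESS is configured (field 4 looks like "a.b.c.d/prefix").
has_addr() {
    awk -v a="$1" '{ split($4, p, "/"); if (p[1] == a) found = 1 }
                   END { exit !found }'
}

# recheck_ip ADDRESS (hypothetical helper, run it on each node)
# Asks the cluster to re-detect resource state, prints a one-shot status,
# then checks the interface list independently of the cluster's view.
recheck_ip() {
    addr="$1"
    crm resource reprobe
    sleep 10                      # assumed to be long enough for the probes
    crm_mon -1 -rnf
    if ip -o -4 addr show | has_addr "$addr"; then
        echo "$addr is configured on $(uname -n)"
    else
        echo "$addr is NOT configured on $(uname -n)"
    fi
}
```

Run "recheck_ip <address>" on ali and on baba. If crm_mon still shows the IP as started on a node where the check prints "NOT configured", that would support the suspicion that the reported status cannot be trusted.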

> 
> crm_mon shows that pacemaker tried to start it on baba, but gave up 
> after fail-count=1000000.

This could mean: Multiple start attempts failed, as did the stop attempts, so the cluster thinks the resource might still be running. It looks very much like a configuration problem to me.
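
To see which stop operations the cluster recorded as failed, something like the following may help. It is only a sketch: failed_stops and clear_failed_stops are hypothetical helper names, the sed pattern assumes the "Failed actions" entry sits on one unwrapped line, and the cleanup should only be run after the underlying cause has been fixed.

```shell
# failed_stops
# Reads crm_mon output on stdin and prints "<resource> <node>" for every
# failed stop operation listed under "Failed actions".
failed_stops() {
    sed -n 's/^[[:space:]]*\([A-Za-z0-9_:-]*\)_stop_0 (node=\([^,]*\),.*/\1 \2/p'
}

# clear_failed_stops (hypothetical helper)
# Shows the fail count for each resource with a failed stop, then clears
# its failure history so the cluster re-evaluates it.
clear_failed_stops() {
    crm_mon -1 -rnf | failed_stops | while read -r rsc node; do
        # </dev/null keeps crm from consuming the loop's input
        crm resource failcount "$rsc" show "$node" </dev/null
        crm resource cleanup "$rsc" "$node" </dev/null
    done
}
```

With the crm_mon output quoted further down, failed_stops would report res_hamysql_ip together with the node name shown in the "Failed actions" entry.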

> 
> Q1: why doesn't pacemaker put the IP on ali, where all the rest of its 
> group lives?

See the log files in detail.

> Q2: why doesn't pacemaker try to start the IP on ali, after max 
> failcount had been reached on baba?

Do you have fencing enabled?

> Q3: why is crm_mon showing the IP as "started", when it's down after 
> 1000000 tries?

See above.

> 
> Thanks :)

8-)

> 
> 
> config (some parts removed):
> -------------------------------
> node ali
> node baba
> 
> primitive res_drbd ocf:linbit:drbd \
> 	params drbd_resource="r0" \
> 	op stop interval="0" timeout="100" \
> 	op start interval="0" timeout="240" \
> 	op promote interval="0" timeout="90" \
> 	op demote interval="0" timeout="90" \
> 	op notify interval="0" timeout="90" \
> 	op monitor interval="40" role="Slave" timeout="20" \
> 	op monitor interval="20" role="Master" timeout="20"
> primitive res_fs ocf:heartbeat:Filesystem \
> 	params device="/dev/drbd0" directory="/drbd_mnt" fstype="ext4" \
> 	op monitor interval="30s"
> primitive res_hamysql_ip ocf:heartbeat:IPaddr2 \
> 	params ip="XXX.XXX.XXX.224" nic="eth0" cidr_netmask="23" \
> 	op monitor interval="10s" timeout="20s" depth="0"
> primitive res_mysql lsb:mysql \
> 	op start interval="0" timeout="15" \
> 	op stop interval="0" timeout="15" \
> 	op monitor start-delay="30" interval="15" time-out="15"
> 
> group gr_mysqlgroup res_fs res_mysql res_hamysql_ip \
> 	meta target-role="Started"
> ms ms_drbd res_drbd \
> 	meta master-max="1" master-node-max="1" clone-max="2" \
> 	clone-node-max="1" notify="true"
> 
> colocation col_fs_on_drbd_master inf: res_fs:Started ms_drbd:Master
> 
> order ord_drbd_master_then_fs inf: ms_drbd:promote res_fs:start
> 
> property $id="cib-bootstrap-options" \
> 	dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
> 	cluster-infrastructure="openais" \
> 	stonith-enabled="false" \
> 	no-quorum-policy="ignore" \
> 	expected-quorum-votes="2" \
> 	last-lrm-refresh="1438857246"
> 
> 
> crm_mon -rnf (some parts removed):
> ---------------------------------
> Node ali: online
>          res_fs  (ocf::heartbeat:Filesystem) Started
>          res_mysql       (lsb:mysql) Started
>          res_drbd:0      (ocf::linbit:drbd) Master
> Node baba: online
>          res_hamysql_ip  (ocf::heartbeat:IPaddr2) Started
>          res_drbd:1      (ocf::linbit:drbd) Slave
> 
> Inactive resources:
> 
> Migration summary:
> 
> * Node baba:
>     res_hamysql_ip: migration-threshold=1000000 fail-count=1000000
> 
> Failed actions:
>      res_hamysql_ip_stop_0 (node=a891vl107s, call=35, rc=1, status=complete): unknown error
> 
> corosync.log:
> --------------
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that 
> res_hamysql_ip_stop_0 comeplete before gr_mysqlgroup_stopped_0: 
> unmanaged failed resources cannot prevent shutdown
> 
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that 
> res_hamysql_ip_stop_0 comeplete before gr_mysqlgroup_stopped_0: 
> unmanaged failed resources cannot prevent shutdown
> 
> Software:
> ----------
> corosync 1.2.1-4
> pacemaker 1.0.9.1+hg15626-1
> drbd8-utils 2:8.3.7-2.1
> (for some reason it's not possible to update at this time)