[Pacemaker] cluster misbehaving after update

Andrew Beekhof andrew at beekhof.net
Tue Aug 6 20:37:11 EDT 2013


Many thanks.

Fixed in:
   https://github.com/beekhof/pacemaker/commit/fab0978

Apparently there was no regression test covering this case (resources collocated with the group as well), but there is now:
   https://github.com/beekhof/pacemaker/commit/d2be466

So you can be sure it won't break again.

On 02/08/2013, at 4:57 PM, Xzarth <xzarth at gmail.com> wrote:

> On 08/02/2013 02:16 AM, Andrew Beekhof wrote:
>> On 01/08/2013, at 10:24 PM, Xzarth <xzarth at gmail.com> wrote:
>> 
>>> Hi,
>>> 
>>> I updated from Pacemaker 1.0.9 to 1.1.7.
>> Distro?  Seems strange to be upgrading to a release from 1.5 years ago.
>> We're up to 1.1.10 now
>> 
> I have Debian: one node runs stable (wheezy) and one runs oldstable
> (squeeze) with packages installed from backports. The behavior is the same on both.
>>> After the update, the cluster behaves differently than before. I have a
>>> resource with migration-threshold="1"; when that resource failed,
>>> everything used to migrate to the other node (which is what I would
>>> expect). After the upgrade, when that resource fails, the cluster stops
>>> every resource that depends on it and just hangs there. What changed,
>>> given that I haven't touched the config?
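The behaviour described here hinges on how migration-threshold and failure-timeout interact. The Python sketch is a simplified model under stated assumptions, not Pacemaker's implementation: a node stops being eligible for a resource once the fail count on that node reaches migration-threshold, a threshold of 0 is treated as "disabled", and failures count as expired once failure-timeout has elapsed since the last one. In a real cluster the expiry is only acted on when the cluster next re-evaluates its state (for example at cluster-recheck-interval), which this sketch ignores. FailureRecord and node_eligible are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureRecord:
    """Hypothetical per-node failure history for one resource."""
    fail_count: int                 # failures recorded on this node
    last_failure: Optional[float]   # time of the most recent failure, in seconds


def node_eligible(rec: FailureRecord, migration_threshold: int,
                  failure_timeout: Optional[float], now: float) -> bool:
    """Simplified model: may this node still host the resource?"""
    # Assumption of this model: a threshold of 0 means the check is disabled.
    if migration_threshold <= 0:
        return True
    # Failures older than failure-timeout are treated as if they never happened.
    if (failure_timeout is not None and rec.last_failure is not None
            and now - rec.last_failure >= failure_timeout):
        return True
    # Otherwise the node is banned once the fail count reaches the threshold.
    return rec.fail_count < migration_threshold


# fonulator: migration-threshold="1" failure-timeout="60s"
rec = FailureRecord(fail_count=1, last_failure=1000.0)
print(node_eligible(rec, 1, 60.0, now=1010.0))  # False: banned after one failure
print(node_eligible(rec, 1, 60.0, now=1070.0))  # True: the failure has expired
```

With fonulator's settings, the first failed monitor is enough to make asttest1 ineligible, which is why the reporter expected the stack to move to asttest2.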
>> Can you attach the result of cibadmin -Ql when the cluster is in this state?
>> 
> here it is
>>> 
>>> Here is the config:
>>> 	
>>> node $id="1bb92e1d" asttest1 \
>>> 	attributes standby="off"
>>> node $id="5e583c54" asttest2 \
>>> 	attributes standby="off"
>>> node asttest1
>>> node asttest2
>>> primitive asterisk lsb:asterisk-11.0.1 \
>>> 	op start interval="0" timeout="15s" \
>>> 	op stop interval="0" timeout="15s" \
>>> 	op monitor interval="1s" timeout="15s" start-delay="10"
>>> primitive dahdi lsb:dahdi \
>>> 	op start interval="0" timeout="15s" \
>>> 	op stop interval="0" timeout="15s" \
>>> 	op monitor interval="1s" timeout="15s"
>>> primitive drbd ocf:linbit:drbd \
>>> 	params drbd_resource="r0" \
>>> 	op monitor interval="29s" role="Master" \
>>> 	op monitor interval="31s" role="Slave"
>>> primitive fonulator lsb:fonulator \
>>> 	op start interval="0" timeout="20s" \
>>> 	op stop interval="0" timeout="20s" \
>>> 	op monitor interval="1s" timeout="20s" start-delay="30" \
>>> 	meta migration-threshold="1" failure-timeout="60s"
>>> primitive fs_drbd ocf:heartbeat:Filesystem \
>>> 	params device="/dev/drbd/by-res/r0" directory="/mnt/drbd" fstype="ext3" \
>>> 	op start interval="0" timeout="60s" start-delay="1" \
>>> 	op stop interval="0" timeout="60s" start-delay="1" \
>>> 	op monitor interval="1s" timeout="40s" start-delay="30" \
>>> 	meta is-managed="true" target-role="Started"
>>> primitive httpd lsb:apache2 \
>>> 	op start interval="0" timeout="20s" \
>>> 	op stop interval="0" timeout="20s" \
>>> 	op monitor interval="1s" timeout="20s" start-delay="10"
>>> primitive iax2_mon lsb:iax2_mon \
>>> 	op start interval="0" timeout="20s" \
>>> 	op stop interval="0" timeout="20s" \
>>> 	op monitor interval="60s" timeout="20s" start-delay="30" \
>>> 	meta failure-timeout="60s"
>>> primitive ip_voip_route_default ocf:heartbeat:Route \
>>> 	params destination="default" gateway="10.2.4.1" \
>>> 	op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_route_test1 ocf:heartbeat:Route \
>>> 	params destination="X.X.X.X/32" gateway="X.X.X.X" \
>>> 	op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_route_test2 ocf:heartbeat:Route \
>>> 	params destination="X.X.X.X/32" gateway="X.X.X.X.1" \
>>> 	op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth0 ocf:heartbeat:IPaddr2 \
>>> 	params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="1" \
>>> 	op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth1 ocf:heartbeat:IPaddr2 \
>>> 	params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="2" \
>>> 	op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth2 ocf:heartbeat:IPaddr2 \
>>> 	params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="3" \
>>> 	op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth3 ocf:heartbeat:IPaddr2 \
>>> 	params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="4" \
>>> 	op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth4 ocf:heartbeat:IPaddr2 \
>>> 	params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="5" \
>>> 	op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth5 ocf:heartbeat:IPaddr2 \
>>> 	params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="6" \
>>> 	op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth6 ocf:heartbeat:IPaddr2 \
>>> 	params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="7" \
>>> 	op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth8 ocf:heartbeat:IPaddr2 \
>>> 	params ip="X.X.X.X" cidr_netmask="24" nic="eth8" iflabel="1" \
>>> 	op monitor interval="1s" timeout="20s"
>>> primitive mysqld lsb:mysql \
>>> 	op monitor interval="1s" timeout="15s" start-delay="10"
>>> primitive tftp lsb:tftp-srce \
>>> 	op start interval="0" timeout="20s" \
>>> 	op stop interval="0" timeout="20s" \
>>> 	op monitor interval="60s" timeout="10s" start-delay="10"
>>> group ip_voip_addresses_p ip_voip_eth0 ip_voip_eth8 ip_voip_eth1 ip_voip_eth2 ip_voip_eth3 ip_voip_eth4 ip_voip_eth5 ip_voip_eth6 \
>>> 	meta ordered="false" collocated="true" priority="8"
>>> group ip_voip_routes ip_voip_route_test1 ip_voip_route_test2 \
>>> 	meta ordered="false" collocated="true" priority="9"
>>> group voip mysqld dahdi fonulator asterisk iax2_mon httpd tftp \
>>> 	meta ordered="true" collocated="true" priority="10"
>>> ms ms_drbd drbd \
>>> 	meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
>>> clone cl_route ip_voip_route_default \
>>> 	meta target-role="Started"
>>> colocation fs_colocation inf: fs_drbd ms_drbd:Master
>>> colocation ip_colocation inf: ip_voip_addresses_p fs_drbd
>>> colocation ip_route_colocation inf: ip_voip_routes ip_voip_addresses_p
>>> colocation voip_colocation inf: voip ip_voip_addresses_p
>>> order fs_order inf: ms_drbd:promote fs_drbd:start
>>> order ip_order inf: fs_drbd:start ip_voip_addresses_p:start
>>> order ip_route_order inf: ip_voip_addresses_p:start ip_voip_routes:start
>>> order voip_order inf: ip_voip_routes:start voip:start
>>> property $id="cib-bootstrap-options" \
>>> 	dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>>> 	cluster-infrastructure="openais" \
>>> 	stonith-enabled="false" \
>>> 	expected-quorum-votes="2" \
>>> 	last-lrm-refresh="1375355273" \
>>> 	no-quorum-policy="ignore" \
>>> 	symmetric-cluster="true"
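The order constraints in this configuration form a single start chain, and group voip (ordered="true") adds an internal chain of its own. A minimal Python sketch of that chain, using the standard library's graphlib.TopologicalSorter (Python 3.9+) and assuming that only the order constraints and the voip group's member order matter; it deliberately ignores colocation, the unordered IP groups and the cl_route clone. build_graph and downstream are hypothetical helpers.

```python
from graphlib import TopologicalSorter

# Order constraints copied from the crm configuration: (first, then).
ORDER = [
    ("ms_drbd:promote", "fs_drbd:start"),
    ("fs_drbd:start", "ip_voip_addresses_p:start"),
    ("ip_voip_addresses_p:start", "ip_voip_routes:start"),
    ("ip_voip_routes:start", "voip:start"),
]
# group voip is ordered="true": its members start in the listed order.
VOIP = ["mysqld", "dahdi", "fonulator", "asterisk", "iax2_mon", "httpd", "tftp"]


def build_graph(order, group, group_start):
    """Map each action to the set of actions that must happen before it."""
    graph = {}
    for first, then in order:
        graph.setdefault(first, set())
        graph.setdefault(then, set()).add(first)
    prev = group_start
    for member in group:
        graph.setdefault(f"{member}:start", set()).add(prev)
        prev = f"{member}:start"
    return graph


def downstream(graph, action):
    """All actions that (transitively) have to wait for `action`."""
    result, frontier = set(), [action]
    while frontier:
        current = frontier.pop()
        for node, preds in graph.items():
            if current in preds and node not in result:
                result.add(node)
                frontier.append(node)
    return result


graph = build_graph(ORDER, VOIP, "voip:start")
print(list(TopologicalSorter(graph).static_order()))
print(sorted(downstream(graph, "fonulator:start")))
```

downstream(graph, "fonulator:start") yields the asterisk, iax2_mon, httpd and tftp starts, matching the resources reported as Stopped after the failure, while mysqld and dahdi, which come before fonulator in the group, stay Started.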
>>> 
>>> 
>>> And here is the state of the cluster after the resource fails:
>>> 	
>>> ============
>>> Last updated: Thu Aug  1 13:26:41 2013
>>> Last change: Thu Aug  1 13:07:53 2013
>>> Stack: openais
>>> Current DC: asttest1 - partition with quorum
>>> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>>> 4 Nodes configured, 2 expected votes
>>> 24 Resources configured.
>>> ============
>>> 
>>> Online: [ asttest1 asttest2 ]
>>> OFFLINE: [ asttest1 asttest2 ]
>>> 
>>> Resource Group: voip
>>>    mysqld     (lsb:mysql):    Started asttest1
>>>    dahdi      (lsb:dahdi):    Started asttest1
>>>    fonulator  (lsb:fonulator):        Stopped
>>>    asterisk   (lsb:asterisk-11.0.1):  Stopped
>>>    iax2_mon   (lsb:iax2_mon): Stopped
>>>    httpd      (lsb:apache2):  Stopped
>>>    tftp       (lsb:tftp-srce):        Stopped
>>> Resource Group: ip_voip_routes
>>>    ip_voip_route_test1        (ocf::heartbeat:Route): Started asttest1
>>>    ip_voip_route_test2        (ocf::heartbeat:Route): Started asttest1
>>> Resource Group: ip_voip_addresses_p
>>>    ip_voip_eth0    (ocf::heartbeat:IPaddr2):       Started asttest1
>>>    ip_voip_eth8    (ocf::heartbeat:IPaddr2):       Started asttest1
>>>    ip_voip_eth1    (ocf::heartbeat:IPaddr2):       Started asttest1
>>>    ip_voip_eth2    (ocf::heartbeat:IPaddr2):       Started asttest1
>>>    ip_voip_eth3    (ocf::heartbeat:IPaddr2):       Started asttest1
>>>    ip_voip_eth4    (ocf::heartbeat:IPaddr2):       Started asttest1
>>>    ip_voip_eth5    (ocf::heartbeat:IPaddr2):       Started asttest1
>>>    ip_voip_eth6    (ocf::heartbeat:IPaddr2):       Started asttest1
>>> Clone Set: cl_route [ip_voip_route_default]
>>>    Started: [ asttest2 asttest1 ]
>>>    Stopped: [ ip_voip_route_default:2 ip_voip_route_default:3 ]
>>> fs_drbd (ocf::heartbeat:Filesystem):    Started asttest1
>>> Master/Slave Set: ms_drbd [drbd]
>>>    Masters: [ asttest1 ]
>>>    Slaves: [ asttest2 ]
>>> 
>>> Failed actions:
>>>   fonulator_monitor_1000 (node=asttest1, call=85, rc=7, status=complete): not running
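The outcome the reporter expected can be expressed as a simplified placement model: every resource joined by a mandatory (inf) colocation must end up on the same node, so a ban on one member restricts the whole block. The Python sketch assumes that fonulator's ban on asttest1 applies to the whole voip group and treats ms_drbd:Master as a single placeable item; it illustrates the expected result, not Pacemaker's allocation algorithm. colocated_block, common_nodes and the banned mapping are hypothetical.

```python
# Mandatory (inf) colocations from the configuration: (dependent, with).
COLOCATIONS = [
    ("fs_drbd", "ms_drbd:Master"),
    ("ip_voip_addresses_p", "fs_drbd"),
    ("ip_voip_routes", "ip_voip_addresses_p"),
    ("voip", "ip_voip_addresses_p"),
]
NODES = {"asttest1", "asttest2"}


def colocated_block(colocations, start):
    """Every resource linked to `start` through mandatory colocations."""
    neighbours = {}
    for a, b in colocations:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    block, frontier = {start}, [start]
    while frontier:
        for other in neighbours.get(frontier.pop(), ()):
            if other not in block:
                block.add(other)
                frontier.append(other)
    return block


def common_nodes(block, nodes, banned):
    """Nodes on which every member of the block is still allowed to run."""
    allowed = set(nodes)
    for resource in block:
        allowed -= banned.get(resource, set())
    return allowed


# Assumption: fonulator's migration-threshold ban on asttest1 applies to group voip.
banned = {"voip": {"asttest1"}}
block = colocated_block(COLOCATIONS, "voip")
print(sorted(block))
print(common_nodes(block, NODES, banned))  # {'asttest2'}
```

With the ban on asttest1, the only node left for the whole block is asttest2, i.e. the failover the reporter saw with 1.0.9. An empty result would mean no node can host the block at all.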
>>> 
>>> 
> <cibadmin_Ql.txt>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




