[Pacemaker] Problem : By colocations limitation, the resource appointment of the combination does not become effective.
renayama19661014 at ybb.ne.jp
Thu Mar 4 19:42:53 EST 2010
Hi All,
We are testing a complicated colocation configuration.
We configured resources so that they start together under a colocation constraint.
However, with a certain procedure, the constrained resource starts even though the resource it is
colocated with has not started.
We set the following constraint.
<rsc_colocation id="rsc_colocation01-1" rsc="UMgroup01" with-rsc="clnPingd" score="1000"/>
We observed that UMgroup01 started even though clnPingd had not started.
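For reference when reading the constraint above: Pacemaker documents a colocation score of INFINITY (defined as 1,000,000) as mandatory, and any smaller finite score as a preference. The following is only a minimal sketch for checking which kind a constraint line is. The helper names colocation_kind and constraint_score are hypothetical and not part of Pacemaker, and whether the finite score explains the behaviour reported in this message is not established here.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Pacemaker) for inspecting a constraint line.

# Extract the value of the score attribute from a rsc_colocation line.
constraint_score() {
    printf '%s\n' "$1" | sed -n 's/.*score="\([^"]*\)".*/\1/p'
}

# Classify a score: INFINITY (or a magnitude of 1000000 and above) is
# mandatory, any smaller finite value is advisory.
colocation_kind() {
    case "$1" in
        INFINITY|+INFINITY|-INFINITY) echo "mandatory"; return 0 ;;
    esac
    n=${1#[-+]}                      # drop an optional leading sign
    case "$n" in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;
    esac
    if [ "$n" -ge 1000000 ]; then
        echo "mandatory"
    else
        echo "advisory"
    fi
}

# The constraint from this report.
line='<rsc_colocation id="rsc_colocation01-1" rsc="UMgroup01" with-rsc="clnPingd" score="1000"/>'
colocation_kind "$(constraint_score "$line")"
```

For the constraint in this report the sketch prints "advisory".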
The steps to reproduce the problem are as follows.
STEP1) Start corosync.
STEP2) Send cib.xml to Pacemaker.
STEP3) Wait until the cluster is stable.
[root@srv01 ~]# crm_mon -1
============
Last updated: Wed Mar 3 13:21:21 2010
Stack: openais
Current DC: srv01 - partition with quorum
Version: 1.0.7-6e1815972fc236825bf3658d7f8451d33227d420
4 Nodes configured, 4 expected votes
13 Resources configured.
============
Online: [ srv01 srv02 srv03 srv04 ]
Resource Group: UMgroup01
UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
UmIPaddr (ocf::heartbeat:Dummy): Started srv01
UmDummy01 (ocf::heartbeat:Dummy): Started srv01
UmDummy02 (ocf::heartbeat:Dummy): Started srv01
Resource Group: OVDBgroup02-1
prmExPostgreSQLDB1 (ocf::heartbeat:Dummy): Started srv01
prmFsPostgreSQLDB1-1 (ocf::heartbeat:Dummy): Started srv01
prmFsPostgreSQLDB1-2 (ocf::heartbeat:Dummy): Started srv01
prmFsPostgreSQLDB1-3 (ocf::heartbeat:Dummy): Started srv01
prmIpPostgreSQLDB1 (ocf::heartbeat:Dummy): Started srv01
prmApPostgreSQLDB1 (ocf::heartbeat:Dummy): Started srv01
Resource Group: OVDBgroup02-2
prmExPostgreSQLDB2 (ocf::heartbeat:Dummy): Started srv02
prmFsPostgreSQLDB2-1 (ocf::heartbeat:Dummy): Started srv02
prmFsPostgreSQLDB2-2 (ocf::heartbeat:Dummy): Started srv02
prmFsPostgreSQLDB2-3 (ocf::heartbeat:Dummy): Started srv02
prmIpPostgreSQLDB2 (ocf::heartbeat:Dummy): Started srv02
prmApPostgreSQLDB2 (ocf::heartbeat:Dummy): Started srv02
Resource Group: OVDBgroup02-3
prmExPostgreSQLDB3 (ocf::heartbeat:Dummy): Started srv03
prmFsPostgreSQLDB3-1 (ocf::heartbeat:Dummy): Started srv03
prmFsPostgreSQLDB3-2 (ocf::heartbeat:Dummy): Started srv03
prmFsPostgreSQLDB3-3 (ocf::heartbeat:Dummy): Started srv03
prmIpPostgreSQLDB3 (ocf::heartbeat:Dummy): Started srv03
prmApPostgreSQLDB3 (ocf::heartbeat:Dummy): Started srv03
Resource Group: grpStonith1
prmStonithN1 (stonith:external/ssh): Started srv04
Resource Group: grpStonith2
prmStonithN2 (stonith:external/ssh): Started srv01
Resource Group: grpStonith3
prmStonithN3 (stonith:external/ssh): Started srv02
Resource Group: grpStonith4
prmStonithN4 (stonith:external/ssh): Started srv03
Clone Set: clnUMgroup01
Started: [ srv01 srv04 ]
Clone Set: clnPingd
Started: [ srv01 srv02 srv03 srv04 ]
Clone Set: clnDiskd1
Started: [ srv01 srv02 srv03 srv04 ]
Clone Set: clnG3dummy1
Started: [ srv01 srv02 srv03 srv04 ]
Clone Set: clnG3dummy2
Started: [ srv01 srv02 srv03 srv04 ]
STEP4) Simulate a stop failure of pingd on the srv01 node by editing the resource agent.
pingd_stop() {
    exit $OCF_ERR_GENERIC    # added for the test: stop always fails
    if [ -f $OCF_RESKEY_pidfile ]; then
        pid=`cat $OCF_RESKEY_pidfile`
    fi
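The failure injection above is done by editing the agent and later reverting the edit. As an alternative, a minimal sketch of a switchable version is shown below. It assumes a flag file whose path (PINGD_FAIL_STOP_FLAG, /tmp/pingd_fail_stop) and the helper name inject_stop_failure are hypothetical; neither exists in the real pingd agent. The return codes are the standard OCF values.

```shell
#!/bin/sh
# Standard OCF return codes (normally provided by ocf-shellfuncs).
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"

# Hypothetical flag file: while it exists, the stop action is forced to fail.
: "${PINGD_FAIL_STOP_FLAG:=/tmp/pingd_fail_stop}"

# Returns OCF_ERR_GENERIC when failure injection is switched on,
# OCF_SUCCESS otherwise. In an agent, the check would be called at the
# top of pingd_stop() and its non-zero result passed to exit.
inject_stop_failure() {
    if [ -f "$PINGD_FAIL_STOP_FLAG" ]; then
        return "$OCF_ERR_GENERIC"
    fi
    return "$OCF_SUCCESS"
}
```

With this sketch, creating the flag file would correspond to this step, and removing it would correspond to reverting the agent.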
STEP5) Stop the clnPingd clone.
[root@srv01 ~]# crm
crm(live)# resource
crm(live)resource# stop clnPingd
[root@srv01 ~]# crm_mon -1 -f
============
Last updated: Wed Mar 3 13:24:16 2010
Stack: openais
Current DC: srv01 - partition with quorum
Version: 1.0.7-6e1815972fc236825bf3658d7f8451d33227d420
4 Nodes configured, 4 expected votes
13 Resources configured.
============
Online: [ srv01 srv02 srv03 srv04 ]
Resource Group: UMgroup01
UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
UmIPaddr (ocf::heartbeat:Dummy): Started srv01
UmDummy01 (ocf::heartbeat:Dummy): Started srv01
UmDummy02 (ocf::heartbeat:Dummy): Started srv01
Resource Group: OVDBgroup02-1
prmExPostgreSQLDB1 (ocf::heartbeat:Dummy): Started srv01
prmFsPostgreSQLDB1-1 (ocf::heartbeat:Dummy): Started srv01
prmFsPostgreSQLDB1-2 (ocf::heartbeat:Dummy): Started srv01
prmFsPostgreSQLDB1-3 (ocf::heartbeat:Dummy): Started srv01
prmIpPostgreSQLDB1 (ocf::heartbeat:Dummy): Started srv01
prmApPostgreSQLDB1 (ocf::heartbeat:Dummy): Started srv01
Resource Group: grpStonith1
prmStonithN1 (stonith:external/ssh): Started srv04
Resource Group: grpStonith2
prmStonithN2 (stonith:external/ssh): Started srv01
Resource Group: grpStonith3
prmStonithN3 (stonith:external/ssh): Started srv02
Resource Group: grpStonith4
prmStonithN4 (stonith:external/ssh): Started srv03
Clone Set: clnUMgroup01
Started: [ srv01 srv04 ]
Clone Set: clnDiskd1
Started: [ srv01 srv02 srv03 srv04 ]
Clone Set: clnG3dummy1
Started: [ srv01 srv02 srv03 srv04 ]
Clone Set: clnG3dummy2
Started: [ srv01 srv02 srv03 srv04 ]
Migration summary:
* Node srv02:
* Node srv04:
* Node srv03:
* Node srv01:
clnPrmPingd:0: migration-threshold=10 fail-count=1000000
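The fail-count of 1000000 shown above equals the value Pacemaker uses for INFINITY. A minimal sketch for pulling these values out of `crm_mon -1 -f` output is given below. The helper names failcounts and over_threshold are hypothetical, and the line layout is assumed to match the output in this message.

```shell
#!/bin/sh
# Read `crm_mon -1 -f` output on stdin and print
# "resource fail-count migration-threshold" for each failure line.
failcounts() {
    sed -n 's/^ *\([^ ]*\): migration-threshold=\([0-9]*\) fail-count=\([0-9]*\).*/\1 \3 \2/p'
}

# Succeeds when the fail-count has reached the migration-threshold.
over_threshold() {
    [ "$1" -ge "$2" ]
}

# Sample line copied from the output above.
sample='clnPrmPingd:0: migration-threshold=10 fail-count=1000000'
printf '%s\n' "$sample" | failcounts | while read -r rsc fails threshold; do
    if over_threshold "$fails" "$threshold"; then
        echo "$rsc: fail-count $fails >= migration-threshold $threshold"
    fi
done
```

On a live cluster the sample line would be replaced by piping `crm_mon -1 -f` into failcounts.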
STEP6) Revert the change to pingd.
pingd_stop() {
    # exit $OCF_ERR_GENERIC
    if [ -f $OCF_RESKEY_pidfile ]; then
        pid=`cat $OCF_RESKEY_pidfile`
    fi
STEP7) Reboot the srv01 node.
STEP8) Wait for STONITH to complete. (STONITH completes after a retry.)
[root@srv02 ~]# crm_mon -1
============
Last updated: Wed Mar 3 13:34:12 2010
Stack: openais
Current DC: srv02 - partition with quorum
Version: 1.0.7-6e1815972fc236825bf3658d7f8451d33227d420
4 Nodes configured, 4 expected votes
13 Resources configured.
============
Online: [ srv02 srv03 srv04 ]
OFFLINE: [ srv01 ]
Resource Group: grpStonith1
prmStonithN1 (stonith:external/ssh): Started srv04
Resource Group: grpStonith2
prmStonithN2 (stonith:external/ssh): Started srv03
Resource Group: grpStonith3
prmStonithN3 (stonith:external/ssh): Started srv02
Resource Group: grpStonith4
prmStonithN4 (stonith:external/ssh): Started srv03
Clone Set: clnUMgroup01
Started: [ srv04 ]
Stopped: [ clnUmResource:0 ]
Clone Set: clnDiskd1
Started: [ srv02 srv03 srv04 ]
Stopped: [ clnPrmDiskd1:0 ]
Clone Set: clnG3dummy1
Started: [ srv02 srv03 srv04 ]
Stopped: [ clnG3dummy01:0 ]
Clone Set: clnG3dummy2
Started: [ srv02 srv03 srv04 ]
Stopped: [ clnG3dummy02:0 ]
STEP9) Start corosync on srv01 after it has rebooted.
[root@srv02 ~]# crm_mon -1
============
Last updated: Wed Mar 3 13:37:57 2010
Stack: openais
Current DC: srv02 - partition with quorum
Version: 1.0.7-6e1815972fc236825bf3658d7f8451d33227d420
4 Nodes configured, 4 expected votes
13 Resources configured.
============
Online: [ srv01 srv02 srv03 srv04 ]
Resource Group: UMgroup01
UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
UmIPaddr (ocf::heartbeat:Dummy): Started srv01
UmDummy01 (ocf::heartbeat:Dummy): Started srv01
UmDummy02 (ocf::heartbeat:Dummy): Started srv01
Resource Group: OVDBgroup02-1
prmExPostgreSQLDB1 (ocf::heartbeat:Dummy): Started srv01
prmFsPostgreSQLDB1-1 (ocf::heartbeat:Dummy): Started srv01
prmFsPostgreSQLDB1-2 (ocf::heartbeat:Dummy): Started srv01
prmFsPostgreSQLDB1-3 (ocf::heartbeat:Dummy): Started srv01
prmIpPostgreSQLDB1 (ocf::heartbeat:Dummy): Started srv01
prmApPostgreSQLDB1 (ocf::heartbeat:Dummy): Started srv01
Resource Group: grpStonith1
prmStonithN1 (stonith:external/ssh): Started srv04
Resource Group: grpStonith2
prmStonithN2 (stonith:external/ssh): Started srv03
Resource Group: grpStonith3
prmStonithN3 (stonith:external/ssh): Started srv02
Resource Group: grpStonith4
prmStonithN4 (stonith:external/ssh): Started srv03
Clone Set: clnUMgroup01
Started: [ srv01 srv04 ]
Clone Set: clnDiskd1
Started: [ srv01 srv02 srv03 srv04 ]
Clone Set: clnG3dummy1
Started: [ srv01 srv02 srv03 srv04 ]
Clone Set: clnG3dummy2
Started: [ srv01 srv02 srv03 srv04 ]
[root@srv02 ~]# ptest -L -s | grep UmVIPcheck
group_color: UmVIPcheck allocation score on srv01: 300
group_color: UmVIPcheck allocation score on srv02: -1000000
group_color: UmVIPcheck allocation score on srv03: -1000000
group_color: UmVIPcheck allocation score on srv04: -1000000
native_color: UmVIPcheck allocation score on srv01: 1600
native_color: UmVIPcheck allocation score on srv02: -1000000
native_color: UmVIPcheck allocation score on srv03: -1000000
native_color: UmVIPcheck allocation score on srv04: -1000000
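The score listing above can be reduced to the nodes on which a resource is still allowed to run. The following is a minimal sketch, assuming the numeric `ptest -L -s` line format shown in this message. The helper names native_scores and eligible_nodes are hypothetical.

```shell
#!/bin/sh
# Read `ptest -L -s` output on stdin and print "node score" for every
# native_color line that belongs to the given resource.
native_scores() {
    awk -v rsc="$1" '
        $1 == "native_color:" && $2 == rsc {
            node = $6
            sub(/:$/, "", node)      # strip the trailing colon from "srv01:"
            print node, $7
        }'
}

# Print only the nodes whose allocation score is not negative.
eligible_nodes() {
    native_scores "$1" | awk '$2 >= 0 { print $1 }'
}
```

On a live cluster this would be used as `ptest -L -s | eligible_nodes UmVIPcheck`. For the scores listed above, only srv01 has a non-negative score.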
After this, clnPingd does not start on srv01, but UMgroup01 does.
* Because of the colocation constraint, we did not expect UMgroup01 to start.
Is there an error in my configuration?
Or is it a bug?
Or is this the correct behavior?
I have attached the hb_report output together with the pengine directory of srv02.
Because the file was too large, I removed the information for srv01, srv03, and srv04 before attaching it.
Best Regards,
Hideo Yamauchi.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_report996.tar.gz
Type: application/x-gzip-compressed
Size: 256355 bytes
Desc: 2393974864-hb_report996.tar.gz
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100305/1bfaaa37/attachment-0002.bin>