[Pacemaker] Clones restart on node recovery

Andrew Beekhof andrew at beekhof.net
Thu Jun 10 03:24:00 EDT 2010


On Wed, Jun 9, 2010 at 6:46 AM, jraditchkov at gmail.com
<jraditchkov at gmail.com> wrote:
> Hi, hopefully someone can help. I have little experience with Pacemaker
> and may be doing something wrong.
>
> I have the following design:
>
> Two hardware nodes
> Some of the services are 100% redundant on both nodes, so we use clones
> for them; they are essential, and at least one instance of each must be
> running for the system to work.
> The rest of the services (HB, DP, WEB) must fail over from one node to
> the other and are configured as individual resources.
>
> The configuration mostly works:
> 1. we can start the cluster; it initializes OK and services start - OK
> 2. when we put node1 or node2 into standby, services fail over properly
> and within the clones only the instance on the active node keeps
> running - everything works great - OK
> 3. We experience a problem when we bring the standby node back online.
> For some reason all the clones restart themselves rather than only
> starting the stopped instance in each clone (although the order
> constraints are set to advisory).

Can you create a bug for this and include a hb_report of scenario 3 please?

>
> The system does not restart the resources that are not in clones,
> only the clone sets.
> In the logs we see that resources are shuffled between the host nodes
> for no apparent reason, which makes them restart:
>    Move resource MYSQL:1   (Started node01 -> node02)
>
> We believe it should only start the resource on the newly online node
> rather than restarting the clone and moving the resource to the other
> node.
>
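To pick the reshuffled clone instances out of a long policy-engine log, a small filter can help. The Python sketch below extracts every planned Move of a clone instance (a resource id of the form primitive:n). It assumes unwrapped, unquoted log lines as they appear in the original syslog file, the regular expression is written only against the excerpts in this thread, and `moved_clone_instances` is a hypothetical helper name.

```python
import re

# Matches pengine LogActions "Move" lines as they appear in this thread, e.g.
#   "... LogActions: Move resource MYSQL:1   (Started node01 -> node02)"
# Other Pacemaker versions may format these lines differently.
MOVE_RE = re.compile(
    r"LogActions: Move\s+resource\s+(?P<rsc>\S+)\s+"
    r"\(Started (?P<src>\S+) -> (?P<dst>\S+)\)"
)


def moved_clone_instances(log_text: str) -> list:
    """Return (instance, from_node, to_node) for every clone instance
    (a resource id containing ':') that the policy engine plans to move."""
    moves = []
    for match in MOVE_RE.finditer(log_text):
        rsc = match.group("rsc")
        if ":" in rsc:  # clone instances are named <primitive>:<n>
            moves.append((rsc, match.group("src"), match.group("dst")))
    return moves


sample = """
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start MYSQL:0   (node01)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource MYSQL:1   (Started node01 -> node02)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource HB       (Started node01)
"""

for rsc, src, dst in moved_clone_instances(sample):
    print(f"{rsc}: {src} -> {dst}")  # prints: MYSQL:1: node01 -> node02
```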
> Please help. Is there something we are doing wrong conceptually?
>
> Below are some debug logs of what is happening, as well as our config
> file at the bottom.
>
>
> I. INITIAL STATE - both machines are online (OK)
> ================================================================================
> ============
> Last updated: Wed Jun  9 02:55:36 2010
> Stack: openais
> Current DC: node01 - partition with quorum
> Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> 2 Nodes configured, 2 expected votes
> 8 Resources configured.
> ============
>
> Online: [ node01 node02 ]
>
> HB      (ocf::rs:MyRA):   Started node01
> DP      (ocf::rs:MyRA):   Started node01
> WEB     (ocf::rs:MyRA):   Started node01
>  Clone Set: MYSQL-CLONE
>     Started: [ node02 node01 ]
>  Clone Set: NDBD-CLONE
>     Started: [ node02 node01 ]
>  Clone Set: NDB_MGMD-CLONE
>     Started: [ node02 node01 ]
>  Clone Set: DN-CLONE
>     Started: [ node02 node01 ]
>  Clone Set: RS-CLONE
>     Started: [ node02 node01 ]
>
>
>
>
>
> II. BRING SECOND NODE TO STANDBY - standby node2 (OK)
> ================================================================================
>
> Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource HB       (Started node01)
> Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource DP       (Started node01)
> Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource WEB      (Started node01)
> Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop resource MYSQL:0   (node02)
> Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource MYSQL:1  (Started node01)
> Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop resource NDBD:0    (node02)
> Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource NDBD:1   (Started node01)
> Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop resource NDB_MGMD:0        (node02)
> Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource NDB_MGMD:1       (Started node01)
> Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop resource DN:0    (node02)
> Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource DN:1   (Started node01)
> Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop resource RS:0     (node02)
> Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource RS:1    (Started node01)
>
>
> ============
> Last updated: Wed Jun  9 04:20:28 2010
> Stack: openais
> Current DC: node01 - partition with quorum
> Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> 2 Nodes configured, 2 expected votes
> 8 Resources configured.
> ============
>
> Node node02: standby
> Online: [ node01 ]
>
> HB      (ocf::rs:MyRA):   Started node01
> DP      (ocf::rs:MyRA):   Started node01
> WEB     (ocf::rs:MyRA):   Started node01
>  Clone Set: MYSQL-CLONE
>     Started: [ node01 ]
>     Stopped: [ MYSQL:0 ]
>  Clone Set: NDBD-CLONE
>     Started: [ node01 ]
>     Stopped: [ NDBD:0 ]
>  Clone Set: NDB_MGMD-CLONE
>     Started: [ node01 ]
>     Stopped: [ NDB_MGMD:0 ]
>  Clone Set: DN-CLONE
>     Started: [ node01 ]
>     Stopped: [ DN:0 ]
>  Clone Set: RS-CLONE
>     Started: [ node01 ]
>     Stopped: [ RS:0 ]
>
>
> III. BRING SECOND NODE ONLINE - online node2
> ================================================================================
> Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource HB       (Started node01)
> Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource DP       (Started node01)
> Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource WEB      (Started node01)
> Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start MYSQL:0   (node01)
> Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource MYSQL:1   (Started node01 -> node02)
> Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start NDBD:0    (node01)
> Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource NDBD:1    (Started node01 -> node02)
> Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start NDB_MGMD:0        (node01)
> Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource NDB_MGMD:1        (Started node01 -> node02)
> Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start DN:0    (node01)
> Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource DN:1    (Started node01 -> node02)
> Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start RS:0     (node01)
> Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource RS:1     (Started node01 -> node02)
> Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource HB       (Started node01)
> Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource DP       (Started node01)
> Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource WEB      (Started node01)
> Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Start MYSQL:0   (node02)
> Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Start MYSQL:1   (node01)
> Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource NDBD:0    (Started node01 -> node02)
> Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource NDBD:1    (Started node02 -> node01)
> Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource NDB_MGMD:0        (Started node01 -> node02)
> Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource NDB_MGMD:1        (Started node02 -> node01)
> Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Start DN:0    (node02)
> Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Start DN:1    (node01)
> Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Stop resource RS:0     (node01)
> Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Recover resource RS:1  (Started node02)
> Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource HB       (Started node01)
> Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource DP       (Started node01)
> Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource WEB      (Started node01)
> Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Start MYSQL:0   (node02)
> Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Start MYSQL:1   (node01)
> Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource NDBD:0   (Started node02)
> Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource NDBD:1   (Started node01)
> Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource NDB_MGMD:0       (Started node02)
> Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource NDB_MGMD:1       (Started node01)
> Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Start DN:0    (node02)
> Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Start DN:1    (node01)
> Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource RS:0    (Stopped)
> Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Recover resource RS:1  (Started node02)
> Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource HB       (Started node01)
> Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource DP       (Started node01)
> Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource WEB      (Started node01)
> Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource MYSQL:0  (Started node02)
> Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource MYSQL:1  (Started node01)
> Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource NDBD:0   (Started node02)
> Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource NDBD:1   (Started node01)
> Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource NDB_MGMD:0       (Started node02)
> Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource NDB_MGMD:1       (Started node01)
> Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Start DN:0    (node02)
> Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Start DN:1    (node01)
> Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Start RS:0     (node02)
> Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource RS:1    (Stopped)
>
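The log in section III covers four policy-engine runs (04:23:33, 04:26:05, 04:28:17 and 04:28:26). A sketch that groups the LogActions lines by timestamp and counts each action type makes it easier to see which run planned the moves and recoveries. It assumes that all actions of one run share a single timestamp, which holds for these excerpts but is not guaranteed in general; `actions_per_transition` is a hypothetical name.

```python
import re
from collections import Counter, defaultdict

# Captures the syslog timestamp and the action keyword of a pengine line, e.g.
#   "Jun  9 04:26:05 ... LogActions: Recover resource RS:1  (Started node02)"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) .*LogActions: (?P<action>\w+)"
)


def actions_per_transition(log_text: str) -> dict:
    """Group LogActions lines by timestamp and count each action type.

    Assumes one policy-engine run logs all its actions with one timestamp."""
    summary = defaultdict(Counter)
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            summary[match.group("ts")][match.group("action")] += 1
    return dict(summary)


sample = """
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource HB       (Started node01)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start MYSQL:0   (node01)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource MYSQL:1   (Started node01 -> node02)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource NDBD:0    (Started node01 -> node02)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource NDBD:1    (Started node02 -> node01)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Recover resource RS:1  (Started node02)
"""

for ts, counts in actions_per_transition(sample).items():
    print(ts, dict(counts))
```

Fed the full section III log, the same function would show Move actions in the 04:23:33 and 04:26:05 runs and none afterwards.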
> ============
> Last updated: Wed Jun  9 04:31:27 2010
> Stack: openais
> Current DC: node01 - partition with quorum
> Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> 2 Nodes configured, 2 expected votes
> 8 Resources configured.
> ============
>
> Online: [ node01 node02 ]
>
> HB      (ocf::rs:MyRA):   Started node01
> DP      (ocf::rs:MyRA):   Started node01
> WEB     (ocf::rs:MyRA):   Started node01
>  Clone Set: MYSQL-CLONE
>     Started: [ node02 node01 ]
>  Clone Set: NDBD-CLONE
>     Started: [ node02 node01 ]
>  Clone Set: NDB_MGMD-CLONE
>     Started: [ node02 node01 ]
>  Clone Set: DN-CLONE
>     Started: [ node02 node01 ]
>  Clone Set: RS-CLONE
>     Started: [ node02 ]
>     Stopped: [ RS:1 ]
>
> Failed actions:
>    RS:0_start_0 (node=node01, call=29, rc=1, status=complete): unknown error
>    RS:1_monitor_10000 (node=node02, call=26, rc=-2, status=Timed Out): unknown exec error
>
>
>
>
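The failed-action keys follow the pattern resource_operation_interval, with the interval in milliseconds (RS:1_monitor_10000 is the 10-second monitor from the configuration), and the rc values are OCF resource-agent exit codes. The Python sketch below decodes both lines. The dataclass and function names are hypothetical, it assumes operation names contain no underscore, and an rc outside the OCF table (such as the -2 reported together with status=Timed Out) is deliberately left uninterpreted.

```python
import re
from dataclasses import dataclass

# Standard OCF resource-agent exit codes.
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
}

# Matches the crm_mon "Failed actions" lines quoted in this thread.
FAILED_RE = re.compile(
    r"^\s*(?P<key>\S+) \(node=(?P<node>[^,]+), call=(?P<call>-?\d+), "
    r"rc=(?P<rc>-?\d+), status=(?P<status>[^)]+)\): (?P<reason>.+)$"
)


@dataclass
class FailedAction:
    resource: str
    operation: str
    interval_ms: int
    node: str
    rc: int
    rc_name: str
    status: str
    reason: str


def parse_failed_action(line: str) -> FailedAction:
    match = FAILED_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised failed-action line: {line!r}")
    # Split from the right so underscores in ids such as NDB_MGMD survive.
    resource, operation, interval = match.group("key").rsplit("_", 2)
    rc = int(match.group("rc"))
    return FailedAction(
        resource=resource,
        operation=operation,
        interval_ms=int(interval),
        node=match.group("node"),
        rc=rc,
        rc_name=OCF_RC.get(rc, "not an OCF exit code (see status)"),
        status=match.group("status"),
        reason=match.group("reason"),
    )


lines = [
    "RS:0_start_0 (node=node01, call=29, rc=1, status=complete): unknown error",
    "RS:1_monitor_10000 (node=node02, call=26, rc=-2, status=Timed Out): unknown exec error",
]
for line in lines:
    print(parse_failed_action(line))
```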
> CONFIG FILE
> ==========================
> node node01
> node node02
>
> primitive HB ocf:rs:MyRA \
>        params veid="105" \
>        params proc="hbd" \
>        op start interval="0" timeout="120" \
>        op stop interval="0" timeout="120" \
>        op monitor interval="10" timeout="10" depth="0"
>
> primitive DP ocf:rs:MyRA \
>        params veid="106" \
>        params proc="rsd" \
>        op start interval="0" timeout="120" \
>        op stop interval="0" timeout="120" \
>        op monitor interval="10" timeout="10" depth="0"
>
> primitive MYSQL ocf:rs:MyRA \
>        params veid="108" \
>        params proc="mysqld" \
>        op start interval="0" timeout="120" \
>        op stop interval="0" timeout="120" \
>        op monitor interval="10" timeout="10" depth="0"
>
> primitive NDBD ocf:rs:MyRA \
>        params veid="102" \
>        params proc="ndbd" \
>        op start interval="0" timeout="7200" \
>        op stop interval="0" timeout="300" \
>        op monitor interval="10" timeout="10" depth="0"
>
> primitive NDB_MGMD ocf:rs:MyRA \
>        params veid="101" \
>        params proc="ndb_mgmd" \
>        op start interval="0" timeout="120" \
>        op stop interval="0" timeout="120" \
>        op monitor interval="10" timeout="10" depth="0"
>
> primitive DN ocf:rs:MyRA \
>        params veid="103" \
>        params proc="dnd" \
>        op start interval="0" timeout="120" \
>        op stop interval="0" timeout="120" \
>        op monitor interval="10" timeout="10" depth="0"
>
> primitive RS ocf:rs:MyRA \
>        params veid="104" \
>        params proc="rsd" \
>        op start interval="0" timeout="120" \
>        op stop interval="0" timeout="120" \
>        op monitor interval="10" timeout="10" depth="0"
>
> primitive WEB ocf:rs:MyRA \
>        params veid="107" \
>        params proc="httpd" \
>        op start interval="0" timeout="120" \
>        op stop interval="0" timeout="120" \
>        op monitor interval="10" timeout="10" depth="0"
>
> clone MYSQL-CLONE MYSQL \
>        meta interleave="true"
> clone NDBD-CLONE NDBD \
>        meta interleave="true"
> clone NDB_MGMD-CLONE NDB_MGMD \
>        meta interleave="true"
> clone DN-CLONE DN \
>        meta interleave="true"
> clone RS-CLONE RS \
>        meta interleave="true"
>
> location HB_LOC_1 HB 200: node01
> location DP_LOC_1 DP 200: node01
> location WEB_LOC_1 WEB 200: node01
> location HB_LOC_2 HB 100: node02
> location DP_LOC_2 DP 100: node02
> location WEB_LOC_2 WEB 100: node02
>
> order NDB_MGMD-CLONE_before_NDBD-CLONE advisory: NDB_MGMD-CLONE NDBD-CLONE
> order NDBD-CLONE_before_MYSQL-CLONE advisory: NDBD-CLONE MYSQL-CLONE
> order MYSQL_CLONE_before_HB advisory: MYSQL-CLONE HB
> order MYSQL-CLONE_before_DN-CLONE advisory: MYSQL-CLONE DN-CLONE
> order MYSQL-CLONE_before_WEB advisory: MYSQL-CLONE WEB
> order HB_before_DP advisory: HB DP
> order HB_before_RS-CLONE advisory: HB RS-CLONE
>
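The order constraints above form one dependency chain rooted at NDB_MGMD-CLONE. A topological sort, sketched below with Python's standard graphlib module, yields one start order that satisfies every (first, then) pair. Note that, as the Pacemaker documentation describes advisory (score 0) ordering, it only takes effect when both resources are starting or stopping in the same transition; the sketch shows the intended sequence and does not model that behaviour. `start_order` is a hypothetical name.

```python
from graphlib import TopologicalSorter

# The "order ... advisory: FIRST THEN" constraints from the configuration,
# written as (first, then) pairs.
ORDER_CONSTRAINTS = [
    ("NDB_MGMD-CLONE", "NDBD-CLONE"),
    ("NDBD-CLONE", "MYSQL-CLONE"),
    ("MYSQL-CLONE", "HB"),
    ("MYSQL-CLONE", "DN-CLONE"),
    ("MYSQL-CLONE", "WEB"),
    ("HB", "DP"),
    ("HB", "RS-CLONE"),
]


def start_order(constraints):
    """Return one start order satisfying every (first, then) pair.

    graphlib expects a mapping of node -> predecessors and raises
    graphlib.CycleError if the constraints contradict each other."""
    graph = {}
    for first, then in constraints:
        graph.setdefault(then, set()).add(first)
        graph.setdefault(first, set())
    return list(TopologicalSorter(graph).static_order())


print(start_order(ORDER_CONSTRAINTS))
```

Resources with no path between them (for example DN-CLONE and WEB) may appear in either relative order; only the listed pairs are guaranteed.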
> property $id="cib-bootstrap-options" \
>        dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="2" \
>        stonith-enabled="false" \
>        no-quorum-policy="ignore" \
>        last-lrm-refresh="1273876473"
>
> rsc_defaults $id="rsc-options" \
>        resource-stickiness="0"
>
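The location scores and the resource-stickiness default can be illustrated with a deliberately simplified scoring model: each online node gets the resource's location score, plus the stickiness value if the resource is already running there, and the highest score wins. This is not Pacemaker's allocation algorithm, which also weighs colocation, clone limits, failure counts and its own tie-breaking. The `LOCATION` table is copied from the configuration above; the function names are hypothetical.

```python
# Simplified placement model (NOT Pacemaker's real allocator): a node's score
# is the location-constraint score plus the stickiness bonus if the resource
# is already running on that node.

LOCATION = {  # from the location constraints in the configuration
    "HB": {"node01": 200, "node02": 100},
    "DP": {"node01": 200, "node02": 100},
    "WEB": {"node01": 200, "node02": 100},
}


def node_scores(resource, online_nodes, current_node, stickiness):
    scores = {}
    for node in online_nodes:
        score = LOCATION.get(resource, {}).get(node, 0)
        if node == current_node:
            score += stickiness
        scores[node] = score
    return scores


def preferred_node(resource, online_nodes, current_node, stickiness):
    """Return the single highest-scoring node, or None on a tie, i.e. when
    this model has no reason to prefer one node over another."""
    scores = node_scores(resource, online_nodes, current_node, stickiness)
    best = max(scores.values())
    winners = [node for node, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


both = ["node01", "node02"]
print(preferred_node("HB", both, "node01", 0))       # HB: 200 vs 100
print(preferred_node("MYSQL:1", both, "node01", 0))  # clone instance: 0 vs 0
print(preferred_node("MYSQL:1", both, "node01", 1))  # stickiness breaks the tie
```

With resource-stickiness="0", a clone instance that has no location constraints scores 0 on both nodes, so in this model nothing ties it to the node it is currently running on; a positive stickiness gives the current node the edge.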
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



