[Pacemaker] [Problem] The cluster fails to stop a node.

Thu Mar 29 19:55:54 EDT 2012

Hi Andrew,

> This appears to be resolved with 1.1.7, perhaps look for a patch to backport?

I will check the behavior with Pacemaker 1.1.7.
I will also discuss the backport with Mr. Mori.

Best Regards,
Hideo Yamauchi.

--- On Thu, 2012/3/29, Andrew Beekhof <andrew at beekhof.net> wrote:

> This appears to be resolved with 1.1.7, perhaps look for a patch to backport?
> 
> On Tue, Mar 27, 2012 at 4:46 PM,  <renayama19661014 at ybb.ne.jp> wrote:
> > Hi All,
> >
> > When we configure a group resource inside a Master/Slave resource, we found that a node cannot be stopped.
> >
> > This problem occurs in Pacemaker 1.0.11.
> >
> > We reproduced the problem with the following procedure.
> >
> > Step1) Start all nodes.
> >
> > ============
> > Last updated: Tue Mar 27 14:35:16 2012
> > Stack: Heartbeat
> > Current DC: test2 (b645c456-af78-429e-a40a-279ed063b97d) - partition WITHOUT quorum
> > Version: 1.0.12-unknown
> > 2 Nodes configured, unknown expected votes
> > 4 Resources configured.
> > ============
> >
> > Online: [ test1 test2 ]
> >
> >  Master/Slave Set: msGroup01
> >     Masters: [ test1 ]
> >     Slaves: [ test2 ]
> >  Resource Group: testGroup
> >     prmDummy1  (ocf::pacemaker:Dummy): Started test1
> >     prmDummy2  (ocf::pacemaker:Dummy): Started test1
> >  Resource Group: grpStonith1
> >     prmStonithN1       (stonith:external/ssh): Started test2
> >  Resource Group: grpStonith2
> >     prmStonithN2       (stonith:external/ssh): Started test1
> >
> > Migration summary:
> > * Node test2:
> > * Node test1:
> >
> > Step2) Stop the Slave node.
> >
> > [root@test2 ~]# service heartbeat stop
> > Stopping High-Availability services: Done.
> >
> > Step3) Stop the Master node. However, the Master node goes into a loop and does not stop.
> >
> > (snip)
> > Mar 27 14:38:06 test1 crmd: [21443]: WARN: run_graph: Transition 3 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=23, Source=/var/lib/pengine/pe-input-3.bz2): Terminated
> > Mar 27 14:38:06 test1 crmd: [21443]: ERROR: te_graph_trigger: Transition failed: terminated
> > Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_graph: Graph 3 (30 actions in 30 synapses): batch-limit=30 jobs, network-delay=60000ms
> > Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_graph: Synapse 0 is pending (priority: 0)
> > Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:     [Action 12]: Pending (id: testMsGroup01:0_stop_0, type: pseduo, priority: 0)
> > Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 14]: Completed (id: testMsGroup01:0_demote_0, type: pseduo, priority: 0)
> > Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 32]: Pending (id: msGroup01_stop_0, type: pseduo, priority: 0)
> > Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_graph: Synapse 1 is pending (priority: 0)
> > Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:     [Action 13]: Pending (id: testMsGroup01:0_stopped_0, type: pseduo, priority: 0)
> > Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 8]: Pending (id: prmStateful1:0_stop_0, loc: test1, priority: 0)
> > Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 9]: Pending (id: prmStateful2:0_stop_0, loc: test1, priority: 0)
> > Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 12]: Pending (id: testMsGroup01:0_stop_0, type: pseduo, priority: 0)
> > Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_graph: Synapse 2 was confirmed (priority: 0)
> > (snip)
> >
> > I have attached the hb_report data.
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
>
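
For reference, the pending actions in the crmd log quoted above can be listed with a short script. This is only a minimal sketch and not part of Pacemaker: the function name pending_inputs and the regular expression are assumptions derived solely from the print_elem lines shown in this thread, and other Pacemaker versions may format these lines differently.

```python
import re

# Assumed format, taken from the print_elem lines quoted in this thread:
#   print_elem:     [Action 12]: Pending (id: testMsGroup01:0_stop_0, ...)
#   print_elem:      * [Input 32]: Pending (id: msGroup01_stop_0, ...)
ELEM_RE = re.compile(
    r"print_elem:\s+(?:\*\s+)?\[(?P<kind>Action|Input) \d+\]: "
    r"(?P<state>\w+) \(id: (?P<id>[^,]+),"
)


def pending_inputs(log_lines):
    """Map each pending action id to the ids of its inputs that are still pending."""
    blocked = {}
    current = None  # id of the pending action whose inputs are being read
    for line in log_lines:
        m = ELEM_RE.search(line)
        if not m:
            continue
        if m.group("kind") == "Action":
            # A new action starts; track it only if it is still pending.
            current = m.group("id") if m.group("state") == "Pending" else None
            if current is not None:
                blocked.setdefault(current, [])
        elif current is not None and m.group("state") == "Pending":
            blocked[current].append(m.group("id"))
    return blocked


# Sample lines copied from the log in this thread (timestamp prefix shortened).
SAMPLE = [
    "crmd: [21443]: WARN: print_elem:     [Action 12]: Pending (id: testMsGroup01:0_stop_0, type: pseduo, priority: 0)",
    "crmd: [21443]: WARN: print_elem:      * [Input 14]: Completed (id: testMsGroup01:0_demote_0, type: pseduo, priority: 0)",
    "crmd: [21443]: WARN: print_elem:      * [Input 32]: Pending (id: msGroup01_stop_0, type: pseduo, priority: 0)",
]

if __name__ == "__main__":
    for action, inputs in pending_inputs(SAMPLE).items():
        print(action, "is waiting for:", ", ".join(inputs))
```

Applied to the lines above, the sketch reports that testMsGroup01:0_stop_0 is still waiting for msGroup01_stop_0, which matches what the log itself shows for Synapse 0. It only summarizes the log and does not explain why the transition terminated.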