[Pacemaker] [Problem] The resource cannot be moved.

Wed Dec 1 21:21:18 EST 2010

Hi Andrew,

I am sending a patch for 1.1.

Mr. Mori will do the backport to 1.0.

Best Regards,
Hideo Yamauchi.

--- renayama19661014 at ybb.ne.jp wrote:

> Hi Andrew,
> 
> > > Can this revision also be applied to 1.0?
> > > Or is that impossible because it would affect something else?
> > 
> > I have no objection to it being added to 1.0, it should be safe.
> 
> Thanks.
> 
> For 1.0, I will ask Mr. Mori to do the backport.
> Will you apply the revision to 1.1?
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> --- Andrew Beekhof <andrew at beekhof.net> wrote:
> 
> > On Mon, Nov 29, 2010 at 5:11 AM,  <renayama19661014 at ybb.ne.jp> wrote:
> > > Hi Andrew,
> > >
> > > Sorry for my late response.
> > >
> > >> I think the smartest thing to do here is drop the cib_scope_local flag from -f
> > >
> > >     if(do_force) {
> > >         crm_debug("Forcing...");
> > > /*      cib_options |= cib_scope_local|cib_quorum_override; */
> > >         cib_options |= cib_quorum_override;
> > >     }
> > >
> > >
> > > I tested the move with this revision.
> > > The resource now moves correctly.
> > >
> > > Can this revision also be applied to 1.0?
> > > Or is that impossible because it would affect something else?
> > 
> > I have no objection to it being added to 1.0, it should be safe.
> > 
> > >
> > > Best Regards,
> > > Hideo Yamauchi.
> > >
> > > --- Andrew Beekhof <andrew at beekhof.net> wrote:
> > >
> > >> 2010/11/8 <renayama19661014 at ybb.ne.jp>:
> > >> > Hi,
> > >> >
> > >> > In a simple two-node configuration, a resource failed (monitor error).
> > >> >
> > >> > ============
> > >> > Last updated: Mon Nov  8 10:16:50 2010
> > >> > Stack: Heartbeat
> > >> > Current DC: srv02 (f80f87fd-cc09-43c7-80bc-8d9e96de376b) - partition WITHOUT quorum
> > >> > Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
> > >> > 2 Nodes configured, unknown expected votes
> > >> > 1 Resources configured.
> > >> > ============
> > >> >
> > >> > Online: [ srv01 srv02 ]
> > >> >
> > >> >  Resource Group: grpDummy
> > >> >      prmDummy1-1        (ocf::heartbeat:Dummy): Started srv02
> > >> >      prmDummy1-2        (ocf::heartbeat:Dummy): Started srv02
> > >> >      prmDummy1-3        (ocf::heartbeat:Dummy): Started srv02
> > >> >      prmDummy1-4        (ocf::heartbeat:Dummy): Started srv02
> > >> >
> > >> > Migration summary:
> > >> > * Node srv02:
> > >> > * Node srv01:
> > >> >    prmDummy1-1: migration-threshold=1 fail-count=1
> > >> >
> > >> > Failed actions:
> > >> >     prmDummy1-1_monitor_30000 (node=srv01, call=7, rc=7, status=complete): not running
> > >> >
> > >> >
> > >> > After the resource failed over, I ran the following commands in succession:
> > >> >
> > >> > [root at srv01 ~]# crm_resource -C -r prmDummy1-1 -N srv01;crm_resource -M -r grpDummy -N srv01 -f -Q
> > >> >
> > >> > ============
> > >> > Last updated: Mon Nov  8 10:17:33 2010
> > >> > Stack: Heartbeat
> > >> > Current DC: srv02 (f80f87fd-cc09-43c7-80bc-8d9e96de376b) - partition WITHOUT quorum
> > >> > Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
> > >> > 2 Nodes configured, unknown expected votes
> > >> > 1 Resources configured.
> > >> > ============
> > >> >
> > >> > Online: [ srv01 srv02 ]
> > >> >
> > >> >  Resource Group: grpDummy
> > >> >      prmDummy1-1        (ocf::heartbeat:Dummy): Started srv02
> > >> >      prmDummy1-2        (ocf::heartbeat:Dummy): Started srv02
> > >> >      prmDummy1-3        (ocf::heartbeat:Dummy): Started srv02
> > >> >      prmDummy1-4        (ocf::heartbeat:Dummy): Started srv02
> > >> >
> > >> > Migration summary:
> > >> > * Node srv02:
> > >> > * Node srv01:
> > >> >
> > >> > However, the resource does not move to srv01.
> > >> >
> > >> > Do I have to wait for the S_IDLE state before running "crm_resource -M"?
> > >> >
> > >> > Or is this a bug?
> > >> >
> > >> >  * I have attached the hb_report output.
> > >>
> > >> So the problem here is that -f not only enables logic in
> > >> move_resource(), but also sets
> > >>
> > >>               cib_options |= cib_scope_local|cib_quorum_override;
> > >>
> > >> Combined with the fact that crm_resource -C is not synchronous in 1.0,
> > >> if you run -M on a non-DC node, the updates hit the local cib while
> > >> the cluster is re-probing the resource(s).
> > >> This results in the two CIBs getting out of sync:
> > >> Nov  8 10:17:15 srv01 crmd: [5367]: WARN: cib_native_callback: CIB command failed: Application of an update diff failed
> > >> Nov  8 10:17:15 srv01 crmd: [5367]: WARN: cib_native_callback: CIB command failed: Application of an update diff failed
> > >>
> > >> and the process of re-syncing them results in the behavior you saw.
> > >>
> > >> I think the smartest thing to do here is drop the cib_scope_local flag from -f
> > >>
> > >> _______________________________________________
> > >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >>
> > >> Project Home: http://www.clusterlabs.org
> > >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> > >>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: crm_resource.1343.patch
Type: application/octet-stream
Size: 372 bytes
Desc: 313313248-crm_resource.1343.patch
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20101202/4b195b4a/attachment-0003.obj>