[Pacemaker] [Problem] Problems with the combination of Pacemaker and corosync 1.2.7

Mon Aug 2 03:19:58 EDT 2010

On Mon, Aug 2, 2010 at 3:17 AM,  <renayama19661014 at ybb.ne.jp> wrote:
> Hi,
>
> I checked the behavior of corosync 1.2.7 combined with Pacemaker.
>
> The combination is as follows.
>
>  * corosync 1.2.7
>  * Pacemaker-1-0-74392a28b7f3.tar
>  * Cluster-Resource-Agents-bfcc4e050a07.tar
>  * Reusable-Cluster-Components-8286b46c91e3.tar
>
>
> I ran the following checks on two-node clusters in two environments: virtual machines
> (RHEL5.5 x86) and real machines (RHEL5.5 x64).
> No resources were configured.
>
> 1) Start and stop corosync alone, and confirm that no node hangs.
> 2) Start and stop Pacemaker together with corosync, and confirm that no node hangs.
>
> I repeated each check 20 times in each environment (x86 and x64).
>
> Unfortunately, the following problems occurred.
>  * This time, no problem occurred when starting or stopping corosync alone.
>
> Problem 1) When starting on the virtual machines, a virtual machine sometimes hangs.
>            As with an earlier problem, CPU usage reaches nearly 100%.
>
> Problem 2) In some cases, the nodes failed to form a cluster after startup.
>
> Problem 3) In some cases, the cib and attrd processes failed to start.
>
> Jul 30 14:25:46 x3650g attrd: [26258]: ERROR: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> Jul 30 14:25:46 x3650g attrd: [26258]: ERROR: ais_dispatch: AIS connection failed
> Jul 30 14:25:46 x3650g cib: [26256]: ERROR: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> Jul 30 14:25:46 x3650g cib: [26256]: ERROR: ais_dispatch: AIS connection failed
> Jul 30 14:25:46 x3650g attrd: [26258]: CRIT: attrd_ais_destroy: Lost connection to OpenAIS service!
> Jul 30 14:25:46 x3650g cib: [26256]: ERROR: cib_ais_destroy: AIS connection terminated
> Jul 30 14:25:46 x3650g attrd: [26258]: info: main: Exiting...
> Jul 30 14:25:46 x3650g attrd: [26258]: ERROR: attrd_cib_connection_destroy: Connection to the CIB terminated...
> Jul 30 14:25:46 x3650g stonithd: [26255]: ERROR: ais_dispatch: Receiving message body failed: (2) Library error: Success (0)
> Jul 30 14:25:46 x3650g stonithd: [26255]: ERROR: ais_dispatch
>
> Can these problems be solved with Pacemaker 1.0 and corosync 1.2.7?
>
> I know that work has begun in the new Pacemaker to replace its communication layer with CPG.
> When using Pacemaker with corosync, should we wait until the CPG rework is finished?
> (Should we wait for the Pacemaker 1.1 series?)

No need to wait; the current tip of Pacemaker 1.1 is perfectly stable
(and included in RHEL 6.0).
Almost all the testing has been done for 1.1.3; I've just been busy
helping out with some other projects at Red Hat and haven't had time
to do the actual release.

To make use of CPG-based communication, remove the "service" section
for pacemaker from corosync.conf and instead run:
   service pacemaker start
after starting corosync.
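For reference, the pacemaker "service" section in a corosync 1.x corosync.conf usually looks something like the sketch below. This is an assumed example based on common plugin-based setups, not copied from any particular system; the exact contents (in particular the "ver" value) may differ in your file.

```
# Assumed example of the plugin section to remove from corosync.conf
service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver:  0
}
```

With that section gone, the start order is the one given above: corosync first, then pacemaker.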

Once the 1.1.3 packages are out, this will be the official advice for
anyone experiencing startup/shutdown issues when using Pacemaker with
Corosync.
Calling fork() in a multi-threaded environment (corosync) is just far
too problematic.

>
> Because the logs are large, I will follow up after filing this problem in Bugzilla.
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>