[Pacemaker] Openais Question problem

Andrew Beekhof andrew at beekhof.net
Tue Dec 1 04:37:56 EST 2009


On Mon, Nov 30, 2009 at 10:17 PM, Emil Popov <epopov at postpath.com> wrote:
> Hi.
>
> I currently have a 26-node OpenAIS cluster.
>
> When a node joins the cluster (that is, when I run /etc/init.d/openais
> start on it), the following error appears on the DC node:
>
> do_cib_control: Could not connect to the CIB service: connection failed
>
> Nov 30 21:17:00 pp0100pun021 crmd: [27990]: WARN: do_cib_control: Couldn't
> complete CIB registration 17 times... pause and retry
>
> Basically, the server holding the DC role starts timing out when talking
> to the cib… the whole cluster then loses control.
>
> This appears to happen whenever a node goes from the offline state to
> standby or online.

You could try setting batch-limit to a lower value (the default is 30).
This throttles the number of actions that are performed in parallel
(and therefore the amount of work the cib has to do at any one time).

This option is particularly relevant when nodes start up, since we
have to check the status of all known resources on the new host.
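
As a rough sketch of what lowering batch-limit could look like from the
command line: the value 10 below is only an illustrative guess, not a
recommendation, and the exact option spelling may differ between
Pacemaker versions, so check crm_attribute --help on your install first.

```shell
# Show the current cluster-wide batch-limit (may report nothing if it
# has never been set explicitly and the default of 30 is in effect).
crm_attribute -t crm_config -n batch-limit -G

# Lower it. 10 is an assumed example value; tune it for your cluster.
crm_attribute -t crm_config -n batch-limit -v 10

# Alternative, assuming the crm shell is installed:
# crm configure property batch-limit=10
```

Since this is a cluster-wide option stored in the CIB, it only needs to
be set once, from any node that is currently part of the cluster.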


What version of pacemaker do you have?

>
> We are using: openais-0.80.5-13.1
>
> Has anyone observed similar behavior?
>
> Multicast has been configured on the switch as specified in the OpenAIS
> guide.
>
> Any advice is greatly appreciated.
>
> Regards
>
> Emil