[Pacemaker] Pacemaker 1.1.8 and corosync's cpg service?

Mon May 20 21:15:56 EDT 2013

On 21/05/2013, at 7:45 AM, Mike Edwards <pf-pacemaker at mirkwood.net> wrote:

> I'm attempting to set up a test cluster consisting of two VMs on CentOS
> 6.4, but have run up against a wall with this fairly simple config.
> 
> ### start output ###
> # service corosync start
> Starting Corosync Cluster Engine (corosync):               [  OK  ]
> # pacemakerd start -V
> non-option ARGV-elements: start 
> Could not establish pacemakerd connection: Connection refused (111)
>    info: crm_ipc_connect: 	Could not establish pacemakerd connection: Connection refused (111)
>    info: config_find_next: 	Processing additional service options...
>    info: get_config_opt: 	Found 'pacemaker' for option: name
>    info: get_config_opt: 	Found '1' for option: ver
>    info: get_cluster_type: 	Detected an active 'classic openais (with plugin)' cluster
>    info: read_config: 	Reading configure for stack: classic openais (with plugin)
>    info: config_find_next: 	Processing additional service options...
>    info: get_config_opt: 	Found 'pacemaker' for option: name
>    info: get_config_opt: 	Found '1' for option: ver
>    info: get_config_opt: 	Defaulting to 'no' for option: use_logd
>    info: get_config_opt: 	Defaulting to 'no' for option: use_mgmtd
>    info: config_find_next: 	Processing additional logging options...
>    info: get_config_opt: 	Found 'on' for option: debug
>    info: get_config_opt: 	Found 'yes' for option: to_logfile
>    info: get_config_opt: 	Defaulting to '/var/log/pacemaker' for option: logfile
>    info: get_config_opt: 	Found 'yes' for option: to_syslog
>    info: get_config_opt: 	Defaulting to 'daemon' for option: syslog_facility
>  notice: crm_add_logfile: 	Additional logging available in /var/log/pacemaker
>  notice: main: 	Starting Pacemaker 1.1.8-7.el6 (Build: 394e906):  generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc  corosync-plugin cman
>    info: main: 	Maximum core file size is: 18446744073709551615
>    info: qb_ipcs_us_publish: 	server name: pacemakerd
>   debug: cluster_connect_cfg: 	Our nodeid: 840370698
>   debug: cluster_connect_cpg: 	Our nodeid: 840370698
>   debug: cluster_connect_cpg: 	Retrying operation after 1s
>   debug: cluster_connect_cpg: 	Retrying operation after 2s
>   debug: cluster_connect_cpg: 	Retrying operation after 3s
>   debug: cluster_connect_cpg: 	Retrying operation after 4s
>   debug: cluster_connect_cpg: 	Retrying operation after 5s
>   debug: cluster_connect_cpg: 	Retrying operation after 6s
>   debug: cluster_connect_cpg: 	Retrying operation after 7s
>   debug: cluster_connect_cpg: 	Retrying operation after 8s
>   debug: cluster_connect_cpg: 	Retrying operation after 9s
>   debug: cluster_connect_cpg: 	Retrying operation after 10s
>   debug: cluster_connect_cpg: 	Retrying operation after 11s
>   debug: cluster_connect_cpg: 	Retrying operation after 12s
>   debug: cluster_connect_cpg: 	Retrying operation after 13s
>   debug: cluster_connect_cpg: 	Retrying operation after 14s
>   debug: cluster_connect_cpg: 	Retrying operation after 15s
>   debug: cluster_connect_cpg: 	Retrying operation after 16s
>   debug: cluster_connect_cpg: 	Retrying operation after 17s
>   debug: cluster_connect_cpg: 	Retrying operation after 18s
>   debug: cluster_connect_cpg: 	Retrying operation after 19s
>   debug: cluster_connect_cpg: 	Retrying operation after 20s
>   debug: cluster_connect_cpg: 	Retrying operation after 21s
>   debug: cluster_connect_cpg: 	Retrying operation after 22s
>   debug: cluster_connect_cpg: 	Retrying operation after 23s
>   debug: cluster_connect_cpg: 	Retrying operation after 24s
>   debug: cluster_connect_cpg: 	Retrying operation after 25s
>   debug: cluster_connect_cpg: 	Retrying operation after 26s
>   debug: cluster_connect_cpg: 	Retrying operation after 27s
>   debug: cluster_connect_cpg: 	Retrying operation after 28s
>   debug: cluster_connect_cpg: 	Retrying operation after 29s
>   debug: cluster_connect_cpg: 	Retrying operation after 30s
>   error: cluster_connect_cpg: 	Could not join the CPG group 'pacemakerd': 6

cpg_join() is returning CS_ERR_TRY_AGAIN (error code 6) here.

Jan: Any idea why this might happen?  That's a fair time to be blocked for.
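
To put a number on "a fair time", here is a minimal sketch of the retry pattern the log suggests. It is not Pacemaker's actual code. It assumes that 6 is the value of CS_ERR_TRY_AGAIN in corosync's cs_error_t, and that each "Retrying operation after Ns" line means the caller slept N seconds before the next attempt. The names join_with_backoff and always_busy are hypothetical, and the join callable stands in for cpg_join().

```python
import time

# Assumed values from corosync's cs_error_t (only the two needed here).
CS_OK = 1
CS_ERR_TRY_AGAIN = 6


def join_with_backoff(join, max_retries=30, sleep=time.sleep):
    """Call join() until it stops returning CS_ERR_TRY_AGAIN.

    Before retry number N the caller waits N seconds (linear backoff).
    Returns (last return code, total seconds spent waiting).
    """
    waited = 0
    attempt = 0
    rc = join()
    while rc == CS_ERR_TRY_AGAIN and attempt < max_retries:
        attempt += 1
        sleep(attempt)       # wait 1s, then 2s, then 3s, ...
        waited += attempt
        rc = join()
    return rc, waited


def always_busy():
    """Hypothetical stand-in for a cpg_join() that never succeeds."""
    return CS_ERR_TRY_AGAIN


if __name__ == "__main__":
    # Pass a no-op sleep so the example finishes immediately.
    rc, waited = join_with_backoff(always_busy, sleep=lambda s: None)
    # 1 + 2 + ... + 30 = 465 seconds of waiting before giving up.
    print(rc, waited)
```

Under those assumptions, the 30 retries in the log add up to 465 seconds (just under eight minutes) of waiting before the "Could not join the CPG group" error is reported.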

>   error: main: 	Couldn't connect to Corosync's CPG service
> ### end output ###
> 
> 
> Any ideas?  My configs are below.
> 
> 
> 
> corosync.conf:
> totem {
> version: 2
> secauth: off
> cluster_name: pdtest
> transport: udpu
>        interface {
>                ringnumber: 0
>                bindnetaddr: 10.10.23.50
>                mcastport: 5405
>                broadcast: yes
>                ttl: 1
>        }
> }
> 
> nodelist {
>  node {
>        ring0_addr: 10.10.23.50
>        nodeid: 1
>       }
>  node {
>        ring0_addr: 10.10.23.51
>        nodeid: 2
>       }
> }
> 
> quorum {
>  provider: corosync_votequorum
> }
> 
> logging {
>  debug: on
>  timestamp: on
>  to_logfile: yes
>  to_syslog: yes
> }
> 
> 
> service.d/pacemaker:
> service {
>        # Load the Pacemaker Cluster Resource Manager
>        name: pacemaker
>        ver:  1
> }
> 
> 
> -- 
> 
> Mike Edwards                    |   If this email address disappears,   
> Unsolicited advertisements to   |   assume it was spammed to death.  To
> this address are not welcome.   |   reach me in that case, s/-.*@/@/
> 
> "Our progress as a nation can be no swifter than our progress in education.
> The human mind is our fundamental resource."
>  -- John F. Kennedy
> 