[Pacemaker] Corosync / Pacemaker Cluster crashing

Dan Frincu df.cluster at gmail.com
Fri Apr 20 06:30:16 EDT 2012


Hi,

On Fri, Apr 20, 2012 at 1:08 PM, Bensch, Kobus
<kobus.bensch at bauerservices.co.uk> wrote:
> Hi
>
> I have the following cluster setup:
>
> 2 physical Dell servers with RHEL6.2 with all the latest patches.
>
> Each server has 3 network connections that look like this:
>
> BOND0 2 NIC's
>
> ETH4 for Corosync
> ETH6 for corosync
>
> This is the corosync config:
> corosync.conf
> aisexec {
> group: root
> user: root
> }
>
> compatibility: whitetank
> service {
> use_mgmtd: yes
> use_logd: yes
> ver: 0
> name: pacemaker
> }
> totem {
> rrp_mode: active
> join: 180
> max_messages: 20
> vsftype: none
> token: 5000
> consensus: 6000
> secauth: on
> token_retransmits_before_loss_const: 10
> threads: 0
> #threads: 16
> version: 2
> interface {
> bindnetaddr: 10.255.1.0
> mcastaddr: 232.10.1.1
> mcastport: 5405
> ringnumber: 0
> ttl: 1
> }
> interface {
> bindnetaddr: 10.255.2.0
> mcastaddr: 232.10.2.1
> mcastport: 5405
> ringnumber: 1
> ttl: 1
> }
> clear_node_high_bit: yes
> }
> logging {
> to_logfile: yes
> to_syslog: yes
> debug: off
> timestamp: on
> logfile: /var/log/cluster/corosync.log
> to_stderr: no
> fileline: off
> syslog_facility: daemon
> }
> amf {
> mode: disabled
> }
>
> The pacemaker plugin:
> /etc/corosync/service.d/pcmk
> service {
>         # Load the Pacemaker Cluster Resource Manager
>         name: pacemaker
>         ver:  1
> }
>
> Corosync keeps crashing when I try to do anything in the crm cli. Whether it
> is moving resources, creating resources, it does not matter.
>
> The Pacemaker (crm) config for now is very simple and looks like this:
> node lxdcv01nd01
> node lxdcv01nd02
> primitive lcdcv01 ocf:heartbeat:IPaddr2 \
> params ip="10.1.0.95" cidr_netmask="32" \
> op monitor interval="30s"
> primitive local-manage ocf:heartbeat:IPaddr2 \
> params ip="127.0.2.1" cidr_netmask="32" \
> op monitor interval="30s"
> location cli-prefer-lcdcv01 lcdcv01 \
> rule $id="cli-prefer-rule-lcdcv01" inf: #uname eq lxdcv01nd02
> location cli-prefer-local-manage local-manage \
> rule $id="cli-prefer-rule-local-manage" inf: #uname eq lxdcv01nd02
> property $id="cib-bootstrap-options" \
> dc-version="1.0.12-unknown" \

First glitch in the matrix: what version of Pacemaker are you running?
1.0.12-unknown seems fishy (self-compiled, maybe?).
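
To make the point about dc-version concrete, here is a minimal Python
sketch that splits the value into a release and a build id. The helper
names are hypothetical, and treating a build id of "unknown" as a sign
of a self-compiled build is only a heuristic assumption, not something
Pacemaker itself documents.

```python
import re


def split_dc_version(value):
    """Split a dc-version string such as '1.0.12-unknown' into
    (release, build_id). Raises ValueError on an unexpected layout."""
    m = re.fullmatch(r"(\d+(?:\.\d+)*)-(.+)", value.strip())
    if m is None:
        raise ValueError(f"unrecognised dc-version: {value!r}")
    return m.group(1), m.group(2)


def looks_self_built(value):
    """Heuristic (assumption): a build id of 'unknown' suggests the
    binaries did not come from a normal package build."""
    _, build_id = split_dc_version(value)
    return build_id == "unknown"


if __name__ == "__main__":
    # Value taken from the cib-bootstrap-options quoted above.
    print(split_dc_version("1.0.12-unknown"))
    print(looks_self_built("1.0.12-unknown"))
```

The "CRM Hg Version: unknown" line further down in the log points the
same way.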

> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore"
>
> I tried to disable various config lines but still no joy. Any help would be
> appreciated.
>
> When the server crashes I get this in the log:
> Apr 20 10:54:17 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:17 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:17 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:18 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:18 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:18 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:18 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:19 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:19 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:19 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:19 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:20 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:20 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:20 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:20 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:21 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:21 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:21 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:21 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:22 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:22 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [21259]: ERROR:
> ais_dispatch: Receiving message body failed: (2) Library error: Resource
> temporarily unavailable (11)
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [21254]:
> ERROR: ais_dispatch: Receiving message body failed: (2) Library error:
> Invalid argument (22)
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [21257]: ERROR:
> ais_dispatch: Receiving message body failed: (2) Library error: Resource
> temporarily unavailable (11)
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group cib: [21255]: ERROR:
> ais_dispatch: Receiving message body failed: (2) Library error: Resource
> temporarily unavailable (11)
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [21259]: ERROR:
> ais_dispatch: AIS connection failed
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [21254]:
> ERROR: ais_dispatch: AIS connection failed
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [21257]: ERROR:
> ais_dispatch: AIS connection failed
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [21259]: ERROR:
> crm_ais_destroy: AIS connection terminated
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group cib: [21255]: ERROR:
> ais_dispatch: AIS connection failed
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [21254]:
> ERROR: AIS connection terminated
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [21257]: CRIT:
> attrd_ais_destroy: Lost connection to OpenAIS service!
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group cib: [21255]: ERROR:
> cib_ais_destroy: AIS connection terminated
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [21257]: info:
> main: Exiting...
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [21257]: ERROR:
> attrd_cib_connection_destroy: Connection to the CIB terminated...
> Apr 20 10:54:36 corosync [MAIN  ] Corosync Cluster Engine ('1.2.7'): started
> and ready to provide service.

Corosync 1.2.7 and rrp_mode active don't work well together. I say
"well" only to be on the safe side; I think they don't work at all.

IIRC, RHEL 6.2 ships Corosync 1.4.x, which raises the question: where
did you get your packages from?

Are you running on Oracle's RHEL clone by any chance?
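
As a rough way to spot this combination on other nodes, the check can
be sketched in Python: pull the version out of the "Corosync Cluster
Engine" startup line and rrp_mode out of corosync.conf, then flag an
old version with redundant rings enabled. The function names are
hypothetical, the (1, 4) threshold is only my recollection from above,
and falling back to "none" when rrp_mode is absent is an assumption.

```python
import re

# Matches the startup line, e.g.
# "corosync [MAIN  ] Corosync Cluster Engine ('1.2.7'): started"
STARTUP_RE = re.compile(r"Corosync Cluster Engine \('(\d+(?:\.\d+)*)'\)")
# Matches an uncommented "rrp_mode: <value>" line in corosync.conf.
RRP_RE = re.compile(r"^\s*rrp_mode:\s*(\S+)", re.MULTILINE)


def corosync_version(log_text):
    """Return the version from the first startup line as a tuple of
    ints, or None if no startup line is present."""
    m = STARTUP_RE.search(log_text)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


def rrp_mode(conf_text):
    """Return the configured rrp_mode, or 'none' if it is not set
    (assumed default)."""
    m = RRP_RE.search(conf_text)
    return m.group(1) if m else "none"


def risky_rrp(log_text, conf_text, minimum=(1, 4)):
    """True when redundant rings are enabled on a Corosync older than
    `minimum` (threshold is an assumption, see the note above)."""
    version = corosync_version(log_text)
    return (version is not None
            and version < minimum
            and rrp_mode(conf_text) != "none")


if __name__ == "__main__":
    log = ("Apr 20 10:54:36 corosync [MAIN  ] Corosync Cluster Engine "
           "('1.2.7'): started and ready to provide service.")
    conf = "totem {\n    rrp_mode: active\n}\n"
    print(corosync_version(log), rrp_mode(conf), risky_rrp(log, conf))
```

Note that the regex expects plain config text, so strip the "> " quote
prefix first if you feed it lines copied from this mail.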

> Apr 20 10:54:36 corosync [MAIN  ] Corosync built-in features: nss rdma
> Apr 20 10:54:36 corosync [MAIN  ] Successfully read main configuration file
> '/etc/corosync/corosync.conf'.
> Apr 20 10:54:36 corosync [TOTEM ] Initializing transport (UDP/IP).
> Apr 20 10:54:36 corosync [TOTEM ] Initializing transmit/receive security:
> libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Apr 20 10:54:36 corosync [TOTEM ] Initializing transport (UDP/IP).
> Apr 20 10:54:36 corosync [TOTEM ] Initializing transmit/receive security:
> libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Apr 20 10:54:36 corosync [TOTEM ] The network interface [10.255.1.1] is now
> up.
> Apr 20 10:54:36 corosync [pcmk  ] info: process_ais_conf: Reading configure
> Set r/w permissions for uid=0, gid=0 on /var/log/cluster/corosync.log
> Apr 20 10:54:36 corosync [pcmk  ] info: config_find_init: Local handle:
> 5650605097994944514 for logging
> Apr 20 10:54:36 corosync [pcmk  ] info: config_find_next: Processing
> additional logging options...
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found 'off' for
> option: debug
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found 'yes' for
> option: to_logfile
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found
> '/var/log/cluster/corosync.log' for option: logfile
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found 'yes' for
> option: to_syslog
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found 'daemon' for
> option: syslog_facility
> Apr 20 10:54:36 corosync [pcmk  ] info: config_find_init: Local handle:
> 2730409743423111171 for service
> Apr 20 10:54:36 corosync [pcmk  ] info: config_find_next: Processing
> additional service options...
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Defaulting to 'pcmk'
> for option: clustername
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found 'yes' for
> option: use_logd
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found 'yes' for
> option: use_mgmtd
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
> Apr 20 10:54:36 corosync [pcmk  ] Logging: Initialized pcmk_startup
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_startup: Maximum core file size
> is: 18446744073709551615
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_startup: Service: 9
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_startup: Local hostname:
> lxdcv01nd01.bauer-uk.bauermedia.group
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_update_nodeid: Local node id:
> 16908042
> Apr 20 10:54:36 corosync [pcmk  ] info: update_member: Creating entry for
> node 16908042 born on 0
> Apr 20 10:54:36 corosync [pcmk  ] info: update_member: 0x18db8e0 Node
> 16908042 now known as lxdcv01nd01.bauer-uk.bauermedia.group (was: (null))
> Apr 20 10:54:36 corosync [pcmk  ] info: update_member: Node
> lxdcv01nd01.bauer-uk.bauermedia.group now has 1 quorum votes (was 0)
> Apr 20 10:54:36 corosync [pcmk  ] info: update_member: Node
> 16908042/lxdcv01nd01.bauer-uk.bauermedia.group is now: member
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22445 for
> process stonithd
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22446 for
> process cib
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22447 for
> process lrmd
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [21256]: info:
> lrmd is shutting down
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> WARN: Initializing connection to logging daemon failed. Logging daemon may
> not be running
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: WARN:
> Initializing connection to logging daemon failed. Logging daemon may not be
> running
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22448 for
> process attrd
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: WARN:
> Initializing connection to logging daemon failed. Logging daemon may not be
> running
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: G_main_add_SignalHandler: Added signal handler for signal 10
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: info:
> Signal sent to pid=21256, waiting for process to exit
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22449 for
> process pengine
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: WARN:
> Initializing connection to logging daemon failed. Logging daemon may not be
> running
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> Invoked: /usr/lib64/heartbeat/cib
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: G_main_add_SignalHandler: Added signal handler for signal 12
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group pengine: [22449]:
> WARN: Initializing connection to logging daemon failed. Logging daemon may
> not be running
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22450 for
> process crmd
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> Invoked: /usr/lib64/heartbeat/attrd
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> G_main_add_TriggerHandler: Added signal manual handler
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group pengine: [22449]:
> info: Invoked: /usr/lib64/heartbeat/pengine
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: WARN:
> Initializing connection to logging daemon failed. Logging daemon may not be
> running
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22451 for
> process mgmtd
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> main: Starting up
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> G_main_add_SignalHandler: Added signal handler for signal 17
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group pengine: [22449]:
> WARN: main: Terminating previous PE instance
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> retrieveCib: Reading cluster configuration from:
> /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: crm_cluster_connect: Connecting to OpenAIS
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: Pacemaker Cluster
> Manager 1.0.12
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> crm_cluster_connect: Connecting to OpenAIS
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group pengine: [21258]:
> WARN: process_pe_message: Received quit message, terminating
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: init_ais_connection_once: Creating connection to our AIS plugin
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> Invoked: /usr/lib64/heartbeat/crmd
> Apr 20 10:54:36 corosync [SERV  ] Service failed to load 'pacemaker'.
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> init_ais_connection_once: Creating connection to our AIS plugin
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> main: CRM Hg Version: unknown
>
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: corosync extended
> virtual synchrony service
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: corosync
> configuration service
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> crmd_init: Starting crmd
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: corosync cluster
> closed process group service v1.01
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> G_main_add_SignalHandler: Added signal handler for signal 17
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: corosync cluster
> config database access v1.01
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: corosync profile
> loading service
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: corosync cluster
> quorum service v0.1
> Apr 20 10:54:36 corosync [MAIN  ] Compatibility mode set to whitetank.
>  Using V1 and V2 of the synchronization engine.
> Apr 20 10:54:36 corosync [TOTEM ] The network interface [10.255.2.1] is now
> up.
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: init_ais_connection_once: AIS connection established
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> startCib: CIB Initialization completed successfully
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> crm_cluster_connect: Connecting to OpenAIS
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> init_ais_connection_once: Creating connection to our AIS plugin
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> init_ais_connection_once: AIS connection established
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_ipc: Recorded connection
> 0x18e7150 for stonithd/22445
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_ipc: Recorded connection
> 0x18eb4b0 for attrd/22448
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> get_ais_nodeid: Server details: id=16908042
> uname=lxdcv01nd01.bauer-uk.bauermedia.group cname=pcmk
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: get_ais_nodeid: Server details: id=16908042
> uname=lxdcv01nd01.bauer-uk.bauermedia.group cname=pcmk
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> crm_new_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group now has id:
> 16908042
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: crm_new_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group now has id:
> 16908042
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> crm_new_peer: Node 16908042 is now known as
> lxdcv01nd01.bauer-uk.bauermedia.group
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> main: Cluster connection active
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: crm_new_peer: Node 16908042 is now known as
> lxdcv01nd01.bauer-uk.bauermedia.group
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> main: Accepting attribute updates
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> main: Starting mainloop...
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> notice: /usr/lib64/heartbeat/stonithd start up successfully.
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: G_main_add_SignalHandler: Added signal handler for signal 17
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> init_ais_connection_once: AIS connection established
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_ipc: Recorded connection
> 0x18ef810 for cib/22446
> Apr 20 10:54:36 corosync [pcmk  ] info: update_member: Node
> lxdcv01nd01.bauer-uk.bauermedia.group now has process list:
> 00000000000000000000000000053312 (340754)
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_ipc: Sending membership update
> 0 to cib
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> get_ais_nodeid: Server details: id=16908042
> uname=lxdcv01nd01.bauer-uk.bauermedia.group cname=pcmk
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> crm_new_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group now has id:
> 16908042
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> crm_new_peer: Node 16908042 is now known as
> lxdcv01nd01.bauer-uk.bauermedia.group
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> cib_init: Starting cib mainloop
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> ais_dispatch: Membership 0: quorum still lost
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> crm_update_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group: id=16908042
> state=member (new) addr=(null) votes=1 (new) born=0 seen=0
> proc=00000000000000000000000000053312 (new)
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22455]: info:
> write_cib_contents: Archived previous version as
> /var/lib/heartbeat/crm/cib-80.raw
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22455]: info:
> write_cib_contents: Wrote version 0.89.0 of the CIB to disk (digest:
> e15d151e0fed09d1d411b21b345a8952)
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22455]: info:
> retrieveCib: Reading cluster configuration from:
> /var/lib/heartbeat/crm/cib.cZHXQX (digest:
> /var/lib/heartbeat/crm/cib.U3NqAd)
> Apr 20 10:54:36 corosync [TOTEM ] Incrementing problem counter for seqid 1
> iface 10.255.2.1 to [1 of 10]
> Apr 20 10:54:36 corosync [pcmk  ] notice: pcmk_peer_update: Transitional
> membership event on ring 208: memb=0, new=0, lost=0
> Apr 20 10:54:36 corosync [pcmk  ] notice: pcmk_peer_update: Stable
> membership event on ring 208: memb=1, new=1, lost=0
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_peer_update: NEW:
>  lxdcv01nd01.bauer-uk.bauermedia.group 16908042
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_peer_update: MEMB:
> lxdcv01nd01.bauer-uk.bauermedia.group 16908042
> Apr 20 10:54:36 corosync [TOTEM ] A processor joined or left the membership
> and a new membership was formed.
> Apr 20 10:54:36 corosync [MAIN  ] Completed service synchronization, ready
> to provide service.
> Apr 20 10:54:37 corosync [pcmk  ] notice: pcmk_peer_update: Transitional
> membership event on ring 212: memb=1, new=0, lost=0
> Apr 20 10:54:37 corosync [pcmk  ] info: pcmk_peer_update: memb:
> lxdcv01nd01.bauer-uk.bauermedia.group 16908042
> Apr 20 10:54:37 corosync [pcmk  ] notice: pcmk_peer_update: Stable
> membership event on ring 212: memb=2, new=1, lost=0
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: Creating entry for
> node 33685258 born on 212
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: Node 33685258/unknown
> is now: member
> Apr 20 10:54:37 corosync [pcmk  ] info: pcmk_peer_update: NEW:  .pending.
> 33685258
> Apr 20 10:54:37 corosync [pcmk  ] info: pcmk_peer_update: MEMB:
> lxdcv01nd01.bauer-uk.bauermedia.group 16908042
> Apr 20 10:54:37 corosync [pcmk  ] info: pcmk_peer_update: MEMB: .pending.
> 33685258
> Apr 20 10:54:37 corosync [pcmk  ] info: send_member_notification: Sending
> membership update 212 to 1 children
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: 0x18db8e0 Node
> 16908042 (lxdcv01nd01.bauer-uk.bauermedia.group) born on: 212
> Apr 20 10:54:37 corosync [TOTEM ] A processor joined or left the membership
> and a new membership was formed.
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> ais_dispatch: Membership 212: quorum still lost
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: 0x18e6ac0 Node
> 33685258 ((null)) born on: 196
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> crm_new_peer: Node <null> now has id: 33685258
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: 0x18e6ac0 Node
> 33685258 now known as lxdcv01nd02.bauer-uk.bauermedia.group (was: (null))
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> crm_update_peer: Node (null): id=33685258 state=member (new) addr=r(0)
> ip(10.255.1.2) r(1) ip(10.255.2.2)  votes=0 born=0 seen=212
> proc=00000000000000000000000000000000
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: Node
> lxdcv01nd02.bauer-uk.bauermedia.group now has process list:
> 00000000000000000000000000013312 (78610)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> crm_update_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group: id=16908042
> state=member addr=r(0) ip(10.255.1.1) r(1) ip(10.255.2.1)  (new) votes=1
> born=0 seen=212 proc=00000000000000000000000000053312
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: Node
> lxdcv01nd02.bauer-uk.bauermedia.group now has 1 quorum votes (was 0)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: notice:
> ais_dispatch: Membership 212: quorum acquired
> Apr 20 10:54:37 corosync [pcmk  ] info: send_member_notification: Sending
> membership update 212 to 1 children
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> crm_get_peer: Node 33685258 is now known as
> lxdcv01nd02.bauer-uk.bauermedia.group
> Apr 20 10:54:37 corosync [pcmk  ] WARN: route_ais_message: Sending message
> to local.crmd failed: unknown (rc=-2)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> crm_update_peer: Node lxdcv01nd02.bauer-uk.bauermedia.group: id=33685258
> state=member addr=r(0) ip(10.255.1.2) r(1) ip(10.255.2.2)  votes=1 (new)
> born=196 seen=212 proc=00000000000000000000000000013312 (new)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> cib_process_diff: Diff 0.91.3 -> 0.91.4 not applied to 0.89.0: current
> "epoch" is less than required
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> cib_server_process_diff: Requesting re-sync from peer
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: WARN:
> cib_diff_notify: Local-only Change (client:crmd, call: 77): -1.-1.-1
> (Application of an update diff failed, requesting a full refresh)
> Apr 20 10:54:37 corosync [MAIN  ] Completed service synchronization, ready
> to provide service.
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: WARN:
> cib_server_process_diff: Not applying diff 0.91.4 -> 0.91.5 (sync in
> progress)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: WARN:
> cib_server_process_diff: Not applying diff 0.91.5 -> 0.91.6 (sync in
> progress)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: WARN:
> cib_server_process_diff: Not applying diff 0.91.6 -> 0.92.1 (sync in
> progress)
> Apr 20 10:54:37 corosync [pcmk  ] WARN: route_ais_message: Sending message
> to local.crmd failed: unknown (rc=-2)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]: info:
> cib_replace_notify: Local-only Replace: -1.-1.-1 from
> lxdcv01nd02.bauer-uk.bauermedia.group
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22456]: info:
> write_cib_contents: Archived previous version as
> /var/lib/heartbeat/crm/cib-81.raw
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22456]: info:
> write_cib_contents: Wrote version 0.92.0 of the CIB to disk (digest:
> 65cf2f5895618dbd08c40b8c39a479c5)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22456]: info:
> retrieveCib: Reading cluster configuration from:
> /var/lib/heartbeat/crm/cib.8nhla0 (digest:
> /var/lib/heartbeat/crm/cib.nUDbdi)
> Apr 20 10:54:37 corosync [pcmk  ] WARN: route_ais_message: Sending message
> to local.crmd failed: unknown (rc=-2)
> Apr 20 10:54:37 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child process
> mgmtd exited (pid=22451, rc=100)
> Apr 20 10:54:37 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child process
> mgmtd no longer wishes to be respawned
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: Node
> lxdcv01nd01.bauer-uk.bauermedia.group now has process list:
> 00000000000000000000000000013312 (78610)
> Apr 20 10:54:37 corosync [pcmk  ] WARN: route_ais_message: Sending message
> to local.crmd failed: unknown (rc=-2)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: info:
> G_main_add_SignalHandler: Added signal handler for signal 15
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: info:
> G_main_add_SignalHandler: Added signal handler for signal 17
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: info:
> enabling coredumps
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: info:
> G_main_add_SignalHandler: Added signal handler for signal 10
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: info:
> G_main_add_SignalHandler: Added signal handler for signal 12
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: info:
> Started.
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> do_cib_control: CIB connection established
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> crm_cluster_connect: Connecting to OpenAIS
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> init_ais_connection_once: Creating connection to our AIS plugin
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> init_ais_connection_once: AIS connection established
> Apr 20 10:54:37 corosync [pcmk  ] info: pcmk_ipc: Recorded connection
> 0x18f53c0 for crmd/22450
> Apr 20 10:54:37 corosync [pcmk  ] info: pcmk_ipc: Sending membership update
> 212 to crmd
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> get_ais_nodeid: Server details: id=16908042
> uname=lxdcv01nd01.bauer-uk.bauermedia.group cname=pcmk
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> crm_new_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group now has id:
> 16908042
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> crm_new_peer: Node 16908042 is now known as
> lxdcv01nd01.bauer-uk.bauermedia.group
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> do_ha_control: Connected to the cluster
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> do_started: Delaying start, CCM (0000000000100000) not connected
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> crmd_init: Starting crmd's mainloop
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> config_query_callback: Checking for expired actions every 900000ms
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> config_query_callback: Sending expected-votes=2 to corosync
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: notice:
> ais_dispatch: Membership 212: quorum acquired
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> crm_new_peer: Node lxdcv01nd02.bauer-uk.bauermedia.group now has id:
> 33685258
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> crm_new_peer: Node 33685258 is now known as
> lxdcv01nd02.bauer-uk.bauermedia.group
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> crm_update_peer: Node lxdcv01nd02.bauer-uk.bauermedia.group: id=33685258
> state=member (new) addr=r(0) ip(10.255.1.2) r(1) ip(10.255.2.2)  votes=1
> born=196 seen=212 proc=00000000000000000000000000013312
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> crm_update_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group: id=16908042
> state=member (new) addr=r(0) ip(10.255.1.1) r(1) ip(10.255.2.1)  (new)
> votes=1 (new) born=212 seen=212 proc=00000000000000000000000000013312 (new)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> do_started: The local CRM is operational
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> do_state_transition: State transition S_STARTING -> S_PENDING [
> input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group pengine: [22449]:
> info: main: Starting pengine
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> ais_dispatch: Membership 212: quorum retained
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> update_dc: Set DC to lxdcv01nd02.bauer-uk.bauermedia.group (3.0.1)
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> update_attrd: Connecting to attrd...
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> find_hash_entry: Creating hash entry for terminate
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC
> cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> find_hash_entry: Creating hash entry for shutdown
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> attrd_local_callback: Sending full refresh (origin=crmd)
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> erase_xpath_callback: Deletion of
> "//node_state[@uname='lxdcv01nd01.bauer-uk.bauermedia.group']/transient_attributes":
> ok (rc=0)
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> crm_new_peer: Node lxdcv01nd02.bauer-uk.bauermedia.group now has id:
> 33685258
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> crm_new_peer: Node 33685258 is now known as
> lxdcv01nd02.bauer-uk.bauermedia.group
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> find_hash_entry: Creating hash entry for probe_complete
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> attrd_perform_update: Delaying operation probe_complete=<null>: cib not
> connected
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> do_lrm_rsc_op: Performing key=6:4:7:e6a3b9c7-c24d-497a-9c07-d6082ee231a9
> op=local-manage_monitor_0 )
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: info:
> rsc:local-manage:2: probe
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> do_lrm_rsc_op: Performing key=7:4:7:e6a3b9c7-c24d-497a-9c07-d6082ee231a9
> op=lcdcv01_monitor_0 )
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: info:
> rsc:lcdcv01:3: probe
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> process_lrm_event: LRM operation lcdcv01_monitor_0 (call=3, rc=0,
> cib-update=7, confirmed=true) ok
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> process_lrm_event: LRM operation local-manage_monitor_0 (call=2, rc=7,
> cib-update=8, confirmed=true) not running
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> attrd_trigger_update: Sending flush op to all hosts for: probe_complete
> (true)
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> attrd_perform_update: Delaying operation probe_complete=true: cib not
> connected
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> do_lrm_rsc_op: Performing key=9:5:0:e6a3b9c7-c24d-497a-9c07-d6082ee231a9
> op=lcdcv01_stop_0 )
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: info:
> rsc:lcdcv01:4: stop
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> attrd_trigger_update: Sending flush op to all hosts for: probe_complete
> (true)
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> attrd_perform_update: Delaying operation probe_complete=true: cib not
> connected
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: info:
> RA output: (lcdcv01:stop:stderr) logd is not running
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> process_lrm_event: LRM operation lcdcv01_stop_0 (call=4, rc=0, cib-update=9,
> confirmed=true) ok
> Apr 20 10:54:38 corosync [TOTEM ] ring 1 active with no faults
> Apr 20 10:54:41 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> cib_connect: Connected to the CIB after 1 signon attempts
> Apr 20 10:54:41 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> cib_connect: Sending full refresh
> Apr 20 10:54:41 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> attrd_trigger_update: Sending flush op to all hosts for: probe_complete
> (true)
> Apr 20 10:54:41 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> attrd_perform_update: Sent update 4: probe_complete=true
>
>
>



-- 
Dan Frincu
CCNA, RHCE



