[Pacemaker] pacemaker-1.0.6 + corosync 1.1.2 crashing

Nikola Ciprich extmaillist at linuxbox.cz
Tue Nov 10 09:28:20 UTC 2009


Hello Andrew et al,
a few days ago I asked about pacemaker + corosync + clvmd etc. With your advice, I got this working well.
That was on virtual test machines. I'm now trying to install a similar setup on physical hardware, but for some
reason attrd and cib seem to be crashing.

Here's a snippet from the corosync log:
Nov 10 14:12:21 vbox3 corosync[4299]:   [MAIN  ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
Nov 10 14:12:21 vbox3 corosync[4299]:   [MAIN  ] Corosync built-in features: nss rdma
Nov 10 14:12:21 vbox3 corosync[4299]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Nov 10 14:12:21 vbox3 corosync[4299]:   [TOTEM ] Initializing transport (UDP/IP).
Nov 10 14:12:21 vbox3 corosync[4299]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 10 14:12:21 vbox3 corosync[4299]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Nov 10 14:12:21 vbox3 corosync[4299]:   [TOTEM ] The network interface [10.58.0.1] is now up.
Nov 10 14:12:21 vbox3 corosync[4299]:   [pcmk  ] info: process_ais_conf: Reading configure
Nov 10 14:13:16 vbox3 corosync[4348]:   [MAIN  ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
Nov 10 14:13:16 vbox3 corosync[4348]:   [MAIN  ] Corosync built-in features: nss rdma
Nov 10 14:13:16 vbox3 corosync[4348]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Nov 10 14:13:16 vbox3 corosync[4348]:   [TOTEM ] Initializing transport (UDP/IP).
Nov 10 14:13:16 vbox3 corosync[4348]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 10 14:13:16 vbox3 corosync[4348]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Nov 10 14:13:16 vbox3 corosync[4348]:   [TOTEM ] The network interface [10.58.0.1] is now up.
Nov 10 14:13:16 vbox3 corosync[4348]:   [pcmk  ] info: process_ais_conf: Reading configure
Nov 10 14:13:24 vbox3 corosync[4357]:   [MAIN  ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
Nov 10 14:13:24 vbox3 corosync[4357]:   [MAIN  ] Corosync built-in features: nss rdma
Nov 10 14:13:24 vbox3 corosync[4357]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Nov 10 14:13:24 vbox3 corosync[4357]:   [TOTEM ] Initializing transport (UDP/IP).
Nov 10 14:13:24 vbox3 corosync[4357]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 10 14:13:24 vbox3 corosync[4357]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Nov 10 14:13:24 vbox3 corosync[4357]:   [TOTEM ] The network interface [10.58.0.1] is now up.
Nov 10 14:13:24 vbox3 corosync[4357]:   [pcmk  ] info: process_ais_conf: Reading configure
Nov 10 14:13:57 vbox3 corosync[4380]:   [MAIN  ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
Nov 10 14:13:57 vbox3 corosync[4380]:   [MAIN  ] Corosync built-in features: nss rdma
Nov 10 14:13:57 vbox3 corosync[4380]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Nov 10 14:13:57 vbox3 corosync[4380]:   [TOTEM ] Initializing transport (UDP/IP).
Nov 10 14:13:57 vbox3 corosync[4380]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 10 14:13:57 vbox3 corosync[4380]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Nov 10 14:13:58 vbox3 corosync[4380]:   [TOTEM ] The network interface [10.58.0.1] is now up.
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: process_ais_conf: Reading configure
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: config_find_init: Local handle: 9213452461992312833 for logging
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: config_find_next: Processing additional logging options...
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Found 'off' for option: debug
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Defaulting to 'off' for option: to_file
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: config_find_init: Local handle: 2013064636357672962 for service
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: config_find_next: Processing additional service options...
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Defaulting to 'no' for option: use_logd
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Found 'no' for option: use_mgmtd
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_startup: CRM: Initialized
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] Logging: Initialized pcmk_startup
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_startup: Service: 9
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_startup: Local hostname: vbox3
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_update_nodeid: Local node id: 16792074
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: Creating entry for node 16792074 born on 0
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: 0x260ee10 Node 16792074 now known as vbox3 (was: (null))
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: Node vbox3 now has 1 quorum votes (was 0)
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: Node 16792074/vbox3 is now: member
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4384 for process stonithd
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4385 for process cib
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4386 for process lrmd
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4387 for process attrd
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4388 for process pengine
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4389 for process crmd
Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: Pacemaker Cluster Manager 1.0.6
Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync configuration service
Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync profile loading service
Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 4: memb=0, new=0, lost=0
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: Stack hogger failed 0xffffffff
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 4: memb=1, new=1, lost=0
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_peer_update: NEW:  vbox3 16792074
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_peer_update: MEMB: vbox3 16792074
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: Node vbox3 now has process list: 00000000000000000000000000013312 (78610)
Nov 10 14:13:58 vbox3 corosync[4380]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 10 14:13:58 vbox3 corosync[4380]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 10 14:13:58 vbox3 cib: [4385]: info: Invoked: /usr/lib64/heartbeat/cib
Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_TriggerHandler: Added signal manual handler
Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:58 vbox3 cib: [4385]: info: retrieveCib: Reading c
Nov 10 14:13:58 vbox3 cib: [4385]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Continuing with an empty configuration.
Nov 10 14:13:58 vbox3 cib: [4385]: info: startCib: CIB Initialization completed successfully
Nov 10 14:13:58 vbox3 cib: [4385]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:58 vbox3 cib: [4385]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:13:58 vbox3 crmd: [4389]: info: Invoked: /usr/lib64/heartbeat/crmd
Nov 10 14:13:58 vbox3 crmd: [4389]: info: main: CRM Hg Version: cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
Nov 10 14:13:58 vbox3 crmd: [4389]: info: crmd_init: Starting crmd
Nov 10 14:13:58 vbox3 crmd: [4389]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:58 vbox3 pengine: [4388]: info: Invoked: /usr/lib64/heartbeat/pengine
Nov 10 14:13:58 vbox3 pengine: [4388]: info: main: Starting pengine
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: Started.
Nov 10 14:13:58 vbox3 attrd: [4387]: info: Invoked: /usr/lib64/heartbeat/attrd
Nov 10 14:13:58 vbox3 attrd: [4387]: info: main: Starting up
Nov 10 14:13:58 vbox3 attrd: [4387]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:58 vbox3 attrd: [4387]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: AIS connection established
Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_ipc: Recorded connection 0x2615120 for stonithd/4384
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: get_ais_nodeid: Server details: id=16792074 uname=vbox3
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node vbox3 now has id: 16792074
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node 16792074 is now known as vbox3
Nov 10 14:13:58 vbox3 stonithd: [4384]: notice: /usr/lib64/heartbeat/stonithd start up successfully.
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4385, core=false)
Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4391 for process cib
Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4387, core=false)
Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4392 for process attrd
Nov 10 14:13:59 vbox3 crmd: [4389]: info: do_cib_control: Could not connect to the CIB service: connection failed
Nov 10 14:13:59 vbox3 crmd: [4389]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Nov 10 14:13:59 vbox3 crmd: [4389]: info: crmd_init: Starting crmd's mainloop
Nov 10 14:13:59 vbox3 cib: [4391]: info: Invoked: /usr/lib64/heartbeat/cib
Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_TriggerHandler: Added signal manual handler
Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:59 vbox3 cib: [4391]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/
Nov 10 14:13:59 vbox3 cib: [4391]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Continuing with an empty configuration.
Nov 10 14:13:59 vbox3 cib: [4391]: info: startCib: CIB Initialization completed successfully
Nov 10 14:13:59 vbox3 cib: [4391]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:59 vbox3 cib: [4391]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:13:59 vbox3 attrd: [4392]: info: Invoked: /usr/lib64/heartbeat/attrd
Nov 10 14:13:59 vbox3 attrd: [4392]: info: main: Starting up
Nov 10 14:13:59 vbox3 attrd: [4392]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:59 vbox3 attrd: [4392]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4391, core=false)
Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4393 for process cib
Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4392, core=false)
Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4394 for process attrd
and the last few lines then keep repeating...
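
To see at a glance which children are dying and with which signal, the pcmk_wait_dispatch ERROR lines can be tallied with a small script. This is only a rough sketch: the function name summarize_crashes and the regular expression are illustrative, and it assumes the log lines keep exactly the format quoted above.

```python
import re
from collections import Counter

# Matches the pcmk_wait_dispatch ERROR lines from the log above, e.g.
# "... Child process cib terminated with signal 11 (pid=4385, core=false)"
CRASH_RE = re.compile(
    r"Child process (?P<name>\S+) terminated with signal (?P<sig>\d+) "
    r"\(pid=(?P<pid>\d+), core=(?P<core>\w+)\)"
)


def summarize_crashes(lines):
    """Count crashes per (process name, signal number) pair."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            counts[(match.group("name"), int(match.group("sig")))] += 1
    return counts


# The four ERROR lines from the excerpt above, plus one unrelated line.
sample = [
    "Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4385, core=false)",
    "Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4387, core=false)",
    "Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4392 for process attrd",
    "Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4391, core=false)",
    "Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4392, core=false)",
]

crash_counts = summarize_crashes(sample)
for (name, sig), count in sorted(crash_counts.items()):
    print(f"{name}: signal {sig} x{count}")
```

On the excerpt above, only cib and attrd show up, both with signal 11; stonithd, lrmd, pengine and crmd do not appear in any crash line.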

Here are the gdb backtraces obtained from the core files:
cib:
#0  0x00007f9f07218f48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#1  0x00007f9f0949bf06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
#2  0x00007f9f096a5c37 in init_ais_connection (dispatch=0x40d516 <cib_ais_dispatch>, destroy=0x40d658 <cib_ais_destroy>, our_uuid=0x0,
    our_uname=0x616f28, nodeid=0x0) at ais.c:588
#3  0x00007f9f096a1576 in crm_cluster_connect (our_uname=0x616f28, our_uuid=0x0, dispatch=0x40d516, destroy=0x40d658, hb_conn=0x0)
    at cluster.c:56
#4  0x000000000040d753 in cib_init () at main.c:424
#5  0x000000000040d08e in main (argc=1, argv=0x7fff9ec48f98) at main.c:218


attrd:
#0  0x00007f194ea0cf48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#1  0x00007f1950c8ff06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
#2  0x00007f1950e99c37 in init_ais_connection (dispatch=0x402891 <attrd_ais_dispatch>, destroy=0x402af3 <attrd_ais_destroy>,
    our_uuid=0x605918, our_uname=0x605910, nodeid=0x0) at ais.c:588
#3  0x00007f1950e95576 in crm_cluster_connect (our_uname=0x605910, our_uuid=0x605918, dispatch=0x402891, destroy=0x402af3, hb_conn=0x0)
    at cluster.c:56
#4  0x0000000000403185 in main (argc=1, argv=0x7fffd3548b38) at attrd.c:569

Unfortunately, I'm not 100% sure that all the packages I installed on those machines were compiled the same way, as I
deleted the old (testing) packages. The versions are the same, though.
Any idea where I should look for a possible culprit?
Thanks a lot for any reply!
With best regards,
nik


-- 
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis at linuxbox.cz
-------------------------------------



