[Pacemaker] pacemaker-1.0.6 + corosync 1.1.2 crashing

Mark Horton mark at nostromo.net
Tue Nov 10 18:48:08 EST 2009


Nikola,
Sorry, I don't have a solution, but I'm curious about your setup.
Which version of DLM are you using?  Did you have to compile it
yourself?

Regards,
Mark

On Tue, Nov 10, 2009 at 7:28 AM, Nikola Ciprich <extmaillist at linuxbox.cz> wrote:
> Hello Andrew et al,
> few days ago, I asked about pacemaker + corosync + clvmd etc. With Your advice, I got this working well.
> It was in testing virtual machines, I'm now trying to install similar setup on raw hardware but for some
> reasong attrd and cib seem to be crashing.
>
> here's snippet from corosync log:
> Nov 10 14:12:21 vbox3 corosync[4299]:   [MAIN  ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> Nov 10 14:12:21 vbox3 corosync[4299]:   [MAIN  ] Corosync built-in features: nss rdma
> Nov 10 14:12:21 vbox3 corosync[4299]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Nov 10 14:12:21 vbox3 corosync[4299]:   [TOTEM ] Initializing transport (UDP/IP).
> Nov 10 14:12:21 vbox3 corosync[4299]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Nov 10 14:12:21 vbox3 corosync[4299]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
> Nov 10 14:12:21 vbox3 corosync[4299]:   [TOTEM ] The network interface [10.58.0.1] is now up.
> Nov 10 14:12:21 vbox3 corosync[4299]:   [pcmk  ] info: process_ais_conf: Reading configure
> Nov 10 14:13:16 vbox3 corosync[4348]:   [MAIN  ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> Nov 10 14:13:16 vbox3 corosync[4348]:   [MAIN  ] Corosync built-in features: nss rdma
> Nov 10 14:13:16 vbox3 corosync[4348]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Nov 10 14:13:16 vbox3 corosync[4348]:   [TOTEM ] Initializing transport (UDP/IP).
> Nov 10 14:13:16 vbox3 corosync[4348]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Nov 10 14:13:16 vbox3 corosync[4348]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
> Nov 10 14:13:16 vbox3 corosync[4348]:   [TOTEM ] The network interface [10.58.0.1] is now up.
> Nov 10 14:13:16 vbox3 corosync[4348]:   [pcmk  ] info: process_ais_conf: Reading configure
> Nov 10 14:13:24 vbox3 corosync[4357]:   [MAIN  ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> Nov 10 14:13:24 vbox3 corosync[4357]:   [MAIN  ] Corosync built-in features: nss rdma
> Nov 10 14:13:24 vbox3 corosync[4357]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Nov 10 14:13:24 vbox3 corosync[4357]:   [TOTEM ] Initializing transport (UDP/IP).
> Nov 10 14:13:24 vbox3 corosync[4357]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Nov 10 14:13:24 vbox3 corosync[4357]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
> Nov 10 14:13:24 vbox3 corosync[4357]:   [TOTEM ] The network interface [10.58.0.1] is now up.
> Nov 10 14:13:24 vbox3 corosync[4357]:   [pcmk  ] info: process_ais_conf: Reading configure
> Nov 10 14:13:57 vbox3 corosync[4380]:   [MAIN  ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> Nov 10 14:13:57 vbox3 corosync[4380]:   [MAIN  ] Corosync built-in features: nss rdma
> Nov 10 14:13:57 vbox3 corosync[4380]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Nov 10 14:13:57 vbox3 corosync[4380]:   [TOTEM ] Initializing transport (UDP/IP).
> Nov 10 14:13:57 vbox3 corosync[4380]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Nov 10 14:13:57 vbox3 corosync[4380]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
> Nov 10 14:13:58 vbox3 corosync[4380]:   [TOTEM ] The network interface [10.58.0.1] is now up.
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: process_ais_conf: Reading configure
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: config_find_init: Local handle: 9213452461992312833 for logging
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: config_find_next: Processing additional logging options...
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Found 'off' for option: debug
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Defaulting to 'off' for option: to_file
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: config_find_init: Local handle: 2013064636357672962 for service
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: config_find_next: Processing additional service options...
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Defaulting to 'no' for option: use_logd
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Found 'no' for option: use_mgmtd
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_startup: CRM: Initialized
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] Logging: Initialized pcmk_startup
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_startup: Service: 9
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_startup: Local hostname: vbox3
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_update_nodeid: Local node id: 16792074
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: Creating entry for node 16792074 born on 0
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: 0x260ee10 Node 16792074 now known as vbox3 (was: (null))
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: Node vbox3 now has 1 quorum votes (was 0)
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: Node 16792074/vbox3 is now: member
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4384 for process stonithd
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4385 for process cib
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4386 for process lrmd
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4387 for process attrd
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4388 for process pengine
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4389 for process crmd
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: Pacemaker Cluster Manager 1.0.6
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync extended virtual synchrony service
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync configuration service
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync cluster config database access v1.01
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync profile loading service
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 4: memb=0, new=0, lost=0
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: Stack hogger failed 0xffffffff
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 4: memb=1, new=1, lost=0
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_peer_update: NEW:  vbox3 16792074
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_peer_update: MEMB: vbox3 16792074
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: Node vbox3 now has process list: 00000000000000000000000000013312 (78610)
> Nov 10 14:13:58 vbox3 corosync[4380]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Nov 10 14:13:58 vbox3 corosync[4380]:   [MAIN  ] Completed service synchronization, ready to provide service.
> Nov 10 14:13:58 vbox3 cib: [4385]: info: Invoked: /usr/lib64/heartbeat/cib
> Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_TriggerHandler: Added signal manual handler
> Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:58 vbox3 cib: [4385]: info: retrieveCib: Reading c
> Nov 10 14:13:58 vbox3 cib: [4385]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
> Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Continuing with an empty configuration.
> Nov 10 14:13:58 vbox3 cib: [4385]: info: startCib: CIB Initialization completed successfully
> Nov 10 14:13:58 vbox3 cib: [4385]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:58 vbox3 cib: [4385]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:13:58 vbox3 crmd: [4389]: info: Invoked: /usr/lib64/heartbeat/crmd
> Nov 10 14:13:58 vbox3 crmd: [4389]: info: main: CRM Hg Version: cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
> Nov 10 14:13:58 vbox3 crmd: [4389]: info: crmd_init: Starting crmd
> Nov 10 14:13:58 vbox3 crmd: [4389]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:58 vbox3 pengine: [4388]: info: Invoked: /usr/lib64/heartbeat/pengine
> Nov 10 14:13:58 vbox3 pengine: [4388]: info: main: Starting pengine
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 15
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: Started.
> Nov 10 14:13:58 vbox3 attrd: [4387]: info: Invoked: /usr/lib64/heartbeat/attrd
> Nov 10 14:13:58 vbox3 attrd: [4387]: info: main: Starting up
> Nov 10 14:13:58 vbox3 attrd: [4387]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:58 vbox3 attrd: [4387]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: AIS connection established
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_ipc: Recorded connection 0x2615120 for stonithd/4384
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: get_ais_nodeid: Server details: id=16792074 uname=vbox3
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node vbox3 now has id: 16792074
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node 16792074 is now known as vbox3
> Nov 10 14:13:58 vbox3 stonithd: [4384]: notice: /usr/lib64/heartbeat/stonithd start up successfully.
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4385, core=false)
> Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4391 for process cib
> Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4387, core=false)
> Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
> Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4392 for process attrd
> Nov 10 14:13:59 vbox3 crmd: [4389]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Nov 10 14:13:59 vbox3 crmd: [4389]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> Nov 10 14:13:59 vbox3 crmd: [4389]: info: crmd_init: Starting crmd's mainloop
> Nov 10 14:13:59 vbox3 cib: [4391]: info: Invoked: /usr/lib64/heartbeat/cib
> Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_TriggerHandler: Added signal manual handler
> Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:59 vbox3 cib: [4391]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/
> Nov 10 14:13:59 vbox3 cib: [4391]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
> Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Continuing with an empty configuration.
> Nov 10 14:13:59 vbox3 cib: [4391]: info: startCib: CIB Initialization completed successfully
> Nov 10 14:13:59 vbox3 cib: [4391]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:59 vbox3 cib: [4391]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:13:59 vbox3 attrd: [4392]: info: Invoked: /usr/lib64/heartbeat/attrd
> Nov 10 14:13:59 vbox3 attrd: [4392]: info: main: Starting up
> Nov 10 14:13:59 vbox3 attrd: [4392]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:59 vbox3 attrd: [4392]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4391, core=false)
> Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4393 for process cib
> Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4392, core=false)
> Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
> Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4394 for process attrd
> and last few lines then keep repeating...
>
> here's gdb backtrace obtained from core files:
> cib:
> #0  0x00007f9f07218f48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> #1  0x00007f9f0949bf06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
> #2  0x00007f9f096a5c37 in init_ais_connection (dispatch=0x40d516 <cib_ais_dispatch>, destroy=0x40d658 <cib_ais_destroy>, our_uuid=0x0,
>    our_uname=0x616f28, nodeid=0x0) at ais.c:588
> #3  0x00007f9f096a1576 in crm_cluster_connect (our_uname=0x616f28, our_uuid=0x0, dispatch=0x40d516, destroy=0x40d658, hb_conn=0x0)
>    at cluster.c:56
> #4  0x000000000040d753 in cib_init () at main.c:424
> #5  0x000000000040d08e in main (argc=1, argv=0x7fff9ec48f98) at main.c:218
>
>
> attrd:
> #0  0x00007f194ea0cf48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> #1  0x00007f1950c8ff06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
> #2  0x00007f1950e99c37 in init_ais_connection (dispatch=0x402891 <attrd_ais_dispatch>, destroy=0x402af3 <attrd_ais_destroy>,
>    our_uuid=0x605918, our_uname=0x605910, nodeid=0x0) at ais.c:588
> #3  0x00007f1950e95576 in crm_cluster_connect (our_uname=0x605910, our_uuid=0x605918, dispatch=0x402891, destroy=0x402af3, hb_conn=0x0)
>    at cluster.c:56
> #4  0x0000000000403185 in main (argc=1, argv=0x7fffd3548b38) at attrd.c:569
>
> Unfortunately I'm not 100% sure that all the packages I installed on those machines are compiled the same way, as I
> deleted old (testing) packages. But the versions are the same.
> Any idea where I should look for possible culprit?
> thanks a lot for reply!
> with best regards
> nik
>
>
> --
> -------------------------------------
> Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28. rijna 168, 709 01 Ostrava
>
> tel.:   +420 596 603 142
> fax:    +420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: servis at linuxbox.cz
> -------------------------------------
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>




More information about the Pacemaker mailing list