[Pacemaker] stonithd dumps core since 1.0.0

Roderick van Domburg r.s.a.vandomburg at nedforce.nl
Tue Oct 14 09:15:21 EDT 2008


Hello everyone,

We have been running cman+gfs2 and heartbeat+pacemaker side by side
on our systems. This worked well until yesterday, when we updated to
heartbeat-2.99.2 and pacemaker-1.0.0. Since the update, stonithd aborts
on a fatal assert involving is_openais_cluster() and dumps core.
Previously we ran heartbeat-2.99.1 and pacemaker-0.7.3 without problems.

I'll post this to the linux-ha list too.

/var/log/messages:

Oct 14 14:49:55 node1 logd: [1455]: info: logd started with default configuration.
Oct 14 14:49:55 node1 logd: [1463]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:49:55 node1 logd: [1455]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:49:55 node1 heartbeat: [1479]: info: Enabling logging daemon
Oct 14 14:49:55 node1 heartbeat: [1479]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
Oct 14 14:49:55 node1 heartbeat: [1479]: info: ******************
Oct 14 14:49:55 node1 heartbeat: [1479]: info: Configuration validated. Starting heartbeat 2.99.2
Oct 14 14:49:55 node1 heartbeat: [1480]: info: heartbeat: version 2.99.2
Oct 14 14:49:55 node1 heartbeat: [1480]: info: Heartbeat generation: 1219055953
Oct 14 14:49:55 node1 heartbeat: [1480]: info: glib: UDP multicast heartbeat started for group 239.0.0.45 port 694 interface eth0 (ttl=1 loop=0)
Oct 14 14:49:55 node1 heartbeat: [1480]: info: G_main_add_TriggerHandler: Added signal manual handler
Oct 14 14:49:55 node1 heartbeat: [1480]: info: G_main_add_TriggerHandler: Added signal manual handler
Oct 14 14:49:55 node1 heartbeat: [1480]: notice: Using watchdog device: /dev/watchdog
Oct 14 14:49:55 node1 heartbeat: [1480]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 14:49:55 node1 heartbeat: [1480]: info: Local status now set to: 'up'
Oct 14 14:50:55 node1 heartbeat: [1480]: WARN: node node2: is dead
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Comm_now_up(): updating status to active
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Local status now set to: 'active'
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/ccm" (498,496)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/cib" (498,496)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/lrmd -r" (0,0)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/stonithd" (0,0)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/attrd" (498,496)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/crmd" (498,496)
Oct 14 14:50:55 node1 heartbeat: [1489]: info: Starting "/usr/lib64/heartbeat/ccm" as uid 498  gid 496 (pid 1489)
Oct 14 14:50:55 node1 heartbeat: [1492]: info: Starting "/usr/lib64/heartbeat/stonithd" as uid 0  gid 0 (pid 1492)
Oct 14 14:50:55 node1 heartbeat: [1491]: info: Starting "/usr/lib64/heartbeat/lrmd -r" as uid 0  gid 0 (pid 1491)
Oct 14 14:50:55 node1 heartbeat: [1493]: info: Starting "/usr/lib64/heartbeat/attrd" as uid 498  gid 496 (pid 1493)
Oct 14 14:50:55 node1 heartbeat: [1490]: info: Starting "/usr/lib64/heartbeat/cib" as uid 498  gid 496 (pid 1490)
Oct 14 14:50:55 node1 heartbeat: [1494]: info: Starting "/usr/lib64/heartbeat/crmd" as uid 498  gid 496 (pid 1494)
Oct 14 14:50:55 node1 lrmd: [1491]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:50:55 node1 stonithd: [1492]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Oct 14 14:50:55 node1 stonithd: [1492]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Oct 14 14:50:55 node1 cib: [1490]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:50:55 node1 cib: [1490]: info: G_main_add_TriggerHandler: Added signal manual handler
Oct 14 14:50:55 node1 cib: [1490]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 14:50:55 node1 attrd: [1493]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:50:55 node1 attrd: [1493]: info: main: Starting up....
Oct 14 14:50:55 node1 attrd: [1493]: ERROR: main: HA Signon failed
Oct 14 14:50:55 node1 attrd: [1493]: ERROR: main: Aborting startup
Oct 14 14:50:55 node1 heartbeat: [1480]: WARN: Managed /usr/lib64/heartbeat/attrd process 1493 exited with return code 100.
Oct 14 14:50:55 node1 ccm: [1489]: info: Hostname: node1
Oct 14 14:50:55 node1 crmd: [1494]: info: main: CRM Hg Version: node: 9a6c6d1dd87154b11fdf9ff7fadf5fd33500bca4
Oct 14 14:50:55 node1 crmd: [1494]: info: crmd_init: Starting crmd
Oct 14 14:50:55 node1 crmd: [1494]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:50:55 node1 crmd: [1494]: info: G_main_add_TriggerHandler: Added signal manual handler
Oct 14 14:50:55 node1 crmd: [1494]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 14:50:55 node1 stonithd: [1492]: ERROR: crm_abort: is_heartbeat_cluster: Triggered fatal assert at utils.c:1626 : is_openais_cluster()
Oct 14 14:50:55 node1 cib: [1490]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
Oct 14 14:50:55 node1 lrmd: [1491]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 14:50:55 node1 lrmd: [1491]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Oct 14 14:50:55 node1 lrmd: [1491]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Oct 14 14:50:55 node1 lrmd: [1491]: info: Started.
Oct 14 14:50:55 node1 heartbeat: [1480]: WARN: Managed /usr/lib64/heartbeat/stonithd process 1492 killed by signal 6 [SIGABRT - Abort].
Oct 14 14:50:55 node1 heartbeat: [1480]: ERROR: Managed /usr/lib64/heartbeat/stonithd process 1492 dumped core
Oct 14 14:50:55 node1 heartbeat: [1480]: ERROR: Respawning client "/usr/lib64/heartbeat/stonithd":
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/stonithd" (0,0)
Oct 14 14:50:56 node1 cib: [1490]: info: startCib: CIB Initialization completed successfully
Oct 14 14:50:56 node1 cib: [1490]: CRIT: cib_init: Cannot sign in to the cluster... terminating
Oct 14 14:50:56 node1 heartbeat: [1480]: WARN: Managed /usr/lib64/heartbeat/cib process 1490 exited with return code 100.
Oct 14 14:50:56 node1 heartbeat: [1480]: EMERG: Rebooting system.  Reason: /usr/lib64/heartbeat/cib
Oct 14 14:50:56 node1 crmd: [1494]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Oct 14 14:50:56 node1 crmd: [1494]: info: crmd_init: Starting crmd's mainloop
Oct 14 14:50:56 node1 heartbeat: [1495]: info: Starting "/usr/lib64/heartbeat/stonithd" as uid 0  gid 0 (pid 1495)
Oct 14 14:50:56 node1 stonithd: [1495]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Oct 14 14:50:56 node1 stonithd: [1495]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Oct 14 14:50:56 node1 stonithd: [1495]: ERROR: crm_abort: is_heartbeat_cluster: Triggered fatal assert at utils.c:1626 : is_openais_cluster()
Oct 14 14:50:57 node1 kernel: md: stopping all md devices.
Oct 14 14:51:17 node1 syslogd 1.4.1: restart.
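
To make the failure sequence easier to spot in a log of this size, a small filter along these lines might help. It is only a sketch: it assumes each entry sits on a single line in the form "timestamp host daemon: [pid]: LEVEL: message", as in /var/log/messages itself, and the names LINE_RE, SEVERE and severe_events are made up for this example.

```python
import re

# Assumed entry layout: "Oct 14 14:50:55 node1 stonithd: [1492]: ERROR: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s(?P<daemon>[\w-]+):\s"
    r"\[(?P<pid>\d+)\]:\s(?P<level>[A-Za-z]+):\s(?P<msg>.*)$"
)

# Levels treated as severe in this sketch (WARN is deliberately left out).
SEVERE = {"ERROR", "CRIT", "EMERG"}


def severe_events(lines):
    """Return (timestamp, daemon, pid, level, message) for each severe entry."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Entries without a "[pid]:" field (e.g. kernel lines) do not match and are skipped.
        if match and match.group("level") in SEVERE:
            events.append((
                match.group("ts"),
                match.group("daemon"),
                int(match.group("pid")),
                match.group("level"),
                match.group("msg"),
            ))
    return events


if __name__ == "__main__":
    with open("/var/log/messages") as handle:
        for event in severe_events(handle):
            print(*event, sep=" | ")
```

Run against the log above, this would leave the attrd signon failure, the stonithd assert and core dump, the cib sign-in failure and the reboot request, in that order.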

This occurs regardless of whether cman and openais are running.

I have attached the core dump.

Version information:

- CentOS 5.2 x86_64 (2.6.18-92.1.13.el5xen)
- heartbeat-common.x86_64 2.99.2-21.1
- heartbeat-resources.x86_64 2.99.2-21.1
- heartbeat.x86_64 2.99.2-21.1
- libheartbeat2.x86_64 2.99.2-21.1
- pacemaker.x86_64 1.0.0-1.6
- libpacemaker3.x86_64 1.0.0-1.6
- openais.x86_64 0.80.3-19.1
- cman.x86_64 2.0.84-2.el5_2.1

ha.cf:

autojoin none
mcast eth0 239.0.0.45 694 1 0
warntime 15
deadtime 60
initdead 60
keepalive 3
node node1
node node2
crm on
watchdog /dev/watchdog
use_logd on

openais.conf:

totem {
	version: 2
	secauth: on
	threads: 1
	heartbeat_failures_allowed: 3
	interface {
		ringnumber: 0
		bindnetaddr: 10.0.3.1
		mcastaddr: 239.0.0.45
		mcastport: 5405
	}
}

logging {
	debug: off
	timestamp: on
}

amf {
	mode: disabled
}
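
Since both files configure multicast, the two can be compared with a short script such as the one below. This is a rough sketch rather than a diagnosis: it assumes the ha.cf directive has the form "mcast <dev> <group> <port> <ttl> <loop>" and that openais.conf uses "key: value" lines with mcastaddr preceding mcastport, as shown above. The function names are invented for this example, and whether a shared group has anything to do with the stonithd abort is not established.

```python
def heartbeat_mcast(ha_cf_text):
    """Return (group, port) pairs from 'mcast <dev> <group> <port> ...' lines in ha.cf."""
    pairs = []
    for line in ha_cf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 4 and fields[0] == "mcast":
            pairs.append((fields[2], int(fields[3])))
    return pairs


def openais_mcast(conf_text):
    """Return (mcastaddr, mcastport) pairs from 'key: value' lines in openais.conf."""
    pairs = []
    addr = None
    for line in conf_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # block openers/closers such as 'totem {' or '}'
        key, value = key.strip(), value.strip()
        if key == "mcastaddr":
            addr = value
        elif key == "mcastport" and addr is not None:
            pairs.append((addr, int(value)))
            addr = None
    return pairs


def shared_groups(ha_cf_text, conf_text):
    """Multicast groups that appear in both configurations, regardless of port."""
    hb = {group for group, _ in heartbeat_mcast(ha_cf_text)}
    ais = {group for group, _ in openais_mcast(conf_text)}
    return hb & ais


if __name__ == "__main__":
    # Paths are assumptions; adjust to wherever the files live on your system.
    with open("/etc/ha.d/ha.cf") as f1, open("/etc/ais/openais.conf") as f2:
        print(shared_groups(f1.read(), f2.read()))
```

For the two files shown here, heartbeat and openais use the same group (239.0.0.45) on different ports (694 and 5405).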

I have tried switching either of them to another IP address, but to no avail.
Any insights into this behavior?

Kind regards,

Roderick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: core.1492
Type: application/octet-stream
Size: 724992 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081014/17418b93/attachment.obj>