[Pacemaker] crm_cluster_connect: Triggered fatal assert at cluster.c:65 : hb_conn != NULL

Nikita Michalko michalko.system at a-i-p.com
Mon Jul 18 11:17:02 EDT 2011


Hi all!

I have successfully configured and am running a 2-node cluster. While testing
different scenarios, I ran into the following error.
Situation:
The 1st node was running; the 2nd was rebooted, and heartbeat was started only
on the 1st node. That was OK: all resources were running on the 1st node.
Then, on the 2nd node, I removed all files in /var/lib/heartbeat/crm/ and in
/var/lib//pengine/.
After starting heartbeat/PM on the 2nd node, I get the following errors:
--- SNIP ---
Jul 18 15:54:25 pollux cib: [16884]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
Jul 18 15:54:25 pollux cib: [16884]: WARN: validate_cib_digest: No on-disk digest present
Jul 18 15:54:25 pollux cib: [16884]: info: validate_with_relaxng: Creating RNG parser context
Jul 18 15:54:25 pollux cib: [16884]: info: startCib: CIB Initialization completed successfully
Jul 18 15:54:25 pollux cib: [16884]: info: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
Jul 18 15:54:25 pollux cib: [16884]: ERROR: crm_abort: crm_cluster_connect: Triggered fatal assert at cluster.c:65 : hb_conn != NULL
Jul 18 15:54:25 pollux heartbeat: [16824]: WARN: Managed /usr/lib64/heartbeat/cib process 16884 killed by signal 6 [SIGABRT - Abort].
Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Managed /usr/lib64/heartbeat/cib process 16884 dumped core
Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Client /usr/lib64/heartbeat/cib "respawning too fast"
Jul 18 15:54:26 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! (2000ms)
Jul 18 15:54:27 pollux crmd: [16850]: info: do_cib_control: Could not connect to the CIB service: connection failed
Jul 18 15:54:27 pollux crmd: [16850]: WARN: do_cib_control: Couldn't complete CIB registration 5 times... pause and retry
Jul 18 15:54:29 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! (2000ms)
...
The configuration file itself validates:
crm_verify -V -x /var/lib/heartbeat/crm/cib.xml -> OK!
After stopping PM/HA on the 1st node and removing all relevant PM/HA files
there as well, the 1st node shows the same behavior. Creating a new
configuration with crm configure fails with these errors:
Signon to CIB failed: connection failed
Init failed, could not perform requested operations
ERROR: cannot parse xml: no element found: line 1, column 0

Versions:

pacemaker:       1.1.5 (Build: c86cb93c5a57c1f507a21be69d24fd28dee85397)
cluster-glue:    1.0.7 (Build: 6fa74ce2ed7ef6df41be2b634cd4aa89c318a8dc)
resource-agents: 1.0.4 (Build: 7a11934b142d1daf42a04fbaa0391a3ac47cee4c)
heartbeat:       3.0.5

What am I doing wrong?
My configuration is attached.


TIA!
Nikita Michalko
-------------- next part --------------
A non-text attachment was scrubbed...
Name: NM_cib.xml
Type: application/xml
Size: 13010 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20110718/63c2d62d/attachment-0002.wsdl>