[Pacemaker] crm_cluster_connect: Triggered fatal assert at cluster.c:65 : hb_conn != NULL

Andrew Beekhof andrew at beekhof.net
Mon Jul 18 20:08:03 EDT 2011


On Tue, Jul 19, 2011 at 9:58 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Tue, Jul 19, 2011 at 1:17 AM, Nikita Michalko
> <michalko.system at a-i-p.com> wrote:
>> Hi all!
>>
>> I have successfully configured a running 2-node cluster. While testing
>> different scenarios I ran into this error.
>> Situation:
>> The 1st node was running; the 2nd was rebooted and heartbeat was started only
>> on the 1st node. That was OK: all resources were running on the 1st node.
>> Then, on the 2nd node, I removed all files in /var/lib/heartbeat/crm/ and in
>> /var/lib//pengine/.
>> After starting heartbeat/PM on the 2nd node, I am getting the following
>> errors:
>> --- SNIP ---
>> Jul 18 15:54:25 pollux cib: [16884]: info: retrieveCib: Reading cluster
>> configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
>> /var/lib/heartbeat/crm/cib.xml.sig)
>> Jul 18 15:54:25 pollux cib: [16884]: WARN: validate_cib_digest: No on-disk
>> digest present
>> Jul 18 15:54:25 pollux cib: [16884]: info: validate_with_relaxng: Creating RNG
>> parser context
>> Jul 18 15:54:25 pollux cib: [16884]: info: startCib: CIB Initialization
>> completed successfully
>> Jul 18 15:54:25 pollux cib: [16884]: info: crm_cluster_connect: Connecting to
>> cluster infrastructure: heartbeat
>> Jul 18 15:54:25 pollux cib: [16884]: ERROR: crm_abort: crm_cluster_connect:
>> Triggered fatal assert at cluster.c:65 : hb_conn != NULL
>> Jul 18 15:54:25 pollux heartbeat: [16824]: WARN: Managed
>> /usr/lib64/heartbeat/cib process 16884 killed by signal 6 [SIGABRT - Abort].
>> Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Managed
>> /usr/lib64/heartbeat/cib process 16884 dumped core
>> Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Client
>> /usr/lib64/heartbeat/cib "respawning too fast"
>> Jul 18 15:54:26 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer
>> (I_NULL) just popped! (2000ms)
>> Jul 18 15:54:27 pollux crmd: [16850]: info: do_cib_control: Could not connect
>> to the CIB service: connection failed
>> Jul 18 15:54:27 pollux crmd: [16850]: WARN: do_cib_control: Couldn't complete
>> CIB registration 5 times... pause and retry
>> Jul 18 15:54:29 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer
>> (I_NULL) just popped! (2000ms)
>> ...
>> crm_verify -V -x /var/lib/heartbeat/crm/cib.xml -> OK!
>> After stopping PM/HA on the 1st node and removing all relevant PM/HA
>> files, the 1st node shows the same behaviour. Creating a new configuration
>> with crm configure shows these errors:
>> Signon to CIB failed: connection failed
>> Init failed, could not perform requested operations
>> ERROR: cannot parse xml: no element found: line 1, column 0
>>
>> Versions:
>>
>> pacemaker :     1.1.5 (Build: c86cb93c5a57c1f507a21be69d24fd28dee85397)
>
> Mercurial has no record of this changeset.
> Where did you get the packages from?

Specifically because it does not look like they support heartbeat,
which is what is triggering this error.
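
One way to check this on the affected node is to look at the feature list
the installed build reports. The sketch below is only an illustration: it
assumes the packages ship pacemakerd and that "pacemakerd --features" prints
the compiled-in features as whitespace-separated words, which may not hold
for every 1.1.5 package. The function names are made up for this example.

```python
import re
import subprocess


def supports_stack(feature_text, stack="heartbeat"):
    """Return True if `stack` appears as a whole word in the feature text."""
    # Split on anything that is not a letter, digit, underscore or hyphen,
    # so "corosync-native" stays one token and cannot match "corosync".
    tokens = re.findall(r"[A-Za-z0-9_-]+", feature_text.lower())
    return stack.lower() in tokens


def read_features(command=("pacemakerd", "--features")):
    """Run the (assumed) feature-listing command; return "" if it cannot be run."""
    try:
        result = subprocess.run(list(command), capture_output=True,
                                text=True, check=False)
    except OSError:
        # Binary missing or not executable on this system.
        return ""
    return result.stdout + result.stderr


if __name__ == "__main__":
    text = read_features()
    if not text:
        print("could not obtain a feature list; check how the packages were built")
    elif supports_stack(text):
        print("heartbeat support appears to be compiled in")
    else:
        print("no heartbeat support reported by this build")
```

If the word "heartbeat" is missing from the list, the build most likely was
not compiled against heartbeat, which would be consistent with the
hb_conn != NULL assert above.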

>> cluster-glue :     1.0.7 (Build: 6fa74ce2ed7ef6df41be2b634cd4aa89c318a8dc)
>> resource-agents: 1.0.4 (Build: 7a11934b142d1daf42a04fbaa0391a3ac47cee4c)
>> heartbeat:        3.0.5
>>
>> What am I doing wrong?
>> Configuration attached...
>>
>>
>> TIA!
>> Nikita Michalko
>>
>



