[Pacemaker] Pacemaker won't start after node was fenced

Jake Smith jsmith at argotec.com
Tue Jan 27 01:23:55 EST 2015


Had a failover of my active/passive cluster and now the passive node will
not rejoin the cluster.

2 nodes running Ubuntu 12.04

corosync 1.4.2-2, openais 1.1.4-4, pacemaker 1.1.6-2ubuntu3

Corosync ring membership is fine on both rings.

Tried stopping corosync/pacemaker, clearing /var/lib/heartbeat/crm/, and
then restarting on the passive node, without success.

Tried rebooting the passive node (again - it was successfully fenced).

Tried updating pacemaker on the passive node to the latest in the distro
(1.1.6-2ubuntu3.3), then went back.

Tried putting the active node in maintenance mode and stopping pacemaker
and corosync on both nodes, then restarting on both nodes.  Corosync came
back fine as before, but now I have the same problem on both nodes:
pacemaker does not start successfully.  Both now show exactly the same
error - attrd: [24883]: ERROR: main: HA Signon failed.

Log:

Jan 27 01:09:59 Condor crmd: [24885]: info: crmd_init: Starting crmd
Jan 27 01:09:59 Condor cib: [24881]: info: validate_with_relaxng: Creating RNG parser context
Jan 27 01:09:59 Condor lrmd: [24882]: info: enabling coredumps
Jan 27 01:09:59 Condor lrmd: [24882]: info: Started.
Jan 27 01:09:59 Condor corosync[24778]:   [IPC   ] Invalid IPC credentials.
Jan 27 01:09:59 Condor attrd: [24883]: ERROR: main: HA Signon failed
Jan 27 01:09:59 Condor attrd: [24883]: ERROR: main: Aborting startup
Jan 27 01:09:59 Condor pacemakerd: [24877]: ERROR: pcmk_child_exit: Child process attrd exited (pid=24883, rc=100)
Jan 27 01:09:59 Condor pacemakerd: [24877]: notice: pcmk_child_exit: Child process attrd no longer wishes to be respawned
Jan 27 01:09:59 Condor pacemakerd: [24877]: info: update_node_processes: Node Condor now has process list: 00000000000000000000000000110312 (was 00000000000000000000000000111312)
Jan 27 01:09:59 Condor stonith-ng: [24880]: info: init_ais_connection_classic: AIS connection established
Jan 27 01:09:59 Condor stonith-ng: [24880]: info: get_ais_nodeid: Server details: id=167837962 uname=Condor cname=pcmk
Jan 27 01:09:59 Condor stonith-ng: [24880]: info: init_ais_connection_once: Connection to 'classic openais (with plugin)': established
Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_new_peer: Node Condor now has id: 167837962
Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_new_peer: Node 167837962 is now known as Condor
Jan 27 01:09:59 Condor stonith-ng: [24880]: info: main: Starting stonith-ng mainloop
Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_update_peer: Node Condor: id=167837962 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000110312 (new)
Jan 27 01:09:59 Condor cib: [24881]: info: startCib: CIB Initialization completed successfully
Jan 27 01:09:59 Condor cib: [24881]: info: get_cluster_type: Cluster type is: 'openais'
Jan 27 01:09:59 Condor cib: [24881]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 27 01:09:59 Condor cib: [24881]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
Jan 27 01:09:59 Condor corosync[24778]:   [IPC   ] Invalid IPC credentials.
Jan 27 01:09:59 Condor cib: [24881]: info: init_ais_connection_classic: Connection to our AIS plugin (9) failed: unknown (100)
Jan 27 01:09:59 Condor cib: [24881]: CRIT: cib_init: Cannot sign in to the cluster... terminating
Jan 27 01:09:59 Condor pacemakerd: [24877]: ERROR: pcmk_child_exit: Child process cib exited (pid=24881, rc=100)
Jan 27 01:09:59 Condor pacemakerd: [24877]: notice: pcmk_child_exit: Child process cib no longer wishes to be respawned
Jan 27 01:09:59 Condor pacemakerd: [24877]: info: update_node_processes: Node Condor now has process list: 00000000000000000000000000110212 (was 00000000000000000000000000110312)
Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_update_peer: Node Condor: id=167837962 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000110212 (new)
Jan 27 01:10:00 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
Jan 27 01:10:00 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Jan 27 01:10:00 Condor crmd: [24885]: info: crmd_init: Starting crmd's mainloop
Jan 27 01:10:01 Condor CRON[24888]: (root) CMD (/etc/init.d/watchdog -e >/dev/null 2>&1)
Jan 27 01:10:02 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jan 27 01:10:03 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
Jan 27 01:10:03 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
Jan 27 01:10:05 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jan 27 01:10:06 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
Jan 27 01:10:06 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 3 times... pause and retry
Jan 27 01:10:08 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jan 27 01:10:09 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
Jan 27 01:10:09 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 4 times... pause and retry
Jan 27 01:10:11 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Jan 27 01:10:12 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
Jan 27 01:10:12 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 5 times... pause and retry
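
To make the failure order in the log above easier to follow, here is a
minimal Python sketch that counts the "Invalid IPC credentials" messages
from corosync and lists the children that pacemakerd reports as exited.
The function name summarize_exits and the regular expression are
illustrative only, written against the syslog format shown in this post;
the sample assumes each log entry has been unwrapped onto a single line.

```python
import re

# Sample entries copied from the log above, one entry per line.
LOG = """\
Jan 27 01:09:59 Condor corosync[24778]:   [IPC   ] Invalid IPC credentials.
Jan 27 01:09:59 Condor attrd: [24883]: ERROR: main: HA Signon failed
Jan 27 01:09:59 Condor pacemakerd: [24877]: ERROR: pcmk_child_exit: Child process attrd exited (pid=24883, rc=100)
Jan 27 01:09:59 Condor corosync[24778]:   [IPC   ] Invalid IPC credentials.
Jan 27 01:09:59 Condor cib: [24881]: CRIT: cib_init: Cannot sign in to the cluster... terminating
Jan 27 01:09:59 Condor pacemakerd: [24877]: ERROR: pcmk_child_exit: Child process cib exited (pid=24881, rc=100)
"""

# Matches the pacemakerd child-exit message format seen in this log only.
EXIT_RE = re.compile(
    r"pcmk_child_exit: Child process (\S+) exited \(pid=(\d+), rc=(\d+)\)"
)


def summarize_exits(text):
    """Return (ipc_rejections, [(child, pid, rc), ...]) in log order."""
    ipc_rejections = 0
    exits = []
    for line in text.splitlines():
        if "Invalid IPC credentials" in line:
            ipc_rejections += 1
        match = EXIT_RE.search(line)
        if match:
            exits.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return ipc_rejections, exits


if __name__ == "__main__":
    rejected, exits = summarize_exits(LOG)
    print(f"corosync rejected {rejected} IPC connection(s)")
    for child, pid, rc in exits:
        print(f"{child} (pid {pid}) exited with rc={rc}")
```

On this sample, each rejected IPC connection is followed by one child
(attrd, then cib) exiting with rc=100.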

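Each of the two update_node_processes entries in the log follows a child
exit.  A small sketch, assuming the 32-digit process list is a hexadecimal
bitmask (the log itself does not say so), shows which bit is cleared at
each step.  Pairing each transition with attrd or cib is only my reading
of the log order; cleared_bits is a hypothetical helper name.

```python
# Process lists copied from the update_node_processes entries in the log.
# Assumption: the strings are hexadecimal bitmasks, one bit per process.
TRANSITIONS = [
    # (child that exited just before, list before, list after)
    ("attrd", "00000000000000000000000000111312", "00000000000000000000000000110312"),
    ("cib", "00000000000000000000000000110312", "00000000000000000000000000110212"),
]


def cleared_bits(before_hex, after_hex):
    """Return the bits that are set in before_hex but not in after_hex."""
    before = int(before_hex, 16)
    after = int(after_hex, 16)
    return before & ~after


if __name__ == "__main__":
    for child, before, after in TRANSITIONS:
        mask = cleared_bits(before, after)
        print(f"after {child} exited: cleared 0x{mask:08x}")
```

Under that assumption, exactly one bit is cleared per exit (0x1000 after
attrd, 0x100 after cib), and the remaining bits are unchanged.
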
Jacob A. Smith
IT Manager
Argotec, LLC


