[Pacemaker] Pacemaker 1.1.8: pacemaker service fails to stop gracefully & crm resource cleanup fails

Parshvi parshvi.17 at gmail.com
Fri Jan 4 08:32:32 EST 2013


Hi,
We have a two-node cluster running the following versions:
cluster-glue-1.0.6
resource-agents-1.0.4
pacemaker-1.1.8
corosync-1.4.1
libaio-devel-0.3.106
libibverbs-1.1.3
libqb-0.14.2
librdmacm-1.0.10
libtool-ltdl-1.5.22
pacemaker-cli-1.1.8
pacemaker-cluster-libs-1.1.8
pacemaker-libs-1.1.8
crm-1.2

crm_mon failed on Node-1 with the following error:
establish cib_ro connection: Resource temporarily unavailable (11)

crm resource cleanup <rsc> failed on Node-1 with the following error:
Could not establish cib_rw connection: Resource temporarily unavailable (11)
Error signing on to the CIB service: Transport endpoint is not connected

All the resources were running on both nodes as configured, and all the
pacemaker and corosync processes were running.
After some time, Node-1 appeared offline:

Last updated: Wed Jan 2 11:36:11 2013
Last change: Wed Jan 2 11:31:43 2013 via crmd on CSS-FU-2
Stack: openais
Current DC: CSS-FU-2 - partition with quorum
Version: 1.1.8-2.el5-394e906
2 Nodes configured, 2 expected votes
19 Resources configured.

Online: [ CSS-FU-2 ]
OFFLINE: [ CSS-FU-1 ]
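
For anyone who wants to check for this state from a script, here is a minimal Python sketch that pulls the node lists out of status text shaped like the output above. It only assumes the "Online: [ ... ]" and "OFFLINE: [ ... ]" line format shown here; the function name parse_node_states is hypothetical, and the sample text is copied from this post rather than read from a live cluster.

```python
import re

# Sample status text copied from the crm_mon output quoted in this post.
SAMPLE = """\
Online: [ CSS-FU-2 ]
OFFLINE: [ CSS-FU-1 ]
"""


def parse_node_states(status_text):
    """Return {'online': [...], 'offline': [...]} from crm_mon-style text.

    Hypothetical helper: it assumes only the bracketed line format
    shown in this post and ignores every other line.
    """
    states = {"online": [], "offline": []}
    for line in status_text.splitlines():
        match = re.match(r"\s*(Online|OFFLINE):\s*\[(.*)\]\s*$", line)
        if match:
            # Node names are whitespace-separated inside the brackets.
            states[match.group(1).lower()].extend(match.group(2).split())
    return states


if __name__ == "__main__":
    print(parse_node_states(SAMPLE))
```

Feeding it the saved output of crm_mon would be the natural use, but that only works while crm_mon itself can still connect to the CIB, which was not the case on Node-1 here.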

Next, stopping the pacemaker service on Node-1 also failed, although it
succeeded on Node-2.
We had to kill the pacemaker service to bring everything back in sync.
I have collected the error and warning logs from that period here:
http://dl.dropbox.com/u/20096935/Pacemaker_Stop_Failure/pacemaker1.1_stop_failure.txt

Immediate help required.





