[ClusterLabs] Pacemaker fails to start after few starts
Kostiantyn Ponomarenko
konstantin.ponomarenko at gmail.com
Fri Mar 27 14:10:21 UTC 2015
Hi,
If I start and stop Corosync and Pacemaker a few times, I end up in a state where
Corosync is running but Pacemaker cannot start.
Here is a snippet from /var/log/messages:
Mar 27 14:00:49 daemon.notice<29> corosync[111057]: [MAIN ] Corosync
Cluster Engine ('2.3.4'): started and ready to provide service.
Mar 27 14:00:49 daemon.info<30> corosync[111057]: [MAIN ] Corosync
built-in features: pie relro bindnow
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [TOTEM ] Initializing
transport (UDP/IP Unicast).
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [TOTEM ] Initializing
transmit/receive security (NSS) crypto: none hash: sha256
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [TOTEM ] Initializing
transport (UDP/IP Unicast).
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [TOTEM ] Initializing
transmit/receive security (NSS) crypto: none hash: sha256
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [TOTEM ] The network
interface [169.254.0.2] is now up.
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [SERV ] Service
engine loaded: corosync configuration map access [0]
Mar 27 14:00:49 daemon.info<30> corosync[111058]: [QB ] server name:
cmap
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [SERV ] Service
engine loaded: corosync configuration service [1]
Mar 27 14:00:49 daemon.info<30> corosync[111058]: [QB ] server name:
cfg
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [SERV ] Service
engine loaded: corosync cluster closed process group service v1.01 [2]
Mar 27 14:00:49 daemon.info<30> corosync[111058]: [QB ] server name:
cpg
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [SERV ] Service
engine loaded: corosync profile loading service [4]
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [QUORUM] Using quorum
provider corosync_votequorum
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [SERV ] Service
engine loaded: corosync vote quorum service v1.0 [5]
Mar 27 14:00:49 daemon.info<30> corosync[111058]: [QB ] server name:
votequorum
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [SERV ] Service
engine loaded: corosync cluster quorum service v0.1 [3]
Mar 27 14:00:49 daemon.info<30> corosync[111058]: [QB ] server name:
quorum
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [TOTEM ] adding new
UDPU member {169.254.0.2}
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [TOTEM ] adding new
UDPU member {169.254.0.3}
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [TOTEM ] The network
interface [169.254.1.2] is now up.
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [TOTEM ] adding new
UDPU member {169.254.1.2}
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [TOTEM ] adding new
UDPU member {169.254.1.3}
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [TOTEM ] A new
membership (169.254.0.2:1296) was formed. Members joined: 1
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [QUORUM] Members[1]: 1
Mar 27 14:00:49 daemon.notice<29> corosync[111058]: [MAIN ] Completed
service synchronization, ready to provide service.
Mar 27 14:00:49 daemon.notice<29> pacemaker: Starting Pacemaker Cluster
Manager
Mar 27 14:00:49 daemon.notice<29> pacemakerd[111069]: notice:
crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Mar 27 14:00:49 daemon.err<27> pacemakerd[111069]: error:
mcp_read_config: Couldn't create logfile: /var/log/pacemaker.log
Mar 27 14:00:49 daemon.notice<29> pacemakerd[111069]: notice:
mcp_read_config: Configured corosync to accept connections from group 107:
Library error (2)
Mar 27 14:00:49 daemon.notice<29> pacemakerd[111069]: notice: main:
Starting Pacemaker 1.1.12 (Build: 561c4cf): generated-manpages
agent-manpages ascii-docs ncurses libqb-logging libqb-ipc lha-fencing
upstart nagios corosync-native snmp libesmtp acls
Mar 27 14:00:49 daemon.notice<29> pacemakerd[111069]: notice:
cluster_connect_quorum: Quorum lost
Mar 27 14:00:49 daemon.notice<29> stonithd[111072]: notice:
crm_cluster_connect: Connecting to cluster infrastructure: corosync
Mar 27 14:00:49 daemon.notice<29> attrd[111074]: notice:
crm_cluster_connect: Connecting to cluster infrastructure: corosync
Mar 27 14:00:49 daemon.err<27> corosync[111058]: [MAIN ] Denied
connection attempt from 105:107
Mar 27 14:00:49 daemon.err<27> attrd[111074]: error:
cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
Mar 27 14:00:49 daemon.err<27> attrd[111074]: error: main: Cluster
connection failed
Mar 27 14:00:49 daemon.err<27> corosync[111058]: [QB ] Invalid IPC
credentials (111058-111074-2).
Mar 27 14:00:49 daemon.notice<29> attrd[111074]: notice: main: Cleaning
up before exit
Mar 27 14:00:49 daemon.notice<29> cib[111071]: notice:
crm_cluster_connect: Connecting to cluster infrastructure: corosync
Mar 27 14:00:49 kern.info<6> kernel: [190948.176344] attrd[111074]:
segfault at 1b8 ip 00007fbcdab2f9e1 sp 00007fff3fe65690 error 4 in
libqb.so.0.17.1[7fbcdab20000+22000]
Mar 27 14:00:50 daemon.err<27> corosync[111058]: [MAIN ] Denied
connection attempt from 105:107
Mar 27 14:00:50 daemon.err<27> corosync[111058]: [QB ] Invalid IPC
credentials (111058-111071-2).
Mar 27 14:00:50 daemon.err<27> cib[111071]: error: cluster_connect_cpg:
Could not connect to the Cluster Process Group API: 11
Mar 27 14:00:50 daemon.crit<26> cib[111071]: crit: cib_init: Cannot
sign in to the cluster... terminating
Mar 27 14:00:50 daemon.notice<29> crmd[111076]: notice: main: CRM Git
Version: 561c4cf
Mar 27 14:00:50 daemon.notice<29> pacemakerd[111069]: notice:
crm_update_peer_state: pcmk_quorum_notification: Node node-0[1] - state is
now member (was (null))
Mar 27 14:00:50 daemon.err<27> pacemakerd[111069]: error:
pcmk_child_exit: Child process cib (111071) exited: Network is down (100)
Mar 27 14:00:50 daemon.warning<28> pacemakerd[111069]: warning:
pcmk_child_exit: Pacemaker child process cib no longer wishes to be
respawned. Shutting ourselves down.
Mar 27 14:00:50 daemon.err<27> pacemakerd[111069]: error: child_waitpid:
Managed process 111074 (attrd) dumped core
Mar 27 14:00:50 daemon.notice<29> pacemakerd[111069]: notice:
pcmk_child_exit: Child process attrd terminated with signal 11 (pid=111074,
core=1)
Mar 27 14:00:50 daemon.notice<29> pacemakerd[111069]: notice:
pcmk_shutdown_worker: Shuting down Pacemaker
Mar 27 14:00:50 daemon.notice<29> pacemakerd[111069]: notice: stop_child:
Stopping crmd: Sent -15 to process 111076
Mar 27 14:00:50 daemon.warning<28> crmd[111076]: warning: do_cib_control:
Couldn't complete CIB registration 1 times... pause and retry
Mar 27 14:00:50 daemon.notice<29> crmd[111076]: notice: crm_shutdown:
Requesting shutdown, upper limit is 1200000ms
Mar 27 14:00:50 daemon.warning<28> crmd[111076]: warning: do_log: FSA:
Input I_SHUTDOWN from crm_shutdown() received in state S_STARTING
Mar 27 14:00:50 daemon.notice<29> crmd[111076]: notice:
do_state_transition: State transition S_STARTING -> S_STOPPING [
input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Mar 27 14:00:50 daemon.notice<29> crmd[111076]: notice:
terminate_cs_connection: Disconnecting from Corosync
Mar 27 14:00:50 daemon.notice<29> pacemakerd[111069]: notice: stop_child:
Stopping pengine: Sent -15 to process 111075
Mar 27 14:00:50 daemon.notice<29> pacemakerd[111069]: notice: stop_child:
Stopping lrmd: Sent -15 to process 111073
Mar 27 14:00:50 daemon.notice<29> pacemakerd[111069]: notice: stop_child:
Stopping stonith-ng: Sent -15 to process 111072
Mar 27 14:00:59 daemon.err<27> stonithd[111072]: error: setup_cib: Could
not connect to the CIB service: Transport endpoint is not connected (-107)
Mar 27 14:00:59 daemon.notice<29> pacemakerd[111069]: notice:
pcmk_shutdown_worker: Shutdown complete
Mar 27 14:00:59 daemon.notice<29> pacemakerd[111069]: notice:
pcmk_shutdown_worker: Attempting to inhibit respawning after fatal error
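The snippet above is wrapped by the mail client, and the failures are scattered among the notices. To make them easier to spot, here is a minimal sketch of a filter that rejoins the wrapped lines and keeps only the err and crit entries. It assumes every entry starts with a "Mon DD HH:MM:SS facility.severity<N>" prefix, as in this log. The names join_wrapped and errors_only are made up for illustration and are not part of Pacemaker or Corosync.

```python
import re

# Assumption: an entry begins with e.g. "Mar 27 14:00:49 daemon.err<27> ".
ENTRY_START = re.compile(
    r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} (\w+)\.(\w+)<\d+> "
)


def join_wrapped(lines):
    """Merge mail-wrapped continuation lines back into whole log entries."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line) or not entries:
            # A new entry (or the very first line, whatever it looks like).
            entries.append(line)
        else:
            # A continuation line: glue it onto the previous entry.
            entries[-1] += " " + line.strip()
    return entries


def errors_only(entries, severities=("err", "crit")):
    """Keep entries whose syslog severity is one of `severities`."""
    kept = []
    for entry in entries:
        match = ENTRY_START.match(entry)
        if match and match.group(2) in severities:
            kept.append(entry)
    return kept


# A few lines copied from the snippet above, still wrapped.
sample = [
    "Mar 27 14:00:49 daemon.notice<29> pacemakerd[111069]: notice:",
    "crm_add_logfile: Additional logging available in /var/log/pacemaker.log",
    "Mar 27 14:00:49 daemon.err<27> pacemakerd[111069]: error:",
    "mcp_read_config: Couldn't create logfile: /var/log/pacemaker.log",
    "Mar 27 14:00:49 daemon.err<27> corosync[111058]: [MAIN ] Denied",
    "connection attempt from 105:107",
]

for entry in errors_only(join_wrapped(sample)):
    print(entry)
```

To run it over the whole excerpt, pass the lines of a saved copy of the log instead of sample.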
Thank you,
Kostya