[ClusterLabs] 2-Node cluster - both nodes unclean - can't start cluster
Lentes, Bernd
bernd.lentes at helmholtz-muenchen.de
Fri Mar 10 13:49:26 EST 2023
Hi,
I can't get my cluster running. I had problems with an OCFS2 volume, and both
nodes have been fenced.
When I now do a "systemctl start pacemaker.service", crm_mon shows both
nodes as UNCLEAN for a few seconds, then pacemaker stops.
I try to confirm the fencing with "stonith_admin -C", but it doesn't work.
Maybe the time window is too short, since pacemaker only runs for a few seconds.
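For clarity, the sequence I'm attempting is roughly this (the node name passed
to -C is the one that appears in the log below; timing is the part I'm unsure
about):

```shell
# Start the cluster stack on this node:
systemctl start pacemaker.service

# Then, within the few seconds pacemaker stays up, manually acknowledge the
# fencing; -C / --confirm takes the name of the node to mark as safely down:
stonith_admin --confirm ha-idg-1
```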
Here is the log:
Mar 10 19:36:24 [31037] ha-idg-1 corosync notice [MAIN ] Corosync Cluster
Engine ('2.3.6'): started and ready to provide service.
Mar 10 19:36:24 [31037] ha-idg-1 corosync info [MAIN ] Corosync built-in
features: debug testagents augeas systemd pie relro bindnow
Mar 10 19:36:24 [31037] ha-idg-1 corosync notice [TOTEM ] Initializing
transport (UDP/IP Multicast).
Mar 10 19:36:24 [31037] ha-idg-1 corosync notice [TOTEM ] Initializing
transmit/receive security (NSS) crypto: aes256 hash: sha1
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [TOTEM ] The network
interface [192.168.100.10] is now up.
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [SERV ] Service engine
loaded: corosync configuration map access [0]
Mar 10 19:36:25 [31037] ha-idg-1 corosync info [QB ] server name: cmap
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [SERV ] Service engine
loaded: corosync configuration service [1]
Mar 10 19:36:25 [31037] ha-idg-1 corosync info [QB ] server name: cfg
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [SERV ] Service engine
loaded: corosync cluster closed process group service v1.01 [2]
Mar 10 19:36:25 [31037] ha-idg-1 corosync info [QB ] server name: cpg
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [SERV ] Service engine
loaded: corosync profile loading service [4]
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [QUORUM] Using quorum
provider corosync_votequorum
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [QUORUM] This node is
within the primary component and will provide service.
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [QUORUM] Members[0]:
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [SERV ] Service engine
loaded: corosync vote quorum service v1.0 [5]
Mar 10 19:36:25 [31037] ha-idg-1 corosync info [QB ] server name:
votequorum
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [SERV ] Service engine
loaded: corosync cluster quorum service v0.1 [3]
Mar 10 19:36:25 [31037] ha-idg-1 corosync info [QB ] server name:
quorum
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [TOTEM ] A new membership
(192.168.100.10:2340) was formed. Members joined: 1084777482
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [QUORUM] Members[1]:
1084777482
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice [MAIN ] Completed service
synchronization, ready to provide service.
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: notice: main: Starting
Pacemaker 1.1.24+20210811.f5abda0ee-3.27.1 | build=1.1.24+20210811.f5abda0ee
features: generated-manpages agent-manpages ncurses libqb-logging
libqb-ipc lha-fencing systemd nagios
corosync-native atomic-attrd snmp libesmtp acls cibsecrets
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: main: Maximum core
file size is: 18446744073709551615
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: qb_ipcs_us_publish:
server name: pacemakerd
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info:
pcmk__ipc_is_authentic_process_active: Could not connect to lrmd IPC:
Connection refused
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info:
pcmk__ipc_is_authentic_process_active: Could not connect to cib_ro IPC:
Connection refused
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info:
pcmk__ipc_is_authentic_process_active: Could not connect to crmd IPC:
Connection refused
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info:
pcmk__ipc_is_authentic_process_active: Could not connect to attrd IPC:
Connection refused
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info:
pcmk__ipc_is_authentic_process_active: Could not connect to pengine IPC:
Connection refused
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info:
pcmk__ipc_is_authentic_process_active: Could not connect to stonith-ng
IPC: Connection refused
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: corosync_node_name:
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: notice: get_node_name:
Could not obtain a node name for corosync nodeid 1084777482
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: crm_get_peer:
Created entry 3c2499de-58a8-44f7-bf1e-03ff1fbec774/0x1456550 for node
(null)/1084777482 (1 total)
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: crm_get_peer: Node
1084777482 has uuid 1084777482
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: crm_update_peer_proc:
cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: notice:
cluster_connect_quorum: Quorum acquired
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: corosync_node_name:
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: notice: get_node_name:
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: crm_get_peer: Node
1084777482 is now known as ha-idg-1
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child:
Using uid=90 and group=90 for process cib
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child:
Forked child 31045 for process cib
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child:
Forked child 31046 for process stonith-ng
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child:
Forked child 31047 for process lrmd
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child:
Using uid=90 and group=90 for process attrd
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child:
Forked child 31048 for process attrd
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child:
Using uid=90 and group=90 for process pengine
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child:
Forked child 31049 for process pengine
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child:
Using uid=90 and group=90 for process crmd
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: start_child:
Forked child 31050 for process crmd
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: main: Starting
mainloop
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info:
pcmk_quorum_notification: Quorum retained | membership=2340 members=1
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: notice:
crm_update_peer_state_iter: Node ha-idg-1 state is now member |
nodeid=1084777482 previous=unknown source=pcmk_quorum_notification
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: pcmk_cpg_membership:
Group pacemakerd event 0: node 1084777482 pid 31044 joined via cpg_join
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: pcmk_cpg_membership:
Group pacemakerd event 0: ha-idg-1 (node 1084777482 pid 31044) is member
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: crm_log_init:
Changed active directory to /var/lib/pacemaker/cores
Mar 10 19:36:25 [31049] ha-idg-1 pengine: info: crm_log_init:
Changed active directory to /var/lib/pacemaker/cores
Mar 10 19:36:25 [31049] ha-idg-1 pengine: info: qb_ipcs_us_publish:
server name: pengine
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: get_cluster_type:
Verifying cluster type: 'corosync'
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: crm_log_init:
Changed active directory to /var/lib/pacemaker/cores
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: get_cluster_type:
Assuming an active 'corosync' cluster
Mar 10 19:36:25 [31049] ha-idg-1 pengine: info: main: Starting
pengine
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: main: Starting up
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: retrieveCib:
Reading cluster configuration file /var/lib/pacemaker/cib/cib.xml (digest:
/var/lib/pacemaker/cib/cib.xml.sig)
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: get_cluster_type:
Verifying cluster type: 'corosync'
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: get_cluster_type:
Assuming an active 'corosync' cluster
Mar 10 19:36:25 [31048] ha-idg-1 attrd: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: crm_log_init:
Changed active directory to /var/lib/pacemaker/cores
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: get_cluster_type:
Verifying cluster type: 'corosync'
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: get_cluster_type:
Assuming an active 'corosync' cluster
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Mar 10 19:36:25 [31047] ha-idg-1 lrmd: info: crm_log_init:
Changed active directory to /var/lib/pacemaker/cores
Mar 10 19:36:25 [31047] ha-idg-1 lrmd: info: qb_ipcs_us_publish:
server name: lrmd
Mar 10 19:36:25 [31047] ha-idg-1 lrmd: info: main: Starting
Mar 10 19:36:25 [31050] ha-idg-1 crmd: info: crm_log_init:
Changed active directory to /var/lib/pacemaker/cores
Mar 10 19:36:25 [31050] ha-idg-1 crmd: info: main: CRM Git
Version: 1.1.24+20210811.f5abda0ee-3.27.1 (1.1.24+20210811.f5abda0ee)
Mar 10 19:36:25 [31050] ha-idg-1 crmd: info: get_cluster_type:
Verifying cluster type: 'corosync'
Mar 10 19:36:25 [31050] ha-idg-1 crmd: info: get_cluster_type:
Assuming an active 'corosync' cluster
Mar 10 19:36:25 [31050] ha-idg-1 crmd: warning:
log_deprecation_warnings: Compile-time support for crm_mon SNMP
options is deprecated and will be removed in a future release (configure
alerts instead)
Mar 10 19:36:25 [31050] ha-idg-1 crmd: warning:
log_deprecation_warnings: Compile-time support for crm_mon SMTP
options is deprecated and will be removed in a future release (configure
alerts instead)
Mar 10 19:36:25 [31050] ha-idg-1 crmd: info: do_log: Input
I_STARTUP received in state S_STARTING from crmd_init
Mar 10 19:36:25 [31045] ha-idg-1 cib: info:
validate_with_relaxng: Creating RNG parser context
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: corosync_node_name:
Unable to get node name for nodeid 1084777482 <========= this happens
quite often
Mar 10 19:36:25 [31048] ha-idg-1 attrd: notice: get_node_name:
Could not obtain a node name for corosync nodeid 1084777482
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: crm_get_peer:
Created entry c1bd522c-34da-49b3-97cb-22fd4580959b/0x109e210 for node
(null)/1084777482 (1 total)
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: crm_get_peer: Node
1084777482 has uuid 1084777482
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: crm_update_peer_proc:
cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online
Mar 10 19:36:25 [31048] ha-idg-1 attrd: notice:
crm_update_peer_state_iter: Node (null) state is now member |
nodeid=1084777482 previous=unknown source=crm_update_peer_proc
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info:
init_cs_connection_once: Connection to 'corosync': established
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: corosync_node_name:
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: notice: get_node_name:
Could not obtain a node name for corosync nodeid 1084777482
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: crm_get_peer:
Created entry 1d232d33-d274-415d-be94-765dc1b4e1e4/0x9478d0 for node
(null)/1084777482 (1 total)
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: crm_get_peer: Node
1084777482 has uuid 1084777482
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: crm_update_peer_proc:
cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: notice:
crm_update_peer_state_iter: Node (null) state is now member |
nodeid=1084777482 previous=unknown source=crm_update_peer_proc
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: startCib: CIB
Initialization completed successfully
Mar 10 19:36:25 [31045] ha-idg-1 cib: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: corosync_node_name:
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31048] ha-idg-1 attrd: notice: get_node_name:
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: crm_get_peer: Node
1084777482 is now known as ha-idg-1
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: corosync_node_name:
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: notice: get_node_name:
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info:
init_cs_connection_once: Connection to 'corosync': established
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: corosync_node_name:
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31045] ha-idg-1 cib: notice: get_node_name:
Could not obtain a node name for corosync nodeid 1084777482
Mar 10 19:36:25 [31048] ha-idg-1 attrd: info: main: Cluster
connection active
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: crm_get_peer:
Created entry 7c2b1d3d-0ab6-4fa6-887c-5d01e5927a67/0x147af10 for node
(null)/1084777482 (1 total)
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: crm_get_peer: Node
1084777482 has uuid 1084777482
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: crm_update_peer_proc:
cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online
Mar 10 19:36:25 [31045] ha-idg-1 cib: notice:
crm_update_peer_state_iter: Node (null) state is now member |
nodeid=1084777482 previous=unknown source=crm_update_peer_proc
Mar 10 19:36:25 [31045] ha-idg-1 cib: info:
init_cs_connection_once: Connection to 'corosync': established
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: corosync_node_name:
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: notice: get_node_name:
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng: info: crm_get_peer: Node
1084777482 is now known as ha-idg-1
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: corosync_node_name:
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31045] ha-idg-1 cib: notice: get_node_name:
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: crm_get_peer: Node
1084777482 is now known as ha-idg-1
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: qb_ipcs_us_publish:
server name: cib_ro
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: qb_ipcs_us_publish:
server name: cib_rw
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: qb_ipcs_us_publish:
server name: cib_shm
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: cib_init:
Starting cib mainloop
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: pcmk_cpg_membership:
Group cib event 0: node 1084777482 pid 31045 joined via cpg_join
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: pcmk_cpg_membership:
Group cib event 0: ha-idg-1 (node 1084777482 pid 31045) is member
Mar 10 19:36:25 [31045] ha-idg-1 cib: info: cib_file_backup:
Archived previous version as /var/lib/pacemaker/cib/cib-34.raw
Mar 10 19:36:25 [31045] ha-idg-1 cib: info:
cib_file_write_with_digest: Wrote version 7.29548.0 of the CIB to disk
(digest: 03b4ec65319cef255d43fc1ec9d285a5)
Mar 10 19:36:25 [31045] ha-idg-1 cib: info:
cib_file_write_with_digest: Reading cluster configuration file
/var/lib/pacemaker/cib/cib.MBy2v0 (digest:
/var/lib/pacemaker/cib/cib.nDn0X9)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_cib_control: CIB
connection established
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: corosync_node_name:
Unable to get node name for nodeid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: get_node_name:
Could not obtain a node name for corosync nodeid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: crm_get_peer:
Created entry 873262c1-ede0-4ba7-97e6-53ead0a6d7b0/0x1613910 for node
(null)/1084777482 (1 total)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: crm_get_peer: Node
1084777482 has uuid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: crm_update_peer_proc:
cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: corosync_node_name:
Unable to get node name for nodeid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: get_node_name:
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info:
init_cs_connection_once: Connection to 'corosync': established
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: corosync_node_name:
Unable to get node name for nodeid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: get_node_name:
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: crm_get_peer: Node
1084777482 is now known as ha-idg-1
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: peer_update_callback:
Cluster node ha-idg-1 is now in unknown state <===== is that the
problem?
Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: attrd_erase_attrs:
Clearing transient attributes from CIB |
xpath=//node_state[@uname='ha-idg-1']/transient_attributes
Mar 10 19:36:26 [31048] ha-idg-1 attrd: info:
attrd_start_election_if_needed: Starting an election to determine the
writer
Mar 10 19:36:26 [31045] ha-idg-1 cib: info: cib_process_request:
Forwarding cib_delete operation for section
//node_state[@uname='ha-idg-1']/transient_attributes to all
(origin=local/attrd/2)
Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: corosync_node_name:
Unable to get node name for nodeid 1084777482
Mar 10 19:36:26 [31048] ha-idg-1 attrd: notice: get_node_name:
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: main: CIB
connection active
Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: qb_ipcs_us_publish:
server name: attrd
Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: main: Accepting
attribute updates
Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: pcmk_cpg_membership:
Group attrd event 0: node 1084777482 pid 31048 joined via cpg_join
Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: pcmk_cpg_membership:
Group attrd event 0: ha-idg-1 (node 1084777482 pid 31048) is member
Mar 10 19:36:26 [31045] ha-idg-1 cib: info: corosync_node_name:
Unable to get node name for nodeid 1084777482
Mar 10 19:36:26 [31045] ha-idg-1 cib: notice: get_node_name:
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: election_check:
election-attrd won by local node
Mar 10 19:36:26 [31048] ha-idg-1 attrd: notice: attrd_declare_winner:
Recorded local node as attribute writer (was unset)
Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: attrd_peer_update:
Setting #attrd-protocol[ha-idg-1]: (null) -> 2 from ha-idg-1
Mar 10 19:36:26 [31048] ha-idg-1 attrd: info: write_attribute:
Processed 1 private change for #attrd-protocol, id=n/a, set=n/a
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: setup_cib:
Watching for stonith topology changes
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: qb_ipcs_us_publish:
server name: stonith-ng
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: main: Starting
stonith-ng mainloop
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: pcmk_cpg_membership:
Group stonith-ng event 0: node 1084777482 pid 31046 joined via cpg_join
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: pcmk_cpg_membership:
Group stonith-ng event 0: ha-idg-1 (node 1084777482 pid 31046) is member
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice:
cluster_connect_quorum: Quorum acquired
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: init_cib_cache_cb:
Updating device list from the cib: init
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: cib_devices_update:
Updating devices to version 7.29548.0
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: notice: unpack_config: On
loss of CCM Quorum: Ignore
Mar 10 19:36:26 [31045] ha-idg-1 cib: info: cib_process_request:
Completed cib_delete operation for section
//node_state[@uname='ha-idg-1']/transient_attributes: OK (rc=0,
origin=ha-idg-1/attrd/2, version=7.29548.0)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_ha_control:
Connected to the cluster
Mar 10 19:36:26 [31045] ha-idg-1 cib: info: cib_process_request:
Forwarding cib_modify operation for section nodes to all
(origin=local/crmd/3)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: lrmd_ipc_connect:
Connecting to lrmd
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_lrm_control: LRM
connection established
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_started:
Delaying start, no membership data (0000000000100000)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info:
pcmk_quorum_notification: Quorum retained | membership=2340 members=1
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice:
crm_update_peer_state_iter: Node ha-idg-1 state is now member |
nodeid=1084777482 previous=unknown source=pcmk_quorum_notification
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: peer_update_callback:
Cluster node ha-idg-1 is now member (was in unknown state)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_started:
Delaying start, Config not read (0000000000000040)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: pcmk_cpg_membership:
Group crmd event 0: node 1084777482 pid 31050 joined via cpg_join
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: pcmk_cpg_membership:
Group crmd event 0: ha-idg-1 (node 1084777482 pid 31050) is member
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_started:
Delaying start, Config not read (0000000000000040)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_started:
Delaying start, Config not read (0000000000000040)
Mar 10 19:36:26 [31045] ha-idg-1 cib: info: cib_process_request:
Completed cib_modify operation for section nodes: OK (rc=0,
origin=ha-idg-1/crmd/3, version=7.29548.0)
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: qb_ipcs_us_publish:
server name: crmd
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: do_started: The
local CRM is operational <============================ looks pretty good
Mar 10 19:36:26 [31050] ha-idg-1 crmd: info: do_log: Input
I_PENDING received in state S_STARTING from do_started
Mar 10 19:36:26 [31050] ha-idg-1 crmd: notice: do_state_transition:
State transition S_STARTING -> S_PENDING | input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: action_synced_wait:
Managed fence_ilo2_metadata_1 process 31052 exited with rc=0
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info:
stonith_device_register: Added 'fence_ilo_ha-idg-2' to the device list (1
active devices)
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info: action_synced_wait:
Managed fence_ilo4_metadata_1 process 31054 exited with rc=0
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng: info:
stonith_device_register: Added 'fence_ilo_ha-idg-1' to the device list (2
active devices)
Mar 10 19:36:28 [31050] ha-idg-1 crmd: info:
te_trigger_stonith_history_sync: Fence history will be synchronized
cluster-wide within 30 seconds
Mar 10 19:36:28 [31050] ha-idg-1 crmd: notice: te_connect_stonith:
Fencer successfully connected
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: notice: handle_request:
Received manual confirmation that ha-idg-1 is fenced
<===================== seems to be my "stonith_admin -C"
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: notice:
initiate_remote_stonith_op: Initiating manual confirmation for
ha-idg-1: 23926653-7baa-44b8-ade3-5ee8468f3db6
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: notice: stonith_manual_ack:
Injecting manual confirmation that ha-idg-1 is safely off/down
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: notice: remote_op_done:
Operation 'off' targeting ha-idg-1 on a human for
stonith_admin.31555 at ha-idg-1.23926653: OK
Mar 10 19:36:34 [31050] ha-idg-1 crmd: info: exec_alert_list:
Sending fencing alert via smtp_alert to informatic.idg at helmholtz-muenchen.de
Mar 10 19:36:34 [31047] ha-idg-1 lrmd: info:
process_lrmd_alert_exec: Executing alert smtp_alert for
6bb5a831-e90c-4b0b-8783-0092a26a1e6c
Mar 10 19:36:34 [31050] ha-idg-1 crmd: crit:
tengine_stonith_notify: We were allegedly just fenced by a human for
ha-idg-1! <===================== what does that mean? I didn't fence
it
Mar 10 19:36:34 [31050] ha-idg-1 crmd: info: crm_xml_cleanup:
Cleaning up memory from libxml2
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: warning: pcmk_child_exit:
Shutting cluster down because crmd[31050] had fatal failure
<======================= ???
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: pcmk_shutdown_worker:
Shutting down Pacemaker
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: stop_child:
Stopping pengine | sent signal 15 to process 31049
Mar 10 19:36:34 [31049] ha-idg-1 pengine: notice: crm_signal_dispatch:
Caught 'Terminated' signal | 15 (invoking handler)
Mar 10 19:36:34 [31049] ha-idg-1 pengine: info: qb_ipcs_us_withdraw:
withdrawing server sockets
Mar 10 19:36:34 [31049] ha-idg-1 pengine: info: crm_xml_cleanup:
Cleaning up memory from libxml2
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: pcmk_child_exit:
pengine[31049] exited with status 0 (OK)
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: stop_child:
Stopping attrd | sent signal 15 to process 31048
Mar 10 19:36:34 [31048] ha-idg-1 attrd: notice: crm_signal_dispatch:
Caught 'Terminated' signal | 15 (invoking handler)
Mar 10 19:36:34 [31048] ha-idg-1 attrd: info: main: Shutting
down attribute manager
Mar 10 19:36:34 [31048] ha-idg-1 attrd: info: qb_ipcs_us_withdraw:
withdrawing server sockets
Mar 10 19:36:34 [31048] ha-idg-1 attrd: info: attrd_cib_destroy_cb:
Connection disconnection complete
Mar 10 19:36:34 [31048] ha-idg-1 attrd: info: crm_xml_cleanup:
Cleaning up memory from libxml2
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: pcmk_child_exit:
attrd[31048] exited with status 0 (OK)
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: stop_child:
Stopping lrmd | sent signal 15 to process 31047
Mar 10 19:36:34 [31047] ha-idg-1 lrmd: notice: crm_signal_dispatch:
Caught 'Terminated' signal | 15 (invoking handler)
Mar 10 19:36:34 [31047] ha-idg-1 lrmd: info: lrmd_exit:
Terminating with 0 clients
Mar 10 19:36:34 [31047] ha-idg-1 lrmd: info: qb_ipcs_us_withdraw:
withdrawing server sockets
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 10 19:36:34 [31047] ha-idg-1 lrmd: info: crm_xml_cleanup:
Cleaning up memory from libxml2
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: pcmk_child_exit:
lrmd[31047] exited with status 0 (OK)
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: stop_child:
Stopping stonith-ng | sent signal 15 to process 31046
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: notice: crm_signal_dispatch:
Caught 'Terminated' signal | 15 (invoking handler)
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: info: stonith_shutdown:
Terminating with 3 clients
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: info:
cib_connection_destroy: Connection to the CIB closed.
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: info: qb_ipcs_us_withdraw:
withdrawing server sockets
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng: info: crm_xml_cleanup:
Cleaning up memory from libxml2
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: pcmk_child_exit:
stonith-ng[31046] exited with status 0 (OK)
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: stop_child:
Stopping cib | sent signal 15 to process 31045
Mar 10 19:36:34 [31045] ha-idg-1 cib: notice: crm_signal_dispatch:
Caught 'Terminated' signal | 15 (invoking handler)
Mar 10 19:36:34 [31045] ha-idg-1 cib: info: cib_shutdown:
Disconnected 0 clients
Mar 10 19:36:34 [31045] ha-idg-1 cib: info: cib_shutdown: All
clients disconnected (0)
Mar 10 19:36:34 [31045] ha-idg-1 cib: info: terminate_cib:
initiate_exit: Exiting from mainloop...
Mar 10 19:36:34 [31045] ha-idg-1 cib: info:
crm_cluster_disconnect: Disconnecting from cluster infrastructure: corosync
Mar 10 19:36:34 [31045] ha-idg-1 cib: info:
terminate_cs_connection: Disconnecting from Corosync
Mar 10 19:36:34 [31045] ha-idg-1 cib: info:
terminate_cs_connection: No Quorum connection
Mar 10 19:36:34 [31045] ha-idg-1 cib: notice:
terminate_cs_connection: Disconnected from Corosync
Mar 10 19:36:34 [31045] ha-idg-1 cib: info:
crm_cluster_disconnect: Disconnected from corosync
Mar 10 19:36:34 [31045] ha-idg-1 cib: info:
crm_cluster_disconnect: Disconnecting from cluster infrastructure: corosync
Mar 10 19:36:34 [31045] ha-idg-1 cib: info:
terminate_cs_connection: Disconnecting from Corosync
Mar 10 19:36:34 [31045] ha-idg-1 cib: info:
cluster_disconnect_cpg: No CPG connection
Mar 10 19:36:34 [31045] ha-idg-1 cib: info:
terminate_cs_connection: No Quorum connection
Mar 10 19:36:34 [31045] ha-idg-1 cib: notice:
terminate_cs_connection: Disconnected from Corosync
Mar 10 19:36:34 [31045] ha-idg-1 cib: info:
crm_cluster_disconnect: Disconnected from corosync
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 10 19:36:34 [31045] ha-idg-1 cib: info: qb_ipcs_us_withdraw:
withdrawing server sockets
Mar 10 19:36:34 [31045] ha-idg-1 cib: info: qb_ipcs_us_withdraw:
withdrawing server sockets
Mar 10 19:36:34 [31045] ha-idg-1 cib: info: qb_ipcs_us_withdraw:
withdrawing server sockets
Mar 10 19:36:34 [31045] ha-idg-1 cib: info: crm_xml_cleanup:
Cleaning up memory from libxml2
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: pcmk_child_exit:
cib[31045] exited with status 0 (OK)
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: pcmk_shutdown_worker:
Shutdown complete
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: notice: pcmk_shutdown_worker:
Attempting to inhibit respawning after fatal error
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info:
pcmk_exit_with_cluster: Asking Corosync to shut down
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [CFG ] Node 1084777482
was shut down by sysadmin
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd: info: crm_xml_cleanup:
Cleaning up memory from libxml2
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Unloading all
Corosync service engines.
Mar 10 19:36:34 [31037] ha-idg-1 corosync info [QB ] withdrawing
server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Service engine
unloaded: corosync vote quorum service v1.0
Mar 10 19:36:34 [31037] ha-idg-1 corosync info [QB ] withdrawing
server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Service engine
unloaded: corosync configuration map access
Mar 10 19:36:34 [31037] ha-idg-1 corosync info [QB ] withdrawing
server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Service engine
unloaded: corosync configuration service
Mar 10 19:36:34 [31037] ha-idg-1 corosync info [QB ] withdrawing
server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Service engine
unloaded: corosync cluster closed process group service v1.01
Mar 10 19:36:34 [31037] ha-idg-1 corosync info [QB ] withdrawing
server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Service engine
unloaded: corosync cluster quorum service v0.1
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [SERV ] Service engine
unloaded: corosync profile loading service
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice [MAIN ] Corosync Cluster
Engine exiting normally
Bernd
--
Bernd Lentes
System Administrator
Institute for Metabolism and Cell Death (MCD)
Building 25 - office 122
HelmholtzZentrum München
bernd.lentes at helmholtz-muenchen.de
phone: +49 89 3187 1241
+49 89 3187 49123
fax: +49 89 3187 2294
https://www.helmholtz-munich.de/en/mcd