[ClusterLabs] 2-Node cluster - both nodes unclean - can't start cluster

Lentes, Bernd bernd.lentes at helmholtz-muenchen.de
Fri Mar 10 13:49:26 EST 2023


Hi,

I can't get my cluster running. I had problems with an OCFS2 volume, and both 
nodes were fenced.
When I now run "systemctl start pacemaker.service", crm_mon shows both nodes 
as UNCLEAN for a few seconds, then pacemaker stops.
I try to confirm the fencing with "stonith_admin -C", but it doesn't work.
Maybe the window is too short, as pacemaker only runs for a few seconds 
(although, judging from the log below, the confirmation does get through at 
19:36:34).
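
For reference, these are the commands I ran; the confirmation target ha-idg-1 
matches what the log below shows for my stonith_admin call:

  systemctl start pacemaker.service
  stonith_admin -C ha-idg-1    # -C/--confirm: tell the cluster the named
                               # host is now safely down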

Here is the log:

Mar 10 19:36:24 [31037] ha-idg-1 corosync notice  [MAIN  ] Corosync Cluster 
Engine ('2.3.6'): started and ready to provide service.
Mar 10 19:36:24 [31037] ha-idg-1 corosync info    [MAIN  ] Corosync built-in 
features: debug testagents augeas systemd pie relro bindnow
Mar 10 19:36:24 [31037] ha-idg-1 corosync notice  [TOTEM ] Initializing 
transport (UDP/IP Multicast).
Mar 10 19:36:24 [31037] ha-idg-1 corosync notice  [TOTEM ] Initializing 
transmit/receive security (NSS) crypto: aes256 hash: sha1
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice  [TOTEM ] The network 
interface [192.168.100.10] is now up.
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice  [SERV  ] Service engine 
loaded: corosync configuration map access [0]
Mar 10 19:36:25 [31037] ha-idg-1 corosync info    [QB    ] server name: cmap
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice  [SERV  ] Service engine 
loaded: corosync configuration service [1]
Mar 10 19:36:25 [31037] ha-idg-1 corosync info    [QB    ] server name: cfg
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice  [SERV  ] Service engine 
loaded: corosync cluster closed process group service v1.01 [2]
Mar 10 19:36:25 [31037] ha-idg-1 corosync info    [QB    ] server name: cpg
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice  [SERV  ] Service engine 
loaded: corosync profile loading service [4]
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice  [QUORUM] Using quorum 
provider corosync_votequorum
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice  [QUORUM] This node is 
within the primary component and will provide service.
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice  [QUORUM] Members[0]:
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice  [SERV  ] Service engine 
loaded: corosync vote quorum service v1.0 [5]
Mar 10 19:36:25 [31037] ha-idg-1 corosync info    [QB    ] server name: 
votequorum
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice  [SERV  ] Service engine 
loaded: corosync cluster quorum service v0.1 [3]
Mar 10 19:36:25 [31037] ha-idg-1 corosync info    [QB    ] server name: 
quorum
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice  [TOTEM ] A new membership 
(192.168.100.10:2340) was formed. Members joined: 1084777482
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice  [QUORUM] Members[1]: 
1084777482
Mar 10 19:36:25 [31037] ha-idg-1 corosync notice  [MAIN  ] Completed service 
synchronization, ready to provide service.
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:   notice: main:    Starting 
Pacemaker 1.1.24+20210811.f5abda0ee-3.27.1 | build=1.1.24+20210811.f5abda0ee 
features: generated-manpages agent-manpages ncurses libqb-logging libqb-ipc 
lha-fencing systemd nagios corosync-native atomic-attrd snmp libesmtp acls 
cibsecrets
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: main:    Maximum core 
file size is: 18446744073709551615
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: qb_ipcs_us_publish: 
server name: pacemakerd
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: 
pcmk__ipc_is_authentic_process_active:   Could not connect to lrmd IPC: 
Connection refused
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: 
pcmk__ipc_is_authentic_process_active:   Could not connect to cib_ro IPC: 
Connection refused
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: 
pcmk__ipc_is_authentic_process_active:   Could not connect to crmd IPC: 
Connection refused
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: 
pcmk__ipc_is_authentic_process_active:   Could not connect to attrd IPC: 
Connection refused
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: 
pcmk__ipc_is_authentic_process_active:   Could not connect to pengine IPC: 
Connection refused
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: 
pcmk__ipc_is_authentic_process_active:   Could not connect to stonith-ng 
IPC: Connection refused
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:   notice: get_node_name: 
Could not obtain a node name for corosync nodeid 1084777482
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: crm_get_peer: 
Created entry 3c2499de-58a8-44f7-bf1e-03ff1fbec774/0x1456550 for node 
(null)/1084777482 (1 total)
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: crm_get_peer:    Node 
1084777482 has uuid 1084777482
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: crm_update_peer_proc: 
cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:   notice: 
cluster_connect_quorum:  Quorum acquired
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:   notice: get_node_name: 
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: crm_get_peer:    Node 
1084777482 is now known as ha-idg-1
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: start_child: 
Using uid=90 and group=90 for process cib
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: start_child: 
Forked child 31045 for process cib
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: start_child: 
Forked child 31046 for process stonith-ng
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: start_child: 
Forked child 31047 for process lrmd
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: start_child: 
Using uid=90 and group=90 for process attrd
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: start_child: 
Forked child 31048 for process attrd
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: start_child: 
Using uid=90 and group=90 for process pengine
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: start_child: 
Forked child 31049 for process pengine
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: start_child: 
Using uid=90 and group=90 for process crmd
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: start_child: 
Forked child 31050 for process crmd
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: main:    Starting 
mainloop
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: 
pcmk_quorum_notification:        Quorum retained | membership=2340 members=1
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:   notice: 
crm_update_peer_state_iter:      Node ha-idg-1 state is now member | 
nodeid=1084777482 previous=unknown source=pcmk_quorum_notification
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: pcmk_cpg_membership: 
Group pacemakerd event 0: node 1084777482 pid 31044 joined via cpg_join
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: pcmk_cpg_membership: 
Group pacemakerd event 0: ha-idg-1 (node 1084777482 pid 31044) is member
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: 
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: 
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: 
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: 
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: 
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: 
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31044] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: 
Ignoring process list sent by peer for local node
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: crm_log_init: 
Changed active directory to /var/lib/pacemaker/cores
Mar 10 19:36:25 [31049] ha-idg-1    pengine:     info: crm_log_init: 
Changed active directory to /var/lib/pacemaker/cores
Mar 10 19:36:25 [31049] ha-idg-1    pengine:     info: qb_ipcs_us_publish: 
server name: pengine
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: get_cluster_type: 
Verifying cluster type: 'corosync'
Mar 10 19:36:25 [31048] ha-idg-1      attrd:     info: crm_log_init: 
Changed active directory to /var/lib/pacemaker/cores
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: get_cluster_type: 
Assuming an active 'corosync' cluster
Mar 10 19:36:25 [31049] ha-idg-1    pengine:     info: main:    Starting 
pengine
Mar 10 19:36:25 [31048] ha-idg-1      attrd:     info: main:    Starting up
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: retrieveCib: 
Reading cluster configuration file /var/lib/pacemaker/cib/cib.xml (digest: 
/var/lib/pacemaker/cib/cib.xml.sig)
Mar 10 19:36:25 [31048] ha-idg-1      attrd:     info: get_cluster_type: 
Verifying cluster type: 'corosync'
Mar 10 19:36:25 [31048] ha-idg-1      attrd:     info: get_cluster_type: 
Assuming an active 'corosync' cluster
Mar 10 19:36:25 [31048] ha-idg-1      attrd:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: corosync
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:     info: crm_log_init: 
Changed active directory to /var/lib/pacemaker/cores
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:     info: get_cluster_type: 
Verifying cluster type: 'corosync'
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:     info: get_cluster_type: 
Assuming an active 'corosync' cluster
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: corosync
Mar 10 19:36:25 [31047] ha-idg-1       lrmd:     info: crm_log_init: 
Changed active directory to /var/lib/pacemaker/cores
Mar 10 19:36:25 [31047] ha-idg-1       lrmd:     info: qb_ipcs_us_publish: 
server name: lrmd
Mar 10 19:36:25 [31047] ha-idg-1       lrmd:     info: main:    Starting
Mar 10 19:36:25 [31050] ha-idg-1       crmd:     info: crm_log_init: 
Changed active directory to /var/lib/pacemaker/cores
Mar 10 19:36:25 [31050] ha-idg-1       crmd:     info: main:    CRM Git 
Version: 1.1.24+20210811.f5abda0ee-3.27.1 (1.1.24+20210811.f5abda0ee)
Mar 10 19:36:25 [31050] ha-idg-1       crmd:     info: get_cluster_type: 
Verifying cluster type: 'corosync'
Mar 10 19:36:25 [31050] ha-idg-1       crmd:     info: get_cluster_type: 
Assuming an active 'corosync' cluster
Mar 10 19:36:25 [31050] ha-idg-1       crmd:  warning: 
log_deprecation_warnings:        Compile-time support for crm_mon SNMP 
options is deprecated and will be removed in a future release (configure 
alerts instead)
Mar 10 19:36:25 [31050] ha-idg-1       crmd:  warning: 
log_deprecation_warnings:        Compile-time support for crm_mon SMTP 
options is deprecated and will be removed in a future release (configure 
alerts instead)
Mar 10 19:36:25 [31050] ha-idg-1       crmd:     info: do_log:  Input 
I_STARTUP received in state S_STARTING from crmd_init
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: 
validate_with_relaxng:   Creating RNG parser context
Mar 10 19:36:25 [31048] ha-idg-1      attrd:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482       ⇐========= this happens 
quite often (see my note below the log)
Mar 10 19:36:25 [31048] ha-idg-1      attrd:   notice: get_node_name: 
Could not obtain a node name for corosync nodeid 1084777482
Mar 10 19:36:25 [31048] ha-idg-1      attrd:     info: crm_get_peer: 
Created entry c1bd522c-34da-49b3-97cb-22fd4580959b/0x109e210 for node 
(null)/1084777482 (1 total)
Mar 10 19:36:25 [31048] ha-idg-1      attrd:     info: crm_get_peer:    Node 
1084777482 has uuid 1084777482
Mar 10 19:36:25 [31048] ha-idg-1      attrd:     info: crm_update_peer_proc: 
cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online
Mar 10 19:36:25 [31048] ha-idg-1      attrd:   notice: 
crm_update_peer_state_iter:      Node (null) state is now member | 
nodeid=1084777482 previous=unknown source=crm_update_peer_proc
Mar 10 19:36:25 [31048] ha-idg-1      attrd:     info: 
init_cs_connection_once: Connection to 'corosync': established
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:   notice: get_node_name: 
Could not obtain a node name for corosync nodeid 1084777482
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:     info: crm_get_peer: 
Created entry 1d232d33-d274-415d-be94-765dc1b4e1e4/0x9478d0 for node 
(null)/1084777482 (1 total)
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:     info: crm_get_peer:    Node 
1084777482 has uuid 1084777482
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:     info: crm_update_peer_proc: 
cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:   notice: 
crm_update_peer_state_iter:      Node (null) state is now member | 
nodeid=1084777482 previous=unknown source=crm_update_peer_proc
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: startCib:        CIB 
Initialization completed successfully
Mar 10 19:36:25 [31045] ha-idg-1        cib:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: corosync
Mar 10 19:36:25 [31048] ha-idg-1      attrd:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31048] ha-idg-1      attrd:   notice: get_node_name: 
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:25 [31048] ha-idg-1      attrd:     info: crm_get_peer:    Node 
1084777482 is now known as ha-idg-1
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:   notice: get_node_name: 
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:     info: 
init_cs_connection_once: Connection to 'corosync': established
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31045] ha-idg-1        cib:   notice: get_node_name: 
Could not obtain a node name for corosync nodeid 1084777482
Mar 10 19:36:25 [31048] ha-idg-1      attrd:     info: main:    Cluster 
connection active
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: crm_get_peer: 
Created entry 7c2b1d3d-0ab6-4fa6-887c-5d01e5927a67/0x147af10 for node 
(null)/1084777482 (1 total)
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: crm_get_peer:    Node 
1084777482 has uuid 1084777482
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: crm_update_peer_proc: 
cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online
Mar 10 19:36:25 [31045] ha-idg-1        cib:   notice: 
crm_update_peer_state_iter:      Node (null) state is now member | 
nodeid=1084777482 previous=unknown source=crm_update_peer_proc
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: 
init_cs_connection_once: Connection to 'corosync': established
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:   notice: get_node_name: 
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:25 [31046] ha-idg-1 stonith-ng:     info: crm_get_peer:    Node 
1084777482 is now known as ha-idg-1
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482
Mar 10 19:36:25 [31045] ha-idg-1        cib:   notice: get_node_name: 
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: crm_get_peer:    Node 
1084777482 is now known as ha-idg-1
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: qb_ipcs_us_publish: 
server name: cib_ro
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: qb_ipcs_us_publish: 
server name: cib_rw
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: qb_ipcs_us_publish: 
server name: cib_shm
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: cib_init: 
Starting cib mainloop
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: pcmk_cpg_membership: 
Group cib event 0: node 1084777482 pid 31045 joined via cpg_join
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: pcmk_cpg_membership: 
Group cib event 0: ha-idg-1 (node 1084777482 pid 31045) is member
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: cib_file_backup: 
Archived previous version as /var/lib/pacemaker/cib/cib-34.raw
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: 
cib_file_write_with_digest:      Wrote version 7.29548.0 of the CIB to disk 
(digest: 03b4ec65319cef255d43fc1ec9d285a5)
Mar 10 19:36:25 [31045] ha-idg-1        cib:     info: 
cib_file_write_with_digest:      Reading cluster configuration file 
/var/lib/pacemaker/cib/cib.MBy2v0 (digest: 
/var/lib/pacemaker/cib/cib.nDn0X9)
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: do_cib_control:  CIB 
connection established
Mar 10 19:36:26 [31050] ha-idg-1       crmd:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: corosync
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1       crmd:   notice: get_node_name: 
Could not obtain a node name for corosync nodeid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: crm_get_peer: 
Created entry 873262c1-ede0-4ba7-97e6-53ead0a6d7b0/0x1613910 for node 
(null)/1084777482 (1 total)
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: crm_get_peer:    Node 
1084777482 has uuid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: crm_update_peer_proc: 
cluster_connect_cpg: Node (null)[1084777482] - corosync-cpg is now online
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1       crmd:   notice: get_node_name: 
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: 
init_cs_connection_once: Connection to 'corosync': established
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482
Mar 10 19:36:26 [31050] ha-idg-1       crmd:   notice: get_node_name: 
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: crm_get_peer:    Node 
1084777482 is now known as ha-idg-1
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: peer_update_callback: 
Cluster node ha-idg-1 is now in unknown state      ⇐===== is that the 
problem?
Mar 10 19:36:26 [31048] ha-idg-1      attrd:     info: attrd_erase_attrs: 
Clearing transient attributes from CIB | 
xpath=//node_state[@uname='ha-idg-1']/transient_attributes
Mar 10 19:36:26 [31048] ha-idg-1      attrd:     info: 
attrd_start_election_if_needed:  Starting an election to determine the 
writer
Mar 10 19:36:26 [31045] ha-idg-1        cib:     info: cib_process_request: 
Forwarding cib_delete operation for section 
//node_state[@uname='ha-idg-1']/transient_attributes to all 
(origin=local/attrd/2)
Mar 10 19:36:26 [31048] ha-idg-1      attrd:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482
Mar 10 19:36:26 [31048] ha-idg-1      attrd:   notice: get_node_name: 
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:26 [31048] ha-idg-1      attrd:     info: main:    CIB 
connection active
Mar 10 19:36:26 [31048] ha-idg-1      attrd:     info: qb_ipcs_us_publish: 
server name: attrd
Mar 10 19:36:26 [31048] ha-idg-1      attrd:     info: main:    Accepting 
attribute updates
Mar 10 19:36:26 [31048] ha-idg-1      attrd:     info: pcmk_cpg_membership: 
Group attrd event 0: node 1084777482 pid 31048 joined via cpg_join
Mar 10 19:36:26 [31048] ha-idg-1      attrd:     info: pcmk_cpg_membership: 
Group attrd event 0: ha-idg-1 (node 1084777482 pid 31048) is member
Mar 10 19:36:26 [31045] ha-idg-1        cib:     info: corosync_node_name: 
Unable to get node name for nodeid 1084777482
Mar 10 19:36:26 [31045] ha-idg-1        cib:   notice: get_node_name: 
Defaulting to uname -n for the local corosync node name
Mar 10 19:36:26 [31048] ha-idg-1      attrd:     info: election_check: 
election-attrd won by local node
Mar 10 19:36:26 [31048] ha-idg-1      attrd:   notice: attrd_declare_winner: 
Recorded local node as attribute writer (was unset)
Mar 10 19:36:26 [31048] ha-idg-1      attrd:     info: attrd_peer_update: 
Setting #attrd-protocol[ha-idg-1]: (null) -> 2 from ha-idg-1
Mar 10 19:36:26 [31048] ha-idg-1      attrd:     info: write_attribute: 
Processed 1 private change for #attrd-protocol, id=n/a, set=n/a
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng:     info: setup_cib: 
Watching for stonith topology changes
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng:     info: qb_ipcs_us_publish: 
server name: stonith-ng
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng:     info: main:    Starting 
stonith-ng mainloop
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng:     info: pcmk_cpg_membership: 
Group stonith-ng event 0: node 1084777482 pid 31046 joined via cpg_join
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng:     info: pcmk_cpg_membership: 
Group stonith-ng event 0: ha-idg-1 (node 1084777482 pid 31046) is member
Mar 10 19:36:26 [31050] ha-idg-1       crmd:   notice: 
cluster_connect_quorum:  Quorum acquired
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng:     info: init_cib_cache_cb: 
Updating device list from the cib: init
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng:     info: cib_devices_update: 
Updating devices to version 7.29548.0
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng:   notice: unpack_config:   On 
loss of CCM Quorum: Ignore
Mar 10 19:36:26 [31045] ha-idg-1        cib:     info: cib_process_request: 
Completed cib_delete operation for section 
//node_state[@uname='ha-idg-1']/transient_attributes: OK (rc=0, 
origin=ha-idg-1/attrd/2, version=7.29548.0)
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: do_ha_control: 
Connected to the cluster
Mar 10 19:36:26 [31045] ha-idg-1        cib:     info: cib_process_request: 
Forwarding cib_modify operation for section nodes to all 
(origin=local/crmd/3)
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: lrmd_ipc_connect: 
Connecting to lrmd
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: do_lrm_control:  LRM 
connection established
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: do_started: 
Delaying start, no membership data (0000000000100000)
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: 
pcmk_quorum_notification:        Quorum retained | membership=2340 members=1
Mar 10 19:36:26 [31050] ha-idg-1       crmd:   notice: 
crm_update_peer_state_iter:      Node ha-idg-1 state is now member | 
nodeid=1084777482 previous=unknown source=pcmk_quorum_notification
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: peer_update_callback: 
Cluster node ha-idg-1 is now member (was in unknown state)
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: do_started: 
Delaying start, Config not read (0000000000000040)
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: pcmk_cpg_membership: 
Group crmd event 0: node 1084777482 pid 31050 joined via cpg_join
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: pcmk_cpg_membership: 
Group crmd event 0: ha-idg-1 (node 1084777482 pid 31050) is member
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: do_started: 
Delaying start, Config not read (0000000000000040)
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: do_started: 
Delaying start, Config not read (0000000000000040)
Mar 10 19:36:26 [31045] ha-idg-1        cib:     info: cib_process_request: 
Completed cib_modify operation for section nodes: OK (rc=0, 
origin=ha-idg-1/crmd/3, version=7.29548.0)
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: qb_ipcs_us_publish: 
server name: crmd
Mar 10 19:36:26 [31050] ha-idg-1       crmd:   notice: do_started:      The 
local CRM is operational    ⇐============================ looks pretty good
Mar 10 19:36:26 [31050] ha-idg-1       crmd:     info: do_log:  Input 
I_PENDING received in state S_STARTING from do_started
Mar 10 19:36:26 [31050] ha-idg-1       crmd:   notice: do_state_transition: 
State transition S_STARTING -> S_PENDING | input=I_PENDING 
cause=C_FSA_INTERNAL origin=do_started
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng:     info: action_synced_wait: 
Managed fence_ilo2_metadata_1 process 31052 exited with rc=0
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng:     info: 
stonith_device_register: Added 'fence_ilo_ha-idg-2' to the device list (1 
active devices)
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng:     info: action_synced_wait: 
Managed fence_ilo4_metadata_1 process 31054 exited with rc=0
Mar 10 19:36:26 [31046] ha-idg-1 stonith-ng:     info: 
stonith_device_register: Added 'fence_ilo_ha-idg-1' to the device list (2 
active devices)
Mar 10 19:36:28 [31050] ha-idg-1       crmd:     info: 
te_trigger_stonith_history_sync: Fence history will be synchronized 
cluster-wide within 30 seconds
Mar 10 19:36:28 [31050] ha-idg-1       crmd:   notice: te_connect_stonith: 
Fencer successfully connected
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng:   notice: handle_request: 
Received manual confirmation that ha-idg-1 is fenced 
<===================== seems to be my "stonith_admin -C"
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng:   notice: 
initiate_remote_stonith_op:      Initiating manual confirmation for 
ha-idg-1: 23926653-7baa-44b8-ade3-5ee8468f3db6
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng:   notice: stonith_manual_ack: 
Injecting manual confirmation that ha-idg-1 is safely off/down
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng:   notice: remote_op_done: 
Operation 'off' targeting ha-idg-1 on a human for 
stonith_admin.31555@ha-idg-1.23926653: OK
Mar 10 19:36:34 [31050] ha-idg-1       crmd:     info: exec_alert_list: 
Sending fencing alert via smtp_alert to informatic.idg at helmholtz-muenchen.de
Mar 10 19:36:34 [31047] ha-idg-1       lrmd:     info: 
process_lrmd_alert_exec: Executing alert smtp_alert for 
6bb5a831-e90c-4b0b-8783-0092a26a1e6c
Mar 10 19:36:34 [31050] ha-idg-1       crmd:     crit: 
tengine_stonith_notify:  We were allegedly just fenced by a human for 
ha-idg-1!      <===================== what does that mean? I didn't fence it
Mar 10 19:36:34 [31050] ha-idg-1       crmd:     info: crm_xml_cleanup: 
Cleaning up memory from libxml2
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:  warning: pcmk_child_exit: 
Shutting cluster down because crmd[31050] had fatal failure 
<=======================  ???
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:   notice: pcmk_shutdown_worker: 
Shutting down Pacemaker
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:   notice: stop_child: 
Stopping pengine | sent signal 15 to process 31049
Mar 10 19:36:34 [31049] ha-idg-1    pengine:   notice: crm_signal_dispatch: 
Caught 'Terminated' signal | 15 (invoking handler)
Mar 10 19:36:34 [31049] ha-idg-1    pengine:     info: qb_ipcs_us_withdraw: 
withdrawing server sockets
Mar 10 19:36:34 [31049] ha-idg-1    pengine:     info: crm_xml_cleanup: 
Cleaning up memory from libxml2
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:     info: pcmk_child_exit: 
pengine[31049] exited with status 0 (OK)
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:   notice: stop_child: 
Stopping attrd | sent signal 15 to process 31048
Mar 10 19:36:34 [31048] ha-idg-1      attrd:   notice: crm_signal_dispatch: 
Caught 'Terminated' signal | 15 (invoking handler)
Mar 10 19:36:34 [31048] ha-idg-1      attrd:     info: main:    Shutting 
down attribute manager
Mar 10 19:36:34 [31048] ha-idg-1      attrd:     info: qb_ipcs_us_withdraw: 
withdrawing server sockets
Mar 10 19:36:34 [31048] ha-idg-1      attrd:     info: attrd_cib_destroy_cb: 
Connection disconnection complete
Mar 10 19:36:34 [31048] ha-idg-1      attrd:     info: crm_xml_cleanup: 
Cleaning up memory from libxml2
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:     info: pcmk_child_exit: 
attrd[31048] exited with status 0 (OK)
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:   notice: stop_child: 
Stopping lrmd | sent signal 15 to process 31047
Mar 10 19:36:34 [31047] ha-idg-1       lrmd:   notice: crm_signal_dispatch: 
Caught 'Terminated' signal | 15 (invoking handler)
Mar 10 19:36:34 [31047] ha-idg-1       lrmd:     info: lrmd_exit: 
Terminating with 0 clients
Mar 10 19:36:34 [31047] ha-idg-1       lrmd:     info: qb_ipcs_us_withdraw: 
withdrawing server sockets
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: 
Ignoring process list sent by peer for local node
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: 
Ignoring process list sent by peer for local node
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: 
Ignoring process list sent by peer for local node
Mar 10 19:36:34 [31047] ha-idg-1       lrmd:     info: crm_xml_cleanup: 
Cleaning up memory from libxml2
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:     info: pcmk_child_exit: 
lrmd[31047] exited with status 0 (OK)
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:   notice: stop_child: 
Stopping stonith-ng | sent signal 15 to process 31046
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng:   notice: crm_signal_dispatch: 
Caught 'Terminated' signal | 15 (invoking handler)
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng:     info: stonith_shutdown: 
Terminating with 3 clients
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng:     info: 
cib_connection_destroy:  Connection to the CIB closed.
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: 
Ignoring process list sent by peer for local node
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng:     info: qb_ipcs_us_withdraw: 
withdrawing server sockets
Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng:     info: crm_xml_cleanup: 
Cleaning up memory from libxml2
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:     info: pcmk_child_exit: 
stonith-ng[31046] exited with status 0 (OK)
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:   notice: stop_child: 
Stopping cib | sent signal 15 to process 31045
Mar 10 19:36:34 [31045] ha-idg-1        cib:   notice: crm_signal_dispatch: 
Caught 'Terminated' signal | 15 (invoking handler)
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: cib_shutdown: 
Disconnected 0 clients
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: cib_shutdown:    All 
clients disconnected (0)
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: terminate_cib: 
initiate_exit: Exiting from mainloop...
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: 
crm_cluster_disconnect:  Disconnecting from cluster infrastructure: corosync
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: 
terminate_cs_connection: Disconnecting from Corosync
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: 
terminate_cs_connection: No Quorum connection
Mar 10 19:36:34 [31045] ha-idg-1        cib:   notice: 
terminate_cs_connection: Disconnected from Corosync
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: 
crm_cluster_disconnect:  Disconnected from corosync
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: 
crm_cluster_disconnect:  Disconnecting from cluster infrastructure: corosync
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: 
terminate_cs_connection: Disconnecting from Corosync
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: 
cluster_disconnect_cpg:  No CPG connection
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: 
terminate_cs_connection: No Quorum connection
Mar 10 19:36:34 [31045] ha-idg-1        cib:   notice: 
terminate_cs_connection: Disconnected from Corosync
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: 
crm_cluster_disconnect:  Disconnected from corosync
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:     info: mcp_cpg_deliver: 
Ignoring process list sent by peer for local node
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: qb_ipcs_us_withdraw: 
withdrawing server sockets
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: qb_ipcs_us_withdraw: 
withdrawing server sockets
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: qb_ipcs_us_withdraw: 
withdrawing server sockets
Mar 10 19:36:34 [31045] ha-idg-1        cib:     info: crm_xml_cleanup: 
Cleaning up memory from libxml2
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:     info: pcmk_child_exit: 
cib[31045] exited with status 0 (OK)
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:   notice: pcmk_shutdown_worker: 
Shutdown complete
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:   notice: pcmk_shutdown_worker: 
Attempting to inhibit respawning after fatal error
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:     info: 
pcmk_exit_with_cluster:  Asking Corosync to shut down
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice  [CFG   ] Node 1084777482 
was shut down by sysadmin
Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:     info: crm_xml_cleanup: 
Cleaning up memory from libxml2
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice  [SERV  ] Unloading all 
Corosync service engines.
Mar 10 19:36:34 [31037] ha-idg-1 corosync info    [QB    ] withdrawing 
server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice  [SERV  ] Service engine 
unloaded: corosync vote quorum service v1.0
Mar 10 19:36:34 [31037] ha-idg-1 corosync info    [QB    ] withdrawing 
server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice  [SERV  ] Service engine 
unloaded: corosync configuration map access
Mar 10 19:36:34 [31037] ha-idg-1 corosync info    [QB    ] withdrawing 
server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice  [SERV  ] Service engine 
unloaded: corosync configuration service
Mar 10 19:36:34 [31037] ha-idg-1 corosync info    [QB    ] withdrawing 
server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice  [SERV  ] Service engine 
unloaded: corosync cluster closed process group service v1.01
Mar 10 19:36:34 [31037] ha-idg-1 corosync info    [QB    ] withdrawing 
server sockets
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice  [SERV  ] Service engine 
unloaded: corosync cluster quorum service v0.1
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice  [SERV  ] Service engine 
unloaded: corosync profile loading service
Mar 10 19:36:34 [31037] ha-idg-1 corosync notice  [MAIN  ] Corosync Cluster 
Engine exiting normally
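
A note on the repeated "Unable to get node name for nodeid 1084777482" 
messages I marked above: as far as I understand, pacemaker resolves a 
corosync nodeid to a node name via the nodelist section of corosync.conf and 
only falls back to "uname -n" for the local node. Presumably the lookups fail 
because our corosync.conf (multicast transport) has no nodelist with name: 
entries. A sketch of the stanza that would provide them; ha-idg-1's address 
and nodeid are taken from the log above, the values for ha-idg-2 are only 
placeholders I'd still have to look up:

  nodelist {
      node {
          ring0_addr: 192.168.100.10
          nodeid: 1084777482
          name: ha-idg-1
      }
      node {
          ring0_addr: 192.168.100.20   # placeholder, not from the log
          nodeid: 1084777483           # placeholder, not from the log
          name: ha-idg-2
      }
  }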

Bernd

-- 
Bernd Lentes
System Administrator
Institute for Metabolism and Cell Death (MCD)
Building 25 - office 122
HelmholtzZentrum München
bernd.lentes at helmholtz-muenchen.de
phone: +49 89 3187 1241
       +49 89 3187 49123
fax:   +49 89 3187 2294
https://www.helmholtz-munich.de/en/mcd

