[ClusterLabs] Cluster Stopped, No Messages?
Digimer
lists at alteeve.ca
Fri May 28 13:42:41 EDT 2021
Shared storage is not what triggers the need for fencing. Coordinating
actions is what triggers the need. Specifically: if you can run the
resources on both/all nodes at the same time, you don't need HA. If you
can't, you need fencing.
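
As a rough sketch of what adding fencing can look like with pcs (the
agent, addresses and credentials below are only placeholders for
illustration; the example assumes IPMI-capable hardware, and 'pcs
stonith describe <agent>' shows the parameter names your fence-agents
version actually expects):

   # See which fence agents are installed and what they accept
   pcs stonith list
   pcs stonith describe fence_ipmilan

   # Example: one IPMI-based fence device per node (placeholder values)
   pcs stonith create fence_001store01a fence_ipmilan \
       pcmk_host_list=001store01a ipaddr=192.0.2.10 login=admin passwd=secret
   pcs stonith create fence_001store01b fence_ipmilan \
       pcmk_host_list=001store01b ipaddr=192.0.2.11 login=admin passwd=secret

   # Make sure fencing is actually enforced cluster-wide
   pcs property set stonith-enabled=true

Test each device (for example, 'pcs stonith fence <node>' from its
peer) before trusting it in production.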
digimer
On 2021-05-28 1:19 p.m., Eric Robinson wrote:
> There is no fencing agent on this cluster and no shared storage.
>
> -Eric
>
> *From:* Strahil Nikolov <hunter86_bg at yahoo.com>
> *Sent:* Friday, May 28, 2021 10:08 AM
> *To:* Cluster Labs - All topics related to open-source clustering
> welcomed <users at clusterlabs.org>; Eric Robinson <eric.robinson at psmnv.com>
> *Subject:* Re: [ClusterLabs] Cluster Stopped, No Messages?
>
> what is your fencing agent ?
>
> Best Regards,
>
> Strahil Nikolov
>
> On Thu, May 27, 2021 at 20:52, Eric Robinson
> <eric.robinson at psmnv.com> wrote:
>
> We found one of our cluster nodes down this morning. The server was
> up but cluster services were not running. Upon examination of the
> logs, we found that the cluster just stopped around 9:40:31 and then
> I started it up manually (pcs cluster start) at 11:49:48. I can’t
> imagine that Pacemaker just randomly terminates. Any thoughts why it
> would behave this way?
>
>
>
>
>
> May 27 09:25:31 [92170] 001store01a pengine: notice:
> process_pe_message: Calculated transition 91482, saving inputs in
> /var/lib/pacemaker/pengine/pe-input-756.bz2
>
> May 27 09:25:31 [92171] 001store01a crmd: info:
> do_state_transition: State transition S_POLICY_ENGINE ->
> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE
> origin=handle_response
>
> May 27 09:25:31 [92171] 001store01a crmd: info:
> do_te_invoke: Processing graph 91482
> (ref=pe_calc-dc-1622121931-124396) derived from
> /var/lib/pacemaker/pengine/pe-input-756.bz2
>
> May 27 09:25:31 [92171] 001store01a crmd: notice:
> run_graph: Transition 91482 (Complete=0, Pending=0, Fired=0,
> Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-756.bz2): Complete
>
> May 27 09:25:31 [92171] 001store01a crmd: info:
> do_log: Input I_TE_SUCCESS received in state
> S_TRANSITION_ENGINE from notify_crmd
>
> May 27 09:25:31 [92171] 001store01a crmd: notice:
> do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE
> | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
>
> May 27 09:40:31 [92171] 001store01a crmd: info:
> crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped
> (900000ms)
>
> May 27 09:40:31 [92171] 001store01a crmd: notice:
> do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE |
> input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
>
> May 27 09:40:31 [92171] 001store01a crmd: info:
> do_state_transition: Progressed to state S_POLICY_ENGINE after
> C_TIMER_POPPED
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> process_pe_message: Input has not changed since last time, not
> saving to disk
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> determine_online_status: Node 001store01a is online
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> determine_op_status: Operation monitor found resource
> p_pure-ftpd-itls active on 001store01a
>
> May 27 09:40:31 [92170] 001store01a pengine: warning:
> unpack_rsc_op_failure: Processing failed op monitor for
> p_vip_ftpclust01 on 001store01a: unknown error (1)
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> determine_op_status: Operation monitor found resource
> p_pure-ftpd-etls active on 001store01a
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> unpack_node_loop: Node 1 is already processed
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> unpack_node_loop: Node 1 is already processed
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> common_print: p_vip_ftpclust01
> (ocf::heartbeat:IPaddr2): Started 001store01a
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> common_print: p_replicator (systemd:pure-replicator):
> Started 001store01a
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> common_print: p_pure-ftpd-etls
> (systemd:pure-ftpd-etls): Started 001store01a
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> common_print: p_pure-ftpd-itls
> (systemd:pure-ftpd-itls): Started 001store01a
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> LogActions: Leave p_vip_ftpclust01 (Started 001store01a)
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> LogActions: Leave p_replicator (Started 001store01a)
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> LogActions: Leave p_pure-ftpd-etls (Started 001store01a)
>
> May 27 09:40:31 [92170] 001store01a pengine: info:
> LogActions: Leave p_pure-ftpd-itls (Started 001store01a)
>
> May 27 09:40:31 [92170] 001store01a pengine: notice:
> process_pe_message: Calculated transition 91483, saving inputs in
> /var/lib/pacemaker/pengine/pe-input-756.bz2
>
> May 27 09:40:31 [92171] 001store01a crmd: info:
> do_state_transition: State transition S_POLICY_ENGINE ->
> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE
> origin=handle_response
>
> May 27 09:40:31 [92171] 001store01a crmd: info:
> do_te_invoke: Processing graph 91483
> (ref=pe_calc-dc-1622122831-124397) derived from
> /var/lib/pacemaker/pengine/pe-input-756.bz2
>
> May 27 09:40:31 [92171] 001store01a crmd: notice:
> run_graph: Transition 91483 (Complete=0, Pending=0, Fired=0,
> Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-756.bz2): Complete
>
> May 27 09:40:31 [92171] 001store01a crmd: info:
> do_log: Input I_TE_SUCCESS received in state
> S_TRANSITION_ENGINE from notify_crmd
>
> May 27 09:40:31 [92171] 001store01a crmd: notice:
> do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE
> | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
>
> [10667] 001store01a.ccnva.local corosync notice [MAIN ] Corosync
> Cluster Engine ('2.4.3'): started and ready to provide service.
>
> [10667] 001store01a.ccnva.local corosync info [MAIN ] Corosync
> built-in features: dbus systemd xmlconf qdevices qnetd snmp
> libcgroup pie relro bindnow
>
> [10667] 001store01a.ccnva.local corosync notice [TOTEM ]
> Initializing transport (UDP/IP Unicast).
>
> [10667] 001store01a.ccnva.local corosync notice [TOTEM ]
> Initializing transmit/receive security (NSS) crypto: none hash: none
>
> [10667] 001store01a.ccnva.local corosync notice [TOTEM ] The network
> interface [10.51.14.40] is now up.
>
> [10667] 001store01a.ccnva.local corosync notice [SERV ] Service
> engine loaded: corosync configuration map access [0]
>
> [10667] 001store01a.ccnva.local corosync info [QB ] server
> name: cmap
>
> [10667] 001store01a.ccnva.local corosync notice [SERV ] Service
> engine loaded: corosync configuration service [1]
>
> [10667] 001store01a.ccnva.local corosync info [QB ] server
> name: cfg
>
> [10667] 001store01a.ccnva.local corosync notice [SERV ] Service
> engine loaded: corosync cluster closed process group service v1.01 [2]
>
> [10667] 001store01a.ccnva.local corosync info [QB ] server
> name: cpg
>
> [10667] 001store01a.ccnva.local corosync notice [SERV ] Service
> engine loaded: corosync profile loading service [4]
>
> [10667] 001store01a.ccnva.local corosync notice [QUORUM] Using
> quorum provider corosync_votequorum
>
> [10667] 001store01a.ccnva.local corosync notice [VOTEQ ] Waiting for
> all cluster members. Current votes: 1 expected_votes: 2
>
> [10667] 001store01a.ccnva.local corosync notice [SERV ] Service
> engine loaded: corosync vote quorum service v1.0 [5]
>
> [10667] 001store01a.ccnva.local corosync info [QB ] server
> name: votequorum
>
> [10667] 001store01a.ccnva.local corosync notice [SERV ] Service
> engine loaded: corosync cluster quorum service v0.1 [3]
>
> [10667] 001store01a.ccnva.local corosync info [QB ] server
> name: quorum
>
> [10667] 001store01a.ccnva.local corosync notice [TOTEM ] adding new
> UDPU member {10.51.14.40}
>
> [10667] 001store01a.ccnva.local corosync notice [TOTEM ] adding new
> UDPU member {10.51.14.41}
>
> [10667] 001store01a.ccnva.local corosync notice [TOTEM ] A new
> membership (10.51.14.40:6412) was formed. Members joined: 1
>
> [10667] 001store01a.ccnva.local corosync notice [VOTEQ ] Waiting for
> all cluster members. Current votes: 1 expected_votes: 2
>
> [10667] 001store01a.ccnva.local corosync notice [VOTEQ ] Waiting for
> all cluster members. Current votes: 1 expected_votes: 2
>
> [10667] 001store01a.ccnva.local corosync notice [VOTEQ ] Waiting for
> all cluster members. Current votes: 1 expected_votes: 2
>
> [10667] 001store01a.ccnva.local corosync notice [QUORUM] Members[1]: 1
>
> [10667] 001store01a.ccnva.local corosync notice [MAIN ] Completed
> service synchronization, ready to provide service.
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> notice: main: Starting Pacemaker 1.1.18-11.el7_5.3 |
> build=2b07d5c5a9 features: generated-manpages agent-manpages ncurses
> libqb-logging libqb-ipc systemd nagios corosync-native atomic-attrd
> acls
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: main: Maximum core file size is: 18446744073709551615
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: qb_ipcs_us_publish: server name: pacemakerd
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: crm_get_peer: Created entry
> 05ad8b08-25a3-4a2d-84cb-1fc355fb697c/0x55d844a446b0 for node
> 001store01a/1 (1 total)
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: crm_get_peer: Node 1 is now known as 001store01a
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: crm_get_peer: Node 1 has uuid 1
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: crm_update_peer_proc: cluster_connect_cpg: Node
> 001store01a[1] - corosync-cpg is now online
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> warning: cluster_connect_quorum: Quorum lost
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: crm_get_peer: Created entry
> 2f1f038e-9cc1-4a43-bab9-e7c91ca0bf3f/0x55d844a45ee0 for node
> 001store01b/2 (2 total)
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: crm_get_peer: Node 2 is now known as 001store01b
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: crm_get_peer: Node 2 has uuid 2
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: start_child: Using uid=189 and group=189 for process cib
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: start_child: Forked child 10682 for process cib
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: start_child: Forked child 10683 for process stonith-ng
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: start_child: Forked child 10684 for process lrmd
>
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd:
> info: start_child: Using uid=189 and group=189 for process attrd
>
>
>
>
>
>
>
>
> Disclaimer : This email and any files transmitted with it are
> confidential and intended solely for intended recipients. If you are not
> the named addressee you should not disseminate, distribute, copy or
> alter this email. Any views or opinions presented in this email are
> solely those of the author and might not represent those of Physician
> Select Management. Warning: Although Physician Select Management has
> taken reasonable precautions to ensure no viruses are present in this
> email, the company cannot accept responsibility for any loss or damage
> arising from the use of this email or attachments.
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
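As for why pacemaker stopped at 09:40 with nothing after the recheck
timer in its own log: a clean shutdown ('pcs cluster stop') would have
been logged, so the silence usually points at something outside the
cluster killing it (an admin, a package update, systemd, the OOM
killer, etc.). A rough sketch of where I would look, adjusting the time
window to the incident:

   # What systemd and the kernel recorded about the cluster services
   journalctl -u pacemaker -u corosync --since "2021-05-27 09:30" --until "2021-05-27 10:00"
   systemctl status pacemaker corosync

   # OOM kills, crashes or reboots around that time
   grep -i -e oom -e 'killed process' /var/log/messages
   last -x | head

It is also worth checking whether corosync exited first, since the
pacemaker daemons cannot keep running without it.
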
--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould