Apr 23 11:04:59.462 testcluster-c1 pacemaker-execd [1295872] (async_action_complete) warning: virtual_ip_monitor_30000[724966] timed out after 20000ms
Apr 23 11:05:20.190 testcluster-c1 pacemaker-execd [1295872] (async_action_complete) warning: PingChecks_monitor_10000[724970] timed out after 30000ms
Apr 23 11:05:24.754 testcluster-c1 pacemaker-controld [1295875] (throttle_check_thresholds) notice: High CPU load detected: 108.660004
Apr 23 11:05:29.558 testcluster-c1 pacemaker-controld [1295875] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Apr 23 11:05:30.270 testcluster-c1 pacemaker-controld [1295875] (pe_ipc_destroy) info: Connection to the scheduler released
Apr 23 11:05:34.982 testcluster-c1 pacemaker-controld [1295875] (tengine_stonith_connection_destroy) info: Fencing daemon disconnected
Apr 23 11:05:35.350 testcluster-c1 pacemaker-controld [1295875] (crmd_exit) notice: Forcing immediate exit with status 100 (Fatal error occurred, will not respawn)
Apr 23 11:05:35.538 testcluster-c1 pacemaker-controld [1295875] (crm_xml_cleanup) info: Cleaning up memory from libxml2
Apr 23 11:05:35.850 testcluster-c1 pacemaker-controld [1295875] (crm_exit) info: Exiting pacemaker-controld | with status 100
Apr 23 11:05:38.630 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling ocf operation virtual_ip_monitor_30000
Apr 23 11:05:38.630 testcluster-c1 pacemaker-execd [1295872] (services_action_cancel) info: Terminating in-flight op virtual_ip_monitor_30000[724993] early because it was cancelled
Apr 23 11:05:38.610 testcluster-c1 pacemakerd [1295869] (pcmk_child_exit) warning: Shutting cluster down because pacemaker-controld[1295875] had fatal failure
Apr 23 11:05:38.630 testcluster-c1 pacemakerd [1295869] (pcmk_shutdown_worker) notice: Shutting down Pacemaker
Apr 23 11:05:38.630 testcluster-c1 pacemakerd [1295869] (stop_child) notice: Stopping pacemaker-schedulerd | sent signal 15 to process 1295874
Apr 23 11:05:38.634 testcluster-c1 pacemaker-execd [1295872] (async_action_complete) info: virtual_ip_monitor_30000[724993] terminated with signal 9 (Killed)
Apr 23 11:05:38.634 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling ocf operation virtual_ip_monitor_30000
Apr 23 11:05:38.650 testcluster-c1 pacemaker-execd [1295872] (send_client_notify) warning: Could not notify client crmd: Bad file descriptor | rc=9
Apr 23 11:05:38.650 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling systemd operation docker-services_status_60000
Apr 23 11:05:38.650 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling ocf operation PingChecks_monitor_10000
Apr 23 11:05:38.650 testcluster-c1 pacemaker-execd [1295872] (services_action_cancel) info: Terminating in-flight op PingChecks_monitor_10000[724994] early because it was cancelled
Apr 23 11:05:38.650 testcluster-c1 pacemaker-execd [1295872] (async_action_complete) info: PingChecks_monitor_10000[724994] terminated with signal 9 (Killed)
Apr 23 11:05:38.650 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling ocf operation PingChecks_monitor_10000
Apr 23 11:05:38.654 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling ocf operation DrbdFS_monitor_20000
Apr 23 11:05:38.654 testcluster-c1 pacemaker-execd [1295872] (services_action_cancel) info: Terminating in-flight op DrbdFS_monitor_20000[724990] early because it was cancelled
Apr 23 11:05:38.662 testcluster-c1 pacemaker-execd [1295872] (async_action_complete) info: DrbdFS_monitor_20000[724990] terminated with signal 9 (Killed)
Apr 23 11:05:38.662 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling ocf operation DrbdFS_monitor_20000
Apr 23 11:05:38.802 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__set_instance_roles) info: Promoting DrbdData:0 (Promoted testcluster-c1)
Apr 23 11:05:38.802 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__set_instance_roles) info: DrbdData-clone: Promoted 1 instances of a possible 1
Apr 23 11:05:38.806 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.149 2
Apr 23 11:05:38.806 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.150 c0591b944f7de4d8a5b8ce02555e8983
Apr 23 11:05:38.806 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: -- /cib/status/node_state[@id='1']/transient_attributes[@id='1']
Apr 23 11:05:38.806 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=150
Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (RecurringOp) info: Start recurring monitor (20s) for DrbdFS on testcluster-c1
Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave DrbdData:0 (Promoted testcluster-c1)
Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave DrbdData:1 (Unpromoted testcluster-c2)
Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Recover DrbdFS ( testcluster-c1 )
Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Restart docker-services ( testcluster-c1 ) due to required DrbdFS start
Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Restart virtual_ip ( testcluster-c1 ) due to required docker-services start
Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave PingChecks:0 (Started testcluster-c1)
Apr 23 11:05:38.810 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave PingChecks:1 (Started testcluster-c2)
Apr 23 11:05:39.186 testcluster-c1 pacemaker-schedulerd[1295874] (handle_pecalc_op) error: Couldn't send transition graph to peer, writing to /var/lib/pacemaker/pengine/pengine.graph.3KGsrP instead
Apr 23 11:05:39.442 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__log_transition_summary) notice: Calculated transition 495, saving inputs in /var/lib/pacemaker/pengine/pe-input-28.bz2
Apr 23 11:05:39.570 testcluster-c1 pacemaker-schedulerd[1295874] (handle_pecalc_op) info: Input has not changed since last time, not saving to disk
Apr 23 11:05:41.426 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_delete operation for section //node_state[@uname='testcluster-c1']/transient_attributes: OK (rc=0, origin=testcluster-c2/crmd/27, version=0.30.150)
Apr 23 11:05:41.998 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section cib: OK (rc=0, origin=testcluster-c2/crmd/29, version=0.30.150)
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_config) notice: On loss of quorum: Ignore
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (determine_online_status) info: Node testcluster-c1 is online
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (determine_online_status) info: Node testcluster-c2 is online
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: virtual_ip has failed 1 times on testcluster-c1
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of virtual_ip on testcluster-c1 because it expired | virtual_ip_clear_failcount_0
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: virtual_ip has failed 1 times on testcluster-c1
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of virtual_ip on testcluster-c1 because it expired | virtual_ip_clear_failcount_0
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: virtual_ip has failed 1 times on testcluster-c1
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of virtual_ip on testcluster-c1 because it expired | virtual_ip_clear_failcount_0
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_rsc_op) notice: Rescheduling virtual_ip_monitor_30000 after failure expired on testcluster-c1 | actual=1 expected=0 magic=2:1;43:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: PingChecks:0 has failed 1 times on testcluster-c1
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of PingChecks:0 on testcluster-c1 because it expired | PingChecks:0_clear_failcount_0
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: PingChecks:0 has failed 1 times on testcluster-c1
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of PingChecks:0 on testcluster-c1 because it expired | PingChecks:0_clear_failcount_0
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: PingChecks:0 has failed 1 times on testcluster-c1
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of PingChecks:0 on testcluster-c1 because it expired | PingChecks:0_clear_failcount_0
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_rsc_op) notice: Rescheduling PingChecks_monitor_10000 after failure expired on testcluster-c1 | actual=1 expected=0 magic=2:1;51:4:0:a085676a-5b2b-4928-83f2-c01315e5aec5
Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_rsc_op_failure) warning: Unexpected result (error: Process did not exit within specified timeout) was recorded for monitor of DrbdFS on testcluster-c1 at Apr 23 11:02:43 2024 | rc=1 id=DrbdFS_last_failure_0
Apr 23 11:05:43.334 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section crm_config: OK (rc=0, origin=testcluster-c2/crmd/31, version=0.30.150)
Apr 23 11:05:43.334 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: DrbdFS has failed 1 times on testcluster-c1
Apr 23 11:05:43.334 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__threshold_reached) info: DrbdFS can fail 999999 more times on testcluster-c1 before reaching migration threshold (1000000)
Apr 23 11:05:44.022 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section crm_config: OK (rc=0, origin=testcluster-c2/crmd/33, version=0.30.150)
Apr 23 11:05:46.818 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section crm_config: OK (rc=0, origin=testcluster-c2/crmd/35, version=0.30.150)
Apr 23 11:05:47.342 testcluster-c1 pacemaker-based [1295870] (cib_process_replace) info: Digest matched on replace from testcluster-c2: c0591b944f7de4d8a5b8ce02555e8983
Apr 23 11:05:47.414 testcluster-c1 pacemaker-based [1295870] (cib_process_replace) info: Replaced 0.30.150 with 0.30.150 from testcluster-c2
Apr 23 11:05:48.786 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_replace operation for section 'all': OK (rc=0, origin=testcluster-c2/crmd/38, version=0.30.150)
Apr 23 11:05:48.794 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__set_instance_roles) info: Promoting DrbdData:0 (Promoted testcluster-c1)
Apr 23 11:05:48.794 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__set_instance_roles) info: DrbdData-clone: Promoted 1 instances of a possible 1
Apr 23 11:05:48.794 testcluster-c1 pacemaker-schedulerd[1295874] (RecurringOp) info: Start recurring monitor (20s) for DrbdFS on testcluster-c1
Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave DrbdData:0 (Promoted testcluster-c1)
Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave DrbdData:1 (Unpromoted testcluster-c2)
Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Recover DrbdFS ( testcluster-c1 )
Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Restart docker-services ( testcluster-c1 ) due to required DrbdFS start
Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Restart virtual_ip ( testcluster-c1 ) due to required docker-services start
Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave PingChecks:0 (Started testcluster-c1)
Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave PingChecks:1 (Started testcluster-c2)
Apr 23 11:05:48.802 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__log_transition_summary) notice: Calculated transition 496, saving inputs in /var/lib/pacemaker/pengine/pe-input-29.bz2
Apr 23 11:05:48.806 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section nodes: OK (rc=0, origin=testcluster-c2/crmd/39, version=0.30.150)
Apr 23 11:05:48.818 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.150 2
Apr 23 11:05:48.818 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.151 (null)
Apr 23 11:05:48.818 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: -- /cib/status/node_state[@id='2']/lrm[@id='2']
Apr 23 11:05:48.818 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=151
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_delete operation for section //node_state[@uname='testcluster-c2']/lrm: OK (rc=0, origin=testcluster-c2/crmd/40, version=0.30.151)
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.151 2
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.152 (null)
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=152
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib/status/node_state[@id='2']: @crm-debug-origin=do_lrm_query_internal
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ /cib/status/node_state[@id='2']: <lrm id="2"/>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_resources>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_resource id="virtual_ip" type="IPaddr2" class="ocf" provider="heartbeat">
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="virtual_ip_last_0" operation_key="virtual_ip_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0" transition-key="5:2:7:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:7;5:2:7:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" on_node="testcluster-c2" call-id="21" rc-code="7" op-status="0" interval="0" last-rc-change="1713422197" exec-time="89" queue-time="0" op-digest="3a768873ca9f51044dbe8d63366fb7ce"/>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm_resource>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_resource id="DrbdData" type="drbd" class="ocf" provider="linbit">
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="DrbdData_last_0" operation_key="DrbdData_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0" transition-key="12:1:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:0;12:1:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" on_node="testcluster-c2" call-id="11" rc-code="0" op-status="0" interval="0" last-rc-change="1713422196" exec-time="985" queue-time="0" op-digest="c0e018b73fdf522b6cdd355e125af15e" op-force-restart=" drbd_resource " op-restart-digest="c0e018b73fdf522b6cdd355e125af15e"/>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="DrbdData_monitor_60000" operation_key="DrbdData_monitor_60000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0" transition-key="13:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:0;13:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" on_node="testcluster-c2" call-id="13" rc-code="0" op-status="0" interval="60000" last-rc-change="1713422197" exec-time="129" queue-time="0" op-digest="db2fb3216fa7207fdc0b2bf31bbd09c1"/>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm_resource>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_resource id="docker-services" type="docker-services" class="systemd">
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="docker-services_last_0" operation_key="docker-services_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0" transition-key="4:2:7:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:7;4:2:7:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" on_node="testcluster-c2" call-id="17" rc-code="7" op-status="0" interval="0" last-rc-change="1713422197" exec-time="6" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm_resource>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_resource id="PingChecks" type="ping" class="ocf" provider="pacemaker">
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="PingChecks_last_0" operation_key="PingChecks_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0" transition-key="50:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:0;50:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" on_node="testcluster-c2" call-id="27" rc-code="0" op-status="0" interval="0" last-rc-change="1713422197" exec-time="9196" queue-time="0" op-digest="1ac9a59a0519ceb6f9188233ae443791" op-force-restart=" use_fping failure_score options host_list multiplier name pidfile " op-restart-digest="e460710e452f12aebd0abcae1b26e5f9"/>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="PingChecks_monitor_10000" operation_key="PingChecks_monitor_10000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0" transition-key="51:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:0;51:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" on_node="testcluster-c2" call-id="28" rc-code="0" op-status="0" interval="10000" last-rc-change="1713422207" exec-time="9281" queue-time="0" op-digest="56e4de959c35addb7fded1aeb1ae4e30"/>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm_resource>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_resource id="DrbdFS" type="Filesystem" class="ocf" provider="heartbeat">
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="DrbdFS_last_0" operation_key="DrbdFS_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0" transition-key="4:1:7:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:7;4:1:7:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" on_node="testcluster-c2" call-id="10" rc-code="7" op-status="0" interval="0" last-rc-change="1713422196" exec-time="121" queue-time="0" op-digest="40b189eb6bf31d9b108771148daa913b"/>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm_resource>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm_resources>
Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm>
Apr 23 11:05:48.826 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=testcluster-c2/crmd/41, version=0.30.152)
Apr 23 11:05:48.830 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section nodes: OK (rc=0, origin=testcluster-c2/crmd/44, version=0.30.152)
Apr 23 11:05:48.834 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.152 2
Apr 23 11:05:48.834 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.153 (null)
Apr 23 11:05:48.834 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=153
Apr 23 11:05:48.834 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib/status/node_state[@id='1']: @crmd=offline, @crm-debug-origin=do_state_transition, @join=down
Apr 23 11:05:48.834 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib/status/node_state[@id='2']: @crm-debug-origin=do_state_transition
Apr 23 11:05:48.834 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=testcluster-c2/crmd/45, version=0.30.153)
Apr 23 11:05:49.346 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.153 2
Apr 23 11:05:49.350 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.154 (null)
Apr 23 11:05:49.350 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=154, @dc-uuid=2
Apr 23 11:05:50.922 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section cib: OK (rc=0, origin=testcluster-c2/crmd/46, version=0.30.154)
Apr 23 11:05:51.146 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_config) notice: On loss of quorum: Ignore
Apr 23 11:05:51.150 testcluster-c1 pacemaker-schedulerd[1295874] (determine_online_status) info: Node testcluster-c1 is online
Apr 23 11:05:51.150 testcluster-c1 pacemaker-schedulerd[1295874] (determine_online_status) info: Node testcluster-c2 is online
Apr 23 11:05:51.154 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: virtual_ip has failed 1 times on testcluster-c1
Apr 23 11:05:51.154 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of virtual_ip on testcluster-c1 because it expired | virtual_ip_clear_failcount_0
Apr 23 11:05:51.154 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: virtual_ip has failed 1 times on testcluster-c1
Apr 23 11:05:51.154 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of virtual_ip on testcluster-c1 because it expired | virtual_ip_clear_failcount_0
Apr 23 11:05:51.158 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: virtual_ip has failed 1 times on testcluster-c1
Apr 23 11:05:51.158 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.154 2
Apr 23 11:05:51.158 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.155 (null)
Apr 23 11:05:51.158 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: -- /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='DrbdData']/lrm_rsc_op[@id='DrbdData_monitor_60000']
Apr 23 11:05:51.158 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=155
Apr 23 11:05:51.158 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of virtual_ip on testcluster-c1 because it expired | virtual_ip_clear_failcount_0
Apr 23 11:05:51.158 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_rsc_op) notice: Rescheduling virtual_ip_monitor_30000 after failure expired on testcluster-c1 | actual=1 expected=0 magic=2:1;43:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9
Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: PingChecks:0 has failed 1 times on testcluster-c1
Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of PingChecks:0 on testcluster-c1 because it expired | PingChecks:0_clear_failcount_0
Apr 23 11:05:51.162 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_delete operation for section status: OK (rc=0, origin=testcluster-c2/crmd/49, version=0.30.155)
Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: PingChecks:0 has failed 1 times on testcluster-c1
Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of PingChecks:0 on testcluster-c1 because it expired | PingChecks:0_clear_failcount_0
Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: PingChecks:0 has failed 1 times on testcluster-c1
Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of PingChecks:0 on testcluster-c1 because it expired | PingChecks:0_clear_failcount_0
Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_rsc_op) notice: Rescheduling PingChecks_monitor_10000 after failure expired on testcluster-c1 | actual=1 expected=0 magic=2:1;51:4:0:a085676a-5b2b-4928-83f2-c01315e5aec5
Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_rsc_op_failure) warning: Unexpected result (error: Process did not exit within specified timeout) was recorded for monitor of DrbdFS on testcluster-c1 at Apr 23 11:02:43 2024 | rc=1 id=DrbdFS_last_failure_0
Apr 23 11:05:51.942 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: DrbdFS has failed 1 times on testcluster-c1
Apr 23 11:05:51.942 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__threshold_reached) info: DrbdFS can fail 999999 more times on testcluster-c1 before reaching migration threshold (1000000)
Apr 23 11:05:51.962 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.155 2
Apr 23 11:05:51.962 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.156 (null)
Apr 23 11:05:51.962 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=156
Apr 23 11:05:51.962 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib/status/node_state[@id='2']: @crm-debug-origin=do_update_resource
Apr 23 11:05:51.962 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='DrbdData']/lrm_rsc_op[@id='DrbdData_last_0']: @operation_key=DrbdData_promote_0, @operation=promote, @crm-debug-origin=do_update_resource, @transition-key=6:0:0:91e178a7-4098-4729-92b3-d7cf8fdc6e64, @transition-magic=-1:193;6:0:0:91e178a7-4098-4729-92b3-d7cf8fdc6e64, @call-id=-1, @rc-code=193, @op-status=-1, @last-rc-change=1713863139, @exec-time=0
Apr 23 11:05:51.974 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=testcluster-c2/crmd/50, version=0.30.156)
Apr 23 11:05:52.674 testcluster-c1 pacemaker-based [1295870] (cib_process_ping) info: Reporting our current digest to testcluster-c2: fddf1766600ffe74873bf40cd3a0cbcb for 0.30.156 (0x564d66568980 0)
Apr 23 11:05:53.074 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__set_instance_roles) info: Promoting DrbdData:0 (Promoted testcluster-c1)
Apr 23 11:05:53.074 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__set_instance_roles) info: DrbdData-clone: Promoted 1 instances of a possible 1
Apr 23 11:05:53.082 testcluster-c1 pacemaker-schedulerd[1295874] (RecurringOp) info: Start recurring monitor (20s) for DrbdFS on testcluster-c1
Apr 23 11:05:53.946 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave DrbdData:0 (Promoted testcluster-c1)
Apr 23 11:05:54.086 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave DrbdData:1 (Unpromoted testcluster-c2)
Apr 23 11:05:54.334 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Recover DrbdFS ( testcluster-c1 )
Apr 23 11:05:55.130 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Restart docker-services ( testcluster-c1 ) due to required DrbdFS start
Apr 23 11:05:55.906 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Restart virtual_ip ( testcluster-c1 ) due to required docker-services start
Apr 23 11:05:56.162 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave PingChecks:0 (Started testcluster-c1)
Apr 23 11:05:56.422 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave PingChecks:1 (Started testcluster-c2)
Apr 23 11:05:56.986 testcluster-c1 pacemaker-based [1295870] (cib_file_backup) info: Archived previous version as /var/lib/pacemaker/cib/cib-35.raw
Apr 23 11:05:57.446 testcluster-c1 pacemaker-based [1295870] (cib_file_write_with_digest) info: Wrote version 0.30.0 of the CIB to disk (digest: 6559c9de6a8bfa2291405312f21fe540)
Apr 23 11:05:57.590 testcluster-c1 pacemaker-based [1295870] (cib_file_write_with_digest) info: Reading cluster configuration file /var/lib/pacemaker/cib/cib.SOWrXW (digest: /var/lib/pacemaker/cib/cib.pDuq4R)
Apr 23 11:05:58.734 testcluster-c1 pacemaker-based [1295870] (cib_process_ping) info: Reporting our current digest to testcluster-c1: fddf1766600ffe74873bf40cd3a0cbcb for 0.30.156 (0x564d66568980 0)
Apr 23 11:05:59.138 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__log_transition_summary) notice: Calculated transition 497, saving inputs in /var/lib/pacemaker/pengine/pe-input-30.bz2
Apr 23 11:05:59.166 testcluster-c1 pacemaker-schedulerd[1295874] (crm_signal_dispatch) notice: Caught 'Terminated' signal | 15 (invoking handler)
Apr 23 11:05:59.174 testcluster-c1 pacemaker-schedulerd[1295874] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Apr 23 11:05:59.178 testcluster-c1 pacemaker-schedulerd[1295874] (crm_xml_cleanup) info: Cleaning up memory from libxml2
Apr 23 11:05:59.198 testcluster-c1 pacemaker-schedulerd[1295874] (crm_exit) info: Exiting pacemaker-schedulerd | with status 0
Apr 23 11:06:00.486 testcluster-c1 pacemakerd [1295869] (pcmk_child_exit) info: pacemaker-schedulerd[1295874] exited with status 0 (OK)
Apr 23 11:06:00.546 testcluster-c1 pacemaker-attrd [1295873] (crm_signal_dispatch) notice: Caught 'Terminated' signal | 15 (invoking handler)
Apr 23 11:06:00.550 testcluster-c1 pacemakerd [1295869] (stop_child) notice: Stopping pacemaker-attrd | sent signal 15 to process 1295873
Apr 23 11:06:00.570 testcluster-c1 pacemaker-attrd [1295873] (main) info: Shutting down attribute manager
Apr 23 11:06:00.582 testcluster-c1 pacemaker-attrd [1295873] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Apr 23 11:06:01.078 testcluster-c1 pacemaker-attrd [1295873] (attrd_cib_destroy_cb) info: Connection disconnection complete
Apr 23 11:06:01.126 testcluster-c1 pacemaker-attrd [1295873] (crm_xml_cleanup) info: Cleaning up memory from libxml2
Apr 23 11:06:01.382 testcluster-c1 pacemaker-attrd [1295873] (crm_exit) info: Exiting pacemaker-attrd | with status 0
Apr 23 11:06:02.926 testcluster-c1 pacemakerd [1295869] (pcmk_child_exit) info: pacemaker-attrd[1295873] exited with status 0 (OK)
Apr 23 11:06:02.958 testcluster-c1 pacemakerd [1295869] (stop_child) notice: Stopping pacemaker-execd | sent signal 15 to process 1295872
Apr 23 11:06:03.006 testcluster-c1 pacemaker-execd [1295872] (crm_signal_dispatch) notice: Caught 'Terminated' signal | 15 (invoking handler)
Apr 23 11:06:03.010 testcluster-c1 pacemaker-execd [1295872] (lrmd_exit) info: Terminating with 0 clients
Apr 23 11:06:03.014 testcluster-c1 pacemaker-execd [1295872] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Apr 23 11:06:03.022 testcluster-c1 pacemaker-execd [1295872] (crm_xml_cleanup) info: Cleaning up memory from libxml2
Apr 23 11:06:03.206 testcluster-c1 pacemaker-execd [1295872] (crm_exit) info: Exiting pacemaker-execd | with status 0
Apr 23 11:06:03.382 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=testcluster-c2/attrd/5, version=0.30.156)
Apr 23 11:06:03.438 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=testcluster-c2/attrd/6, version=0.30.156)
Apr 23 11:06:03.966 testcluster-c1 pacemakerd [1295869] (pcmk_child_exit) info: pacemaker-execd[1295872] exited with status 0 (OK)
Apr 23 11:06:03.970 testcluster-c1 pacemakerd [1295869] (stop_child) notice: Stopping pacemaker-fenced | sent signal 15 to process 1295871
Apr 23 11:06:03.978 testcluster-c1 pacemaker-fenced [1295871] (crm_signal_dispatch) notice: Caught 'Terminated' signal | 15 (invoking handler)
Apr 23 11:06:04.010 testcluster-c1 pacemaker-fenced [1295871] (stonith_shutdown) info: Terminating with 0 clients
Apr 23 11:06:04.630 testcluster-c1 pacemaker-fenced [1295871] (cib_connection_destroy) info: Connection to the CIB manager closed
Apr 23 11:06:04.702 testcluster-c1 pacemaker-fenced [1295871] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Apr 23 11:06:04.778 testcluster-c1 pacemaker-fenced [1295871] (crm_xml_cleanup) info: Cleaning up memory from libxml2
Apr 23 11:06:04.842 testcluster-c1 pacemaker-fenced [1295871] (crm_exit) info: Exiting pacemaker-fenced | with status 0
Apr 23 11:06:06.258 testcluster-c1 pacemakerd [1295869] (pcmk_child_exit) info: pacemaker-fenced[1295871] exited with status 0 (OK)
Apr 23 11:06:06.266 testcluster-c1 pacemaker-based [1295870] (crm_signal_dispatch) notice: Caught 'Terminated' signal | 15 (invoking handler)
Apr 23 11:06:06.266 testcluster-c1 pacemakerd [1295869] (stop_child) notice: Stopping pacemaker-based | sent signal 15 to process 1295870
Apr 23 11:06:06.274 testcluster-c1 pacemaker-based [1295870] (cib_shutdown) info: Disconnected 0 clients
Apr 23 11:06:06.274 testcluster-c1 pacemaker-based [1295870] (cib_shutdown) info: All clients disconnected (0)
Apr 23 11:06:06.282 testcluster-c1 pacemaker-based [1295870] (initiate_exit) info: Sending disconnect notification to 2 peers...
Apr 23 11:06:06.334 testcluster-c1 pacemaker-based [1295870] (cib_process_shutdown_req) info: Peer testcluster-c1 is requesting to shut down
Apr 23 11:06:06.346 testcluster-c1 pacemaker-based [1295870] (cib_process_shutdown_req) info: Peer testcluster-c2 has acknowledged our shutdown request
Apr 23 11:06:06.346 testcluster-c1 pacemaker-based [1295870] (terminate_cib) info: cib_process_shutdown_req: Exiting from mainloop...
Apr 23 11:06:06.350 testcluster-c1 pacemaker-based [1295870] (crm_cluster_disconnect) info: Disconnecting from corosync cluster infrastructure
Apr 23 11:06:06.358 testcluster-c1 pacemaker-based [1295870] (pcmk__corosync_disconnect) notice: Disconnected from Corosync
Apr 23 11:06:06.366 testcluster-c1 pacemaker-based [1295870] (crm_get_peer) info: Created entry 48e34a90-0596-48f1-b2b2-171d1692aac5/0x564d66683e40 for node testcluster-c2/0 (1 total)
Apr 23 11:06:06.406 testcluster-c1 pacemaker-based [1295870] (cib_peer_update_callback) info: No more peers
Apr 23 11:06:06.422 testcluster-c1 pacemaker-based [1295870] (terminate_cib) info: cib_peer_update_callback: Exiting from mainloop...
Apr 23 11:06:06.426 testcluster-c1 pacemaker-based [1295870] (crm_cluster_disconnect) info: Disconnecting from corosync cluster infrastructure
Apr 23 11:06:06.426 testcluster-c1 pacemaker-based [1295870] (cluster_disconnect_cpg) info: No CPG connection
Apr 23 11:06:06.426 testcluster-c1 pacemaker-based [1295870] (pcmk__corosync_disconnect) notice: Disconnected from Corosync
Apr 23 11:06:06.426 testcluster-c1 pacemaker-based [1295870] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Apr 23 11:06:06.442 testcluster-c1 pacemaker-based [1295870] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Apr 23 11:06:06.442 testcluster-c1 pacemaker-based [1295870] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Apr 23 11:06:06.458 testcluster-c1 pacemaker-based [1295870] (crm_xml_cleanup) info: Cleaning up memory from libxml2
Apr 23 11:06:09.682 testcluster-c1 pacemaker-based [1295870] (crm_exit) info: Exiting pacemaker-based | with status 0
Apr 23 11:06:10.450 testcluster-c1 pacemakerd [1295869] (pcmk_child_exit) info: pacemaker-based[1295870] exited with status 0 (OK)
Apr 23 11:06:10.458 testcluster-c1 pacemakerd [1295869] (pcmk_shutdown_worker) notice: Shutdown complete
Apr 23 11:06:10.458 testcluster-c1 pacemakerd [1295869] (pcmk_shutdown_worker) notice: Shutting down and staying down after fatal error
Apr 23 11:06:10.458 testcluster-c1 pacemakerd [1295869] (pcmkd_shutdown_corosync) info: Asking Corosync to shut down
Apr 23 11:06:10.530 testcluster-c1 pacemakerd [1295869] (crm_xml_cleanup) info: Cleaning up memory from libxml2
Apr 23 11:06:10.554 testcluster-c1 pacemakerd [1295869] (crm_exit) info: Exiting pacemakerd | with status 100

> On 26.04.2024, at 05:54, Reid Wahl <nwahl@redhat.com> wrote:
>
> Any logs from Pacemaker?
>
> On Thu, Apr 25, 2024 at 3:46 AM Alexander Eastwood via Users
> <users@clusterlabs.org> wrote:
> >
> > Hi all,
> >
> > I’m trying to get a better understanding of why our cluster - or specifically corosync.service - entered a failed state. Here are all of the relevant corosync logs from this event, with the last line showing when I manually started the service again:
> >
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice [CFG ] Node 1 was shut down by sysadmin
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice [SERV ] Unloading all Corosync service engines.
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync info [QB ] withdrawing server sockets
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice [SERV ] Service engine unloaded: corosync vote quorum service v1.0
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync info [QB ] withdrawing server sockets
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice [SERV ] Service engine unloaded: corosync configuration map access
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync info [QB ] withdrawing server sockets
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice [SERV ] Service engine unloaded: corosync configuration service
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync info [QB ] withdrawing server sockets
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync info [QB ] withdrawing server sockets
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice [SERV ] Service engine unloaded: corosync profile loading service
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice [SERV ] Service engine unloaded: corosync resource monitoring service
> > Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice [SERV ] Service engine unloaded: corosync watchdog service
> > Apr 23 11:06:11 [1295854] testcluster-c1 corosync info [KNET ] host: host: 1 (passive) best link: 0 (pri: 0)
> > Apr 23 11:06:11 [1295854] testcluster-c1 corosync warning [KNET ] host: host: 1 has no active links
> > Apr 23 11:06:11 [1295854] testcluster-c1 corosync notice [MAIN ] Corosync Cluster Engine exiting normally
> > Apr 23 13:18:36 [796246] testcluster-c1 corosync notice [MAIN ] Corosync Cluster Engine 3.1.6 starting up
> >
> > The first line suggests a manual shutdown of one of the cluster nodes, however neither I nor any of my colleagues did this. The ‘sysadmin’ surely must mean a person logging on to the server and running some command, as opposed to a system process?
> >
> > Then in the third line from the bottom there is the warning “host: host: 1 has no active links”, which is followed by “Corosync Cluster Engine exiting normally”. Does this mean that the reason for the Cluster Engine exiting is the fact that there are no active links?
> >
> > Finally, I am considering adding a systemd override file for the corosync service with the following content:
> >
> > [Service]
> > Restart=on-failure
> >
> > Is there any reason not to do this? And, given that the process exited normally, would I need to use Restart=always instead?
> >
> > Many thanks
> >
> > Alex
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>
> --
> Regards,
>
> Reid Wahl (He/Him)
> Senior Software Engineer, Red Hat
> RHEL High Availability - Pacemaker
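
PS: For reference, the override I mention above would be a drop-in along these lines. This is only a sketch of what I had in mind, assuming the stock corosync.service unit name and the default drop-in path created by "systemctl edit corosync"; I have not applied it anywhere yet:

    # /etc/systemd/system/corosync.service.d/override.conf
    [Service]
    # on-failure only restarts the service after an unclean exit
    # (non-zero status, unhandled signal, timeout, watchdog);
    # always restarts it even after a clean exit with status 0
    Restart=on-failure
    # optional delay before the restart attempt
    RestartSec=5s
    # if the file is created by hand rather than via systemctl edit,
    # run "systemctl daemon-reload" afterwards

Since the log above shows corosync being asked to shut down and exiting normally, my understanding is that Restart=on-failure would not have restarted it in this particular scenario, while Restart=always would - happy to be corrected on that.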