[ClusterLabs] Non recoverable state of cluster after exit of one node due to killing of processes by oom killer

shivraj dongawe shivraj198 at gmail.com
Sun Feb 14 06:03:49 EST 2021


We are running a two-node cluster on Ubuntu 20.04 LTS. Cluster-related
package versions are as follows:
pacemaker/focal-updates,focal-security 2.0.3-3ubuntu4.1 amd64
pacemaker/focal 2.0.3-3ubuntu3 amd64
corosync/focal 3.0.3-2ubuntu2 amd64
pcs/focal 0.10.4-3 all
fence-agents/focal 4.5.2-1 amd64
gfs2-utils/focal 3.2.0-3 amd64
dlm-controld/focal 4.0.9-1build1 amd64
lvm2-lockd/focal 2.03.07-1ubuntu1 amd64

Cluster configuration details:
1. The cluster has shared storage mounted as a gfs2 filesystem with the
help of dlm and lvmlockd.
2. Corosync is configured to use knet for transport.
3. Fencing is configured using fence_scsi on the shared storage that is
used for the gfs2 filesystem.
4. The two main resources configured are the cluster/virtual IP and
postgresql-12; postgresql-12 is configured as a systemd resource.

We have done failover testing of the cluster (rebooting or shutting down a
node, link failure) and observed that resources were migrated properly to
the active node.

Recently we came across an issue that occurred repeatedly within a span of
two days.
Details are below:
1. The out-of-memory killer is invoked on the active node and starts
killing processes.
A sample is as follows:
postgres invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE),
order=0, oom_score_adj=0
2. In one instance it started by killing pacemaker, and in another by
killing postgresql. It does not stop after killing a single process; it goes
on to kill others as well (more concerning is the killing of cluster-related
processes). We have observed that swap space on that node is 2 GB against
96 GB of RAM, and we are in the process of increasing swap space to see if
this resolves the issue. Postgres is configured with a shared_buffers value
of 32 GB (which is far less than 96 GB).
We are not yet sure which process is suddenly consuming that much memory.
3. As a result of processes being killed on node1, node2 tries to fence
node1, thereby initiating the stopping of cluster resources on node1.
4. At this point we reach a stage where node1 is assumed to be down and
application resources, cluster IP and postgresql, are being started on
node2.
5. Postgresql on node2 fails to start within 60 sec (the start operation
timeout) and is declared failed. During the start operation of postgres, we
found many messages related to fencing failures and to other resources, such
as dlm and the vg, waiting for fencing to complete.
The syslog messages from node2 during this event are attached as a file.
6. After this point we are in a state where node1 and node2 are both in a
fenced state and all resources on both nodes are unrecoverable.
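
To track how often the OOM killer fires and which process triggered it, the sample line in item 1 can be parsed into fields. This is a minimal sketch: the regular expression is written against the single sample quoted above, and the function name parse_oom_line is hypothetical. Other kernel versions may format the line differently.

```python
import re

# Pattern written against the one sample line quoted in item 1 (an assumption;
# other kernels may print additional or differently ordered fields).
OOM_RE = re.compile(
    r"(?P<comm>\S+) invoked oom-killer: "
    r"gfp_mask=(?P<gfp_mask>0x[0-9a-fA-F]+)\((?P<gfp_flags>[^)]*)\), "
    r"order=(?P<order>-?\d+), oom_score_adj=(?P<oom_score_adj>-?\d+)"
)


def parse_oom_line(line):
    """Return the fields of an 'invoked oom-killer' line, or None if no match."""
    m = OOM_RE.search(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),                      # process that triggered the OOM killer
        "gfp_mask": int(m.group("gfp_mask"), 16),     # allocation flags as an integer
        "gfp_flags": m.group("gfp_flags").split("|"),  # symbolic flag names
        "order": int(m.group("order")),               # allocation order (0 = single page)
        "oom_score_adj": int(m.group("oom_score_adj")),
    }


# The sample from item 1, joined back onto one line.
sample = (
    "postgres invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), "
    "order=0, oom_score_adj=0"
)
event = parse_oom_line(sample)
print(event)
```

Note that the process named on this line is the one whose allocation triggered the OOM killer, which is not necessarily the process that is holding most of the memory.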

Now my question: the out-of-memory issue on node1 can be taken care of by
increasing swap, finding the process responsible for such huge memory
usage, and taking the necessary actions to minimize that usage. But the
other issue that remains unclear is why the cluster did not shift to
node2 cleanly and instead became unrecoverable.
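
As a starting point for finding the process responsible for the memory usage, resident memory per process can be read from /proc. This is a rough sketch, assuming a Linux host; the function names rss_kib and top_rss are hypothetical, and the script would need to run periodically (for example from cron) to catch a sudden increase.

```python
import glob
import re


def rss_kib(status_text):
    """Extract (name, VmRSS in kiB) from the text of a /proc/<pid>/status file."""
    name = re.search(r"^Name:\s+(.+)$", status_text, re.M)
    rss = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.M)
    if name is None or rss is None:  # kernel threads have no VmRSS line
        return None
    return name.group(1), int(rss.group(1))


def top_rss(limit=10):
    """Return the `limit` largest processes on this host by resident memory."""
    rows = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path, errors="replace") as fh:
                row = rss_kib(fh.read())
        except OSError:  # the process exited while we were reading it
            continue
        if row is not None:
            rows.append(row)
    return sorted(rows, key=lambda r: r[1], reverse=True)[:limit]


if __name__ == "__main__":
    for name, kib in top_rss():
        print(f"{kib / 1024:10.1f} MiB  {name}")
```

For Postgres, VmRSS includes the shared_buffers pages each backend has touched, so adding up the values of all backends overstates the real usage.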
-------------- next part --------------
dlm_controld[1616]: 91659 lvm_postgres_db_vg wait for fencing
dlm_controld[1616]: 91659 lvm_global wait for fencing
Feb 13 11:06:56 DB-2 dlm_controld[1616]: 91659 postgres_db_gfs wait for fencing
Feb 13 11:06:56 DB-2 dlm_controld[1616]: 91659 fence wait 1 pid 507594 running
Feb 13 11:06:56 DB-2 dbus-daemon[1059]: [system] Activating via systemd: service name='org.freedesktop.fwupd' unit='fwupd.service' requested by ':1.151' (uid=62803 pid=507602 comm="/usr/bin/fwupdmgr refresh --no-metadata-check " label="unconfined")
Feb 13 11:06:56 DB-2 systemd[1]: Starting Firmware update daemon...
Feb 13 11:06:56 DB-2 lvm[507606]:   pvscan[507606] PV /dev/mapper/mpatha ignore shared VG.
Feb 13 11:06:56 DB-2 fwupd[507614]: 11:06:56:0286 FuPluginUefi         kernel efivars support missing: /sys/firmware/efi/efivars
Feb 13 11:06:56 DB-2 systemd[1]: lvm2-pvscan@253:0.service: Succeeded.
Feb 13 11:06:56 DB-2 systemd[1]: Stopped LVM event activation on device 253:0.
Feb 13 11:06:56 DB-2 dlm_controld[1616]: 91659 fence result 1 pid 507594 result -1 term signal 6
Feb 13 11:06:56 DB-2 dlm_controld[1616]: 91659 fence status 1 receive -1 from 2 walltime 1613214416 local 91659
Feb 13 11:06:56 DB-2 dlm_controld[1616]: 91659 fence request 1 pid 507621 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:06:56 DB-2 dlm_controld[1616]: 91660 fence result 1 pid 507621 result -1 term signal 6
Feb 13 11:06:56 DB-2 dlm_controld[1616]: 91660 fence status 1 receive -1 from 2 walltime 1613214416 local 91660
Feb 13 11:06:56 DB-2 dlm_controld[1616]: 91660 fence request 1 pid 507626 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:06:56 DB-2 dbus-daemon[1059]: [system] Successfully activated service 'org.freedesktop.fwupd'
Feb 13 11:06:56 DB-2 systemd[1]: Started Firmware update daemon.
Feb 13 11:06:56 DB-2 fwupdmgr[507602]: Fetching metadata https://cdn.fwupd.org/downloads/firmware.xml.gz
Feb 13 11:06:56 DB-2 systemd[1]: fwupd-refresh.service: Main process exited, code=exited, status=1/FAILURE
Feb 13 11:06:56 DB-2 systemd[1]: fwupd-refresh.service: Failed with result 'exit-code'.
Feb 13 11:06:56 DB-2 systemd[1]: Failed to start Refresh fwupd metadata and update motd.
Feb 13 11:06:56 DB-2 IPaddr2(ClusterIP)[507446]: INFO:
Feb 13 11:06:57 DB-2 corosync[1525]:   [KNET  ] rx: host: 1 link: 0 is up
Feb 13 11:06:57 DB-2 corosync[1525]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb 13 11:06:57 DB-2 whoopsie[1553]: [11:06:57] Cannot reach: https://daisy.ubuntu.com
Feb 13 11:06:57 DB-2 corosync[1525]:   [TOTEM ] A new membership (1.139) was formed. Members joined: 1
Feb 13 11:06:57 DB-2 corosync[1525]:   [CPG   ] downlist left_list: 0 received
Feb 13 11:06:57 DB-2 corosync[1525]:   [CPG   ] downlist left_list: 0 received
Feb 13 11:06:57 DB-2 corosync[1525]:   [QUORUM] Members[2]: 1 2
Feb 13 11:06:57 DB-2 corosync[1525]:   [MAIN  ] Completed service synchronization, ready to provide service.
Feb 13 11:06:57 DB-2 pacemakerd[2440]:  notice: Node node1 state is now member
Feb 13 11:06:57 DB-2 pacemaker-controld[2451]:  notice: Node node1 state is now member
Feb 13 11:06:57 DB-2 dlm_controld[1616]: 91661 fence result 1 pid 507626 result -1 term signal 6
Feb 13 11:06:57 DB-2 dlm_controld[1616]: 91661 fence status 1 receive -1 from 2 walltime 1613214417 local 91661
Feb 13 11:06:57 DB-2 ntpd[1436]: Listen normally on 7 bond0 <IP>:123
Feb 13 11:06:57 DB-2 ntpd[1436]: new interface(s) found: waking up resolver
Feb 13 11:06:58 DB-2 pacemaker-fenced[2447]:  notice: Node node1 state is now member
Feb 13 11:06:58 DB-2 pacemaker-based[2446]:  notice: Node node1 state is now member
Feb 13 11:06:59 DB-2 pacemaker-controld[2451]:  notice: Transition 107 aborted: Peer Halt
Feb 13 11:06:59 DB-2 pacemaker-attrd[2449]:  notice: Node node1 state is now member
Feb 13 11:06:59 DB-2 pacemaker-attrd[2449]:  notice: Setting #attrd-protocol[node1]: (unset) -> 2
Feb 13 11:07:00 DB-2 pacemaker-controld[2451]:  notice: Transition 107 aborted: Node join
Feb 13 11:07:04 DB-2 pacemaker-based[2446]:  notice: Local CIB 0.66.36.2c53a0778fe971b87e1d6955538d0082 differs from node1: 0.66.1.ec5a7fc448690bb8538694d295b05e5d 0x5611841c0f90
Feb 13 11:07:27 DB-2 dlm_controld[1616]: 91691 fence request 1 pid 507845 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:28 DB-2 dlm_controld[1616]: 91692 fence result 1 pid 507845 result -1 term signal 6
Feb 13 11:07:28 DB-2 dlm_controld[1616]: 91692 fence status 1 receive -1 from 2 walltime 1613214448 local 91692
Feb 13 11:07:28 DB-2 dlm_controld[1616]: 91692 fence request 1 pid 507847 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:30 DB-2 dlm_controld[1616]: 91693 fence result 1 pid 507847 result -1 term signal 6
Feb 13 11:07:30 DB-2 dlm_controld[1616]: 91693 fence status 1 receive -1 from 2 walltime 1613214450 local 91693
Feb 13 11:07:30 DB-2 dlm_controld[1616]: 91693 fence request 1 pid 507886 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:31 DB-2 dlm_controld[1616]: 91694 fence result 1 pid 507886 result -1 term signal 6
Feb 13 11:07:31 DB-2 dlm_controld[1616]: 91694 fence status 1 receive -1 from 2 walltime 1613214451 local 91694
Feb 13 11:07:31 DB-2 dlm_controld[1616]: 91694 fence request 1 pid 507891 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:32 DB-2 dlm_controld[1616]: 91695 fence result 1 pid 507891 result -1 term signal 6
Feb 13 11:07:32 DB-2 dlm_controld[1616]: 91695 fence status 1 receive -1 from 2 walltime 1613214452 local 91695
Feb 13 11:07:32 DB-2 dlm_controld[1616]: 91695 fence request 1 pid 507893 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:33 DB-2 dlm_controld[1616]: 91697 fence result 1 pid 507893 result -1 term signal 6
Feb 13 11:07:33 DB-2 dlm_controld[1616]: 91697 fence status 1 receive -1 from 2 walltime 1613214453 local 91697
Feb 13 11:07:33 DB-2 dlm_controld[1616]: 91697 fence request 1 pid 507897 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:34 DB-2 dlm_controld[1616]: 91698 fence result 1 pid 507897 result -1 term signal 6
Feb 13 11:07:34 DB-2 dlm_controld[1616]: 91698 fence status 1 receive -1 from 2 walltime 1613214454 local 91698
Feb 13 11:07:34 DB-2 dlm_controld[1616]: 91698 fence request 1 pid 507899 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:35 DB-2 dlm_controld[1616]: 91699 fence result 1 pid 507899 result -1 term signal 6
Feb 13 11:07:35 DB-2 dlm_controld[1616]: 91699 fence status 1 receive -1 from 2 walltime 1613214455 local 91699
Feb 13 11:07:35 DB-2 dlm_controld[1616]: 91699 fence request 1 pid 507901 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:36 DB-2 nrpe[507904]: Error: (use_ssl == true): Request packet version was invalid!
Feb 13 11:07:36 DB-2 nrpe[507904]: Could not read request from client 10.5.55.10, bailing out...
Feb 13 11:07:36 DB-2 nrpe[507904]: INFO: SSL Socket Shutdown.
Feb 13 11:07:37 DB-2 dlm_controld[1616]: 91700 fence result 1 pid 507901 result -1 term signal 6
Feb 13 11:07:37 DB-2 dlm_controld[1616]: 91700 fence status 1 receive -1 from 2 walltime 1613214457 local 91700
Feb 13 11:07:37 DB-2 dlm_controld[1616]: 91700 fence request 1 pid 507911 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:38 DB-2 dlm_controld[1616]: 91701 fence result 1 pid 507911 result -1 term signal 6
Feb 13 11:07:38 DB-2 corosync[1525]:   [KNET  ] link: host: 1 link: 0 is down
Feb 13 11:07:38 DB-2 corosync[1525]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb 13 11:07:38 DB-2 corosync[1525]:   [KNET  ] host: host: 1 has no active links
Feb 13 11:07:38 DB-2 corosync[1525]:   [TOTEM ] Token has not been received in 750 ms
Feb 13 11:07:39 DB-2 corosync[1525]:   [TOTEM ] A processor failed, forming new configuration.
Feb 13 11:07:40 DB-2 corosync[1525]:   [TOTEM ] A new membership (2.13d) was formed. Members left: 1
Feb 13 11:07:40 DB-2 corosync[1525]:   [TOTEM ] Failed to receive the leave message. failed: 1
Feb 13 11:07:40 DB-2 corosync[1525]:   [CPG   ] downlist left_list: 1 received
Feb 13 11:07:40 DB-2 corosync[1525]:   [QUORUM] Members[1]: 2
Feb 13 11:07:40 DB-2 corosync[1525]:   [MAIN  ] Completed service synchronization, ready to provide service.
Feb 13 11:07:40 DB-2 pacemaker-attrd[2449]:  notice: Node node1 state is now lost
Feb 13 11:07:40 DB-2 kernel: [91704.635350] dlm: closing connection to node 1
Feb 13 11:07:40 DB-2 pacemaker-based[2446]:  notice: Node node1 state is now lost
Feb 13 11:07:40 DB-2 pacemaker-attrd[2449]:  notice: Removing all node1 attributes for peer loss
Feb 13 11:07:40 DB-2 pacemaker-based[2446]:  notice: Purged 1 peer with id=1 and/or uname=node1 from the membership cache
Feb 13 11:07:40 DB-2 pacemakerd[2440]:  notice: Node node1 state is now lost
Feb 13 11:07:40 DB-2 pacemaker-attrd[2449]:  notice: Purged 1 peer with id=1 and/or uname=node1 from the membership cache
Feb 13 11:07:40 DB-2 pacemaker-fenced[2447]:  notice: Node node1 state is now lost
Feb 13 11:07:40 DB-2 pacemaker-fenced[2447]:  notice: Purged 1 peer with id=1 and/or uname=node1 from the membership cache
Feb 13 11:07:40 DB-2 dlm_controld[1616]: 91703 fence status 1 receive -1 from 2 walltime 1613214458 local 91703
Feb 13 11:07:40 DB-2 dlm_controld[1616]: 91703 fence request 1 pid 507950 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:40 DB-2 pacemaker-controld[2451]:  notice: Node node1 state is now lost
Feb 13 11:07:41 DB-2 dlm_controld[1616]: 91704 fence result 1 pid 507950 result -1 term signal 6
Feb 13 11:07:41 DB-2 dlm_controld[1616]: 91704 fence status 1 receive -1 from 2 walltime 1613214461 local 91704
Feb 13 11:07:41 DB-2 dlm_controld[1616]: 91704 fence request 1 pid 507952 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:42 DB-2 dlm_controld[1616]: 91706 fence result 1 pid 507952 result -1 term signal 6
Feb 13 11:07:42 DB-2 dlm_controld[1616]: 91706 fence status 1 receive -1 from 2 walltime 1613214462 local 91706
Feb 13 11:07:42 DB-2 dlm_controld[1616]: 91706 fence request 1 pid 507954 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:43 DB-2 dlm_controld[1616]: 91707 fence result 1 pid 507954 result -1 term signal 6
Feb 13 11:07:43 DB-2 dlm_controld[1616]: 91707 fence status 1 receive -1 from 2 walltime 1613214463 local 91707
Feb 13 11:07:43 DB-2 dlm_controld[1616]: 91707 fence request 1 pid 507956 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:44 DB-2 dlm_controld[1616]: 91708 fence result 1 pid 507956 result -1 term signal 6
Feb 13 11:07:44 DB-2 dlm_controld[1616]: 91708 fence status 1 receive -1 from 2 walltime 1613214464 local 91708
Feb 13 11:07:44 DB-2 dlm_controld[1616]: 91708 fence request 1 pid 507958 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:46 DB-2 dlm_controld[1616]: 91709 fence result 1 pid 507958 result -1 term signal 6
Feb 13 11:07:46 DB-2 dlm_controld[1616]: 91709 fence status 1 receive -1 from 2 walltime 1613214466 local 91709
Feb 13 11:07:46 DB-2 dlm_controld[1616]: 91709 fence request 1 pid 507999 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:47 DB-2 dlm_controld[1616]: 91710 fence result 1 pid 507999 result -1 term signal 6
Feb 13 11:07:47 DB-2 dlm_controld[1616]: 91710 fence status 1 receive -1 from 2 walltime 1613214467 local 91710
Feb 13 11:07:47 DB-2 dlm_controld[1616]: 91710 fence request 1 pid 508001 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:48 DB-2 dlm_controld[1616]: 91711 fence result 1 pid 508001 result -1 term signal 6
Feb 13 11:07:48 DB-2 dlm_controld[1616]: 91711 fence status 1 receive -1 from 2 walltime 1613214468 local 91711
Feb 13 11:07:48 DB-2 dlm_controld[1616]: 91711 fence request 1 pid 508003 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:49 DB-2 dlm_controld[1616]: 91713 fence result 1 pid 508003 result -1 term signal 6
Feb 13 11:07:49 DB-2 dlm_controld[1616]: 91713 fence status 1 receive -1 from 2 walltime 1613214469 local 91713
Feb 13 11:07:49 DB-2 dlm_controld[1616]: 91713 fence request 1 pid 508042 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:50 DB-2 dlm_controld[1616]: 91714 fence result 1 pid 508042 result -1 term signal 6
Feb 13 11:07:50 DB-2 dlm_controld[1616]: 91714 fence status 1 receive -1 from 2 walltime 1613214470 local 91714
Feb 13 11:07:50 DB-2 dlm_controld[1616]: 91714 fence request 1 pid 508046 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:51 DB-2 dlm_controld[1616]: 91715 fence result 1 pid 508046 result -1 term signal 6
Feb 13 11:07:51 DB-2 dlm_controld[1616]: 91715 fence status 1 receive -1 from 2 walltime 1613214471 local 91715
Feb 13 11:07:51 DB-2 dlm_controld[1616]: 91715 fence request 1 pid 508048 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:52 DB-2 dlm_controld[1616]: 91716 fence result 1 pid 508048 result -1 term signal 6
Feb 13 11:07:52 DB-2 dlm_controld[1616]: 91716 fence status 1 receive -1 from 2 walltime 1613214472 local 91716
Feb 13 11:07:52 DB-2 dlm_controld[1616]: 91716 fence request 1 pid 508050 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:53 DB-2 dlm_controld[1616]: 91717 fence result 1 pid 508050 result -1 term signal 6
Feb 13 11:07:53 DB-2 dlm_controld[1616]: 91717 fence status 1 receive -1 from 2 walltime 1613214473 local 91717
Feb 13 11:07:53 DB-2 dlm_controld[1616]: 91717 fence request 1 pid 508054 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:55 DB-2 dlm_controld[1616]: 91718 fence result 1 pid 508054 result -1 term signal 6
Feb 13 11:07:55 DB-2 dlm_controld[1616]: 91718 fence status 1 receive -1 from 2 walltime 1613214475 local 91718
Feb 13 11:07:55 DB-2 dlm_controld[1616]: 91718 fence request 1 pid 508061 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:55 DB-2 pacemaker-execd[2448]:  notice: Giving up on postgresdb start (rc=196): timeout (elapsed=59988ms, remaining=12ms)
Feb 13 11:07:55 DB-2 pacemaker-controld[2451]:  error: Result of start operation for postgresdb on node2: Timed Out
Feb 13 11:07:55 DB-2 pacemaker-controld[2451]:  notice: Transition 107 action 51 (postgresdb_start_0 on node2): expected 'ok' but got 'OCF_TIMEOUT'
Feb 13 11:07:55 DB-2 pacemaker-controld[2451]:  notice: Transition 107 (Complete=24, Pending=0, Fired=0, Skipped=0, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-error-13.bz2): Complete
Feb 13 11:07:55 DB-2 pacemaker-attrd[2449]:  notice: Setting fail-count-postgresdb#start_0[node2]: (unset) -> INFINITY
Feb 13 11:07:55 DB-2 pacemaker-attrd[2449]:  notice: Setting last-failure-postgresdb#start_0[node2]: (unset) -> 1613214475
Feb 13 11:07:55 DB-2 pacemaker-schedulerd[2450]:  warning: Cluster node node1 will be fenced: peer is no longer part of the cluster
Feb 13 11:07:55 DB-2 pacemaker-schedulerd[2450]:  warning: Node node1 is unclean
Feb 13 11:07:55 DB-2 pacemaker-schedulerd[2450]:  warning: Unexpected result (OCF_TIMEOUT) was recorded for start of postgresdb on node2 at Feb 13 11:07:55 2021
Feb 13 11:07:55 DB-2 pacemaker-schedulerd[2450]:  warning: Unexpected result (OCF_TIMEOUT) was recorded for start of postgresdb on node2 at Feb 13 11:07:55 2021
Feb 13 11:07:55 DB-2 pacemaker-schedulerd[2450]:  warning: Scheduling Node node1 for STONITH
Feb 13 11:07:55 DB-2 pacemaker-schedulerd[2450]:  notice:  * Fence (reboot) node1 'peer is no longer part of the cluster'
Feb 13 11:07:55 DB-2 pacemaker-schedulerd[2450]:  notice:  * Recover    postgresdb         (          node2 )
Feb 13 11:07:55 DB-2 pacemaker-schedulerd[2450]:  warning: Calculated transition 108 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-13.bz2
Feb 13 11:07:55 DB-2 pacemaker-schedulerd[2450]:  warning: Cluster node node1 will be fenced: peer is no longer part of the cluster
Feb 13 11:07:55 DB-2 pacemaker-schedulerd[2450]:  warning: Node node1 is unclean
Feb 13 11:07:55 DB-2 pacemaker-schedulerd[2450]:  warning: Unexpected result (OCF_TIMEOUT) was recorded for start of postgresdb on node2 at Feb 13 11:07:55 2021
Feb 13 11:07:55 DB-2 pacemaker-schedulerd[2450]:  warning: Unexpected result (OCF_TIMEOUT) was recorded for start of postgresdb on node2 at Feb 13 11:07:55 2021
Feb 13 11:07:55 DB-2 pacemaker-schedulerd[2450]:  warning: Forcing postgresdb away from node2 after 1000000 failures (max=1000000)
Feb 13 11:07:56 DB-2 pacemaker-schedulerd[2450]:  warning: Scheduling Node node1 for STONITH
Feb 13 11:07:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Fence (reboot) node1 'peer is no longer part of the cluster'
Feb 13 11:07:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       postgresdb         (          node2 )   due to node availability
Feb 13 11:07:56 DB-2 pacemaker-schedulerd[2450]:  warning: Calculated transition 109 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-14.bz2
Feb 13 11:07:56 DB-2 pacemaker-controld[2451]:  notice: Initiating stop operation postgresdb_stop_0 locally on node2
Feb 13 11:07:56 DB-2 pacemaker-controld[2451]:  notice: Requesting fencing (reboot) of node node1
Feb 13 11:07:56 DB-2 pacemaker-fenced[2447]:  notice: Client pacemaker-controld.2451.cae34137 wants to fence (reboot) 'node1' with device '(any)'
Feb 13 11:07:56 DB-2 pacemaker-fenced[2447]:  notice: Requesting peer fencing (reboot) targeting node1
Feb 13 11:07:56 DB-2 pacemaker-fenced[2447]:  notice: scsi is eligible to fence (reboot) node1: static-list
Feb 13 11:07:56 DB-2 systemd[1]: Reloading.
Feb 13 11:07:56 DB-2 pacemaker-fenced[2447]:  notice: Requesting that node2 perform 'reboot' action targeting node1
Feb 13 11:07:56 DB-2 pacemaker-fenced[2447]:  notice: scsi is eligible to fence (reboot) node1: static-list
Feb 13 11:07:56 DB-2 dlm_controld[1616]: 91719 fence result 1 pid 508061 result -1 term signal 6
Feb 13 11:07:56 DB-2 dlm_controld[1616]: 91719 fence status 1 receive -1 from 2 walltime 1613214476 local 91719
Feb 13 11:07:56 DB-2 dlm_controld[1616]: 91719 fence request 1 pid 508065 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:56 DB-2 pacemaker-fenced[2447]:  warning: Agent 'fence_scsi' does not advertise support for 'reboot', performing 'off' action instead
Feb 13 11:07:56 DB-2 systemd[1]: /lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please update the unit file accordingly.
Feb 13 11:07:56 DB-2 pacemaker-fenced[2447]:  notice: Operation 'reboot' [508109] (call 6 from pacemaker-controld.2451) for host 'node1' with device 'scsi' returned: 0 (OK)
Feb 13 11:07:56 DB-2 pacemaker-fenced[2447]:  notice: Operation 'reboot' targeting node1 on node2 for pacemaker-controld.2451@node2.46c214db: OK
Feb 13 11:07:56 DB-2 pacemaker-controld[2451]:  notice: Stonith operation 6/1:109:0:ef7f482f-a1f1-4769-82f6-641068aeca54: OK (0)
Feb 13 11:07:56 DB-2 pacemaker-controld[2451]:  notice: Peer node1 was terminated (reboot) by node2 on behalf of pacemaker-controld.2451: OK
Feb 13 11:07:57 DB-2 dlm_controld[1616]: 91720 fence result 1 pid 508065 result -1 term signal 6
Feb 13 11:07:57 DB-2 dlm_controld[1616]: 91720 fence status 1 receive -1 from 2 walltime 1613214477 local 91720
Feb 13 11:07:57 DB-2 dlm_controld[1616]: 91720 fence request 1 pid 508159 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:58 DB-2 dlm_controld[1616]: 91721 fence result 1 pid 508159 result -1 term signal 6
Feb 13 11:07:58 DB-2 dlm_controld[1616]: 91721 fence status 1 receive -1 from 2 walltime 1613214478 local 91721
Feb 13 11:07:58 DB-2 dlm_controld[1616]: 91721 fence request 1 pid 508161 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:07:59 DB-2 dlm_controld[1616]: 91723 fence result 1 pid 508161 result -1 term signal 6
Feb 13 11:07:59 DB-2 dlm_controld[1616]: 91723 fence status 1 receive -1 from 2 walltime 1613214479 local 91723
Feb 13 11:07:59 DB-2 dlm_controld[1616]: 91723 fence request 1 pid 508163 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:00 DB-2 dlm_controld[1616]: 91724 fence result 1 pid 508163 result -1 term signal 6
Feb 13 11:08:00 DB-2 dlm_controld[1616]: 91724 fence status 1 receive -1 from 2 walltime 1613214480 local 91724
Feb 13 11:08:00 DB-2 dlm_controld[1616]: 91724 fence request 1 pid 508202 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:01 DB-2 dlm_controld[1616]: 91725 fence result 1 pid 508202 result -1 term signal 6
Feb 13 11:08:01 DB-2 dlm_controld[1616]: 91725 fence status 1 receive -1 from 2 walltime 1613214481 local 91725
Feb 13 11:08:01 DB-2 dlm_controld[1616]: 91725 fence request 1 pid 508204 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:03 DB-2 dlm_controld[1616]: 91726 fence result 1 pid 508204 result -1 term signal 6
Feb 13 11:08:03 DB-2 dlm_controld[1616]: 91726 fence status 1 receive -1 from 2 walltime 1613214483 local 91726
Feb 13 11:08:03 DB-2 dlm_controld[1616]: 91726 fence request 1 pid 508206 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:04 DB-2 dlm_controld[1616]: 91727 fence result 1 pid 508206 result -1 term signal 6
Feb 13 11:08:04 DB-2 dlm_controld[1616]: 91727 fence status 1 receive -1 from 2 walltime 1613214484 local 91727
Feb 13 11:08:04 DB-2 dlm_controld[1616]: 91727 fence request 1 pid 508208 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:05 DB-2 dlm_controld[1616]: 91728 fence result 1 pid 508208 result -1 term signal 6
Feb 13 11:08:05 DB-2 dlm_controld[1616]: 91728 fence status 1 receive -1 from 2 walltime 1613214485 local 91728
Feb 13 11:08:05 DB-2 dlm_controld[1616]: 91728 fence request 1 pid 508210 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:06 DB-2 dlm_controld[1616]: 91730 fence result 1 pid 508210 result -1 term signal 6
Feb 13 11:08:06 DB-2 dlm_controld[1616]: 91730 fence status 1 receive -1 from 2 walltime 1613214486 local 91730
Feb 13 11:08:06 DB-2 dlm_controld[1616]: 91730 fence request 1 pid 508212 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:07 DB-2 dlm_controld[1616]: 91731 fence result 1 pid 508212 result -1 term signal 6
Feb 13 11:08:07 DB-2 dlm_controld[1616]: 91731 fence status 1 receive -1 from 2 walltime 1613214487 local 91731
Feb 13 11:08:07 DB-2 dlm_controld[1616]: 91731 fence request 1 pid 508214 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:08 DB-2 dlm_controld[1616]: 91732 fence result 1 pid 508214 result -1 term signal 6
Feb 13 11:08:08 DB-2 dlm_controld[1616]: 91732 fence status 1 receive -1 from 2 walltime 1613214488 local 91732
Feb 13 11:08:08 DB-2 dlm_controld[1616]: 91732 fence request 1 pid 508216 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:10 DB-2 dlm_controld[1616]: 91733 fence result 1 pid 508216 result -1 term signal 6
Feb 13 11:08:10 DB-2 dlm_controld[1616]: 91733 fence status 1 receive -1 from 2 walltime 1613214490 local 91733
Feb 13 11:08:10 DB-2 dlm_controld[1616]: 91733 fence request 1 pid 508255 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:11 DB-2 dlm_controld[1616]: 91734 fence result 1 pid 508255 result -1 term signal 6
Feb 13 11:08:11 DB-2 dlm_controld[1616]: 91734 fence status 1 receive -1 from 2 walltime 1613214491 local 91734
Feb 13 11:08:11 DB-2 dlm_controld[1616]: 91734 fence request 1 pid 508259 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:12 DB-2 dlm_controld[1616]: 91735 fence result 1 pid 508259 result -1 term signal 6
Feb 13 11:08:12 DB-2 dlm_controld[1616]: 91735 fence status 1 receive -1 from 2 walltime 1613214492 local 91735
Feb 13 11:08:12 DB-2 dlm_controld[1616]: 91735 fence request 1 pid 508261 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:12 DB-2 dlm_controld[1616]: 91735 postgres_db_gfs wait for fencing
Feb 13 11:08:12 DB-2 dlm_controld[1616]: 91735 lvm_postgres_db_vg wait for fencing
Feb 13 11:08:12 DB-2 dlm_controld[1616]: 91735 lvm_global wait for fencing
Feb 13 11:08:13 DB-2 dlm_controld[1616]: 91737 fence result 1 pid 508261 result -1 term signal 6
Feb 13 11:08:13 DB-2 dlm_controld[1616]: 91737 fence status 1 receive -1 from 2 walltime 1613214493 local 91737
Feb 13 11:08:13 DB-2 dlm_controld[1616]: 91737 fence request 1 pid 508265 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:14 DB-2 dlm_controld[1616]: 91738 fence result 1 pid 508265 result -1 term signal 6
Feb 13 11:08:14 DB-2 dlm_controld[1616]: 91738 fence status 1 receive -1 from 2 walltime 1613214494 local 91738
Feb 13 11:08:14 DB-2 dlm_controld[1616]: 91738 fence request 1 pid 508267 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:15 DB-2 dlm_controld[1616]: 91739 fence result 1 pid 508267 result -1 term signal 6
Feb 13 11:08:15 DB-2 dlm_controld[1616]: 91739 fence status 1 receive -1 from 2 walltime 1613214495 local 91739
Feb 13 11:08:15 DB-2 dlm_controld[1616]: 91739 fence request 1 pid 508293 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:17 DB-2 dlm_controld[1616]: 91740 fence result 1 pid 508293 result -1 term signal 6
Feb 13 11:08:17 DB-2 dlm_controld[1616]: 91740 fence status 1 receive -1 from 2 walltime 1613214497 local 91740
Feb 13 11:08:17 DB-2 dlm_controld[1616]: 91740 fence request 1 pid 508310 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:18 DB-2 dlm_controld[1616]: 91741 fence result 1 pid 508310 result -1 term signal 6
Feb 13 11:08:18 DB-2 dlm_controld[1616]: 91741 fence status 1 receive -1 from 2 walltime 1613214498 local 91741
Feb 13 11:08:18 DB-2 dlm_controld[1616]: 91741 fence request 1 pid 508312 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:19 DB-2 dlm_controld[1616]: 91742 fence result 1 pid 508312 result -1 term signal 6
Feb 13 11:08:19 DB-2 dlm_controld[1616]: 91742 fence status 1 receive -1 from 2 walltime 1613214499 local 91742
Feb 13 11:08:19 DB-2 dlm_controld[1616]: 91742 fence request 1 pid 508314 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:20 DB-2 dlm_controld[1616]: 91744 fence result 1 pid 508314 result -1 term signal 6
Feb 13 11:08:20 DB-2 dlm_controld[1616]: 91744 fence status 1 receive -1 from 2 walltime 1613214500 local 91744
Feb 13 11:08:20 DB-2 dlm_controld[1616]: 91744 fence request 1 pid 508353 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:21 DB-2 dlm_controld[1616]: 91745 fence result 1 pid 508353 result -1 term signal 6
Feb 13 11:08:21 DB-2 dlm_controld[1616]: 91745 fence status 1 receive -1 from 2 walltime 1613214501 local 91745
Feb 13 11:08:21 DB-2 dlm_controld[1616]: 91745 fence request 1 pid 508355 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:22 DB-2 dlm_controld[1616]: 91746 fence result 1 pid 508355 result -1 term signal 6
Feb 13 11:08:22 DB-2 dlm_controld[1616]: 91746 fence status 1 receive -1 from 2 walltime 1613214502 local 91746
Feb 13 11:08:22 DB-2 dlm_controld[1616]: 91746 fence request 1 pid 508357 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:23 DB-2 dlm_controld[1616]: 91747 fence result 1 pid 508357 result -1 term signal 6
Feb 13 11:08:23 DB-2 dlm_controld[1616]: 91747 fence status 1 receive -1 from 2 walltime 1613214503 local 91747
Feb 13 11:08:23 DB-2 dlm_controld[1616]: 91747 fence request 1 pid 508359 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:25 DB-2 dlm_controld[1616]: 91748 fence result 1 pid 508359 result -1 term signal 6
Feb 13 11:08:25 DB-2 dlm_controld[1616]: 91748 fence status 1 receive -1 from 2 walltime 1613214505 local 91748
Feb 13 11:08:25 DB-2 dlm_controld[1616]: 91748 fence request 1 pid 508366 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:26 DB-2 dlm_controld[1616]: 91749 fence result 1 pid 508366 result -1 term signal 6
Feb 13 11:08:26 DB-2 dlm_controld[1616]: 91749 fence status 1 receive -1 from 2 walltime 1613214506 local 91749
Feb 13 11:08:26 DB-2 dlm_controld[1616]: 91749 fence request 1 pid 508409 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:27 DB-2 dlm_controld[1616]: 91751 fence result 1 pid 508409 result -1 term signal 6
Feb 13 11:08:27 DB-2 dlm_controld[1616]: 91751 fence status 1 receive -1 from 2 walltime 1613214507 local 91751
Feb 13 11:08:27 DB-2 dlm_controld[1616]: 91751 fence request 1 pid 508411 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:28 DB-2 dlm_controld[1616]: 91752 fence result 1 pid 508411 result -1 term signal 6
Feb 13 11:08:28 DB-2 dlm_controld[1616]: 91752 fence status 1 receive -1 from 2 walltime 1613214508 local 91752
Feb 13 11:08:28 DB-2 dlm_controld[1616]: 91752 fence request 1 pid 508413 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:29 DB-2 dlm_controld[1616]: 91753 fence result 1 pid 508413 result -1 term signal 6
Feb 13 11:08:29 DB-2 dlm_controld[1616]: 91753 fence status 1 receive -1 from 2 walltime 1613214509 local 91753
Feb 13 11:08:29 DB-2 dlm_controld[1616]: 91753 fence request 1 pid 508452 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:30 DB-2 dlm_controld[1616]: 91754 fence result 1 pid 508452 result -1 term signal 6
Feb 13 11:08:30 DB-2 dlm_controld[1616]: 91754 fence status 1 receive -1 from 2 walltime 1613214510 local 91754
Feb 13 11:08:30 DB-2 dlm_controld[1616]: 91754 fence request 1 pid 508456 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:32 DB-2 dlm_controld[1616]: 91755 fence result 1 pid 508456 result -1 term signal 6
Feb 13 11:08:32 DB-2 dlm_controld[1616]: 91755 fence status 1 receive -1 from 2 walltime 1613214512 local 91755
Feb 13 11:08:32 DB-2 dlm_controld[1616]: 91755 fence request 1 pid 508458 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:33 DB-2 dlm_controld[1616]: 91756 fence result 1 pid 508458 result -1 term signal 6
Feb 13 11:08:33 DB-2 dlm_controld[1616]: 91756 fence status 1 receive -1 from 2 walltime 1613214513 local 91756
Feb 13 11:08:33 DB-2 dlm_controld[1616]: 91756 fence request 1 pid 508462 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:34 DB-2 dlm_controld[1616]: 91758 fence result 1 pid 508462 result -1 term signal 6
Feb 13 11:08:34 DB-2 dlm_controld[1616]: 91758 fence status 1 receive -1 from 2 walltime 1613214514 local 91758
Feb 13 11:08:34 DB-2 dlm_controld[1616]: 91758 fence request 1 pid 508464 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:35 DB-2 dlm_controld[1616]: 91759 fence result 1 pid 508464 result -1 term signal 6
Feb 13 11:08:35 DB-2 dlm_controld[1616]: 91759 fence status 1 receive -1 from 2 walltime 1613214515 local 91759
Feb 13 11:08:35 DB-2 dlm_controld[1616]: 91759 fence request 1 pid 508466 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:36 DB-2 dlm_controld[1616]: 91760 fence result 1 pid 508466 result -1 term signal 6
Feb 13 11:08:36 DB-2 dlm_controld[1616]: 91760 fence status 1 receive -1 from 2 walltime 1613214516 local 91760
Feb 13 11:08:36 DB-2 dlm_controld[1616]: 91760 fence request 1 pid 508468 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:38 DB-2 dlm_controld[1616]: 91761 fence result 1 pid 508468 result -1 term signal 6
Feb 13 11:08:38 DB-2 dlm_controld[1616]: 91761 fence status 1 receive -1 from 2 walltime 1613214518 local 91761
Feb 13 11:08:38 DB-2 dlm_controld[1616]: 91761 fence request 1 pid 508470 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:39 DB-2 dlm_controld[1616]: 91762 fence result 1 pid 508470 result -1 term signal 6
Feb 13 11:08:39 DB-2 dlm_controld[1616]: 91762 fence status 1 receive -1 from 2 walltime 1613214519 local 91762
Feb 13 11:08:39 DB-2 dlm_controld[1616]: 91762 fence request 1 pid 508472 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:40 DB-2 dlm_controld[1616]: 91763 fence result 1 pid 508472 result -1 term signal 6
Feb 13 11:08:40 DB-2 dlm_controld[1616]: 91763 fence status 1 receive -1 from 2 walltime 1613214520 local 91763
Feb 13 11:08:40 DB-2 dlm_controld[1616]: 91763 fence request 1 pid 508511 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:41 DB-2 dlm_controld[1616]: 91765 fence result 1 pid 508511 result -1 term signal 6
Feb 13 11:08:41 DB-2 dlm_controld[1616]: 91765 fence status 1 receive -1 from 2 walltime 1613214521 local 91765
Feb 13 11:08:41 DB-2 dlm_controld[1616]: 91765 fence request 1 pid 508513 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:42 DB-2 dlm_controld[1616]: 91766 fence result 1 pid 508513 result -1 term signal 6
Feb 13 11:08:42 DB-2 dlm_controld[1616]: 91766 fence status 1 receive -1 from 2 walltime 1613214522 local 91766
Feb 13 11:08:42 DB-2 dlm_controld[1616]: 91766 fence request 1 pid 508515 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:43 DB-2 dlm_controld[1616]: 91767 fence result 1 pid 508515 result -1 term signal 6
Feb 13 11:08:43 DB-2 dlm_controld[1616]: 91767 fence status 1 receive -1 from 2 walltime 1613214523 local 91767
Feb 13 11:08:43 DB-2 dlm_controld[1616]: 91767 fence request 1 pid 508517 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:45 DB-2 dlm_controld[1616]: 91768 fence result 1 pid 508517 result -1 term signal 6
Feb 13 11:08:45 DB-2 dlm_controld[1616]: 91768 fence status 1 receive -1 from 2 walltime 1613214525 local 91768
Feb 13 11:08:45 DB-2 dlm_controld[1616]: 91768 fence request 1 pid 508519 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:46 DB-2 dlm_controld[1616]: 91769 fence result 1 pid 508519 result -1 term signal 6
Feb 13 11:08:46 DB-2 dlm_controld[1616]: 91769 fence status 1 receive -1 from 2 walltime 1613214526 local 91769
Feb 13 11:08:46 DB-2 dlm_controld[1616]: 91769 fence request 1 pid 508560 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:47 DB-2 dlm_controld[1616]: 91770 fence result 1 pid 508560 result -1 term signal 6
Feb 13 11:08:47 DB-2 dlm_controld[1616]: 91770 fence status 1 receive -1 from 2 walltime 1613214527 local 91770
Feb 13 11:08:47 DB-2 dlm_controld[1616]: 91770 fence request 1 pid 508562 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:48 DB-2 dlm_controld[1616]: 91772 fence result 1 pid 508562 result -1 term signal 6
Feb 13 11:08:48 DB-2 dlm_controld[1616]: 91772 fence status 1 receive -1 from 2 walltime 1613214528 local 91772
Feb 13 11:08:48 DB-2 dlm_controld[1616]: 91772 fence request 1 pid 508564 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:49 DB-2 dlm_controld[1616]: 91773 fence result 1 pid 508564 result -1 term signal 6
Feb 13 11:08:49 DB-2 dlm_controld[1616]: 91773 fence status 1 receive -1 from 2 walltime 1613214529 local 91773
Feb 13 11:08:49 DB-2 dlm_controld[1616]: 91773 fence request 1 pid 508603 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:50 DB-2 dlm_controld[1616]: 91774 fence result 1 pid 508603 result -1 term signal 6
Feb 13 11:08:50 DB-2 dlm_controld[1616]: 91774 fence status 1 receive -1 from 2 walltime 1613214530 local 91774
Feb 13 11:08:50 DB-2 dlm_controld[1616]: 91774 fence request 1 pid 508607 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:52 DB-2 dlm_controld[1616]: 91775 fence result 1 pid 508607 result -1 term signal 6
Feb 13 11:08:52 DB-2 dlm_controld[1616]: 91775 fence status 1 receive -1 from 2 walltime 1613214532 local 91775
Feb 13 11:08:52 DB-2 dlm_controld[1616]: 91775 fence request 1 pid 508609 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:52 DB-2 dlm_controld[1616]: 91776 fence result 1 pid 508609 result -1 term signal 6
Feb 13 11:08:52 DB-2 dlm_controld[1616]: 91776 fence status 1 receive -1 from 2 walltime 1613214532 local 91776
Feb 13 11:08:52 DB-2 dlm_controld[1616]: 91776 fence request 1 pid 508611 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:53 DB-2 dlm_controld[1616]: 91777 fence result 1 pid 508611 result -1 term signal 6
Feb 13 11:08:54 DB-2 dlm_controld[1616]: 91777 fence status 1 receive -1 from 2 walltime 1613214533 local 91777
Feb 13 11:08:54 DB-2 dlm_controld[1616]: 91777 fence request 1 pid 508615 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:55 DB-2 dlm_controld[1616]: 91778 fence result 1 pid 508615 result -1 term signal 6
Feb 13 11:08:55 DB-2 dlm_controld[1616]: 91778 fence status 1 receive -1 from 2 walltime 1613214535 local 91778
Feb 13 11:08:55 DB-2 dlm_controld[1616]: 91778 fence request 1 pid 508622 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:55 DB-2 pacemaker-execd[2448]:  notice: Giving up on postgresdb stop (rc=196): timeout (elapsed=59987ms, remaining=13ms)
Feb 13 11:08:55 DB-2 pacemaker-controld[2451]:  error: Result of stop operation for postgresdb on node2: Timed Out
Feb 13 11:08:55 DB-2 pacemaker-controld[2451]:  notice: Transition 109 aborted by operation postgresdb_stop_0 'modify' on node2: Event failed
Feb 13 11:08:55 DB-2 pacemaker-controld[2451]:  notice: Transition 109 action 8 (postgresdb_stop_0 on node2): expected 'ok' but got 'OCF_TIMEOUT'
Feb 13 11:08:55 DB-2 pacemaker-attrd[2449]:  notice: Setting fail-count-postgresdb#stop_0[node2]: (unset) -> INFINITY
Feb 13 11:08:55 DB-2 pacemaker-controld[2451]:  notice: Transition 109 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-warn-14.bz2): Complete
Feb 13 11:08:55 DB-2 pacemaker-attrd[2449]:  notice: Setting last-failure-postgresdb#stop_0[node2]: (unset) -> 1613214535
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  warning: Unexpected result (OCF_TIMEOUT) was recorded for stop of postgresdb on node2 at Feb 13 11:08:55 2021
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  warning: Unexpected result (OCF_TIMEOUT) was recorded for stop of postgresdb on node2 at Feb 13 11:08:55 2021
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  warning: Cluster node node2 will be fenced: postgresdb failed there
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  warning: Forcing postgresdb away from node2 after 1000000 failures (max=1000000)
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice: Cannot pair postgres_db_lv:0 with instance of locking-clone
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice: Cannot pair postgres_db_lv:1 with instance of locking-clone
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  warning: Scheduling Node node2 for STONITH
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice: Stop of failed resource postgresdb is implicit after node2 is fenced
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Fence (reboot) node2 'postgresdb failed there'
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       lvmlockd:0     (          node2 )   due to node availability
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       dlm:0          (          node2 )   due to node availability
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       scsi           (          node2 )   due to node availability
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       postgres_db_lv:0   (          node2 )   due to node availability
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       postgresdbfs:0     (          node2 )   due to node availability
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       ClusterIP      (          node2 )   due to node availability
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       postgresdb         (          node2 )   due to node availability
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  warning: Calculated transition 110 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-15.bz2
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  warning: Unexpected result (OCF_TIMEOUT) was recorded for stop of postgresdb on node2 at Feb 13 11:08:55 2021
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  warning: Unexpected result (OCF_TIMEOUT) was recorded for stop of postgresdb on node2 at Feb 13 11:08:55 2021
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  warning: Cluster node node2 will be fenced: postgresdb failed there
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  warning: Forcing postgresdb away from node2 after 1000000 failures (max=1000000)
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice: Cannot pair postgres_db_lv:0 with instance of locking-clone
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice: Cannot pair postgres_db_lv:1 with instance of locking-clone
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  warning: Scheduling Node node2 for STONITH
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice: Stop of failed resource postgresdb is implicit after node2 is fenced
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Fence (reboot) node2 'postgresdb failed there'
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       lvmlockd:0     (          node2 )   due to node availability
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       dlm:0          (          node2 )   due to node availability
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       scsi           (          node2 )   due to node availability
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       postgres_db_lv:0   (          node2 )   due to node availability
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       postgresdbfs:0     (          node2 )   due to node availability
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       ClusterIP      (          node2 )   due to node availability
Feb 13 11:08:56 DB-2 pacemaker-schedulerd[2450]:  warning: Calculated transition 111 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-16.bz2
Feb 13 11:08:56 DB-2 pacemaker-controld[2451]:  notice: Requesting fencing (reboot) of node node2
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  notice: Client pacemaker-controld.2451.cae34137 wants to fence (reboot) 'node2' with device '(any)'
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  notice: Requesting peer fencing (reboot) targeting node2
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  notice: scsi is eligible to fence (reboot) node2: static-list
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  notice: Requesting that node2 perform 'reboot' action targeting node2
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  notice: scsi is eligible to fence (reboot) node2: static-list
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  warning: Agent 'fence_scsi' does not advertise support for 'reboot', performing 'off' action instead
Feb 13 11:08:56 DB-2 dlm_controld[1616]: 91779 fence result 1 pid 508622 result -1 term signal 6
Feb 13 11:08:56 DB-2 /fence_scsi: Failed: keys cannot be same. You can not fence yourself.
Feb 13 11:08:56 DB-2 dlm_controld[1616]: 91779 fence status 1 receive -1 from 2 walltime 1613214536 local 91779
Feb 13 11:08:56 DB-2 /fence_scsi: Please use '-h' for usage
Feb 13 11:08:56 DB-2 dlm_controld[1616]: 91779 fence request 1 pid 508678 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  notice: fence_scsi_off_1[508624] error output [ 2021-02-13 11:08:56,082 ERROR: Failed: keys cannot be same. You can not fence yourself. ]
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  notice: fence_scsi_off_1[508624] error output [  ]
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  notice: fence_scsi_off_1[508624] error output [ 2021-02-13 11:08:56,083 ERROR: Please use '-h' for usage ]
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  notice: fence_scsi_off_1[508624] error output [  ]
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  warning: fence_scsi[508624] stderr: [ 2021-02-13 11:08:56,082 ERROR: Failed: keys cannot be same. You can not fence yourself. ]
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  warning: fence_scsi[508624] stderr: [  ]
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  warning: fence_scsi[508624] stderr: [ 2021-02-13 11:08:56,083 ERROR: Please use '-h' for usage ]
Feb 13 11:08:56 DB-2 pacemaker-fenced[2447]:  warning: fence_scsi[508624] stderr: [  ]
Feb 13 11:08:57 DB-2 dlm_controld[1616]: 91780 fence result 1 pid 508678 result -1 term signal 6
Feb 13 11:08:57 DB-2 /fence_scsi: Failed: keys cannot be same. You can not fence yourself.
Feb 13 11:08:57 DB-2 /fence_scsi: Please use '-h' for usage
Feb 13 11:08:57 DB-2 dlm_controld[1616]: 91780 fence status 1 receive -1 from 2 walltime 1613214537 local 91780
Feb 13 11:08:57 DB-2 dlm_controld[1616]: 91780 fence request 1 pid 508701 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  notice: fence_scsi_off_2[508688] error output [ 2021-02-13 11:08:57,157 ERROR: Failed: keys cannot be same. You can not fence yourself. ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  notice: fence_scsi_off_2[508688] error output [  ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  notice: fence_scsi_off_2[508688] error output [ 2021-02-13 11:08:57,157 ERROR: Please use '-h' for usage ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  notice: fence_scsi_off_2[508688] error output [  ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  warning: fence_scsi[508688] stderr: [ 2021-02-13 11:08:57,157 ERROR: Failed: keys cannot be same. You can not fence yourself. ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  warning: fence_scsi[508688] stderr: [  ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  warning: fence_scsi[508688] stderr: [ 2021-02-13 11:08:57,157 ERROR: Please use '-h' for usage ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  warning: fence_scsi[508688] stderr: [  ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  error: Operation 'reboot' [508688] (call 7 from pacemaker-controld.2451) for host 'node2' with device 'scsi' returned: -201 (Generic Pacemaker error)
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  error: Operation 'reboot' targeting node2 on node2 for pacemaker-controld.2451 at node2.a04058ae: Generic Pacemaker error
Feb 13 11:08:57 DB-2 pacemaker-controld[2451]:  notice: Stonith operation 7/7:111:0:ef7f482f-a1f1-4769-82f6-641068aeca54: Generic Pacemaker error (-201)
Feb 13 11:08:57 DB-2 pacemaker-controld[2451]:  notice: Stonith operation 7 for node2 failed (Generic Pacemaker error): aborting transition.
Feb 13 11:08:57 DB-2 pacemaker-controld[2451]:  notice: Transition 111 aborted: Stonith failed
Feb 13 11:08:57 DB-2 pacemaker-controld[2451]:  notice: Peer node2 was not terminated (reboot) by node2 on behalf of pacemaker-controld.2451: Generic Pacemaker error
Feb 13 11:08:57 DB-2 pacemaker-controld[2451]:  notice: Transition 111 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=16, Source=/var/lib/pacemaker/pengine/pe-warn-16.bz2): Complete
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  warning: Unexpected result (OCF_TIMEOUT) was recorded for stop of postgresdb on node2 at Feb 13 11:08:55 2021
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  warning: Unexpected result (OCF_TIMEOUT) was recorded for stop of postgresdb on node2 at Feb 13 11:08:55 2021
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  warning: Cluster node node2 will be fenced: postgresdb failed there
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  warning: Forcing postgresdb away from node2 after 1000000 failures (max=1000000)
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  notice: Cannot pair postgres_db_lv:0 with instance of locking-clone
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  notice: Cannot pair postgres_db_lv:1 with instance of locking-clone
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  warning: Scheduling Node node2 for STONITH
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  notice: Stop of failed resource postgresdb is implicit after node2 is fenced
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  notice:  * Fence (reboot) node2 'postgresdb failed there'
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       lvmlockd:0     (          node2 )   due to node availability
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       dlm:0          (          node2 )   due to node availability
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       scsi           (          node2 )   due to node availability
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       postgres_db_lv:0   (          node2 )   due to node availability
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       postgresdbfs:0     (          node2 )   due to node availability
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       ClusterIP      (          node2 )   due to node availability
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  notice:  * Stop       postgresdb         (          node2 )   due to node availability
Feb 13 11:08:57 DB-2 pacemaker-schedulerd[2450]:  warning: Calculated transition 112 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-16.bz2
Feb 13 11:08:57 DB-2 pacemaker-controld[2451]:  notice: Requesting fencing (reboot) of node node2
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  notice: Client pacemaker-controld.2451.cae34137 wants to fence (reboot) 'node2' with device '(any)'
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  notice: Requesting peer fencing (reboot) targeting node2
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  notice: scsi is eligible to fence (reboot) node2: static-list
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  notice: Requesting that node2 perform 'reboot' action targeting node2
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  notice: scsi is eligible to fence (reboot) node2: static-list
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  warning: Agent 'fence_scsi' does not advertise support for 'reboot', performing 'off' action instead
Feb 13 11:08:57 DB-2 /fence_scsi: Failed: keys cannot be same. You can not fence yourself.
Feb 13 11:08:57 DB-2 /fence_scsi: Please use '-h' for usage
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  notice: fence_scsi_off_1[508712] error output [ 2021-02-13 11:08:57,292 ERROR: Failed: keys cannot be same. You can not fence yourself. ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  notice: fence_scsi_off_1[508712] error output [  ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  notice: fence_scsi_off_1[508712] error output [ 2021-02-13 11:08:57,293 ERROR: Please use '-h' for usage ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  notice: fence_scsi_off_1[508712] error output [  ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  warning: fence_scsi[508712] stderr: [ 2021-02-13 11:08:57,292 ERROR: Failed: keys cannot be same. You can not fence yourself. ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  warning: fence_scsi[508712] stderr: [  ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  warning: fence_scsi[508712] stderr: [ 2021-02-13 11:08:57,293 ERROR: Please use '-h' for usage ]
Feb 13 11:08:57 DB-2 pacemaker-fenced[2447]:  warning: fence_scsi[508712] stderr: [  ]
Feb 13 11:08:58 DB-2 dlm_controld[1616]: 91781 fence result 1 pid 508701 result -1 term signal 6
Feb 13 11:08:58 DB-2 dlm_controld[1616]: 91781 fence status 1 receive -1 from 2 walltime 1613214538 local 91781
Feb 13 11:08:58 DB-2 dlm_controld[1616]: 91781 fence request 1 pid 508733 nodedown time 1613214414 fence_all dlm_stonith
Feb 13 11:08:58 DB-2 /fence_scsi: Failed: keys cannot be same. You can not fence yourself.
Feb 13 11:08:58 DB-2 /fence_scsi: Please use '-h' for usage
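
To get a quick count of how many times dlm_controld retried fencing in an excerpt like the one above, something along these lines might help. This is only a rough sketch: the regular expression is written against the "fence result" lines exactly as they appear in this log, the number after "fence result" is assumed to be the node ID being fenced, and the function name and sample list are made up for illustration.

```python
import re
from collections import Counter

# Matches dlm_controld lines such as:
#   "... dlm_controld[1616]: 91747 fence result 1 pid 508357 result -1 term signal 6"
# Assumption: the number after "fence result" is the node ID being fenced.
FENCE_RESULT = re.compile(
    r"dlm_controld\[\d+\]: \d+ fence result (?P<node>\d+) "
    r"pid (?P<pid>\d+) result (?P<result>-?\d+)"
)


def summarize_fence_results(lines):
    """Count dlm_controld fence results per (node id, result code)."""
    counts = Counter()
    for line in lines:
        match = FENCE_RESULT.search(line)
        if match:
            key = (int(match.group("node")), int(match.group("result")))
            counts[key] += 1
    return counts


# A few lines copied from the log above, used as sample input.
sample = [
    "Feb 13 11:08:22 DB-2 dlm_controld[1616]: 91746 fence request 1 pid 508357 nodedown time 1613214414 fence_all dlm_stonith",
    "Feb 13 11:08:23 DB-2 dlm_controld[1616]: 91747 fence result 1 pid 508357 result -1 term signal 6",
    "Feb 13 11:08:25 DB-2 dlm_controld[1616]: 91748 fence result 1 pid 508359 result -1 term signal 6",
]

for (node, result), count in summarize_fence_results(sample).items():
    print(f"node {node}: result {result} seen {count} time(s)")
```

Feeding it the full journal output (for example, the lines of a saved log file) instead of the sample list should show how long the retry loop ran before any attempt succeeded.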


