[ClusterLabs] Antw: Re: Antw: [EXT] Non recoverable state of cluster after exit of one node due to killing of processes by oom killer

Mon Feb 15 04:10:11 EST 2021

Kindly read "fencing is done using fence_scsi" from the previous message as
"fencing is configured".

From the error messages we have analyzed, node2 initiated fencing of node1
because many cluster-related processes on node1 had been killed by the OOM
killer and node1 was marked as down.
Many resources on node2 then waited for the fencing of node1 to complete, as
seen in the following syslog messages from node2:

dlm_controld[1616]: 91659 lvm_postgres_db_vg wait for fencing
dlm_controld[1616]: 91659 lvm_global wait for fencing

These messages were logged while the postgresql-12 service was being started on node2.
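
To check a full syslog for the same pattern, something like the following
could be used. It is only a minimal sketch that assumes the line format shown
above; the function name and the field names (pid, stamp, lockspace) are
illustrative and not taken from dlm_controld documentation.

```python
import re

# Matches lines of the form seen in the node2 syslog, e.g.:
#   dlm_controld[1616]: 91659 lvm_global wait for fencing
# The group names are assumptions made for readability.
WAIT_RE = re.compile(
    r"dlm_controld\[(?P<pid>\d+)\]:\s+(?P<stamp>\d+)\s+"
    r"(?P<lockspace>\S+)\s+wait for fencing"
)


def lockspaces_waiting_for_fencing(lines):
    """Return the names reported as waiting for fencing, in first-seen order."""
    seen = []
    for line in lines:
        match = WAIT_RE.search(line)
        if match and match.group("lockspace") not in seen:
            seen.append(match.group("lockspace"))
    return seen


sample = [
    "dlm_controld[1616]: 91659 lvm_postgres_db_vg wait for fencing",
    "dlm_controld[1616]: 91659 lvm_global wait for fencing",
]
print(lockspaces_waiting_for_fencing(sample))
```

Feeding it the lines of the syslog file instead of the two sample lines would
list every name that was blocked on fencing during the failed start.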

As the postgresql service depends on these services (dlm, lvmlockd and
gfs2), it did not start in time on node2.

node2 then fenced itself after declaring that the services could not be started on it.

On Mon, Feb 15, 2021 at 9:00 AM Ulrich Windl <
Ulrich.Windl at rz.uni-regensburg.de> wrote:

> >>> shivraj dongawe <shivraj198 at gmail.com> wrote on 15.02.2021 at 08:27
> in message
> <CALpaHO_6LsYM=t76CifsRkFeLYDKQc+hY3kz7PRKp7b4se=-Aw at mail.gmail.com>:
> > Fencing is done using fence_scsi.
> > Config details are as follows:
> >  Resource: scsi (class=stonith type=fence_scsi)
> >   Attributes: devices=/dev/mapper/mpatha pcmk_host_list="node1 node2"
> > pcmk_monitor_action=metadata pcmk_reboot_action=off
> >   Meta Attrs: provides=unfencing
> >   Operations: monitor interval=60s (scsi-monitor-interval-60s)
> >
> > On Mon, Feb 15, 2021 at 7:17 AM Ulrich Windl <
> > Ulrich.Windl at rz.uni-regensburg.de> wrote:
> >
> >> >>> shivraj dongawe <shivraj198 at gmail.com> wrote on 14.02.2021 at 12:03
> >> in message
> >> <CALpaHO--3ERfwST70mBL-Wm9g6yH3YtD-wDA1r_CKnbrsxu4Sg at mail.gmail.com>:
> >> > We are running a two-node cluster on Ubuntu 20.04 LTS. Cluster-related
> >> > package version details are as follows:
> >> > pacemaker/focal-updates,focal-security 2.0.3-3ubuntu4.1 amd64
> >> > pacemaker/focal 2.0.3-3ubuntu3 amd64
> >> > corosync/focal 3.0.3-2ubuntu2 amd64
> >> > pcs/focal 0.10.4-3 all
> >> > fence-agents/focal 4.5.2-1 amd64
> >> > gfs2-utils/focal 3.2.0-3 amd64
> >> > dlm-controld/focal 4.0.9-1build1 amd64
> >> > lvm2-lockd/focal 2.03.07-1ubuntu1 amd64
> >> >
> >> > Cluster configuration details:
> >> > 1. The cluster has shared storage mounted through a gfs2 filesystem
> >> > with the help of dlm and lvmlockd.
> >> > 2. Corosync is configured to use knet for transport.
> >> > 3. Fencing is configured using fence_scsi on the shared storage that is
> >> > being used for the gfs2 filesystem.
> >> > 4. The two main resources configured are the cluster/virtual IP and
> >> > postgresql-12; postgresql-12 is configured as a systemd resource.
> >> > We have done failover testing (rebooting/shutting down a node, link
> >> > failure) of the cluster and observed that resources were migrated
> >> > properly to the active node.
> >> >
> >> > Recently we came across an issue which has occurred repeatedly in a
> >> > span of two days.
> >> > Details are below:
> >> > 1. The out-of-memory killer is invoked on the active node and starts
> >> > killing processes.
> >> > Sample is as follows:
> >> > postgres invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE),
> >> > order=0, oom_score_adj=0
> >> > 2. In one instance it started by killing pacemaker and in another by
> >> > killing postgresql. It does not stop after killing a single process; it
> >> > goes on to kill others as well (more concerning is the killing of
> >> > cluster-related processes). We have observed that swap space on that
> >> > node is 2 GB against 96 GB of RAM and are in the process of increasing
> >> > swap space to see if this resolves the issue. Postgres is configured
> >> > with a shared_buffers value of 32 GB (which is far less than 96 GB).
> >> > We are not yet sure which process is suddenly consuming that much memory.
> >> > 3. As a result of the processes being killed on node1, node2 tries to
> >> > fence node1, thereby initiating the stopping of cluster resources on node1.
> >>
> >> How is fencing being done?
> >>
> >> > 4. At this point we reach a stage where node1 is assumed to be down
> >> > and the application resources, cluster IP and postgresql, are being
> >> > started on node2.
>
> This is why I was asking: Is your fencing successful ("assumed that node1
> is down"), or isn't it?
>
> >> > 5. Postgresql on node2 fails to start within 60 sec (the start operation
> >> > timeout) and is declared failed. During the start operation of
> >> > postgres, we found many messages related to failure of fencing and to
> >> > other resources, such as dlm and the vg, waiting for fencing to complete.
> >> > Details of the syslog messages of node2 during this event are in the
> >> > attached file.
> >> > 6. After this point we are in a state where both node1 and node2 are
> >> > fenced and all resources on both nodes are unrecoverable.
> >> >
> >> > Now my question: the out-of-memory issue on node1 can be taken care of
> >> > by increasing swap, finding the process responsible for such huge
> >> > memory usage, and taking the necessary actions to minimize that usage,
> >> > but what remains unclear is why the cluster did not fail over to node2
> >> > cleanly and instead became unrecoverable.
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Manage your subscription:
> >> https://lists.clusterlabs.org/mailman/listinfo/users
> >>
> >> ClusterLabs home: https://www.clusterlabs.org/
> >>
>
>
>