[ClusterLabs] Antw: Re: Antw: [EXT] Non recoverable state of cluster after exit of one node due to killing of processes by oom killer

Fri Feb 26 01:04:39 EST 2021

Thank you for your valuable feedback.

I will surely check that ordering part.
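
A minimal sketch of how that ordering could be checked, assuming the CIB
XML is available as text (for example from `cibadmin --query`) and that
the two resources use the agent types `controld` and `lvmlockd`. The
function name, group id and primitive ids below are hypothetical and are
not taken from our configuration. Group members are started in the order
they are listed, so the check only compares positions within each group.

```python
import xml.etree.ElementTree as ET
from typing import List


def misordered_groups(cib_xml: str) -> List[str]:
    """Return ids of groups that list lvmlockd before controld (DLM)."""
    root = ET.fromstring(cib_xml.strip())
    bad = []
    for group in root.iter("group"):
        # Agent types of the group's members, in configured (start) order.
        types = [p.get("type") for p in group.findall("primitive")]
        if "controld" in types and "lvmlockd" in types:
            if types.index("lvmlockd") < types.index("controld"):
                bad.append(group.get("id"))
    return bad


# Hypothetical fragment: lvmlockd is listed before dlm, which is the
# ordering suggested to be wrong.
SAMPLE = """
<resources>
  <clone id="locking-clone">
    <group id="locking">
      <primitive id="lvmlockd" class="ocf" provider="heartbeat" type="lvmlockd"/>
      <primitive id="dlm" class="ocf" provider="pacemaker" type="controld"/>
    </group>
  </clone>
</resources>
"""

if __name__ == "__main__":
    print(misordered_groups(SAMPLE))
```

An empty result would mean no group lists lvmlockd ahead of the DLM
resource; it says nothing about separate ordering constraints.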

On Thu, Feb 25, 2021 at 5:21 PM Ken Gaillot <kgaillot at redhat.com> wrote:

> On Thu, 2021-02-25 at 06:34 +0000, shivraj dongawe wrote:
> >
> > @Ken Gaillot, Thanks for sharing your inputs on the possible behavior
> > of the cluster.
> > We have reconfirmed that dlm on a healthy node was waiting for
> > fencing of faulty node and shared storage access on the healthy node
> > was blocked during this process.
> > Kindly let me know whether this is the natural behavior or is it a
> > result of some misconfiguration.
>
> Your configuration looks perfect to me, except for one thing: I believe
> lvmlockd should be *after* dlm_controld in the group. I don't know if
> that's causing the problem, but it's worth trying.
>
> It is expected that DLM will wait for fencing, but it should be happy
> after fencing completes, so something is not right.
>
> > As asked, I am sharing the configuration information as an attachment
> > to this mail.
> >
> >
> > On Fri, Feb 19, 2021 at 11:28 PM Ken Gaillot <kgaillot at redhat.com>
> > wrote:
> > > On Fri, 2021-02-19 at 07:48 +0530, shivraj dongawe wrote:
> > > > Any update on this?
> > > > Is there any issue in the configuration that we are using?
> > > >
> > > > On Mon, Feb 15, 2021, 14:40 shivraj dongawe <shivraj198 at gmail.com>
> > > > wrote:
> > > > > Kindly read "fencing is done using fence_scsi" from the previous
> > > > > message as "fencing is configured".
> > > > >
> > > > > As per the error messages we have analyzed, node2 initiated fencing
> > > > > of node1 because many cluster-related processes on node1 had been
> > > > > killed by the OOM killer and node1 was marked as down.
> > > > > Many resources on node2 then waited for fencing of node1, as seen
> > > > > in the following syslog messages from node2:
> > > > > dlm_controld[1616]: 91659 lvm_postgres_db_vg wait for fencing
> > > > > dlm_controld[1616]: 91659 lvm_global wait for fencing
> > > > >
> > > > > These messages were logged while the postgresql-12 service was being
> > > > > started on node2.
> > > > > As the postgresql service depends on these services (dlm, lvmlockd
> > > > > and gfs2), it did not start in time on node2.
> > > > > node2 then fenced itself after declaring that the services could not
> > > > > be started on it.
> > > > >
> > > > > On Mon, Feb 15, 2021 at 9:00 AM Ulrich Windl <
> > > > > Ulrich.Windl at rz.uni-regensburg.de> wrote:
> > > > > > >>> shivraj dongawe <shivraj198 at gmail.com> wrote on 15.02.2021
> > > > > > at 08:27 in message
> > > > > > <CALpaHO_6LsYM=t76CifsRkFeLYDKQc+hY3kz7PRKp7b4se=-Aw at mail.gmail.com>:
> > > > > > > Fencing is done using fence_scsi.
> > > > > > > Config details are as follows:
> > > > > > >  Resource: scsi (class=stonith type=fence_scsi)
> > > > > > >   Attributes: devices=/dev/mapper/mpatha pcmk_host_list="node1 node2"
> > > > > > >    pcmk_monitor_action=metadata pcmk_reboot_action=off
> > > > > > >   Meta Attrs: provides=unfencing
> > > > > > >   Operations: monitor interval=60s (scsi-monitor-interval-60s)
> > > > > > >
> > > > > > > On Mon, Feb 15, 2021 at 7:17 AM Ulrich Windl <
> > > > > > > Ulrich.Windl at rz.uni-regensburg.de> wrote:
> > > > > > >
> > > > > > >> >>> shivraj dongawe <shivraj198 at gmail.com> wrote on
> > > > > > >> 14.02.2021 at 12:03 in message
> > > > > > >> <CALpaHO--3ERfwST70mBL-Wm9g6yH3YtD-wDA1r_CKnbrsxu4Sg at mail.gmail.com>:
> > > > > > >> > We are running a two node cluster on Ubuntu 20.04 LTS.
> > > > > > >> > Cluster related package version details are as follows:
> > > > > > >> > pacemaker/focal-updates,focal-security 2.0.3-3ubuntu4.1 amd64
> > > > > > >> > pacemaker/focal 2.0.3-3ubuntu3 amd64
> > > > > > >> > corosync/focal 3.0.3-2ubuntu2 amd64
> > > > > > >> > pcs/focal 0.10.4-3 all
> > > > > > >> > fence-agents/focal 4.5.2-1 amd64
> > > > > > >> > gfs2-utils/focal 3.2.0-3 amd64
> > > > > > >> > dlm-controld/focal 4.0.9-1build1 amd64
> > > > > > >> > lvm2-lockd/focal 2.03.07-1ubuntu1 amd64
> > > > > > >> >
> > > > > > >> > Cluster configuration details:
> > > > > > >> > 1. The cluster has shared storage mounted through a gfs2
> > > > > > >> > filesystem with the help of dlm and lvmlockd.
> > > > > > >> > 2. Corosync is configured to use knet for transport.
> > > > > > >> > 3. Fencing is configured using fence_scsi on the shared storage
> > > > > > >> > which is being used for the gfs2 filesystem.
> > > > > > >> > 4. The two main resources configured are the cluster/virtual IP
> > > > > > >> > and postgresql-12; postgresql-12 is configured as a systemd
> > > > > > >> > resource.
> > > > > > >> > We had done failover testing (rebooting/shutting down a node,
> > > > > > >> > link failure) of the cluster and had observed that resources
> > > > > > >> > were migrated properly to the active node.
> > > > > > >> >
> > > > > > >> > Recently we came across an issue which has occurred repeatedly
> > > > > > >> > in a span of two days.
> > > > > > >> > Details are below:
> > > > > > >> > 1. The out-of-memory killer is invoked on the active node and
> > > > > > >> > starts killing processes.
> > > > > > >> > Sample is as follows:
> > > > > > >> > postgres invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE),
> > > > > > >> > order=0, oom_score_adj=0
> > > > > > >> > 2. In one instance it started by killing pacemaker and in another
> > > > > > >> > by killing postgresql. It does not stop after killing a single
> > > > > > >> > process; it goes on killing others as well (more concerning is
> > > > > > >> > the killing of cluster related processes). We have observed that
> > > > > > >> > swap space on that node is 2 GB against RAM of 96 GB and are in
> > > > > > >> > the process of increasing swap space to see if this resolves the
> > > > > > >> > issue. Postgres is configured with a shared_buffers value of
> > > > > > >> > 32 GB (which is way less than 96 GB).
> > > > > > >> > We are not yet sure which process is suddenly eating up that
> > > > > > >> > much memory.
> > > > > > >> > 3. As a result of processes being killed on node1, node2 tries to
> > > > > > >> > fence node1, thereby initiating the stopping of cluster resources
> > > > > > >> > on node1.
> > > > > > >>
> > > > > > >> How is fencing being done?
> > > > > > >>
> > > > > > >> > 4. At this point we reach a stage where it is assumed that node1
> > > > > > >> > is down, and the application resources, cluster IP and postgresql
> > > > > > >> > are being started on node2.
> > > > > >
> > > > > > This is why I was asking: Is your fencing successful ("assumed that
> > > > > > node1 is down"), or isn't it?
> > > > > >
> > > > > > >> > 5. Postgresql on node2 fails to start within 60 sec (the start
> > > > > > >> > operation timeout) and is declared as failed. During the start
> > > > > > >> > operation of postgres, we found many messages related to failure
> > > > > > >> > of fencing and to other resources, such as dlm and vg, waiting
> > > > > > >> > for fencing to complete.
> > >
> > > It does seem that DLM is where the problem occurs.
> > >
> > > Note that fencing is scheduled in two separate ways, once by DLM and
> > > once by the cluster itself, when node1 is lost.
> > >
> > > The fencing scheduled by the cluster completes successfully:
> > >
> > > Feb 13 11:07:56 DB-2 pacemaker-controld[2451]:  notice: Peer node1 was terminated (reboot) by node2 on behalf of pacemaker-controld.2451: OK
> > >
> > > but DLM just attempts fencing over and over, eventually causing
> > > resource timeouts. Those timeouts cause the cluster to schedule
> > > resource recovery (stop+start), but the stops time out for the same
> > > reason, and it is those stop timeouts that cause node2 to be fenced.
> > >
> > > I'm not familiar enough with DLM to know what might keep it from being
> > > able to contact Pacemaker for fencing.
> > >
> > > Can you attach your configuration as well (with any sensitive info
> > > removed)? I assume you've created an ocf:pacemaker:controld clone, and
> > > that the other resources are layered on top of that with colocation and
> > > ordering constraints.
> > >
> > > > > > >> > Details of syslog messages of node2 during this event are
> > > > > > >> > attached in a file.
> > > > > > >> > 6. After this point we are in a state where node1 and node2 both
> > > > > > >> > go into a fenced state and resources are unrecoverable (all
> > > > > > >> > resources on both nodes).
> > > > > > >> >
> > > > > > >> > Now, the out-of-memory issue on node1 can be taken care of by
> > > > > > >> > increasing swap, finding the process responsible for such huge
> > > > > > >> > memory usage and taking the necessary actions to minimize it.
> > > > > > >> > The other issue, which remains unclear, is why the cluster did
> > > > > > >> > not shift cleanly to node2 and instead became unrecoverable.
> > > > > > >>
> > > _______________________________________________
> > > Manage your subscription:
> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > >
> > > ClusterLabs home: https://www.clusterlabs.org/
> --
> Ken Gaillot <kgaillot at redhat.com>
>