[ClusterLabs] Antw: Re: Antw: [EXT] Non recoverable state of cluster after exit of one node due to killing of processes by oom killer

shivraj dongawe shivraj198 at gmail.com
Thu Feb 25 01:34:18 EST 2021


@Ken Gaillot, thanks for sharing your input on the possible behavior of
the cluster.
We have reconfirmed that DLM on the healthy node was waiting for fencing
of the faulty node, and that shared storage access on the healthy node
was blocked while fencing was pending.
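For anyone checking the same symptom, the DLM side can be inspected on the
surviving node with dlm_tool; a rough sketch (the lockspace names are from
our configuration, and the exact output varies by version):

  dlm_tool ls                # lists lockspaces (lvm_global, lvm_postgres_db_vg);
                             # a lockspace blocked on fencing typically shows kern_stop
  dlm_tool dump | grep fenc  # dlm_controld's debug buffer, including the
                             # "wait for fencing" entries quoted below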
Kindly let me know whether this is the expected behavior or the result of
some misconfiguration.
As asked, I am sharing the configuration information as an attachment to
this mail.


On Fri, Feb 19, 2021 at 11:28 PM Ken Gaillot <kgaillot at redhat.com> wrote:

> On Fri, 2021-02-19 at 07:48 +0530, shivraj dongawe wrote:
> > Any update on this?
> > Is there any issue with the configuration we are using?
> >
> > On Mon, Feb 15, 2021, 14:40 shivraj dongawe <shivraj198 at gmail.com>
> > wrote:
> > > Kindly read "fencing is done using fence_scsi" from the previous
> > > message as "fencing is configured".
> > >
> > > As per the error messages we analyzed, node2 initiated fencing of
> > > node1 after many cluster-related processes on node1 were killed by
> > > the OOM killer and node1 was marked as down.
> > > Many resources on node2 then waited for fencing of node1, as seen
> > > in the following syslog messages from node2:
> > > dlm_controld[1616]: 91659 lvm_postgres_db_vg wait for fencing
> > > dlm_controld[1616]: 91659 lvm_global wait for fencing
> > >
> > > These messages appeared while the postgresql-12 service was being
> > > started on node2.
> > > As the postgresql service depends on these services (dlm, lvmlockd
> > > and gfs2), it did not start in time on node2.
> > > And node2 fenced itself after declaring that the services could not
> > > be started on it.
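> > >
> > > The fencing actions as pacemaker recorded them can be listed with
> > > stonith_admin; a sketch, run on either node:
> > >
> > >   stonith_admin --history node1   # fencing actions targeting node1
> > >   stonith_admin --history '*'     # all recorded fencing actions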
> > >
> > > On Mon, Feb 15, 2021 at 9:00 AM Ulrich Windl <
> > > Ulrich.Windl at rz.uni-regensburg.de> wrote:
> > > > >>> shivraj dongawe <shivraj198 at gmail.com> wrote on 15.02.2021 at 08:27 in message
> > > > <CALpaHO_6LsYM=t76CifsRkFeLYDKQc+hY3kz7PRKp7b4se=-Aw at mail.gmail.com>:
> > > > > Fencing is done using fence_scsi.
> > > > > Config details are as follows:
> > > > >  Resource: scsi (class=stonith type=fence_scsi)
> > > > >   Attributes: devices=/dev/mapper/mpatha pcmk_host_list="node1 node2"
> > > > >    pcmk_monitor_action=metadata pcmk_reboot_action=off
> > > > >   Meta Attrs: provides=unfencing
> > > > >   Operations: monitor interval=60s (scsi-monitor-interval-60s)
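> > > > >
> > > > > Whether a node's key is actually registered on (or removed from)
> > > > > the device can be checked with sg_persist from sg3-utils; a sketch:
> > > > >
> > > > >   sg_persist --in --read-keys --device=/dev/mapper/mpatha
> > > > >   sg_persist --in --read-reservation --device=/dev/mapper/mpatha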
> > > > >
> > > > > On Mon, Feb 15, 2021 at 7:17 AM Ulrich Windl <
> > > > > Ulrich.Windl at rz.uni-regensburg.de> wrote:
> > > > >
> > > > >> >>> shivraj dongawe <shivraj198 at gmail.com> wrote on 14.02.2021 at 12:03 in message
> > > > >> <CALpaHO--3ERfwST70mBL-Wm9g6yH3YtD-wDA1r_CKnbrsxu4Sg at mail.gmail.com>:
> > > > >> > We are running a two-node cluster on Ubuntu 20.04 LTS.
> > > > >> > Cluster-related package version details are as follows:
> > > > >> > pacemaker/focal-updates,focal-security 2.0.3-3ubuntu4.1 amd64
> > > > >> > pacemaker/focal 2.0.3-3ubuntu3 amd64
> > > > >> > corosync/focal 3.0.3-2ubuntu2 amd64
> > > > >> > pcs/focal 0.10.4-3 all
> > > > >> > fence-agents/focal 4.5.2-1 amd64
> > > > >> > gfs2-utils/focal 3.2.0-3 amd64
> > > > >> > dlm-controld/focal 4.0.9-1build1 amd64
> > > > >> > lvm2-lockd/focal 2.03.07-1ubuntu1 amd64
> > > > >> >
> > > > >> > Cluster configuration details:
> > > > >> > 1. The cluster has shared storage mounted through a gfs2
> > > > >> > filesystem with the help of dlm and lvmlockd.
> > > > >> > 2. Corosync is configured to use knet for transport.
> > > > >> > 3. Fencing is configured using fence_scsi on the shared storage
> > > > >> > which is being used for the gfs2 filesystem.
> > > > >> > 4. The two main resources configured are the cluster/virtual IP
> > > > >> > and postgresql-12; postgresql-12 is configured as a systemd
> > > > >> > resource.
> > > > >> > We had done failover testing of the cluster (rebooting/shutting
> > > > >> > down a node, link failure) and had observed that resources
> > > > >> > migrated properly to the surviving node.
> > > > >> >
> > > > >> > Recently we came across an issue which occurred repeatedly over
> > > > >> > a span of two days.
> > > > >> > Details are below:
> > > > >> > 1. The out-of-memory killer is invoked on the active node and
> > > > >> > starts killing processes.
> > > > >> > A sample is as follows:
> > > > >> > postgres invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
> > > > >> > 2. In one instance it started by killing pacemaker, in another
> > > > >> > by killing postgresql. It does not stop at killing a single
> > > > >> > process; it goes on to kill others as well (more concerning is
> > > > >> > the killing of cluster-related processes). We have observed
> > > > >> > that swap space on that node is 2 GB against 96 GB of RAM, and
> > > > >> > we are in the process of increasing swap space to see if this
> > > > >> > resolves the issue. Postgres is configured with a
> > > > >> > shared_buffers value of 32 GB (which is well below the 96 GB).
> > > > >> > We are not yet sure which process is suddenly eating up that
> > > > >> > much memory.
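> > > > >> >
> > > > >> > One option we are also evaluating is shielding the cluster
> > > > >> > daemons from the OOM killer with a systemd drop-in; a sketch
> > > > >> > only (the drop-in path and scope are assumptions on our side,
> > > > >> > nothing is deployed yet):
> > > > >> >
> > > > >> >   # /etc/systemd/system/corosync.service.d/99-oom.conf
> > > > >> >   # (same idea for pacemaker.service)
> > > > >> >   [Service]
> > > > >> >   OOMScoreAdjust=-1000   # exempt this service from OOM killing
> > > > >> >
> > > > >> >   # then: systemctl daemon-reload && systemctl restart corosync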
> > > > >> > 3. As a result of the killed processes on node1, node2 tries
> > > > >> > to fence node1, thereby initiating the stopping of cluster
> > > > >> > resources on node1.
> > > > >>
> > > > >> How is fencing being done?
> > > > >>
> > > > >> > 4. At this point we reach a stage where node1 is assumed to be
> > > > >> > down, and the application resources, cluster IP and postgresql,
> > > > >> > are being started on node2.
> > > >
> > > > This is why I was asking: Is your fencing successful ("assumed
> > > > that node1 is down"), or isn't it?
> > > >
> > > > >> > 5. Postgresql on node2 fails to start within 60 seconds (the
> > > > >> > start operation timeout) and is declared failed. During the
> > > > >> > start operation of postgres, we found many messages about
> > > > >> > fencing failures and about other resources, such as dlm and
> > > > >> > the VG, waiting for fencing to complete.
>
> It does seem that DLM is where the problem occurs.
>
> Note that fencing is scheduled in two separate ways, once by DLM and
> once by the cluster itself, when node1 is lost.
>
> The fencing scheduled by the cluster completes successfully:
>
> Feb 13 11:07:56 DB-2 pacemaker-controld[2451]:  notice: Peer node1 was
> terminated (reboot) by node2 on behalf of pacemaker-controld.2451: OK
>
> but DLM just attempts fencing over and over, eventually causing
> resource timeouts. Those timeouts cause the cluster to schedule
> resource recovery (stop+start), but the stops time out for the same
> reason, and it is those stop timeouts that cause node2 to be fenced.
>
> I'm not familiar enough with DLM to know what might keep it from being
> able to contact Pacemaker for fencing.
>
> Can you attach your configuration as well (with any sensitive info
> removed)? I assume you've created an ocf:pacemaker:controld clone, and
> that the other resources are layered on top of that with colocation and
> ordering constraints.
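>
> Something along these lines (a sketch only, with guessed names; I
> haven't seen your actual configuration):
>
>   pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s
>   pcs resource create lvmlockd ocf:heartbeat:lvmlockd op monitor interval=30s
>   pcs resource group add locking dlm lvmlockd
>   pcs resource clone locking interleave=true
>   # storage and the application then go on top via ordering/colocation:
>   pcs constraint order start locking-clone then start postgres_db_vg-clone
>   pcs constraint colocation add postgres_db_vg-clone with locking-clone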
>
> > > > >> > Details of the syslog messages from node2 during this event
> > > > >> > are attached in a file.
> > > > >> > 6. After this point we are in a state where node1 and node2
> > > > >> > have both been fenced and resources are unrecoverable (all
> > > > >> > resources on both nodes).
> > > > >> >
> > > > >> > Now, the out-of-memory issue on node1 can be taken care of by
> > > > >> > increasing swap, finding the process responsible for such huge
> > > > >> > memory usage, and taking the necessary actions to minimize that
> > > > >> > usage; but the other issue that remains unclear is why the
> > > > >> > cluster did not shift to node2 cleanly and instead became
> > > > >> > unrecoverable.
> > > > >>
> --
> Ken Gaillot <kgaillot at redhat.com>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
-------------- next part --------------
pcs config
Cluster Name: ubuntucluster
Corosync Nodes:
 node1 node2
Pacemaker Nodes:
 node1 node2

Resources:
 Clone: locking-clone
  Meta Attrs: interleave=true
  Group: locking
   Resource: lvmlockd (class=ocf provider=heartbeat type=lvmlockd)
    Operations: monitor interval=30s on-fail=fence (lvmlockd-monitor-interval-30s)
                start interval=0s timeout=90s (lvmlockd-start-interval-0s)
                stop interval=0s timeout=90s (lvmlockd-stop-interval-0s)
   Resource: dlm (class=ocf provider=pacemaker type=controld)
    Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
                start interval=0s timeout=90s (dlm-start-interval-0s)
                stop interval=0s timeout=100s (dlm-stop-interval-0s)
 Clone: postgres_db_vg-clone
  Meta Attrs: interleave=true
  Group: postgres_db_vg
   Resource: postgres_db_lv (class=ocf provider=heartbeat type=LVM-activate)
    Attributes: activation_mode=shared lvname=postgres_db_lv vg_access_mode=lvmlockd vgname=postgres_db_vg
    Operations: monitor interval=30s timeout=90s (postgres_db_lv-monitor-interval-30s)
                start interval=0s timeout=90s (postgres_db_lv-start-interval-0s)
                stop interval=0s timeout=90s (postgres_db_lv-stop-interval-0s)
   Resource: postgresdbfs (class=ocf provider=heartbeat type=Filesystem)
    Attributes: device=/dev/postgres_db_vg/postgres_db_lv directory=/postgres/db fstype=gfs2 options=noatime
    Operations: monitor interval=10s on-fail=fence (postgresdbfs-monitor-interval-10s)
                start interval=0s timeout=60s (postgresdbfs-start-interval-0s)
                stop interval=0s timeout=60s (postgresdbfs-stop-interval-0s)
 Group: postgresdbservice
  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=24 ip=<IP> nic=bond0
   Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
  Resource: postgresdb (class=systemd type=postgresql-12)
   Operations: monitor interval=4s on-fail=restart timeout=60s (postgresdb-monitor-interval-4s)
               monitor interval=3s on-fail=restart role=Master timeout=60s (postgresdb-monitor-interval-3s)
               start interval=0s on-fail=restart timeout=60s (postgresdb-start-interval-0s)
               stop interval=0s timeout=60s (postgresdb-stop-interval-0s)

Stonith Devices:
 Resource: scsi (class=stonith type=fence_scsi)
  Attributes: devices=/dev/mapper/mpatha pcmk_host_list="node1 node2" pcmk_monitor_action=metadata pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: postgresdbservice
    Enabled on:
      Node: node2 (score:INFINITY) (role:Started) (id:cli-prefer-postgresdbservice)
Ordering Constraints:
  start locking-clone then start postgres_db_vg-clone (kind:Mandatory) (id:order-locking-clone-postgres_db_vg-clone-mandatory)
Colocation Constraints:
  postgres_db_vg-clone with locking-clone (score:INFINITY) (id:colocation-postgres_db_vg-clone-locking-clone-INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: ubuntucluster
 dc-version: 2.0.3-4b1f869f0f
 have-watchdog: false
 last-lrm-refresh: 1613797312
 no-quorum-policy: freeze
 stonith-enabled: true

Quorum:
  Options:


