<div dir="ltr"><div><br></div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr">@<span class="gmail-qu" tabindex="-1"><span name="Ken Gaillot" class="gmail-gD">Ken Gaillot, </span><span name="Ken Gaillot" class="gmail-gD">Thanks for sharing your inputs on the possible behavior of the cluster. <br></span></span></div><div dir="ltr"><span class="gmail-qu" tabindex="-1"><span name="Ken Gaillot" class="gmail-gD">We have reconfirmed that dlm on a healthy node was waiting for fencing of faulty node and shared storage access on the healthy node was blocked during this process. <br></span></span></div><div dir="ltr"><span class="gmail-qu" tabindex="-1"><span name="Ken Gaillot" class="gmail-gD"></span><span class="gmail-go"></span></span><div>Kindly let me know whether this is the natural behavior or is it a result of some misconfiguration. <br></div><div>As asked by I am sharing configuration information as an attachment to this mail. <br></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Feb 19, 2021 at 11:28 PM Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, 2021-02-19 at 07:48 +0530, shivraj dongawe wrote:<br>
> Any update on this? <br>
> Is there any issue in the configuration that we are using ?<br>
> <br>
> On Mon, Feb 15, 2021, 14:40 shivraj dongawe <<a href="mailto:shivraj198@gmail.com" target="_blank">shivraj198@gmail.com</a>><br>
> wrote:<br>
> > Kindly read "fencing is done using fence_scsi" from the previous<br>
> > message as "fencing is configured". <br>
> > <br>
> > As per the error messages we have analyzed, node2 initiated fencing<br>
> > of node1 because many cluster-related processes on node1 had been<br>
> > killed by the oom killer and node1 was marked as down. <br>
> > Now many resources of node2 have waited for fencing of node1, as<br>
> > seen from the following syslog messages on node2: <br>
> > dlm_controld[1616]: 91659 lvm_postgres_db_vg wait for fencing<br>
> > dlm_controld[1616]: 91659 lvm_global wait for fencing<br>
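For reference, the lockspace state behind these "wait for fencing" messages can be checked on the surviving node with the dlm utilities; a rough sketch, assuming dlm_tool from dlm-controld is available:<br>
<br>
  # list lockspaces; ones blocked on fencing report a "wait fencing" status<br>
  dlm_tool ls<br>
<br>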
> > <br>
> > These messages appeared while the postgresql-12 service was being started on<br>
> > node2. <br>
> > As the postgresql service depends on these services (dlm, lvmlockd<br>
> > and gfs2), it did not start in time on node2. <br>
> > Node2 then fenced itself after declaring that the services could not be<br>
> > started on it. <br>
> > <br>
> > On Mon, Feb 15, 2021 at 9:00 AM Ulrich Windl <<br>
> > <a href="mailto:Ulrich.Windl@rz.uni-regensburg.de" target="_blank">Ulrich.Windl@rz.uni-regensburg.de</a>> wrote:<br>
> > > >>> shivraj dongawe <<a href="mailto:shivraj198@gmail.com" target="_blank">shivraj198@gmail.com</a>> wrote on 15.02.2021<br>
> > > at 08:27 in<br>
> > > message<br>
> > > <<br>
> > > CALpaHO_6LsYM=t76CifsRkFeLYDKQc+hY3kz7PRKp7b4se=-<a href="mailto:Aw@mail.gmail.com" target="_blank">Aw@mail.gmail.com</a><br>
> > > >:<br>
> > > > Fencing is done using fence_scsi.<br>
> > > > Config details are as follows:<br>
> > > > Resource: scsi (class=stonith type=fence_scsi)<br>
> > > > Attributes: devices=/dev/mapper/mpatha pcmk_host_list="node1<br>
> > > node2"<br>
> > > > pcmk_monitor_action=metadata pcmk_reboot_action=off<br>
> > > > Meta Attrs: provides=unfencing<br>
> > > > Operations: monitor interval=60s (scsi-monitor-interval-60s)<br>
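For reference, a fence_scsi stonith resource with the attributes shown above would typically be created with a pcs command along these lines (a sketch, not necessarily the exact command used on this cluster):<br>
<br>
  pcs stonith create scsi fence_scsi devices=/dev/mapper/mpatha \<br>
      pcmk_host_list="node1 node2" pcmk_monitor_action=metadata \<br>
      pcmk_reboot_action=off op monitor interval=60s meta provides=unfencing<br>
<br>
The provides=unfencing meta attribute matters with fence_scsi: fencing works by revoking a node's SCSI registrations, so a fenced node has to be unfenced (re-registered) before it can use the shared device again.<br>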
> > > > <br>
> > > > On Mon, Feb 15, 2021 at 7:17 AM Ulrich Windl <<br>
> > > > <a href="mailto:Ulrich.Windl@rz.uni-regensburg.de" target="_blank">Ulrich.Windl@rz.uni-regensburg.de</a>> wrote:<br>
> > > > <br>
> > > >> >>> shivraj dongawe <<a href="mailto:shivraj198@gmail.com" target="_blank">shivraj198@gmail.com</a>> wrote on<br>
> > > 14.02.2021 at 12:03<br>
> > > >> in<br>
> > > >> message<br>
> > > >> <<br>
> > > <a href="mailto:CALpaHO--3ERfwST70mBL-Wm9g6yH3YtD-wDA1r_CKnbrsxu4Sg@mail.gmail.com" target="_blank">CALpaHO--3ERfwST70mBL-Wm9g6yH3YtD-wDA1r_CKnbrsxu4Sg@mail.gmail.com</a><br>
> > > >:<br>
> > > >> > We are running a two node cluster on Ubuntu 20.04 LTS.<br>
> > > Cluster related<br>
> > > >> > package version details are as<br>
> > > >> > follows: pacemaker/focal-updates,focal-security 2.0.3-<br>
> > > 3ubuntu4.1 amd64<br>
> > > >> > pacemaker/focal 2.0.3-3ubuntu3 amd64<br>
> > > >> > corosync/focal 3.0.3-2ubuntu2 amd64<br>
> > > >> > pcs/focal 0.10.4-3 all<br>
> > > >> > fence-agents/focal 4.5.2-1 amd64<br>
> > > >> > gfs2-utils/focal 3.2.0-3 amd64<br>
> > > >> > dlm-controld/focal 4.0.9-1build1 amd64<br>
> > > >> > lvm2-lockd/focal 2.03.07-1ubuntu1 amd64<br>
> > > >> ><br>
> > > >> > Cluster configuration details:<br>
> > > >> > 1. The cluster has shared storage mounted through a gfs2<br>
> > > filesystem<br>
> > > >> with<br>
> > > >> > the help of dlm and lvmlockd.<br>
> > > >> > 2. Corosync is configured to use knet for transport.<br>
> > > >> > 3. Fencing is configured using fence_scsi on the shared<br>
> > > storage which is<br>
> > > >> > being used for the gfs2 filesystem.<br>
> > > >> > 4. The two main resources configured are the cluster/virtual IP and<br>
> > > >> postgresql-12;<br>
> > > >> > postgresql-12 is configured as a systemd resource.<br>
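For reference, a gfs2-over-lvmlockd stack like the one described is typically built as cloned dlm and lvmlockd resources with a cloned Filesystem resource for the gfs2 mount ordered on top; a minimal sketch with placeholder logical-volume and mount-point names, not the actual configuration:<br>
<br>
  pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence --group locking<br>
  pcs resource create lvmlockd ocf:heartbeat:lvmlockd op monitor interval=30s on-fail=fence --group locking<br>
  pcs resource clone locking interleave=true<br>
  pcs resource create sharedfs ocf:heartbeat:Filesystem device=/dev/postgres_db_vg/lv_data \<br>
      directory=/var/lib/postgresql fstype=gfs2 options=noatime \<br>
      op monitor interval=10s on-fail=fence clone interleave=true<br>
  pcs constraint order start locking-clone then sharedfs-clone<br>
  pcs constraint colocation add sharedfs-clone with locking-clone<br>
<br>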
> > > >> > We had done failover testing (rebooting/shutting down of a<br>
> > > node, link<br>
> > > >> > failure) of the cluster and had observed that resources were<br>
> > > getting<br>
> > > >> > migrated properly on the active node.<br>
> > > >> ><br>
> > > >> > Recently we came across an issue which has occurred<br>
> > > repeatedly in a span of<br>
> > > >> > two days.<br>
> > > >> > Details are below:<br>
> > > >> > 1. The out-of-memory killer is getting invoked on the active node<br>
> > > and it starts<br>
> > > >> > killing processes.<br>
> > > >> > Sample is as follows:<br>
> > > >> > postgres invoked oom-killer:<br>
> > > gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE),<br>
> > > >> > order=0, oom_score_adj=0<br>
> > > >> > 2. In one instance it started with the killing of pacemaker and<br>
> > > in another<br>
> > > >> with<br>
> > > >> > postgresql. It does not stop with the killing of a single<br>
> > > process; it goes<br>
> > > >> > on to kill others (more concerning is the killing of cluster-<br>
> > > related<br>
> > > >> processes)<br>
> > > >> > as well. We have observed that swap space on that node is 2<br>
> > > GB against<br>
> > > >> RAM<br>
> > > >> > of 96 GB and are in the process of increasing swap space to<br>
> > > see if this<br>
> > > >> > resolves this issue. Postgres is configured with<br>
> > > shared_buffers value of<br>
> > > >> 32<br>
> > > >> > GB (which is way less than 96 GB).<br>
> > > >> > We are not yet sure which process is eating up that much<br>
> > > memory suddenly.<br>
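For reference, the oom-killer report that lands in syslog includes a per-process memory table at the moment of the kill, which usually identifies the culprit; to catch it ahead of time, something as simple as this can be left running (a rough sketch):<br>
<br>
  # show the largest resident-memory consumers, refreshed every 30 seconds<br>
  watch -n 30 'ps -eo pid,rss,vsz,comm --sort=-rss | head -n 15'<br>
<br>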
> > > >> > 3. As a result of killing processes on node1, node2 is<br>
> > > trying to fence<br>
> > > >> > node1 and thereby initiating stopping of cluster resources<br>
> > > on node1.<br>
> > > >><br>
> > > >> How is fencing being done?<br>
> > > >><br>
> > > >> > 4. At this point we reach a stage where it is assumed that<br>
> > > node1 is down<br>
> > > >> > and application resources, cluster IP and postgresql are<br>
> > > being started on<br>
> > > >> > node2.<br>
> > > <br>
> > > This is why I was asking: Is your fencing successful ("assumed<br>
> > > that node1 is down<br>
> > > "), or isn't it?<br>
> > > <br>
> > > >> > 5. Postgresql on node 2 fails to start in 60 sec (start<br>
> > > operation timeout)<br>
> > > >> > and is declared as failed. During the start operation of<br>
> > > postgres, we<br>
> > > >> have<br>
> > > >> > found many messages related to failure of fencing and other<br>
> > > resources<br>
> > > >> such<br>
> > > >> > as dlm and vg waiting for fencing to complete.<br>
<br>
It does seem that DLM is where the problem occurs.<br>
<br>
Note that fencing is scheduled in two separate ways, once by DLM and<br>
once by the cluster itself, when node1 is lost.<br>
<br>
The fencing scheduled by the cluster completes successfully:<br>
<br>
Feb 13 11:07:56 DB-2 pacemaker-controld[2451]: notice: Peer node1 was<br>
terminated (reboot) by node2 on behalf of pacemaker-controld.2451: OK<br>
<br>
but DLM just attempts fencing over and over, eventually causing<br>
resource timeouts. Those timeouts cause the cluster to schedule<br>
resource recovery (stop+start), but the stops time out for the same<br>
reason, and it is those stop timeouts that cause node2 to be fenced.<br>
<br>
I'm not familiar enough with DLM to know what might keep it from being<br>
able to contact Pacemaker for fencing.<br>
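For what it's worth, dlm_controld keeps an internal debug buffer that records its fencing requests and their results; dumping it on the surviving node may show why those requests never completed (a rough check, assuming the dlm utilities are installed):<br>
<br>
  dlm_tool dump | grep -i fence<br>
<br>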
<br>
Can you attach your configuration as well (with any sensitive info<br>
removed)? I assume you've created an ocf:pacemaker:controld clone, and<br>
that the other resources are layered on top of that with colocation and<br>
ordering constraints.<br>
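Roughly the layering meant here, sketched with placeholder resource names rather than your actual ones (a cloned dlm/lvmlockd pair, a cloned gfs2 Filesystem resource on top of that, and the database and IP on top of the filesystem):<br>
<br>
  pcs constraint order start sharedfs-clone then postgresql-12<br>
  pcs constraint colocation add postgresql-12 with sharedfs-clone<br>
  pcs constraint colocation add cluster-vip with postgresql-12<br>
<br>
With that kind of layering, a dlm stuck waiting on fencing blocks the gfs2 mount, which in turn blocks the postgresql start, matching the timeouts in the attached log.<br>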
<br>
> > > >> > Details of syslog messages of node2 during this event are<br>
> > > attached in<br>
> > > >> file.<br>
> > > >> > 6. After this point we are in a state where node1 and node2<br>
> > > both go into a<br>
> > > >> > fenced state and resources are unrecoverable (all resources<br>
> > > on both<br>
> > > >> nodes).<br>
> > > >> ><br>
> > > >> > Now my question: the out-of-memory issue on node1 can be taken<br>
> > > care of by<br>
> > > >> > increasing swap and finding out the process responsible for<br>
> > > such huge<br>
> > > >> > memory usage and taking necessary actions to minimize that<br>
> > > memory usage,<br>
> > > >> > but the other issue that remains unclear is why the cluster did<br>
> > > not shift to<br>
> > > >> > node2 cleanly and why the resources became unrecoverable.<br>
> > > >><br>
-- <br>
Ken Gaillot <<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>><br>
<br>
_______________________________________________<br>
Manage your subscription:<br>
<a href="https://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
<br>
ClusterLabs home: <a href="https://www.clusterlabs.org/" rel="noreferrer" target="_blank">https://www.clusterlabs.org/</a><br>
</blockquote></div>