[ClusterLabs] Antw: [EXT] Resources too_active (active on all nodes of the cluster, instead of only 1 node)

Wed Mar 23 10:38:44 EDT 2022

Hi!

With these messages alone it's really hard to say, because you omitted the
messages logged before the split brain occurred.
If a resource was running on FILE-2 and FILE-1 recovered first, FILE-1 will
become DC and will start resources (even if those were running on FILE-2 before).
However, normal resources are gone when a node reboots. Maybe your resources
are special. Maybe the monitor is not correct.
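
To illustrate the monitor point: a probe that reports "running" on a freshly rebooted node makes the cluster believe the resource is active there as well. The following is only a minimal sketch of that reasoning, not Pacemaker's actual logic. The two exit codes are the standard OCF values; the helper names `monitor_exit_code` and `is_too_active` are hypothetical, and other OCF codes are ignored for brevity.

```python
from typing import Dict

# Standard OCF exit codes relevant to a monitor/probe action.
OCF_SUCCESS = 0       # resource is running on this node
OCF_NOT_RUNNING = 7   # resource is cleanly stopped on this node


def monitor_exit_code(is_running: bool) -> int:
    """Exit code a correct monitor should return (hypothetical helper)."""
    return OCF_SUCCESS if is_running else OCF_NOT_RUNNING


def is_too_active(probe_results: Dict[str, int]) -> bool:
    """True if probes report a single-instance resource on more than one node."""
    active = [node for node, rc in probe_results.items() if rc == OCF_SUCCESS]
    return len(active) > 1


# A monitor that wrongly returns 0 on the rebooted node yields "too active":
print(is_too_active({"FILE-1": OCF_SUCCESS, "FILE-2": OCF_SUCCESS}))
# A correct monitor on the rebooted node returns 7 instead:
print(is_too_active({"FILE-1": OCF_SUCCESS, "FILE-2": monitor_exit_code(False)}))
```

If the agents for the affected resources can return success while the resource is actually stopped, that would be one thing to check.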

We need more details.
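
As a rough aid for collecting those details, a small script could pull the fencing requests and the "active on N nodes" errors out of the logs of both nodes, so the full sequence can be posted. This is only a sketch: the regular expressions are modeled on the message formats quoted in this thread, and the function name `summarize` is made up for illustration.

```python
import re

# Patterns modeled on the log lines quoted in this thread (an assumption,
# other Pacemaker versions may word these messages differently).
FENCE_REQ = re.compile(
    r"Requesting that (\S+) perform '(\w+)' action targeting (\S+)"
)
TOO_ACTIVE = re.compile(r"Resource (\S+) is active on (\d+) nodes")


def summarize(lines):
    """Return (fencing requests, too-active resources) found in log lines."""
    fencing = []      # list of (executing node, action, target node)
    too_active = {}   # resource name -> number of nodes it was found on
    for line in lines:
        match = FENCE_REQ.search(line)
        if match:
            fencing.append(match.groups())
            continue
        match = TOO_ACTIVE.search(line)
        if match:
            too_active[match.group(1)] = int(match.group(2))
    return fencing, too_active


sample = [
    "FILE-1 pacemaker-fenced[12331]: notice: Requesting that FILE-1 "
    "perform 'off' action targeting FILE-2",
    "FILE-2 pacemaker-schedulerd[5018]: error: Resource IP_Floating "
    "is active on 2 nodes (attempting recovery)",
]
fencing, too_active = summarize(sample)
print(fencing)
print(too_active)
```

Running it over the complete logs from both nodes, including the time before the fencing, would show what happened between the fencing requests and the first "active on 2 nodes" error.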

Regards,
Ulrich

>>> "Balotra, Priyanka" <Priyanka.Balotra at Dell.com> wrote on 23.03.2022 at
06:30
in message
<MW4PR19MB549588C581D01842B480B9F5EA189 at MW4PR19MB5495.namprd19.prod.outlook.com>

> Hi All,
> 
> We have a scenario on a SLES 12 SP3 cluster.
> The scenario is explained below, in the order of events:
> 
>   *   There is a 2-node cluster (FILE-1, FILE-2).
>   *   The cluster and the resources were up and running fine initially.
>   *   Then a fencing request from Pacemaker was issued on both nodes
> simultaneously.
> 
> Logs from 1st node:
> 2022-02-22T03:26:36.737075+00:00 FILE-1 corosync[12304]: [TOTEM ] Failed to receive the leave message. failed: 2
> .
> .
> 2022-02-22T03:26:36.977888+00:00 FILE-1 pacemaker-fenced[12331]: notice: Requesting that FILE-1 perform 'off' action targeting FILE-2
> 
> Logs from 2nd node:
> 2022-02-22T03:26:36.738080+00:00 FILE-2 corosync[4989]: [TOTEM ] Failed to receive the leave message. failed: 1
> .
> .
> Feb 22 03:26:38 FILE-2 pacemaker-fenced [5015] (call_remote_stonith) notice: Requesting that FILE-2 perform 'off' action targeting FILE-1
> 
> 
>   *   When the nodes came up after unfencing, the DC got set after election
>   *   After that the resources which were expected to run on only one node 
> became active on both (all) nodes of the cluster.
> 
> 27290 2022-02-22T04:16:31.699186+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource stonith-sbd is active on 2 nodes (attempting recovery)
> 27291 2022-02-22T04:16:31.699397+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27292 2022-02-22T04:16:31.699590+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource FILE_Filesystem is active on 2 nodes (attempting recovery)
> 27293 2022-02-22T04:16:31.699731+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27294 2022-02-22T04:16:31.699878+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource IP_Floating is active on 2 nodes (attempting recovery)
> 27295 2022-02-22T04:16:31.700027+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27296 2022-02-22T04:16:31.700203+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgresql is active on 2 nodes (attempting recovery)
> 27297 2022-02-22T04:16:31.700354+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27298 2022-02-22T04:16:31.700501+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgrest is active on 2 nodes (attempting recovery)
> 27299 2022-02-22T04:16:31.700648+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27300 2022-02-22T04:16:31.700792+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_esm_primary is active on 2 nodes (attempting recovery)
> 27301 2022-02-22T04:16:31.700939+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27302 2022-02-22T04:16:31.701086+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Shared_Cluster_Backup is active on 2 nodes (attempting recovery)
> 
> 
> Can you please help us understand whether this is indeed a split-brain
> scenario? Under what circumstances can such a scenario be observed?
> It could have a very serious impact if such a case can recur in spite of
> stonith already being configured, hence the question.
> If this situation is reproduced, how can it be handled?
> 
> Note: We have stonith configured and it has been working fine so far. In 
> this case also, the initial fencing happened from stonith only.
> 
> Thanks in advance!