[ClusterLabs] Resources too_active (active on all nodes of the cluster, instead of only 1 node)
Priyanka Balotra
priyanka.14balotra at gmail.com
Wed Mar 23 01:49:39 EDT 2022
Hi All,
We have run into the following scenario on a SLES 12 SP3 cluster.
The events occurred in this order:
- There is a 2-node cluster (FILE-1, FILE-2)
- The cluster and the resources were initially up and running fine.
- Then fencing requests from pacemaker were issued on both nodes
simultaneously.
Logs from 1st node:
2022-02-22T03:26:36.737075+00:00 FILE-1 corosync[12304]: [TOTEM ] Failed to receive the leave message. failed: 2
.
.
2022-02-22T03:26:36.977888+00:00 FILE-1 pacemaker-fenced[12331]: notice: Requesting that FILE-1 perform 'off' action targeting FILE-2
Logs from 2nd node:
2022-02-22T03:26:36.738080+00:00 FILE-2 corosync[4989]: [TOTEM ] Failed to receive the leave message. failed: 1
.
.
Feb 22 03:26:38 FILE-2 pacemaker-fenced [5015] (call_remote_stonith) notice: Requesting that FILE-2 perform 'off' action targeting FILE-1
- When the nodes came back up after unfencing, the DC was set after an
election.
- After that, the resources that were expected to run on only one
node became active on both (all) nodes of the cluster:
27290 2022-02-22T04:16:31.699186+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource stonith-sbd is active on 2 nodes (attempting recovery)
27291 2022-02-22T04:16:31.699397+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27292 2022-02-22T04:16:31.699590+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource FILE_Filesystem is active on 2 nodes (attempting recovery)
27293 2022-02-22T04:16:31.699731+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27294 2022-02-22T04:16:31.699878+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource IP_Floating is active on 2 nodes (attempting recovery)
27295 2022-02-22T04:16:31.700027+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27296 2022-02-22T04:16:31.700203+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgresql is active on 2 nodes (attempting recovery)
27297 2022-02-22T04:16:31.700354+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27298 2022-02-22T04:16:31.700501+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgrest is active on 2 nodes (attempting recovery)
27299 2022-02-22T04:16:31.700648+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27300 2022-02-22T04:16:31.700792+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_esm_primary is active on 2 nodes (attempting recovery)
27301 2022-02-22T04:16:31.700939+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27302 2022-02-22T04:16:31.701086+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Shared_Cluster_Backup is active on 2 nodes (attempting recovery)
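For anyone triaging a similar log, the "is active on N nodes" errors quoted above can be collected with a short script. This is only a rough sketch: the function name `too_active_resources`, the `SAMPLE` list, and the regular expression are illustrative assumptions and not part of Pacemaker. It also assumes each log entry sits on a single line, as it would in the log file itself rather than wrapped by the mail client.

```python
import re

# Matches pacemaker-schedulerd errors of the form quoted in this mail, e.g.
# "... pacemaker-schedulerd[5018]: error: Resource IP_Floating is active on 2 nodes ..."
# The pattern is an assumption based only on the lines shown here.
TOO_ACTIVE = re.compile(
    r"pacemaker-schedulerd\[\d+\]:\s+error:\s+"
    r"Resource (?P<resource>\S+) is active on (?P<count>\d+) nodes"
)


def too_active_resources(lines):
    """Return {resource_name: node_count} for each matching error line."""
    found = {}
    for line in lines:
        match = TOO_ACTIVE.search(line)
        if match:
            found[match.group("resource")] = int(match.group("count"))
    return found


# Three entries copied from the log excerpt in this mail (unwrapped).
SAMPLE = [
    "27290 2022-02-22T04:16:31.699186+00:00 FILE-2 pacemaker-schedulerd[5018]: "
    "error: Resource stonith-sbd is active on 2 nodes (attempting recovery)",
    "27291 2022-02-22T04:16:31.699397+00:00 FILE-2 pacemaker-schedulerd[5018]: "
    "notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active "
    "for more information",
    "27294 2022-02-22T04:16:31.699878+00:00 FILE-2 pacemaker-schedulerd[5018]: "
    "error: Resource IP_Floating is active on 2 nodes (attempting recovery)",
]

if __name__ == "__main__":
    for name, count in too_active_resources(SAMPLE).items():
        print(f"{name}: active on {count} nodes")
```

The same function could be fed an open log file instead of `SAMPLE`, since it only iterates over lines. The "notice" lines that point to the FAQ are skipped because the pattern requires the "error:" severity.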
Could you please help us understand whether this is indeed a split-brain
scenario? Under what circumstances can such a scenario occur?
A recurrence could have a very serious impact for us, since this happened
in spite of stonith being configured. Hence the question.
If this situation is reproduced, how should it be handled?
Note: We have stonith configured, and it has been working fine so far. In
this case too, the initial fencing was performed by stonith.
Thanks in advance!
Priyanka