[ClusterLabs] Disable all resources in a group if one or more of them fail and are unable to reactivate

Wed Jan 27 13:25:16 EST 2021

Hi Andrei, thanks for your help.
If one of the resources in my group fails, or the primary node goes down
(acspcmk-02 in my case), the probe notices it and Pacemaker tries to
restart the whole resource group on the second node.
If the second node can't run one of the grouped resources, it tries to stop
them.

I have included my cluster status: my primary node (acspcmk-02) has failed
and the resource group tries to restart on acspcmk-01. I deliberately left
the resource "lta-subscription-backend-ope-s3" broken and, as you can see,
some of the grouped resources are still started.
I would like to know how to achieve the following: every resource in the
group must start properly, and if any of them cannot, the whole group is
stopped so that no services are left up and running.

2 nodes configured
28 resources configured

Online: [ acspcmk-01 ]
OFFLINE: [ acspcmk-02 ]

Full list of resources:

 Clone Set: lta-odata-frontend-ope-s1-clone [lta-odata-frontend-ope-s1]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Clone Set: lta-odata-frontend-ope-s2-clone [lta-odata-frontend-ope-s2]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Clone Set: lta-odata-frontend-ope-s3-clone [lta-odata-frontend-ope-s3]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Clone Set: s1ltaestimationtime-clone [s1ltaestimationtime]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Clone Set: s2ltaestimationtime-clone [s2ltaestimationtime]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Clone Set: s3ltaestimationtime-clone [s3ltaestimationtime]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Clone Set: openresty-clone [openresty]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Resource Group: LTA_SINGLE_RESOURCES
     VIP        (ocf::heartbeat:IPaddr2):       Started acspcmk-01
     lta-subscription-backend-ope-s1    (systemd:lta-subscription-backend-ope-s1):      Started acspcmk-01
     lta-subscription-backend-ope-s2    (systemd:lta-subscription-backend-ope-s2):      Started acspcmk-01
     lta-subscription-backend-ope-s3    (systemd:lta-subscription-backend-ope-s3):      Stopped
     s1ltaquotaservice  (systemd:s1ltaquotaservice):    Stopped
     s2ltaquotaservice  (systemd:s2ltaquotaservice):    Stopped
     s3ltaquotaservice  (systemd:s3ltaquotaservice):    Stopped
     s1ltarolling       (systemd:s1ltarolling): Stopped
     s2ltarolling       (systemd:s2ltarolling): Stopped
     s3ltarolling       (systemd:s3ltarolling): Stopped
     s1srvnotificationdispatcher        (systemd:s1srvnotificationdispatcher):  Stopped
     s2srvnotificationdispatcher        (systemd:s2srvnotificationdispatcher):  Stopped
     s3srvnotificationdispatcher        (systemd:s3srvnotificationdispatcher):  Stopped

Failed Resource Actions:
* lta-subscription-backend-ope-s3_start_0 on acspcmk-01 'unknown error' (1): call=466, status=complete, exitreason='',
    last-rc-change='Wed Jan 27 13:00:21 2021', queued=0ms, exec=2128ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
  sbd: active/enabled
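
To make the state above concrete, here is a minimal Python sketch of the behaviour I am seeing versus the behaviour I would like. It is only a toy model and not Pacemaker code: it assumes group members are started in listed order and that nothing after the first failed member is started, which matches the status output above. The function names start_group and all_or_nothing are made up for illustration, and the member list is shortened.

```python
from typing import Callable, Dict, List


def start_group(members: List[str], can_start: Callable[[str], bool]) -> Dict[str, str]:
    """Toy model: start members in order; after the first failure nothing else starts."""
    state: Dict[str, str] = {}
    blocked = False
    for name in members:
        if not blocked and can_start(name):
            state[name] = "Started"
        else:
            # Either this member failed to start, or it comes after one that failed.
            state[name] = "Stopped"
            blocked = True
    return state


def all_or_nothing(state: Dict[str, str]) -> Dict[str, str]:
    """Desired outcome (hypothetical): if any member is stopped, stop every member."""
    if all(status == "Started" for status in state.values()):
        return dict(state)
    return {name: "Stopped" for name in state}


# Shortened member list taken from the LTA_SINGLE_RESOURCES group above.
members = [
    "VIP",
    "lta-subscription-backend-ope-s1",
    "lta-subscription-backend-ope-s2",
    "lta-subscription-backend-ope-s3",  # the one I keep broken on purpose
    "s1ltaquotaservice",
]

observed = start_group(members, lambda name: name != "lta-subscription-backend-ope-s3")
wanted = all_or_nothing(observed)

for name in members:
    print(f"{name:35} observed={observed[name]:8} wanted={wanted[name]}")
```

In this model the first three members stay "Started" in the observed state, while the wanted state has every member "Stopped".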

I hope I have explained my problem as clearly as I can.

Thanks for your time and help.

Have a good evening,

Damiano

On Wed, 27 Jan 2021 at 19:03, Andrei Borzenkov <
arvidjaar at gmail.com> wrote:

> 27.01.2021 19:06, damiano giuliani wrote:
> > Hi all, I'm pretty new to clusters and I'm struggling to configure a
> > bunch of resources and test how they fail over. I need to start and
> > manage a group of resources as one (a resource group has been created
> > to achieve this). If one of them can't run and keeps failing, the
> > cluster should try to restart the resource group on the secondary
> > node; if it can't run all the resources together there, it should
> > disable the whole resource group.
> > I would like to know if there is a way to make the cluster disable all
> > the resources of the group (or the group itself) if it can't run all
> > the resources somewhere.
> >
>
> That's what a Pacemaker group does. I am not sure what you mean by
> "disable all resources". If a resource's fail count on a node exceeds
> the threshold, that node is banned from running the resource. If the
> resource has failed on every node, no node can run it until you clear
> the fail count.
>
> "Disable resource" in Pacemaker would mean setting its target-role to
> Stopped. That does not happen automatically (at least I am not aware of
> it).
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>