[ClusterLabs] Disable all resources in a group if one or more of them fail and are unable to reactivate

Ken Gaillot kgaillot at redhat.com
Wed Jan 27 14:03:03 EST 2021


On Wed, 2021-01-27 at 19:25 +0100, damiano giuliani wrote:
> Hi Andrei, thanks for your help.
> If one of the resources in the group fails, or the primary node goes
> down (in my case acspcmk-02), the probe notices it and Pacemaker
> tries to restart the whole resource group on the second node.
> If the second node can't run one of the grouped resources, it tries
> to stop them.
> 
> 
> I attached my cluster status: the primary node (acspcmk-02) fails
> and the resource group tries to restart on acspcmk-01. I broke the
> resource "lta-subscription-backend-ope-s3" on purpose, and as you
> can see some grouped resources are still started.
> I would like to know how to achieve a condition where the resource
> group must start completely, every resource included, and otherwise
> stop the whole group rather than leaving some services up and
> running.

With a group, later members depend on earlier members. If an earlier
member can't run, then no members after it can run.
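
For example, a group created like this (just a sketch with made-up
resource names, assuming pcs) is equivalent to ordering and colocation
constraints between consecutive members:

    # "backend" starts only after "vip", and "frontend" only after
    # "backend", all on the same node; if "backend" can't start,
    # "frontend" stays stopped, but "vip" keeps running.
    pcs resource group add examplegroup vip backend frontend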

However, we can't make the dependency go in both directions. If an
earlier member can't run unless a later member is active, and vice
versa, then how could anything ever be started?

By default, Pacemaker tries to recover a failed resource on the same
node, up to its migration-threshold (which defaults to one million
failures). Once a group member reaches its migration-threshold,
Pacemaker will move the entire group to another node if one is
available. However, if no node is available for the failed member, it
will simply remain stopped (along with any later members in the
group), and the earlier members will stay active where they are.
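
For example (a sketch using pcs and the resource name from your status
output), you could lower the threshold so the move happens after a few
failures instead of a million:

    # After 3 failed starts/monitors of this member on a node, that
    # node is banned for it and Pacemaker tries to move the group to
    # another available node.
    pcs resource meta lta-subscription-backend-ope-s3 migration-threshold=3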

I don't think there's any way to prevent earlier members from running
if a later member has no available node.
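
The closest workaround is to disable the group manually (or from an
external script) once you see the failure. As Andrei notes below,
disabling a resource just sets its target-role to Stopped; a sketch
with pcs:

    # Stop every member of the group (sets target-role=Stopped on it)
    pcs resource disable LTA_SINGLE_RESOURCES

    # Once the broken service is fixed, clear the fail count and
    # start the group again
    pcs resource cleanup lta-subscription-backend-ope-s3
    pcs resource enable LTA_SINGLE_RESOURCES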

> 2 nodes configured
> 28 resources configured
> 
> Online: [ acspcmk-01 ]
> OFFLINE: [ acspcmk-02 ]
> 
> Full list of resources:
> 
>  Clone Set: lta-odata-frontend-ope-s1-clone [lta-odata-frontend-ope-s1]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Clone Set: lta-odata-frontend-ope-s2-clone [lta-odata-frontend-ope-s2]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Clone Set: lta-odata-frontend-ope-s3-clone [lta-odata-frontend-ope-s3]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Clone Set: s1ltaestimationtime-clone [s1ltaestimationtime]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Clone Set: s2ltaestimationtime-clone [s2ltaestimationtime]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Clone Set: s3ltaestimationtime-clone [s3ltaestimationtime]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Clone Set: openresty-clone [openresty]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Resource Group: LTA_SINGLE_RESOURCES
>      VIP        (ocf::heartbeat:IPaddr2):       Started acspcmk-01
>      lta-subscription-backend-ope-s1    (systemd:lta-subscription-backend-ope-s1):      Started acspcmk-01
>      lta-subscription-backend-ope-s2    (systemd:lta-subscription-backend-ope-s2):      Started acspcmk-01
>      lta-subscription-backend-ope-s3    (systemd:lta-subscription-backend-ope-s3):      Stopped
>      s1ltaquotaservice  (systemd:s1ltaquotaservice):    Stopped
>      s2ltaquotaservice  (systemd:s2ltaquotaservice):    Stopped
>      s3ltaquotaservice  (systemd:s3ltaquotaservice):    Stopped
>      s1ltarolling       (systemd:s1ltarolling): Stopped
>      s2ltarolling       (systemd:s2ltarolling): Stopped
>      s3ltarolling       (systemd:s3ltarolling): Stopped
>      s1srvnotificationdispatcher        (systemd:s1srvnotificationdispatcher):  Stopped
>      s2srvnotificationdispatcher        (systemd:s2srvnotificationdispatcher):  Stopped
>      s3srvnotificationdispatcher        (systemd:s3srvnotificationdispatcher):  Stopped
> 
> Failed Resource Actions:
> * lta-subscription-backend-ope-s3_start_0 on acspcmk-01 'unknown error' (1):
>     call=466, status=complete, exitreason='',
>     last-rc-change='Wed Jan 27 13:00:21 2021', queued=0ms, exec=2128ms
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
>   sbd: active/enabled
> 
> 
> I hope I explained my problem as well as I could.
> 
> Thanks for your time and help.
> 
> Good Evening
> 
> Damiano  
> 
> On Wed, 27 Jan 2021 at 19:03, Andrei Borzenkov <
> arvidjaar at gmail.com> wrote:
> > On 27.01.2021 19:06, damiano giuliani wrote:
> > > Hi all, I'm pretty new to clusters and I'm struggling to
> > > configure a bunch of resources and test how they fail over. My
> > > need is to start and manage a group of resources as one (to
> > > achieve this, a resource group has been created), and if one of
> > > them can't run and keeps failing, the cluster should try to
> > > restart the resource group on the secondary node; if it can't
> > > run all the resources together there, all the resources in the
> > > group should be disabled.
> > > I would like to know if there is a way to set the cluster to
> > > disable all the resources of the group (or the group itself) if
> > > it can't run all of the resources somewhere.
> > > 
> > 
> > That's what a pacemaker group does. I am not sure what you mean by
> > "disable all resources". If a resource's fail count on a node
> > exceeds the threshold, that node is banned from running the
> > resource. If the resource has failed on every node, no node can
> > run it until you clear the fail count.
> > 
> > "Disable resource" in pacemaker would mean setting its target-role
> > to Stopped. That does not happen automatically (at least I am not
> > aware of it).
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot <kgaillot at redhat.com>


