[ClusterLabs] Antw: [EXT] Re: Disable all resources in a group if one or more of them fail and are unable to reactivate

Ken Gaillot kgaillot at redhat.com
Thu Jan 28 13:31:28 EST 2021


I've opened a feature request for this:

https://bugs.clusterlabs.org/show_bug.cgi?id=5465

Realistically, developer time is tight for the foreseeable future, so
it's more a wish-list item unless someone volunteers to work on it.
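
Until something like that exists, this would have to be handled outside
Pacemaker. As a rough sketch only: the function name and the dictionary of
member states below are hypothetical, made up for illustration, and are not
part of Pacemaker or pcs. The only real command assumed is
"pcs resource disable <id>", which sets target-role to Stopped. An external
check could decide when to disable the group like this:

```python
from typing import Dict, List, Optional


def group_disable_command(group: str,
                          member_roles: Dict[str, str]) -> Optional[List[str]]:
    """Return a pcs command that disables `group` when it is partially started.

    `member_roles` maps each group member to its current role as shown in
    the status output ("Started" or "Stopped"). How that mapping is
    collected is left open here; it is an assumption of this sketch.
    """
    started = [name for name, role in member_roles.items() if role == "Started"]
    not_started = [name for name, role in member_roles.items() if role != "Started"]

    # Only a mixed state (some members up, some down) is the case from this
    # thread: earlier members stay active while a later member cannot start.
    if started and not_started:
        return ["pcs", "resource", "disable", group]
    return None


if __name__ == "__main__":
    # Member states taken from the status output quoted below (abridged).
    roles = {
        "VIP": "Started",
        "lta-subscription-backend-ope-s1": "Started",
        "lta-subscription-backend-ope-s2": "Started",
        "lta-subscription-backend-ope-s3": "Stopped",
        "s1ltaquotaservice": "Stopped",
    }
    print(group_disable_command("LTA_SINGLE_RESOURCES", roles))
```

A group that is still in the middle of starting also looks partially
started, so a real script would need to check that the failed member has
actually run out of nodes before acting. A disabled group also stays
stopped until someone re-enables it by hand.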

On Thu, 2021-01-28 at 17:42 +0100, damiano giuliani wrote:
> Hi Ulrich, thanks for the answer.
> As Ken explained to me, there isn't any way to prevent earlier members
> from running if a later member has no available node. If no node is
> available for the failed member, it will just remain stopped, and the
> earlier members will stay active where they are.
> I really hoped there was a solution or workaround for this, but as Ken
> clarified, Pacemaker can't handle this case.
> 
> Many thanks for your quick and effective support.
> 
> Have a good evening!
> 
> Damiano
> 
> 
> On Thu, 28 Jan 2021 at 11:15, Ulrich Windl <
> Ulrich.Windl at rz.uni-regensburg.de> wrote:
> > >>> damiano giuliani <damianogiuliani87 at gmail.com> wrote on
> > 27.01.2021 at 19:25 in message
> > <CAG=zYNOx-R=wKbhtm=4N7qaoYKE=ofORVQ7jA0jr17oYjgqOhQ at mail.gmail.com>:
> > > Hi Andrei, thanks for your help.
> > > If one of the resources in my group fails, or the primary node goes
> > > down (in my case acspcmk-02), the probe notices it and Pacemaker tries
> > > to restart the whole resource group on the second node.
> > > If the second node can't run one of my grouped resources, it tries to
> > > stop them.
> > 
> > And what exactly do you want? The behavior described is how the
> > cluster handles it normally.
> > 
> > > 
> > > 
> > > I attached my cluster status. My primary node (acspcmk-02) fails and
> > > the resource group tries to restart on acspcmk-01. I kept the resource
> > > "lta-subscription-backend-ope-s3" broken on purpose, and as you can see
> > > some grouped resources are still started.
> > > I would like to know how to achieve a condition where the resource
> > > group must start properly for every resource, and otherwise the whole
> > > group is stopped, without some services left up and running.
> > > 
> > > 
> > > 2 nodes configured
> > > 28 resources configured
> > > 
> > > Online: [ acspcmk-01 ]
> > > OFFLINE: [ acspcmk-02 ]
> > > 
> > > Full list of resources:
> > > 
> > >  Clone Set: lta-odata-frontend-ope-s1-clone [lta-odata-frontend-ope-s1]
> > >      Started: [ acspcmk-01 ]
> > >      Stopped: [ acspcmk-02 ]
> > >  Clone Set: lta-odata-frontend-ope-s2-clone [lta-odata-frontend-ope-s2]
> > >      Started: [ acspcmk-01 ]
> > >      Stopped: [ acspcmk-02 ]
> > >  Clone Set: lta-odata-frontend-ope-s3-clone [lta-odata-frontend-ope-s3]
> > >      Started: [ acspcmk-01 ]
> > >      Stopped: [ acspcmk-02 ]
> > >  Clone Set: s1ltaestimationtime-clone [s1ltaestimationtime]
> > >      Started: [ acspcmk-01 ]
> > >      Stopped: [ acspcmk-02 ]
> > >  Clone Set: s2ltaestimationtime-clone [s2ltaestimationtime]
> > >      Started: [ acspcmk-01 ]
> > >      Stopped: [ acspcmk-02 ]
> > >  Clone Set: s3ltaestimationtime-clone [s3ltaestimationtime]
> > >      Started: [ acspcmk-01 ]
> > >      Stopped: [ acspcmk-02 ]
> > >  Clone Set: openresty-clone [openresty]
> > >      Started: [ acspcmk-01 ]
> > >      Stopped: [ acspcmk-02 ]
> > >  Resource Group: LTA_SINGLE_RESOURCES
> > >      VIP        (ocf::heartbeat:IPaddr2):       Started acspcmk-01
> > >      lta-subscription-backend-ope-s1  (systemd:lta-subscription-backend-ope-s1):      Started acspcmk-01
> > >      lta-subscription-backend-ope-s2  (systemd:lta-subscription-backend-ope-s2):      Started acspcmk-01
> > >      lta-subscription-backend-ope-s3  (systemd:lta-subscription-backend-ope-s3):      Stopped
> > >      s1ltaquotaservice  (systemd:s1ltaquotaservice):    Stopped
> > >      s2ltaquotaservice  (systemd:s2ltaquotaservice):    Stopped
> > >      s3ltaquotaservice  (systemd:s3ltaquotaservice):    Stopped
> > >      s1ltarolling       (systemd:s1ltarolling): Stopped
> > >      s2ltarolling       (systemd:s2ltarolling): Stopped
> > >      s3ltarolling       (systemd:s3ltarolling): Stopped
> > >      s1srvnotificationdispatcher  (systemd:s1srvnotificationdispatcher):  Stopped
> > >      s2srvnotificationdispatcher  (systemd:s2srvnotificationdispatcher):  Stopped
> > >      s3srvnotificationdispatcher  (systemd:s3srvnotificationdispatcher):  Stopped
> > >
> > > Failed Resource Actions:
> > > * lta-subscription-backend-ope-s3_start_0 on acspcmk-01 'unknown error' (1): call=466, status=complete, exitreason='',
> > >     last-rc-change='Wed Jan 27 13:00:21 2021', queued=0ms, exec=2128ms
> > > 
> > > Daemon Status:
> > >   corosync: active/disabled
> > >   pacemaker: active/disabled
> > >   pcsd: active/enabled
> > >   sbd: active/enabled
> > > 
> > > 
> > > I hope I explained my problem as best I could.
> > > 
> > > Thanks for your time and help.
> > > 
> > > Good Evening
> > > 
> > > Damiano
> > > 
> > > On Wed, 27 Jan 2021 at 19:03, Andrei Borzenkov <
> > > arvidjaar at gmail.com> wrote:
> > > 
> > >> 27.01.2021 19:06, damiano giuliani wrote:
> > >> > Hi all, I'm pretty new to clusters, and I'm struggling to configure a
> > >> > bunch of resources and test how they fail over. My need is to start and
> > >> > manage a group of resources as one (to achieve this, a resource group
> > >> > has been created). If one of them can't run and keeps failing, the
> > >> > cluster will try to restart the resource group on the secondary node;
> > >> > if it can't run all the resources together there, it should disable
> > >> > the whole resource group.
> > >> > I would like to know if there is a way to make the cluster disable all
> > >> > the resources of the group (or the group itself) if all of the
> > >> > resources can't be run somewhere.
> > >> >
> > >>
> > >> That's what a pacemaker group does. I am not sure what you mean by
> > >> "disable all resources". If the resource fail count on a node exceeds
> > >> the threshold, this node is banned from running the resource. If the
> > >> resource failed on every node, no node can run it until you clear the
> > >> fail count.
> > >>
> > >> "Disable resource" in pacemaker would mean setting its target-role to
> > >> Stopped. That does not happen automatically (at least I am not aware of
> > >> it).
> > >> _______________________________________________
> > >> Manage your subscription:
> > >> https://lists.clusterlabs.org/mailman/listinfo/users 
> > >>
> > >> ClusterLabs home: https://www.clusterlabs.org/ 
> > >>
> > 
> > 
> > 
> 
-- 
Ken Gaillot <kgaillot at redhat.com>


