[ClusterLabs] Antw: Re: Antw: [EXT] Re: Disable all resources in a group if one or more of them fail and are unable to reactivate

Fri Jan 29 05:18:39 EST 2021

>>> damiano giuliani <damianogiuliani87 at gmail.com> wrote on 28.01.2021 at
17:42 in message
<CAG=zYNNcso+nhWsEbvJaqKd5cMyUoJg0Cgtc9KvpuJJ_g9-T6w at mail.gmail.com>:
> Hi Ulrich, thanks for the answer.
> As Ken explained to me, there isn't any way to prevent earlier members from
> running if a later member has no available node.
> If no node is available for the failed member, it will just remain
> stopped, and the earlier members will stay active where they are.
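
To make the behaviour described above concrete, here is a minimal sketch in
Python. It is only a toy model of group ordering, not Pacemaker code: the
function name start_group and the can_start map are made up for illustration,
and the member names are taken from the status output quoted further down.

```python
from typing import Dict, List


def start_group(members: List[str], can_start: Dict[str, bool]) -> Dict[str, str]:
    """Toy model of a group: members start in listed order, and a member
    is only started once every earlier member is running."""
    state: Dict[str, str] = {}
    blocked = False
    for name in members:
        if not blocked and can_start.get(name, True):
            state[name] = "Started"
        else:
            # The first member that cannot start blocks all later members,
            # but members that are already running are left alone.
            state[name] = "Stopped"
            blocked = True
    return state


members = [
    "VIP",
    "lta-subscription-backend-ope-s1",
    "lta-subscription-backend-ope-s2",
    "lta-subscription-backend-ope-s3",
    "s1ltaquotaservice",
]
# Assumption for the example: only the s3 backend is unable to start.
state = start_group(members, {"lta-subscription-backend-ope-s3": False})
for name in members:
    print(f"{name}: {state[name]}")
```

With lta-subscription-backend-ope-s3 unable to start, the model leaves the VIP
and the two earlier backends started and everything from the failed member
onwards stopped, which matches the quoted status.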

When I started working with the cluster, I ran into such situations too, but we
have not seen them for a long time. So my guess is that the real problem is some
kind of configuration error ;-)

Can you give a specific example?

Regards,
Ulrich

> I really hoped there was a solution or workaround for this, but as Ken
> clarified, Pacemaker can't handle these exceptions.
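
For what it is worth, "disabling" a resource means setting its target-role
meta attribute to Stopped, which pcs exposes as "pcs resource disable". Since
the cluster does not do that by itself for a half-started group, an external
check could. The sketch below is only an assumption of how such a check might
look: the function name disable_command and the state map are hypothetical, it
only builds the command and does not run it, and how the state is gathered
(for example by parsing crm_mon output) is left out.

```python
from typing import Dict, List, Optional


def disable_command(group: str, state: Dict[str, str]) -> Optional[List[str]]:
    """Return the command that disables `group` if it is only partly running.

    `state` maps each group member to "Started" or "Stopped" (hypothetical
    input; gathering it from the cluster is not shown here).
    """
    started = [name for name, status in state.items() if status == "Started"]
    not_started = [name for name, status in state.items() if status != "Started"]
    if started and not_started:
        # Some members run while others do not: ask for the whole group to be
        # stopped, which sets target-role=Stopped on it via pcs.
        return ["pcs", "resource", "disable", group]
    # Fully started or fully stopped: nothing to do.
    return None


state = {
    "VIP": "Started",
    "lta-subscription-backend-ope-s1": "Started",
    "lta-subscription-backend-ope-s3": "Stopped",
}
cmd = disable_command("LTA_SINGLE_RESOURCES", state)
print(" ".join(cmd) if cmd else "nothing to do")
```

A disabled group stays stopped until someone runs "pcs resource enable" for it
again. A real check would also have to tolerate the short window during a
normal start in which only some members are up.
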
> 
> Many thanks for your quick and effective support.
> 
> Have a good evening!
> 
> Damiano
> 
> 
> On Thu, 28 Jan 2021 at 11:15, Ulrich Windl <
> Ulrich.Windl at rz.uni-regensburg.de> wrote:
> 
>> >>> damiano giuliani <damianogiuliani87 at gmail.com> wrote on 27.01.2021
>> at 19:25 in message
>> <CAG=zYNOx-R=wKbhtm=4N7qaoYKE=ofORVQ7jA0jr17oYjgqOhQ at mail.gmail.com>:
>> > Hi Andrei, thanks for your help.
>> > If one of the resources in my group fails, or the primary node goes down
>> > (in my case acspcmk-02), the probe notices it and Pacemaker tries to
>> > restart the whole resource group on the second node.
>> > If the second node can't run one of my grouped resources, it tries to
>> > stop them.
>>
>> And what exactly do you want? The behavior described is how the cluster
>> handles it normally.
>>
>> >
>> >
>> > I attached my cluster status. My primary node (acspcmk-02) fails and the
>> > resource group tries to restart on acspcmk-01. I kept the resource
>> > "lta-subscription-backend-ope-s3" broken on purpose and, as you can see,
>> > some grouped resources are still started.
>> > I would like to know how to achieve a condition where every resource in
>> > the group must start properly; if not, the whole group is stopped, with
>> > no services left up and running.
>> >
>> >
>> > 2 nodes configured
>> > 28 resources configured
>> >
>> > Online: [ acspcmk-01 ]
>> > OFFLINE: [ acspcmk-02 ]
>> >
>> > Full list of resources:
>> >
>> >  Clone Set: lta-odata-frontend-ope-s1-clone [lta-odata-frontend-ope-s1]
>> >      Started: [ acspcmk-01 ]
>> >      Stopped: [ acspcmk-02 ]
>> >  Clone Set: lta-odata-frontend-ope-s2-clone [lta-odata-frontend-ope-s2]
>> >      Started: [ acspcmk-01 ]
>> >      Stopped: [ acspcmk-02 ]
>> >  Clone Set: lta-odata-frontend-ope-s3-clone [lta-odata-frontend-ope-s3]
>> >      Started: [ acspcmk-01 ]
>> >      Stopped: [ acspcmk-02 ]
>> >  Clone Set: s1ltaestimationtime-clone [s1ltaestimationtime]
>> >      Started: [ acspcmk-01 ]
>> >      Stopped: [ acspcmk-02 ]
>> >  Clone Set: s2ltaestimationtime-clone [s2ltaestimationtime]
>> >      Started: [ acspcmk-01 ]
>> >      Stopped: [ acspcmk-02 ]
>> >  Clone Set: s3ltaestimationtime-clone [s3ltaestimationtime]
>> >      Started: [ acspcmk-01 ]
>> >      Stopped: [ acspcmk-02 ]
>> >  Clone Set: openresty-clone [openresty]
>> >      Started: [ acspcmk-01 ]
>> >      Stopped: [ acspcmk-02 ]
>> >  Resource Group: LTA_SINGLE_RESOURCES
>> >      VIP        (ocf::heartbeat:IPaddr2):       Started acspcmk-01
>> >      lta-subscription-backend-ope-s1  (systemd:lta-subscription-backend-ope-s1):      Started acspcmk-01
>> >      lta-subscription-backend-ope-s2  (systemd:lta-subscription-backend-ope-s2):      Started acspcmk-01
>> >      lta-subscription-backend-ope-s3  (systemd:lta-subscription-backend-ope-s3):      Stopped
>> >      s1ltaquotaservice  (systemd:s1ltaquotaservice):    Stopped
>> >      s2ltaquotaservice  (systemd:s2ltaquotaservice):    Stopped
>> >      s3ltaquotaservice  (systemd:s3ltaquotaservice):    Stopped
>> >      s1ltarolling       (systemd:s1ltarolling): Stopped
>> >      s2ltarolling       (systemd:s2ltarolling): Stopped
>> >      s3ltarolling       (systemd:s3ltarolling): Stopped
>> >      s1srvnotificationdispatcher  (systemd:s1srvnotificationdispatcher):  Stopped
>> >      s2srvnotificationdispatcher  (systemd:s2srvnotificationdispatcher):  Stopped
>> >      s3srvnotificationdispatcher  (systemd:s3srvnotificationdispatcher):  Stopped
>> >
>> > Failed Resource Actions:
>> > * lta-subscription-backend-ope-s3_start_0 on acspcmk-01 'unknown error' (1): call=466, status=complete, exitreason='',
>> >     last-rc-change='Wed Jan 27 13:00:21 2021', queued=0ms, exec=2128ms
>> >
>> > Daemon Status:
>> >   corosync: active/disabled
>> >   pacemaker: active/disabled
>> >   pcsd: active/enabled
>> >   sbd: active/enabled
>> >
>> >
>> >   I hope I explained my problem as best I could.
>> >
>> > Thanks for your time and help.
>> >
>> > Good Evening
>> >
>> > Damiano
>> >
>> > On Wed, 27 Jan 2021 at 19:03, Andrei Borzenkov <
>> > arvidjaar at gmail.com> wrote:
>> >
>> >> 27.01.2021 19:06, damiano giuliani wrote:
>> >> > Hi all, I'm pretty new to clusters and I'm struggling to configure a
>> >> > bunch of resources and test how they fail over. My need is to start and
>> >> > manage a group of resources as one (to achieve this, a resource group
>> >> > has been created). If one of them can't run and keeps failing, the
>> >> > cluster should try to restart the resource group on the secondary node;
>> >> > if it can't run all the resources together there, it should disable the
>> >> > whole resource group.
>> >> > I would like to know if there is a way to set the cluster to disable
>> >> > all the resources of the group (or the group itself) if it can't run
>> >> > all the resources somewhere.
>> >> >
>> >>
>> >> That's what a Pacemaker group does. I am not sure what you mean by
>> >> "disable all resources". If the resource fail count on a node exceeds the
>> >> threshold, this node is banned from running the resource. If the resource
>> >> failed on every node, no node can run it until you clear the fail count.
>> >>
>> >> "Disable resource" in pacemaker would mean setting its target-role to
>> >> stopped. That does not happen automatically (at least I am not aware of
>> >> it).
>> >> _______________________________________________
>> >> Manage your subscription:
>> >> https://lists.clusterlabs.org/mailman/listinfo/users 
>> >>
>> >> ClusterLabs home: https://www.clusterlabs.org/ 
>> >>
>>
>>
>>