[ClusterLabs] Antw: [EXT] Re: Disable all resources in a group if one or more of them fail and are unable to reactivate

Thu Jan 28 05:15:26 EST 2021

>>> damiano giuliani <damianogiuliani87 at gmail.com> wrote on 27.01.2021 at 19:25
in message <CAG=zYNOx-R=wKbhtm=4N7qaoYKE=ofORVQ7jA0jr17oYjgqOhQ at mail.gmail.com>:
> Hi Andrei, thanks for your help.
> If one of the resources in the group fails, or the primary node goes down
> (in my case acspcmk-02), the probe notices it and Pacemaker tries to
> restart the whole resource group on the second node.
> If the second node can't run one of my grouped resources, it tries to stop
> them.

And what exactly do you want? The behavior you describe is how the cluster
normally handles it.
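
For readers following along, here is a minimal Python sketch of that normal group behavior. It is a toy model, not Pacemaker code: `group_states` is a hypothetical helper, and the member list is a shortened copy of the group in the status quoted below. It assumes the documented group semantics: members start in the listed order, and a member that cannot start blocks only the members listed after it, so the earlier ones stay started.

```python
def group_states(members, cannot_start):
    """Model group start-up: members start in order; the first member that
    cannot start stays Stopped and blocks every member listed after it."""
    states = {}
    blocked = False
    for name in members:
        if blocked or name in cannot_start:
            states[name] = "Stopped"
            blocked = True
        else:
            states[name] = "Started"
    return states


# Shortened member list, in the same order as the quoted status output.
members = [
    "VIP",
    "lta-subscription-backend-ope-s1",
    "lta-subscription-backend-ope-s2",
    "lta-subscription-backend-ope-s3",
    "s1ltaquotaservice",
]
states = group_states(members, cannot_start={"lta-subscription-backend-ope-s3"})
for name, state in states.items():
    print(f"{name}: {state}")
```

Under these assumptions the model reproduces the pattern in the status: everything before the broken member is started, the broken member and everything after it is stopped.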

> 
> 
> I attached my cluster status: my primary node (acspcmk-02) fails and the
> resource group tries to restart on acspcmk-01. I kept the resource
> "lta-subscription-backend-ope-s3" broken on purpose and, as you can see,
> some grouped resources are still started.
> I would like to know how to achieve a condition where the resource group must
> start properly for every resource; if it cannot, the whole group should be
> stopped so that no services are left up and running.
> 
> 
> 2 nodes configured
> 28 resources configured
> 
> Online: [ acspcmk-01 ]
> OFFLINE: [ acspcmk-02 ]
> 
> Full list of resources:
> 
>  Clone Set: lta-odata-frontend-ope-s1-clone [lta-odata-frontend-ope-s1]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Clone Set: lta-odata-frontend-ope-s2-clone [lta-odata-frontend-ope-s2]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Clone Set: lta-odata-frontend-ope-s3-clone [lta-odata-frontend-ope-s3]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Clone Set: s1ltaestimationtime-clone [s1ltaestimationtime]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Clone Set: s2ltaestimationtime-clone [s2ltaestimationtime]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Clone Set: s3ltaestimationtime-clone [s3ltaestimationtime]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Clone Set: openresty-clone [openresty]
>      Started: [ acspcmk-01 ]
>      Stopped: [ acspcmk-02 ]
>  Resource Group: LTA_SINGLE_RESOURCES
>      VIP        (ocf::heartbeat:IPaddr2):       Started acspcmk-01
>      lta-subscription-backend-ope-s1    (systemd:lta-subscription-backend-ope-s1):      Started acspcmk-01
>      lta-subscription-backend-ope-s2    (systemd:lta-subscription-backend-ope-s2):      Started acspcmk-01
>      lta-subscription-backend-ope-s3    (systemd:lta-subscription-backend-ope-s3):      Stopped
>      s1ltaquotaservice  (systemd:s1ltaquotaservice):    Stopped
>      s2ltaquotaservice  (systemd:s2ltaquotaservice):    Stopped
>      s3ltaquotaservice  (systemd:s3ltaquotaservice):    Stopped
>      s1ltarolling       (systemd:s1ltarolling): Stopped
>      s2ltarolling       (systemd:s2ltarolling): Stopped
>      s3ltarolling       (systemd:s3ltarolling): Stopped
>      s1srvnotificationdispatcher        (systemd:s1srvnotificationdispatcher):  Stopped
>      s2srvnotificationdispatcher        (systemd:s2srvnotificationdispatcher):  Stopped
>      s3srvnotificationdispatcher        (systemd:s3srvnotificationdispatcher):  Stopped
> 
> Failed Resource Actions:
> * lta-subscription-backend-ope-s3_start_0 on acspcmk-01 'unknown error' (1): call=466, status=complete, exitreason='',
>     last-rc-change='Wed Jan 27 13:00:21 2021', queued=0ms, exec=2128ms
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
>   sbd: active/enabled
> 
> 
> I hope I explained my problem as best I could.
> 
> Thanks for your time and help.
> 
> Good Evening
> 
> Damiano
> 
> On Wed, 27 Jan 2021 at 19:03, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> 
>> 27.01.2021 19:06, damiano giuliani wrote:
>> > Hi all, I'm pretty new to clusters and I'm struggling to configure a
>> > bunch of resources and test how they fail over. My need is to start and
>> > manage a group of resources as one (to achieve this, a resource group
>> > has been created). If one of them can't run and keeps failing, the
>> > cluster should try to restart the resource group on the secondary node;
>> > if it can't run all the resources together there, it should disable the
>> > whole resource group.
>> > I would like to know if there is a way to set up the cluster to disable
>> > all the resources of the group (or the group itself) if all the
>> > resources can't be run together somewhere.
>> >
>>
>> That's what a Pacemaker group does. I am not sure what you mean by
>> "disable all resources". If a resource's fail count on a node exceeds the
>> threshold, that node is banned from running the resource. If the resource
>> has failed on every node, no node can run it until you clear the fail count.
>>
>> "Disable resource" in Pacemaker would mean setting its target-role to
>> Stopped. That does not happen automatically (at least I am not aware of
>> it).
>>
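
The fail-count behavior Andrei describes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Pacemaker's implementation: `allowed_nodes`, the `fail_counts` mapping and the threshold value of 1 are made up for illustration, and a node is treated as banned once its fail count reaches the threshold.

```python
def allowed_nodes(nodes, fail_counts, threshold):
    """Return the nodes still allowed to run the resource.

    fail_counts maps node name -> number of recorded failures on that node.
    Assumption: a node is banned once its count reaches the threshold.
    """
    return [n for n in nodes if fail_counts.get(n, 0) < threshold]


nodes = ["acspcmk-01", "acspcmk-02"]
fail_counts = {"acspcmk-01": 1, "acspcmk-02": 1}

# The resource has failed on every node, so no node may run it.
print(allowed_nodes(nodes, fail_counts, threshold=1))

# Clearing the fail counts makes both nodes eligible again.
fail_counts.clear()
print(allowed_nodes(nodes, fail_counts, threshold=1))
```

Nothing in this model touches target-role, which matches the point above: a ban through fail counts and an explicit "disable" are separate mechanisms, and only the first happens automatically.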