[ClusterLabs] Disable all resources in a group if one or more of them fail and are unable to reactivate
damiano giuliani
damianogiuliani87 at gmail.com
Wed Jan 27 13:25:16 EST 2021
Hi Andrei, thanks for your help.

If one of the resources in the group fails, or the primary node goes down
(in my case acspcmk-02), the probe notices it and Pacemaker tries to
restart the whole resource group on the second node.
If the second node can't run one of the grouped resources, it tries to
stop them.

I have attached my cluster status: my primary node (acspcmk-02) failed and
the resource group tried to restart on acspcmk-01. I kept the resource
"lta-subscription-backend-ope-s3" broken on purpose and, as you can see,
some of the grouped resources are still started.

I would like to know how to achieve a condition where the resource group
must start properly for every resource; if it cannot, the whole group
should be stopped, with no services left up and running.

2 nodes configured
28 resources configured

Online: [ acspcmk-01 ]
OFFLINE: [ acspcmk-02 ]

Full list of resources:

 Clone Set: lta-odata-frontend-ope-s1-clone [lta-odata-frontend-ope-s1]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Clone Set: lta-odata-frontend-ope-s2-clone [lta-odata-frontend-ope-s2]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Clone Set: lta-odata-frontend-ope-s3-clone [lta-odata-frontend-ope-s3]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Clone Set: s1ltaestimationtime-clone [s1ltaestimationtime]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Clone Set: s2ltaestimationtime-clone [s2ltaestimationtime]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Clone Set: s3ltaestimationtime-clone [s3ltaestimationtime]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Clone Set: openresty-clone [openresty]
     Started: [ acspcmk-01 ]
     Stopped: [ acspcmk-02 ]
 Resource Group: LTA_SINGLE_RESOURCES
     VIP (ocf::heartbeat:IPaddr2): Started acspcmk-01
     lta-subscription-backend-ope-s1 (systemd:lta-subscription-backend-ope-s1): Started acspcmk-01
     lta-subscription-backend-ope-s2 (systemd:lta-subscription-backend-ope-s2): Started acspcmk-01
     lta-subscription-backend-ope-s3 (systemd:lta-subscription-backend-ope-s3): Stopped
     s1ltaquotaservice (systemd:s1ltaquotaservice): Stopped
     s2ltaquotaservice (systemd:s2ltaquotaservice): Stopped
     s3ltaquotaservice (systemd:s3ltaquotaservice): Stopped
     s1ltarolling (systemd:s1ltarolling): Stopped
     s2ltarolling (systemd:s2ltarolling): Stopped
     s3ltarolling (systemd:s3ltarolling): Stopped
     s1srvnotificationdispatcher (systemd:s1srvnotificationdispatcher): Stopped
     s2srvnotificationdispatcher (systemd:s2srvnotificationdispatcher): Stopped
     s3srvnotificationdispatcher (systemd:s3srvnotificationdispatcher): Stopped

Failed Resource Actions:
* lta-subscription-backend-ope-s3_start_0 on acspcmk-01 'unknown error' (1): call=466, status=complete, exitreason='',
    last-rc-change='Wed Jan 27 13:00:21 2021', queued=0ms, exec=2128ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
  sbd: active/enabled

I hope I have explained my problem as well as I can.
Thanks for your time and help.
Good evening,
Damiano

On Wed, 27 Jan 2021 at 19:03, Andrei Borzenkov <arvidjaar at gmail.com>
wrote:
> On 27.01.2021 19:06, damiano giuliani wrote:
> > Hi all, I'm pretty new to clusters and I'm struggling to configure a
> > bunch of resources and test how they fail over. I need to start and
> > manage a group of resources as one (to achieve this, a resource group
> > has been created). If one of them can't run and keeps failing, the
> > cluster should try to restart the resource group on the secondary
> > node; if it can't run all the resources together there, it should
> > disable the whole resource group.
> > I would like to know if there is a way to set the cluster to disable
> > all the resources of the group (or the group itself) if the resources
> > can't all be run somewhere.
> >
>
> That's what a pacemaker group does. I am not sure what you mean by
> "disable all resources". If a resource's fail count on a node exceeds
> the threshold, that node is banned from running the resource. If the
> resource has failed on every node, no node can run it until you clear
> the fail count.
>
> "Disable resource" in pacemaker would mean setting its target-role to
> Stopped. That does not happen automatically (at least I am not aware of
> it).
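
Since disabling does not happen automatically, one possible workaround is an external check that disables the group by hand when it is only partly started. The following is a minimal sketch of that idea in Python, not a Pacemaker feature: the helper names (member_states, is_partially_started, disable_command) are hypothetical, and the parsing assumes the "name (agent): State" layout seen in the status output in this thread. The only cluster command it relies on is "pcs resource disable", which sets target-role to Stopped.

```python
import re

# Matches the state printed after "(agent):" on each resource line,
# e.g. "VIP (ocf::heartbeat:IPaddr2): Started acspcmk-01".
STATE_RE = re.compile(r"\):\s+(Started|Stopped)\b")


def member_states(group_status):
    """Return the Started/Stopped state of each resource line in the text."""
    return STATE_RE.findall(group_status)


def is_partially_started(group_status):
    """True when some group members are Started while others are Stopped."""
    states = member_states(group_status)
    return "Started" in states and "Stopped" in states


def disable_command(group_id):
    """Build the command that sets target-role=Stopped on the group."""
    return ["pcs", "resource", "disable", group_id]


# Excerpt of the group status from this thread (lines re-joined).
SAMPLE = """\
VIP (ocf::heartbeat:IPaddr2): Started acspcmk-01
lta-subscription-backend-ope-s1 (systemd:lta-subscription-backend-ope-s1): Started acspcmk-01
lta-subscription-backend-ope-s3 (systemd:lta-subscription-backend-ope-s3): Stopped
s1ltaquotaservice (systemd:s1ltaquotaservice): Stopped
"""

if __name__ == "__main__":
    if is_partially_started(SAMPLE):
        # Only print the command here; running it is left to the operator.
        print(" ".join(disable_command("LTA_SINGLE_RESOURCES")))
```

The sketch only prints the command rather than executing it, so it can be tried without touching a live cluster. After the broken service is repaired, the group would still need its failure cleared and its target-role reset, for example with "pcs resource cleanup" followed by "pcs resource enable".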