<div dir="ltr">Hi Ken, thanks for the answer and explanation.<div>So i stop strugglin myself finding a solution!</div><div><br></div><div>your clarifications are very uselfull and appreciated.</div><div><br></div><div>Many Thanks</div><div><br></div><div>Have a good day.</div><div><br></div><div>Damiano </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Il giorno mer 27 gen 2021 alle ore 20:03 Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>> ha scritto:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Wed, 2021-01-27 at 19:25 +0100, damiano giuliani wrote:<br>
> Hi Andrei, thanks for your help.
> If one of the resources in my group fails, or the primary node goes
> down (in my case acspcmk-02), the probe notices it and Pacemaker
> tries to restart the whole resource group on the second node.
> If the second node can't run one of the grouped resources, it tries
> to stop them.
> 
> I attached my cluster status. My primary node (acspcmk-02) fails
> and the resource group tries to restart on acspcmk-01. I broke the
> resource "lta-subscription-backend-ope-s3" on purpose, and as you
> can see some grouped resources are still started.
> I would like to know how to achieve a condition where the resource
> group must start all of its resources properly and, if not, stop
> the whole group rather than leaving some services up and running.

With a group, later members depend on earlier members. If an earlier
member can't run, then no members after it can run.
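
For example, a minimal sketch with pcs (the group and resource names
here are made up for illustration):

    # "app" joins the group after "vip": it starts after vip, on the
    # same node, and only while vip is running
    pcs resource group add mygroup vip app
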
However, we can't make the dependency go in both directions. If an
earlier member can't run unless a later member is active, and vice
versa, then how can anything be started?

By default, Pacemaker tries to recover failed resources on the same
node, up to its migration-threshold (which defaults to a million
times). Once a group member reaches its migration-threshold, Pacemaker
will move the entire group to another node if one is available. However
if no node is available for the failed member, then it will just remain
stopped (along with any later members in the group), and the earlier
members will stay active where they are.
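
To fail over sooner, you can lower that threshold on the group, e.g.
as a sketch (the value and group name are illustrative):

    # move the whole group after 3 failed recovery attempts on a node
    pcs resource meta mygroup migration-threshold=3
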
I don't think there's any way to prevent earlier members from running
if a later member has no available node.

> 2 nodes configured
> 28 resources configured
> 
> Online: [ acspcmk-01 ]
> OFFLINE: [ acspcmk-02 ]
> 
> Full list of resources:
> 
> Clone Set: lta-odata-frontend-ope-s1-clone [lta-odata-frontend-ope-s1]
>     Started: [ acspcmk-01 ]
>     Stopped: [ acspcmk-02 ]
> Clone Set: lta-odata-frontend-ope-s2-clone [lta-odata-frontend-ope-s2]
>     Started: [ acspcmk-01 ]
>     Stopped: [ acspcmk-02 ]
> Clone Set: lta-odata-frontend-ope-s3-clone [lta-odata-frontend-ope-s3]
>     Started: [ acspcmk-01 ]
>     Stopped: [ acspcmk-02 ]
> Clone Set: s1ltaestimationtime-clone [s1ltaestimationtime]
>     Started: [ acspcmk-01 ]
>     Stopped: [ acspcmk-02 ]
> Clone Set: s2ltaestimationtime-clone [s2ltaestimationtime]
>     Started: [ acspcmk-01 ]
>     Stopped: [ acspcmk-02 ]
> Clone Set: s3ltaestimationtime-clone [s3ltaestimationtime]
>     Started: [ acspcmk-01 ]
>     Stopped: [ acspcmk-02 ]
> Clone Set: openresty-clone [openresty]
>     Started: [ acspcmk-01 ]
>     Stopped: [ acspcmk-02 ]
> Resource Group: LTA_SINGLE_RESOURCES
>     VIP (ocf::heartbeat:IPaddr2): Started acspcmk-01
>     lta-subscription-backend-ope-s1 (systemd:lta-subscription-backend-ope-s1): Started acspcmk-01
>     lta-subscription-backend-ope-s2 (systemd:lta-subscription-backend-ope-s2): Started acspcmk-01
>     lta-subscription-backend-ope-s3 (systemd:lta-subscription-backend-ope-s3): Stopped
>     s1ltaquotaservice (systemd:s1ltaquotaservice): Stopped
>     s2ltaquotaservice (systemd:s2ltaquotaservice): Stopped
>     s3ltaquotaservice (systemd:s3ltaquotaservice): Stopped
>     s1ltarolling (systemd:s1ltarolling): Stopped
>     s2ltarolling (systemd:s2ltarolling): Stopped
>     s3ltarolling (systemd:s3ltarolling): Stopped
>     s1srvnotificationdispatcher (systemd:s1srvnotificationdispatcher): Stopped
>     s2srvnotificationdispatcher (systemd:s2srvnotificationdispatcher): Stopped
>     s3srvnotificationdispatcher (systemd:s3srvnotificationdispatcher): Stopped
> 
> Failed Resource Actions:
> * lta-subscription-backend-ope-s3_start_0 on acspcmk-01 'unknown error' (1):
>   call=466, status=complete, exitreason='',
>   last-rc-change='Wed Jan 27 13:00:21 2021', queued=0ms, exec=2128ms
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
>   sbd: active/enabled
> 
> I hope I explained my problem as well as I could.
> 
> Thanks for your time and help.
> 
> Good evening,
> 
> Damiano
> 
> On Wed, 27 Jan 2021 at 19:03, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> > On 27.01.2021 19:06, damiano giuliani wrote:
> > > Hi all, I'm pretty new to clusters. I'm struggling to configure
> > > a bunch of resources and test how they fail over. My need is to
> > > start and manage a group of resources as one (to achieve this, a
> > > resource group has been created), and if one of them can't run
> > > and keeps failing, the cluster should try to restart the
> > > resource group on the secondary node; if that node can't run all
> > > the resources together, it should disable the whole resource
> > > group.
> > > I would like to know if there is a way to set the cluster to
> > > disable all the resources of the group (or the group itself) if
> > > it can't run all the resources somewhere.
> > 
> > That's what a Pacemaker group does. I am not sure what you mean by
> > "disable all resources". If a resource's fail count on a node
> > exceeds the threshold, that node is banned from running the
> > resource. If the resource failed on every node, no node can run it
> > until you clear the fail count.
> > 
> > "Disabling a resource" in Pacemaker would mean setting its
> > target-role to Stopped. That does not happen automatically (at
> > least I am not aware of it).
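> > 
> > For example (a sketch with pcs; "myresource" stands in for a real
> > resource id):
> > 
> >     # show and clear a resource's fail count
> >     pcs resource failcount show myresource
> >     pcs resource cleanup myresource
> > 
> >     # explicitly disable the resource (sets target-role=Stopped)
> >     pcs resource disable myresource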
> 
-- 
Ken Gaillot <kgaillot@redhat.com>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/