[ClusterLabs] pcs status shows all resources in Stopped state after one resource fails to start

Strahil Nikolov hunter86_bg at yahoo.com
Mon Mar 2 12:37:38 EST 2020


On March 2, 2020 2:58:32 PM GMT+02:00, Amit Nakum <amit.nakum at ecosmob.com> wrote:
>Dear users,
>
>I have been facing a strange issue with an active-active HA
>configuration: when any resource fails to start, Pacemaker is unable
>to start it on the other cluster host, and all resources in the
>cluster show Stopped status. When I run pcs resource refresh, the
>resources move to and start on the other cluster host. I cannot
>figure out why the cluster will not start them on the other host
>without a resource refresh.
>
>Below is the log I see when the resource fails
>-------------------------------------------
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:  warning: Processing failed monitor of php-fpm:1 on testsrv2: not running
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:  warning: Forcing php-fpm-clone away from testsrv2 after 1 failures (max=1)
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:  warning: Forcing php-fpm-clone away from testsrv2 after 1 failures (max=1)
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:   notice:  * Stop Cluster_eno4 ( testsrv2 ) due to node availability
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:   notice:  * Stop Cluster_eno1 ( testsrv2 ) due to node availability
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:   notice:  * Stop Cluster_eno3 ( testsrv2 ) due to node availability
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:   notice:  * Stop php-fpm:1 ( testsrv2 ) due to node availability
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:  warning: Calculated transition 14 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-1055.bz2
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Initiating stop operation Cluster_eno4_stop_0 locally on testsrv2
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Initiating stop operation Cluster_eno1_stop_0 locally on testsrv2
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Initiating stop operation Cluster_eno3_stop_0 locally on testsrv2
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Initiating stop operation php-fpm_stop_0 locally on testsrv2
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Result of stop operation for Cluster_eno1 on testsrv2: 0 (ok)
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Result of stop operation for Cluster_eno4 on testsrv2: 0 (ok)
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Result of stop operation for Cluster_eno3 on testsrv2: 0 (ok)
>Mar 02 07:25:50 olp-gen-sbca02.lan crmd[4522]:   notice: Result of stop operation for php-fpm on testsrv2: 0 (ok)
>Mar 02 07:25:50 olp-gen-sbca02.lan crmd[4522]:   notice: Transition 14 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-1055.bz2): Complete
>Mar 02 07:25:50 olp-gen-sbca02.lan crmd[4522]:   notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>
>
>Below are the OS and Pacemaker versions
>-----------------------------------------------------
>Corosync Cluster Engine, version '2.4.3'
>Pacemaker version '1.1.20'
>OS: CentOS Linux release 7.6.1810
>
>
>I have configured 3 VIPs on different interfaces and cloned all the
>other resources as active-active. Below are the steps I used to
>configure the cluster:
>
>pcs property set stonith-enabled=false
>pcs property set no-quorum-policy=ignore
>pcs resource defaults resource-stickiness=INFINITY
>pcs resource create Cluster_eno4 ocf:heartbeat:IPaddr2 ip=38.xx.xxx.xxx cidr_netmask=32 op monitor interval=20s
>pcs resource create Cluster_eno1 ocf:heartbeat:IPaddr2 ip=10.xxx.x.xxx cidr_netmask=32 op monitor interval=20s
>pcs resource create Cluster_eno3 ocf:heartbeat:IPaddr2 ip=209.xx.xx.xxx cidr_netmask=32 op monitor interval=20s
>
>pcs resource create haproxy systemd:haproxy.service op monitor interval=5s meta failure-timeout=60s migration-threshold=1
>pcs resource create cluster_rtpengine systemd:rtpengine.service op monitor interval=5s meta failure-timeout=60s migration-threshold=1
>pcs resource create cluster_opensips systemd:opensips.service op monitor interval=5s meta failure-timeout=60s migration-threshold=1
>pcs resource create nginx systemd:nginx.service op monitor interval=5s meta failure-timeout=60s migration-threshold=1
>pcs resource create php-fpm systemd:php73-php-fpm.service op monitor interval=5s meta failure-timeout=60s migration-threshold=1
>pcs resource create consumer systemd:consumer.service op monitor interval=5s meta failure-timeout=60s migration-threshold=1
>
>pcs resource clone haproxy globally-unique=true clone-max=2 clone-node-max=1
>pcs resource clone cluster_rtpengine globally-unique=true clone-max=1 clone-node-max=1
>pcs resource clone cluster_opensips globally-unique=true clone-max=2 clone-node-max=1
>pcs resource clone nginx globally-unique=true clone-max=2 clone-node-max=1
>pcs resource clone php-fpm globally-unique=true clone-max=2 clone-node-max=1
>pcs resource clone consumer globally-unique=true clone-max=2 clone-node-max=1
>
>
>pcs constraint colocation add Cluster_eno4 with haproxy-clone
>pcs constraint colocation add Cluster_eno1 with haproxy-clone
>pcs constraint colocation add Cluster_eno3 with haproxy-clone
>pcs constraint colocation add Cluster_eno4 with cluster_rtpengine-clone
>pcs constraint colocation add Cluster_eno1 with cluster_rtpengine-clone
>pcs constraint colocation add Cluster_eno3 with cluster_rtpengine-clone
>pcs constraint colocation add Cluster_eno4 with cluster_opensips-clone
>pcs constraint colocation add Cluster_eno1 with cluster_opensips-clone
>pcs constraint colocation add Cluster_eno3 with cluster_opensips-clone
>pcs constraint colocation add Cluster_eno4 with nginx-clone
>pcs constraint colocation add Cluster_eno1 with nginx-clone
>pcs constraint colocation add Cluster_eno3 with nginx-clone
>pcs constraint colocation add Cluster_eno4 with php-fpm-clone
>pcs constraint colocation add Cluster_eno1 with php-fpm-clone
>pcs constraint colocation add Cluster_eno3 with php-fpm-clone
>pcs constraint colocation add Cluster_eno4 with consumer-clone
>pcs constraint colocation add Cluster_eno1 with consumer-clone
>pcs constraint colocation add Cluster_eno3 with consumer-clone
>
>
> Cluster_eno4   (ocf::heartbeat:IPaddr2):       Stopped testsrv1
> Cluster_eno1   (ocf::heartbeat:IPaddr2):       Stopped testsrv1
> Cluster_eno3   (ocf::heartbeat:IPaddr2):       Stopped testsrv1
> Clone Set: haproxy-clone [haproxy]
>     haproxy    (systemd:haproxy.service):      Stopped testsrv1
>     haproxy    (systemd:haproxy.service):      Stopped testsrv2
>     Started: [ testsrv1 testsrv2 ]
> Clone Set: cluster_rtpengine-clone [cluster_rtpengine]
>     cluster_rtpengine  (systemd:rtpengine.service):    Stopped testsrv1
>     Started: [ testsrv1 ]
> Clone Set: cluster_opensips-clone [cluster_opensips]
>     cluster_opensips   (systemd:opensips.service):     Stopped testsrv1
>     cluster_opensips   (systemd:opensips.service):     Stopped testsrv2
>     Started: [ testsrv1 testsrv2 ]
> Clone Set: nginx-clone [nginx]
>     nginx      (systemd:nginx.service):        Stopped testsrv1
>     nginx      (systemd:nginx.service):        Stopped testsrv2
>     Started: [ testsrv1 testsrv2 ]
> Clone Set: php-fpm-clone [php-fpm]
>     php-fpm    (systemd:php73-php-fpm.service):        Stopped testsrv1
>     php-fpm    (systemd:php73-php-fpm.service):        FAILED testsrv2
>     Started: [ testsrv1 testsrv2 ]
> Clone Set: consumer-clone [consumer]
>     consumer   (systemd:consumer.service):     Stopped testsrv1
>     consumer   (systemd:consumer.service):     Stopped testsrv2
>     Started: [ testsrv1 testsrv2 ]
>
>Can anyone guide me in solving this problem?

I don't know about the others, but my first thought was 'what a mess!?!'.
Don't be insulted, but this doesn't make sense:
pcs constraint colocation add Cluster_eno4 with haproxy-clone

pcs constraint colocation add Cluster_eno4 with cluster_rtpengine-clone

pcs constraint colocation add Cluster_eno4 with cluster_opensips-clone

...
And many other rules like these. So with which one should the IP resource 'Cluster_eno4' actually be colocated? With INFINITY-score colocations against every clone, the IP is only allowed on a node where all of those clones are active, so a single banned clone (here php-fpm, forced away after one failure because of migration-threshold=1) drags the IP, and everything tied to it, down as well.
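
If the goal is simply 'the VIPs must land on a node that runs the whole stack', one cleaner shape would be to group the three IPs and colocate the group with a single anchor clone. A rough, untested sketch using the resource names from your post ('vip-group' is a name I invented):

pcs resource group add vip-group Cluster_eno4 Cluster_eno1 Cluster_eno3
pcs constraint colocation add vip-group with haproxy-clone

That way one colocation rule decides where the IPs go, instead of six competing INFINITY rules.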

Also, no stonith?!? Not very smart. Without fencing, the cluster has no safe way to recover resources from a node it can no longer talk to.
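
For example, something along these lines with IPMI-based fencing (only a sketch; the agent choice, BMC addresses and credentials are placeholders you would have to adapt to your hardware):

pcs stonith create fence-testsrv1 fence_ipmilan pcmk_host_list=testsrv1 ip=<bmc-of-testsrv1> username=<user> password=<pass>
pcs stonith create fence-testsrv2 fence_ipmilan pcmk_host_list=testsrv2 ip=<bmc-of-testsrv2> username=<user> password=<pass>
pcs property set stonith-enabled=true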

What are you trying to do? 

Each node to have an IP and then the rest to be cloned ?
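
If so, location preferences on the IPs are usually enough, and the clones can simply run on both nodes without colocating each IP against every clone. Another untested sketch with your resource and node names (the scores are arbitrary examples):

pcs constraint location Cluster_eno4 prefers testsrv1=100
pcs constraint location Cluster_eno1 prefers testsrv2=100
pcs constraint location Cluster_eno3 prefers testsrv1=100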

Best Regards,
Strahil Nikolov

