[ClusterLabs] pcs status shows all resources in Stopped state after one resource failed to start
Strahil Nikolov
hunter86_bg at yahoo.com
Mon Mar 2 12:37:38 EST 2020
On March 2, 2020 2:58:32 PM GMT+02:00, Amit Nakum <amit.nakum at ecosmob.com> wrote:
>Dear users,
>
>I have been facing a strange issue with an active/active HA configuration.
>When any resource service fails to start, Pacemaker is unable to start the
>resource on the other cluster host, and every resource in the cluster shows
>Stopped status. When I run pcs resource refresh, the resources move and
>start on the other cluster host. I cannot figure out why the cluster will
>not start them on the other host without a resource refresh.
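>
>For reference, this is roughly the manual recovery I run today (just a
>sketch; php-fpm is one example resource from my configuration):
>
># see how many failures the cluster has recorded for the resource
>pcs resource failcount show php-fpm
># wipe the resource's operation history (including failcounts) so it
># is re-probed and the scheduler can place it again
>pcs resource refresh php-fpm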
>
>Below is the log I see when a resource fails:
>-------------------------------------------
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]: warning: Processing failed monitor of php-fpm:1 on testsrv2: not running
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]: warning: Forcing php-fpm-clone away from testsrv2 after 1 failures (max=1)
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]: warning: Forcing php-fpm-clone away from testsrv2 after 1 failures (max=1)
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]: notice: * Stop Cluster_eno4 ( testsrv2 ) due to node availability
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]: notice: * Stop Cluster_eno1 ( testsrv2 ) due to node availability
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]: notice: * Stop Cluster_eno3 ( testsrv2 ) due to node availability
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]: notice: * Stop php-fpm:1 ( testsrv2 ) due to node availability
>Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]: warning: Calculated transition 14 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-1055.bz2
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]: notice: Initiating stop operation Cluster_eno4_stop_0 locally on testsrv2
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]: notice: Initiating stop operation Cluster_eno1_stop_0 locally on testsrv2
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]: notice: Initiating stop operation Cluster_eno3_stop_0 locally on testsrv2
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]: notice: Initiating stop operation php-fpm_stop_0 locally on testsrv2
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]: notice: Result of stop operation for Cluster_eno1 on testsrv2: 0 (ok)
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]: notice: Result of stop operation for Cluster_eno4 on testsrv2: 0 (ok)
>Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]: notice: Result of stop operation for Cluster_eno3 on testsrv2: 0 (ok)
>Mar 02 07:25:50 olp-gen-sbca02.lan crmd[4522]: notice: Result of stop operation for php-fpm on testsrv2: 0 (ok)
>Mar 02 07:25:50 olp-gen-sbca02.lan crmd[4522]: notice: Transition 14 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-1055.bz2): Complete
>Mar 02 07:25:50 olp-gen-sbca02.lan crmd[4522]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>
>
>Below are the OS and cluster software versions:
>-----------------------------------------------------
>Corosync Cluster Engine, version '2.4.3'
>Pacemaker version '1.1.20'
>OS: CentOS Linux release 7.6.1810
>
>
>I have configured three VIPs on different interfaces and cloned all the
>other resources as active/active. Below are the steps I used to configure
>the cluster:
>
>pcs property set stonith-enabled=false
>pcs property set no-quorum-policy=ignore
>pcs resource defaults resource-stickiness=INFINITY
>pcs resource create Cluster_eno4 ocf:heartbeat:IPaddr2 ip=38.xx.xxx.xxx cidr_netmask=32 op monitor interval=20s
>pcs resource create Cluster_eno1 ocf:heartbeat:IPaddr2 ip=10.xxx.x.xxx cidr_netmask=32 op monitor interval=20s
>pcs resource create Cluster_eno3 ocf:heartbeat:IPaddr2 ip=209.xx.xx.xxx cidr_netmask=32 op monitor interval=20s
>
>pcs resource create haproxy systemd:haproxy.service op monitor interval=5s meta failure-timeout=60s migration-threshold=1
>pcs resource create cluster_rtpengine systemd:rtpengine.service op monitor interval=5s meta failure-timeout=60s migration-threshold=1
>pcs resource create cluster_opensips systemd:opensips.service op monitor interval=5s meta failure-timeout=60s migration-threshold=1
>pcs resource create nginx systemd:nginx.service op monitor interval=5s meta failure-timeout=60s migration-threshold=1
>pcs resource create php-fpm systemd:php73-php-fpm.service op monitor interval=5s meta failure-timeout=60s migration-threshold=1
>pcs resource create consumer systemd:consumer.service op monitor interval=5s meta failure-timeout=60s migration-threshold=1
>
>pcs resource clone haproxy globally-unique=true clone-max=2 clone-node-max=1
>pcs resource clone cluster_rtpengine globally-unique=true clone-max=1 clone-node-max=1
>pcs resource clone cluster_opensips globally-unique=true clone-max=2 clone-node-max=1
>pcs resource clone nginx globally-unique=true clone-max=2 clone-node-max=1
>pcs resource clone php-fpm globally-unique=true clone-max=2 clone-node-max=1
>pcs resource clone consumer globally-unique=true clone-max=2 clone-node-max=1
>
>
>pcs constraint colocation add Cluster_eno4 with haproxy-clone
>pcs constraint colocation add Cluster_eno1 with haproxy-clone
>pcs constraint colocation add Cluster_eno3 with haproxy-clone
>pcs constraint colocation add Cluster_eno4 with cluster_rtpengine-clone
>pcs constraint colocation add Cluster_eno1 with cluster_rtpengine-clone
>pcs constraint colocation add Cluster_eno3 with cluster_rtpengine-clone
>pcs constraint colocation add Cluster_eno4 with cluster_opensips-clone
>pcs constraint colocation add Cluster_eno1 with cluster_opensips-clone
>pcs constraint colocation add Cluster_eno3 with cluster_opensips-clone
>pcs constraint colocation add Cluster_eno4 with nginx-clone
>pcs constraint colocation add Cluster_eno1 with nginx-clone
>pcs constraint colocation add Cluster_eno3 with nginx-clone
>pcs constraint colocation add Cluster_eno4 with php-fpm-clone
>pcs constraint colocation add Cluster_eno1 with php-fpm-clone
>pcs constraint colocation add Cluster_eno3 with php-fpm-clone
>pcs constraint colocation add Cluster_eno4 with consumer-clone
>pcs constraint colocation add Cluster_eno1 with consumer-clone
>pcs constraint colocation add Cluster_eno3 with consumer-clone
>
>After the failure, pcs status shows everything as stopped:
>
> Cluster_eno4 (ocf::heartbeat:IPaddr2): Stopped testsrv1
> Cluster_eno1 (ocf::heartbeat:IPaddr2): Stopped testsrv1
> Cluster_eno3 (ocf::heartbeat:IPaddr2): Stopped testsrv1
> Clone Set: haproxy-clone [haproxy]
> haproxy (systemd:haproxy.service): Stopped testsrv1
> haproxy (systemd:haproxy.service): Stopped testsrv2
> Started: [ testsrv1 testsrv2 ]
> Clone Set: cluster_rtpengine-clone [cluster_rtpengine]
> cluster_rtpengine (systemd:rtpengine.service): Stopped testsrv1
> Started: [ testsrv1 ]
> Clone Set: cluster_opensips-clone [cluster_opensips]
> cluster_opensips (systemd:opensips.service): Stopped testsrv1
> cluster_opensips (systemd:opensips.service): Stopped testsrv2
> Started: [ testsrv1 testsrv2 ]
> Clone Set: nginx-clone [nginx]
> nginx (systemd:nginx.service): Stopped testsrv1
> nginx (systemd:nginx.service): Stopped testsrv2
> Started: [ testsrv1 testsrv2 ]
> Clone Set: php-fpm-clone [php-fpm]
> php-fpm (systemd:php73-php-fpm.service): Stopped testsrv1
> php-fpm (systemd:php73-php-fpm.service): FAILED testsrv2
> Started: [ testsrv1 testsrv2 ]
> Clone Set: consumer-clone [consumer]
> consumer (systemd:consumer.service): Stopped testsrv1
> consumer (systemd:consumer.service): Stopped testsrv2
> Started: [ testsrv1 testsrv2 ]
>
>Can anyone guide me in solving this problem?
I can't speak for the others, but my first thought was 'what a mess!?'
Don't take this as an insult, but this doesn't make sense:
pcs constraint colocation add Cluster_eno4 with haproxy-clone
pcs constraint colocation add Cluster_eno4 with cluster_rtpengine-clone
pcs constraint colocation add Cluster_eno4 with cluster_opensips-clone
...
And many other rules. So with which one should the IP resource 'Cluster_eno4' be colocated?
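If it isn't obvious even to you how the scheduler resolves all those overlapping rules, you can ask it directly (a diagnostic sketch; run it on a live cluster node):

# dump the current cluster state together with the allocation scores
# the policy engine computed for every resource on every node
crm_simulate -sL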
Also, no stonith?! Not very smart.
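Even one basic fence device per node would be better than nothing. A minimal sketch, assuming your hosts have IPMI BMCs (the device names, addresses, and credentials below are placeholders, not from your setup):

# one fence device per node; replace the IPMI details with your own
pcs stonith create fence_testsrv1 fence_ipmilan pcmk_host_list=testsrv1 ipaddr=10.0.0.101 login=admin passwd=secret lanplus=1
pcs stonith create fence_testsrv2 fence_ipmilan pcmk_host_list=testsrv2 ipaddr=10.0.0.102 login=admin passwd=secret lanplus=1
pcs property set stonith-enabled=true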
What are you trying to do?
Each node to have an IP, and then the rest to be cloned?
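If so, you could drop all those colocation rules and simply give each VIP a location preference, letting the clones run everywhere on their own. A rough sketch (the node assignments and scores here are only an example, not a recommendation for your traffic):

# pin the VIPs with soft preferences; the clones need no VIP colocation at all
pcs constraint location Cluster_eno4 prefers testsrv1=100
pcs constraint location Cluster_eno1 prefers testsrv1=100
pcs constraint location Cluster_eno3 prefers testsrv2=100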
Best Regards,
Strahil Nikolov