[ClusterLabs] pcs status show all the resource state are on stopped status after one resource failed to start

Mon Mar 2 07:58:32 EST 2020

Dear users,

I am facing a strange issue with an active/active HA configuration.
When any resource fails to start, Pacemaker does not start the
resources on the other cluster host, and all resources in the cluster show
a Stopped status. When I run pcs resource refresh, the resources move to
and start on the other cluster host. I cannot work out why the cluster
does not start them on the other host without a resource refresh.

Below is the log I see when a resource fails:
-------------------------------------------
Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:  warning: Processing
failed monitor of php-fpm:1 on testsrv2: not running
Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:  warning: Forcing
php-fpm-clone away from testsrv2 after 1 failures (max=1)
Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:  warning: Forcing
php-fpm-clone away from testsrv2 after 1 failures (max=1)
Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:   notice:  * Stop
Cluster_eno4           ( testsrv2 )   due to node availability
Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:   notice:  * Stop
Cluster_eno1           ( testsrv2 )   due to node availability
Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:   notice:  * Stop
Cluster_eno3           ( testsrv2 )   due to node availability
Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:   notice:  * Stop
php-fpm:1              ( testsrv2 )   due to node availability
Mar 02 07:25:48 olp-gen-sbca02.lan pengine[4521]:  warning: Calculated
transition 14 (with warnings), saving inputs in
/var/lib/pacemaker/pengine/pe-warn-1055.bz2
Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Initiating stop
operation Cluster_eno4_stop_0 locally on testsrv2
Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Initiating stop
operation Cluster_eno1_stop_0 locally on testsrv2
Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Initiating stop
operation Cluster_eno3_stop_0 locally on testsrv2
Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Initiating stop
operation php-fpm_stop_0 locally on testsrv2
Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Result of stop
operation for Cluster_eno1 on testsrv2: 0 (ok)
Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Result of stop
operation for Cluster_eno4 on testsrv2: 0 (ok)
Mar 02 07:25:48 olp-gen-sbca02.lan crmd[4522]:   notice: Result of stop
operation for Cluster_eno3 on testsrv2: 0 (ok)
Mar 02 07:25:50 olp-gen-sbca02.lan crmd[4522]:   notice: Result of stop
operation for php-fpm on testsrv2: 0 (ok)
Mar 02 07:25:50 olp-gen-sbca02.lan crmd[4522]:   notice: Transition 14
(Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-1055.bz2): Complete
Mar 02 07:25:50 olp-gen-sbca02.lan crmd[4522]:   notice: State transition
S_TRANSITION_ENGINE -> S_IDLE
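
To make the relevant lines easier to spot, here is a minimal sketch of a filter that pulls the "Forcing ... away from ..." warnings out of a log like the one above. The function name forced_away is only an illustration and is not part of pcs or Pacemaker. It prints the resource, the node, the failure count and the max value (which matches migration-threshold in my configuration), and it works whether or not the mail client has wrapped the log line.

```shell
# Hypothetical helper: summarize "Forcing X away from Y after N failures (max=M)"
# warnings read from standard input. Output columns: resource node failures max
forced_away() {
  awk '
    {
      # Look for the word sequence "<res> away from <node> after <n> failures (max=<m>)"
      for (i = 2; i + 6 <= NF; i++) {
        if ($i == "away" && $(i+1) == "from" && $(i+3) == "after" && $(i+5) == "failures") {
          max = $(i+6)
          gsub(/[^0-9]/, "", max)   # strip "(max=" and ")" and keep the digits
          print $(i-1), $(i+2), $(i+4), max
        }
      }
    }
  ' | sort -u                        # the warning is often logged more than once
}
```

For example, piping the pengine lines from the log into forced_away would list each banned resource/node pair once. Separately, the saved transition input named in the log (/var/lib/pacemaker/pengine/pe-warn-1055.bz2) can, as far as I understand, be replayed with crm_simulate --simulate --xml-file <file> to see what the policy engine decided and why.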

Below are the OS, Corosync and Pacemaker versions
-----------------------------------------------------
Corosync Cluster Engine, version '2.4.3'
Pacemaker version '1.1.20'
OS: CentOS Linux release 7.6.1810

I have configured three VIPs on different interfaces and cloned all other
resources as active/active. Below are the steps I used to configure the
cluster:

pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs resource defaults resource-stickiness=INFINITY
pcs resource create Cluster_eno4 ocf:heartbeat:IPaddr2 ip=38.xx.xxx.xxx \
    cidr_netmask=32 op monitor interval=20s
pcs resource create Cluster_eno1 ocf:heartbeat:IPaddr2 ip=10.xxx.x.xxx \
    cidr_netmask=32 op monitor interval=20s
pcs resource create Cluster_eno3 ocf:heartbeat:IPaddr2 ip=209.xx.xx.xxx \
    cidr_netmask=32 op monitor interval=20s

pcs resource create haproxy systemd:haproxy.service op monitor interval=5s \
    meta failure-timeout=60s migration-threshold=1
pcs resource create cluster_rtpengine systemd:rtpengine.service op monitor \
    interval=5s meta failure-timeout=60s migration-threshold=1
pcs resource create cluster_opensips systemd:opensips.service op monitor \
    interval=5s meta failure-timeout=60s migration-threshold=1
pcs resource create nginx systemd:nginx.service op monitor interval=5s \
    meta failure-timeout=60s migration-threshold=1
pcs resource create php-fpm systemd:php73-php-fpm.service op monitor \
    interval=5s meta failure-timeout=60s migration-threshold=1
pcs resource create consumer systemd:consumer.service op monitor \
    interval=5s meta failure-timeout=60s migration-threshold=1

pcs resource clone haproxy globally-unique=true clone-max=2 clone-node-max=1
pcs resource clone cluster_rtpengine globally-unique=true clone-max=1 \
    clone-node-max=1
pcs resource clone cluster_opensips globally-unique=true clone-max=2 \
    clone-node-max=1
pcs resource clone nginx globally-unique=true clone-max=2 clone-node-max=1
pcs resource clone php-fpm globally-unique=true clone-max=2 clone-node-max=1
pcs resource clone consumer globally-unique=true clone-max=2 \
    clone-node-max=1

pcs constraint colocation add Cluster_eno4 with haproxy-clone
pcs constraint colocation add Cluster_eno1 with haproxy-clone
pcs constraint colocation add Cluster_eno3 with haproxy-clone
pcs constraint colocation add Cluster_eno4 with cluster_rtpengine-clone
pcs constraint colocation add Cluster_eno1 with cluster_rtpengine-clone
pcs constraint colocation add Cluster_eno3 with cluster_rtpengine-clone
pcs constraint colocation add Cluster_eno4 with cluster_opensips-clone
pcs constraint colocation add Cluster_eno1 with cluster_opensips-clone
pcs constraint colocation add Cluster_eno3 with cluster_opensips-clone
pcs constraint colocation add Cluster_eno4 with nginx-clone
pcs constraint colocation add Cluster_eno1 with nginx-clone
pcs constraint colocation add Cluster_eno3 with nginx-clone
pcs constraint colocation add Cluster_eno4 with php-fpm-clone
pcs constraint colocation add Cluster_eno1 with php-fpm-clone
pcs constraint colocation add Cluster_eno3 with php-fpm-clone
pcs constraint colocation add Cluster_eno4 with consumer-clone
pcs constraint colocation add Cluster_eno1 with consumer-clone
pcs constraint colocation add Cluster_eno3 with consumer-clone
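
The eighteen constraints above are simply every VIP colocated with every clone, so the same list can be written as a nested loop. This is only a sketch: the function name print_colocations is made up for illustration, and it prints the commands instead of running them so they can be reviewed first.

```shell
# Hypothetical helper: print one colocation command per (VIP, clone) pair,
# in the same order as the list above. Nothing is executed here.
print_colocations() {
  for clone in haproxy cluster_rtpengine cluster_opensips nginx php-fpm consumer; do
    for vip in Cluster_eno4 Cluster_eno1 Cluster_eno3; do
      echo "pcs constraint colocation add $vip with ${clone}-clone"
    done
  done
}
```

Running print_colocations shows the 18 commands; piping its output to sh on a cluster node would apply them. Since every one of these colocations is mandatory by default, each VIP can only run on a node where all six clones have an active instance.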

Below is the resource section of the pcs status output after the failure:

 Cluster_eno4   (ocf::heartbeat:IPaddr2):       Stopped testsrv1
 Cluster_eno1   (ocf::heartbeat:IPaddr2):       Stopped testsrv1
 Cluster_eno3   (ocf::heartbeat:IPaddr2):       Stopped testsrv1
 Clone Set: haproxy-clone [haproxy]
     haproxy    (systemd:haproxy.service):      stopped testsrv1
     haproxy    (systemd:haproxy.service):      Stopped testsrv2
     Started: [ testsrv1 testsrv2 ]
 Clone Set: cluster_rtpengine-clone [cluster_rtpengine]
     cluster_rtpengine  (systemd:rtpengine.service):    Stopped testsrv1
     Started: [ testsrv1 ]
 Clone Set: cluster_opensips-clone [cluster_opensips]
     cluster_opensips   (systemd:opensips.service):     Stopped testsrv1
     cluster_opensips   (systemd:opensips.service):     Stopped testsrv2
     Started: [ testsrv1 testsrv2 ]
 Clone Set: nginx-clone [nginx]
     nginx      (systemd:nginx.service):        Stopped testsrv1
     nginx      (systemd:nginx.service):        Stopped testsrv2
     Started: [ testsrv1 testsrv2 ]
 Clone Set: php-fpm-clone [php-fpm]
     php-fpm    (systemd:php73-php-fpm.service):        Stopped testsrv1
     php-fpm    (systemd:php73-php-fpm.service):        failed testsrv2
     Started: [ testsrv1 testsrv2 ]
 Clone Set: consumer-clone [consumer]
     consumer   (systemd:consumer.service):     Stopped testsrv1
     consumer   (systemd:consumer.service):     Stopped testsrv2
     Started: [ testsrv1 testsrv2 ]

Can anyone guide me in solving this problem?

-- 
Thanks & Regards,
Amit Nakum | Sr. Support Engineer
+91 982482283 | Hangout & Skype: amit.nakum at ecosmob.com
Ecosmob Technologies Pvt. Ltd.
https://www.ecosmob.com
