[ClusterLabs] Unable to perform resource failover.

Tue Nov 7 10:20:07 EST 2017

On Tue, 2017-11-07 at 10:30 +0000, Garima wrote:
> Hi All,
>  
> I am new to Pacemaker/Corosync.
>  
> I have created a simple environment with 2 nodes (Active/Passive)
> and 2 resources:
> One resource is a VIP (Cluster_VIP).
> The other resource is the Apache httpd service (Httpd).
>  
> [root@node1 ~]# pcs resource show Httpd
> Resource: Httpd (class=ocf provider=heartbeat type=apache)
>   Attributes: configfile=/etc/httpd/conf/httpd.conf
>   Operations: monitor interval=30s (Httpd-monitor-interval-30s)
>               start interval=0s timeout=40s (Httpd-start-interval-0s)
>               stop interval=0s timeout=60s (Httpd-stop-interval-0s)
> [root@node1 ~]# pcs resource show Cluster_VIP
> Resource: Cluster_VIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: cidr_netmask=32 ip=10.0.4.99
>   Operations: monitor interval=20s (Cluster_VIP-monitor-interval-20s)
>               start interval=0s timeout=20s (Cluster_VIP-start-interval-0s)
>               stop interval=0s timeout=20s (Cluster_VIP-stop-interval-0s)
>  
> [root@node1 ~]# pcs status
> Cluster name: Cluster
> Stack: corosync
> Current DC: node2 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
> Last updated: Tue Nov  7 15:09:40 2017
> Last change: Tue Nov  7 15:03:22 2017 by root via cibadmin on node1
> 2 nodes configured
> 2 resources configured
> Online: [ node1 node2 ]
> Full list of resources:
> Cluster_VIP    (ocf::heartbeat:IPaddr2):       Started node1
> Httpd  (ocf::heartbeat:apache):        Started node1
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>  
> To check the process ID (PID) of httpd and kill it, I used this command:
>     ps -aef | grep httpd
>  
> [root@node1 ~]# ps -aef | grep httpd
> root      4392     1  0 15:03 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
> apache    4393  4392  0 15:03 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
> apache    4394  4392  0 15:03 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
> apache    4395  4392  0 15:03 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
> apache    4396  4392  0 15:03 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
> apache    4397  4392  0 15:03 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
>  
> [root@node1 ~]# kill -9 4392
>  
> I am trying to trigger a resource failover by killing the httpd process.
> Observation:
> Resource failover does not happen after killing the process. The
> resource (Httpd) remains started on node1.
> We don't want to use "pcs resource move Httpd" or
> "pcs resource disable Httpd" for this.
>  
> Query:
> What is wrong with our approach?

Pacemaker's default recovery behavior for service failures is not
failover, but restart. Chances are, pacemaker restarted httpd in the
above situation, and the outage was short enough that you didn't notice
it. You could check the pid of httpd afterward to see if it's the same
or a new one.
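One way to run that check, as a sketch assuming it is run as root on
node1 and that httpd writes its PID to the PidFile path shown in the ps
output above. The 45-second wait is an arbitrary value chosen to exceed
the 30s monitor interval of Httpd; adjust it as needed.

```shell
# Note the PID of the parent httpd process before the test
cat /var/run/httpd.pid
# Kill it, then wait longer than the Httpd monitor interval (30s)
kill -9 "$(cat /var/run/httpd.pid)"
sleep 45
# A different PID here means pacemaker restarted httpd on the same node
cat /var/run/httpd.pid
# The failed monitor should also be reported in the cluster status
pcs status
```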

As discussed elsewhere in this thread, you also want to make sure that
your operating system is not managing the httpd process (via systemd,
upstart, lsb init, etc.).
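On an el7 system like the one above, that usually means checking the
systemd unit. A minimal sketch, assuming the unit is named httpd:

```shell
# Should print "disabled": pacemaker, not systemd, must start httpd
systemctl is-enabled httpd
# If it prints "enabled", stop systemd from starting httpd at boot
systemctl disable httpd
```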

> How can we achieve resource failover?

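Set migration-threshold=1 on the resource. With that, a single failure
makes the resource ineligible to run on that node, so pacemaker recovers
it on the other node instead of restarting it in place.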
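A minimal sketch with pcs, assuming the resource name Httpd from the
configuration above and the pcs 0.9 syntax shipped with RHEL 7:

```shell
# Move Httpd away from a node after a single failure there
pcs resource meta Httpd migration-threshold=1
# Alternatively, apply it to all resources as a default:
# pcs resource defaults migration-threshold=1

# After a test, check and then clear the fail count so that Httpd
# is allowed to run on node1 again
pcs resource failcount show Httpd
pcs resource cleanup Httpd
```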

>  
> Further, I will use this environment to test migration-threshold.
> Any suggestions regarding this are also welcome.
>  
> TIA
>  
> Regards,
> Garima
-- 
Ken Gaillot <kgaillot at redhat.com>