[ClusterLabs] Issue in resource constraints and fencing - RHEL7 - AWS EC2
Ken Gaillot
kgaillot at redhat.com
Fri May 20 21:53:51 UTC 2016
On 05/20/2016 10:02 AM, Pratip Ghosh wrote:
> Hi All,
>
> I am implementing a 2-node Red Hat (RHEL 7.2) HA cluster on Amazon EC2
> instances. For the floating IP I am using a shell script provided by AWS
> so that the virtual IP floats to the other instance if either server
> fails its health check. At a basic level the cluster is working, but I
> have 2 issues, which I describe below.
>
> ISSUE 1
> =====
> Now I need to configure fencing/STONITH to avoid a split-brain scenario
> in the storage cluster. I want to use multi-primary (Active/Active) DRBD
> in my cluster for distributed storage. Is it possible to configure power
> fencing on AWS EC2 instances? Can anyone please guide me on this?
There has been some discussion about this on this list before -- see
http://search.gmane.org/?query=ec2&group=gmane.comp.clustering.clusterlabs.user
Basically, there is an outdated agent available at
https://github.com/beekhof/fence_ec2 and a newer fork of it in the
(RHEL-incompatible) cluster-glue package. So with some work you may be
able to get something working.
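If you do get one of those agents working, wiring it in would look
roughly like this. This is only a sketch -- the option names must be
checked against whichever fork you actually install, and the instance
IDs below are placeholders, not real values:

  # See which options the installed agent really accepts
  pcs stonith describe fence_ec2

  # Hypothetical example; pcmk_host_map maps cluster node names to
  # EC2 instance IDs (the IDs here are placeholders)
  pcs stonith create ec2-fence fence_ec2 \
      pcmk_host_map="node1:i-xxxxxxxx;node2:i-yyyyyyyy" \
      op monitor interval=60s

  # Once fencing is verified to work, enable it cluster-wide
  pcs property set stonith-enabled=true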
>
> ISSUE 2
> =====
> Currently I am using single-primary DRBD distributed storage. I added
> cluster resources so that if either cluster node goes down, the other
> cluster node will promote the DRBD volume to primary and mount it on
> /var/www/html.
>
> This configuration works, but only if cluster node1 goes down.
> If cluster node2 goes down, all cluster resources fail over to cluster
> node1, but whenever cluster node2 comes back online, ownership of
> virtual_ip (the cluster IP) automatically moves back to cluster node2.
> None of the remaining resources fail back like that. In that case the
> secondary IP stays with Node1 while ownership goes to Node2.
>
> I think this is an issue with resource stickiness or a resource
> constraint, but here I am totally clueless. Can anyone please help me
> with this?
>
>
> My cluster details:
> ===========
>
> [root@drbd01 ~]# pcs config
> Cluster Name: web_cluster
> Corosync Nodes:
> ec2-52-24-8-124.us-west-2.compute.amazonaws.com
> ec2-52-27-70-12.us-west-2.compute.amazonaws.com
> Pacemaker Nodes:
> ec2-52-24-8-124.us-west-2.compute.amazonaws.com
> ec2-52-27-70-12.us-west-2.compute.amazonaws.com
>
> Resources:
> Resource: virtual_ip (class=ocf provider=heartbeat type=IPaddr2)
> Attributes: ip=10.98.70.100 cidr_netmask=24
> Operations: start interval=0s timeout=20s (virtual_ip-start-interval-0s)
> stop interval=0s timeout=20s (virtual_ip-stop-interval-0s)
> monitor interval=30s (virtual_ip-monitor-interval-30s)
> Resource: WebSite (class=ocf provider=heartbeat type=apache)
> Attributes: configfile=/etc/httpd/conf/httpd.conf
> statusurl=http://10.98.70.100/server-status
> Operations: start interval=0s timeout=40s (WebSite-start-interval-0s)
> stop interval=0s timeout=60s (WebSite-stop-interval-0s)
> monitor interval=1min (WebSite-monitor-interval-1min)
> Master: WebDataClone
> Meta Attrs: master-max=1 master-node-max=1 clone-max=2
> clone-node-max=1 notify=true
> Resource: WebData (class=ocf provider=linbit type=drbd)
> Attributes: drbd_resource=r1
> Operations: start interval=0s timeout=240 (WebData-start-interval-0s)
> promote interval=0s timeout=90 (WebData-promote-interval-0s)
> demote interval=0s timeout=90 (WebData-demote-interval-0s)
> stop interval=0s timeout=100 (WebData-stop-interval-0s)
> monitor interval=60s (WebData-monitor-interval-60s)
> Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
> Attributes: device=/dev/drbd1 directory=/var/www/html fstype=xfs
> Operations: start interval=0s timeout=60 (WebFS-start-interval-0s)
> stop interval=0s timeout=60 (WebFS-stop-interval-0s)
> monitor interval=20 timeout=40 (WebFS-monitor-interval-20)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
> Ordering Constraints:
> promote WebDataClone then start WebFS (kind:Mandatory)
> (id:order-WebDataClone-WebFS-mandatory)
> start WebFS then start virtual_ip (kind:Mandatory)
> (id:order-WebFS-virtual_ip-mandatory)
> start virtual_ip then start WebSite (kind:Mandatory)
> (id:order-virtual_ip-WebSite-mandatory)
> Colocation Constraints:
> WebSite with virtual_ip (score:INFINITY)
> (id:colocation-WebSite-virtual_ip-INFINITY)
> WebFS with WebDataClone (score:INFINITY) (with-rsc-role:Master)
> (id:colocation-WebFS-WebDataClone-INFINITY)
> WebSite with WebFS (score:INFINITY)
> (id:colocation-WebSite-WebFS-INFINITY)
>
> Resources Defaults:
> resource-stickiness: INFINITY
You don't have any constraints requiring virtual_ip to stay with any
other resource. So it doesn't.
You could colocate virtual_ip with WebFS, and drop the colocation of
WebSite with WebFS, but it would probably be easier to configure a group
with WebFS, virtual_ip, and WebSite. Then you would only need to order
"promote WebDataClone then start the new group" and colocate the group
with WebDataClone's master role, and you could get rid of all the other
constraints.
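A minimal sketch of that with pcs -- the group name "WebGroup" is just
an example, and the constraint IDs are the ones from your "pcs config"
output above:

  # Drop the existing constraints
  pcs constraint remove order-WebDataClone-WebFS-mandatory
  pcs constraint remove order-WebFS-virtual_ip-mandatory
  pcs constraint remove order-virtual_ip-WebSite-mandatory
  pcs constraint remove colocation-WebSite-virtual_ip-INFINITY
  pcs constraint remove colocation-WebFS-WebDataClone-INFINITY
  pcs constraint remove colocation-WebSite-WebFS-INFINITY

  # Group the resources; members start in the listed order
  pcs resource group add WebGroup WebFS virtual_ip WebSite

  # Keep the group with the DRBD master and start it only after promotion
  pcs constraint colocation add WebGroup with master WebDataClone INFINITY
  pcs constraint order promote WebDataClone then start WebGroup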
> Operations Defaults:
> timeout: 240s
>
> Cluster Properties:
> cluster-infrastructure: corosync
> cluster-name: web_cluster
> dc-version: 1.1.13-10.el7-44eb2dd
> default-resource-stickiness: INFINITY
> have-watchdog: false
> no-quorum-policy: ignore
> stonith-action: poweroff
> stonith-enabled: false
>
>
>
> Regards,
> Pratip Ghosh.
>