[ClusterLabs] Active/Active Cloned resources Asterisk+GFS2+DLM+fence_xvm Cluster

Klaus Wenninger kwenning at redhat.com
Mon Jul 18 05:57:28 EDT 2016


On 07/16/2016 04:12 PM, TEG AMJG wrote:
> Dear list
> I am quite new to PaceMaker and i am configuring a two node
> active/active cluster which consist basically on something like this:
>
> My whole configuration is this one:
>
> Stack: corosync
> Current DC: pbx2vs3 (version 1.1.13-10.el7_2.2-44eb2dd) - partition
> with quorum
> 2 nodes and 10 resources configured
>
> Online: [ pbx1vs3 pbx2vs3 ]
>
> Full list of resources:
>
>  Clone Set: dlm-clone [dlm]
>      Started: [ pbx1vs3 pbx2vs3 ]
>  Clone Set: asteriskfs-clone [asteriskfs]
>      Started: [ pbx1vs3 pbx2vs3 ]
>  Clone Set: asterisk-clone [asterisk]
>      Started: [ pbx1vs3 pbx2vs3 ]
>  fence_pbx2_xvm    (stonith:fence_xvm):    Started pbx2vs3
>  fence_pbx1_xvm    (stonith:fence_xvm):    Started pbx1vs3
>  Clone Set: clvmd-clone [clvmd]
>      Started: [ pbx1vs3 pbx2vs3 ]
>
> PCSD Status:
>   pbx1vs3: Online
>   pbx2vs3: Online
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
> [root@pbx1 ~]# pcs config show
> Cluster Name: asteriskcluster
> Corosync Nodes:
>  pbx1vs3 pbx2vs3 
> Pacemaker Nodes:
>  pbx1vs3 pbx2vs3 
>
> Resources: 
>  Clone: dlm-clone
>   Meta Attrs: clone-max=2 clone-node-max=1 interleave=true 
>   Resource: dlm (class=ocf provider=pacemaker type=controld)
>    Attributes: allow_stonith_disabled=false 
>    Operations: start interval=0s timeout=90 (dlm-start-interval-0s)
>                stop interval=0s on-fail=fence (dlm-stop-interval-0s)
>                monitor interval=60s on-fail=fence
> (dlm-monitor-interval-60s)
>  Clone: asteriskfs-clone
>   Meta Attrs: interleave=true clone-max=2 clone-node-max=1 
>   Resource: asteriskfs (class=ocf provider=heartbeat type=Filesystem)
>    Attributes: device=/dev/vg_san1/lv_pbx directory=/mnt/asterisk
> fstype=gfs2 
>    Operations: start interval=0s timeout=60 (asteriskfs-start-interval-0s)
>                stop interval=0s on-fail=fence
> (asteriskfs-stop-interval-0s)
>                monitor interval=60s on-fail=fence
> (asteriskfs-monitor-interval-60s)
>  Clone: asterisk-clone
>   Meta Attrs: interleaved=true
> sipp_monitor=/root/scripts/haasterisk.sh
> sipp_binary=/usr/local/src/sipp-3.4.1/bin/sipp globally-unique=false
> ordered=false interleave=true clone-max=2 clone-node-max=1 notify=true 
>   Resource: asterisk (class=ocf provider=heartbeat type=asterisk)
>    Attributes: user=root group=root
> config=/mnt/asterisk/etc/asterisk.conf
> sipp_monitor=/root/scripts/haasterisk.sh
> sipp_binary=/usr/local/src/sipp-3.4.1/bin/sipp maxfiles=65535 
>    Operations: start interval=0s timeout=40s (asterisk-start-interval-0s)
>                stop interval=0s on-fail=fence (asterisk-stop-interval-0s)
>                monitor interval=10s (asterisk-monitor-interval-10s)
>  Clone: clvmd-clone
>   Meta Attrs: clone-max=2 clone-node-max=1 interleave=true 
>   Resource: clvmd (class=ocf provider=heartbeat type=clvm)
>    Operations: start interval=0s timeout=90 (clvmd-start-interval-0s)
>                monitor interval=30s on-fail=fence
> (clvmd-monitor-interval-30s)
>                stop interval=0s on-fail=fence (clvmd-stop-interval-0s)
>
> Stonith Devices: 
>  Resource: fence_pbx2_xvm (class=stonith type=fence_xvm)
>   Attributes: port=tegamjg_pbx2 pcmk_host_list=pbx2vs3 
>   Operations: monitor interval=60s (fence_pbx2_xvm-monitor-interval-60s)
>  Resource: fence_pbx1_xvm (class=stonith type=fence_xvm)
>   Attributes: port=tegamjg_pbx1 pcmk_host_list=pbx1vs3 
>   Operations: monitor interval=60s (fence_pbx1_xvm-monitor-interval-60s)
> Fencing Levels: 
>
> Location Constraints:
> Ordering Constraints:
>   start fence_pbx1_xvm then start fence_pbx2_xvm (kind:Mandatory)
> (id:order-fence_pbx1_xvm-fence_pbx2_xvm-mandatory)
>   start fence_pbx2_xvm then start dlm-clone (kind:Mandatory)
> (id:order-fence_pbx2_xvm-dlm-clone-mandatory)
>   start dlm-clone then start clvmd-clone (kind:Mandatory)
> (id:order-dlm-clone-clvmd-clone-mandatory)
>   start clvmd-clone then start asteriskfs-clone (kind:Mandatory)
> (id:order-clvmd-clone-asteriskfs-clone-mandatory)
>   start asteriskfs-clone then start asterisk-clone (kind:Mandatory)
> (id:order-asteriskfs-clone-asterisk-clone-mandatory)
> Colocation Constraints:
>   clvmd-clone with dlm-clone (score:INFINITY)
> (id:colocation-clvmd-clone-dlm-clone-INFINITY)
>   asteriskfs-clone with clvmd-clone (score:INFINITY)
> (id:colocation-asteriskfs-clone-clvmd-clone-INFINITY)
>   asterisk-clone with asteriskfs-clone (score:INFINITY)
> (id:colocation-asterisk-clone-asteriskfs-clone-INFINITY)
>
> Resources Defaults:
>  migration-threshold: 2
>  failure-timeout: 10m
>  start-failure-is-fatal: false
> Operations Defaults:
>  No defaults set
>
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: asteriskcluster
>  dc-version: 1.1.13-10.el7_2.2-44eb2dd
>  have-watchdog: false
>  last-lrm-refresh: 1468598829
>  no-quorum-policy: ignore
>  stonith-action: reboot
>  stonith-enabled: true
>
> Now my problem is that, for example, when I fence one of the nodes,
> the other one stops every clone resource and starts it back again;
> the same thing happens when I stop pacemaker and corosync on one node
> only (pcs cluster stop). That would mean that if I have a problem on
> one of my Asterisk nodes (for example in the DLM or CLVMD resource)
> that requires fencing right away, say node pbx2vs3, the other node
> (pbx1vs3) will restart every service, which will drop all my calls on
> a well-functioning node. To put it more generally: whenever a resource
> needs a stop/start or restart on any node, the same has to happen on
> every node in the cluster.

My guess is that this behaviour is due to the order constraints you defined
for the stonith resources. You probably have one of them running on each node
when everything is fine; when you remove a node, one of the stonith resources
is gone, everything else depends on it, so everything is shut down, the
stonith resource is moved, and everything is started again.
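
If that is indeed the cause, one simple thing to try (an untested sketch,
using the constraint ids from your pcs config output above) would be to drop
the two order constraints that chain the stonith resources into the rest of
the stack:

    pcs constraint remove order-fence_pbx1_xvm-fence_pbx2_xvm-mandatory
    pcs constraint remove order-fence_pbx2_xvm-dlm-clone-mandatory

As far as I know Pacemaker does not need a fence device to be "started"
before the other resources anyway; it will use a configured device whenever
fencing is needed, so ordering the rest of the stack after it only creates
the kind of dependency described above.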
Why do you have separate resources for fencing the nodes? fence_xvm can be
used for a list of nodes, and you should be able to clone the stonith
resource as well, so that each node runs an instance that can fence both
nodes.
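
A rough, untested sketch of what that could look like (fence_pbx_xvm is just
a placeholder name; the pcmk_host_map values are taken from the port and
pcmk_host_list attributes in your config above):

    # replace the two per-node devices with a single one that knows both guests
    pcs stonith delete fence_pbx1_xvm
    pcs stonith delete fence_pbx2_xvm
    pcs stonith create fence_pbx_xvm fence_xvm \
        pcmk_host_map="pbx1vs3:tegamjg_pbx1;pbx2vs3:tegamjg_pbx2" \
        op monitor interval=60s
    # and, if you want an instance running on every node:
    pcs resource clone fence_pbx_xvm

With a single (or cloned) device there is no per-node stonith resource left
for the surviving node's resources to depend on.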
 
>
> All this leads to a basic question: is this strictly how clone
> resources behave? Is it possible to configure them so that each
> instance behaves, dare I say, more independently? (I know about the
> globally-unique option, but as far as I understand it, that doesn't do
> the job.) I have been reading about clone resources for a while, but
> there are not many examples of what they can and cannot do.
>
> There are some meta attributes in there that don't make sense, sorry
> about that; the problem is that I don't know how to delete them with
> pcs :). Now, I found something interesting about ordering constraints
> with clone resources in the "Pacemaker Explained" documentation, which
> describes something like this:
> "<constraints>
>   <rsc_location id="clone-prefers-node1" rsc="apache-clone" node="node1" score="500"/>
>   <rsc_colocation id="stats-with-clone" rsc="apache-stats" with="apache-clone"/>
>   <rsc_order id="start-clone-then-stats" first="apache-clone" then="apache-stats"/>
> </constraints>"
>
> "Ordering constraints behave slightly differently for clones. In the
> example above, apache-stats will wait until all copies of apache-clone
> that need to be started have done so before being started itself.
> Only if no copies can be started will apache-stats be prevented from
> being active. Additionally, the clone will wait for apache-stats to be
> stopped before stopping itself."
>
> I am not sure if that has something to do with it, but I cannot
> destroy the whole cluster just to test it, and it would probably be in
> vain anyway.
>
> Thank you very much. Regards
>
> Alejandro
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




