[ClusterLabs] Active/Active Cloned resources Asterisk+GFS2+DLM+fence_xvm Cluster

Sat Jul 16 10:12:27 EDT 2016

Dear list,
I am quite new to Pacemaker and I am configuring a two-node active/active
cluster which basically consists of something like this:

My whole configuration is as follows:

Stack: corosync
Current DC: pbx2vs3 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 10 resources configured

Online: [ pbx1vs3 pbx2vs3 ]

Full list of resources:

 Clone Set: dlm-clone [dlm]
     Started: [ pbx1vs3 pbx2vs3 ]
 Clone Set: asteriskfs-clone [asteriskfs]
     Started: [ pbx1vs3 pbx2vs3 ]
 Clone Set: asterisk-clone [asterisk]
     Started: [ pbx1vs3 pbx2vs3 ]
 fence_pbx2_xvm    (stonith:fence_xvm):    Started pbx2vs3
 fence_pbx1_xvm    (stonith:fence_xvm):    Started pbx1vs3
 Clone Set: clvmd-clone [clvmd]
     Started: [ pbx1vs3 pbx2vs3 ]

PCSD Status:
  pbx1vs3: Online
  pbx2vs3: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@pbx1 ~]# pcs config show
Cluster Name: asteriskcluster
Corosync Nodes:
 pbx1vs3 pbx2vs3
Pacemaker Nodes:
 pbx1vs3 pbx2vs3

Resources:
 Clone: dlm-clone
  Meta Attrs: clone-max=2 clone-node-max=1 interleave=true
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Attributes: allow_stonith_disabled=false
   Operations: start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s on-fail=fence (dlm-stop-interval-0s)
               monitor interval=60s on-fail=fence (dlm-monitor-interval-60s)
 Clone: asteriskfs-clone
  Meta Attrs: interleave=true clone-max=2 clone-node-max=1
  Resource: asteriskfs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/vg_san1/lv_pbx directory=/mnt/asterisk fstype=gfs2
   Operations: start interval=0s timeout=60 (asteriskfs-start-interval-0s)
               stop interval=0s on-fail=fence (asteriskfs-stop-interval-0s)
               monitor interval=60s on-fail=fence (asteriskfs-monitor-interval-60s)
 Clone: asterisk-clone
  Meta Attrs: interleaved=true sipp_monitor=/root/scripts/haasterisk.sh sipp_binary=/usr/local/src/sipp-3.4.1/bin/sipp globally-unique=false ordered=false interleave=true clone-max=2 clone-node-max=1 notify=true
  Resource: asterisk (class=ocf provider=heartbeat type=asterisk)
   Attributes: user=root group=root config=/mnt/asterisk/etc/asterisk.conf sipp_monitor=/root/scripts/haasterisk.sh sipp_binary=/usr/local/src/sipp-3.4.1/bin/sipp maxfiles=65535
   Operations: start interval=0s timeout=40s (asterisk-start-interval-0s)
               stop interval=0s on-fail=fence (asterisk-stop-interval-0s)
               monitor interval=10s (asterisk-monitor-interval-10s)
 Clone: clvmd-clone
  Meta Attrs: clone-max=2 clone-node-max=1 interleave=true
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Operations: start interval=0s timeout=90 (clvmd-start-interval-0s)
               monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
               stop interval=0s on-fail=fence (clvmd-stop-interval-0s)

Stonith Devices:
 Resource: fence_pbx2_xvm (class=stonith type=fence_xvm)
  Attributes: port=tegamjg_pbx2 pcmk_host_list=pbx2vs3
  Operations: monitor interval=60s (fence_pbx2_xvm-monitor-interval-60s)
 Resource: fence_pbx1_xvm (class=stonith type=fence_xvm)
  Attributes: port=tegamjg_pbx1 pcmk_host_list=pbx1vs3
  Operations: monitor interval=60s (fence_pbx1_xvm-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start fence_pbx1_xvm then start fence_pbx2_xvm (kind:Mandatory) (id:order-fence_pbx1_xvm-fence_pbx2_xvm-mandatory)
  start fence_pbx2_xvm then start dlm-clone (kind:Mandatory) (id:order-fence_pbx2_xvm-dlm-clone-mandatory)
  start dlm-clone then start clvmd-clone (kind:Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
  start clvmd-clone then start asteriskfs-clone (kind:Mandatory) (id:order-clvmd-clone-asteriskfs-clone-mandatory)
  start asteriskfs-clone then start asterisk-clone (kind:Mandatory) (id:order-asteriskfs-clone-asterisk-clone-mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)
  asteriskfs-clone with clvmd-clone (score:INFINITY) (id:colocation-asteriskfs-clone-clvmd-clone-INFINITY)
  asterisk-clone with asteriskfs-clone (score:INFINITY) (id:colocation-asterisk-clone-asteriskfs-clone-INFINITY)

Resources Defaults:
 migration-threshold: 2
 failure-timeout: 10m
 start-failure-is-fatal: false
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: asteriskcluster
 dc-version: 1.1.13-10.el7_2.2-44eb2dd
 have-watchdog: false
 last-lrm-refresh: 1468598829
 no-quorum-policy: ignore
 stonith-action: reboot
 stonith-enabled: true

Now my problem is that when I fence one of the nodes, for example, the
other node stops every clone resource and starts them again. The same
thing happens when I stop Pacemaker and Corosync on only one node (pcs
cluster stop). This means that if one of my Asterisk nodes, say pbx2vs3,
has a problem that requires fencing right away (for example in the DLM or
CLVMD resource), the other node (pbx1vs3) will restart every service,
which drops all the calls on a perfectly healthy node. More generally,
whenever a resource needs to be stopped/started or restarted on any node,
the same is done on every node in the cluster.

All this leads to a basic question: is this strictly how clone resources
behave? Is it possible to configure them so they behave, dare I say, in a
more unique way, with each node's copy handled independently? (I know about
the globally-unique option, but as far as I understand it doesn't do the
job.) I have been reading about clone resources for a while, but there are
not many examples of what they can't do.

Some of the meta attributes don't make sense, sorry about that; the
problem is that I don't know how to delete them with PCSD :). Now, I found
something interesting about ordering constraints with clone resources in
the "Pacemaker Explained" documentation, which describes something like this:

    <constraints>
      <rsc_location id="clone-prefers-node1" rsc="apache-clone" node="node1" score="500"/>
      <rsc_colocation id="stats-with-clone" rsc="apache-stats" with="apache-clone"/>
      <rsc_order id="start-clone-then-stats" first="apache-clone" then="apache-stats"/>
    </constraints>

"Ordering constraints behave slightly differently for clones. In the
example above, apache-stats will wait until all copies of apache-clone that
need to be started have done so before being started itself. Only if no
copies can be started will apache-stats be prevented from being active.
Additionally, the clone will wait for apache-stats to be stopped before
stopping itself."
I am not sure whether that has something to do with it, but I cannot tear
down the whole cluster just to test it, quite possibly in vain.
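Testing does not necessarily require rebuilding the cluster: pcs can edit a
saved copy of the CIB with -f, and crm_simulate can read that copy with -x.
A sketch under those assumptions: the file name /tmp/test-cib.xml is
hypothetical, the constraint id is simply one of the ordering constraints
from the configuration above, picked only as an example of a change to
evaluate, and the meta attributes removed are presumably the out-of-place
ones on asterisk-clone (pcs removes a meta attribute when it is given with
an empty value):

    # Dump the live CIB to a file (hypothetical path); the cluster is not touched.
    pcs cluster cib /tmp/test-cib.xml

    # In the copy, drop the stray meta attributes on asterisk-clone
    # (name= with no value removes the attribute).
    pcs -f /tmp/test-cib.xml resource meta asterisk-clone interleaved= sipp_monitor= sipp_binary=

    # Example change to evaluate in the copy: remove one ordering constraint.
    pcs -f /tmp/test-cib.xml constraint remove order-fence_pbx2_xvm-dlm-clone-mandatory

    # Simulate fencing pbx2vs3 against the modified copy instead of the live CIB.
    crm_simulate -x /tmp/test-cib.xml -S --node-fail=pbx2vs3

Without -f, the same pcs resource meta command edits the live configuration
directly, and pcs cluster cib-push /tmp/test-cib.xml would load a copy that
looks right back into the running cluster.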

Thank you very much. Regards

Alejandro