[ClusterLabs] Two node Active/Active Asterisk+GFS2+DLM+fence_xvm Cluster

TEG AMJG tegamjg at gmail.com
Fri Jul 15 17:32:10 UTC 2016


Hi

Thank you very much for your quick answer. I didn't include the whole
configuration because I thought it might be a limitation of clone
resources, since it happens on any start/restart operation and whenever a
node, or a resource on a node, has a problem. Also, all my clone resources
have interleave=true specified.
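
For completeness, this is roughly how I set it (a sketch from memory using
my clone names, so the exact pcs syntax may differ slightly; the attribute
is also visible in the full config further down):

  # how I set interleave=true on each clone (sketch from memory)
  pcs resource meta dlm-clone interleave=true
  pcs resource meta clvmd-clone interleave=true
  pcs resource meta asteriskfs-clone interleave=true
  pcs resource meta asterisk-clone interleave=true
  # quick check that the attribute is really there
  pcs resource show dlm-clone | grep -i interleave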

Here is my whole configuration:

Stack: corosync
Current DC: pbx2vs3 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with
quorum
2 nodes and 10 resources configured

Online: [ pbx1vs3 pbx2vs3 ]

Full list of resources:

 Clone Set: dlm-clone [dlm]
     Started: [ pbx1vs3 pbx2vs3 ]
 Clone Set: asteriskfs-clone [asteriskfs]
     Started: [ pbx1vs3 pbx2vs3 ]
 Clone Set: asterisk-clone [asterisk]
     Started: [ pbx1vs3 pbx2vs3 ]
 fence_pbx2_xvm    (stonith:fence_xvm):    Started pbx2vs3
 fence_pbx1_xvm    (stonith:fence_xvm):    Started pbx1vs3
 Clone Set: clvmd-clone [clvmd]
     Started: [ pbx1vs3 pbx2vs3 ]

PCSD Status:
  pbx1vs3: Online
  pbx2vs3: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@pbx1 ~]# pcs config show
Cluster Name: asteriskcluster
Corosync Nodes:
 pbx1vs3 pbx2vs3
Pacemaker Nodes:
 pbx1vs3 pbx2vs3

Resources:
 Clone: dlm-clone
  Meta Attrs: clone-max=2 clone-node-max=1 interleave=true
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Attributes: allow_stonith_disabled=false
   Operations: start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s on-fail=fence (dlm-stop-interval-0s)
               monitor interval=60s on-fail=fence (dlm-monitor-interval-60s)
 Clone: asteriskfs-clone
  Meta Attrs: interleave=true clone-max=2 clone-node-max=1
  Resource: asteriskfs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/vg_san1/lv_pbx directory=/mnt/asterisk
fstype=gfs2
   Operations: start interval=0s timeout=60 (asteriskfs-start-interval-0s)
               stop interval=0s on-fail=fence (asteriskfs-stop-interval-0s)
               monitor interval=60s on-fail=fence
(asteriskfs-monitor-interval-60s)
 Clone: asterisk-clone
  Meta Attrs: interleaved=true sipp_monitor=/root/scripts/haasterisk.sh
sipp_binary=/usr/local/src/sipp-3.4.1/bin/sipp globally-unique=false
ordered=false interleave=true clone-max=2 clone-node-max=1 notify=true
  Resource: asterisk (class=ocf provider=heartbeat type=asterisk)
   Attributes: user=root group=root config=/mnt/asterisk/etc/asterisk.conf
sipp_monitor=/root/scripts/haasterisk.sh
sipp_binary=/usr/local/src/sipp-3.4.1/bin/sipp maxfiles=65535
   Operations: start interval=0s timeout=40s (asterisk-start-interval-0s)
               stop interval=0s on-fail=fence (asterisk-stop-interval-0s)
               monitor interval=10s (asterisk-monitor-interval-10s)
 Clone: clvmd-clone
  Meta Attrs: clone-max=2 clone-node-max=1 interleave=true
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Operations: start interval=0s timeout=90 (clvmd-start-interval-0s)
               monitor interval=30s on-fail=fence
(clvmd-monitor-interval-30s)
               stop interval=0s on-fail=fence (clvmd-stop-interval-0s)

Stonith Devices:
 Resource: fence_pbx2_xvm (class=stonith type=fence_xvm)
  Attributes: port=tegamjg_pbx2 pcmk_host_list=pbx2vs3
  Operations: monitor interval=60s (fence_pbx2_xvm-monitor-interval-60s)
 Resource: fence_pbx1_xvm (class=stonith type=fence_xvm)
  Attributes: port=tegamjg_pbx1 pcmk_host_list=pbx1vs3
  Operations: monitor interval=60s (fence_pbx1_xvm-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start fence_pbx1_xvm then start fence_pbx2_xvm (kind:Mandatory)
(id:order-fence_pbx1_xvm-fence_pbx2_xvm-mandatory)
  start fence_pbx2_xvm then start dlm-clone (kind:Mandatory)
(id:order-fence_pbx2_xvm-dlm-clone-mandatory)
  start dlm-clone then start clvmd-clone (kind:Mandatory)
(id:order-dlm-clone-clvmd-clone-mandatory)
  start clvmd-clone then start asteriskfs-clone (kind:Mandatory)
(id:order-clvmd-clone-asteriskfs-clone-mandatory)
  start asteriskfs-clone then start asterisk-clone (kind:Mandatory)
(id:order-asteriskfs-clone-asterisk-clone-mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY)
(id:colocation-clvmd-clone-dlm-clone-INFINITY)
  asteriskfs-clone with clvmd-clone (score:INFINITY)
(id:colocation-asteriskfs-clone-clvmd-clone-INFINITY)
  asterisk-clone with asteriskfs-clone (score:INFINITY)
(id:colocation-asterisk-clone-asteriskfs-clone-INFINITY)

Resources Defaults:
 migration-threshold: 2
 failure-timeout: 10m
 start-failure-is-fatal: false
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: asteriskcluster
 dc-version: 1.1.13-10.el7_2.2-44eb2dd
 have-watchdog: false
 last-lrm-refresh: 1468598829
 no-quorum-policy: ignore
 stonith-action: reboot
 stonith-enabled: true

There are some meta attributes that don't make sense (for example the
stray sipp_* and interleaved entries on asterisk-clone), sorry about
that; the problem is that I don't know how to delete them with pcs :).
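
From what I understand, assigning an empty value is supposed to unset a
meta attribute in pcs, so I am guessing something like this would clean
them up, but I have not tried it on the live cluster:

  # untested guess: pcs is supposed to remove meta attributes that are
  # given an empty value, which should drop the stray entries
  pcs resource meta asterisk-clone interleaved= sipp_monitor= sipp_binary=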

Now, I found something interesting about constraint ordering with clone
resources in the "Pacemaker Explained" documentation, which describes
something like this:

*"<constraints><rsc_location id="clone-prefers-node1" rsc="apache-clone"
node="node1" score="500"/><rsc_colocation id="stats-with-clone"
rsc="apache-stats" with="apache-clone"/><rsc_order
id="start-clone-then-stats" first="apache-clone"
then="apache-stats"/></constraints>""Ordering constraints behave slightly
differently for clones. In the example above, apache-stats willwait until
all copies of apache-clone that need to be started have done so before
being started itself.Only if no copies can be started will apache-stats be
prevented from being active. Additionally, theclone will wait for
apache-stats to be stopped before stopping itself".*
I am not sure if that has anything to do with it, but I cannot destroy the
whole cluster to test it, and it would probably be in vain anyway.
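
That said, I guess I could at least simulate it offline. If I read the
crm_simulate man page correctly, something like this should show what the
cluster would do on a node failure, without touching the live resources:

  # untested sketch: dump the live CIB to a file and let pacemaker compute
  # the transition it would run if pbx2vs3 failed, without changing anything
  pcs cluster cib > /tmp/test.cib
  crm_simulate --simulate --xml-file /tmp/test.cib --node-fail pbx2vs3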

Thank you very much again. Regards

Alejandro

2016-07-15 3:35 GMT-04:00 Kristoffer Grönlund <kgronlund at suse.com>:

> TEG AMJG <tegamjg at gmail.com> writes:
>
> > Dear list
> >
> > I am quite new to Pacemaker and I am configuring a two-node active/active
> > cluster which consists basically of something like this:
> >
> > I am using pcsd Pacemaker/Corosync:
> >
> >  Clone Set: dlm-clone [dlm]
> >      Started: [ pbx1vs3 pbx2vs3 ]
> >  Clone Set: asteriskfs-clone [asteriskfs]
> >      Started: [ pbx1vs3 pbx2vs3 ]
> >  Clone Set: asterisk-clone [asterisk]
> >      Started: [ pbx1vs3 pbx2vs3 ]
> >  fence_pbx2_xvm    (stonith:fence_xvm):    Started pbx1vs3
> >  fence_pbx1_xvm    (stonith:fence_xvm):    Started pbx2vs3
> >  Clone Set: clvmd-clone [clvmd]
> >      Started: [ pbx1vs3 pbx2vs3]
> >
> > Now my problem is that, for example, when I fence one of the nodes, the
> > other one restarts every clone resource and starts them back again; the
> > same thing happens when I stop pacemaker and corosync on one node only
> > (pcs cluster stop). That would mean that if I have a problem on one of
> > my Asterisk nodes (for example in the DLM resource or CLVMD) that
> > requires fencing right away, for example node pbx2vs3, the other node
> > (pbx1vs3) will restart every service, which will drop all my calls on a
> > well-functioning node.
>
> The pcsd output doesn't really give any hint as to what your
> configuration looks like, but it sounds like the issue may be not setting
> interleave=true for a clone which other resources depend on. See this
> article for more information:
>
>
> https://www.hastexo.com/resources/hints-and-kinks/interleaving-pacemaker-clones/
>
> Cheers,
> Kristoffer
>
> --
> // Kristoffer Grönlund
> // kgronlund at suse.com
>



-- 
-
Regards to everyone