[Pacemaker] Asymmetric cluster, clones, and location constraints

Tue Oct 22 17:19:11 EDT 2013

I am getting rather unexpected behavior when I combine clones, location
constraints, and remote nodes in an asymmetric cluster.  My cluster is
configured to be asymmetric, distinguishing between vmhosts and various
sorts of remote nodes.  Currently I am running upstream version b6d42ed.  I
am simplifying my description to avoid confusion, hoping in so doing I
don't miss any salient points...

My physical cluster nodes, also the VM hosts, have the attribute
"nodetype=vmhost".  They also have Infiniband interfaces, which take some
time to come up.  I don't want my shared file system (which needs IB), or
libvirtd (which needs the file system), to come up before IB...  So I have
this in my configuration:

primitive p-watch-ib0 ocf:heartbeat:ethmonitor \
    params \
        interface="ib0" \
    op monitor timeout="100s" interval="10s"
clone c-watch-ib0 p-watch-ib0 \
    meta interleave="true"
#
location loc-watch-ib-only-vmhosts c-watch-ib0 \
    rule 0: nodetype eq "vmhost"

Something broke between upstream versions 0a2570a and c68919f -- the
c-watch-ib0 clone never starts.  I've found that if I run "crm_resource
--force-start -r p-watch-ib0" when IB is running, the ethmonitor-ib0
attribute is not set like it used to be.  Oh well, I can set it manually,
so let's do that.

We use GPFS for a shared file system, so I have an agent to start it and
wait for a file system to mount.  It should only run on VM hosts, and only
when IB is running.  So I have this:

primitive p-fs-gpfs ocf:ccni:gpfs \
    params \
        fspath="/gpfs/lb/utility" \
    op monitor timeout="20s" interval="30s" \
    op start timeout="180s" \
    op stop timeout="120s"
clone c-fs-gpfs p-fs-gpfs \
    meta interleave="true"
location loc-fs-gpfs-needs-ib0 c-fs-gpfs \
    rule -inf: not_defined "ethmonitor-ib0" or "ethmonitor-ib0" eq 0
location loc-fs-gpfs-on-vmhosts c-fs-gpfs \
    rule 0: nodetype eq "vmhost"
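
To make the intent of the two location rules on c-fs-gpfs concrete, here is
a minimal Python sketch of how they would be expected to combine in an
asymmetric (opt-in) cluster.  It is only an illustration of the expected
outcome, not Pacemaker's actual placement logic, and the function names
gpfs_score and may_run are hypothetical:

```python
# Illustrative model only: this is NOT Pacemaker's policy engine.
# It encodes the expected meaning of the two location rules on
# c-fs-gpfs in an asymmetric (opt-in) cluster.

NEG_INF = float("-inf")


def gpfs_score(attrs):
    """Return the expected score for c-fs-gpfs on a node.

    attrs maps node attribute names to string values.  None means
    no rule matched, so in an opt-in cluster the node would not be
    a candidate at all.
    """
    score = None
    # loc-fs-gpfs-on-vmhosts: rule 0: nodetype eq "vmhost"
    if attrs.get("nodetype") == "vmhost":
        score = 0
    # loc-fs-gpfs-needs-ib0:
    #   rule -inf: not_defined "ethmonitor-ib0" or "ethmonitor-ib0" eq 0
    ib = attrs.get("ethmonitor-ib0")
    if ib is None or ib == "0":
        score = NEG_INF
    return score


def may_run(attrs):
    """True only if some rule allows the node and none forbids it."""
    score = gpfs_score(attrs)
    return score is not None and score > NEG_INF


if __name__ == "__main__":
    print(may_run({"nodetype": "vmhost", "ethmonitor-ib0": "1"}))
    print(may_run({"nodetype": "vmhost"}))
    print(may_run({}))
```

Under this reading, only a node that has nodetype=vmhost and a non-zero
ethmonitor-ib0 attribute should ever be a candidate for the clone.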

That all used to start nicely.  Now even if I set the ethmonitor-ib0
attribute, it doesn't.  However, I can use "crm_resource --force-start -r
p-fs-gpfs" on each of my VM hosts, then issue "crm resource cleanup
c-fs-gpfs", and all is well.  I can use "crm status" to see something like:

Last updated: Tue Oct 22 16:35:43 2013
Last change: Tue Oct 22 15:50:52 2013 via crmd on cvmh01
Stack: cman
Current DC: cvmh04 - partition with quorum
Version: 1.1.10-19.el6.ccni-b6d42ed
8 Nodes configured
92 Resources configured

Online: [ cvmh01 cvmh02 cvmh03 cvmh04 ]

 fence-cvmh01   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh02   (stonith:fence_ipmilan):        Started cvmh01
 fence-cvmh03   (stonith:fence_ipmilan):        Started cvmh01
 fence-cvmh04   (stonith:fence_ipmilan):        Started cvmh01
Clone Set: c-fs-gpfs [p-fs-gpfs]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]

which is what I would expect (except that I would expect Pacemaker to have
started these for me, as it used to).

Now I also have clone resources to NFS-mount another file system, and
actually do a bind mount out of the GPFS file system, which behave like the
GPFS resource -- they used to just work, now I need to use "crm_resource
--force-start" and clean up.  That finally lets me start libvirtd, using
this configuration:

primitive p-libvirtd lsb:libvirtd \
    op monitor interval="30s"
clone c-p-libvirtd p-libvirtd \
    meta interleave="true"
order o-libvirtd-after-storage inf: \
    ( c-fs-libvirt-VM-xcm c-fs-bind-libvirt-VM-cvmh ) \
    c-p-libvirtd
location loc-libvirtd-on-vmhosts  c-p-libvirtd \
    rule 0: nodetype eq "vmhost"

Of course that used to just work, but now, like the other clones, I need to
force-start libvirtd on the VM hosts, and clean up.  Once I do that, all my
VM resources, which are not clones, just start up like they are supposed
to!  Several of these are configured as remote nodes, and they have
services configured to run in them.  But now other strange things happen:

Last updated: Tue Oct 22 16:46:29 2013
Last change: Tue Oct 22 15:50:52 2013 via crmd on cvmh01
Stack: cman
Current DC: cvmh04 - partition with quorum
Version: 1.1.10-19.el6.ccni-b6d42ed
8 Nodes configured
92 Resources configured

ContainerNode slurmdb02:vm-slurmdb02: UNCLEAN (offline)
Online: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
Containers: [ db02:vm-db02 ldap01:vm-ldap01 ldap02:vm-ldap02 ]

 fence-cvmh01   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh02   (stonith:fence_ipmilan):        Started cvmh01
 fence-cvmh03   (stonith:fence_ipmilan):        Started cvmh01
 fence-cvmh04   (stonith:fence_ipmilan):        Started cvmh01
 Clone Set: c-p-libvirtd [p-libvirtd]
     p-libvirtd (lsb:libvirtd): FAILED slurmdb02
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 ]
 Clone Set: c-watch-ib0 [p-watch-ib0]
     p-watch-ib0        (ocf::heartbeat:ethmonitor):    FAILED slurmdb02
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 ]
 Clone Set: c-fs-gpfs [p-fs-gpfs]
     p-fs-gpfs  (ocf::ccni:gpfs):       FAILED slurmdb02
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 ]
 vm-compute-test        (ocf::ccni:xcatVirtualDomain):  FAILED [ cvmh04 slurmdb02 ]
 vm-swbuildsl6  (ocf::ccni:xcatVirtualDomain):  FAILED slurmdb02
 vm-db02        (ocf::ccni:xcatVirtualDomain):  Started cvmh01
 vm-ldap01      (ocf::ccni:xcatVirtualDomain):  Started cvmh02
 vm-ldap02      (ocf::ccni:xcatVirtualDomain):  Started cvmh03
 p-postgres     (ocf::heartbeat:pgsql): FAILED [ db02 slurmdb02 ]
 p-mysql        (ocf::heartbeat:mysql): FAILED [ db02 slurmdb02 ]
 Clone Set: c-fs-share-config-data [fs-share-config-data]
     fs-share-config-data       (ocf::heartbeat:Filesystem):    FAILED slurmdb02
     Stopped: [ cvmh01 cvmh02 cvmh03 cvmh04 db02 ldap01 ldap02 ]
 p-mysql-slurm  (ocf::heartbeat:mysql): FAILED slurmdb02
 p-slurmdbd     (ocf::ccni:SlurmDBD):   FAILED slurmdb02
 Clone Set: c-ldapagent [s-ldapagent]
     s-ldapagent        (ocf::ccni:WrapInitScript):     FAILED slurmdb02
     Stopped: [ cvmh01 cvmh02 cvmh03 cvmh04 db02 ldap01 ldap02 ]
 Clone Set: c-ldap [s-ldap]
     s-ldap     (ocf::ccni:WrapInitScript):     FAILED slurmdb02
     Started: [ ldap01 ldap02 ]
     Stopped: [ cvmh01 cvmh02 cvmh03 cvmh04 db02 ]

Now this is unexpected for a couple of reasons.  I do have constraints like:

location loc-vm-swbuildsl6 vm-swbuildsl6 \
        rule $id="loc-vm-swbuildsl6-rule" 0: nodetype eq vmhost
order o-vm-swbuildsl6 inf: c-p-libvirtd vm-swbuildsl6

And it is not the case that slurmdb02 has the vmhost attribute set; using
"crm_mon -o -1 -N -A" we see:

Node Attributes:
* Node cvmh01:
    + ethmonitor-ib0                    : 1
    + nodetype                          : vmhost
* Node cvmh02:
    + ethmonitor-ib0                    : 1
    + nodetype                          : vmhost
* Node cvmh03:
    + ethmonitor-ib0                    : 1
    + nodetype                          : vmhost
* Node cvmh04:
    + ethmonitor-ib0                    : 1
    + nodetype                          : vmhost
* Node db02:
* Node ldap01:
* Node ldap02:
* Node slurmdb02:
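
A check like this can be scripted.  The following is a minimal Python
sketch that parses attribute output in the format shown above and lists
the nodes carrying a given attribute value.  The function names and the
shortened SAMPLE text are assumptions made for illustration, and the
parser handles only this simple layout:

```python
import re


def parse_node_attributes(text):
    """Parse a 'Node Attributes:' listing into {node: {name: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        node = re.match(r"\*\s+Node\s+(\S+):", line)
        if node:
            # Start a new node; it may have no attributes at all.
            current = node.group(1)
            nodes[current] = {}
            continue
        attr = re.match(r"\+\s+(\S+)\s*:\s*(\S+)", line)
        if attr and current is not None:
            nodes[current][attr.group(1)] = attr.group(2)
    return nodes


def nodes_with(nodes, name, value):
    """Return the sorted node names whose attribute equals value."""
    return sorted(n for n, a in nodes.items() if a.get(name) == value)


# Shortened sample in the same layout as the output above.
SAMPLE = """\
Node Attributes:
* Node cvmh01:
    + ethmonitor-ib0                    : 1
    + nodetype                          : vmhost
* Node db02:
* Node slurmdb02:
"""

if __name__ == "__main__":
    parsed = parse_node_attributes(SAMPLE)
    print(nodes_with(parsed, "nodetype", "vmhost"))
```

With the full output above, only cvmh01 through cvmh04 carry
nodetype=vmhost; slurmdb02 has no attributes at all.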

The results are unexpected to me also because I (perhaps naively) wouldn't
expect it to show me the new nodes on the "stopped" lines -- I kind of
expected a location rule to limit where clones would even be attempted.
For example, with the rule limiting c-p-libvirtd to the vmhosts, I don't
really expect to be told that the clones are stopped on the remote VM nodes
db02, ldap01, and ldap02 (let alone be started on slurmdb02!).

Until I wrote this note, even the cloned ldap resource c-ldap needed to be
started using force-start.  Not sure why this time it started on its own...
Perhaps this stack trace in the core dump pacemaker left on one of the VM
hosts has a clue?

#0  0x00007f121e9ac8e5 in raise () from /lib64/libc.so.6
#1  0x00007f121e9ae0c5 in abort () from /lib64/libc.so.6
#2  0x00007f121e9ea7f7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f121e9f0126 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f121e9f05ad in malloc_consolidate () from /lib64/libc.so.6
#5  0x00007f121e9f33c5 in _int_malloc () from /lib64/libc.so.6
#6  0x00007f121e9f45e6 in calloc () from /lib64/libc.so.6
#7  0x00007f121e9e91ed in open_memstream () from /lib64/libc.so.6
#8  0x00007f121ea5ebdb in __vsyslog_chk () from /lib64/libc.so.6
#9  0x00007f121ea5f1b3 in __syslog_chk () from /lib64/libc.so.6
#10 0x00007f121e72b9fb in ?? () from /usr/lib64/libqb.so.0
#11 0x00007f121e72a6a2 in qb_log_real_va_ () from /usr/lib64/libqb.so.0
#12 0x00007f121e72a91d in qb_log_real_ () from /usr/lib64/libqb.so.0
#13 0x000000000042e994 in te_rsc_command (graph=0x20c7b40, action=0x23b0c90)
    at te_actions.c:412
#14 0x0000003a64404019 in initiate_action (graph=0x20c7b40) at graph.c:172
#15 fire_synapse (graph=0x20c7b40) at graph.c:211
#16 run_graph (graph=0x20c7b40) at graph.c:366
#17 0x000000000042f8cd in te_graph_trigger (user_data=<value optimized out>)
    at te_utils.c:331
#18 0x0000003a6202b283 in crm_trigger_dispatch (source=<value optimized out>,
    callback=<value optimized out>, userdata=<value optimized out>)
    at mainloop.c:105
#19 0x00000038b3c38f0e in g_main_context_dispatch ()
   from /lib64/libglib-2.0.so.0
#20 0x00000038b3c3c938 in ?? () from /lib64/libglib-2.0.so.0
#21 0x00000038b3c3cd55 in g_main_loop_run () from /lib64/libglib-2.0.so.0
#22 0x00000000004058ee in crmd_init () at main.c:154
#23 0x0000000000405c2c in main (argc=1, argv=0x7fffdc207528) at main.c:121

Not sure how to take this further.  It has been difficult to characterize
what exactly is or isn't happening, and hopefully I've not left out some
critical detail.  Thanks.

/Lindsay