[ClusterLabs] resource fails manual failover

Tue Dec 12 08:50:32 EST 2023

Is there a detailed explanation for resource monitor and start timeouts and
intervals with examples, for dummies?

My resource is configured as follows:
[root@lustre-mds1 ~]# pcs resource show MDT00
Warning: This command is deprecated and will be removed. Please use 'pcs resource config' instead.
Resource: MDT00 (class=ocf provider=heartbeat type=Filesystem)
  Attributes: MDT00-instance_attributes
    device=/dev/mapper/mds00
    directory=/lustre/mds00
    force_unmount=safe
    fstype=lustre
  Operations:
    monitor: MDT00-monitor-interval-20s
      interval=20s
      timeout=40s
    start: MDT00-start-interval-0s
      interval=0s
      timeout=60s
    stop: MDT00-stop-interval-0s
      interval=0s
      timeout=60s

I issued a manual failover with the following command:
crm_resource --move -r MDT00 -H lustre-mds1

The resource tried to start but fell back, leaving entries like these in
pacemaker.log:
Dec 12 15:53:23  Filesystem(MDT00)[1886100]:    INFO: Running start for /dev/mapper/mds00 on /lustre/mds00
Dec 12 15:53:45  Filesystem(MDT00)[1886100]:    ERROR: Couldn't mount device [/dev/mapper/mds00] as /lustre/mds00

I tried again with the same result:
Dec 12 16:11:04  Filesystem(MDT00)[1891333]:    INFO: Running start for /dev/mapper/mds00 on /lustre/mds00
Dec 12 16:11:26  Filesystem(MDT00)[1891333]:    ERROR: Couldn't mount device [/dev/mapper/mds00] as /lustre/mds00

Why can't it move?

Does this roughly 20-second gap (22 seconds between the start and the error
in both attempts) have anything to do with the monitor interval setting?
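
For reference, here is how the gap can be worked out from the log timestamps.
This is only a minimal sketch: `to_seconds` and `elapsed` are helper names I
made up for illustration, and it assumes both timestamps fall on the same day.

```shell
# Convert an HH:MM:SS timestamp to seconds since midnight.
# The body runs in a subshell, so the IFS change does not leak out.
to_seconds() (
    IFS=:
    set -- $1
    # Strip one leading zero so values like "08" are not read as octal.
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
)

# Seconds elapsed between two same-day timestamps (start, then end).
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed 15:53:23 15:53:45   # first attempt: "Running start" to "Couldn't mount"
elapsed 16:11:04 16:11:26   # second attempt
```

Both attempts come out to 22 seconds, which is below the 60s start timeout
configured above.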

[root@lustre-mgs ~]# pcs constraint show --full
Location Constraints:
  Resource: MDT00
    Enabled on:
      Node: lustre-mds1 (score:100) (id:location-MDT00-lustre-mds1-100)
      Node: lustre-mds2 (score:100) (id:location-MDT00-lustre-mds2-100)
    Disabled on:
      Node: lustre-mgs (score:-INFINITY) (id:location-MDT00-lustre-mgs--INFINITY)
      Node: lustre1 (score:-INFINITY) (id:location-MDT00-lustre1--INFINITY)
      Node: lustre2 (score:-INFINITY) (id:location-MDT00-lustre2--INFINITY)
      Node: lustre3 (score:-INFINITY) (id:location-MDT00-lustre3--INFINITY)
      Node: lustre4 (score:-INFINITY) (id:location-MDT00-lustre4--INFINITY)
Ordering Constraints:
  start MGT then start MDT00 (kind:Optional) (id:order-MGT-MDT00-Optional)
  start MDT00 then start OST1 (kind:Optional) (id:order-MDT00-OST1-Optional)
  start MDT00 then start OST2 (kind:Optional) (id:order-MDT00-OST2-Optional)

With regard to the ordering constraints: OST1 and OST2 are already started
while I'm exercising the MDT00 failover.