[ClusterLabs] resource fails manual failover

Andrei Borzenkov arvidjaar at gmail.com
Tue Dec 12 09:03:03 EST 2023


On Tue, Dec 12, 2023 at 4:50 PM Artem <tyomikh at gmail.com> wrote:
>
> Is there a detailed explanation for resource monitor and start timeouts and intervals with examples, for dummies?
>
> my resource is configured as follows:
> [root@lustre-mds1 ~]# pcs resource show MDT00
> Warning: This command is deprecated and will be removed. Please use 'pcs resource config' instead.
> Resource: MDT00 (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: MDT00-instance_attributes
>     device=/dev/mapper/mds00
>     directory=/lustre/mds00
>     force_unmount=safe
>     fstype=lustre
>   Operations:
>     monitor: MDT00-monitor-interval-20s
>       interval=20s
>       timeout=40s
>     start: MDT00-start-interval-0s
>       interval=0s
>       timeout=60s
>     stop: MDT00-stop-interval-0s
>       interval=0s
>       timeout=60s
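
(A general note on the semantics, since you asked for a for-dummies
explanation: the interval on the monitor operation is how often
Pacemaker re-runs the health check once the resource is already
running; the timeout on each operation is how long a single run of
that operation may take before it is treated as failed. start and stop
have interval=0s because they are one-shot actions, not recurring
ones. If you ever want to change them, something like the following
should work; the values here are purely illustrative:

  pcs resource update MDT00 op monitor interval=30s timeout=60s
  pcs resource update MDT00 op start interval=0s timeout=120s
)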
>
> I issued a manual failover with the following command:
> crm_resource --move -r MDT00 -H lustre-mds1
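
(Side note on what --move actually does: it does not migrate anything
by itself. It adds a temporary location constraint, normally named
cli-prefer-MDT00, preferring the target node, and the cluster reacts
to that. The constraint stays in the CIB until you remove it, for
example with crm_resource --clear -r MDT00, or with pcs resource clear
MDT00 on reasonably recent pcs.)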
>
> the resource tried to move but fell back, with entries like these in pacemaker.log:
> Dec 12 15:53:23  Filesystem(MDT00)[1886100]:    INFO: Running start for /dev/mapper/mds00 on /lustre/mds00
> Dec 12 15:53:45  Filesystem(MDT00)[1886100]:    ERROR: Couldn't mount device [/dev/mapper/mds00] as /lustre/mds00
>
> I tried again with the same result:
> Dec 12 16:11:04  Filesystem(MDT00)[1891333]:    INFO: Running start for /dev/mapper/mds00 on /lustre/mds00
> Dec 12 16:11:26  Filesystem(MDT00)[1891333]:    ERROR: Couldn't mount device [/dev/mapper/mds00] as /lustre/mds00
>
> Why can't it move?
>

Because Pacemaker failed to start the resource on the node it selected
to run it on. Maybe the device is missing, maybe the mount point is
missing, maybe something else.
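
A quick way to find the real reason is to repeat by hand, on
lustre-mds1, roughly what the Filesystem agent does (paths taken from
your resource definition above):

  ls -ld /lustre/mds00                # is the mount point there?
  ls -l /dev/mapper/mds00             # is the device visible on this node?
  mount -t lustre /dev/mapper/mds00 /lustre/mds00
  dmesg | tail -n 20                  # Lustre logs mount errors to the kernel log

The error printed by mount, or the LustreError messages in dmesg, is
usually far more specific than what ends up in pacemaker.log.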

> Does this ~20-second gap (between start and error) have anything to do with the monitor interval settings?
>
> [root@lustre-mgs ~]# pcs constraint show --full
> Location Constraints:
>   Resource: MDT00
>     Enabled on:
>       Node: lustre-mds1 (score:100) (id:location-MDT00-lustre-mds1-100)
>       Node: lustre-mds2 (score:100) (id:location-MDT00-lustre-mds2-100)
>     Disabled on:
>       Node: lustre-mgs (score:-INFINITY) (id:location-MDT00-lustre-mgs--INFINITY)
>       Node: lustre1 (score:-INFINITY) (id:location-MDT00-lustre1--INFINITY)
>       Node: lustre2 (score:-INFINITY) (id:location-MDT00-lustre2--INFINITY)
>       Node: lustre3 (score:-INFINITY) (id:location-MDT00-lustre3--INFINITY)
>       Node: lustre4 (score:-INFINITY) (id:location-MDT00-lustre4--INFINITY)
> Ordering Constraints:
>   start MGT then start MDT00 (kind:Optional) (id:order-MGT-MDT00-Optional)
>   start MDT00 then start OST1 (kind:Optional) (id:order-MDT00-OST1-Optional)
>   start MDT00 then start OST2 (kind:Optional) (id:order-MDT00-OST2-Optional)
>
> With regard to the ordering constraints: OST1 and OST2 are already started while I'm exercising MDT00 failover.
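
Nothing in these constraints prevents the move as such. Keep in mind,
though, that a failed start leaves a mark: with the default
start-failure-is-fatal=true, a single failed start bans the resource
from that node until the failure is cleaned up. So once the mount
problem on lustre-mds1 is fixed, something along these lines (same
commands you already used above, adjust to your pcs/pacemaker
versions):

  pcs resource cleanup MDT00          # forget the failed start
  crm_resource --clear -r MDT00       # drop the temporary cli-prefer constraint
  crm_resource --move -r MDT00 -H lustre-mds1
  pcs constraint show --full          # check what constraints are left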
>

