[ClusterLabs] LVM and Filesystem resources - ordering and starting/stopping as a unit

Neitzert, Greg A greg.neitzert at unisys.com
Fri Nov 5 01:28:12 EDT 2021


Hello,

With a Pacemaker 1.1.13/Corosync 2.3.5 cluster, is it possible to define a relationship between two resources so that:

1. B depends on A (a normal order constraint)

AND

2. If either fails, both must be stopped and restarted in the order defined above (B stops, A stops, A starts, then B starts)



In the normal configuration, if A fails, then A and B are both restarted, because B depends on A.  However, if B fails, only B is restarted, because A does not depend on it.  In most cases this is fine, but we have a case where B sometimes fails precisely because A, underneath it, has a failure that has not been detected yet.
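
To make the two behaviours concrete, here is a toy Python model. It is not Pacemaker code, and the function names are hypothetical, made up for illustration. It assumes the resources form a simple chain listed in start order, with each entry depending on the one before it, and it only lists the stop/start actions each recovery style would produce.

```python
def default_recovery(chain, failed):
    """Restart the failed resource and everything ordered after it.

    `chain` lists resources in start order; each depends on the previous one.
    Resources that the failed one depends on are left untouched.
    """
    affected = chain[chain.index(failed):]
    stops = [("stop", name) for name in reversed(affected)]
    starts = [("start", name) for name in affected]
    return stops + starts


def unit_recovery(chain, failed):
    """Desired behaviour (assumption): any failure restarts the whole chain."""
    if failed not in chain:
        raise ValueError(f"{failed!r} is not in the chain")
    # Treat the failure as if the bottom of the chain had failed.
    return default_recovery(chain, chain[0])


chain = ["A", "B"]  # B depends on A
print(default_recovery(chain, "B"))  # [('stop', 'B'), ('start', 'B')]
print(unit_recovery(chain, "B"))
# [('stop', 'B'), ('stop', 'A'), ('start', 'A'), ('start', 'B')]
```

With the default style, a failure of B never touches A; with the "unit" style, a failure of either resource yields the same four-step sequence.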



The order constraint takes care of the start/stop ordering (along with a colocation constraint so they stay on the same node).



The problem I am trying to address is the case where the monitor for B fires first and Pacemaker attempts to restart B, but the restart cannot succeed until A is restarted as well.



Case in point: LVM and Filesystem resources.



If LVM needs to be refreshed, the Filesystem above it stops working (e.g. I/O fails).  However, if Filesystem notices the problem first, LVM never gets a chance to see that it has a problem too.  Filesystem then tries to restart itself until it exhausts its retries.  At that point, a cleanup is required to get things going again, and LVM has to be restarted manually.



We have a case where paths in a SAN go down and come back up, leaving the LVM VG in a compromised state; the LVM cache then needs to be refreshed and the volumes reactivated to clear the problem.  Meanwhile, the LVM problem causes Filesystem I/O to fail.  Filesystem notices first: its monitor fails, it stops itself, and it tries in vain to restart, because it cannot start until the LVM resource has been restarted.



I made the monitor interval longer for Filesystem than for LVM, which usually makes LVM find the problem first, but that isn't foolproof.
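
A small Python sketch of why the interval trick is not foolproof. It is a simplified timing model, not how Pacemaker schedules monitors: it assumes each monitor fires at a fixed offset plus multiples of its interval, and that whichever monitor fires first after the fault detects it. The intervals and offsets below are invented for illustration.

```python
import math


def first_detector(fault_time, monitors):
    """Return the resource whose monitor fires first at or after fault_time.

    `monitors` maps a resource name to (interval, offset) in seconds, where
    the monitor is assumed to fire at offset, offset + interval, and so on.
    """
    def next_fire(interval, offset):
        if fault_time <= offset:
            return offset
        periods = math.ceil((fault_time - offset) / interval)
        return offset + periods * interval

    return min(monitors, key=lambda name: next_fire(*monitors[name]))


# Hypothetical schedule: LVM checked twice as often as Filesystem.
monitors = {"LVM": (10, 0), "Filesystem": (20, 5)}
print(first_detector(6, monitors))   # LVM (fires at 10, Filesystem at 25)
print(first_detector(21, monitors))  # Filesystem (fires at 25, LVM at 30)
```

Even with the shorter interval on LVM, a fault that lands just before a Filesystem monitor is due is still seen by Filesystem first.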

If there were a rule that whenever a Filesystem resource needs to be stopped and started, the LVM resource it depends on must be restarted first, I should be able to avoid the problem entirely.



In essence, what I'm asking is whether I can make two resources start and stop in a particular order, and also define that if one has to be stopped or started, the other must be as well (in my defined order).



Thanks.



Greg Neitzert | Lead Software Engineer | RTC Software Engineering 2B - Middleware

Unisys Corp


