[ClusterLabs] LVM and Filesystem resources - ordering and starting/stopping as a unit
Neitzert, Greg A
greg.neitzert at unisys.com
Fri Nov 5 01:28:12 EDT 2021
Hello,
With a Pacemaker 1.1.13/Corosync 2.3.5 cluster, is it possible to define a relationship between two resources so that:
1. B depends on A (a normal order constraint)
AND
2. If either fails, they both need to be stopped and restarted, in the order defined above (B stops, A stops, A starts, then B starts)
In the normal configuration, if A fails, then A and B will be restarted, because B depends on A. However, if B fails, only B is restarted, because A does not depend on it. In most cases this is fine, but we have a case where B sometimes fails precisely because A, which it depends on, has a failure that has not been detected yet.
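To make the default behaviour concrete, here is a toy model in Python. It is not Pacemaker code; the function name and the idea of a simple linear dependency chain are assumptions made only for illustration.

```python
def restarted_on_failure(start_order, failed):
    """Toy model of a plain order constraint (not Pacemaker's scheduler).

    start_order lists resources in start order, each one depending on the
    resource before it. A failure restarts the failed resource and
    everything that depends on it, but nothing below it.
    """
    position = start_order.index(failed)  # raises ValueError if unknown
    return start_order[position:]


chain = ["A", "B"]  # B depends on A
print(restarted_on_failure(chain, "A"))
print(restarted_on_failure(chain, "B"))
```

The second call is the problem case: a failure of B never touches A.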
The order constraint takes care of the start/stop ordering (together with a colocation constraint so that they stay on the same node).
The problem I am trying to address is the case where the monitor for B fires first and the cluster attempts to restart B, but that restart cannot succeed until A has been restarted as well.
Case in point: LVM and Filesystem resources.
If LVM needs to be refreshed, the Filesystem above it stops working (e.g. I/O fails). However, if Filesystem notices the problem first, LVM has not yet had a chance to see that it also has a problem. Filesystem will then try to restart itself until it exhausts its retries. At that point, a cleanup is required to get things going again, and LVM has to be restarted manually.
In our case, paths in a SAN go down and come back up, which leaves the LVM VG in a compromised state. The LVM cache needs to be refreshed and the volumes reactivated to clear the problem. Meanwhile, the LVM problem causes the Filesystem I/O to fail. Filesystem notices first, its monitor fails, it stops, and it tries in vain to restart, because it cannot start until the LVM resource is restarted.
I made the monitor interval longer for Filesystem than for LVM, which usually lets LVM find the problem first, but that isn't foolproof.
If there were a rule that whenever a Filesystem resource needs to be stopped and started, the LVM resource it depends on must be restarted first, I should be able to avoid the problem entirely.
In essence, what I'm asking is whether I can make two resources start and stop in a particular order, but also define that if one has to be started or stopped, the other must be as well (in my defined order).
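As a sketch of the behaviour I am after, again a toy Python model rather than anything Pacemaker provides (the function name and the resource labels are made up for illustration):

```python
def unit_recovery_plan(start_order, failed):
    """Toy model of the desired "restart as a unit" behaviour.

    Whichever member fails, every resource is stopped in reverse start
    order and then started again in start order.
    """
    if failed not in start_order:
        raise ValueError(f"unknown resource: {failed}")
    stops = [("stop", name) for name in reversed(start_order)]
    starts = [("start", name) for name in start_order]
    return stops + starts


# Filesystem depends on LVM; the Filesystem monitor fails first.
for action, name in unit_recovery_plan(["LVM", "Filesystem"], "Filesystem"):
    print(action, name)
```

The plan is the same whether LVM or Filesystem is the one that fails, which is the point: Filesystem stops, LVM stops, LVM starts, Filesystem starts.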
Thanks.
Greg Neitzert | Lead Software Engineer | RTC Software Engineering 2B - Middleware
Unisys Corp