[ClusterLabs] Re: [EXT] LVM and Filesystem resources - ordering and starting/stopping as a unit

Fri Nov 5 04:01:25 EDT 2021

>>> "Neitzert, Greg A" <greg.neitzert at unisys.com> wrote on 05.11.2021 at 06:28 in message
<DM8PR07MB8854271EEC0543389891DC10888E9 at DM8PR07MB8854.namprd07.prod.outlook.com>

> Hello,
> 
> With a Pacemaker 1.1.13/Corosync 2.3.5 cluster, is it possible to define a
> relationship between two resources so that:
> 
> 1. B depends on A (a normal order constraint)
> 
> AND
> 
> 2. If either fails, they both need to be stopped and restarted, in the
> order defined above (B stops, A stops, A starts, then B starts)
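
A minimal Python sketch of the requested "restart as a unit" rule, assuming the two resources form a simple chain in which each member depends on the one before it. The function `restart_plan` is hypothetical and only models the desired action sequence; it is not a Pacemaker API or configuration.

```python
def restart_plan(stack, failed):
    """Return the stop/start actions for a hypothetical 'restart as a unit' rule.

    stack: resources in start order; each one depends on the one before it.
    failed: the resource whose monitor reported a failure.
    Assumption: a failure of ANY member restarts the whole stack.
    """
    if failed not in stack:
        raise ValueError(f"unknown resource: {failed}")
    # Stop dependents first (reverse start order)...
    stops = [("stop", r) for r in reversed(stack)]
    # ...then start everything again in the defined order.
    starts = [("start", r) for r in stack]
    return stops + starts


if __name__ == "__main__":
    # Either failure gives: B stops, A stops, A starts, B starts.
    print(restart_plan(["A", "B"], "B"))
    # [('stop', 'B'), ('stop', 'A'), ('start', 'A'), ('start', 'B')]
```

The point of the model is that the plan does not depend on which member failed; plain order constraints, by contrast, only propagate a restart from A up to B.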

That would mean A depends on B and B depends on A, which would probably be
hard to start ;-)
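
To illustrate the objection: if mandatory ordering is modeled as a dependency graph, "B after A" plus "A after B" is a cycle, so no resource can be started first. The sketch below uses Python's standard `graphlib` module (Python 3.9+) as a stand-in; `start_order` is a hypothetical helper, and the model covers only the ordering logic, not how Pacemaker itself reacts to such a loop.

```python
from graphlib import CycleError, TopologicalSorter


def start_order(depends_on):
    """Return a valid start order, or None if the dependencies form a cycle.

    depends_on maps each resource to the set of resources that must be
    started before it (a simplified model of mandatory order constraints).
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


if __name__ == "__main__":
    print(start_order({"B": {"A"}}))              # ['A', 'B']
    print(start_order({"B": {"A"}, "A": {"B"}}))  # None: neither can go first
```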

Reading the scenario below, I must admit that I don't fully understand the
problem: if the SAN "seeds in" an error to LVM, which passes it on to the FS
(and eventually to the application), what will be the net effect? Is the
filesystem hanging? Is the FS seeing write errors? Something else?

How would a recovery be done? Fix the SAN? Just wait?

Could it be that your multipath configuration is simply done wrong?

regards,
Ulrich

> In the normal configuration, if A fails, then A and B are both restarted,
> because B depends on A. However, if B fails, only B is restarted, because A
> does not depend on it. In most cases this is going to be fine, but we have a
> case where B is sometimes failing precisely because A, which B depends on, is
> having a failure (but we don't know it yet).
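
The asymmetry described above can be modeled as failure propagation along dependency edges: a failure restarts the failed resource and everything that depends on it, but never the resources it depends on. This is a simplified sketch; the helper `affected_by_failure` and the dict-of-sets representation are assumptions for illustration, not Pacemaker internals.

```python
def affected_by_failure(depends_on, failed):
    """Resources restarted when `failed` fails, under plain order constraints.

    depends_on maps each resource to the set of resources it depends on.
    A failure propagates only "upward" to dependents, never down to the
    resources the failed one depends on.
    """
    affected = {failed}
    changed = True
    while changed:
        changed = False
        for res, deps in depends_on.items():
            # Restart any resource that depends on something already affected.
            if res not in affected and deps & affected:
                affected.add(res)
                changed = True
    return affected


if __name__ == "__main__":
    deps = {"B": {"A"}, "A": set()}
    print(sorted(affected_by_failure(deps, "A")))  # ['A', 'B']
    print(sorted(affected_by_failure(deps, "B")))  # ['B'] (A keeps running)
```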
> 
> The order constraint takes care of the start/stop ordering (along with a
> colocation constraint so that they stay on the same node).
> 
> The problem I am trying to address is the case where the monitor for B fires
> first and the cluster tries to restart B, but that won't work until A is
> restarted.
> 
> Case in point: LVM and Filesystem resources.
> 
> If LVM needs to be refreshed, the Filesystem above it stops working (e.g.
> I/O fails). However, if Filesystem notices the problem first, LVM doesn't get
> a chance to see that it also has a problem. Filesystem will then try to
> restart itself until it exhausts its retries. At that point, a cleanup is
> required to get things going again, and LVM has to be restarted manually.
> 
> In our case, SAN paths go down and come back up, which leaves the LVM VG in a
> compromised state; to clear it up, the LVM cache needs to be refreshed and
> the volumes reactivated. The LVM problem causes Filesystem I/O to fail, and
> Filesystem notices first: its monitor fails, it stops itself, and it tries in
> vain to restart, because it cannot start until the LVM resource is restarted.
> 
> I made the monitor interval longer for Filesystem than for LVM, which makes it
> more likely that LVM finds the problem first, but that isn't foolproof.
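
Why a shorter LVM monitor interval is not foolproof can be sketched with a small timing model, assuming each recurring monitor runs at a fixed offset plus multiples of its interval and detects the fault on its first run after the fault occurs. The intervals and offsets are made-up values, and `next_check`/`first_detector` are hypothetical helpers: even though LVM is checked twice as often here, a fault that happens just before a Filesystem check is seen by Filesystem first.

```python
import math


def next_check(fault_time, interval, offset):
    """Time of the first recurring monitor run at or after fault_time.

    Assumes runs at offset, offset + interval, offset + 2*interval, ...
    (a simplification: real schedules depend on when the resource started).
    """
    if fault_time <= offset:
        return offset
    periods = math.ceil((fault_time - offset) / interval)
    return offset + periods * interval


def first_detector(fault_time, monitors):
    """Name of the resource whose monitor runs first after the fault.

    monitors maps a resource name to its (interval, offset) in seconds.
    """
    return min(monitors, key=lambda name: next_check(fault_time, *monitors[name]))


if __name__ == "__main__":
    # Hypothetical schedule: LVM every 10 s, Filesystem every 20 s.
    monitors = {"LVM": (10, 0), "Filesystem": (20, 5)}
    print(first_detector(1, monitors))  # Filesystem (its check at t=5 comes before LVM's at t=10)
    print(first_detector(6, monitors))  # LVM (t=10 comes before Filesystem's t=25)
```

A longer Filesystem interval only shrinks the window in which Filesystem wins the race; it cannot close it, which is why ordering by monitor timing alone remains a heuristic.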
> 
> If there were a rule that whenever a Filesystem resource needs to be stopped
> and started, the LVM resource it depends on has to be restarted first, I
> should be able to avoid the problem entirely.
> 
> In essence, what I'm asking is whether I can make two resources start and
> stop in a particular order, but also specify that if one has to be started or
> stopped, the other must be as well (in my defined order).
> 
> Thanks.
> 
> Greg Neitzert | Lead Software Engineer | RTC Software Engineering 2B - Middleware
> 
> Unisys Corp